A Pattern for Consolidating Data from Multiple Operational Environments


July 28, 2020  |  By Aaron Watkins

In a previous article, I described a common pattern that we see implemented on the Reekoh Accelerate™ platform, which lands data in an unstructured database prior to secondary processing and subsequent application integration. I also discussed how the raw data collected from operational sensors may lack standardisation and could benefit from transformation into a series of domain-based Common Data Formats (CDFs) prior to storage and processing.

This led us to a pipeline that was designed along the lines of the following:


Original Pipeline Design
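
To make the normalisation step concrete, here is a minimal Python sketch of mapping two differently shaped raw payloads into one CDF. Everything in it is an assumption for illustration: the `PumpReading` format, the two vendor payload layouts and their field names are hypothetical, and the code is not taken from the Reekoh platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PumpReading:
    """Hypothetical Common Data Format (CDF) for a pump reading."""
    site_id: str
    device_id: str
    observed_at: datetime  # always stored as UTC
    flow_rate_lps: float   # litres per second


def from_vendor_a(raw: dict, site_id: str) -> PumpReading:
    # Assumed vendor A payload: epoch seconds, flow in litres per minute.
    return PumpReading(
        site_id=site_id,
        device_id=str(raw["id"]),
        observed_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        flow_rate_lps=raw["flow_lpm"] / 60.0,
    )


def from_vendor_b(raw: dict, site_id: str) -> PumpReading:
    # Assumed vendor B payload: ISO 8601 timestamp with offset,
    # flow already in litres per second but sent as a string.
    return PumpReading(
        site_id=site_id,
        device_id=raw["deviceName"],
        observed_at=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
        flow_rate_lps=float(raw["flowRate"]),
    )
```

Whatever the source looked like, everything downstream only ever sees `PumpReading`, which is what allows storage and processing to be written once.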

However, many customers in this sort of environment also operate multiple sites (e.g. multiple factories). The number of sites has often grown organically over the years, leaving different versions of, or even different vendors for, both the SCADA systems and the relevant sensing devices. In situations like an agricultural group, the sites (i.e. the farms) may all be owned by different entities, and the technology involved could be extremely disparate.

Depending on the level of complexity, a range of options are possible. The simplest approach is to just add an extra ingress point into the existing pipeline along the lines of the following:

Expanding ingress

However, an alternative approach also lays the groundwork for later operational efficiencies should the business wish to augment this normalised data set with that from IoT sensors or some other data points. An integration pattern I’ve used before is to separate the production of normalised data from the consumption of normalised data by putting these different components into separate pipelines and facilitating communication between the two.

For our use case of multi-producer, single-consumer, we might do this through a webhook on the production side and an HTTP Gateway on the consuming side. Of course, you could use your preferred communication medium, but HTTP suffices for this discussion.
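
As a rough illustration of that hand-off, the sketch below uses only the Python standard library to stand in for both sides: `post_record` plays the role of a webhook call made by an ingress pipeline, and `EgressHandler` plays the role of the HTTP entry point on the consuming side. It is not Reekoh plugin configuration; the `/ingest` path, the 202 response and the record fields are assumptions chosen for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # stands in for the egress pipeline's single database write

# The demo talks to localhost, so ignore any system-wide proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class EgressHandler(BaseHTTPRequestHandler):
    """Single consumer: accepts normalised records from any ingress pipeline."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record = json.loads(self.rfile.read(length))
        received.append(record)
        self.send_response(202)  # accepted for later processing
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def post_record(url: str, record: dict) -> int:
    """Producer side: send one normalised record and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=5) as response:
        return response.status


def run_demo() -> list:
    server = HTTPServer(("127.0.0.1", 0), EgressHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/ingest"
    try:
        # Three hypothetical sites all feed the same consumer.
        return [
            post_record(url, {"site_id": site, "flow_rate_lps": 2.0})
            for site in ("factory-1", "factory-2", "farm-3")
        ]
    finally:
        server.shutdown()
        server.server_close()
```

The point of the sketch is the shape rather than the transport: each producer only needs to know one URL and one record format, and the consumer does not need to know how many producers exist.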

For multi-producer, multi-consumer systems, a pub-sub mechanism as the communication medium is particularly effective, which could be achieved with our MQTT Gateway operating in cloud broker mode. More details on such an approach will come in a later article.
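
To show why pub-sub suits the multi-consumer case, here is a deliberately tiny in-memory sketch. `TopicBus` is a hypothetical stand-in for a broker, not an MQTT client, and it supports exact topic names only; the `cdf/pump` topic is likewise an assumption.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBus:
    """In-memory stand-in for a broker; a real deployment would use MQTT."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Every subscriber of the topic receives each published message.
        for handler in self._subscribers[topic]:
            handler(message)


bus = TopicBus()
stored, analysed = [], []
bus.subscribe("cdf/pump", stored.append)    # e.g. the database writer
bus.subscribe("cdf/pump", analysed.append)  # e.g. an analytics consumer

# Two producers publish without knowing who is listening.
bus.publish("cdf/pump", {"site_id": "factory-1"})
bus.publish("cdf/pump", {"site_id": "farm-3"})
```

Adding a third consumer is one more `subscribe` call, with no change to any producer.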

But looking at the multi-producer, single-consumer approach, we may end up with a design like the following, with the egress pipeline effectively acting as a shared continuation of the various ingress pipelines.

Separating ingress from egress

Looking at this, there is some reduction in duplication – that outbound database connection is only defined once. But the real value becomes evident if you think of that intermediate link as a type of shared bus and you enable the business to use it, but in a controlled manner.

Consider the fact that many businesses desire to start experimenting with new technologies, whether that be new versions of existing technologies (like water pumps) or completely new sources of data that might augment the existing data set (perhaps allowing greater insights in later machine learning models or other analytics). As a technology department within the business, it would be great to leverage the work that you have done to date to enable this experimentation, but without significant overhead to your team.

The next post in this series will explore a technique that can help you operationalise this experimentation, whilst at the same time minimising the effort required from your core team to support these initiatives.

This article was originally posted on LinkedIn on April 17, 2020.