Architect a Datastream App (Pipe and Filters Approach)

Mohaned Mashaly
Jul 7, 2021

In this article, we will discuss how to design datastream applications (and other applications) using the pipes and filters pattern.

The pipes and filters architecture breaks a module into a series of processing steps. Each step is called a filter, and each filter is connected to the next one through a channel called a pipe, which carries the output of one filter to the input of the next. Each filter is an independent processing unit that can have its own computational power and memory.
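A minimal sketch of the idea, assuming a single Python process where each filter is a generator function and the "pipe" is simply the lazy iterator handed from one filter to the next. The names `run_pipeline`, `double`, and `keep_multiples_of_four` are hypothetical, chosen only for illustration; in a real datastream application the pipes would more likely be message queues or streams between separately deployed filters.

```python
from typing import Any, Callable, Iterable, Iterator

# A filter consumes a stream of items (its input pipe) and yields a
# transformed stream (its output pipe).
Filter = Callable[[Iterable[Any]], Iterator[Any]]


def run_pipeline(source: Iterable[Any], *filters: Filter) -> Iterator[Any]:
    """Hypothetical helper: connect each filter's output to the next filter's input."""
    stream: Iterable[Any] = source
    for f in filters:
        stream = f(stream)  # the returned generator acts as the pipe
    return iter(stream)


def double(items: Iterable[int]) -> Iterator[int]:
    """Example filter: multiply every item by two."""
    for x in items:
        yield x * 2


def keep_multiples_of_four(items: Iterable[int]) -> Iterator[int]:
    """Example filter: pass through only multiples of four."""
    for x in items:
        if x % 4 == 0:
            yield x


print(list(run_pipeline(range(6), double, keep_multiples_of_four)))  # [0, 4, 8]
```

Because the filters only agree on "iterable in, iterator out", any of them can be replaced or reordered without touching the others.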

In the upcoming sections, we will discuss the advantages of the pipes and filters pattern compared to a monolithic architecture.

Fig 1.1: Pipe and filter pattern

Fig 1.1 shows the pipe and filter pattern. It is easiest to think of this figure as a preprocessing operation for an application, where filter 1 is the first preprocessing step, filter 2 is the second, and so on. Let's say this preprocessing pipeline is for a Natural Language Processing model that processes millions of data points encoded as text. The first filter tokenizes these data points, the second filter removes stop words from the tokens, and the last filter lemmatizes the remaining tokens.
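The NLP preprocessing pipeline described above can be sketched as three generator-based filters chained together. This is an illustrative sketch, not the author's implementation: the function names are assumptions, and the `STOP_WORDS` set and `LEMMAS` lookup table are hypothetical toy stand-ins. A real pipeline would use a proper stop-word list and a real lemmatizer, for example from an NLP library.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "were", "of", "and"}
LEMMAS = {"cats": "cat", "running": "run", "mice": "mouse", "chased": "chase"}


def tokenize(docs: Iterable[str]) -> Iterator[List[str]]:
    """Filter 1: split each document into lowercase word tokens."""
    for doc in docs:
        yield re.findall(r"[a-z]+", doc.lower())


def remove_stop_words(token_lists: Iterable[List[str]]) -> Iterator[List[str]]:
    """Filter 2: drop tokens found in STOP_WORDS."""
    for tokens in token_lists:
        yield [t for t in tokens if t not in STOP_WORDS]


def lemmatize(token_lists: Iterable[List[str]]) -> Iterator[List[str]]:
    """Filter 3: map each token to its lemma (toy lookup table, not a real lemmatizer)."""
    for tokens in token_lists:
        yield [LEMMAS.get(t, t) for t in tokens]


def preprocess(docs: Iterable[str]) -> Iterator[List[str]]:
    # Each filter's output stream is the next filter's input pipe.
    return lemmatize(remove_stop_words(tokenize(docs)))


docs = ["The cats chased the mice.", "A dog is running"]
print(list(preprocess(docs)))  # [['cat', 'chase', 'mouse'], ['dog', 'run']]
```

Because each filter is a generator, documents flow through the chain one at a time, so the full corpus never has to be held in memory between steps.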

Applying the pipes and filters pattern to the preprocessing stage of a Natural Language Processing model

Why do this? Why choose this pattern over more familiar patterns such as a monolith or microservices, even though those patterns are better known in the business world and sound more fashionable?

Advantages of the Pipe and Filter Pattern

  1. It provides more flexibility in terms of computational power and memory than a monolith. Every filter is a separate unit with its own computational resources and memory. For example, tokenization is a more expensive operation than removing stop words, so instead of giving every function the same resources, as a monolithic architecture does, each filter can be assigned resources based on its needs.
  2. It can save the development team effort, because different teams can work on different filters at the same time. Filters do not depend on each other's internals, and communication through the pipes is simple, straightforward, and independent of what each filter does, unlike a monolith, where communication between steps is hardcoded.
  3. It achieves separation of concerns and high cohesion, which are good practices in software engineering.
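To illustrate the first two advantages, the sketch below runs each filter on its own pool of worker threads, with `queue.Queue` objects acting as the pipes, so an expensive filter can be given more workers than a cheap one. The helpers `start_filter` and `run`, the toy stop-word set, and the worker counts are hypothetical. Note that in CPython, threads do not speed up CPU-bound pure-Python work because of the global interpreter lock; to give filters genuinely separate CPU and memory you would run them as separate processes, containers, or machines connected by a message queue, keeping the same structure.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

SENTINEL = object()  # marks the end of the stream on a pipe


def start_filter(func: Callable[[Any], Any], inbox: queue.Queue,
                 outbox: queue.Queue, workers: int) -> None:
    """Hypothetical helper: apply `func` to every item on `inbox` using
    `workers` threads and put the results on `outbox`."""
    remaining = [workers]
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = inbox.get()
            if item is SENTINEL:
                inbox.put(SENTINEL)  # let sibling workers see the end marker too
                with lock:
                    remaining[0] -= 1
                    last = remaining[0] == 0
                if last:
                    outbox.put(SENTINEL)  # last worker out closes the output pipe
                return
            outbox.put(func(item))

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()


def run(items: Iterable[Any], stages: List[Tuple[Callable[[Any], Any], int]]) -> List[Any]:
    """Hypothetical driver: wire (filter function, worker count) stages together
    with queues and collect the final output (order not guaranteed)."""
    pipes = [queue.Queue() for _ in range(len(stages) + 1)]
    for i, (func, workers) in enumerate(stages):
        start_filter(func, pipes[i], pipes[i + 1], workers)
    for item in items:
        pipes[0].put(item)
    pipes[0].put(SENTINEL)

    results = []
    while True:
        item = pipes[-1].get()
        if item is SENTINEL:
            return results
        results.append(item)


STOP_WORDS = {"the", "a", "is"}  # toy stop-word list for illustration
stages = [
    (lambda doc: doc.lower().split(), 4),                        # tokenizer: 4 workers
    (lambda toks: [t for t in toks if t not in STOP_WORDS], 1),  # stop words: 1 worker
]
print(sorted(run(["The cat", "A dog is here"], stages)))  # [['cat'], ['dog', 'here']]
```

Because several workers can serve one filter, results may arrive in a different order than the input, which is why the example sorts them before printing.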

There are many more advantages, as well as disadvantages, to this architectural approach. This article only scratches the surface and introduces the pattern to those who are not familiar with it. In the end, there is no such thing as a perfect solution or one that fits all problems; the best choice depends on your problem and other factors.

