How Does Inherent Parallelism in Processing Improve Throughput?

From the very beginning of product development, our goal has been to build something flexible, extensible, and user-friendly. But integrating disparate and diverse applications is not an easy task. You need a BPA tool with advanced features that help you build faster and better integrations between applications. One such capability that you can use during execution is Parallelism.

What is Parallelism?

Parallelism is a concept in which a large task is broken into multiple smaller chunks so that the chunks can be worked on at the same time. For example, suppose you want to sync 1,000 records at one point in time. You cannot download all the data at once, so you create batches and download them one after another. With a batch size of 100 items, the application has to download 10 times, which requires a loop. If that loop runs sequentially, it downloads the first 100 items, processes them, and only then downloads and processes the next 100 items, and so on. Without Parallelism, your data processing throughput will be much lower.
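To make the batching arithmetic concrete, here is a minimal Python sketch of the sequential loop described above. The function names (make_batches, download, process, sync_sequentially) are hypothetical stand-ins for illustration and are not part of ProcessFlow.

```python
from typing import Iterator, List


def make_batches(total_items: int, batch_size: int) -> Iterator[range]:
    """Yield consecutive index ranges that together cover total_items."""
    for start in range(0, total_items, batch_size):
        yield range(start, min(start + batch_size, total_items))


def download(batch: range) -> List[int]:
    # Hypothetical stand-in for fetching one batch from the source application.
    return list(batch)


def process(records: List[int]) -> int:
    # Hypothetical stand-in for mapping the records and pushing them onward.
    return len(records)


def sync_sequentially(total_items: int = 1000, batch_size: int = 100) -> int:
    processed = 0
    for batch in make_batches(total_items, batch_size):
        records = download(batch)       # the next download cannot start...
        processed += process(records)   # ...until this batch has been processed
    return processed
```

With 1,000 items and a batch size of 100, make_batches yields 10 ranges, so the loop body runs 10 times, one batch after another.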

In our example, Sequential Execution looks like the diagram above: the next batch starts only after the previous batch is completed. Parallel Execution of the same example looks like the diagram below.

In Parallel Execution, you will notice that it takes much less time to process all the batches. Once downloaded and split into smaller chunks, the batches are executed in parallel, and the sync operation completes much more quickly.
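The difference between the two execution styles can be sketched in Python with the standard library's ThreadPoolExecutor. This is only an illustration under assumptions: the download and process functions are hypothetical, the sleep simulates waiting on a remote system, and the worker count of 4 is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def download(batch_no: int, batch_size: int = 100) -> List[int]:
    # Hypothetical download: returns the record ids that belong to this batch.
    start = batch_no * batch_size
    return list(range(start, start + batch_size))


def process(records: List[int]) -> int:
    # Hypothetical processing step; the sleep simulates waiting on a remote API.
    time.sleep(0.02)
    return len(records)


def sync_sequential(batches: int = 10) -> int:
    # Each batch is downloaded and fully processed before the next one starts.
    return sum(process(download(n)) for n in range(batches))


def sync_parallel(batches: int = 10, workers: int = 4) -> int:
    # Downloads still happen one after another on the calling thread, but each
    # downloaded batch is handed to the pool and processed while the loop moves on.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, download(n)) for n in range(batches)]
        return sum(f.result() for f in futures)
```

Both functions return the same number of processed records. In the parallel version the simulated waits overlap, which is where the time saving in this sketch comes from.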

How do we Achieve Parallelism in ProcessFlow?

ProcessFlow lets users implement Parallel Execution (Parallelism) without having to know the underlying complexity. In ProcessFlow, data flows freely from one node to another through links. Links act as branches, which means that if you have two branches, the data flows to both pipelines one after another. But there is a special link that allows you to configure parallel execution: the self-loop. It helps ProcessFlow run the same pipeline multiple times in parallel.

Here, ProcessFlow executes the Magento node repeatedly using the Web API. The power of the self-loop is that after every Magento data download, the pipeline containing the rest of the nodes, such as Mapper and SAP B1, is executed in parallel.

Although it looks very simple to the user, a self-loop parallelizes the node executions automatically.

How does Parallelism Actually Work?

Think of a ProcessFlow as a program that is built using tools. Each Node in ProcessFlow has its own built-in implementation: it performs a specific operation on the data it receives and passes the result to the linked branches. Internally, a program is run by its main thread. To understand how a program works, let us look at the diagram below.

A typical program starts at an Entry Point: it finds a specific function in the code (in most cases called Main) and creates a Thread to execute that function. The function then executes its code sequentially until it reaches the end. When the function ends, the program terminates.

In the same way, a ProcessFlow design creates a script that starts from the Start Node (our Entry Point), executes all the nodes in the flow, and finally reaches the End node to terminate the process. ProcessFlow also uses Threads, because in the real world no code can be executed without a CPU Thread.
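As a rough mental model of the flow-as-program idea, the nodes can be pictured as functions that a runner calls in order on a single thread. The Python sketch below is an assumption made for illustration: the node functions and run_flow are hypothetical and do not represent the script that ProcessFlow actually generates.

```python
from typing import Callable, List

# A node takes the data it receives and returns the data it passes on.
Node = Callable[[List[str]], List[str]]


def magento_node(data: List[str]) -> List[str]:
    # Hypothetical source node: pretends to download three orders.
    return data + ["order-1", "order-2", "order-3"]


def mapper_node(data: List[str]) -> List[str]:
    # Hypothetical transformation node.
    return [item.upper() for item in data]


def sap_b1_node(data: List[str]) -> List[str]:
    # Hypothetical destination node: marks each record as posted.
    return [f"posted:{item}" for item in data]


def run_flow(nodes: List[Node]) -> List[str]:
    data: List[str] = []      # Start Node: the entry point of the flow
    for node in nodes:        # nodes run one after another on the calling thread
        data = node(data)
    return data               # End node: the flow terminates here
```

Calling run_flow([magento_node, mapper_node, sap_b1_node]) walks the three nodes in order, just as a Main function runs its statements from top to bottom.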

So, when the ProcessFlow Engine executes, it begins with the Start Node, which acts as the Entry Point. It then executes the Magento Node, which downloads the Magento data in iterations 1 to N.

Notice that the Main Thread keeps executing only the Magento Node, and it ends when all the data has been downloaded successfully. Every time it gets an instance of data, it puts the data into storage and posts a message to the queue as a work item for the Thread Pool, and the Dispatcher hands that work item to a Worker Thread. Here, you can see 3 threads running in parallel to process the data from the first three Magento iterations. All other work items wait for a Thread to finish.
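The producer-and-workers arrangement described above can be approximated in Python with queue.Queue and three worker threads. This is a simplified sketch under assumptions, not the ProcessFlow Engine's implementation: the queue doubles as the dispatcher, the record names are made up, and the downstream nodes are reduced to counting records.

```python
import queue
import threading
from typing import Dict


def run_flow(iterations: int = 6, worker_count: int = 3) -> Dict[int, int]:
    work_items: queue.Queue = queue.Queue()   # work items waiting for a free worker
    results: Dict[int, int] = {}
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            item = work_items.get()           # blocks until a work item is available
            if item is None:                  # sentinel: no more work for this worker
                break
            iteration, records = item
            processed = len(records)          # stand-in for the downstream nodes
            with results_lock:
                results[iteration] = processed

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()

    # The calling thread plays the Main Thread: it only "downloads" and enqueues.
    for iteration in range(1, iterations + 1):
        records = [f"order-{iteration}-{i}" for i in range(100)]
        work_items.put((iteration, records))

    for _ in workers:                         # one sentinel per worker
        work_items.put(None)
    for thread in workers:
        thread.join()
    return results
```

With 6 iterations and 3 workers, at most three work items are in progress at any moment; the rest sit in the queue until a worker becomes free.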

Now, what happens if the Thread Pool does not have enough threads available to execute 3 jobs in parallel?

Because the work items are queued, if only 2 cores are available for processing, you might end up executing only Thread 1 and Thread 2.

On a 2-core machine, the Thread Pool should ideally run both threads physically in parallel (though it is not quite that simple, because the operating system can switch between threads). In that case, while Thread1 and Thread2 are executing, the third batch of Magento data waits until Thread1 finishes execution.
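The waiting behavior can be observed with a small Python experiment: submit three jobs to a pool capped at two worker threads and record how many run at the same time. The function name peak_concurrency and the pool size are assumptions for this sketch. Note that it caps the number of worker threads, not CPU cores, which is a simplification of the 2-core scenario.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(jobs: int = 3, workers: int = 2) -> int:
    """Run `jobs` tasks on a pool of `workers` threads and return the highest
    number of tasks that were observed running at the same moment."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)          # simulate processing one batch
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, range(jobs)))   # wait for all jobs to finish
    return peak
```

The returned peak never exceeds the pool size, so with two workers the third job stays queued until one of the first two finishes.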

This way, ProcessFlow keeps the available threads busy to maximize throughput and makes bottlenecks in a ProcessFlow execution easier to identify.

Conclusion

Now that you know how the ProcessFlow Engine takes care of your execution units, you can define self-loops optimally to reduce bottlenecks in your data executions. You can apply a loop not only to data-generating nodes such as App nodes, but also to other nodes where you want to process multiple data sets.

Remember, as an implementer, that putting everything in separate threads will also kill performance. Think through and implement your processes so that they make optimal use of the physical resources.

Thank you for reading. Let us know your feedback.

