Friday, July 22, 2016

Basics of Parallelization – Part One

Basic tricks for parallelizing code to take advantage of multiple cores and processors
In today’s multicore systems, many programmers are leaving performance on the table. This multipart series shows how to get the most out of your hardware with multithreading and vectorization. Read on to see how Slashdot Media Contributing Editor Rick Leinecker gets started with parallelization basics.
Basics of Parallelization – Part 1
Rick Leinecker, July 2016
This blog post begins a discussion of parallelization by showing one of the most basic techniques: multithreaded processing of data sets. Part 2 will introduce the next most important aspect of parallelization, vectorization.
Let’s say that you have a large data array, maybe 20,000,000 elements, on which you need to perform some gnarly math. The calculations might be an aggregation of transcendental functions such as cosine and tangent, along with additional functions such as square root. In many cases this isn’t a problem, since programs customarily take time to crunch numbers and users are used to waiting for them. But let’s say that the execution time of the math matters for some reason. Maybe it degrades the user experience, or maybe it happens on a server where a client piece of software has to wait. In these cases it is imperative to reduce the calculation time.
Multithreading isn’t new, but applying it in the context of parallelization is fairly new. Processing a list with multiple threads is straightforward: spin up several threads, divide the task up amongst them, kick them off to perform their respective tasks, and synchronize by waiting for all threads to complete.
As an example, let’s take the data array of 20,000,000 elements described earlier, declared as follows.
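The original listing was a screenshot; here is a minimal sketch of such a declaration (the double element type, the heap allocation, and the fill values are assumptions for illustration, not the article's original code):

const int DATA_SIZE = 20000000;        // 20,000,000 elements, as described above

double *data = new double[DATA_SIZE];  // far too large for the stack, so use the heap
for (int i = 0; i < DATA_SIZE; i++)    // fill with sample values to crunch
    data[i] = (double)i;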
The next part of the process is to spin up multiple threads to deal with the processing. In most cases, the number of threads depends on the available resources. For instance, if a system has two cores, then two is the largest effective number of threads to create. If a system has eight cores, then creating six, seven, or eight threads may be the best choice. The algorithms that determine how many threads to create based on the available resources are fairly complex, but they are taken care of by parallelization frameworks such as OpenMP, Cilk, or Threading Building Blocks.
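OpenMP, for example, lets you query the runtime rather than guess. A small illustrative snippet using two standard OpenMP API calls:

#include <omp.h>
#include <cstdio>

int main()
{
    // Ask the OpenMP runtime what the hardware offers and how many
    // threads it would use by default for a parallel region.
    printf("Processors available: %d\n", omp_get_num_procs());
    printf("Default max threads:  %d\n", omp_get_max_threads());
    return 0;
}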
Once the threads are created, they are kicked off so that each performs its assigned share of the processing. The following figure illustrates the process.
Figure 1: The loop parallelization shown here uses four distinct threads to split up a sequential task.
Finally, a synchronization mechanism waits for all threads to complete. This is important because one thread may linger, taking more time than the others. If the main application thread is counting on all calculations to be complete while one of the threads is still working, then the application may not function correctly.
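To make the spin-up, divide, and synchronize pattern concrete, here is a minimal sketch using C++ std::thread. The four-thread count, the ProcessSlice helper, and the math inside the loop are assumptions for illustration; the article itself takes the OpenMP route below.

#include <cmath>
#include <thread>
#include <vector>

const int DATA_SIZE = 20000000;

// Each thread crunches its own contiguous slice of the array.
void ProcessSlice(double *data, int begin, int end)
{
    for (int i = begin; i < end; i++)
        data[i] = std::sqrt(std::cos(data[i]) * std::cos(data[i])
                          + std::tan(data[i]) * std::tan(data[i]));
}

int main()
{
    double *data = new double[DATA_SIZE];
    for (int i = 0; i < DATA_SIZE; i++)
        data[i] = (double)i;

    const int numThreads = 4;                 // assumed count; match it to your cores
    const int chunk = DATA_SIZE / numThreads;
    std::vector<std::thread> threads;

    // Spin up the threads, dividing the task up amongst them.
    for (int t = 0; t < numThreads; t++)
    {
        int begin = t * chunk;
        int end = (t == numThreads - 1) ? DATA_SIZE : begin + chunk;
        threads.emplace_back(ProcessSlice, data, begin, end);
    }

    // Synchronize: wait for every thread to complete before moving on.
    for (std::thread &th : threads)
        th.join();

    delete[] data;
    return 0;
}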
Adding loop parallelization is easier than you probably think. It’s not a matter of writing a lot of thread code and thread procedures. The easiest way I have found is to use OpenMP, which is based on an industry-wide standard. Start by considering the following code.
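The listing here was a screenshot; a minimal reconstruction of the kind of sequential loop being timed, reusing the assumed array and transcendental math from the sketch above:

// Sequential version: a single thread walks all 20,000,000 elements.
for (int i = 0; i < DATA_SIZE; i++)
{
    data[i] = std::sqrt(std::cos(data[i]) * std::cos(data[i])
                      + std::tan(data[i]) * std::tan(data[i]));
}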
When I ran this in release mode, the loop executed in 2,587 milliseconds. Note that this code is completely sequential and has no parallelization. To use the OpenMP approach, all you need to do is decorate the loop with a special pragma directive as follows, and the compiler does its magic behind the scenes.
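Again reconstructed from the description rather than the lost screenshot, the OpenMP version differs from the sequential loop by exactly one line:

// The pragma tells the compiler to divide the loop iterations among
// worker threads; the loop body itself is unchanged.
#pragma omp parallel for
for (int i = 0; i < DATA_SIZE; i++)
{
    data[i] = std::sqrt(std::cos(data[i]) * std::cos(data[i])
                      + std::tan(data[i]) * std::tan(data[i]));
}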
Note: In Visual Studio, I enabled OpenMP by opening the project properties, drilling down into the C/C++ section, selecting the Language section, and finally setting OpenMP Support to Yes.
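If you build from the command line instead, the equivalent compiler switches are:

cl /openmp main.cpp            (Visual C++)
g++ -fopenmp main.cpp          (GCC)
clang++ -fopenmp main.cpp      (Clang)

Without one of these flags the pragma is ignored and the loop simply stays sequential, which makes OpenMP easy to adopt incrementally.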
The parallelized code ran in 297 milliseconds on my eight-core development system, down from 2,587 milliseconds, roughly an 8.7x speedup. This represents a significant reduction in processing time, and most software that crunches large data sets can benefit from a similar optimization.
Conclusion
If you parallelize loops, congratulations and keep at it. If you haven’t parallelized any loops, get started now. You will find it easier than you think, with a whopping payout.
Posted on July 19, 2016 by Rick Leinecker, Slashdot Media Contributing Editor
https://goparallel.sourceforge.net/basics-parallelization-part-one/
