Introduction to Parallel Processing with Python
1 What is Parallel Computing?
Serial Computing
2 Source - https://computing.llnl.gov/tutorials/parallel_comp/ What is Parallel Computing?
Parallel Computing: Breaking a problem into multiple pieces and processing each piece in parallel through multiple processors
3 Parallelized Hardware
Nearly all processors now have parallelized processing architectures
Eight-core CPUs on now selling for mainstream consumers
Intel® Core™ i7-5960X: $1000 (2014)
AMD Ryzen 2700X: $300 (2018)
4 HPCs – Built for Parallelization
• HPCs employ often 2-4 server-grade CPUs per node • 8 – 16 processor cores per CPU • Shared memory on each node for all processors
• Distributed memory architecture • Nodes are connected via a 56-100 Gbps network • Memory is shared between nodes through some API • MPI is most commonly used
5 Global Interpreter Lock
6 Global Interpreter Lock
• The Python interpreter is not fully thread-safe.
• In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects.
• Without the lock, even the simplest operations could cause problems in a multi-threaded program
• For example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
• Therefore, only one thread is run at a time.
7 So how can one effectively parallelize their code?
Enter: multiprocessing
Time to switch over to Jupyter Notebook
8 Installing a Conda Environment for Keras and TensorFlow with Jupyter Support
$ module load python/3.6.1-2-anaconda
$ conda create --name py3.6-multiprocess -–clone root $ source activate py3.6-multiprocess $ conda install –c conda-forge multiprocess
$ ipython kernel install --user --name py3.6-multiprocess --display- name=“Custom"
9