Introduction to Parallel Processing with Python

1 What is ?

Serial Computing

2 Source - https://computing.llnl.gov/tutorials/parallel_comp/ What is Parallel Computing?

Parallel Computing: Breaking a problem into multiple pieces and processing each piece in parallel through multiple processors

3 Parallelized Hardware

Nearly all processors now have parallelized processing architectures

Eight-core CPUs on now selling for mainstream consumers

Intel® Core™ i7-5960X: $1000 (2014)

AMD Ryzen 2700X: $300 (2018)

4 HPCs – Built for Parallelization

• HPCs employ often 2-4 server-grade CPUs per node • 8 – 16 processor cores per CPU • Shared memory on each node for all processors

• Distributed memory architecture • Nodes are connected via a 56-100 Gbps network • Memory is shared between nodes through some API • MPI is most commonly used

5 Global

6 Global Interpreter Lock

• The Python interpreter is not fully -safe.

• In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects.

• Without the lock, even the simplest operations could cause problems in a multi-threaded program

• For example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.

• Therefore, only one thread is run at a time.

7 So how can one effectively parallelize their code?

Enter: multiprocessing

Time to switch over to Jupyter Notebook

8 Installing a Conda Environment for Keras and TensorFlow with Jupyter Support

$ module load python/3.6.1-2-anaconda

$ conda create --name py3.6-multiprocess -–clone root $ source activate py3.6-multiprocess $ conda install – conda-forge multiprocess

$ ipython kernel install --user --name py3.6-multiprocess --display- name=“Custom"

9