PARALLEL PROCESSING IN PYTHON

COSMOS - 1/28/2020 - by Joseph Kready

What is Parallel Processing
Parallel Processing and Python
History of Parallel Computation
Google Colab Example

SERIAL PROCESSING VS. PARALLEL PROCESSING

- Serial processing: one object at a time
- Parallel processing: multiple objects at a time

CPU ARCHITECTURE

- Serial computation: the method used for improving performance from the 1980s to the 2000s

FREQUENCY SCALING

- Frequency scaling equation: power consumption = capacitance * voltage^2 * frequency
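A quick numeric sketch of the equation above; the capacitance, voltage, and frequency values are invented for illustration, only the P = C * V^2 * f relationship comes from the slide:

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power consumption: P = C * V^2 * f."""
    return capacitance * voltage**2 * frequency

# Made-up numbers: doubling the clock (and nudging the voltage up,
# as higher clocks typically require) nearly triples the power draw.
base = dynamic_power(1e-9, 1.0, 2e9)   # 2.0 W
fast = dynamic_power(1e-9, 1.2, 4e9)   # 5.76 W
print(base, fast)
```

The quadratic voltage term is why pushing clock speed got so expensive, which motivates the next slide.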

- The increase in power consumption led to the demise of frequency scaling.

SUPERSCALAR ARCHITECTURE

- Executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor.

MULTI-CORE PROCESSORS

- Each CPU is made of independent 'cores' that can access the same memory concurrently.
- Moore's Law: the number of cores per processor doubles every 18-24 months.
- Operating systems ensure programs run on available cores, but developers must design their programs to take advantage of parallel processing.

MULTI-PROCESSING VS. MULTI-THREADING VS. HYPER-THREADING

- Processes are made of threads.
- Threads of the same process share memory.
- Processes run in separate memory.
- Hyper-threading: allows scheduling of two threads on one physical CPU core.

- Multiple instructions operate on separate data in parallel (MIMD).

THE GLOBAL INTERPRETER LOCK (GIL)

- Global Interpreter Lock (GIL): every Python thread must acquire the GIL to execute, so threads execute one at a time.
- It is faster in the single-threaded case.
- It is faster in the multi-threaded case for I/O-bound programs (they are not GIL-locked while waiting).
- It is faster in the multi-threaded case for CPU-bound programs that do their compute-intensive work in libraries, e.g. NumPy.
- The GIL only becomes a problem when doing CPU-intensive work in pure Python.
- Not all implementations of Python use a GIL: IronPython, PyPy-STM.
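A small sketch of the I/O-bound case: here `time.sleep` stands in for real I/O and releases the GIL, so the threads genuinely overlap (the thread count and sleep duration are arbitrary choices for illustration):

```python
import threading
import time

def io_task():
    # Simulated I/O wait; sleep releases the GIL, so threads overlap.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2 s waits finish in roughly 0.2 s total, not 0.8 s.
print(f"elapsed: {elapsed:.2f}s")
```

Replacing the sleep with a pure-Python CPU loop would show no such speedup: the GIL would serialize the threads.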

YOUR CHOICES: MULTI-THREADING VS. MULTI-PROCESSING

MULTI-THREADING
- Threads act as sub-tasks of a single process.
- Threads share the same memory space.
- Great for background tasks / waiting for asynchronous functions.
- Can lead to conflicts (race conditions) when writing to the same memory location at the same time.

MULTI-PROCESSING
- Separate processes act as individual jobs.
- Processes run in their own memory space.
- Great for complex calculations / running multiple instances of a whole project.
- Higher overhead, but more secure.

WHY DON'T WE JUST USE THREADS?

Problem
- Race conditions: multiple threads reading and writing to the same object will cause unexpected results.

- The operating system handles the scheduling of threads dynamically; there is no way to ensure the compute order.

Solution

- Synchronization using a Lock: you can guard one part of your function with a 'with lock:' block (where lock is a threading.Lock()), which allows only one thread to execute that section at a time.
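A minimal sketch of lock-based synchronization; the shared counter, thread count, and iteration count are invented for illustration:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # Without the lock, this read-modify-write could interleave
        # across threads and lose updates (a race condition).
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 - every increment accounted for
```

Dropping the `with lock:` line turns this into exactly the race condition described above, and the final count may come up short.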

- This can lead to deadlocks: the lock is not released properly, or you call sub-functions that are already locked in another thread.

WHY DON'T WE JUST USE PROCESSES?

Problem
- Serialization using pickle: converting Python objects to byte streams.
- Individual processes run in separate memory spaces and need a way to communicate. This is done through pickling.
- Pickling has limitations (e.g. limited function arguments, and some classes and functions cannot be pickled), so you must design your functions with pickling in mind.

Solution
- Serialization using Dill.
- Dill extends pickle, allowing arbitrary classes and functions to be sent as byte streams. Dill can pickle almost all Python objects!
- The pathos.multiprocessing library is a fork of Python's multiprocessing that uses Dill instead of pickle.
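A small sketch of the pickling limitation using only the standard library (the payload dict is invented for illustration). Plain data objects serialize fine, but a lambda has no importable name, so stock pickle rejects it; Dill, not shown here, is the third-party workaround:

```python
import pickle

# Plain Python objects cross the process boundary as byte streams.
payload = {"task": "square", "args": [1, 2, 3]}
blob = pickle.dumps(payload)
restored = pickle.loads(blob)
print(restored)

# But not everything pickles: a lambda cannot be looked up by name
# in a worker process, so multiprocessing cannot ship it this way.
try:
    pickle.dumps(lambda x: x * x)
    picklable = True
except Exception:
    picklable = False
print("lambda picklable:", picklable)
```

This is why multiprocessing target functions are conventionally written as plain module-level functions.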

AMDAHL'S LAW

- The small part of a program that cannot be parallelized will limit the overall speedup:

  S_latency(s) = 1 / ((1 - p) + p / s)

- S_latency is the potential speedup in latency of the execution of the whole task;
- s is the speedup in latency of the execution of the parallelizable part of the task;
- p is the percentage of the execution time of the whole task taken by the parallelizable part before parallelization.

Google Colab example: https://colab.research.google.com/drive/1TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh
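Amdahl's law is easy to evaluate directly; the 90%-parallelizable / 8-worker figures below are arbitrary examples:

```python
def amdahl_speedup(p, s):
    """Amdahl's law: overall latency speedup when a fraction p of the
    work is parallelizable and that part is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# 90% parallelizable, run with an 8x speedup on the parallel part:
print(round(amdahl_speedup(0.9, 8), 2))    # 4.71

# Even with a near-infinite speedup, the serial 10% caps us at ~10x:
print(round(amdahl_speedup(0.9, 1e9), 2))  # 10.0
```

The second call is the punchline of the slide: no amount of extra cores can beat the serial fraction.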
