PARALLEL PROCESSING IN PYTHON
COSMOS, 1/28/2020, by Joseph Kready

LAYOUT
• What is Parallel Processing
• History of Parallel Computation
• Parallel Processing and Python
• Google Colab Example
SERIAL PROCESSING VS. PARALLEL PROCESSING
• Parallel processing: multiple objects at a time
• Serial processing: one object at a time
CPU ARCHITECTURE: FREQUENCY SCALING
• Frequency scaling (serial computation) was the main method for improving computer performance from the 1980s to the 2000s.
• Frequency-scaling equation: power consumption = capacitance * voltage^2 * frequency
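Plugging illustrative numbers into the equation shows the problem (the values below are made up for the sketch, not taken from a real chip):

```python
# Frequency-scaling arithmetic: power = capacitance * voltage^2 * frequency.
def power(capacitance, voltage, frequency):
    return capacitance * voltage ** 2 * frequency

base = power(1.0, 1.2, 2.0e9)    # a chip at 2 GHz and 1.2 V
faster = power(1.0, 1.3, 3.0e9)  # 3 GHz typically needs a voltage bump

# Frequency alone scales power linearly, but voltage enters squared,
# so 1.5x the frequency ends up costing ~1.76x the power here.
print(round(faster / base, 2))  # 1.76
```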
• The resulting increase in power consumption led to the demise of frequency scaling.

SUPERSCALAR ARCHITECTURE
• Executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor.

MULTI-CORE PROCESSORS
• Each CPU is made of independent "cores" that can access the same memory concurrently.
• Moore's Law, applied to multi-core chips: the number of cores per processor doubles roughly every 18-24 months.
• Operating systems ensure programs run on available cores, but developers must design their programs to take advantage of parallel processing.

PROCESS VS. THREAD VS. MULTI-THREADING VS. HYPER-THREADING
• Processes are made of threads.
• Threads of the same process share memory.
• Processes run in separate memory.
• Hyper-threading: allows scheduling of two threads on one CPU core.
• Multi-threading: multiple instructions operate on separate data in parallel.
THE GLOBAL INTERPRETER LOCK!!!
¡ Global Interpreter lock (GIL): All Python processes must go through the GIL to execute; thus threads execute 1 at a time ¡ It is faster in the single-threaded case. ¡ It is faster in the multi-threaded case for i/o bound programs. (they are not GIL locked) ¡ It is faster in the multi-threaded case for CPU-bound programs that do their compute-intensive work in C libraries, ie Numpy. ¡ GIL only becomes a problem when doing CPU-intensive work in pure Python. ¡ Not all versions of Python use a GIL: Jython, IronPython, PyPy-STM YOUR CHOICES
MULTI-THREADING MULTI-PROCESSING
¡ Threads act as sub-tasks of a single process ¡ Separate processes act as individual jobs ¡ Threads share the same memory space ¡ Processes run in their own memory space
¡ Great for background tasks / waiting for ¡ Great for complex calculations / running multiple asynchronous functions instances of a whole project
¡ Can lead to conflicts (Race Conditions), when ¡ Higher overhead, but more secure writing to the same memory location at the same time WHY DON’T WE JUST USE TREADS?
Problem ¡ Race Conditions: Multiple threads reading and writing to the same object Will cause unexpected results
¡ The operating system handles the processing of threads dynamically. There’s no way to ensure the compute order. Solution
¡ Synchronization using Lock: You can define one part of your function ‘with thread.lock:’ which requires 1 thread processing at a time.
¡ This can lead to Deadlocks: where the lock is not released properly, or you call sub functions that are already locked in another thread WHY DON’T WE JUST USE PROCESSES? Problem ¡ Serialization using Pickles: converting python objects to byte streams ¡ Individual processes run in separate memory spaces and need a way to communicate. This is done through Pickles. ¡ Certain limitations, like limited function arguments and no class supported with pickling. Therefore, you must design your functions with pickling in mind. Solution ¡ Serialization using Dill ¡ Dill extends Pickles, allowing to send arbitrary classes and functions as byte streams. Dill can Pickle all the python objects! ¡ Pathos.Multiprocessing library is a fork of python’s multi- processing that uses Dills instead of Pickles. AMDAHL’S LAW
¡ The small part of a program that cannot be parallelized will limit the overall speedup ¡ S-latency is the potential speedup in latency of the execution of the whole task; ¡ s is the speedup in latency of the execution of the parallelizable part of the task; ¡ p is the percentage of the execution time of the whole task concerning the parallelizable part of the task before parallelization. https://colab.research.google.com/drive/1 TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh Real Python. “An Intro to Threading in Python.” Real Python, Real Python, 25 May 2019,
Works Cited realpython.com/intro-to-python-threading/. Accessed 31 Jan. 2020.
“An Introduction to Parallel Programming Using Python's Multiprocessing Module.” Dr.
Sebastian Raschka, 20 June 2014, Rocklin, Matthew. “Parallelism and Serialization How Poor Pickling Breaks Multiprocessing.” sebastianraschka.com/Articles/2014_multiprocessing.html. Accessed 31 Jan. 2020.
“Dill.” PyPI, pypi.org/project/dill/. Accessed 31 Jan. 2020. Parallelism and Serialization, matthewrocklin.com/blog/work/2013/12/05/Parallelism-
FomiteFomite 2, et al. “Why Was Python Written with the GIL?” Software Engineering Stack and-Serialization. Accessed 31 Jan. 2020. Exchange, 1 Feb. 1963, softwareengineering.stackexchange.com/questions/186889/why-
was-python-written-with-the-gil. Accessed 31 Jan. 2020. “Superscalar Processor.” Wikipedia, Wikimedia Foundation, 7 May 2019, “Has the Python GIL Been Slain?” By, hackernoon.com/has-the-python-gil-been-slain- 9440d28fa93d. Accessed 31 Jan. 2020. en.wikipedia.org/wiki/Superscalar_processor. Accessed 31 Jan. 2020.
“Hyper-Threading.” Wikipedia, Wikimedia Foundation, 19 Jan. 2020,
en.wikipedia.org/wiki/Hyper-threading. Accessed 31 Jan. 2020.
“Multithreading (Computer Architecture).” Wikipedia, Wikimedia Foundation, 2 Jan. 2020,
en.wikipedia.org/wiki/Multithreading_(computer_architecture). Accessed 31 Jan. 2020.
“Parallel Computing.” Wikipedia, Wikimedia Foundation, 26 Dec. 2019, en.wikipedia.org/wiki/Parallel_computing. Accessed 31 Jan. 2020. REFERENCES “Pickle - Python Object Serialization¶.” Pickle - Python Object Serialization - Python 3.8.1
Documentation, docs.python.org/3/library/pickle.html. Accessed 31 Jan. 2020.