Parallel Processing in Python

PARALLEL PROCESSING IN PYTHON COSMOS - 1/28/2020 BY JOSEPH KREADY LAYOUT What is Parallel Parallel Processing Processing and Python History of Google Colab Parallel Example Computation Parallel Processing: Multiple objects at a time Serial Processing: One object at a time SERIAL PROCESSING VS. PARALLEL PROCESSING CPU ARCHITECTURE Serial Computation The method used for improving computer performance from 1980s – 2000s FREQUENCY Frequency Scaling Equation: Power consumption = capacitance * SCALING voltage^2 * frequency The increase to power consumption led to the demise of frequency scaling SUPERSCALAR ARCHITECTURE Executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. Each CPU is made of independent ‘Cores’ that can access the same memory concurrently MULTI-CORE Moore’s Law: The number of cores per PROCESSORS processor doubles every 18-24 months Operating systems ensure programs run on available cores, but developers must design their programs to take advantage of parallel processing. PROCESS VS. THREAD VS. MULTI-THREADING VS. HYPER-THREADING ¡ Processes are made of threads ¡ Threads of the same processes share memory. ¡ Processes run in separate memory. ¡ Hyper-threading: allows scheduling of 2 processes on 1 CPU core ¡ Multiple Instructions operate on separate data in parallel THE GLOBAL INTERPRETER LOCK!!! ¡ Global Interpreter lock (GIL): All Python processes must go through the GIL to execute; thus threads execute 1 at a time ¡ It is faster in the single-threaded case. ¡ It is faster in the multi-threaded case for i/o bound programs. (they are not GIL locked) ¡ It is faster in the multi-threaded case for CPU-bound programs that do their compute-intensive work in C libraries, ie Numpy. ¡ GIL only becomes a problem when doing CPU-intensive work in pure Python. ¡ Not all versions of Python use a GIL: Jython, IronPython, PyPy-STM YOUR CHOICES MULTI-THREADING MULTI-PROCESSING ¡ Threads act as sub-tasks of a single process ¡ Separate processes act as individual jobs ¡ Threads share the same memory space ¡ Processes run in their own memory space ¡ Great for background tasks / waiting for ¡ Great for complex calculations / running multiple asynchronous functions instances of a whole project ¡ Can lead to conflicts (Race Conditions), when ¡ Higher overhead, but more secure writing to the same memory location at the same time WHY DON’T WE JUST USE TREADS? Problem ¡ Race Conditions: Multiple threads reading and writing to the same object Will cause unexpected results ¡ The operating system handles the processing of threads dynamically. There’s no way to ensure the compute order. Solution ¡ Synchronization using Lock: You can define one part of your function ‘with thread.lock:’ which requires 1 thread processing at a time. ¡ This can lead to Deadlocks: where the lock is not released properly, or you call sub functions that are already locked in another thread WHY DON’T WE JUST USE PROCESSES? Problem ¡ Serialization using Pickles: converting python objects to byte streams ¡ Individual processes run in separate memory spaces and need a way to communicate. This is done through Pickles. ¡ Certain limitations, like limited function arguments and no class supported with pickling. Therefore, you must design your functions with pickling in mind. Solution ¡ Serialization using Dill ¡ Dill extends Pickles, allowing to send arbitrary classes and functions as byte streams. Dill can Pickle all the python objects! ¡ Pathos.Multiprocessing library is a fork of python’s multiprocessing that uses Dills instead of Pickles. AMDAHL’S LAW ¡ The small part of a program that cannot be parallelized will limit the overall speedup ¡ S-latency is the potential speedup in latency of the execution of the whole task; ¡ s is the speedup in latency of the execution of the parallelizable part of the task; ¡ p is the percentage of the execution time of the whole task concerning the parallelizable part of the task before parallelization. https://colab.research.google.com/drive/1 TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh Real Python. “An Intro to Threading in Python.” Real Python, Real Python, 25 May 2019, Works Cited realpython.com/intro-to-python-threading/. Accessed 31 Jan. 2020. “An Introduction to Parallel Programming Using Python's Multiprocessing Module.” Dr. Sebastian Raschka, 20 June 2014, Rocklin, Matthew. “Parallelism and Serialization How Poor Pickling Breaks Multiprocessing.” sebastianraschka.com/Articles/2014_multiprocessing.html. Accessed 31 Jan. 2020. “Dill.” PyPI, pypi.org/project/dill/. Accessed 31 Jan. 2020. Parallelism and Serialization, matthewrocklin.com/blog/work/2013/12/05/Parallelism- FomiteFomite 2, et al. “Why Was Python Written with the GIL?” Software Engineering Stack and-Serialization. Accessed 31 Jan. 2020. Exchange, 1 Feb. 1963, softwareengineering.stackexchange.com/questions/186889/why- was-python-written-with-the-gil. Accessed 31 Jan. 2020. “Superscalar Processor.” Wikipedia, Wikimedia Foundation, 7 May 2019, “Has the Python GIL Been Slain?” By, hackernoon.com/has-the-python-gil-been-slain- 9440d28fa93d. Accessed 31 Jan. 2020. en.wikipedia.org/wiki/Superscalar_processor. Accessed 31 Jan. 2020. “Hyper-Threading.” Wikipedia, Wikimedia Foundation, 19 Jan. 2020, en.wikipedia.org/wiki/Hyper-threading. Accessed 31 Jan. 2020. “Multithreading (Computer Architecture).” Wikipedia, Wikimedia Foundation, 2 Jan. 2020, en.wikipedia.org/wiki/Multithreading_(computer_architecture). Accessed 31 Jan. 2020. “Parallel Computing.” Wikipedia, Wikimedia Foundation, 26 Dec. 2019, en.wikipedia.org/wiki/Parallel_computing. Accessed 31 Jan. 2020. REFERENCES “Pickle - Python Object Serialization¶.” Pickle - Python Object Serialization - Python 3.8.1 Documentation, docs.python.org/3/library/pickle.html. Accessed 31 Jan. 2020. .

Load more