Notes on Chapter 19 of *Fluent Python: Clear, Concise, and Effective Programming* by Luciano Ramalho.
Chapter 19. Concurrency Models in Python
concurrency vs parallelism; informally speaking
concurrency: dealing with multiple things at once \(\implies\) it’s about the structure of a solution
the structure provided by a concurrent solution may help solve a problem (though not necessarily) in a parallelized fashion.
parallelism: doing lots of things at once \(\implies\) it’s about the execution of the solution
in this informal view, parallelism is a special case of concurrency, so parallel \(\implies\) concurrent
Python’s three approaches to concurrency: threads, processes, and native coroutines.
python’s fitness for concurrent and parallel computing is not limited to what the std lib provides. Python can scale.
What’s New in This Chapter #
The Big Picture #
factor of difficulty when writing concurrent programs: starting threads or processes is easy enough, but how do you keep track of them?
in non-concurrent programs, a function call is blocking: the caller waits and receives the result (or error) directly, which is convenient for us
in concurrent programs, calls are non-blocking, so we need to rely on some form of communication to get back results or errors
starting a thread or process is not cheap \(\implies\) amortize costs by reusing “worker” threads/procs \(\implies\) coordinating them is tough, e.g. how do we terminate them?
this is still resolved using messages and queues
coroutines are useful:
- cheap to start
- returns values
- can be safely cancelled
- specific area to catch exceptions
But they have problems:
they’re handled by the async framework \(\implies\) harder to monitor than threads/procs
not good for CPU-intensive tasks
A Bit of Jargon #
Concurrency: ability to handle multiple pending tasks (each eventually succeeding or failing) \(\implies\) can multitask
Parallelism: ability to carry out multiple computations at the same time \(\implies\) multicore CPU, multiple CPUs, GPU, multiple computers in a cluster
Execution Unit: objects executing concurrent code. Each has independent state and call stack
Python execution units:
processes
definition:
an instance of a computer program while it’s running, using memory and CPU time-slices; each process has its own private memory space
communication:
objects must be serialised to raw bytes to pass from one proc to another; communication happens via pipes, sockets, or memory-mapped files
spawning:
can spawn child procs which are all isolated from the parent
scheduling:
can be pre-emptively scheduled; the idea is that a frozen proc won’t freeze the whole system
threads
definition:
execution unit within a single process
consumes fewer resources than a process doing the same job
lifecycle:
@ start of process, there’s a single thread. Procs can create more threads by calling OS APIs
Shared Memory management:
Threads within a process share the same memory space, which holds live Python objects. Shared memory may be corrupted via read/write race conditions
Supervision:
also supervised by the OS scheduler; threads enable pre-emptive multitasking
coroutines
Definition:
A function that can suspend itself and resume later.
Classic Coroutines: built from generator functions
Native Coroutines: defined using `async def`
Supervising coroutines:
Typically, coroutines run within a single thread, supervised by an event loop that is in the same thread.
Async frameworks provide an event loop and supporting libraries that enable nonblocking, coroutine-based I/O
Scheduling & Cooperative Multitasking:
each coroutine must explicitly cede control with the `yield` or `await` keyword, so that another may proceed concurrently (but not in parallel)
so if there’s any blocking code in a coroutine, it blocks the execution of the event loop and hence all other coroutines
this contrasts with the preemptive multitasking supported by procs and threads.
nevertheless, a coroutine consumes fewer resources than a thread or proc doing the same job
Mechanisms useful to us:
Queue:
purpose:
allow separate execution units to exchange application data and control messages, such as error codes and signals to terminate.
implementation:
depends on concurrency model:
the stdlib `queue` module provides queue classes to support threads, including non-FIFO queues like `LifoQueue` and `PriorityQueue`
the `multiprocessing` and `asyncio` packages have their own queue classes; `asyncio` also provides non-FIFO queues like `LifoQueue` and `PriorityQueue`
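A minimal sketch of the three stdlib `queue` disciplines mentioned above (single-threaded here, just to show ordering):

```python
import queue

# FIFO: items come out in insertion order
fifo = queue.Queue()
for n in (1, 2, 3):
    fifo.put(n)
fifo_order = [fifo.get() for _ in range(3)]   # first in, first out

# LIFO: last in, first out
lifo = queue.LifoQueue()
for n in (1, 2, 3):
    lifo.put(n)
lifo_order = [lifo.get() for _ in range(3)]

# PriorityQueue: smallest item comes out first (tuples compare element-wise)
prio = queue.PriorityQueue()
for item in ((2, 'mid'), (3, 'low'), (1, 'high')):
    prio.put(item)
prio_order = [prio.get() for _ in range(3)]
```

All three classes are thread-safe, which is the point: producers and consumers in different threads can share them without extra locking.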
Lock:
purpose:
a synchronization object that execution units use to coordinate their actions and avoid corrupting data
While updating a shared data structure, the running code should hold an associated lock.
implementation:
depends on the concurrency model
simplest form of a lock is just a mutex
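A minimal sketch of lock usage with `threading.Lock` (the `increment` helper is made up for illustration): each thread holds the lock while updating the shared counter, so no updates are lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:          # hold the lock while mutating shared state
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# with the lock, the result is deterministic: 4 * 10_000 == 40_000
```

Without the `with lock:` line, the read-modify-write of `counter += 1` can interleave between threads and silently drop increments.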
Contention: dispute over a limited asset
Resource Contention
When multiple execution units try to access a shared resource (e.g. a lock or storage)
CPU Contention
Compute-intensive procs / threads must wait for the OS scheduler to give them a share of CPU time
Processes, Threads, and Python’s Infamous GIL #
Here are 10 points that consolidate info about Python’s concurrency support:
Instance of python interpreter \(\implies\) a process
We can create additional Python processes \(\leftarrow\) use the `multiprocessing` or `concurrent.futures` libraries
We can also start sub-processes that run any other external programs \(\leftarrow\) using the `subprocess` library
The interpreter runs the user program and the GC in a single thread. We can start additional threads using the `threading` or `concurrent.futures` libraries.
GIL (Global Interpreter Lock) controls internal interpreter state (process state shared across threads) and access to object ref counts.
Only one python thread can hold the GIL at any time \(\implies\) only one thread can execute Python code at any time, regardless the number of CPU cores.
GIL is NOT part of the Python language definition; it’s a CPython implementation detail, which is critical to remember for portability reasons.
Default release of the GIL @ an interval:
Prevents any particular thread from holding the GIL indefinitely.
It’s the bytecode interpreter that pauses the current thread every 5 ms by default (configurable), and the OS scheduler picks which thread gets the GIL next (possibly the same thread that just released it).
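That 5 ms interval is exposed through `sys`; a minimal sketch of reading and tuning it:

```python
import sys

# the default switch interval is 0.005 s (5 ms) on stock CPython
default = sys.getswitchinterval()

# a longer interval means fewer forced thread switches (less overhead,
# but a CPU-bound thread can hog the GIL for longer)
sys.setswitchinterval(0.01)

sys.setswitchinterval(default)   # restore the default
```

Changing this is rarely needed; it only shifts the trade-off between switching overhead and responsiveness of competing threads.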
Python source code can’t control the GIL but extension / builtin written in C (or lang that interfaces at the Python/C API level) can release the GIL when it’s running time-consuming tasks.
Every Python stdlib function that makes a syscall (for kernel services) releases the GIL. This avoids contention for resources (memory as well as CPU). Examples:
functions that perform I/O operations (disk, network, sleep)
CPU-intensive functions in C extensions (e.g. `NumPy`/`SciPy`, and compressing/decompressing functions like `zlib`, `bz2`) also release the GIL
GIL-free threads:
can only be launched by extensions that integrate at the Python/C API level
can’t change Python objects in general, but can read/write to objects that support the buffer protocol (`bytearray`, `array.array`, `NumPy` arrays)
GIL-free Python is under experimentation at the moment (but not mainstream)
Network I/O is GIL-insensitive
GIL minimally affects network programming because Network I/O is higher latency than memory I/O.
each individual thread would have spent a long time waiting anyway, so interleaving their execution doesn’t majorly impact the overall throughput.
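A minimal sketch of why this works: `time.sleep()` stands in for a network wait (both release the GIL), and the `fake_request` helper is made up for illustration. Five 0.1 s waits overlap across threads instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i: int) -> int:
    time.sleep(0.1)   # stand-in for a network wait; releases the GIL
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_request, range(5)))
elapsed = time.perf_counter() - start
# elapsed is close to 0.1 s, not the 0.5 s a sequential loop would take
```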
Compute-intensive python threads \(\implies\) will be slowed down by GIL contention.
Better to use sequential, single-threaded code here. Faster and simpler.
running CPU-intensive Python code on multiple cores requires multiple Python processes.
Extra Notes: #
Coroutines are not affected by the GIL
by default they share the same Python thread among themselves and with the supervising event loop provided by an asynchronous framework, therefore the GIL does not affect them.
We technically can use multiple threads in an async program. This is not best practice.
Typically, we have one coordinating thread running the event loops, which delegates to additional threads that carry out specific tasks.
KIV “delegating tasks to executors”
A Concurrent Hello World #
- a demo of how python can “walk and chew gum”, using multiple approaches:
`multiprocessing`, `threading`, `asyncio`
Spinner with Threads #
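The code listing didn’t survive the capture; below is a reconstruction along the lines of the book’s `spinner_thread.py` (assumption: the sleep in `slow` is shortened here to 1 s; the book uses 3 s):

```python
import itertools
import time
from threading import Thread, Event

def spin(msg: str, done: Event) -> None:
    for char in itertools.cycle(r'\|/-'):
        status = f'\r{char} {msg}'        # \r moves the cursor back to column 0
        print(status, end='', flush=True)
        if done.wait(.1):                 # returns True as soon as done is set
            break
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='')        # wipe the status line

def slow() -> int:
    time.sleep(1)   # sleep releases the GIL, so the spinner thread keeps running
    return 42

def supervisor() -> int:
    done = Event()
    spinner = Thread(target=spin, args=('thinking!', done))  # <3> spawn thread
    spinner.start()
    result = slow()     # blocks the main thread while the spinner animates
    done.set()          # signal spin to exit its loop
    spinner.join()      # wait until the spinner thread finishes
    return result

def main() -> None:
    result = supervisor()
    print(f'Answer: {result}')

if __name__ == '__main__':
    main()
```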
Notes:
within `slow()`, `time.sleep()` blocks the calling thread but releases the GIL, so other Python threads (in this case our secondary thread for the spinner) can run
`spin` and `slow` execute concurrently; the supervisor coordinates the threads using an instance of `threading.Event`
creating threads:
create a new `Thread`, provide a function as the `target` keyword argument, and positional arguments to the target as a tuple passed via `args`:
`spinner = Thread(target=spin, args=('thinking!', done))  # <3> spawn thread`
we can also pass in kwargs using the `kwargs` named parameter of the `Thread` constructor
`threading.Event`: Python’s simplest signalling mechanism to coordinate threads.
an Event instance has an internal boolean flag that starts as `False`; calling `Event.set()` sets the flag to `True`
when the flag is `False` (unset):
if a thread calls `Event.wait()`, it is blocked until another thread calls `Event.set()`, at which point `Event.wait()` returns `True`
if a timeout is given, `Event.wait(s)` returns `False` when the timeout elapses
as soon as another thread calls `Event.set()`, the wait call returns `True`
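A minimal sketch of those `Event` semantics (the `setter` helper is made up for illustration):

```python
import threading

event = threading.Event()

# flag starts False: wait(timeout) returns False once the timeout elapses
timed_out = event.wait(0.01)

def setter() -> None:
    event.set()   # flips the internal flag to True

threading.Thread(target=setter).start()

# now wait() returns True (immediately, once the flag is set)
result = event.wait(1.0)
```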
TRICK: for text-mode animation: move the cursor back to the start of the line with the carriage return ASCII control character (
'\r').
Spinner with Processes #
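The listing is missing here; below is a reconstruction along the lines of the book’s `spinner_proc.py` (assumption: the sleep in `slow` is shortened here to 1 s; the book uses 3 s). Note how little changes versus the thread version: `Thread` becomes `Process`, and the `Event` comes from `multiprocessing`.

```python
import itertools
import time
from multiprocessing import Process, Event      # Event here is a factory function
from multiprocessing import synchronize         # needed only for the type hint

def spin(msg: str, done: synchronize.Event) -> None:
    for char in itertools.cycle(r'\|/-'):
        status = f'\r{char} {msg}'
        print(status, end='', flush=True)
        if done.wait(.1):
            break
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='')

def slow() -> int:
    time.sleep(1)
    return 42

def supervisor() -> int:
    done = Event()
    spinner = Process(target=spin, args=('thinking!', done))  # child process
    spinner.start()
    result = slow()
    done.set()          # the Event state crosses the process boundary
    spinner.join()
    return result

if __name__ == '__main__':      # required on platforms that spawn processes
    print(f'Answer: {supervisor()}')
```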
the `multiprocessing` package supports running concurrent tasks in separate Python processes instead of threads
each instance has its own Python interpreter; procs will be working in the background
each proc has its own GIL \(\implies\) we can exploit our multicore CPU well because of this (depends on the OS scheduler though)
the `multiprocessing` API emulates the `threading` API \(\implies\) can easily convert between them
Comparing the `multiprocessing` and `threading` APIs:
similarities:
Event objects are similar in how they function with the bit setting / unsetting
Event objects can wait on timeouts
differences:
the Event type differs between them:
`multiprocessing.Event` is a function (not a class like `threading.Event`)
`multiprocessing` has a larger API because it’s more complex
e.g. Python objects that need to be communicated across processes must be serialized/deserialized, because processes are isolated at the OS level; this adds overhead
the `Event` state is the only cross-process state being shared; it’s implemented via an OS semaphore
memory sharing can be done via `multiprocessing.shared_memory`. Only raw bytes; we can use a `ShareableList` (a mutable sequence) with a fixed number of items of some primitive types, up to 10 MB per item.
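A minimal single-process sketch of `ShareableList` (in real use, a second process would attach to the block by its `shm.name`):

```python
from multiprocessing import shared_memory

# fixed number of items; each item is a primitive (str, bytes, int, float, bool, None)
sl = shared_memory.ShareableList(['prime', 42, 3.14, True, None])
try:
    first = sl[0]       # read an item back
    sl[1] = 43          # items are mutable, but the list length is fixed
    second = sl[1]
finally:
    sl.shm.close()      # release this process's handle
    sl.shm.unlink()     # free the underlying shared memory block
```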
Spinner with Coroutines #
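The listing is missing here; below is a reconstruction along the lines of the book’s `spinner_async.py` (assumption: the sleep in `slow` is shortened here to 1 s; the book uses 3 s):

```python
import asyncio
import itertools

async def spin(msg: str) -> None:
    for char in itertools.cycle(r'\|/-'):
        status = f'\r{char} {msg}'
        print(status, flush=True, end='')
        try:
            await asyncio.sleep(.1)          # nonblocking sleep cedes control
        except asyncio.CancelledError:       # raised here by spinner.cancel()
            break
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='')

async def slow() -> int:
    await asyncio.sleep(1)   # never use time.sleep() here: it would block the loop
    return 42

async def supervisor() -> int:
    spinner = asyncio.create_task(spin('thinking!'))   # schedule spin, don't wait
    result = await slow()                              # hand control over to slow
    spinner.cancel()                                   # raise CancelledError in spin
    return result

def main() -> None:
    result = asyncio.run(supervisor())   # start the event loop, drive supervisor
    print(f'Answer: {result}')

if __name__ == '__main__':
    main()
```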
who manages the event loop?
for threads and processes, it’s the OS Scheduler
for coroutines, it’s an application-level event loop
drives coroutines one by one, manages queue of pending coroutines, passes control back to corresponding coroutine when each event happens
all of these execute in a single thread: event loop, library coroutines, user coroutines
that’s why blocking code inside a coroutine blocks everything
Concurrency is achieved by control passing from one coroutine to another.
Python code using `asyncio` has only one flow of execution, unless you’ve explicitly started additional threads or processes. This means only one coroutine executes at any point in time.
Concurrency is achieved by control passing from one coroutine to another. This happens when we use the `await` keyword.
Remember: when using `asyncio` coroutines, if we ever need some idle time, use the non-blocking sleep (`asyncio.sleep(DELAY)`) instead of the blocking one (`time.sleep()`)
explaining the example:
`asyncio.run` starts the event loop and drives the coroutine (`supervisor`) that sets the other coroutines in motion. `supervisor` blocks `main` until it’s done; `asyncio.run` returns what `supervisor` returns
`await` calls `slow` and blocks `supervisor` until `slow` returns
I think it’s easier to see it as a control-flow handover to `slow`. That’s why it’s blocking, and why, when control flow returns, we carry on with the assignment.
`Task.cancel()` raises `CancelledError` inside the coro task
NOTE: if we directly invoke a coro like `coro()`, it immediately returns a coroutine object but doesn’t run the body of the `coro` function
the coro needs to be driven by an event loop
We see 3 ways to run a coro (driven by an event loop):
`asyncio.run(coro())`:
called from a regular function
usually the first coro is the entry point, the supervisor
the return value of `run` is whatever the body of `coro` returns
`asyncio.create_task(coro())`:
called from a coroutine; returns a `Task` instance that wraps the coro and provides methods to control and query its state
schedules another coroutine to eventually run
does not suspend the current coroutine
`await coro()`:
- transfers control from the current coro to the coroutine object returned by `coro()`
- suspends the current coro until the other coro returns
- the value of the `await` expression is whatever the body of `coro` returns
Supervisors Side-by-Side #
`asyncio.Task` vs `threading.Thread` (roughly equivalent): a `Task` drives a coroutine object, a `Thread` invokes a callable
yielding control: a coroutine yields explicitly with `await`
we don’t instantiate `Task` objects ourselves; we get them by calling `asyncio.create_task()`
explicit scheduling: `create_task` gives a `Task` object that is already scheduled to run; a `Thread` instance must be explicitly told to run via `start()`
Termination:
threads can’t be terminated from the outside; we can only send a signal (e.g. setting the `done` `Event`)
tasks can be cancelled from the outside with `Task.cancel()`, which raises `CancelledError` at the `await` expression where the coro body is currently suspended
this works because coros are always in sync: only one of them runs at any time, so the outside can safely cancel one, vs. merely suggesting termination via a signal
Instead of holding locks to synchronize the operations of multiple threads, coroutines are “synchronized” by definition: only one of them is running at any time.
with coroutines, code is protected against interruption by default, because a context switch can only happen at an explicit `await`
The Real Impact of the GIL #
Quick Quiz #
the main question here is whether the mechanisms are interruptible by the entity that coordinates the control flow.
processes are controlled by the OS scheduler, so they are interruptible \(\implies\) the `multiprocessing` version will still carry on as usual
threads are also controlled by the OS scheduler, and the GIL is released at a default interval, so this works in our favour \(\implies\) the threading approach will show no noticeable difference
this effect is negligible only because the number of threads was minimal (2). With more threads, it may become visible.
the asyncio coroutine version will be blocked by this compute-intensive call.
we can try this hack though: make `is_prime` a coroutine and `await asyncio.sleep(0)` periodically to yield control flow. This is slow though.
```python
# spinner_prime_async_nap.py
# credits: Example by Luciano Ramalho inspired by
# Michele Simionato's multiprocessing example in the python-list:
# https://mail.python.org/pipermail/python-list/2009-February/675659.html

import asyncio
import itertools
import math
import functools

# tag::PRIME_NAP[]
async def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
        if i % 100_000 == 1:
            await asyncio.sleep(0)  # <1>
    return True
# end::PRIME_NAP[]

async def spin(msg: str) -> None:
    for char in itertools.cycle(r'\|/-'):
        status = f'\r{char} {msg}'
        print(status, flush=True, end='')
        try:
            await asyncio.sleep(.1)
        except asyncio.CancelledError:
            break
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='')

async def check(n: int) -> int:
    return await is_prime(n)

async def supervisor(n: int) -> int:
    spinner = asyncio.create_task(spin('thinking!'))
    print('spinner object:', spinner)
    result = await check(n)
    spinner.cancel()
    return result

def main() -> None:
    n = 5_000_111_000_222_021
    result = asyncio.run(supervisor(n))
    msg = 'is' if result else 'is not'
    print(f'{n:,} {msg} prime')

if __name__ == '__main__':
    main()
```

Using `await asyncio.sleep(0)` should be considered a stopgap measure before you refactor your asynchronous code to delegate CPU-intensive computations to another process.
A Homegrown Process Pool #
Process-Based Solution #
- starts a number of worker processes equal to the number of CPU cores, as determined by `multiprocessing.cpu_count()`
- some overhead in spinning up processes and in inter-process communication
Understanding the Elapsed Times #
Code for the Multicore Prime Checker #
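The listing is missing here; the sketch below condenses the shape of the book’s `procs.py` pattern (worker processes, job/result queues, poison pills). Assumptions: `0` is used as the poison pill, the sample numbers are small stand-ins for the book’s large primes, and timing code is omitted.

```python
import math
from multiprocessing import Process, SimpleQueue, cpu_count
from typing import NamedTuple

class PrimeResult(NamedTuple):
    n: int
    prime: bool

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    root = math.isqrt(n)
    return all(n % i for i in range(3, root + 1, 2))

def worker(jobs, results) -> None:
    while n := jobs.get():                      # 0 (the poison pill) ends the loop
        results.put(PrimeResult(n, is_prime(n)))
    results.put(PrimeResult(0, False))          # acknowledge the pill to the main proc

def start_jobs(procs, numbers, jobs, results) -> None:
    for n in numbers:
        jobs.put(n)
    for _ in range(procs):
        Process(target=worker, args=(jobs, results)).start()
        jobs.put(0)                             # one poison pill per worker

def check_all(numbers):
    procs = cpu_count()                         # one worker per CPU core
    jobs, results = SimpleQueue(), SimpleQueue()
    start_jobs(procs, numbers, jobs, results)
    checked, done = [], 0
    while done < procs:                         # stop once every worker acknowledged
        result = results.get()
        if result.n == 0:
            done += 1
        else:
            checked.append(result)
    return checked

if __name__ == '__main__':
    for res in sorted(check_all([2, 3, 4, 5, 9_999_991])):
        print(res)
```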
when delegating computing to threads / procs, code doesn’t call the worker function directly
the worker is driven by the thread or proc library
the worker eventually produces a result that is stored somewhere
worker coordination & result collection are common uses of queues in concurrent programming
IDIOM: loops, sentinels and poison pills:
the `worker` function shows a common concurrent programming pattern:
we loop indefinitely, taking items from a queue and processing each with a fn that does the actual work (`check`)
we end the loop when the queue produces a sentinel value
the sentinel value that shuts down a worker is often called a poison pill
TRICK/IDIOM: poison pilling to signal the worker to finish
notice the use of the poison-pill in point 8 of the code above
common sentinels (here’s a comment thread on sentinels):
`None`: but it may not work if the data stream can legitimately produce `None`
`object()`: a common sentinel, but Python objects must be serialised for IPC, so when we `pickle.dump` and `pickle.load` an object, the unpickled instance is distinct from the original and doesn’t compare equal
⭐️ the `...` (`Ellipsis`) builtin is a good option: it survives serialisation without losing its identity
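A minimal sketch of why identity matters for sentinels crossing a pickle boundary (as in IPC):

```python
import pickle

# a plain object() sentinel loses its identity across pickling:
marker = object()
object_survives = pickle.loads(pickle.dumps(marker)) is marker   # distinct instance

# Ellipsis is a singleton, so it keeps its identity through pickling:
ellipsis_survives = pickle.loads(pickle.dumps(...)) is ...

# None survives too, but may collide with legitimate None values in the stream:
none_survives = pickle.loads(pickle.dumps(None)) is None
```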
Debugging concurrent code is always hard, and debugging multiprocessing is even harder because of all the complexity behind the thread-like façade.
Experimenting with More or Fewer Processes #
- typically, beyond the number of cores available to us, we should expect runtime to increase because of CPU contention
Thread-Based Nonsolution #
Due to the GIL and the compute-intensive nature of `is_prime`, the threaded version is slower than the sequential code
it gets slower as the number of threads increase, because of CPU contention and the cost of context switching.
context-switching overhead: all the stack-frame and register changes required are what cause the extra overhead
KIV: managing threads and processes using `concurrent.futures` (chapter 20) and doing async programming using `asyncio` (chapter 21)
Python in the Multicore World #
GIL makes the interpreter faster when running on a single core, and its implementation simpler. It was a no-brainer when CPU performance didn’t hinge on concurrency.
Despite the GIL, Python is thriving in applications that require concurrent or parallel execution, thanks to libraries and software architectures that work around the limitations of CPython.
System Administration #
use cases: manage hardware like NAS, use it for SDN (software defined networking), hacking
python scripts help with these tasks, commanding remote machines \(\implies\) aren’t really CPU bound operations \(\implies\) Threads & Coroutines are Good for this
we can use `concurrent.futures` to perform the same operation on multiple remote machines at the same time without much complexity
Data Science #
- compute-intensive applications, supported by an ecosystem of libs that can leverage multicore machines, GPUs, and distributed parallel computing in heterogeneous clusters
- some libs:
- project jupyter
- tensorflow (Google) and pytorch (Facebook)
- dask: parallel computing lib to coordinate work on clusters
Server-Side Web/Mobile Development #
- both for app caches and HTTP caches (CDNs)
WSGI Application Servers #
WSGI is a standard API for a Python framework or application to receive requests from an HTTP server and send responses to it.
WSGI application servers manage one or more procs running your application, maximising the use of available CPUs
main point: all of these application servers can potentially use all CPU cores on the server by forking multiple Python processes to run traditional web apps written in good old sequential code in Django, Flask, Pyramid, etc. This explains why it’s been possible to earn a living as a Python web developer without ever studying the threading, multiprocessing, or asyncio modules: the application server handles concurrency transparently.
Distributed Task Queues #
Distributed Task Queues wrap a message queue and offer a high-level API for delegating tasks to workers, possibly running on different machines.
use cases:
run background jobs
trigger jobs after responding to the web request
async retries to ensure something is done
scheduled jobs
e.g. Django view handler produces job requests which are put in the queue to be consumed by one or more PDF rendering processes
Supports horizontal scalability
producers and consumers are decoupled
I’ve used Celery before!!
Chapter Summary #
the demo on the effect of the GIL
demonstrated graphically that CPU-intensive functions must be avoided in asyncio, as they block the event loop.
the prime demo highlighted the difference between multiprocessing and threading, proving that only processes allow Python to benefit from multicore CPUs.
GIL makes threads worse than sequential code for heavy computations.
Further Reading #
Concurrency with Threads and Processes #
this was the introduction of the `multiprocessing` library via a PEP, one of the longer PEPs written
divide-and-conquer approaches for splitting jobs on clusters, vs. server-side systems where it’s simpler and more efficient to let each process work on one computation from start to finish, reducing the overhead from IPC
this will likely be a useful read for high-performance Python