My unedited org-noter notes from the classic book “Fluent Python – Clear, Concise, and Effective Programming” by Luciano Ramalho.

The notes for this are messy, sorry about that. There are some chapters I could not find time to finish, so they are left as TODOs for now.

Outline and Notes

Each chapter’s summary page is really useful. When reviewing these topics in the future, we should always start with the summary, since we have already read the chapters at least once.

For the things that are useful and that I want to turn into a habit, I write a comment with the word “TO_HABIT” so that we can search for it easily.

Part I. Data Structures

Chapter 1. The Python Data Model

Seeing Python as a “framework”

This gives us some use cases / purposes for implementing special methods: they let our objects interface with Python as a “framework”.

The special methods are the “dunder” (double-underscore) methods.

We implement special methods when we want our objects to support and interact with fundamental language constructs such as:
  • Collections
  • Attribute access
  • Iteration (including asynchronous iteration using async for)
  • Operator overloading
  • Function and method invocation
  • String representation and formatting
  • Asynchronous programming using await
  • Object creation and destruction
  • Managed contexts using the with or async with statements

What’s New in This Chapter

A Pythonic Card Deck

This is a demonstrative example of how we can adapt to the “interface” of the “framework” that is Python.
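
Here is a minimal sketch of the FrenchDeck idea, reconstructed from memory of the book’s example (details may differ slightly):

  import collections

  # lightweight record type for a single card
  Card = collections.namedtuple('Card', ['rank', 'suit'])

  class FrenchDeck:
      ranks = [str(n) for n in range(2, 11)] + list('JQKA')
      suits = 'spades diamonds clubs hearts'.split()

      def __init__(self):
          self._cards = [Card(rank, suit)
                         for suit in self.suits
                         for rank in self.ranks]

      def __len__(self):
          # makes len(deck) work
          return len(self._cards)

      def __getitem__(self, position):
          # indexing, slicing, iteration and `in` all ride on this
          return self._cards[position]

  deck = FrenchDeck()
  print(len(deck))   # 52
  print(deck[0])     # Card(rank='2', suit='spades')
  print(deck[:3])    # slicing works "for free" via the underlying list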

Class Composition and How the Delegation Pattern in the Data Model Helps

Because our __getitem__ delegates to the [] operator of self._cards, our deck automatically supports slicing.

  • The use of base classes allows OOP benefits for us, such as being able to delegate functionality.
  • Delegation is different from forwarding.
  • This Python example is actually closer to the concept of “forwarding”.

Here’s a bot-assisted writeup on the distinction:
Yes, the concept of "delegation" referenced in *Fluent Python* when discussing how `__getitem__` delegates to the `[]` operator of `self._cards` is directly related to a general programming language principle—specifically, the **delegation pattern** in object-oriented programming.

## Delegation in Programming

**Delegation** is a software design pattern where an object (the *delegator*) passes responsibility for a particular task to another object (the *delegate*) that is better suited to handle it. The delegator maintains the context and control, but leverages the delegate's specialized capabilities to fulfill certain operations[1][2][3].

In the context of Python's data model, when you write:

```python
def __getitem__(self, position):
    return self._cards[position]
```

You are *delegating* the item access operation to the underlying list (`self._cards`), which already knows how to handle slicing and indexing. This is a classic example of the delegation pattern: your class does not implement slicing logic itself, but instead relies on a delegate (the list) to do so.

## Delegation vs. Forwarding

- **Delegation**: The delegator object maintains the context and can pass itself (or relevant context) to the delegate. In some languages, this means that `self` (or `this`) inside the delegate's methods refers back to the original object, not the delegate[3][2].
- **Forwarding**: The delegator simply passes the request to the delegate, and the delegate operates in its own context. The delegate does not know about the original sender[3][2].

In Python, the example above is technically closer to *forwarding* because the list (`self._cards`) does not receive any special context from the delegator. However, in the broader sense, this is often referred to as delegation in Python literature.

## Delegation Pattern in Object-Oriented Programming

The delegation pattern allows composition to achieve the same code reuse as inheritance. Instead of inheriting behavior from a parent class, you compose your class with instances of other classes (the delegates) and forward certain operations to them[2][1]. This makes your code more modular and flexible.

## Language Support

Some languages have built-in support for delegation (e.g., Kotlin's `by` keyword), but in Python, delegation is typically implemented manually by calling methods on composed objects[2].

## Summary Table

| Concept      | Description                                                                 | Example in Python                |
|--------------|-----------------------------------------------------------------------------|----------------------------------|
| Delegation   | Object passes responsibility to another, often with context preservation    | `self._cards[position]` (loosely)|
| Forwarding   | Object passes request to another, delegate operates in its own context      | `self._cards[position]` (strict) |

## Key Takeaway

Delegation is a general programming pattern for code reuse and modularity, where one object relies on another to perform certain tasks. In Python, this is often seen when a class delegates an operation to an internal object, as in the `__getitem__` example from *Fluent Python*[1][2][3].

[1] https://dev.to/kalashin1/delegation-4h29
[2] https://handwiki.org/wiki/Delegation_pattern
[3] https://en.wikipedia.org/wiki/Delegation_(object-oriented_programming)
[4] https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/
[5] https://en.wikipedia.org/wiki/Delegation_(computing)
[6] https://www.cs.unc.edu/~dewan/comp401/current/Class%20Notes/22_Delegation_Notes.pdf
[7] https://developer.apple.com/library/archive/documentation/General/Conceptual/CocoaEncyclopedia/DelegatesandDataSources/DelegatesandDataSources.html
[8] https://langdev.stackexchange.com/questions/3586/are-delegates-still-relevant-in-modern-c
[9] https://www.reddit.com/r/Kotlin/comments/1928bzd/can_someone_explain_me_the_concept_of_delegations/
[10] https://perso.limsi.fr/pointal/python:delegator

How Special Methods Are Used

  • NOTE: built-ins that are variable-sized under the hood have an ob_size field that holds the size of the collection. This makes len(my_object) fast: it’s not really a method call, the interpreter just reads that field off the struct.
    1. Emulating Numeric Types
    • it’s all about implementing the number-related dunder methods; then anything can behave like a number
    2. String Representation: __repr__
    • repr is different from str in the sense that it is supposed to be a representation of how the object was created. Therefore, it should be unambiguous and, if possible, match the source code necessary to recreate the represented object
    • repr is not really for display purposes; that’s what the str built-in is for
    • implement the special method __repr__ first, then __str__
    3. Boolean Value of a Custom Type
    • default: use __bool__ if defined, else fall back to __len__ (see the sketch after this list)

      By default, instances of user-defined classes are considered truthy, unless either __bool__ or __len__ is implemented. Basically, bool(x) calls x.__bool__() and uses the result. If __bool__ is not implemented, Python tries to invoke x.__len__(), and if that returns zero, bool returns False. Otherwise bool returns True.

    4. Collection API

    The Collection ABC is relatively new, and it unifies the following three interfaces:
      • Iterable, to support for, unpacking, and other forms of iteration
      • Sized, to support the len built-in function
      • Container, to support the in operator

    There’s no need to inherit from these ABCs specifically; as long as the dunder methods are implemented, the class is considered to satisfy the ABC.

    • Specialisations of Collection

      Three very important specializations of Collection are:
        • Sequence, formalizing the interface of built-ins like list and str
        • Mapping, implemented by dict, collections.defaultdict, etc.
        • Set, the interface of the set and frozenset built-in types

      I want to use the vocabulary here when describing what primitives I want to use.

    • Python dicts are “ordered” in the sense that the insertion order is preserved

      • there’s nothing else we can do with the ordering property (e.g., no rearranging of the order)
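
    A minimal sketch of my own (not from the book) tying these together: repr(), bool(), len(), iteration, and in all come from dunder methods, with no ABC inheritance needed:

      import collections.abc

      class Bag:
          def __init__(self, items=()):
              self._items = list(items)

          def __repr__(self):
              # unambiguous; matches the source needed to recreate the object
              return f'Bag({self._items!r})'

          def __len__(self):
              return len(self._items)

          def __iter__(self):
              return iter(self._items)

          def __contains__(self, item):
              return item in self._items

      b = Bag(['a', 'b'])
      print(repr(b))                 # Bag(['a', 'b'])
      print(bool(b), bool(Bag()))    # True False -- bool() falls back to __len__
      print('a' in b, list(b))       # True ['a', 'b']
      print(isinstance(b, collections.abc.Collection))  # True, via duck typing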

Overview of Special Methods

  • there are a bunch; the latest ones are mostly on the async-support side, and they will be covered throughout the book

Why len Is Not a Method

“Practicality beats purity”.

  • there’s no method call for len(x) when x is a CPython built-in because it’s a direct read of a C-struct
  • for custom objects, we can implement the dunder method __len__
  • it kinda looks like a functional style (since len is a function) in an OOP-styled language. To reconcile this, we can think of abs and len as unary functions!

Chapter Summary

Further Reading

Python’s Data Model can be seen as a Metaobject Protocol

Metaobjects The Art of the Metaobject Protocol (AMOP) is my favorite computer book title. But I mention it because the term metaobject protocol is useful to think about the Python Data Model and similar features in other languages. The metaobject part refers to the objects that are the building blocks of the language itself. In this context, protocol is a synonym of interface. So a metaobject protocol is a fancy synonym for object model: an API for core language constructs.

Chapter 2. An Array of Sequences

What’s New in This Chapter

Overview of Built-in Sequences

  • two factors to group sequences by:
    1. by container (heterogeneous) / flat (homogeneous) sequences
      1. Container sequences: can be heterogeneous
        • holds references (“pointers”)
      2. Flat sequences: are homogeneous
        • holds values
    2. by mutability / immutability
  • things like generators can also be seen in the context of sequences, since they can be used “to fill up sequences of any type”
  • Memory representation of Python objects: a header (with metadata) plus the value

    example of meta fields (using float as a reference):

    1. refcount
    2. type
    3. value

    Every Python object in memory has a header with metadata. The simplest Python object, a float, has a value field and two metadata fields: • ob_refcnt: the object’s reference count • ob_type: a pointer to the object’s type • ob_fval: a C double holding the value of the float On a 64-bit Python build, each of those fields takes 8 bytes. That’s why an array of floats is much more compact than a tuple of floats: the array is a single object holding the raw values of the floats, while the tuple consists of several objects—the tuple itself and each float object contained in it.
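
    A quick way to see this compactness difference (sizes vary by platform and Python version; this is just an illustrative sketch):

      import sys
      from array import array

      floats = [float(n) for n in range(1000)]

      as_tuple = tuple(floats)
      as_array = array('d', floats)

      # the tuple's own size excludes the 1000 separate float objects it points to,
      # so add them in for a fair comparison
      tuple_total = sys.getsizeof(as_tuple) + sum(sys.getsizeof(f) for f in floats)
      array_total = sys.getsizeof(as_array)   # raw C doubles stored inline

      print(tuple_total, array_total)          # the array is several times smaller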

List Comprehensions and Generator Expressions

  • List Comprehensions and Readability

    • a loop has generic purpose, but a listcomp’s purpose is always singular: to build a list
    • we should stick to this purpose and not abuse it, e.g., by relying on side effects of the listcomp’s evaluation
    • List comprehensions build lists from sequences or any other iterable type by filtering and transforming items.
    • Scope: listcomps have a local scope, use walrus operator to expand the scope to its outer frame

      ``Local Scope Within Comprehensions and Generator Expressions''

      if that name is modified using global or nonlocal, then the scope is accordingly set

      PEP 572 defines the scope of the target of := as the enclosing function, unless there is a global or nonlocal declaration for that target.
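
      A small sketch of the scope behavior (illustrative, not from the book):

        values = [10, 20, 30]

        # the listcomp variable `v` is local to the comprehension...
        doubled = [v * 2 for v in values]
        # ...but a walrus target leaks into the enclosing scope
        doubled = [(last := v) * 2 for v in values]

        print(doubled)   # [20, 40, 60]
        print(last)      # 30 -- bound in the enclosing scope
        # print(v)       # would raise NameError: v is confined to the listcomp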

  • Listcomps Versus map and filter
  • Cartesian Products

    This is the part where we have more than one iterable within the listcomp

  • Generator Expressions

Tuples Are Not Just Immutable Lists

The immutable list part is definitely one of the main features.

It should also be seen as a nameless record.

  • Tuples as Records

    • some examples of tuple unpacking:
      1. the loop constructs automatically support unpacking, we can assign vars even for each iteration of the loop
      2. the % formatting operator will also unpack values within the tuple when doing string formats
  • Tuples as Immutable Lists

    2 benefits:

    1. clarity: the length of tuple is fixed thanks to its immutability
    2. performance: memory use is a little better, also allows for some optimisations
    • Warning: the immutability is w.r.t references contained within the tuple, not values

      So tuples containing mutable items can be a source of bugs. Also, an unhashable tuple can’t be used as a dict key or a set element.

    • Tuple’s Performance Efficiency Reasons

      Tuples are more efficient because:

      1. bytecode: tuples need simpler bytecode: the Python compiler generates bytecode for a tuple constant in one operation; but for a list literal, the generated bytecode pushes each element as a separate constant to the data stack, and then builds the list.
      2. constructor: constructing a tuple from an existing tuple doesn’t copy anything, it’s the same reference (see the sketch after this list):
        • the list constructor returns a copy of a given list if we do list(l)
        • the tuple constructor returns a reference to the same t if we do tuple(t) (they’re immutable anyway, so why not the same reference)
      3. amortisation: a tuple, being fixed size, doesn’t need to over-allocate room for future growth the way a list does
      4. no extra layer of indirection: the references to the items in a tuple are stored in an array in the tuple struct, while a list holds a pointer to an array of references stored elsewhere. The indirection is necessary because when a list grows beyond the space currently allocated, Python needs to reallocate the array of references to make room. The extra indirection makes CPU caches less effective.
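
      A quick check of the constructor point above:

        t = (1, 2, 3)
        l = [1, 2, 3]

        print(tuple(t) is t)   # True: same object, no copy needed for an immutable
        print(list(l) is l)    # False: list() always builds a fresh copy
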
  • Comparing Tuple and List Methods

Unpacking Sequences and Iterables

  • safer extraction of elements from sequences
  • works with any iterable object as the datasource, including iterators.
    • for the iterable case, as long as the iterable yields exactly one item per variable in the receiving end (or * is used to do a glob capture)
  • Parallel assignment

    This is the multi-name assignment that we do, and it involves sequence unpacking.

    The most visible form of unpacking is parallel assignment; that is, assigning items from an iterable to a tuple of variables, as you can see in this example:

      >>> lax_coordinates = (33.9425, -118.408056)
      >>> latitude, longitude = lax_coordinates  # unpacking
      >>> latitude
      33.9425

  • Using * to Grab Excess Items

    • classic case is the use of the grabbing part for varargs
    • in the context of parallel assignment, the * prefix can be applied to exactly one variable, but it can appear in any position
  • Unpacking with * in Function Calls and Sequence Literals

    • the use of the unpacking operator is context-dependent: in function calls and in the creation of sequence literals it can be used multiple times; in parallel assignment it can be used only once (otherwise there would be ambiguity about how to partition the values in the sequence)
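
    A small sketch of both contexts (illustrative):

      # parallel assignment: * may appear once, in any position
      first, *middle, last = range(5)
      print(first, middle, last)            # 0 [1, 2, 3] 4

      # function calls and sequence literals: * may appear several times
      def fun(a, b, c, d, *rest):
          return a, b, c, d, rest

      print(fun(*[1, 2], 3, *range(4, 7)))  # (1, 2, 3, 4, (5, 6))
      print([*range(2), 10, *'ab'])         # [0, 1, 10, 'a', 'b']
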
  • Nested Unpacking
  • GOTCHA: single-item tuple syntax may have silent bugs if used improperly

    Both of these could be written with tuples, but don’t forget the syntax quirk that single-item tuples must be written with a trailing comma. So the first target would be (record,) and the second ((field,),). In both cases you get a silent bug if you forget a comma.
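
    A tiny illustration of the trailing-comma quirk:

      t1 = (1)     # not a tuple, just the int 1 in parentheses
      t2 = (1,)    # a single-item tuple
      print(type(t1), type(t2))   # <class 'int'> <class 'tuple'>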

Pattern Matching with Sequences

Pattern matching is an example of declarative programming: the code describes “what” you want to match, instead of “how” to match it. The shape of the code follows the shape of the data, as Table 2-2 illustrates.

  • here’s the OG writeup for structural pattern matching. Some points from it:
    • Therefore, an important exception is that patterns don’t match iterators. Also, to prevent a common mistake, sequence patterns don’t match strings.
    • the matching primitives allow us to use guards on the match conditions (see here)
    • there’s support for defining sub-patterns like so:
      
          case (Point(x1, y1), Point(x2, y2) as p2): ...
      
  • here’s a more comprehensive tutorial PEP 636 - Structural Pattern Matching
  • Pattern matching is declarative: the code describes “what” you want to match, instead of “how” to match it; the shape of the code follows the shape of the data.

  • python’s match goes beyond just being a switch statement because it supports destructuring similar to elixir

    • random thought: this feature is really useful if we were to write a toy interpreter for some source code. Here’s lis.py

    On the surface, match/case may look like the switch/case statement from the C language—but that’s only half the story. One key improvement of match over switch is destructuring—a more advanced form of unpacking. Destructuring is a new word in the Python vocabulary, but it is commonly used in the documentation of languages that support pattern matching—like Scala and Elixir. As a first example of destructuring, Example 2-10 shows part of Example 2-8 rewritten with match/case.

  • class-patterns gift us the ability to do runtime type checks

    
    case [str(name), _, _, (float(lat), float(lon))]:
    
    • the constructor-like syntax is not a constructor, it’s a runtime check

    • the names (name, lat, lon) are bound here and are available for referencing thereafter within the code block

    • this is really interesting: it’s only in the context of a pattern that this syntax performs a runtime type check (plus binding), as the book quote below explains

    The expressions str(name) and float(lat) look like constructor calls, which we’d use to convert name and lat to str and float. But in the context of a pattern, that syntax performs a runtime type check: the preceding pattern will match a four-item sequence in which item 0 must be a str, and item 3 must be a pair of floats. Additionally, the str in item 0 will be bound to the name variable, and the floats in item 3 will be bound to lat and lon, respectively. So, although str(name) borrows the syntax of a constructor call, the semantics are completely different in the context of a pattern. Using arbitrary classes in patterns is covered in “Pattern Matching Class Instances” on page 192.
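
    A runnable sketch of that pattern, with made-up data loosely modeled on the book’s metro-areas example:

      record = ('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

      match record:
          case [str(name), _, _, (float(lat), float(lon))]:
              # item 0 must be a str, item 3 a pair of floats;
              # name, lat and lon are bound on a successful match
              print(f'{name}: {lat}, {lon}')
          case _:
              print('no match')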

  • Pattern Matching Sequences in an Interpreter

    • it’s interesting how the Python 2 code was described as written by “a fan of pattern matching”: it matches on the first element and then a tree of control-flow paths does the job, so it’s really like a switch

    • this switch-like pattern-matching style is an abstract idea, even more so than the concrete programming-language implementation we have been discussing so far

    • the catch-all is used for error handling here. In general there should always be a fall-through case, instead of going for no-ops, which end up failing silently

Slicing

  • Why Slices and Ranges Exclude the Last Item

    this refers to the fact that one end of the range is closed (inclusive) and the other is open (exclusive).

    • easy to calculate lengths
    • easy to split / partition without creating overlaps
  • Slice Objects

    • useful to know this because it lets you assign names to slices, like spreadsheets allow the naming of cell-ranges
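
      A hedged sketch of naming slices (the fixed-width record layout is made up for illustration):

        record = "JohnDoe   1985-02-11SG"

        NAME = slice(0, 10)
        BIRTH = slice(10, 20)
        COUNTRY = slice(20, 22)

        print(record[NAME].strip())   # 'JohnDoe'
        print(record[BIRTH])          # '1985-02-11'
        print(record[COUNTRY])        # 'SG'
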
  • Multidimensional Slicing and Ellipsis

    This is more useful in the context of the NumPy library; the book doesn’t include examples here for the Python stdlib.

    • built-ins are single-dimensional, except for memoryview: “Except for memoryview, the built-in sequence types in Python are one-dimensional, so they support only one index or slice, and not a tuple of them.”
    • multiple indexes or slices get passed in as a tuple: a[i, j] is evaluated as a.__getitem__((i, j)), e.g., NumPy multi-dimensional array accesses
    • the ellipsis class is a singleton, the sole instance being Ellipsis
      • a similar case is the bool class and True, False
    • so in NumPy, if x is a four-dimensional array, x[i, ...] is a shortcut for x[i, :, :, :]
  • Assigning to Slices

    Applies to mutable sequences.

    • Gotcha: when LHS of assignment is slice, the RHS must be iterable

      In the example below, we’re trying to graft some sequence to another. With that intent, we can only graft an iterable onto another sequence, not a single element. Hence, the requirement that the RHS must be iterable.

      
      l = list(range(10))
      try:
          # wrong: the RHS of a slice assignment must be an iterable
          l[2:5] = 100
      except TypeError as e:
          print("this will throw an error, we aren't passing in an iterable for the grafting:", e)

      # right: graft an iterable (even a one-element one) onto the slice
      l[2:5] = [100]

      print(l)  # [0, 1, 100, 5, 6, 7, 8, 9]
      

Using + and * with Sequences

  • both + and * create new objects without modifying their operands
  • Building Lists of Lists

    • Gotcha: pitfall of references to mutable objects – using a * n where a contains a sequence of mutable items can be problematic

      Actually applies to other mutable sequences as well, in this case it’s just a list that we’re using

      Just be careful what the contained element’s properties are like.

      
      
      my_mutable_elem = ['apple', 'banana']
      print(f"my mutable elem ref: {id(my_mutable_elem)}")
      
      list_of_lists = [ my_mutable_elem ] * 2
      print(f"This creates 2 repeats \n{list_of_lists}")
      print(f"(first ref, second ref) = {[id(elem) for elem in list_of_lists]}")
      
      list_of_lists[0][0] = 'strawberry'
      print(f"This mods all 2 repeated refs \n{list_of_lists}")
      print(f"(first ref, second ref) = {[id(elem) for elem in list_of_lists]}")
      

      Here’s the same gotcha using tic-tac-toe as an example:

      
      good_board = [['_'] * 3 for i in range(3)]
      
      bad_board = [['_'] * 3] * 3
      
      print(f"BEFORE, the boards look like this:\n\
      \tGOOD Board:\n\
      \t{ [row for row in good_board] }\n\
      \tBAD Board:\n\
      \t{ [row for row in bad_board] }\n")
      
      # now we make a mark on the boards:
      good_board[1][2] = 'X'
      bad_board[1][2] = 'X'
      
      
      print(f"AFTER, the boards look like this:\n\
      \tGOOD Board:\n\
      \t{ [row for row in good_board] }\n\
      \tBAD Board:\n\
      \t{ [row for row in bad_board] }\n")
      
  • Augmented Assignment with Sequences

    This refers to the in-place versions of the sequence operators. In a += b, there are two cases:

    • Case A: Identity of a changes

      • the dunder method __iadd__ was not available for use
      • so a + b had to be evaluated and stored as a new id
      • and that id was then referenced by a as part of the new assignment
    • Case B: Identity of a does not change

      • this would mean that a is actually mutated in-place
      • it would have used the dunder method __iadd__

    In other words, the identity of the object bound to a may or may not change, depending on the availability of __iadd__.

    In general, for mutable sequences, it is a good bet that __iadd__ is implemented and that += happens in place (see the sketch below).

    • doing += for repeated concatenation of immutable sequences is inefficient

      however, str concatenation has been optimised in CPython, so it’s alright to do that there: extra space is allocated so that future appends are amortised.
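
    A quick sketch of the identity behavior (illustrative):

      l = [1, 2, 3]
      before = id(l)
      l += [4, 5]
      print(id(l) == before)  # True: list implements __iadd__, so it is mutated in place

      t = (1, 2, 3)
      before = id(t)
      t += (4, 5)
      print(id(t) == before)  # False: tuple has no __iadd__, a new tuple is bound to t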

  • A += Assignment Puzzler

    • Learnings!

      I take three lessons from this:

      • Avoid putting mutable items in tuples.

      • Augmented assignment is not an atomic operation—we just saw it throwing an exception after doing part of its job.

      • Inspecting Python bytecode is not too difficult, and can be helpful to see what is going on under the hood.

    • Example

      it’s a peculiarity in the += operator!


      
      t = (1, 2, [30, 40])
      print(t)
      try:
          t[2] += [50, 60]          # extends the list, THEN fails assigning back into the tuple
      except TypeError as e:
          print("LMAO complaints:", e)
      finally:
          print(t)                  # (1, 2, [30, 40, 50, 60]) -- the partial effect happened!

      try:
          t[2].extend([90, 100])    # calling the list method directly never reassigns t[2]
      except TypeError:
          print("this won't error out though")
      finally:
          print(t)                  # (1, 2, [30, 40, 50, 60, 90, 100])
      

list.sort Versus the sorted Built-In

  • in-place functions should return None as a convention

    There’s a drawback to this: we can’t cascade calls to this method

  • Python’s sorting uses Timsort!
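
    A tiny illustration of the return-None convention above:

      fruits = ['grape', 'raspberry', 'apple', 'banana']
      print(sorted(fruits))   # new list: ['apple', 'banana', 'grape', 'raspberry']
      print(fruits)           # original untouched
      print(fruits.sort())    # None -- sorted in place, so there is nothing to cascade on
      print(fruits)           # ['apple', 'banana', 'grape', 'raspberry']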

Managing Ordered Sequences with bisect (extra ref from textbook)

When a List Is Not the Answer

  • Arrays: best for containing numbers

    • an array of float values does not hold full-fledged float instances, but only the packed bytes representing their machine values—similar to an array of double in the C language.
    • examples:
      1. typecode 'b' => signed byte => 8 bits => [-128, 127] range of representation (the unsigned counterpart 'B' covers [0, 255])
    • for special cases of numeric arrays for bin data (e.g. raster images), bytes and bytearray types are more appropriate!
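
    A quick illustration of the typecode range point above:

      from array import array

      signed = array('b', [-128, 127])   # 'b': signed 8-bit integers
      try:
          signed.append(128)             # out of range for a signed byte
      except OverflowError as e:
          print(e)

      unsigned = array('B', [0, 255])    # 'B': unsigned 8-bit integers
      print(signed, unsigned)
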
  • Memory Views

    • Examples

      • id vs context

        The learning from this is that the memoryview objects and the memory that they provide a view of are two different regions of memory. id vs context.

        So here, m1, m2 and m3 all have different ids, but the memory region that they give a view of is the same.

        That’s why we can mutate using one memory view and every other view also reflects that change.

        
        from array import array
        
        # just some bytes, the sequence is buffer-protocol-adherent
        octets = array("B", range(6))
        print(octets)
        
        # builds a new memoryview from the array
        m1 = memoryview(octets)
        print(m1)
        # exporting of a memory view to a list, this creates a new list (a copy!)
        print(m1.tolist())
        
        # builds a new memoryview, with 2 rows and 3 columns
        m2 = m1.cast('B', [2,3])
        print(m2)
        print(m2.tolist())
        
        
        m3 = m1.cast('B', [3,2])
        print(m3)
        print(m3.tolist())
        
        # overwrite byte
        m2[1,1] = 22
        
        # overwrite byte
        m3[1,1] = 33
        
        print(f"original memory has been changed: \n\t{octets} ")
        print(f"m1 has been changed:\n\t { m1.tolist() }")
        print(f"m2 has been changed:\n\t { m2.tolist() }")
        print(f"m3 has been changed:\n\t {m3.tolist()}")
        
      • corruption

        
        from array import array
        from sys import byteorder
        
        print(byteorder)
        numbers = array('h', [-2,-1,0,1,2])
        memv = memoryview(numbers)
        
        print(len(memv))
        
        print(memv[0])
        
        # cast the 2-byte shorts ('h') as bytes ('B'), so the resulting sequence has double the elements:
        memv_oct = memv.cast('B')
        # the numbers are stored in little endian format
        print(memv_oct.tolist())
        # so -2 as a 2-byte signed short will be (little endian binary) 0xfe 0xff (254, 255)
        
        # so we get:
        # -2: 0xfe 0xff (254, 255)
        # -1: 0xff 0xff (255, 255)
        # 0: 0x00 0x00 (0, 0)
        # 1: 0x01 0x00 (1, 0)
        # 2: 0x02 0x00 (2, 0)
        
        
        # assigns the value 4 to byte offset 5
        memv_oct[5] = 4
        print( numbers )
        # so this change is to the 2nd byte of the third element of numbers
        # byte index 5 is the high byte (since it's little endian so bytes are low -> high)
        # so the 3rd element is now [0, 0x0400]
        # = a + (b*256) = 0 + (4 * 256) is 1024 in decimal
        
        # NOTE: Note the change to numbers: a 4 in the most significant byte of a 2-byte unsigned
        # integer is 1024.
        
    • Extra: “Parsing binary records with struct”

      Here’s the reference.

      Some takeaways:

      • Proprietary binary records in the real world are brittle and can be corrupted easily. examples:
        1. string parsing: paddings, null terminated, size limits?
        2. endianness problem: what byteorder was used for representing integers and floats (CPU-architecture-dependent)?
      • always explore pre-built solutions first instead of building yourself:
        • for data exchange, the pickle module works great, but you have to ensure Python versions align since the default binary formats may differ. Reading a pickle may also run arbitrary code.
      • if the binary exchange uses multiple programming languages, standardise the serialisation. Serial forms:
        1. multi-platform binary serialisation formats:
          1. MessagePack
          2. ProtocolBuffers
        2. JSON
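
      A small hedged sketch of struct in action (the record layout is made up):

        import struct

        # made-up layout: little-endian 4-byte unsigned int, 6-byte name, 2-byte signed short
        record_format = '<I6sh'

        packed = struct.pack(record_format, 42, b'ABCDEF', -7)
        print(packed, struct.calcsize(record_format))   # 12 bytes

        record_id, name, score = struct.unpack(record_format, packed)
        print(record_id, name.decode('ascii'), score)   # 42 ABCDEF -7
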
    • bot assisted concept mapping

      Here’s a bot-assisted concept map between unix mmap and memoryviews:

      
      Memory mapping a file is a powerful technique that allows access to file data as if it were in memory, and the concepts connect naturally between the Unix world (via `mmap` system calls) and Python (via the `mmap` module and `memoryview` objects).
      
      **Unix World: mmap**
      
      - **Definition:** The Unix `mmap` system call maps files or devices into a process's address space, enabling file I/O by reading and writing memory. This is efficient for large files because data is loaded on demand, and multiple processes can share the same mapped region[1].
      - **Usage:** After opening a file, `mmap` associates a region of virtual memory with the file. Reading and writing to this memory behaves as if you were reading and writing to the file itself. The system manages when data is actually read from or written to disk, often using demand paging[1].
      - **Types:** Both file-backed (mapping a file) and anonymous (not backed by a file, similar to dynamic allocation) mappings are supported. Shared mappings allow interprocess communication, while private mappings isolate changes to one process[1].
      
      **Python World: mmap Module**
      
      - **Definition:** Python’s `mmap` module provides a high-level interface to memory-mapped files, closely mirroring the Unix `mmap` functionality. You must provide a file descriptor (from `fileno()` or `os.open()`) and specify the mapping size and access mode[2][3].
      - **Usage:** Memory-mapped file objects behave like both file objects and mutable byte arrays, allowing random access and slicing. You can read and write data by indexing or slicing, and you can seek through the file as if it were a standard file object[2][3].
      - **Access Modes:** You can specify read-only, write-through (changes go directly to the file), or copy-on-write (changes are local)[4][5].
      
      **Python World: memoryview**
      
      - **Definition:** The `memoryview` object provides a safe way to access the internal data of an object that supports the buffer protocol, such as `bytes`, `bytearray`, or a memory-mapped file object[6].
      - **Usage:** With a `memoryview`, you can manipulate large data structures without copying them, which is especially useful for memory-mapped files. It allows you to pass a view of the data to functions or slice it efficiently, avoiding unnecessary memory duplication[6].
      - **Connection to mmap:** When you create a memory-mapped file in Python, you can use `memoryview` to work with the data as a buffer, enabling efficient manipulation and sharing of large data sets.
      
      **Linking the Concepts**
      
      - **Memory Mapping:** Both Unix `mmap` and Python’s `mmap` module allow you to treat file data as if it were in memory, enabling efficient random access and sharing between processes[1][2][3].
      - **Efficient Data Access:** Using `memoryview` in Python is analogous to working directly with the mapped memory region in Unix, as both avoid copying large chunks of data and allow efficient manipulation of file contents[6].
      - **Interprocess Communication:** In Unix, shared memory mappings (`MAP_SHARED`) allow processes to communicate by reading and writing the same memory region. In Python, you can achieve similar effects by sharing a memory-mapped file object between processes[1][2].
      - **Performance:** Both approaches leverage the operating system’s memory management to reduce I/O overhead and enable fast, random access to file data.
      
      **Summary Table**
      
      | Concept                | Unix (`mmap`)                | Python (`mmap` module)           | Python (`memoryview`)         |
      |------------------------|------------------------------|-----------------------------------|-------------------------------|
      | Purpose                | Map files to memory          | Map files to memory               | View memory as buffer         |
      | Access Method          | System call                  | Module/object                     | Object                        |
      | Sharing                | Shared/private mappings      | Shared via file object            | View of existing buffer       |
      | Efficiency             | Demand paging, no copy       | Demand paging, no copy            | No copy, efficient slicing    |
      | Use Case               | IPC, efficient file I/O      | Efficient file I/O, IPC           | Efficient data manipulation   |
      
      By understanding these connections, you can leverage memory mapping for efficient file handling and data sharing across both Unix and Python environments.
      
      [1] https://en.wikipedia.org/wiki/Mmap
      [2] https://docs.python.org/3/library/mmap.html
      [3] https://github.com/python/cpython/blob/master/Doc/library/mmap.rst
      [4] https://pymotw.com/3/mmap/
      [5] https://realpython.com/python-mmap/
      [6] https://smart-spatial.com/data%20science/2017/09/22/MemoryView/
      [7] https://stackoverflow.com/questions/63553692/how-to-use-memory-mapped-file-in-python-linux/63554607
      [8] https://pymotw.com/3/mmap/index.html
      [9] https://unix.stackexchange.com/questions/712651/does-mmap-allow-creating-a-mapping-that-is-much-larger-than-the-amount-of-physic
      [10] https://deepaksood619.github.io/computer-science/operating-system/memory-mapping-mmap/
      [11] https://www.cmi.ac.in/~madhavan/courses/prog2-2012/docs/python-3.2.1-docs-html/library/mmap.html
      [12] https://www.blopig.com/blog/2024/08/memory-mapped-files-for-efficient-data-processing/
      [13] https://stackoverflow.com/questions/4991533/sharing-memory-between-processes-through-the-use-of-mmap/4991631
      [14] https://documentation.help/Python-2.4/module-mmap.html
      [15] https://docs.python.org/3.4/library/mmap.html?highlight=mmap
      [16] https://www.ibm.com/docs/en/zos/2.4.0?topic=functions-mmap-map-pages-memory
      [17] https://man7.org/linux/man-pages/man2/mmap.2.html
      [18] https://programmingappliedai.substack.com/p/what-is-mmap-in-linux-and-how-it
      [19] https://w3.cs.jmu.edu/kirkpams/OpenCSF/Books/csf/html/MMap.html
      [20] https://www.ibm.com/docs/ssw_ibm_i_74/apis/mmap.htm
      [21] https://www.unix.com/man-page/linux/3/mmap/
      [22] https://stackoverflow.com/questions/47437481/python-can-i-get-a-memoryview-or-a-bytearray-mapping-to-a-mmap
      [23] https://docs.vultr.com/python/built-in/memoryview
      [24] https://pymotw.com/2/mmap/
      [25] https://www.youtube.com/watch?v=ky1n6luzL3Y
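
      A minimal sketch tying mmap and memoryview together (illustrative; error handling omitted):

        import mmap, os, tempfile

        path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
        with open(path, 'wb') as f:
            f.write(b'hello mmap')

        with open(path, 'r+b') as f:
            mm = mmap.mmap(f.fileno(), 0)    # 0 = map the whole file
            view = memoryview(mm)            # buffer-protocol view, no copy
            view[0:5] = b'HELLO'             # writes through to the mapped file
            view.release()                   # release the exported buffer before closing
            mm.close()

        with open(path, 'rb') as f:
            print(f.read())                  # b'HELLO mmap'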
      
    • “casting” in memoryview vs Elixir ecto “casting”

      The mental models are different.

      “casting” with memoryview is about interpreting memory as a different type, not about automatic type conversion or validation as in Elixir changesets. The term is used more loosely in Python, while in Elixir, it is a formal, declarative operation for data preparation.

      
      The use of the term **"cast"** in the context of Python `memoryview` is not the same as in Elixir changesets, even though both involve types or transformations.
      
      **Python `memoryview` and Casting**
      
      - **Casting in Python `memoryview`:**
        When people refer to "casting" with `memoryview`, they usually mean taking a slice of a memoryview or viewing the underlying buffer as a different type (sometimes using modules like `ctypes` or `numpy` to achieve type reinterpretation). However, the built-in `memoryview` object itself does not perform type conversion; it provides a view into the memory buffer of an object that supports the buffer protocol, such as `bytes`, `bytearray`, or a NumPy array[5][6][1].
      - **Type Safety:**
        The `memoryview` object allows you to manipulate the data as if it were a sequence of bytes or another type (if the underlying buffer supports it), but it does not automatically convert the data to a different type. If you want to reinterpret the data as a different type (e.g., casting bytes to an integer array), you typically use additional libraries or manual conversion, not `memoryview` itself[5][1].
      - **No Implicit Type Conversion:**
        The `memoryview` simply provides a window into the existing data, preserving the memory layout and avoiding data copying[6][8]. Any "casting" is more about how you interpret the bytes, not a formal declaration or transformation of types.
      
      **Elixir Changesets and Casting**
      
      - **Casting in Elixir Changesets:**
        In Elixir's Ecto changesets, "casting" refers to the explicit declaration of how data (often from a web form or external source) should be converted or validated into a specific type before being inserted into a database. This is a declarative process that ensures data integrity and type safety.
      - **Type Conversion:**
        Casting in Elixir changesets involves actual type conversion (e.g., from a string to an integer or date), and is a core part of the data validation and preparation process.
      
      **Comparison Table**
      
      | Context                | Meaning of "Cast"                        | Type Conversion | Purpose                        |
      |------------------------|------------------------------------------|-----------------|--------------------------------|
      | Python `memoryview`    | Viewing/reinterpreting memory as a type  | No (manual only)| Efficient data access/manipulation |
      | Elixir changesets      | Declaring/validating/converting types    | Yes             | Data validation, preparation   |
      
      **Summary**
      
      In Python, "casting" with `memoryview` is about interpreting memory as a different type, not about automatic type conversion or validation as in Elixir changesets. The term is used more loosely in Python, while in Elixir, it is a formal, declarative operation for data preparation[5][6][1].
      
      [1] https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
      [2] https://stackoverflow.com/questions/71028367/how-to-make-a-custom-python-class-to-support-memoryview-conversion
      [3] https://stackoverflow.com/questions/48136122/how-do-i-cast-a-slice-of-memoryview-into-c-string-unsigned-char
      [4] https://docs.python.org/3/library/stdtypes.html
      [5] https://blog.finxter.com/converting-python-bytes-to-memoryview/
      [6] https://realpython.com/ref/builtin-functions/memoryview/
      [7] https://www.stratascratch.com/blog/understanding-data-types-in-python-with-examples/
      [8] https://docs.python.org/3/c-api/memoryview.html
      
  • NumPy

    • python’s Global Interpreter Lock (GIL) and how releasing it unlocks better parallelisation

      • it’s a mutex that protects access to python objects and therefore prevents multiple native threads from executing Python bytecode simultaneously within the same process.

        It was intended to be a simplification mechanism to make memory handling simpler but that also means having this mutex limits parallelism.

        Typical workarounds:

        1. multi-processing, separate processes, each with their own GIL

        2. offload CPU-intensive work to C-extensions or libs that release the GIL

      here’s a bot-written outline on it:

      
      The **Global Interpreter Lock (GIL)** is a core mechanism in CPython, the reference implementation of Python, that ensures only one thread executes Python bytecode at a time, even on multi-core processors[2][4][5]. Here’s a detailed overview:
      
      ## **What Is the GIL?**
      
      - **Definition:** The GIL is a mutex (mutual exclusion lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode simultaneously within the same process[2][4][7].
      - **Purpose:** It exists primarily to simplify CPython’s memory management, especially reference counting, which is not thread-safe by default. Without the GIL, concurrent access to Python objects could lead to race conditions and memory corruption[5][7].
      
      ## **How Does the GIL Work?**
      
      - **Single Thread Execution:** Only one thread holds the GIL at any moment, meaning only one thread can execute Python code at a time, even if you have multiple threads running[2][4][6].
      - **Thread Switching:** The interpreter periodically releases the GIL, allowing other threads to acquire it and execute Python code. This switching happens frequently, but it means that CPU-bound multithreaded Python programs do not benefit from multiple cores for parallel execution of Python code[2][4].
      - **Non-Python Code:** Operations that do not require the Python interpreter (such as I/O or some C extensions like NumPy) can release the GIL, allowing other threads to run Python code or the process to use multiple cores for those operations[2][4].
      
      ## **Why Does the GIL Exist?**
      
      - **Memory Management:** Simplifies reference counting and garbage collection by ensuring thread safety for Python objects[5][7].
      - **C Extensions:** Makes it easier to write and use C extensions by providing a stable, single-threaded environment for their execution[1][3][7].
      - **Implementation Simplicity:** Using a single lock is easier to implement and maintain than fine-grained locking for all Python objects[1][7].
      
      ## **Implications of the GIL**
      
      - **Limited Parallelism:** The GIL prevents true parallel execution of Python code in multi-threaded programs, making it a bottleneck for CPU-bound tasks[2][4][5].
      - **Workarounds:** For parallelism, Python developers often use multiprocessing (which uses separate processes, each with its own GIL) or offload CPU-intensive work to C extensions or libraries that release the GIL[1][4].
      - **Performance Impact:** The GIL can degrade performance in multi-threaded, CPU-bound applications. However, for I/O-bound or single-threaded programs, its impact is minimal[2][4][6].
      
      ## **Future of the GIL**
      
      - **Potential Removal:** The Python Steering Council has indicated support for PEP 703, which proposes making a version of CPython without the GIL. This could enable true multi-threaded parallelism in Python in the future[3].
      - **Challenges:** Removing the GIL is complex due to backward compatibility and the reliance of many extensions on its guarantees[3][2].
      
      ## **Summary Table**
      
      | Feature                | Description                                                                 |
      |------------------------|-----------------------------------------------------------------------------|
      | Purpose                | Protect Python objects, simplify memory management, enable C extensions     |
      | Execution Model        | Only one thread executes Python bytecode at a time                          |
      | Impact on Parallelism  | Limits CPU-bound parallelism in multi-threaded Python code                  |
      | Workarounds            | Multiprocessing, C extensions, I/O-bound operations                         |
      | Future                 | Potential removal via PEP 703, but challenges remain                        |
      
      The GIL is a key part of Python’s design, balancing simplicity and safety with some limitations for parallel execution[2][4][5].
      
      [1] https://en.wikipedia.org/wiki/Global_interpreter_lock
      [2] https://wiki.python.org/moin/GlobalInterpreterLock
      [3] https://developer.vonage.com/en/blog/removing-pythons-gil-its-happening
      [4] https://realpython.com/python-gil/
      [5] https://dev.to/adityabhuyan/understanding-pythons-global-interpreter-lock-gil-and-its-impact-on-concurrency-2da6
      [6] https://realpython.com/videos/global-interpreter-lock-overview/
      [7] https://dev.to/ohdylan/understanding-pythons-global-interpreter-lock-gil-mechanism-benefits-and-limitations-4aha
      [8] https://www.pubnub.com/blog/understanding-pythons-global-interpreter-lock/
      

      NumPy and SciPy are formidable libraries, and are the foundation of other awesome tools such as the Pandas—which implements efficient array types that can hold nonnumeric data and provides import/export functions for many different formats, like .csv, .xls, SQL dumps, HDF5, etc.—and scikit-learn, currently the most widely used Machine Learning toolset. Most NumPy and SciPy functions are implemented in C or C++, and can leverage all CPU cores because they release Python’s GIL (Global Interpreter Lock). The Dask project supports parallelizing NumPy, Pandas, and scikit-learn processing across clusters of machines. These packages deserve entire books about them.

  • Deques and Other Queues

    • issues with list methods

      We can use a list as a stack or a queue (via .append() and .pop()). However, inserting and removing from the head of the list (the 0-index end) is costly because the entire list must be shifted in memory, which is why simply repurposing lists is not a good idea.

    • Characteristics:

      1. when bounded, every mutation will respect the deque’s capacity (see the sketch below)
      2. hidden cost: removing items from the middle of a deque is not fast
      3. append and popleft are atomic, so deques can be used in multi-threaded applications without needing locks
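
      A small sketch of a bounded deque (illustrative):

        from collections import deque

        dq = deque(range(5), maxlen=5)
        print(dq)             # deque([0, 1, 2, 3, 4], maxlen=5)

        dq.appendleft(-1)     # bounded: pushing on one end drops from the other
        print(dq)             # deque([-1, 0, 1, 2, 3], maxlen=5)

        dq.extend([10, 20])
        print(dq)             # deque([1, 2, 3, 10, 20], maxlen=5)
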
    • alternative queues in stdlib

      • asyncio provides async-programming focused queues

Chapter Summary

Further Reading

  • “NumPy is all about vectorisation”: operations on array elements without explicit loops
  • More on Flat vs Container Sequences

    ``Flat Versus Container Sequences''

Chapter 3. Dictionaries and Sets

What’s New in This Chapter

Extra: Internals of sets and dicts internalsextra

This info is found on the fluentpython.com website. It considers the strengths and limitations of the container types (dict, set) and how they’re linked to the use of hash tables.

  • Running performance tests

    • the trial example of needle in haystack has beautiful ways of writing it

      
        found = 0
        for n in needles:
            if n in haystack:
                found += 1
      

      when using sets, because it’s directly related to set theory, we can use a one-liner to count the needles that occur in the haystack by doing an intersection:

      
        found = len(needles & haystack)
      
    • This intersection approach is the fastest from the test that the textbook runs.

    • the worst times are if we use the list datastructure for the haystack

    • If your program does any kind of I/O, the lookup time for keys in dicts or sets is negligible, regardless of the dict or set size (as long as it fits in RAM).

  • Hashes & Equality

    • the usual uniform random distribution assumption as the goal to reach for hashing functions, just described in a different way: to be effective as hash table indexes, hash codes should scatter around the index space as much as possible. This means that, ideally, objects that are similar but not equal should have hash codes that differ widely.

    • A hashcode for an object usually has less info than the object that the hashcode is for.

      • 64-bit CPython hashcodes is a 64-bit number => \(2^{64}\) possible values
      • consider an ascii string of 10 characters (and that there are 100 possible values in ascii) => \(100^{10}\) which is bigger than the possible values for the hashcode.

      By the way, string hashes are actually salted; there are some nuances in how the salt is derived, but each interpreter process gets its own salt.

    • The modern hash function used for str and bytes is the SipHash implementation

  • Hash Implementation

    • each row in the table is traditionally a “bucket”. In the case of sets, it’s just a single item that the bucket will hold
    • For 64-bit CPython,
      • each bucket holds a 64-bit hash code plus a 64-bit pointer to the element value
      • so the table doesn’t need to keep track of indices; offsets work fine since buckets are fixed-width
    • Also, it keeps at least 1/3 of the buckets empty, and the table gets doubled when that threshold is encroached on, so there’s some amortisation happening there as well.
  • Hash Table Algo for sets

    • in the flowchart, notice that the first step includes the modulo operation, this is the reason why the insertion order is not preserved since the output of running the modulo on the hashvalues will not be in order, it will spread about.

    • on hash collisions, the probing can be done in various ways. CPython uses linear probing but also mitigates the harms of using linear probing: Incrementing the index after a collision is called linear probing. This can lead to clusters of occupied buckets, which can degrade the hash table performance, so CPython counts the number of linear probes and after a certain threshold, applies a pseudo random number generator to obtain a different index from other bits of the hash code. This optimization is particularly important in large sets.

    • the last step is to actually do an equality check on the value. This is why, for something to be hashable, two dunder methods must be implemented: __hash__ and __eq__ (see the sketch below)
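
    A minimal sketch of a hashable class (my own illustration):

      class Color:
          def __init__(self, name):
              self.name = name

          def __hash__(self):
              # equal objects must have equal hashes
              return hash(self.name)

          def __eq__(self, other):
              return isinstance(other, Color) and self.name == other.name

      palette = {Color('red'), Color('red'), Color('blue')}
      print(len(palette))               # 2 -- duplicates collapse via __hash__ + __eq__
      print(Color('red') in palette)    # True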

  • Hash table usage for dicts

    Dictionary implementation benefits from 2 memory optimisations. Here’s a summary of it:

    
    Here’s a summary of the **two major memory optimizations** for modern Python dictionaries, as described in the referenced Fluent Python article:
    
    1. **Key-Sharing Dictionaries (PEP 412)**
       - Introduced in Python 3.3, this optimization allows multiple dictionaries that share the same set of keys (such as instance `__dict__` for objects of the same class) to share a single "keys table."
       - Only the values are stored separately for each dictionary; the mapping from keys to indices is shared.
       - This greatly reduces memory usage for objects of the same type, especially when many objects have the same attributes[1].
    
    2. **Compact Dictionaries**
       - Modern Python dictionaries use a split-table design, separating the storage of keys and values from the hash table itself.
       - The hash table stores indices into a compact array of keys and values, rather than storing the full key-value pairs directly in the hash table.
       - This reduces memory overhead, improves cache locality, and keeps insertion order predictable and efficient[1].
    
    **In summary:**
    - **Key-sharing dictionaries** save memory by sharing the key structure among similar dicts.
    - **Compact dicts** store keys and values in separate, dense arrays, minimizing wasted space and improving performance.
    
    [1] https://www.fluentpython.com/extra/internals-of-sets-and-dicts/
    [2] https://www.geeksforgeeks.org/python/minimizing-dictionary-memory-usage-in-python/
    [3] https://python.plainenglish.io/optimizing-python-dictionaries-a-comprehensive-guide-f9b04063467a
    [4] https://stackoverflow.com/questions/10264874/python-reducing-memory-usage-of-dictionary
    [5] https://labex.io/tutorials/python-how-to-understand-python-dict-memory-scaling-450842
    [6] https://www.youtube.com/watch?v=aJpk5miPaA8
    [7] https://www.reddit.com/r/pythontips/comments/149qlts/some_quick_and_useful_python_memory_optimization/
    [8] https://www.tutorialspoint.com/How-to-optimize-Python-dictionary-access-code
    [9] https://labex.io/tutorials/python-how-to-understand-python-dictionary-sizing-435511
    [10] https://www.joeltok.com/posts/2021-06-memory-dataframes-vs-json-like/
    [11] https://www.linkedin.com/advice/0/what-strategies-can-you-use-optimize-python-dictionaries-fqcuf
    
    • Original implementation

      • there are 3 fields to keep, 64 bits each
      • the first two fields play the same role as they do in the implementation of sets. To find a key, Python computes the hash code of the key, derives an index from it, then probes the hash table to find a bucket with a matching hash code and a matching key object. The third field provides the main feature of a dict: mapping a key to an arbitrary value.
    • Optimisation 1: Compact implementation

      • there’s an indices table extra that has a smaller width (hence compact)
      • Raymond Hettinger observed that significant savings could be made if the hash code and pointers to key and value were held in an entries array with no empty rows, and the actual hash table were a sparse array with much smaller buckets holding indexes into the entries array
    • Optimisation 2: Key-Sharing Dictionary ⭐️

      The combined-table layout is still the default when you create a dict using literal syntax or call dict(). A split-table dictionary is created to fill the __dict__ special attribute of an instance, when it is the first instance of a class. The keys table is then cached in the class object. This leverages the fact that most Object Oriented Python code assigns all instance attributes in the __init__ method. That first instance (and all instances after it) will hold only its own value array. If an instance gets a new attribute not found in the shared keys table, then this instance’s __dict__ is converted to combined-table form. However, if this instance is the only one in its class, the __dict__ is converted back to split-table, since it is assumed that further instances will have the same set of attributes and key sharing will be useful.

  • Practical Consequences

    • of how sets work

      1. set elements must be hashable, i.e. they need proper __hash__ and __eq__ methods
      2. efficient membership testing; the only overhead is the small number of probes that might be needed to find a matching element or an empty bucket
      3. Memory overhead:
        • compared with a plain array of pointers (the most compact layout), sets have significant memory overhead: the hash table adds a hash code per entry and keeps at least ⅓ of the buckets empty to minimize collisions
      4. Insertion order is somewhat preserved but it’s not reliable.
      5. Adding elements to a set may change the order of other elements. That’s because, as the hash table is filled, Python may need to recreate it to keep at least ⅓ of the buckets empty. When this happens, elements are reinserted and different collisions may occur.
    • of how dicts work

      1. need to implement both the dunder methods __hash__ and __eq__
      2. key search almost as fast as element searches in sets
      3. Item ordering preserved in the entries table
      4. To save memory, avoid creating instance attributes outside of the init method. If all instance attributes are created in init, the dict of your instances will use the split-table layout, sharing the same indices and key entries array stored with the class.

Modern dict Syntax

  • dict Comprehensions
  • Unpacking Mappings

    • we can use the unpacking operator ** on mappings in function calls (where all keys must be strings and unique across arguments) and inside dict literals
    • if there are duplicate keys, the later entries overwrite the earlier ones
  • Merging Mappings with | (the union operator)

    • there’s an in-place merge |= and a normal merge | that creates a new mapping (Python 3.9+)
    • the syntax is meant to look like the set union operator: you are taking the union of two mappings (see the example below)
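    A quick sketch of both features (plain Python, nothing book-specific):

    ```python
    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 3, 'c': 4}

    # ** unpacking inside a dict literal; for duplicate keys, later entries win
    print({**d1, **d2})   # {'a': 1, 'b': 3, 'c': 4}

    # Python 3.9+: | builds a new mapping, |= merges in place
    print(d1 | d2)        # {'a': 1, 'b': 3, 'c': 4}
    d1 |= d2
    print(d1)             # {'a': 1, 'b': 3, 'c': 4}

    # ** in a function call requires str keys
    def show(**kwargs):
        return kwargs
    print(show(**d2))     # {'b': 3, 'c': 4}
    ```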

Syntax & Structure: Pattern Matching with Mappings cool

  • this will work with anything that is a subclass or virtual subclass of Mapping

  • we can use the usual tools for this:

    1. can use partial matching

           data = {"a": 1, "b": 2, "c": 3}
      
           match data:
               case {"a": 1}:
                   print("Matched 'a' only")
               case {"a": 1, "b": 2}:
                   print("Matched 'a' and 'b'")
               case _:
                   print("No match")
      
           # in this case, the order of the cases matter, the first match is evaluated
      
    2. can capture the remaining key-value pairs with the **rest syntax

           match data:
               case {"a": 1, **rest}:
                   print(f"Matched 'a', rest: {rest}")
      
    3. can be arbitrarily deeply nested

           data = {
               "user": {
                   "id": 42,
                   "profile": {
                       "name": "Alice",
                       "address": {"city": "Wonderland"}
                   }
               }
           }
      
           match data:
               case {
                   "user": {
                       "profile": {
                           "address": {"city": city_name}
                       }
                   }
               }:
                   print(f"City is {city_name}")
               case _:
                   print("No match")
      
  • Keys in the pattern must be literals (not variables), but values can be any valid pattern, including captures, literals, or even further nested patterns

  • Pattern matching works with any mapping type (not just dict), as long as it implements the mapping protocol

  • Guards (if clauses) can be used to add extra conditions to a match.
  • More on virtual sub-classes (and how it’s similar to mixins)

    • should be used when we can’t control the class (e.g. it’s an external module) but we want to adapt it

    • allows the indication that a class conforms to the interface of another – to adapt to multiple interfaces

    A **virtual subclass** in Python refers to a class that is recognized as a subclass of an abstract base class (ABC) without actually inheriting from it in the traditional sense. This mechanism is provided by the `abc` module and is achieved by *registering* a class as a virtual subclass of an ABC using the `register()` method[4][5][8].
    
    ### Core Mental Model
    
    - **Traditional subclassing**: A class (the subclass) inherits from another (the superclass), forming a direct relationship. Methods and attributes are inherited, and `issubclass()` and `isinstance()` reflect this relationship[3].
    - **Virtual subclassing**: A class is *declared* to be a subclass of an ABC at runtime, without modifying its inheritance tree or MRO (Method Resolution Order). This is done by calling `ABC.register(SomeClass)`. After registration, `issubclass(SomeClass, ABC)` and `isinstance(instance, ABC)` will return `True`, but `SomeClass` does not actually inherit from `ABC`[4][5][8].
    
    ### Why Use Virtual Subclasses?
    
    - **Third-party integration**: If you want to treat classes from external libraries as conforming to your interface (ABC), but you cannot or do not want to modify their source code to inherit from your ABC, you can register them as virtual subclasses[1][8].
    - **Interface compliance**: Virtual subclassing is a way to declare that a class “conforms to” an interface, even if it doesn’t inherit from it, as long as it implements the required methods (i.e., it follows the protocol)[2][5].
    - **Decoupling**: It allows you to decouple interface definition (the ABC) from implementation, enabling more flexible and extensible designs.
    
    ### Example
    
    Suppose you have an ABC and an external class:
    
    ```python
    from abc import ABC
    
    class Car(ABC):
        def drive(self):
            pass
    
    class Tesla:
        def drive(self):
            print("Driving in Tesla")
    ```
    
    You want to use `isinstance(obj, Car)` to check if an object can be driven, but `Tesla` does not inherit from `Car`. You can register it:
    
    ```python
    Car.register(Tesla)
    
    print(issubclass(Tesla, Car))  # True
    print(isinstance(Tesla(), Car))  # True
    ```
    
    Now, `Tesla` is a *virtual subclass* of `Car`, even though it doesn't inherit from it[4][5][8].
    
    ### Key Properties
    
    - **No inheritance**: Virtual subclasses do not inherit methods or properties from the ABC. Registration only affects `issubclass()` and `isinstance()` checks[4][8].
    - **No MRO change**: The ABC does not appear in the virtual subclass’s MRO, so `super()` calls and method resolution are unaffected[4].
    - **Runtime declaration**: Registration can be done at runtime, providing flexibility for dynamic systems[4][5].
    
    ### Relationship to Other Python Typing Models
    
    | Model                 | Relationship Mechanism         | Example Use Case                                  |
    |-----------------------|-------------------------------|---------------------------------------------------|
    | Duck typing           | Implements required interface | Any object with `drive()` method can be used      |
    | Virtual subclass      | Registered with ABC           | External class made compatible with ABC interface |
    | Classical inheritance | Inherits from superclass      | Subclass extends or customizes base functionality |
    
    Virtual subclassing sits between **duck typing** (no explicit relationship, just interface compliance) and **classical inheritance** (explicit parent-child relationship), allowing you to *declare* interface compliance without direct inheritance[1][6].
    
    ### Reference Implementation
    
    The mechanism is built into the `abc.ABCMeta` metaclass, which provides the `register()` method for this purpose[4][5].
    
    > "You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as 'virtual subclasses' – these and their descendants will be considered subclasses of the registering ABC by the built-in function, but the registering ABC won’t show up in their MRO nor will method implementations defined by the registering ABC be callable..."[4]
    
    ### Practical Implications
    
    - **Type checks**: Enables more expressive and flexible type checks in frameworks and libraries.
    - **Backward compatibility**: Allows legacy or third-party classes to be integrated into new type hierarchies without code changes.
    - **Protocol enforcement**: Encourages protocol-oriented design in a language that is otherwise dynamically typed.
    
    In summary, a **virtual subclass** in Python is a class that, through explicit registration, is treated as a subclass of an abstract base class for type-checking purposes, without requiring direct inheritance or MRO modification[4][5][8]. This is a powerful tool for interface-oriented design and decoupled architectures.
    
    [1] https://stackoverflow.com/questions/51666120/whats-the-usage-of-a-virtual-subclass
    [2] https://www.reddit.com/r/learnpython/comments/9fgdib/virtual_subclasses/
    [3] https://realpython.com/ref/glossary/subclass/
    [4] https://docs.python.org/3/library/abc.html
    [5] https://www.demo2s.com/python/python-virtual-subclasses.html
    [6] https://stackoverflow.com/questions/51666120/whats-the-usage-of-a-virtual-subclass/51666808
    [7] https://en.wikipedia.org/wiki/Virtual_class
    [8] https://kavianam.ir/Virtual-Subclass-in-Python
    [9] https://stackoverflow.com/questions/57602862/what-is-indirect-and-virtual-subclass
    [10] https://hynek.me/articles/python-subclassing-redux/
    
    • a virtual subclass should be seen more like a mixin rather than multiple inheritance
        **Virtual subclassing** in Python is *not* the canonical or idiomatic mechanism for achieving "multiple extends" (i.e., multiple inheritance) or adapting to multiple interfaces in the way you might do in statically-typed languages like Java or C#. Instead, Python supports **multiple inheritance** natively and directly through its class definition syntax, and this is the primary mechanism for a class to inherit from multiple parents and thus "implement multiple interfaces"[1][5][6].
      
        ### Multiple Inheritance: The Pythonic Way
      
        Python allows a class to inherit from any number of parent classes simply by listing them in the class definition:
      
        ```python
        class MyClass(InterfaceA, InterfaceB, SomeBase):
            pass
        ```
      
        This means `MyClass` will inherit all methods and attributes from `InterfaceA`, `InterfaceB`, and `SomeBase`, and will be considered a subclass of each for purposes of `issubclass()` and `isinstance()` checks[1][5][6].
      
        This is the direct and idiomatic way to "extend" multiple interfaces or base classes in Python, and is how you would typically express that a class conforms to multiple protocols or contracts.
      
        ### Virtual Subclassing: Purpose and Limitations
      
        **Virtual subclassing** via `ABC.register(SomeClass)` is a mechanism for declaring that a class should be *recognized* as a subclass of an abstract base class (ABC) for the purposes of type checking, without actually inheriting from it. This is especially useful when you want to adapt a class (often from a third-party library) to an interface you control, but cannot or do not want to modify its source code[7].
      
        - **Virtual subclassing does not provide code sharing**: The registered class does not inherit any methods or properties from the ABC; it is only recognized as a subclass for `issubclass()` and `isinstance()` checks.
      ​  - **No method resolution order (MRO) change**: The ABC does not appear in the registered class's MRO, so method resolution and `super()` calls are unaffected.
      ​  - **No attribute/method adaptation**: If the registered class does not implement the required methods, you get runtime errors.
      
        ### Adapting to Multiple Interfaces
      
        If you want a class to *adapt* to multiple interfaces:
      
        - **If you control the class**: Use multiple inheritance directly. This is idiomatic and gives you both interface conformance and code sharing[1][5][6].
      ​  - **If you do not control the class**: Use virtual subclassing to declare interface conformance, but be aware this is only for type checks and does not provide actual method implementations[7].
      
        ### Mixin Pattern
      
        A related idiom is the **mixin**: a class designed to be used as a base class that provides additional methods, typically without defining its own state. Mixins are commonly used in Python to modularly add functionality via multiple inheritance, and are a way to "compose" behaviors without the pitfalls of deep inheritance hierarchies[6].
      
        ### Summary Table
      
        | Mechanism           | Code Sharing | Interface Conformance | Idiomatic Use         |
        |---------------------|-------------|----------------------|----------------------|
        | Multiple Inheritance| Yes         | Yes                  | Implementing multiple interfaces, mixins |
        | Virtual Subclassing | No          | Yes (type checks)    | Adapting foreign classes to your ABCs    |
      
        **In summary:**
        Python's primary and idiomatic way to "extend" multiple interfaces is through direct multiple inheritance. **Virtual subclassing** is a supplementary mechanism for adapting existing classes to an interface for type checks, but it does not provide code sharing or actual interface adaptation[1][5][6][7].
      
        [1] https://www.digitalocean.com/community/tutorials/understanding-class-inheritance-in-python-3
        [2] https://stackoverflow.com/questions/66849568/issubclass-and-multiple-inheritance
        [3] https://stackoverflow.com/questions/27545888/is-multiple-inheritance-from-the-same-base-class-via-different-parent-classes-re
        [4] https://softwareengineering.stackexchange.com/questions/291061/is-there-a-better-pattern-than-multiple-inheritance-here
        [5] https://dataplatform.cloud.ibm.com/docs/content/wsd/nodes/scripting_guide/clementine/jython/clementine/python_inheritance.html?audience=wdp&context=dph&context=analytics&context=analytics&context=analytics&context=analytics&context=analytics&context=analytics&context=cpdaas
        [6] https://openstax.org/books/introduction-python-programming/pages/13-5-multiple-inheritance-and-mixin-classes
        [7] https://hynek.me/articles/python-subclassing-redux/
        [8] https://docs.python.org/3/tutorial/classes.html
        [9] https://realpython.com/inheritance-composition-python/
        [10] https://www.geeksforgeeks.org/python/multiple-inheritance-in-python/
      

Standard API of Mapping Types

The recommendation is to wrap a dict by composition instead of subclassing the Collection, Mapping, MutableMapping ABCs.

Note that because everything ultimately relies on the hash table, the keys must be hashable (the values don’t need to be).

  • What Is Hashable

    • ✅ User Defined Types: for user defined types, the hashcode is the id() of the object and the __eq__ method from the object parent class compares the object ids.
    • gotcha: there’s a salt applied to hashing

      And the salt differs across python processes.

      The hash code of an object may be different depending on the version of Python, the machine architecture, and because of a salt added to the hash computation for security reasons. The hash code of a correctly implemented object is guaranteed to be constant only within one Python process.

  • Overview of Common Mapping Methods: using dict, defaultdict and OrderedDict


  • Inserting or Updating Mutable Values: when to use setdefault

    Use setdefault when you want to update a mutable value in place and the key may not be there yet.

    E.g. appending to a list that has to be created on first use (filling in an empty default value).

    so instead of doing this which has 2 searches through the dict index ⛔️

    import re
    import sys
    WORD_RE = re.compile(r'\w+')
    
    index = {}
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                location = (line_no, column_no)
                # this is ugly; coded like this to make a point
                occurrences = index.get(word, [])
                occurrences.append(location)
                index[word] = occurrences
    # display in alphabetical order
    for word in sorted(index, key=str.upper):
        print(word, index[word])
    

    we could instead do a single search through the dict index:

    """Build an index mapping word -> list of occurrences"""
    import re
    import sys
    WORD_RE = re.compile(r'\w+')
    index = {}
    with open(sys.argv[1], encoding='utf-8') as fp:
            for line_no, line in enumerate(fp, 1):
                    for match in WORD_RE.finditer(line):
                            word = match.group()
                            column_no = match.start() + 1
                            location = (line_no, column_no)
                            index.setdefault(word, []).append(location)
                            # display in alphabetical order
    for word in sorted(index, key=str.upper):
            print(word, index[word])
    

    setdefault returns the value, so it can be updated without requiring a second search.

Automatic Handling of Missing Keys

We have 2 options here.

  • defaultdict: Another Take on Missing Keys

    • the argument is a callable: when we pass things like list or set we are passing the constructor itself, not an instance
    • the callable is stored in the default_factory attribute, and we can replace the factory whenever we wish!
    • interesting: the factory is only invoked by d[k] (i.e. __getitem__); a membership check or .get() on a missing key will not call it (see the sketch below)
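    A small sketch of these points (mine, not the book’s code):

    ```python
    from collections import defaultdict

    dd = defaultdict(list)        # list is the default_factory (a callable)
    dd['new'].append(1)           # missing key: factory builds [], which we append to
    print(dd)                     # defaultdict(<class 'list'>, {'new': [1]})

    print('other' in dd)          # False -- membership tests never call the factory
    print(dd.get('other'))        # None  -- neither does .get()
    print(len(dd))                # still 1

    dd.default_factory = set      # the factory can be swapped at any time
    dd['tags'].add('python')
    ```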
  • The missing Method


    TLDR: subclass UserDict instead of dict to avoid these issues (a sketch follows below)

    Take note of the nuances in the implementation that is shown because they avoid infinite recursion.

    It’s important to think of how the method delegation may introduce chances of infinite recursion.

    Also, same thing for what the fallback methods are for builtin methods.
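    A sketch along the lines of the book’s StrKeyDict idea (UserDict plus __missing__); the point is how each method avoids calling back into itself:

    ```python
    import collections

    class StrKeyDict(collections.UserDict):
        """Converts non-str keys to str on lookup, insertion and membership tests."""

        def __missing__(self, key):
            if isinstance(key, str):   # guard: if the str form is also missing, stop here,
                raise KeyError(key)    # otherwise self[str(key)] below would recurse forever
            return self[str(key)]

        def __contains__(self, key):
            return str(key) in self.data   # query self.data directly, not `in self` (would recurse)

        def __setitem__(self, key, value):
            self.data[str(key)] = value    # all writes go through the internal dict

    d = StrKeyDict([('2', 'two')])
    print(d[2], 2 in d)   # two True
    ```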

  • note: k in my_dict faster than k in my_dict.keys()

    Also technically k in my_dict is faster than using the k in my_dict.keys() because it avoids the attribute lookup to find the .keys method.

  • Inconsistent Usage of missing in the Standard Library

    TLDR: subclass UserDict instead of dict to avoid these issues; subclassing built-in types is tricky! (will come up later in the book)

    Basically, this dunder method is used inconsistently across the dict API (d[k] honours __missing__, but get() and __contains__ do not). Be careful if you want to subclass dict for this; a careless implementation may result in infinite recursion.

Variations of dict

  • collections.OrderedDict

    Mostly the modern implementation for dict is good enough

    • has some minor differences from the modern implementation of dict:
      • can handle frequent reordering ops better than dict => useful to track recent accesses like in an LRU cache.
    • use it for simple implementations of an LRU cache.
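    A tiny, illustrative LRU-cache sketch (names are mine, not from the book) built on OrderedDict’s move_to_end and popitem:

    ```python
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, maxsize=3):
            self.maxsize = maxsize
            self._data = OrderedDict()

        def get(self, key):
            value = self._data[key]
            self._data.move_to_end(key)         # mark as most recently used
            return value

        def put(self, key, value):
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.maxsize:
                self._data.popitem(last=False)  # evict the least recently used entry

    cache = LRUCache()
    for k in 'abcd':
        cache.put(k, k.upper())
    print(list(cache._data))  # ['b', 'c', 'd'] -- 'a' was evicted
    ```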
  • collections.ChainMap

    • chains together multiple mappings by holding references to the mappings
    • any modifications (updates or insertions) affect only the first mapping in the chain
    • it’s useful for hierarchical lookups (search d1, then d2, then d3), prioritising results found in the earlier mappings (see the example below)
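    A quick example of the lookup order and where writes land:

    ```python
    from collections import ChainMap

    defaults = {'color': 'red', 'user': 'guest'}
    overrides = {'user': 'alice'}
    cm = ChainMap(overrides, defaults)   # searched left to right

    print(cm['user'])     # 'alice' -- found in the first mapping
    print(cm['color'])    # 'red'   -- falls through to the second

    cm['color'] = 'blue'  # updates and insertions affect only the FIRST mapping
    print(overrides)      # {'user': 'alice', 'color': 'blue'}
    print(defaults)       # {'color': 'red', 'user': 'guest'} -- unchanged
    ```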
  • collections.Counter

    • counters are great!!!
    • GOTCHA: when things are tied, only one is returned. Example: if the counter looks like Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1}), then ct.most_common(3) returns only b even though b and r are tied on a count of 2. Here’s some more context on this:
        The behavior you’re observing with `Counter.most_common(n)`—where only one of the tied elements is included when there’s a tie at the cutoff—is by design, and it’s important to understand the underlying mechanics and implications.
      
        ### Why does this happen?
      
        - **`most_common(n)` returns exactly `n` items**: The method returns the *first* `n` elements from the sorted list of items, sorted by count (descending). If multiple elements have the same count at the cutoff position, only those that appear first (according to the internal order, which is typically insertion order for equal counts) are included[1][2][9].
      ​  - **Ties are not expanded**: If two or more elements are tied for the last spot in the top `n`, only as many as needed to reach `n` are returned. The method does not automatically include all items with the same count as the nth item[6].
      
        #### Example:
      
        ```python
        from collections import Counter
      
        ct = Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
        print(ct.most_common(3))
        # Output might be: [('a', 10), ('z', 3), ('b', 2)]
        ```
      
        Even though both `'b'` and `'r'` have a count of `2`, only `'b'` is included because you asked for the top 3, and `'b'` appears before `'r'` in the internal ordering[2][6].
      
        ### How is the order determined for ties?
      
        - **Insertion order**: For elements with equal counts, their order in the result is based on the order they were first encountered in the input (Python 3.7+ dictionaries preserve insertion order)[2][9].
      ​  - **No guarantee for all ties**: There is no built-in mechanism to include all elements with the same count as the nth element unless you implement it yourself[6].
      
        ### What if you want all tied elements included?
      
        You need to post-process the result. For example, you can:
      
        1. Get the count of the nth item.
        2. Iterate through all items and include those with counts >= nth count.
      
        ```python
        from collections import Counter
      
        ct = Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
        n = 3
        top_n = ct.most_common(n)
        cutoff = top_n[-1][1]
        all_tied = [(k, v) for k, v in ct.items() if v >= cutoff]
        print(all_tied)
        ```
        This will include all items with counts equal to or greater than the nth most common, including ties[6].
      
        ### Summary
      
        - `most_common(n)` returns exactly `n` elements, even if there are ties at the last position.
      ​  - For ties, only the first encountered elements (in insertion order) are included.
      ​  - To include all tied elements, you must filter manually after getting the cutoff value[2][6][9].
      
        This behavior is documented and is a common source of confusion—your observation is correct and is a subtle but important aspect of using `Counter.most_common()` in Python.
      
        [1] https://www.geeksforgeeks.org/python/python-most_common-function/
        [2] https://docs.python.org/3/library/collections.html
        [3] https://www.digitalocean.com/community/tutorials/python-counter-python-collections-counter
        [4] https://stackoverflow.com/questions/29240807/python-collections-counter-most-common-complexity
        [5] https://blog.csdn.net/weixin_43056275/article/details/124384145
        [6] https://stackoverflow.com/questions/33791057/counter-most-common-is-a-little-misleading/33791292
        [7] https://www.youtube.com/watch?v=fqACZvcsNug
        [8] https://dev.to/atqed/you-can-be-happy-to-know-python-counter-how-to-get-the-most-common-elements-in-a-list-o1m
        [9] https://ioflood.com/blog/python-counter-quick-reference-guide/
        [10] https://dev.to/kathanvakharia/python-collections-module-counter-2gn
      
  • shelve.Shelf

    • shelves are for storing pickle jars
    • shelves are persistent storage for a mapping of strings to pickle objects
    • A Shelf instance is a context manager, so you can use a with block to make sure it is closed after use.
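    A minimal usage sketch (the filename is made up):

    ```python
    import shelve

    # shelve.open returns a Shelf: a mutable mapping of str keys to pickled objects
    with shelve.open('demo_shelf') as db:    # the with block closes the shelf for us
        db['config'] = {'retries': 3, 'verbose': True}

    with shelve.open('demo_shelf') as db:
        print(db['config'])                  # {'retries': 3, 'verbose': True}
    ```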
    • Ref “Pickle’s nine flaws”

      here’s the writeup

      And here’s a bot summary of it:

      Here is a summary of the nine flaws of Python's `pickle` module as detailed by Ned Batchelder[1][2]:
      
      1. **Insecure**
         Pickle is fundamentally unsafe for untrusted data. Maliciously crafted pickles can execute arbitrary code during unpickling, as the process can invoke any callable in the Python environment—not just constructors for your objects.
      
      2. **Old pickles look like old code**
         Pickle stores the exact structure of objects as they existed when pickled. If your code changes (e.g., you add or remove attributes), unpickling old data can result in objects that don't match your current codebase, leading to bugs and inconsistencies.
      
      3. **Implicit**
         The serialization process is implicit and automatic: you cannot explicitly control how objects are serialized. For example, you cannot choose to serialize a datetime as an ISO8601 string instead of a datetime object.
      
      4. **Over-serializes**
         Pickle serializes everything in your objects, including data you may not want to persist (like caches or open file handles). There is no straightforward way to exclude unwanted attributes, and if an attribute is not pickle-able, you'll get an exception[2].
      
      5. **`__init__` isn’t called**
         When unpickling, the `__init__` method of your class is not invoked. This can leave objects in an inconsistent state, especially if `__init__` does essential setup or side effects.
      
      6. **Python only**
         Pickle is Python-specific. While there are rare cross-language implementations, they're limited and not practical for general use. This makes pickled data hard to share with non-Python systems.
      
      7. **Unreadable**
         Pickles are binary blobs, not human-readable. You can't inspect or search them with standard tools, making debugging and data recovery more difficult.
      
      8. **Appears to pickle code**
         Pickle will serialize references to functions and classes, but not their code—only their names. This can give the false impression that code is being serialized, but on unpickling, the code must already exist in the environment.
      
      9. **Slow**
         Pickle is slower than many alternative serialization formats, both in terms of speed and efficiency.
      
      > Some of these issues can be mitigated with custom methods like `__getstate__` or `__reduce__`, but at that point, alternative serialization formats (e.g., JSON, protocol buffers) may be more robust and maintainable[1].
      
      These flaws highlight why `pickle` is best reserved for trusted, Python-only, and short-lived data interchange scenarios—not for general-purpose or cross-system serialization.
      
      [1] https://nedbatchelder.com/blog/202006/pickles_nine_flaws.html
      [2] https://nedbatchelder.com/blog/tag/python.html
      [3] https://www.python4data.science/en/latest/data-processing/serialisation-formats/pickle/index.html
      [4] https://pycoders.com/issues/426
      [5] https://stanforddaily.com/2019/10/11/face-it-pickles-are-bad-an-irrefutable-proof/
      [6] https://content.ces.ncsu.edu/pickle-and-pickle-product-problems
      [7] https://diff.blog/post/pickles-nine-flaws-49891/
      [8] https://pythonbytes.fm/episodes/show/193/break-out-the-django-testing-toolbox
      [9] https://www.reddit.com/r/Python/comments/1c5l9px/big_o_cheat_sheet_the_time_complexities_of/
      [10] https://podscripts.co/podcasts/python-bytes/189-what-does-strstrip-do-are-you-sure
      
  • Subclassing UserDict Instead of dict

    • key idea here is that it uses composition and keeps an internal dict within the data attribute
    • implementing other functions as we extend it will require us to use the self.data attribute.

Immutable Mappings

  • we can use MappingProxyType from the types module to expose a read-only proxy of a dict

    the constructor in a concrete Board subclass would fill a private mapping with the pin objects, and expose it to clients of the API via a public .pins attribute implemented as a mappingproxy. That way the clients would not be able to add, remove, or change pins by accident.
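    A minimal example of the read-only proxy (this is the pattern behind the Board/.pins idea above):

    ```python
    from types import MappingProxyType

    d = {1: 'A'}
    proxy = MappingProxyType(d)
    print(proxy[1])      # 'A'

    # proxy[2] = 'B'     # TypeError: 'mappingproxy' object does not support item assignment

    d[2] = 'B'           # changes to the underlying dict are visible through the proxy
    print(proxy[2])      # 'B'
    ```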

Dictionary Views

  • the views are dynamic proxies: any changes to the original mapping are immediately visible through the view
  • because they are view objects, not sequences, they are not subscriptable, so something like myvals[0] won’t work. If we wish, we can convert a view to a list, but then it’s a copy; it is no longer a live, read-only dynamic proxy.
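  A quick demo of both points:

  ```python
  d = {'a': 1, 'b': 2}
  values = d.values()

  print(list(values))      # [1, 2]
  d['c'] = 3
  print(list(values))      # [1, 2, 3] -- the view tracks changes to the dict

  # values[0]              # TypeError: 'dict_values' object is not subscriptable
  snapshot = list(values)  # converting makes a copy; the copy is no longer live
  ```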

Practical Consequences of How dict Works

  • why we should NOT add instance attrs outside of __init__ functions

    That last tip about instance attributes comes from the fact that Python’s default behavior is to store instance attributes in a special __dict__ attribute, which is a dict attached to each instance. Since PEP 412—Key-Sharing Dictionary was implemented in Python 3.3, instances of a class can share a common hash table, stored with the class. That common hash table is shared by the __dict__ of each new instance that has the same attribute names as the first instance of that class when __init__ returns. Each instance __dict__ can then hold only its own attribute values as a simple array of pointers. Adding an instance attribute after __init__ forces Python to create a new hash table just for the __dict__ of that one instance

    also KIV the implementation of __slots__, which is an even better optimisation (a quick sketch below).
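    A quick __slots__ sketch (just to mark the idea; the book covers it properly later):

    ```python
    class PixelDict:
        def __init__(self, x, y):
            self.x, self.y = x, y

    class PixelSlots:
        __slots__ = ('x', 'y')            # instances get no per-instance __dict__
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = PixelSlots(1, 2)
    # p.z = 3                             # AttributeError: no attribute 'z'
    print(hasattr(PixelDict(1, 2), '__dict__'), hasattr(p, '__dict__'))  # True False
    ```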

Set Theory

As we found out from the extension write-up, the intersection operator makes a great one-liner: found = len(needles & haystack), or found = len(set(needles) & set(haystack)) to be more general (at the cost of building the sets).

  • Set Literals

    • using the set literal ({1, 2, 3}) for construction is faster than using the constructor (set([1, 2, 3])) because the constructor requires a name lookup to fetch set, building a list, and a function call
    • the literal compiles directly to a BUILD_SET bytecode (see the dis sketch below)
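    A quick way to check this is the dis module; the exact opcodes differ across CPython versions, but the literal compiles straight to set-building bytecode while the constructor form needs a name lookup plus a call:

    ```python
    from dis import dis

    dis('{1, 2, 3}')       # BUILD_SET (plus constant loads / SET_UPDATE, depending on version)
    dis('set([1, 2, 3])')  # LOAD_NAME 'set', build a list, then call the constructor
    ```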
  • Set Comprehensions

    • looks almost the same as dictcomps

Practical Consequences of How Sets Work

  • Set Operations

Set Operations on dict Views

  • .keys() and .items() views behave like frozenset: they support the set operators &, |, - and ^

    • .items() works as a set only if all the values in the dict are hashable; .values() views do not support set operations
  • Even better: the set operators in dictionary views are compatible with set instances.
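  A small demo (plain Python):

  ```python
  d1 = dict(a=1, b=2, c=3)
  d2 = dict(b=20, d=40)

  print(d1.keys() & d2.keys())    # {'b'} -- view & view works like frozenset ops
  print(d1.keys() | {'x', 'y'})   # {'a', 'b', 'c', 'x', 'y'} -- views combine with plain sets too
  print(d1.items() & d2.items())  # set() -- no (key, value) pair in common
  ```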

Chapter Summary

Further Reading

Chapter 4. Unicode Text Versus Bytes

What’s New in This Chapter

Character Issues

  • “string as a sequence of characters” needs the term “character” to be defined well
  • in Python 3, a “character” means a Unicode character
  • Unicode char separates:
    • identity of the char => refers to its code point
    • the byte representation for the char => dependent on the encoding used (codec between code points and byte sequences)
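  A tiny demo of the identity/representation split (this mirrors the book’s café example):

  ```python
  s = 'café'
  print(len(s))              # 4 -- four code points
  b = s.encode('utf_8')
  print(b, len(b))           # b'caf\xc3\xa9' 5 -- 'é' takes two bytes in UTF-8
  print(b.decode('utf_8'))   # 'café'
  ```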

Byte Essentials

  1. binary sequences, there are 2 builtin types:
    • mutable: bytearray
    • immutable: bytes
  2. Each item in bytes or bytearray is an integer from 0 to 255
  3. literal notation depends (just a visual representation thing):
    • if in ascii range, display in ascii
    • if it’s a special char like tab and such, then escape it
    • if amidst apostrophes, then use escape chars
    • else just use the hex escape notation for it, e.g. \xc3 (two hex digits per byte)
  4. most functions work the same, except those that do formatting and those that depend on unicode data so won’t work:
    • case, fold
  5. regexes work the same only if regex is compiled from a binary sequence instead of a str
  6. how to build bytes or bytearray :
    1. use bytes.fromhex()
    2. pass a str plus an encoding to the bytes()/bytearray() constructor, or call "mystr".encode("utf_8")
    3. use something that implements the buffer protocol to create a new binary sequence from a source object (e.g. memoryview)
      • This needs us to explicitly typecast
      • generally, this will also always copy the bytes from the source – except for memoryview which is a direct proxy to the memory
  • Indexing vs slicing & why str is special

    Strings are really the special kind here.

    For bytes, bytearray, list, etc., seq[0] returns a single element (e.g., an int), while seq[:1] returns a new sequence of the same type containing just that element. This is different from str, where both return a string. This distinction is important when working with binary data or non-string sequences in Python.

    More descriptive:

    Let's break down the meaning of this passage from "Fluent Python" and the behavior it describes.
    
    ## The Key Point
    
    > For all other sequence types in Python, 1 item is not the same as a slice of length 1.
    
    This means:
    - **Accessing a single element** (e.g., `seq[0]`) returns a single item (e.g., an int, a character, etc.).
    - **Taking a slice of length 1** (e.g., `seq[:1]` or `seq[0:1]`) returns a new sequence of the same type, containing just that one item.
    
    For most sequences in Python, **these two operations do not return the same type or value**.
    
    ### The Exception: `str`
    
    For strings, `s[0]` and `s[:1]` both return a string (the first character), so they look the same:
    
    ```python
    s = "hello"
    print(s[0])   # 'h'
    print(s[:1])  # 'h'
    print(type(s[0]), type(s[:1]))  # <class 'str'> <class 'str'>
    ```
    
    ### The Rule: Other Sequences
    
    For other sequence types (e.g., `bytes`, `bytearray`, `list`, `tuple`):
    
    - **Indexing** returns a single element (not a sequence).
    - **Slicing** returns a sequence of the same type.
    
    #### Example: `bytes`
    
    ```python
    cafe = bytes('café', encoding='utf_8')
    print(cafe)      # b'caf\xc3\xa9'
    print(cafe[0])   # 99  (int: ASCII code for 'c')
    print(cafe[:1])  # b'c'  (bytes of length 1)
    print(type(cafe[0]), type(cafe[:1]))  # <class 'int'> <class 'bytes'>
    ```
    - `cafe[0]` gives you the integer value of the first byte (`99`).
    - `cafe[:1]` gives you a `bytes` object of length 1: `b'c'`.
    
    #### Example: `bytearray`
    
    ```python
    cafe_arr = bytearray(cafe)
    print(cafe_arr[-1])    # 169 (int: last byte)
    print(cafe_arr[-1:])   # bytearray(b'\xa9') (bytearray of length 1)
    ```
    - `cafe_arr[-1]` is an integer (the last byte).
    - `cafe_arr[-1:]` is a `bytearray` object of length 1.
    
    #### Example: `list`
    
    ```python
    lst = [10, 20, 30]
    print(lst[0])   # 10
    print(lst[:1])  # [10]
    ```
    - `lst[0]` is an int.
    - `lst[:1]` is a list of length 1.
    
    ## Why Is This Surprising?
    
    - In Python's `str` type, `s[0]` and `s[:1]` both return a string (the first character), which is a bit special.
    - For all other sequences, **indexing returns a single element (not a sequence), slicing returns a sequence**.
    
    ## Mental Model
    
    - **Indexing** (`seq[i]`): Returns the element at position `i` (type depends on the sequence).
    - **Slicing** (`seq[i:j]`): Returns a new sequence of the same type, containing elements from `i` to `j-1`.
    
    ## References
    
    - [Fluent Python, 2nd Edition, Chapter 4: Text versus Bytes](https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/)
    - [Python Data Model: Sequence Types](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)
    
    **In summary:**
    For `bytes`, `bytearray`, `list`, etc., `seq[0]` returns a single element (e.g., an int), while `seq[:1]` returns a new sequence of the same type containing just that element. This is different from `str`, where both return a string. This distinction is important when working with binary data or non-string sequences in Python.
    

Basic Encoders/Decoders

  • Highlight on page 153

    • Contents

      Each codec has a name, like ‘utf_8’

    • Comment

      so utf-8 is a codec here

  • Highlight on page 154

    • Contents

      like ASCII and even the multibyte GB2312, cannot represent every Unicode character. The UTF encodings, however, are designed to handle every Unicode code point.
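      The same text run through a few codecs (similar to the book’s demo):

      ```python
      for codec in ['latin_1', 'utf_8', 'utf_16']:
          print(codec, 'El Niño'.encode(codec), sep='\t')
      # latin_1  b'El Ni\xf1o'
      # utf_8    b'El Ni\xc3\xb1o'
      # utf_16   b'\xff\xfeE\x00l\x00 \x00N\x00i\x00\xf1\x00o\x00'  (little-endian machine)
      ```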

Understanding Encode/Decode Problems

Errors that we can expect:

  1. (generic) UnicodeError
    1. UnicodeDecodeError
    2. UnicodeEncodeError
  2. When loading libraries, might end-up facing SyntaxError also because of encoding issues
  • Coping with UnicodeEncodeError

    • the error handlers for encoding errors include 'xmlcharrefreplace'. It replaces each unencodable character with an XML character reference (&#<unicode code point>;), so no information is lost. Here’s more context on it:
        The statement from "Fluent Python"—
      
        > 'xmlcharrefreplace' replaces unencodable characters with an XML entity. If you can’t use UTF, and you can’t afford to lose data, this is the only option.
      
        —means that when you encode a string using a limited encoding (like ASCII) and specify `errors='xmlcharrefreplace'`, **any character that cannot be represented in the target encoding is replaced with an XML numeric character reference** (e.g., `&#233;` for "é"). This ensures that **no information is lost**: all original characters are either encoded directly (if possible) or represented as XML entities, which are reversible.
      
        ### How does it work?
      
        - When encoding, Python checks each character:
          - If the character can be encoded in the target encoding (e.g., ASCII), it is kept as-is.
          - If it cannot, it is replaced with its XML numeric character reference: `&#<code point>;`
        - When decoding, you can later convert these references back to the original characters, so the process is *lossless* in terms of information content.
      
        #### Example
      
        ```python
        txt = "Café"
        encoded = txt.encode("ascii", errors="xmlcharrefreplace")
        print(encoded)  # b'Caf&#233;'
        ```
        Here, "é" (which is not in ASCII) is replaced with `&#233;`, preserving the character information[2][7].
      
        ### Why is there no data loss?
      
        - **All original characters are represented:** Characters that can't be encoded are replaced with their numeric reference, which uniquely identifies the character.
      ​  - **Reversible:** You can later parse the XML entities back into the original Unicode characters, restoring the original string[1][6].
      
        ### Contrast with other error handlers
      
        - `'replace'` swaps unencodable characters for `?` (data loss).
      ​  - `'ignore'` simply omits them (data loss).
      ​  - `'backslashreplace'` uses Python escape sequences (reversible, but not standard in XML/HTML).
      ​  - `'xmlcharrefreplace'` uses XML/HTML-compatible numeric references (reversible, and standard for text interchange).
      
        ### Practical implication
      
        If you must encode text in a limited character set (like ASCII or Latin-1) but need to ensure that all characters are preserved in some form (for later recovery or interoperability), `'xmlcharrefreplace'` is the safest choice[4][6][7].
      
        **In summary:**
        Using `'xmlcharrefreplace'` means that **no original character data is lost**—all characters are either encoded directly or replaced with a reversible XML entity. This is why the book says it is the only option if you can't use UTF and can't afford to lose data.
      
        [1] https://stackoverflow.com/questions/44293891/python-string-encoding-xmlcharrefreplace-decode
        [2] https://www.w3schools.com/python/ref_string_encode.asp
        [3] https://docs.python.org/3/howto/unicode.html
        [4] https://docs.python.org/3/library/codecs.html
        [5] https://www.codecademy.com/resources/docs/python/strings/encode
        [6] https://code.activestate.com/recipes/303668-encoding-unicode-data-for-xml-and-html/
        [7] https://www.geeksforgeeks.org/python/python-strings-encode-method/
        [8] https://www.digitalocean.com/community/tutorials/python-string-encode-decode
        [9] https://labex.io/tutorials/python-what-is-the-role-of-the-encoding-and-errors-parameters-in-the-str-function-in-python-395133
        [10] https://docs.vultr.com/python/standard-library/str/encode
      
  • Coping with UnicodeDecodeError

    • Highlight on page 156

      • Contents

        On the other hand, many legacy 8-bit encodings like ‘cp1252’, ‘iso8859_1’, and ‘koi8_r’ are able to decode any stream of bytes, including random noise, without reporting errors. Therefore, if your program assumes the wrong 8-bit encoding, it will silently decode garbage.

      • Comment

        utf8/16 will sound off because it’s a strict error check

        the older 8bit codecs will do it silently

    • Highlight on page 157

      • Contents

        “�” (code point U+FFFD), the official Unicode REPLACEMENT CHARACTER intended to represent unknown characters.

      • Comment

        there’s an official REPLACEMENT CHARACTER
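      A small demo of both behaviours (this mirrors the book’s Montréal example):

      ```python
      octets = b'Montr\xe9al'   # 'Montréal' encoded as latin1/cp1252

      print(octets.decode('cp1252'))     # 'Montréal' -- the intended decoding
      print(octets.decode('iso8859_7'))  # 'Montrιal' -- a legacy 8-bit codec silently decodes garbage
      print(octets.decode('utf_8', errors='replace'))  # 'Montr�al' -- U+FFFD marks the bad byte
      # octets.decode('utf_8')           # UnicodeDecodeError: invalid continuation byte
      ```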

  • SyntaxError When Loading Modules with Unexpected Encoding

    • utf8 default for python source code
    • fix this by defining explicitly what encoding type to use at the top of the file when writing that file out.
        # coding: cp1252
      
      OR just fix it by converting to UTF-8
  • How to Discover the Encoding of a Byte Sequence

    • you can’t but you can make a good guess
    • chardet exists for this reason, it’s an estimated detection of the encoding type.
    • Highlight on page 159

      • Contents

        human languages also have their rules and restrictions, once you assume that a stream of bytes is human plain text, it may be possible to sniff out its encoding using heuristics and statistics. For example, if b’\x00’ bytes are common, it is probably a 16- or 32-bit encoding, and not an 8-bit scheme, because null characters in plain text are bugs. When the byte sequence b’\x20\x00’ appears often, it is more likely to be the space character (U+0020) in a UTF-16LE encoding, rather than the obscure U+2000 EN QUAD character—whatever that is. That is how the package “Chardet—The Universal Character Encoding Detector” works to guess one of more than 30 supported encodings. Chardet is a Python library that you can use in your programs, but also includes a command-line utility, chardetect.

      • Comment

        typically an encoding is declared – so you have to be told what encoding it is

        however, it’s possible to guess probabilistically what the encoding could be.

        there are packages for that (Chardet)

  • BOM: A Useful Gremlin

    • Byte-Order Mark: helps us know if the machine that the encoding was performed on is little or big endian.
    • endianness is an issue only for encodings whose code units are wider than one byte (UTF-16 and UTF-32) ==> so a BOM only matters for them
    • so BOM not needed for UTF-8
    • but it can still be added in (discouraged though)
    • Highlight on page 160

      • Contents

        UTF-16 encoding prepends the text to be encoded with the special invisible character ZERO WIDTH NO-BREAK SPACE (U+FEFF).

    • Highlight on page 160

      • Contents

        This whole issue of endianness only affects encodings that use words of more than one byte, like UTF-16 and UTF-32

    • Highlight on page 161

      • Contents

        using UTF-8 for general interoperability. For example, Python scripts can be made executable in Unix systems if they start with the comment: #!/usr/bin/env python3. The first two bytes of the file must be b’#!’ for that to work, but the BOM breaks that convention. If you have a specific requirement to export data to apps that need the BOM, use UTF-8-SIG but be aware that Python’s codecs documentation says: “In UTF-8, the use of the BOM is discouraged and should generally be avoided.”

      • Comment

        use the utf_8_sig codec when reading: it is harmless because it works whether or not the file starts with a BOM

        also note that the python codecs documentation says that in utf8, using a BOM (byte order mark) is discouraged.

Handling Text Files & the “Unicode Sandwich”

Here’s the gist of why it’s “unicode sandwich”

  1. decode bytes on input
  2. process text only (the meat of the sandwich is the business logic that should use strings)
  3. encode text on output

The best practice for handling text I/O is the “Unicode sandwich” (Figure 4-2). This means that bytes should be decoded to str as early as possible on input (e.g., when opening a file for reading). The “filling” of the sandwich is the business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or decoding in the middle of other processing. On output, the str are encoded to bytes as late as possible.
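In practice the sandwich mostly means letting open() do the decoding/encoding at the edges, always with an explicit encoding (a minimal sketch):

```python
# encode on output, as late as possible (here: at the file boundary)
with open('cafe.txt', 'w', encoding='utf_8') as fp:
    fp.write('café')

# decode on input, as early as possible
with open('cafe.txt', encoding='utf_8') as fp:
    text = fp.read()   # already a str; everything in between stays in str land

print(text)            # 'café'
```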

  • Highlight on page 161

    • Contents

      The best practice for handling text I/O is the “Unicode sandwich” (Figure 4-2). This means that bytes should be decoded to str as early as possible on input (e.g., when opening a file for reading). The “filling” of the sandwich is the business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or decoding in the middle of other processing. On output, the str are encoded to bytes as late as possible.

    • Comment

      Unicode sandwhich is the best practices for handling text files and their encoding:

      1. bytes -> str (decode bytes as early as possbile, i.e. on input)

      2. process text only in the business logic

      3. encode text on output only

  • Highlight on page 162

    • Contents

      Code that has to run on multiple machines or on multiple occasions should never depend on encoding defaults. Always pass an explicit encoding= argument

    • Comment

      cross-platform code should always explicitly define the encoding value!

      so Unix machines will default to UTF-8, but on, say, a Windows machine there might be an encoding issue because the default encoding differs

  • Highlight on page 163

    • Contents

      TextIOWrapper with the encoding set to a default from the locale

  • Beware of Encoding Defaults

    even within say windows itself, not every application would have the same encoding.

    for unix it’s more standardised, so it’s most likely expected to be utf-8

    • Defaults

      Main thing to remember is that the most important encoding setting is the one returned by locale.getpreferredencoding()

      The changes can be effected by changing the environment variables.

Normalizing Unicode for Reliable Comparisons

  • canonical equivalents exist, but they have different code points under the hood.
  • there’s a bunch of different normalisation forms; for extra safety, normalise strings before saving them (using NFC normalisation, for example)
  • gotcha: some single characters (e.g. the ohm sign) normalise to a visually identical character with a different code point, so unnormalised strings that look the same can still compare unequal
  • compatibility normalisation can be lossy, so repeated decode → normalise → encode round-trips can actually lose data
    • NFKC and NFKD are examples of such normalisation forms - these forms should only be used for intermediate representations for search & index
  • NFC is not sufficient for search and indexing because it preserves compatibility distinctions that are irrelevant (and even counterproductive) for matching. NFKC/NFKD are used as intermediate representations for search and indexing because they erase these distinctions, enabling robust, user-friendly search behavior—at the cost of losing some original form information, which is why they are not used for storage or display. See more info here:
      To understand why **NFC normalization is not always suitable for search and indexing**, and why compatibility forms like **NFKC/NFKD** are often used as intermediate representations for these purposes, let's clarify the properties and goals of each normalization form and their implications for search/index use cases.
    
      ### **NFC vs. NFKC: What’s the Difference?**
    
      - **NFC (Normalization Form C, Canonical Composition):**
    ​    - Collapses canonically equivalent sequences into a single, composed form.
    ​    - Preserves distinctions between characters that are *compatible* but not *canonically equivalent* (e.g., ligatures, superscripts, full-width vs. half-width characters).
    ​    - Designed to be *lossless* for textual content, so that round-tripping (normalize, then denormalize) does not lose data[4][7].
    
      - **NFKC (Normalization Form KC, Compatibility Composition):**
    ​    - Collapses both canonically equivalent and *compatibility equivalent* sequences.
    ​    - This means it will, for example, convert ligatures like 'fi' (U+FB01) to 'fi', or full-width Latin letters to their standard forms.
    ​    - This process is **lossy**: information about the original form (e.g., that a ligature or superscript was used) is lost[4][7].
    
      ### **Why Not Use NFC for Search and Indexing?**
    
      **NFC is designed to preserve distinctions that are meaningful in text rendering or data storage, but are often *not* meaningful for search and indexing.**
      For example:
      - The string "field" could be encoded as:
        - `U+0066 U+0069 U+0065 U+006C U+0064` ("field")
        - `U+FB01 U+0065 U+006C U+0064` ("ﬁeld", with the ligature 'ﬁ')
      - **NFC will *not* turn the ligature into "fi"; it will keep the distinction.**
        This means a search for "field" will not match a document containing "ﬁeld" (with the ligature) if both are normalized to NFC[4][7].
    
      **For search and indexing, you often want:**
      - "field" and "ﬁeld" to be treated as equivalent.
      - Full-width and half-width forms, superscripts, circled numbers, etc., to be treated as their base forms for matching.
    
      **NFKC/NFKD normalization does this:**
    ​  - It "folds" away these compatibility distinctions, making visually or semantically similar forms compare equal[1][4][7].
    
      ### **Special Role of Compatibility Normalization for Search/Index**
    
      - **Intermediate representation**:
        You normalize both the indexed data and the search queries to NFKC/NFKD, so that *compatibility* distinctions are erased and matches are more robust.
    ​  - **Not for storage or display**:
        Because NFKC/NFKD is lossy, you shouldn't use it for storing or displaying user data—only for the internal process of searching and indexing[1][4].
    
      ### **Summary Table**
    
      | Use Case        | Recommended Normalization | Rationale                                                    |
      |-----------------|--------------------------|--------------------------------------------------------------|
      | Storage/Display | NFC                      | Preserves all distinctions, lossless for most text           |
      | Search/Index    | NFKC/NFKD                | Folds away compatibility differences for robust matching      |
    
      ### **References from Results**
      - "NFC removes the distinction between equivalent characters, while preserving the distinction between compatible characters or sequences; NFKC removes the distinction between both equivalent and compatible sequences. NFC conversion is not considered lossy, but NFKC conversion is." [4]
      - "It's more explicitly lossy; the compatibility forms are recommended for performing searches since they fold additional characters such as plain latin and 'full-width' latin letters." [1]
      - "The compatibility normalizations are super useful, because ... you often want a search query for `ffi` to match on `ﬃ` (and vice versa). But they are intended to lose symbolic representation (`ﬃ` and `ffi` are now the same thing with no way to distinguish), where the canonical normalizations are not." [7]
    
      ### **In summary**
    
      **NFC** is not sufficient for search and indexing because it preserves compatibility distinctions that are irrelevant (and even counterproductive) for matching.
      **NFKC/NFKD** are used as intermediate representations for search and indexing because they erase these distinctions, enabling robust, user-friendly search behavior—at the cost of losing some original form information, which is why they are not used for storage or display.
    
      [1] https://www.mediawiki.org/wiki/Unicode_normalization_considerations
      [2] https://stackoverflow.com/questions/15985888/when-to-use-unicode-normalization-forms-nfc-and-nfd
      [3] https://unicode.org/reports/tr15/
      [4] https://jazz.net/wiki/bin/view/LinkedData/UseOfUnicodeNormalForms
      [5] https://www.w3.org/wiki/I18N/CanonicalNormalizationIssues
      [6] https://blog.reeset.net/archives/2532
      [7] https://news.ycombinator.com/item?id=19379965
      [8] https://go.dev/blog/normalization
      [9] https://www.reddit.com/r/programming/comments/b09c0j/when_zo%C3%AB_zo%C3%AB_or_why_you_need_to_normalize_unicode/
      [10] https://unicode-org.github.io/icu/design/normalization/custom.html
    
  • Notes for page 140 V: 39% H: 25%

    sequences like ‘é’ and ’e\u0301’ are called “canonical equivalents,” and applications are supposed to treat them as the same. But Python sees two different sequences of code points, and considers them not equal.

  • Notes for page 140 V: 82% H: 50%

    it may be good to normalize strings with normalize(‘NFC’, user_text) before saving.

  • Case Folding (normalisation transformation)

    • folding everything into lowercase
    • NOTE: casefold() and str.lower() return different results for ~300 code points (quick check below)
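
    A quick check of two classic cases (facts I am confident of, not a listing from the book): the eszett and the micro sign behave differently under lower() and casefold():

    ```python
    micro = '\u00b5'                      # MICRO SIGN
    print(micro.lower() == '\u03bc')      # False - lower() keeps the micro sign
    print(micro.casefold() == '\u03bc')   # True  - casefold() maps it to GREEK SMALL LETTER MU

    print('ß'.lower(), 'ß'.casefold())    # 'ß' 'ss' - eszett folds to 'ss'
    ```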
  • Utility Functions for Normalized Text Matching

    util functions that might help (sketched below):

    • nfc_equal
    • fold_equal
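
    A minimal sketch of what these two helpers look like (close to the book's versions):

    ```python
    from unicodedata import normalize

    def nfc_equal(str1, str2):
        # compare after canonical (NFC) normalization only
        return normalize('NFC', str1) == normalize('NFC', str2)

    def fold_equal(str1, str2):
        # normalize, then case fold, for case-insensitive matching
        return (normalize('NFC', str1).casefold() ==
                normalize('NFC', str2).casefold())

    print(nfc_equal('café', 'cafe\u0301'))   # True
    print(fold_equal('Straße', 'strasse'))   # True
    ```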
  • Extreme “Normalization”: Taking Out Diacritics

    • Google Search uses this aggressive normalisation because, in practice, people are inconsistent about typing diacritics
    • also helps for readable URLs (e.g for latin-based languages)
    • one way to call this transformation is “shaving”. We “shave” the diacritics
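
    A minimal sketch of the shaving idea, close to the book's shave_marks helper:

    ```python
    import unicodedata

    def shave_marks(txt):
        """Remove all diacritic marks."""
        norm_txt = unicodedata.normalize('NFD', txt)        # decompose into base chars + combining marks
        shaved = ''.join(c for c in norm_txt
                         if not unicodedata.combining(c))   # drop the combining marks
        return unicodedata.normalize('NFC', shaved)         # recompose

    print(shave_marks('café açaí'))   # 'cafe acai'
    ```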

Sorting Unicode Text

  • python sorts by comparing sequences item by item
  • for strings, that means comparing code points
  • so to sort non-ASCII text in python, use locale.strxfrm as the sort key to get locale-aware comparisons
  • Sorting with the Unicode Collation Algorithm

    • stdlib solution: there’s a locale.strxfrm to do locale-specific comparisons

      Python is to use the locale.strxfrm function which, according to the locale module docs, “transforms a string to one that can be used in locale-aware comparisons.”

      import locale
      my_locale = locale.setlocale(locale.LC_COLLATE, 'pt_BR.UTF-8')
      print(my_locale)
      fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']
      sorted_fruits = sorted(fruits, key=locale.strxfrm)
      print(sorted_fruits)
      
    • use the Unicode Collation Algorithm via pyuca lib
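
      A minimal sketch, assuming the third-party pyuca package is installed (pip install pyuca):

      ```python
      import pyuca

      coll = pyuca.Collator()            # uses the bundled UCA collation table
      fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']
      print(sorted(fruits, key=coll.sort_key))
      # ['açaí', 'acerola', 'atemoia', 'cajá', 'caju']
      ```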

The Unicode Database

Db is in the form of multiple text files.

Contains:

  • code point to char name mappings
  • metadata about the individual characters and how they are related.

That’s how the str methods isalpha, isprintable, isdecimal, and isnumeric work.

  • Finding Characters by Name

    use the name() function from the unicodedata module (quick sketch below; lookup() is the inverse, from name back to character)
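
    ```python
    from unicodedata import lookup, name

    print(name('A'))                       # LATIN CAPITAL LETTER A
    print(name('ã'))                       # LATIN SMALL LETTER A WITH TILDE
    print(lookup('BLACK CHESS KNIGHT'))    # ♞
    ```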

  • Numeric Meaning of Characters

    Some useful string functions here:

    1. .isnumeric()
    2. .isdecimal()

    these compare against the human (numeric) meaning of the character recorded in the Unicode database, not just its code point.
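
    A quick illustration using unicodedata.numeric() (the sample characters are my own choice):

    ```python
    import unicodedata

    for char in '1½⑤':
        print(char,
              char.isdecimal(),             # only true decimal digits
              char.isnumeric(),             # anything with a numeric meaning
              unicodedata.numeric(char))    # the value recorded in the Unicode database
    # 1 True True 1.0
    # ½ False True 0.5
    # ⑤ False True 5.0
    ```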

  • common string functions may lookup this unicode database

    This is responsible for the string functions like isdecimal isnumeric

    the Unicode database records whether a character is printable, is a letter, is a decimal digit, or is some other numeric symbol. That’s how the str methods isalpha, isprintable, isdecimal, and isnumeric work. str.casefold also uses information from a Unicode table.

Dual-Mode str and bytes APIs

  • str Versus bytes in Regular Expressions

    • with bytes patterns, \d and \w match only ASCII characters
    • with str patterns, \d and \w match Unicode characters beyond just ASCII

    to make one point: you can use regular expressions on str and bytes, but in the second case, bytes outside the ASCII range are treated as nondigits and nonword characters.

  • regex patterns using bytes will treat outside-ASCII-range bytes as nondigits and nonword chars
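
    A minimal sketch of the difference (the sample text is my own, not the book's example):

    ```python
    import re

    text_str = 'Digits: 123 \u0661\u0662\u0663'   # ASCII digits plus Arabic-Indic digits
    text_bytes = text_str.encode('utf_8')

    print(re.findall(r'\d+', text_str))     # ['123', '١٢٣'] - str pattern matches Unicode digits
    print(re.findall(rb'\d+', text_bytes))  # [b'123']       - bytes pattern matches ASCII only
    ```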

  • str Versus bytes in os Functions

    • os functions accept str or bytes arguments; with str they abide by the Unicode Sandwich, encoding/decoding filenames with sys.getfilesystemencoding() as soon as they can

Chapter Summary

  1. remember that 1 char == 1 byte only holds for ASCII and other single-byte encodings; in UTF-8 and most other encodings a character may take several bytes.
  2. always be explicit about encodings when reading and writing text. Follow the Unicode sandwich: decode on input, work with str, encode on output.
  3. Unicode provides multiple ways of representing some characters, so normalizing is a prerequisite for text matching.

Further Reading

Chapter 5. Data Class Builders

:NOTER_PAGE: (193 . 0.108844)

I think my stance on using data classes is that it should help mock things easily to come up with scaffolds which are easy to replace.

It’s interesting that the type hinting for class vs instance attributes ended up needing to use pseudoclasses specific for this purpose (ClassVar, InitVar)

What’s New in This Chapter

Overview of Data Class Builders

  • Problem posed:
    • writing __init__ by hand is boilerplate when it only assigns attributes from the constructor parameters
  • 3 options:
    • collections.namedtuple
    • typing.NamedTuple
      • newer than namedtuple
    • @dataclass decorator from dataclasses module
  • How they work:
    • they don’t rely on inheritance
    • type hints are used if we choose typing.NamedTuple or @dataclass
    • some of them build subclasses of tuple
    • All of them use metaprogramming techniques to inject methods and data attributes into the class under construction.
    • Some of them are more modern takes on the same idea: typing.NamedTuple is newer than collections.namedtuple
  • Examples:
    • Named tuple:

      • define inline Coordinate = typing.NamedTuple('Coordinate', lat=float, lon=float)

      • defined with a class statement. Although NamedTuple appears as a superclass here, it isn’t really one; it uses metaclass machinery to customize class creation

        from typing import NamedTuple

        class Coordinate(NamedTuple):
            lat: float
            lon: float

            def __str__(self):
                ns = 'N' if self.lat >= 0 else 'S'
                we = 'E' if self.lon >= 0 else 'W'
                return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
        
    • Using dataclass

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Coordinate:
            lat: float
            lon: float

            def __str__(self):
                ns = 'N' if self.lat >= 0 else 'S'
                we = 'E' if self.lon >= 0 else 'W'
                return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
      
  • Main Features

    • Mutability

      Out of the 3, only @dataclass lets the class stay mutable (we can mark it as frozen if we need immutability).

      The other two are subclasses of tuple, so they are immutable.

      For the immutable ones, we can produce modified copies with the replace functions (x._replace(...) for named tuples, dataclasses.replace(x, ...) for frozen data classes).

    • NamedTuple as a metaclass customization of a class def

      Although NamedTuple appears in the class statement as a superclass, it’s actually not. typing.NamedTuple uses the advanced functionality of a metaclass to customize the creation of the user’s class.

    • Correctly reading type hints @ runtime

      It will be discussed in more detail later in the book

      reading from __annotations__ directly is not recommended. Instead, the recommended best practice to get that information is to call inspect.get_annotations(MyClass) (added in Python 3.10) or typing.get_type_hints(MyClass) (Python 3.5 to 3.9). That’s because those functions provide extra services, like resolving forward references in type hints.

Classic Named Tuples

  • collections.namedtuple is a factory function

    So it’s possible to hack things by adding functions to this subclass.

    collections.namedtuple function is a factory that builds subclasses of tuple enhanced with field names, a class name, and an informative repr.

  • Memory Use by collections.namedtuple

    There’s no excess mem usage because it’s the class that will store the attribute names

    So it’s same space usage as a tuple.

    Each instance of a class built by namedtuple takes exactly the same amount of memory as a tuple because the field names are stored in the class.

  • Injecting methods into the subclass

    this is a hack, shouldn’t be relied upon.

    NOTE: No need to name the first arg as self if you’re hacking things by injecting methods

    the first argument doesn’t need to be named self. Anyway, it will get the receiver when called as a method.
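
    A minimal sketch of the hack, using a hypothetical Point named tuple:

    ```python
    from collections import namedtuple

    Point = namedtuple('Point', 'x y')

    def magnitude(pt):                 # the first parameter doesn't need to be called self
        return (pt.x ** 2 + pt.y ** 2) ** 0.5

    Point.magnitude = magnitude        # inject the function as a method on the class
    print(Point(3, 4).magnitude())     # 5.0
    ```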

    • in normal class method definitions, the self parameter is the receiver

      just some extra information about what the receiver is in the context of defining class methods in python

      In Python, the **self** parameter in class methods is the conventional name for the **receiver**—the object instance that the method is being called on. While Python documentation and community almost always use the term "self," in some other object-oriented languages (like Ruby or in theoretical OOP discussions), "receiver" is the standard term for the object that receives the method call.
      
      ### What is the "receiver"?
      
      - The **receiver** is the specific instance of the class on which the method is invoked.
      - In Python, when you define a method like `def foo(self, ...)`, the `self` parameter is automatically bound to the instance when you call `instance.foo(...)`[6][1][7].
      - This allows each method to access and modify the attributes and other methods of the particular object instance, not just the class as a whole[6][1].
      
      ### How does it work?
      
      - When you call `obj.method(args)`, Python translates this to `Class.method(obj, args)`. The object `obj` is passed as the first argument to the method, and inside the method, it is referred to as `self`[6][1][7].
      - This is how methods can operate on instance-specific data: `self.attribute` refers to the attribute named `attribute` on the particular instance, not on the class or on other instances[1][6][7].
      
      #### Example:
      
      ```python
      class Car:
          def __init__(self, model):
              self.model = model  # self is the receiver
      
          def show(self):
              print("Model is", self.model)
      
      audi = Car("Audi A4")
      audi.show()  # 'self' inside show() refers to the audi instance
      ```
      
      - Here, `audi.show()` is equivalent to `Car.show(audi)`. The `audi` object is the receiver.
      
      ### Why is the receiver important?
      
      - It enables **object-specific state and behavior**. Each instance maintains its own data, and methods can access or mutate that data through `self`[1][6][7].
      - It allows methods to call other methods on the same object, e.g., `self.other_method()`[3].
      - Without the receiver, methods would not know which instance's data to operate on, and all data would have to be global or class-level.
      
      ### How does Python implement this?
      
      - When you retrieve a method from an instance (e.g., `method = obj.method`), Python returns a **bound method**—a function object with the receiver (`obj`) already attached as its first argument[5][8].
      - When you call the bound method, you only need to supply the remaining arguments; `self` is supplied automatically[5].
      
      ### Summary
      
      - **self** is the Pythonic name for the **receiver** of a method call in a class.
      - It refers to the specific instance on which the method was called.
      - It gives methods access to instance-specific data and behavior, enabling true object-oriented programming in Python[1][6][7].
      
      > “The 'self' parameter in Python class methods represents the class instance (object). Methods can access and manipulate attributes (variables) and call other methods of the same class using this special variable.”[6]
      
      This is a foundational mechanism for encapsulation and polymorphism in Python’s object model.
      
      [1] https://www.geeksforgeeks.org/python/self-in-python-class/
      [2] https://www.reddit.com/r/learnpython/comments/k9f4q7/could_someone_explain_the_use_of_self_when_it/
      [3] https://docs.python.org/3/tutorial/classes.html
      [4] https://stackoverflow.com/questions/14671218/python-class-methods-changing-self
      [5] https://stackoverflow.com/questions/70076256/how-do-python-handles-self-attribute-of-a-class-internally-when-the-method-is
      [6] https://www.w3resource.com/python-interview/explain-the-purpose-of-the-self-parameter-in-python-class-methods.php
      [7] https://blog.finxter.com/understanding-the-self-in-python-classes/
      [8] https://docs.python.org/3/reference/datamodel.html
      [9] https://softwareengineering.stackexchange.com/questions/422364/is-it-better-to-pass-self-or-the-specific-attributes-in-python-methods
      [10] https://www.pythonmorsels.com/what-is-self/
      

Typed Named Tuples

  • static type annotations: the main feature typing.NamedTuple adds over collections.namedtuple

    Classes built by typing.NamedTuple don’t have any methods beyond those that collections.namedtuple also generates—and those that are inherited from tuple. The only difference is the presence of the __annotations__ class attribute—which Python completely ignores at runtime.

  • the type annotations are ignored by Python at runtime

Type Hints 101

  • No Runtime Effect

    • Type hints not enforced by compiler & interpreter

      • main intent is for use by static analysis tools, at rest

      The first thing you need to know about type hints is that they are not enforced at all by the Python bytecode compiler and interpreter.

    • Read at import time!

      annotations are evaluated when the module is imported, which is why a broken type hint (e.g., an undefined name) can make importing a library fail.

  • Variable Annotation Syntax

    • variable here refers to the fact that variables are being annotated, not that the type hint is variable.
    • the syntax is just var_name: some_type = a_default_value
  • The Meaning of Variable Annotations

    • For classic class definitions: when an annotation survives vs. when the attribute itself survives

      :NOTER_PAGE: (206 . 0.086168)

      This applies to the classic class definitions, without the named tuples and such.

      This makes sense because there’s no reason to keep the annotations.

      an annotation survives <== if a type hint is given

      the attribute survives in the class <== only if a value is bound to it

      Note that the __annotations__ special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class. The a survives only as an annotation. It doesn’t become a class attribute because no value is bound to it. The b and c are stored as class attributes because they are bound to values.

    • In a NamedTuple class, the annotated attributes become read-only instance attributes

      This is because the class built by NamedTuple is a tuple subclass.

      • Contents

        If you try to assign values to nt.a, nt.b, nt.c, or even nt.z, you’ll get AttributeError exceptions with subtly different error messages. Try that and reflect on the messages.

      • Comment

        Because it’s a read-only instance attribute and it’s expected to be immutable

    • using the @dataclass decorator allows the attrs to persist as instance attributes

      :NOTER_PAGE: (208 . 0.488788)

      • Contents

        However, there is no attribute named a in DemoDataClass—in contrast with DemoNTClass from Example 5-11, which has a descriptor to get a from the instances as read-only attributes (that mysterious <_collections._tuplegetter>). That’s because the a attribute will only exist in instances of DemoDataClass. It will be a public attribute that we can get and set, unless the class is frozen. But b and c exist as class attributes, with b holding the default value for the b instance attribute, while c is just a class attribute that will not be bound to the instances.

      • Comment

        when using @dataclass, an attribute that is ONLY type-hinted (with no value assigned) does not exist on the class at all; it will exist only on concrete instances of that class.

    • the __annotations__ special attribute is for type hints

      the __annotations__ special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class.

More About @dataclass

  • Don’t set a custom attribute outside of its constructor function!

    :NOTER_PAGE: (209 . 0.862182)

    • Contents

      Setting an attribute after __init__ defeats the dict key-sharing memory optimization mentioned in “Practical Consequences of How dict Works” on page 102.

    • Comment

      Reminder: all the attrs for a class should really just be defined within the class itself to benefit from the memory optimisation that it comes with by default

  • immutability is emulated by methods

    Which means it can be bypassed by overriding the implementation of these functions (the __setattr__ and __delattr__ dunder methods)!

    emulates immutability by generating __setattr__ and __delattr__, which raise dataclasses.FrozenInstanceError

  • Field Options

    • WARNING: mutable defaults are NOT allowed.

      similar to the assignment gotcha where my_arr = [[]] * 3 reuses one mutable reference (the inner list), so all three slots point to the same object

      we can see how that would be a problematic bug

      therefore, it’s illegal to set default values that are mutable when we use dataclasses.

      we can use default_factory as a solution to this.

    • default_factory helps prevent mutability bugs

      • if a mutable default value were allowed, every instance would share (and could edit) the same mutable object ==> a classic bug. That’s why the only supported way to give a mutable default is to pass a factory callable, so each instance gets its own separate object (see the sketch after the quote below).

      • but this check only catches list, dict, and set defaults; instances of custom mutable classes slip through, which is why they remain a common source of mutable-default bugs

      The default_factory parameter lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created. This way, each instance of ClubMember will have its own list—instead of all instances sharing the same list from the class, which is rarely what we want and is often a bug.
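
      A minimal sketch based on the book's ClubMember example:

      ```python
      from dataclasses import dataclass, field

      @dataclass
      class ClubMember:
          name: str
          # guests: list = []                         # rejected: ValueError at class creation time
          guests: list = field(default_factory=list)  # each instance gets its own fresh list

      a = ClubMember('Ana')
      b = ClubMember('Bea')
      a.guests.append('Cris')
      print(b.guests)   # [] - no sharing between instances
      ```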

    • mental model for sentinel values

      “sentinel value”

      ### Understanding Sentinel Values
      
      **Sentinel values** are special values used in programming to signify that a variable or parameter is missing, not set, or has a specific condition that needs to be distinguished from valid values. They serve as markers that help identify states in data structures or function arguments without conflicting with legitimate data.
      
      #### Key Characteristics of Sentinel Values
      
      1. **Uniqueness**: A sentinel value is typically unique and not used as a regular value in the context where it is applied. This uniqueness allows it to serve as a clear indicator of absence or a specific condition.
      
      2. **Common Usage**:
         - **Function Arguments**: In functions, sentinel values can indicate that an optional parameter was not provided. For example, in Python, you might use `None` or a custom sentinel object to differentiate between "no value" and "a value of None".
         - **Data Structures**: In data structures like linked lists or trees, sentinel nodes may be used to simplify boundary conditions (e.g., head and tail pointers).
      
      3. **Examples**:
         ```python
         my_dict = {'spam': 42}   # sample data so the snippet runs

         # Using None as a sentinel
         def fetch_data(key, default=None):
             if key not in my_dict:
                 return default  # Return the sentinel if key is missing
             return my_dict[key]

         # Using a custom sentinel
         MISSING = object()
         def get_value(key, default=MISSING):
             if key not in my_dict:
                 if default is MISSING:
                     raise KeyError(f"{key} not found")
                 return default
             return my_dict[key]
         ```
      
      ### Mental Model for Sentinel Values
      
      To form a mental model for sentinel values, consider the **concept of markers or flags** in various contexts:
      
      1. **Etymological Basis**: The term "sentinel" originates from the Latin word "sentinella," meaning "to watch." Just as sentinels guard and signal the presence or absence of something important, sentinel values act as indicators in programming. They help you "watch" for specific conditions in your code.
      
      2. **Mental Model**: Think of sentinel values as **flags on a map**:
         - Imagine a treasure map where certain locations are marked with flags indicating whether they contain treasure (valid data) or are empty (missing data).
         - Just like those flags help you navigate the map without confusion, sentinel values guide your logic by clearly indicating when something is absent or when a specific condition applies.
      
      3. **Practical Application**: When designing functions or data structures, consider how you can implement sentinel values to handle edge cases gracefully. This approach can prevent errors and make your code more robust by explicitly managing conditions that would otherwise lead to ambiguity.
      
      ### Summary
      
      Sentinel values are special markers used to indicate missing or special conditions in programming. They provide clarity and help manage edge cases effectively. By conceptualizing them as flags on a map, you can better understand their role in guiding logic and maintaining code integrity. This mental model emphasizes the importance of clear indicators in navigating complex data scenarios, much like sentinels watch over critical points of interest.
      
      
  • Post-init Processing

    Allows us to insert logic after the constructor runs, to do things like validation and derived calculations.

    • Highlight on page 213

      • Contents

        method generated by @dataclass only takes the arguments passed and assigns them—or their default values, if missing—to

    • Highlight on page 213

      • Contents

        Common use cases for __post_init__ are validation and computing field values based on other fields.
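
    A minimal sketch (the Rectangle class here is hypothetical, not from the book):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Rectangle:
        width: float
        height: float
        area: float = field(init=False)   # computed, not a constructor parameter

        def __post_init__(self):
            # runs right after the generated __init__: validate, then derive a field
            if self.width <= 0 or self.height <= 0:
                raise ValueError('dimensions must be positive')
            self.area = self.width * self.height

    print(Rectangle(2.0, 3.0))   # Rectangle(width=2.0, height=3.0, area=6.0)
    ```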

  • Typed Class Attributes

    Need to use ClassVar from the typing module.

    This is a pseudotype. Read more here:

    Yes, **this is still the standard and recommended way to declare class variables with type hints in the most recent versions of Python (as of Python 3.13)**.
    
    ### Why is `ClassVar` Needed?
    
    - By default, when you annotate a variable inside a class, **type checkers (like mypy, pyright, or the built-in `dataclasses` module) assume it is an instance variable**—that is, each instance of the class will have its own copy.
    - If you want to indicate that a variable is a **class variable** (shared across all instances), you must use `typing.ClassVar` in your annotation[1][2].
    - This is especially important for tools like `@dataclass`, which will otherwise treat all annotated variables as fields to be included in the instance's `__init__`.
    
    ### The Syntax
    
    ```python
    from typing import ClassVar
    
    class Example:
        all_handles: ClassVar[set[str]] = set()
    ```
    - This tells both the type checker and `@dataclass` (if used) that `all_handles` is a class attribute, not an instance attribute.
    - The type inside `ClassVar[...]` can be any valid type, including generics like `set[str]`.
    
    ### Why is this necessary/ugly?
    
    - The reason for this "pseudotype" is that Python's variable annotations (PEP 526) are designed to be explicit and unambiguous for static analysis and code generation tools.
    - Without `ClassVar`, type checkers and dataclasses would treat the variable as an instance field, which is almost always wrong for shared state.
    
    ### Is this still the case in Python 3.13+?
    
    **Yes.**
    - The [official Python docs](https://docs.python.org/3/library/typing.html) (as of Python 3.13.5) still require `ClassVar` for class variables in type annotations[2].
    - This is also reflected in community best practices and in the implementation of type checkers and the standard library[1][2][3][4].
    
    ### Example with `@dataclass`
    
    ```python
    from dataclasses import dataclass
    from typing import ClassVar
    
    @dataclass
    class Example:
        all_handles: ClassVar[set[str]] = set()  # Not included as a dataclass field
        name: str  # This is an instance variable
    ```
    - `all_handles` is not included in the generated `__init__`, nor in the instance's `__dict__`.
    
    ### Related: `Final` and `ClassVar`
    
    - If you want a class variable to be a constant, you can use `Final` from `typing`, but **you cannot combine `Final` and `ClassVar`** in the annotation (see [1]). You must choose one or the other, as combining them is not supported by type checkers.
    
    ### References to Search Results
    
    - [PEP 526 and usage of ClassVar](https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python)[1]
    - [Python 3.13 typing documentation](https://docs.python.org/3/library/typing.html)[2]
    - [Class variables in Python](https://pynative.com/python-class-variables/)[5]
    
    **In summary:**
    Declaring class variables with type hints using `ClassVar[...]` is still required and is the correct, modern, and supported approach in all current and foreseeable future versions of Python. This ensures type checkers and dataclasses handle your class variables as intended.
    
    [1] https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python
    [2] https://docs.python.org/3/library/typing.html
    [3] https://github.com/python/typing/discussions/1424
    [4] https://programming-25.mooc.fi/part-9/5-class-attributes/
    [5] https://pynative.com/python-class-variables/
    [6] https://docs.python.org/3/tutorial/classes.html
    [7] https://www.digitalocean.com/community/tutorials/understanding-class-and-instance-variables-in-python-3
    [8] https://realpython.com/python-variables/
    [9] https://programming-25.mooc.fi/part-8/3-defining-classes/
    [10] https://github.com/python/typing/discussions/1636
    
  • Initialization Variables That Are Not Fields

Data Class as a Code Smell

  • Data Class as Scaffolding
  • value: dataclasses are a good way to do temp wire-ups / stubs

    In this scenario, the data class is an initial, simplistic implementation of a class to jump-start a new project or module. With time, the class should get its own methods, instead of relying on methods of other classes to operate on its instances. Scaffolding is temporary;

  • Data Class as Intermediate Representation

Pattern Matching Class Instances

  • pattern matching in python is new

    • It’s very similar to elixir’s pattern matching and is different from a typical case-switch construct

    • it typically uses the __match_args__ dunder declaration, will be discussed in a later part of the book

    Here are some details on it:

    Python's `match-case` syntax, introduced in Python 3.10, offers a powerful alternative to traditional `switch-case` constructs found in other programming languages like C, C++, and Java. Here’s a detailed comparison of Python's `match-case` with traditional `switch-case` syntax.
    
    ### Key Differences Between Python's `match-case` and Traditional `switch-case`
    
    1. **Pattern Matching vs. Value Matching**:
       - **Python's `match-case`**: Supports **pattern matching**, which means it can match complex data structures, such as lists, tuples, and even class instances. It allows for destructuring and extracting values from these structures.
         ```python
         match some_value:
             case (x, y):  # Matches a tuple with two elements
                 print(f"Matched a tuple with x={x} and y={y}")
             case _:
                 print("No match")
         ```
       - **Traditional `switch-case`**: Typically only matches against scalar values (like integers or strings) and does not support destructuring. It evaluates the expression and compares it against constant cases.
         ```c
         switch (value) {
             case 1:
                 printf("One");
                 break;
             case 2:
                 printf("Two");
                 break;
             default:
                 printf("Default case");
         }
         ```
    
    2. **Wildcards and Default Cases**:
       - **Python's `match-case`**: Uses the underscore (`_`) as a wildcard to catch all unmatched cases, similar to an `else` statement.
       - **Traditional `switch-case`**: Uses a `default` case for handling unmatched values, but it requires explicit declaration.
    
    3. **Multiple Patterns**:
       - **Python's `match-case`**: Allows combining multiple patterns using the pipe operator (`|`) for cases that should execute the same block of code.
         ```python
         match day:
             case "Saturday" | "Sunday":
                 print("It's the weekend!")
             case _:
                 print("It's a weekday.")
         ```
       - **Traditional `switch-case`**: Requires separate cases for each value or uses fall-through behavior (if not explicitly handled with `break`).
    
    4. **No Break Statements Needed**:
       - **Python's `match-case`**: Automatically exits after executing the matched case block, eliminating the need for `break` statements to prevent fall-through.
       - **Traditional `switch-case`**: Requires explicit use of `break` to prevent fall-through to subsequent cases.
    
    5. **Guard Conditions**:
       - **Python's `match-case`**: Supports guard conditions using an `if` statement within the case clause to add additional checks.
         ```python
         match details:
             case [amt, duration] if amt < 10000:
                 return amt * 0.1 * duration
             case [amt, duration] if amt >= 10000:
                 return amt * 0.15 * duration
         ```
       - **Traditional `switch-case`**: Does not natively support guard conditions; you would need to use additional if-else statements.
    
    ### Summary
    
    - Python's `match-case` syntax is more flexible and powerful than traditional `switch-case`, allowing for complex pattern matching and destructuring of data structures.
    - It simplifies code by removing the need for break statements and supports more expressive patterns through guards and multiple patterns.
    - While both constructs serve similar purposes in controlling flow based on variable values, Python's approach aligns more closely with modern programming paradigms that emphasize readability and expressiveness.
    
    In conclusion, while Python's `match-case` serves a similar purpose to traditional switch-case statements in other languages, it introduces significant enhancements that make it more versatile and easier to use in many scenarios.
    
    
  • Designed to match class instances by type and by attributes

    • Contents

      Class patterns are designed to match class instances by type and—optionally—by attributes. The subject of a class pattern can be any class instance, not only instances of data classes.

  • Keyword Class Patterns

    • Captures also work with this syntax

      • Contents

        Keyword class patterns are very readable, and work with any class that has public instance attributes, but they are somewhat verbose.

  • Positional Class Patterns

    The pattern for an attribute can be defined positionally as well.

    Named collectors / captures still work with this (see the sketch below).
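
    A minimal sketch (needs Python >= 3.10) with a City named tuple along the lines of the book's example, showing both keyword and positional patterns:

    ```python
    import typing

    class City(typing.NamedTuple):
        continent: str
        name: str
        country: str

    def describe(city):
        match city:
            case City(continent='Asia', country=cc):   # keyword pattern + capture
                return f'Asian city in {cc}'
            case City('Europe', name, _):               # positional pattern (via __match_args__)
                return f'European city: {name}'
            case _:
                return 'elsewhere'

    print(describe(City('Asia', 'Tokyo', 'JP')))    # Asian city in JP
    print(describe(City('Europe', 'Oslo', 'NO')))   # European city: Oslo
    ```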

Chapter Summary

  • Dataclasses as a code smell

    • Contents

      warned against possible abuse of data classes defeating a basic principle of object-oriented programming: data and the functions that touch it should be together in the same class. Classes with no logic may be a sign of misplaced logic.

Further Reading

  • Highlight on page 228

    • Contents

      Finally, if you want to annotate that class attribute with a type, you can’t use regular types because then it will become an instance attribute. You must resort to that pseudotype ClassVar annotation:

  • Underline on page 228

    • Contents

      Here we are

Chapter 6. Object References, Mutability, and Recycling

What’s New in This Chapter

Variables Are Not Boxes, they are labels

  • updated mental model and language

    We should see it as a “to bind” instead of “to assign” whereby a name is bound to an object.

    A sticky note is a better image rather than a box.

Identity, Equality, and Aliases

  • id() checking

    the is operator does id checking; the == operator uses whatever __eq__ is defined as (typically value-based checking).

    Identity checks are most often done with the is operator, which compares the object IDs, so our code doesn’t need to call id() explicitly.

  • Choosing Between == and is

    for equality check, use ==

    for identity check, use is, this avoids the direct use of id() since

    • is is used when comparing with singletons – typically just None

      So the correct way to do None check is via a singleton

      However, if you are comparing a variable to a singleton, then it makes sense to use is. By far, the most common case is checking whether a variable is bound to None. This is the recommended way to do it: x is None. And the proper way to write its negation is: x is not None. None is the most common singleton we test with is.

    • Default to == if unsure
    • is is faster than == because it can’t be overloaded
  • The Relative Immutability of Tuples

    this is why some tuples are unhashable: the tuple itself is immutable, but it is a container type, and the items it contains may be mutable (and unhashable)

Copies Are Shallow by Default

  • shallow copying becomes a problem when there are mutable items nested inside

    This saves memory and causes no problems if all the items are immutable. But if there are mutable items, this may lead to unpleasant surprises.

  • shallow-copy negative example

    the example below will demonstrate how when the inner element is mutable, then only the reference is copied, so if we modify that, then the original mutable entity gets mutated.

    l1 = [3, [66, 55, 44], (7, 8, 9)]
    l2 = list(l1)
    l1.append(100)
    l1[1].remove(55) # removal removes from both of the nested arrays since it's the same reference
    print('l1:', l1)
    print('l2:', l2)
    l2[1] += [33, 22]
    l2[2] += (10, 11)
    print('l1:', l1)
    print('l2:', l2)
    
  • Deep and Shallow Copies of Arbitrary Objects
  • complexity in cyclical references

    a naive implementation would choke on circular references, but the usual copy.deepcopy handles them fine, so not to worry

    this is because deepcopy does a graph-traversal of the original object and uses a memo table to keep track of references.

    Note that making deep copies is not a simple matter in the general case. Objects may have cyclic references that would cause a naïve algorithm to enter an infinite loop. The deepcopy function remembers the objects already copied to handle cyclic references gracefully.

Function Parameters as References

“Call by sharing” is the only mode of parameter passing in python: each parameter gets a copy of a reference to the argument (this is sometimes loosely described as “pass by reference”, but the function cannot rebind the caller’s variables).

This is usually the case for OOP languages in general (JS, Ruby, Java [though in Java, primitive types are call by value])
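
A small demonstration along the lines of the book's example: the function can mutate a mutable argument through the alias, but it can never rebind the caller's variable.

```python
def f(a, b):
    a += b        # rebinding for immutables, in-place mutation for a list
    return a

x, y = 1, 2
f(x, y)
print(x, y)       # 1 2 - ints are immutable; x is untouched

lst = [1, 2]
f(lst, [3, 4])
print(lst)        # [1, 2, 3, 4] - the list was mutated through the alias

t = (10, 20)
f(t, (30, 40))
print(t)          # (10, 20) - tuples are immutable; f built a new tuple
```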

  • Mutable Types as Parameter Defaults: Bad Idea

    • the default params, if mutable and used, will all point to the same SHARED mutable obj since the params are just aliases to it

      The issue with mutable defaults explains why None is commonly used as the default value for parameters that may receive mutable values (the HauntedBus sketch below shows the problem).

      demonstrates, when a HauntedBus is instantiated with passengers, it works as expected. Strange things happen only when a HauntedBus starts empty, because then self.passengers becomes an alias for the default value of the passengers parameter. The problem is that each default value is evaluated when the function is defined—i.e., usually when the module is loaded—and the default values become attributes of the function object. So if a default value is a mutable object, and you change it, the change will affect every future call of the function.
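
    A sketch close to the book's HauntedBus example:

    ```python
    class HauntedBus:
        """A bus haunted by ghost passengers."""
        def __init__(self, passengers=[]):   # the default list is evaluated once, at def time
            self.passengers = passengers      # aliases the shared default when no arg is given

        def pick(self, name):
            self.passengers.append(name)

    bus1 = HauntedBus()
    bus1.pick('Carrie')
    bus2 = HauntedBus()
    print(bus2.passengers)   # ['Carrie'] - bus2 shares the same default list as bus1
    ```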

  • Defensive Programming with Mutable Parameters
  • Principle of Least Astonishment == no surprising side-effects

    TwilightBus violates the “Principle of least astonishment,” a best practice of interface design. It surely is astonishing that when the bus drops a student, their name is removed from the basketball team roster.

    ### TL;DR: Principle of Least Astonishment (POLA)
    
    The **Principle of Least Astonishment** (POLA), also known as the **Principle of Least Surprise**, is a design guideline in user interface and software design that emphasizes creating systems that behave in ways users expect. The main idea is to minimize confusion and surprises, ensuring that interactions are intuitive and predictable.
    
    #### Key Points:
    - **User Expectations**: Systems should align with users' mental models and past experiences to reduce cognitive load and learning curves.
    - **Behavior Consistency**: Components of a system should operate consistently, following common conventions to avoid unexpected behavior.
    - **Applications**: POLA applies across various aspects of design, including user interfaces, API design, and error handling.
    - **Benefits**: Adhering to POLA leads to improved usability, reduced development time, enhanced maintainability, and increased user satisfaction.
    
    By following the Principle of Least Astonishment, designers can create more intuitive and user-friendly applications that enhance overall user experience.
    
    
    • rule of thumb on when to alias vs make a copy on mutable args

      Just make a copy if you’re not sure (when you’re going to be consuming a mutable argument).

      Unless a method is explicitly intended to mutate an object received as an argument, you should think twice before aliasing the argument object by simply assigning it to an instance variable in your class. If in doubt, make a copy. Your clients will be happier. Of course, making a copy is not free: there is a cost in CPU and memory. However, an API that causes subtle bugs is usually a bigger problem than one that is a little slower or uses more resources.

del and Garbage Collection

del is a statement and not a function; that’s why we don’t do del(x), we do del x (though the parenthesized form happens to work too).

  • Weak references are useful to have pointers but not affect refcount for an obj

    • good to do monitoring / caching activities using weak references
    • see this for more elaboration: Weak References | Fluent Python, the lizard book
    • To inspect whether an object is still alive without holding a strong reference, Python provides the weakref module. A weakref to an object returns None if the object has been garbage collected, effectively giving you a safe way to test “dangling-ness”:
        import weakref
      
        class MyClass:
            pass
      
        obj = MyClass()
        obj_id = id(obj)
        weak_obj = weakref.ref(obj)
      
        print(weak_obj())  # <MyClass object at ...>
      
        del obj
      
        print(weak_obj())  # None, indicating the original object was garbage collected
      

    This works because finalize holds a weak reference to {1, 2, 3}. Weak references to an object do not increase its reference count. Therefore, a weak reference does not prevent the target object from being garbage collected. Weak references are useful in caching applications because you don’t want the cached objects to be kept alive just because they are referenced by the cache.

  • we can actually use ctypes to read memory spaces directly!

    this memory address has to be cast to a py_object first though.

    import ctypes
    
    x = 42
    address = id(x)
    
    # Use ctypes to cast the address back to a Python object and get its value
    value = ctypes.cast(address, ctypes.py_object).value
    
    print(value)  # Output: 42
    
  • the __del__ method is more like a finalizer / teardown hook

    unlikely that we actually will need to implement it.

    if implemented for a class, it gets called by the interpreter before freeing up the memory.

    also kind of depends on the implementation of python itself, e.g. some might keep track of more than just refcounts.

Tricks Python Plays with Immutables \(\rightarrow\) Interned Immutables

  • Interning as an optimisation technique for the internal python implementation

    Basically some strings and common ints are shared memory, avoids unnecessary duplication.

    • won’t work if you use .copy()
    • won’t work if you use [:]

    NOTE: What is interned or not can’t always be determined, because that implementation detail is undocumented.

    NOTE: therefore, for immutables, always check equality using == instead of is

    The sharing of string literals is an optimization technique called interning. CPython uses a similar technique with small integers to avoid unnecessary duplication of numbers that appear frequently in programs like 0, 1, –1, etc. Note that CPython does not intern all strings or integers, and the criteria it uses to do so is an undocumented implementation detail. Never depend on str or int interning! Always use == instead of is to compare strings or integers for equality. Interning is an optimization for internal use of the Python interpreter.
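
    A quick illustration (the exact results are implementation-dependent, which is the point):

    ```python
    a = 'spam'
    b = 'spam'
    print(a is b)           # often True in CPython: equal literals may be interned/shared

    c = 'spam '.strip()     # built at runtime, typically a distinct object
    print(c == a, c is a)   # True False (typically) - so compare values with ==, never with is
    ```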

Chapter Summary

  • Practical Consequences of using references

    1. simple assignment doesn’t create copies
    2. for augmented assignments e.g. +=, *=, it depends on the LHS variable:
      • if bound to an immutable object, a new object is created and rebound
      • if bound to a mutable object, that object is modified in place
    3. re-binding: assigning a new value to an existing variable doesn’t change the object previously bound to it; the var is just bound to a different object.
    4. function params are passed as aliases
      • mutable objects may get mutated unless the consumer function copies it
    5. it’s dangerous to use mutable default values \(\implies\) that’s why the convention is to use None instead.

Further Reading

  • Object identity becomes important only when objects are mutable

    • if everything were immutable, it would make no difference whether variables held actual objects or references to shared (interned) objects; just comparing them by value would have been sufficient.
  • mutable objects end up being the reason why threaded programming is hard

    • if multiple threads mutate objects and the synchronization is not handled correctly, then it leads to corrupted data
  • GC used to be just refcounts, but that can leak memory (e.g. when there are refcycles to unreachable objects, leading to cyclic garbage). Current GC is a generational GC.

    Mental Model:

    The memory is thought of as having generations: each generation is a collection of objects grouped by how long they’ve existed.

    Younger generations (new objects) are collected (checked for unreachable cycles) frequently. Older generations (objects that survived earlier collections) are checked less often.

    More elaboration:

      ### Generational Garbage Collector in CPython: Mental Model and Rationale
    
      #### 1. **Reference Counting Only: Its Limits and Memory Leaks**
    
      - **Reference Counting** (the core memory management scheme in CPython) works by keeping a count of how many references exist to each object. When the count drops to zero, memory is released immediately.
    ​  - **Primary shortcoming:** If objects reference each other (e.g., two lists referencing each other), but nothing outside references them, neither’s count drops to zero. They become **"garbage"**—unreachable—but their counts never reach zero. This is a **classic memory leak**: unused memory that cannot be reclaimed.
    
      #### 2. **Generational GC: Solving Cycles and Leaks**
    
      To address cyclical references—and reduce overhead—CPython complements refcounting with a **generational garbage collector** (`gc` module).
    
      **Mental Model:**
    ​  - The memory is thought of as having *generations*: each generation is a collection of objects grouped by how long they've existed.
    ​  - **Younger generations** (new objects) are *collected* (checked for unreachable cycles) frequently. **Older generations** (objects that survived earlier collections) are checked *less* often.
    
      #### 3. **Why Generational GC Is Effective**
    
      - **Empirical observation:** Most objects in Python die young (they become unreachable soon after they're created). Therefore, checking *new* objects often is efficient.
    ​  - **Cyclic collection:**
    ​      - During collection, the GC looks for reference cycles—sets of objects referring only to each other but not from elsewhere.
    ​      - The GC can safely reclaim all objects in such cycles.
    ​      - By extending beyond simple refcounting, the cycle detector enables memory occupied by unreachable cycles to be safely released.
    ​  - **Old objects that survive collections are promoted to older generations**; these are checked less frequently, reducing unnecessary overhead.
    
      #### 4. **Generational Structure in CPython**
    
      CPython typically uses *three generations*:
    ​  - **Generation 0**: Collected (checked) most frequently; new objects start here.
    ​  - **Generation 1**: Objects promoted from gen 0 if they survive one collection.
    ​  - **Generation 2**: The oldest and least frequently collected generation; objects promoted from gen 1 after surviving further collections.
    
      Collections trigger:
    ​  - Automatically based on allocation thresholds.
    ​  - Explicitly via the `gc.collect()` API.
    
      #### 5. **Memory Leak Solution: How It Works**
    
      - **Pure reference counting** cannot detect cyclic garbage, leading to leaks.
    ​  - **Generational GC** *detects* and *collects* cyclically-linked groups of unreachable objects, returning their memory to the system.
    ​  - Thus, even if the reference count of an object never drops to zero due to a reference cycle, the GC will eventually detect and collect it if it has become unreachable.
    
      #### 6. **Practical Takeaways for Tech Leaders**
    
      - **Mental Model:** CPython’s memory management is twofold—reference count for immediacy and generational GC for cycle detection.
    ​  - **Leak prevention:** Programmers need not (and usually cannot) manually break all cycles; the GC rescues memory otherwise lost in cycles.
    ​  - **Performance:** The generational design reduces overhead by focusing frequent scans on objects most likely to be garbage.
    
      #### 7. **Further Reading and References**
    
      - The CPython documentation for the `gc` module provides details and empirical thresholds for collection.
    ​  - Deep dives into Python’s memory management explain the symbiosis of refcounting and generational GC as a pragmatic solution balancing immediacy, overhead, and completeness (detection of cycles).
    
      **In summary:**
      A generational garbage collector in CPython efficiently manages memory by combining reference counting (for immediate reclamation) with cycle detection (generational collection). This hybrid approach solves the memory leak issue inherent in pure reference-counted systems—cycles are detected and collected—making Python both safe and performant for real-world programs.
    
  • Rebinding a reference within a function body has no effect outside the function, because the function only got a copy of the reference

    because the function gets a copy of the reference in an argument, rebinding it in the function body has no effect outside of the function.

Part II. Functions as Objects

Chapter 7. Functions as First-Class Objects

Definition of a First Class Object

Programming language researchers define a “first-class object” as a program entity that can be:

  • Created at runtime
  • Assigned to a variable or element in a data structure
  • Passed as an argument to a function
  • Returned as the result of a function

What’s New in This Chapter

Treating a Function Like an Object

  • the __doc__ attribute is what help(<fn_name>) typically displays

  • having first-class functions enables programming in a functional style

Higher-Order Functions

  • takes one or more functions as arguments and/or returns a function as its result

  • apply was removed in Python 3 because we can unpack arguments with * and ** now

    so a function can simply be called as fn(*args, **kwargs)

  • Modern Pythonic Replacements for map, filter, and reduce

    • map and filter return generators \(\implies\) we can just directly define genexps now instead.

    • reduce is no longer a builtin, KIV this for a later chapter in this book.

Anonymous Functions

  • The best use of anonymous functions is in the context of an argument list for a higher-order function.

    e.g. sorted(fruits, key=lambda word:word[::-1])

  • the body of a python anonymous function is limited to a single pure expression; only a subset of language features is available in a lambda definition

    However, the simple syntax of Python limits the body of lambda functions to be pure expressions. In other words, the body cannot contain other Python statements such as while, try, etc. Assignment with = is also a statement, so it cannot occur in a lambda.

  • := assignment syntax exists

    The new assignment expression syntax using := can be used—but if you need it, your lambda is probably too complicated and hard to read, and it should be refactored into a regular function using def.

⭐️ The Nine Flavors of Callable Objects

  • () is the call operator
  • generators, native coroutines, and async generator functions are different from the rest of the callable types in that their return values are never application data; they return objects that need further processing to yield application data / do useful work.
    1. callable: User-defined Functions
    2. callable: builtin functions

    e.g. len

    3. callable: builtin methods

    methods implemented in C e.g. dict.get

    4. callable: methods:

    fns defined in the body of a class

    5. callable: Class
    • classes are callable, it calls __new__ \(\rightarrow\) __init__

    • we can override __new__ actually

    • there’s no new operator in Python

    When invoked, a class runs its __new__ method to create an instance, then __init__ to initialize it, and finally the instance is returned to the caller. Because there is no new operator in Python, calling a class is like calling a function.

    6. callable: class instances

    if we define a __call__ method in the class

    7. callable: Generator Functions
    • they have yield in the function body – when called, they return a generator object

    Generator functions Functions or methods that use the yield keyword in their body. When called, they return a generator object.

    8. callable: native coroutine functions
    • functions that are async def \(\rightarrow\) when called, they return a coroutine object

    Native coroutine functions Functions or methods defined with async def. When called, they return a coroutine object.

    9. callable: async generator functions
    • have yield in the body -> return asynchronous generators
    • are defined with async def -> those generators are to be used with async for

    Asynchronous generator functions Functions or methods defined with async def that have yield in their body. When called, they return an asynchronous generator for use with async for. Added in Python 3.6.

User-Defined Callable Types

  • this relates to class instances that work as callable objects.

  • usecases:

    the main idea is really to have objects that can store their own internal state and are callable.

    Here’s two concrete cases:

    1. A class implementing __call__ is an easy way to create function-like objects that have some internal state that must be kept across invocations, like the remaining items in the BingoCage

    2. Decorators that need to remember state between calls of the decorator e.g. the @cache decorator

      this is also useful to split a complex implementation into staged, multi-decorator implementations.

      we’re familiar with this from using frameworks like flask or django, where functions can have many decorators applied that do some pre-processing.

      • Practical Takeaways

        • Mental Model: Think of the decorator stack as a pipeline: each decorator acts before or after your main logic, transforming input, output, or context as needed.

        • Framework idioms: Nearly every Flask or Django extension adds its value via decorators to manage permissions, caching, rate limits, etc.

        • Custom use: You can easily define your own decorators to factor out repeated steps in your app for preprocessing (e.g., parameter parsing), postprocessing (e.g., formatting responses), or injecting cross-cutting security checks.

        In summary, the practice of splitting implementation using decorators (as in Flask/Django) is a core idiom for composing, reusing, and organizing web application logic in Python frameworks—and is a concrete, high-impact use of the broader decorator pattern described previously

        elaboration on the splitting up part:

               Splitting a concrete implementation into separate methods in Python using decorators typically refers to using decorators as a modular way to extend, preprocess, or postprocess your functions—effectively breaking up a monolithic function into composable steps. Decorators allow you to separate concerns and add reusable behaviors before or after the core logic, without modifying the original function directly.
        
               ## Key Concepts
        
               - **Decorators** are functions (or classes) that take another function/method, wrap it with additional behavior, and return the wrapped function.
        ​       - This is useful for **"splitting" implementation concerns**: e.g., one decorator handles logging, another argument validation, another result transformation, etc.
        ​       - Using multiple decorators, you can stack behaviors, creating a pipeline of processing stages for your function[1][5].
        
               ## Example: Splitting String Processing Steps
        
               Suppose you have a function that returns a string and you want to:
               1. Convert the result to uppercase.
               2. Then split it into words.
        
               You can write two decorators and apply them:
        
               ```python
               import functools
        
               def uppercase_decorator(func):
                   @functools.wraps(func)
                   def wrapper(*args, **kwargs):
                       result = func(*args, **kwargs)
                       return result.upper()
                   return wrapper
        
               def split_string_decorator(func):
                   @functools.wraps(func)
                   def wrapper(*args, **kwargs):
                       result = func(*args, **kwargs)
                       return result.split()
                   return wrapper
        
               @split_string_decorator
               @uppercase_decorator
               def greet():
                   return 'hello world'
        
               print(greet())  # Output: ['HELLO', 'WORLD']
               ```
        
               Here, the function’s logic is split: first uppercase conversion, then string splitting—each in its own decorator[1][5].
        
               ## Why Use This Pattern?
        
               - **Separation of concerns**: Each decorator encapsulates a single aspect, making your code easier to maintain, test, and reuse[1][6][9].
        ​       - **Composable behaviors**: You can stack/reorder behaviors as needed.
        ​       - **Cleaner implementation**: The core function stays focused on its business logic.
        
               ## Common Uses
        
               - **Argument or result preprocessing/postprocessing** (e.g., stripping strings, timing execution, error handling)[7].
        ​       - **Cross-cutting concerns** like authentication, logging, caching.
        ​       - **Extending methods in classes** with shared behaviors (see below caveat when decorating class methods)[3][4][8].
        
               ## Important Caveats
        
               - When applying decorators to class methods, the decorator receives either a function or a method, depending on implementation. You may need to ensure your decorator handles `self`/`cls` correctly, or else prefer using `functools.wraps` and testing thoroughly[3][4][8].
        ​       - The order of decorator application matters, especially if each processes return values or arguments—always document and manage dependencies between them[1][5].
        
               ## Summary Table
        
               | Use Case         | Approach                      | Notes                           |
               |------------------|------------------------------|---------------------------------|
               | Preprocessing    | Decorator before function    | E.g., strip, validate inputs    |
               | Postprocessing   | Decorator after function     | E.g., format, cache results     |
               | Cross-cutting    | Multiple stacked decorators  | E.g., logging + auth + cache    |
        
               In summary, splitting a concrete implementation using decorators lets you modularize, extend, and compose functionality in clear, reusable units[1][6][9]. This is a foundational pattern in Python for writing clean, maintainable, and flexible code.
        
               [1] https://www.datacamp.com/tutorial/decorators-python
               [2] https://www.thepythoncodingstack.com/p/demystifying-python-decorators
               [3] https://stackoverflow.com/questions/62857278/different-behavior-when-implementing-a-decorator-in-python-with-a-function-or-a
               [4] http://blog.dscpl.com.au/2014/01/how-you-implemented-your-python.html
               [5] https://www.youtube.com/watch?v=NyHuBGqCBII
               [6] https://www.codingem.com/what-are-python-decorators/
               [7] https://stackoverflow.com/questions/13698549/python-decorator-for-function-argument-preprocessing
               [8] https://github.com/GrahamDumpleton/wrapt/blob/develop/blog/10-performance-overhead-when-applying-decorators-to-methods.md
               [9] https://itnext.io/python-decorators-demystified-cef81d7e986d
               [10] https://refactoring.guru/design-patterns/decorator/python/example
        

        elaboration linked to flask / django:

               Your observation is highly relevant: **Flask and Django make extensive use of Python decorators to split and modularize concrete implementations of web functionality.** This pattern is foundational to how these frameworks are built and used.
        
               ### Contextualized Explanation with Flask/Django
        
               - **In Flask:** Every view (endpoint) is just a function, and decorators are used to transparently inject additional behavior before or after your function executes. The classic example is `@app.route`, which registers the function as a route handler. Beyond routing, decorators are used extensively for **authentication (`@login_required`), templating, caching, API argument validation, CORS, response formatting, and more**.[1][2][5]
        
                   - For example, a "login required" decorator checks authentication before the underlying view runs:
                   ```python
                   from functools import wraps
                   from flask import g, request, redirect, url_for
        
                   def login_required(f):
                       @wraps(f)
                       def decorated_function(*args, **kwargs):
                           if g.user is None:
                               return redirect(url_for('login', next=request.url))
                           return f(*args, **kwargs)
                       return decorated_function
        
                   @app.route('/secret')
                   @login_required
                   def secret_view():
                       ...
                   ```
                   Here, **the authentication check is split out from your business logic** into a reusable decorator[5].
        
                   - Another example, templating can be split using a decorator. Rather than explicitly rendering a template in every view, you can return just the context dictionary, and a decorator handles the rendering:
                   ```python
                   def templated(template):
                       def decorator(f):
                           @wraps(f)
                           def decorated_function(*args, **kwargs):
                               ctx = f(*args, **kwargs)
                               return render_template(template, **ctx)
                           return decorated_function
                       return decorator
        
                   @app.route('/')
                   @templated('index.html')
                   def index():
                       return dict(value=42)
                   ```
                   This **splits view organization (data preparation) from view rendering**[1][5].
        
               - **In Django:** The pattern is similar, although Django adds a distinction between **middleware** (which wraps the whole request/response pipeline) and **decorators** (which wrap only specific views).
        ​           - Decorators like `@login_required`, `@require_POST`, `@csrf_exempt`, etc., **modularize cross-cutting concerns** so they can be composed and reused across multiple views.
        ​           - Django middleware shares some similarities with decorators, acting at a broader scope[4].
        
               ### Why Frameworks Encourage This
        
               - **Separation of concerns:** Decorators allow you to extract generic, orthogonal logic (like permissions, logging, or caching) from the core business function, improving maintainability and testability.
        ​       - **Composable behaviors:** By stacking multiple decorators (order matters), you can chain checks, transformations, or wrappers around your views in a clear, declarative fashion[2].
        ​       - **Minimizing boilerplate:** You avoid repetition—rather than implementing access checks, error handling, or formatting logic in each view, you write these as reusable decorators once.
        
               ### Practical Takeaways
        
               - **Mental Model:** Think of the decorator stack as a pipeline: each decorator acts before or after your main logic, transforming input, output, or context as needed.
        ​       - **Framework idioms:** Nearly every Flask or Django extension adds its value via decorators to manage permissions, caching, rate limits, etc.
        ​       - **Custom use:** You can easily define your own decorators to factor out repeated steps in your app for preprocessing (e.g., parameter parsing), postprocessing (e.g., formatting responses), or injecting cross-cutting security checks.
        
               In summary, **the practice of splitting implementation using decorators (as in Flask/Django) is a core idiom for composing, reusing, and organizing web application logic in Python frameworks**—and is a concrete, high-impact use of the broader decorator pattern described previously[1][2][5].
        
               ---
               [1]: View Decorators — Flask Documentation (3.1.x)
               [2]: Advanced patterns for views and routing - Explore Flask
               [4]: Are middlewares an implementation of the Decorator ...
               [5]: 12. View Decorators
        
               [1] https://flask.palletsprojects.com/en/stable/patterns/viewdecorators/
               [2] http://explore-flask.readthedocs.org/en/latest/views.html
               [3] https://www.geeksforgeeks.org/python/decorator-method-python-design-patterns/
               [4] https://stackoverflow.com/questions/48696631/are-middlewares-an-implementation-of-the-decorator-pattern
               [5] https://tedboy.github.io/flask/patterns/viewdecorators.html
               [6] https://flask.palletsprojects.com/en/stable/design/
               [7] https://moldstud.com/articles/p-flask-and-restful-design-patterns-a-practical-overview-for-developers
               [8] https://stackoverflow.com/questions/48248771/design-patterns-for-flask-api-implementation/63376123
               [9] https://python.plainenglish.io/the-magic-of-python-decorators-from-basics-to-advanced-techniques-94d25dc5f920
               [10] https://dev.to/senichimaro/flask-decorator-something-57gd
        
  • checking if something is callable

    just use the callable() builtin and it will give us a boolean on whether it’s callable

  • for custom obj to be callable, just have to implement __call__

    arbitrary Python objects may also be made to behave like functions. Implementing a __call__ instance method is all it takes.

    • usecase - example use case could be having an alias to a default func within the class e.g. bingo.pick() and bingo() (see the sketch at the end of this section)
  • usecase - implementing decorators that hold state across decorator invocation

    :NOTER_PAGE: (270 0.4916420845624386 . 0.35096774193548386)

    Another good use case for __call__ is implementing decorators. Decorators must be callable, and it is sometimes convenient to “remember” something between calls of the decorator (e.g., for memoization—caching the results of expensive computations for later use) or to split a complex implementation into separate methods. The functional approach to creating functions with internal state is to use closures. Closures, as well as decorators, are the subject of Chapter 9.

  • closures are the functional approach to having functions with internal state (in contrast to class instances with __call__ being used for the same purpose)

    KIV this for a future chapter (chapter 9)
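
  A hedged sketch of the BingoCage idea referenced above (simplified from memory, not the book's exact listing): an instance that keeps its own state and is itself callable.

  ```python
  import random

  class BingoCage:
      def __init__(self, items):
          self._items = list(items)     # internal state kept across invocations
          random.shuffle(self._items)

      def pick(self):
          try:
              return self._items.pop()
          except IndexError:
              raise LookupError('pick from empty BingoCage')

      def __call__(self):               # makes bingo() an alias for bingo.pick()
          return self.pick()

  bingo = BingoCage(range(3))
  print(bingo.pick())     # some item; order is random
  print(bingo())          # same behavior via the call operator
  print(callable(bingo))  # True
  ```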

From Positional to Keyword-Only Parameters

Python’s argument declaration features.

This part is about argument captures when defining functions.

There are two interesting ways to pass variadic params:

  1. positional *args
  2. keyword based **kwargs

NOTE: the usual params can always be passed in keyword fashion (unless it’s positional-only)

  • Keyword-only parameters

    There are two types of argument passing to consider when writing functions:

    • variable positional args

      • accepting multiple positional arguments as tuple using *<arg_name>

      • this allows us to have variadic params

      • this gives us a positional tuple for these args

      ``def tag(name, *content, class_=None, **attrs):’’

    • we can define a keyword-only argument signature for functions

      To specify keyword-only arguments when defining a function, name them after the argument prefixed with *.

      • if you don’t want to support variable positional args, then do def f(a, *, b):

      Keyword-only arguments are a feature of Python 3. In Example 7-9, the class_ parameter can only be given as a keyword argument—it will never capture unnamed positional arguments. To specify keyword-only arguments when defining a function, name them after the argument prefixed with *. If you don’t want to support variable positional arguments, then just do this: def f(a, *, b):

  • Positional-Only Parameters (using /)

    • defining position-only params to a function

      e.g. def divmod(a,b,/)

      after the / we can specify args as per usual

      To define a function requiring positional-only parameters, use / in the parameter list. This example from “What’s New In Python 3.8” shows how to emulate the divmod built-in function: def divmod(a, b, /): return (a // b, a % b). All arguments to the left of the / are positional-only. After the /, you may specify other arguments, which work as usual. (Sketch below.)
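
    A small sketch of the two parameter-declaration features discussed above (signatures are mine, purely illustrative):

    ```python
    # b is keyword-only: the bare * means "no more positional arguments after this"
    def f(a, *, b):
        return a, b

    # a and b are positional-only: everything left of / cannot be passed by keyword
    def my_divmod(a, b, /):
        return (a // b, a % b)

    print(f(1, b=2))        # OK -> (1, 2)
    # f(1, 2)               # would raise TypeError: b is keyword-only
    print(my_divmod(7, 3))  # OK -> (2, 1)
    # my_divmod(a=7, b=3)   # would raise TypeError: a and b are positional-only
    ```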

Packages for Functional Programming

The intent wasn’t to make Python a functional language, but there is support for the functional style.

The operator module provides useful operations as functions; the functools module provides useful higher-order functions.

functools also has some higher order functions that can be used as decorators (e.g. cache, singledispatch)

  • The operator Module

    • the operator module has a bunch of callable functions that we can use as params for higher order functions.

    • TO_HABIT: use itemgetter and attrgetter more when accessing nested objects!

    • provides function equivalents to operators for trivial things

      These are ready-made callable functions; for example:

      • itemgetter: Essentially, itemgetter(1) creates a function that, given a collection, returns the item at index 1. That’s easier to write and read than lambda fields: fields[1], which does the same thing.

      • attrgetter: attrgetter supports nested accesses

        :NOTER_PAGE: (275 0.24311023622047243 . 0.09819121447028424)

        A sibling of itemgetter is attrgetter, which creates functions to extract object attributes by name. If you pass attrgetter several attribute names as arguments, it also returns a tuple of values. In addition, if any argument name contains a . (dot), attrgetter navigates through nested objects to retrieve the attribute.

        attrgetter demo

        :NOTER_PAGE: (275 0.4183070866141732 . 0.09819121447028424)

      Example 7-14. Demo of attrgetter to process a previously defined list of namedtuples called metro_data (the same list that appears in Example 7-13):

          >>> from collections import namedtuple
          >>> LatLon = namedtuple('LatLon', 'lat lon')
          >>> Metropolis = namedtuple('Metropolis', 'name cc pop coord')
          >>> metro_areas = [Metropolis(name, cc, pop, LatLon(lat, lon))
          ...     for name, cc, pop, (lat, lon) in metro_data]
          >>> metro_areas[0]
          Metropolis(name='Tokyo', cc='JP', pop=36.933, coord=LatLon(lat=35.689722, lon=139.691667))
          >>> metro_areas[0].coord.lat
          35.689722
          >>> from operator import attrgetter
          >>> name_lat = attrgetter('name', 'coord.lat')
          >>> for city in sorted(metro_areas, key=attrgetter('coord.lat')):
          ...     print(name_lat(city))
          ...
          ('São Paulo', -23.547778)
          ('Mexico City', 19.433333)
          ('Delhi NCR', 28.613889)
          ('Tokyo', 35.689722)
          ('New York-Newark', 40.808611)

      also can refer to this perplexity overview of what attrgetter can do:

      The `operator.attrgetter` function in Python is a convenient way to create a callable that retrieves attributes from objects. It can be particularly useful when working with deeply nested dictionaries or objects. Below are examples demonstrating how to use `operator.attrgetter` for accessing values within a deeply nested dictionary structure.
      
      ### Example 1: Accessing Nested Attributes in Objects
      
      Suppose you have a class structure representing a person and their address:
      
      ```python
      from operator import attrgetter
      
      class Address:
          def __init__(self, city, state):
              self.city = city
              self.state = state
      
      class Person:
          def __init__(self, name, address):
              self.name = name
              self.address = address
      
      # Create instances of Address and Person
      address1 = Address("New York", "NY")
      address2 = Address("Los Angeles", "CA")
      
      person1 = Person("Alice", address1)
      person2 = Person("Bob", address2)
      
      # List of persons
      people = [person1, person2]
      
      # Use attrgetter to access nested attributes
      get_city = attrgetter('address.city')
      cities = list(map(get_city, people))
      
      print(cities)  # Output: ['New York', 'Los Angeles']
      ```
      
      ### Example 2: Accessing Nested Values in a Dictionary
      
       For a nested dictionary structure, `attrgetter` does not apply (dictionary keys are not attributes); use `itemgetter` instead to retrieve values from it. Here’s an example:
      
      ```python
      from operator import itemgetter
      
      # Nested dictionary structure
      data = [
          {"name": "Alice", "location": {"city": "New York", "state": "NY"}},
          {"name": "Bob", "location": {"city": "Los Angeles", "state": "CA"}},
      ]
      
      # Accessing the city using itemgetter for nested dictionaries
      get_city = itemgetter('location')
      cities = [itemgetter('city')(get_city(person)) for person in data]
      
      print(cities)  # Output: ['New York', 'Los Angeles']
      ```
      
      ### Example 3: Combining Multiple Attributes
      
      You can also use `attrgetter` to fetch multiple attributes at once. For instance, if you want to get both the city and state from the nested structure:
      
      ```python
      from operator import attrgetter
      
      class Address:
          def __init__(self, city, state):
              self.city = city
              self.state = state
      
      class Person:
          def __init__(self, name, address):
              self.name = name
              self.address = address
      
      # Create instances of Address and Person
      address1 = Address("New York", "NY")
      address2 = Address("Los Angeles", "CA")
      
      person1 = Person("Alice", address1)
      person2 = Person("Bob", address2)
      
      # List of persons
      people = [person1, person2]
      
      # Use attrgetter to access both city and state
      get_location = attrgetter('address.city', 'address.state')
      locations = list(map(get_location, people))
      
      print(locations)  # Output: [('New York', 'NY'), ('Los Angeles', 'CA')]
      ```
      
      ### Summary
      
      - **`operator.attrgetter`**: This function is useful for creating callables that can retrieve attributes from objects. It works well with nested structures.
      - **Accessing Nested Attributes**: You can easily access deeply nested attributes using `attrgetter`, which simplifies your code and improves readability.
      - **Combining Attributes**: You can retrieve multiple attributes at once by passing them as arguments to `attrgetter`.
      
      These examples illustrate how `operator.attrgetter` can be effectively used for accessing nested attributes in both object-oriented and dictionary-based data structures in Python.
      
      Citations:
      [1] https://florian-dahlitz.de/articles/introduction-to-pythons-operator-module
      [2] https://note.nkmk.me/en/python-operator-usage/
      [3] https://teamtreehouse.com/community/keyitemgetterattrgetter
      [4] https://discuss.python.org/t/operator-attrgetter-itemgetter-for-setting/20544
      [5] https://www.digitalocean.com/community/tutorials/how-to-use-args-and-kwargs-in-python-3
      [6] https://stackoverflow.com/questions/61452346/python-attrgetter-that-handles-none-values-and-can-be-used-in-a-loop
      [7] https://www.kdnuggets.com/2023/02/python-function-arguments-definitive-guide.html
      [8] https://www.youtube.com/watch?v=on77DvsVgrI
      
    • methodcaller vs defining a lambda for it

      ``Example 7-15. Demo of methodcaller: second test shows the binding of extra arguments’’

      • usecases:

        1. it allows us to partially freeze some args to an existing function

          e.g. hyphenate = methodcaller('replace', ' ', '-')

      • methodcaller is likely better than using a lambda because it behaves like a pointer to an existing method, with some args frozen in, as opposed to a new lambda having to be defined wherever it’s needed

      • references: SO post and the perplexity dump below

      The `operator.methodcaller` function in Python provides a way to create a callable that invokes a method on an object, using the method's name specified as a string. This can be particularly useful in scenarios where you need to call the same method repeatedly on different objects or when working with higher-order functions like `map` or `filter`.
      
      ### Value Proposition of Using `operator.methodcaller`
      
      1. **Readability**: Using `methodcaller` can improve readability by clearly indicating that you are calling a method, especially when the method name is dynamic or when you are applying it across multiple items.
      
      2. **Performance**: `methodcaller` can be more efficient than using a lambda function because it avoids the overhead of creating a new function object for each call. Instead, it creates a single callable that can be reused.
      
      3. **Simplicity**: It simplifies code when you need to apply the same method call to multiple objects, making your intentions clearer without having to define multiple lambda functions.
      
      4. **Consistency**: It provides a consistent interface for calling methods, especially when dealing with collections of objects where the method names might be determined at runtime.
      
      ### Example Usage of `operator.methodcaller`
      
      Here's how you can use `operator.methodcaller` in practice:
      
      #### Example 1: Basic Usage
      
      ```python
      from operator import methodcaller
      
      # Define a list of strings
      strings = ["hello", "world", "python"]
      
      # Use methodcaller to call the upper() method on each string
      upper_case_strings = list(map(methodcaller('upper'), strings))
      
      print(upper_case_strings)  # Output: ['HELLO', 'WORLD', 'PYTHON']
      ```
      
      In this example, `methodcaller('upper')` creates a callable that converts strings to uppercase. The `map` function applies this callable to each string in the list.
      
      #### Example 2: Calling Methods with Arguments
      
      You can also use `methodcaller` to call methods that require arguments:
      
      ```python
      from operator import methodcaller
      
      # Define a list of strings
      strings = ["hello", "world", "python"]
      
      # Use methodcaller to call the rjust() method on each string
      right_justified_strings = list(map(methodcaller('rjust', 10, '.'), strings))
      
      print(right_justified_strings)  # Output: ['.....hello', '.....world', '....python']
      ```
      
      In this case, `methodcaller('rjust', 10, '.')` creates a callable that right-justifies each string to a width of 10 characters, filling with dots.
      
      ### Comparison with Lambda Functions
      
      While you could achieve similar functionality using lambda functions, there are notable differences:
      
      - **Performance**: As mentioned earlier, using `methodcaller` can be faster than using a lambda because it avoids the overhead associated with creating new function objects for each call.
      
      - **Conciseness**: For simple method calls, `methodcaller` can make your code cleaner and more concise compared to writing out lambda expressions.
      
      - **Readability**: Using `methodcaller` explicitly indicates that you're invoking a method, which may enhance code clarity compared to lambda functions that might require additional context to understand what they do.
      
      ### Conclusion
      
      Using `operator.methodcaller` provides several advantages over defining lambdas for method calls, including improved readability, performance benefits, and simplicity. It is particularly useful in functional programming contexts where you need to apply methods across collections of objects efficiently and clearly.
      
      Citations:
      [1] https://stackoverflow.com/questions/2705104/lambda-vs-operator-attrgetterxxx-as-a-sort-key-function
      [2] https://florian-dahlitz.de/articles/introduction-to-pythons-operator-module
      [3] https://martinheinz.dev/blog/54
      [4] https://treyhunner.com/2018/09/stop-writing-lambda-expressions/
      [5] https://note.nkmk.me/en/python-operator-usage/
      [6] https://realpython.com/python-operator-module/
      [7] https://news.ycombinator.com/item?id=28104674
      [8] https://discuss.python.org/t/operator-attrgetter-itemgetter-for-setting/20544
      
  • Freezing Arguments with functools.partial

    it’s a partial application of another function, where we freeze some params (sketch below)

    e.g. nfc = functools.partial(unicodedata.normalize, 'NFC')

    Another is partial: given a callable, it produces a new callable with some of the arguments of the original callable bound to predetermined values. This is useful to adapt a function that takes one or more arguments to an API that requires a callback with fewer arguments.
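
    A sketch of partial application, using the book's unicodedata.normalize example plus a trivial one of my own:

    ```python
    import unicodedata
    from functools import partial
    from operator import mul

    nfc = partial(unicodedata.normalize, 'NFC')   # freeze the first argument
    s1 = 'café'            # precomposed é
    s2 = 'cafe\u0301'      # e + combining acute accent
    print(s1 == s2)            # False: different code point sequences
    print(nfc(s1) == nfc(s2))  # True: both normalized to the same NFC form

    triple = partial(mul, 3)
    print(list(map(triple, range(1, 5))))  # [3, 6, 9, 12]
    ```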

Chapter Summary

  • main ideas are that you can assign functions to variables, pass them to other functions, store them in data structures, and access function attributes, allowing frameworks and tools to act on that information.

  • Python uses a statement-oriented syntax in which expressions cannot contain statements, and many language constructs are statements—including try/except, which is what I miss most often when writing lambdas. This is the price to pay for Python’s highly readable syntax.

  • it’s a good thing that the “functional features” are limited in python because it helps make the language easier to use.

NOTE: python doesn’t do tail recursion optimisation (TCO)

reasons:

  1. hard to do stack tracing, makes it harder to debug

  2. not aligned with python’s idioms

    • TCO is typically necessary to avoid stack overflow in languages that are fundamentally functional, where deep recursion is idiomatic
  3. hard to implement for the python interpreter.

    current interpreter is supposed to:

    • allow for dynamic typing
    • allow for introspection

    it’s hard to implement the TCO stuff AND also follow these principles.

biggest obstacle to wider adoption of functional programming idioms in Python is the lack of tail-call elimination, an optimization that allows memory-efficient computation of a function that makes a recursive call at the “tail” of its body. In another blog post, “Tail Recursion Elimination”, Guido gives several reasons why.

here’s a more comprehensive perplexity output:

Python does not implement tail recursion optimization (TCO), meaning recursive functions—even if tail-recursive—consume a new stack frame for each call, and will hit the recursion depth limit rather than looping indefinitely. This is a deliberate design decision, most famously articulated by Python’s creator, Guido van Rossum[2][3][4][8][10].

### Guido van Rossum’s Reasoning

Guido has explicitly written about why Python doesn’t and won’t adopt TCO:
- **Readability and Debuggability:** A major reason is that Python values clear tracebacks for debugging. If TCO were implemented, the call stack would no longer show the chain of tail-recursive calls, since repeated “tail” calls would overwrite each other instead of expanding the stack. This would make it much harder to trace the logical path that led to an error[2][3].
- **Not Essential for Python’s Intended Use:** In languages that are fundamentally functional (like Scheme), deep recursion is idiomatic, and TCO is necessary to avoid stack overflow. In Python, the expectation is that if you need repeated iteration, you should use explicit loops, which are readable and efficient. Recursion in Python is more commonly used for problems where the recursion depth will be limited in practice, such as tree traversal or parsing, rather than as a substitute for iteration[2][4][8][10].
- **Implementation Complexity:** Due to the dynamic, introspective nature of Python, reliably transforming all tail calls into jumps is technically challenging and would complicate the interpreter. Guido also noted that you can't always know at compile time what function or method is actually being called due to Python’s dynamic features[2].

### Community Patterns and Workarounds

- Python coders generally prefer explicit loops over deep recursion for iterative processes.
- Some clever decorators and tricks (often involving stack frame introspection and exception handling) can mimic TCO in restricted circumstances, but these are not supported officially and can introduce their own overhead or edge cases[5][7].
- The maximum recursion depth in standard Python is limited (e.g., 1,000 frames), highlighting that Python is not optimized for deep recursion by design[5][10].

### Guido’s Perspective (from “Tail Recursion Elimination”)

> "Tail recursion elimination is necessary in functional languages with no side effects, like Scheme, but not in a language with explicit state and tracebacks like Python. Tail recursion requires changing semantics that people may not expect, and it destroys stack traces." [2]

### Summary Table

| Language         | Tail Recursion Optimization? | Reason                                                        |
|------------------|-----------------------------|---------------------------------------------------------------|
| Python           | No                          | Preserves debuggability, prefers explicit loops               |
| Scheme/Lisp      | Yes                         | Recursion is idiomatic, required for efficient iteration      |
| Haskell          | Yes                         | Functional style, deep recursion common                       |

In summary, Python intentionally does not optimize tail calls to preserve debuggability, consistency, and code clarity. This is a conscious part of the language philosophy articulated by Guido and is unlikely to change[2][3][4][10].

[1] https://en.wikipedia.org/wiki/Tail_call
[2] http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html
[3] https://stackoverflow.com/questions/13591970/does-python-optimize-tail-recursion
[4] https://www.reddit.com/r/learnpython/comments/q2oggt/does_python_support_tailrecursion/
[5] https://riptutorial.com/python/example/17857/tail-recursion-optimization-through-stack-introspection
[6] https://stackoverflow.com/questions/37193076/why-is-tail-recursion-optimization-faster-than-normal-recursion-in-python
[7] https://stackoverflow.com/questions/27417874/tail-recursion-optimization-decorator-in-python
[8] https://www.designgurus.io/answers/detail/does-python-optimize-tail-recursion
[9] https://www.geeksforgeeks.org/dsa/why-is-tail-recursion-optimization-faster-than-normal-recursion/
[10] https://www.geeksforgeeks.org/dsa/tail-recursion-in-python/

Further Reading

Chapter 8. Type Hints in Functions

What’s New in This Chapter

About Gradual Typing

  • what it means by a gradual type system

    “type hints are optional at all levels”

    :NOTER_PAGE: (285 0.308300395256917 . 0.14267185473411154)

    • it has an interplay between duck typing and nominal typing

    • a type system that is optional

      • by optional it means that we need to be able to silence it
      • we can silence it at varying levels of abstraction
    • a type system that doesn’t catch type errors @ runtime

      • only for static analysis
    • doesn’t enhance performance

    Type hints are optional at all levels: you can have entire packages with no type hints, you can silence the type checker when you import one of those packages into a module where you use type hints, and you can add special comments to make the type checker ignore specific lines in your code.

Gradual Typing in Practice

  • Starting with Mypy
  • Making Mypy More Strict
  • GOTCHA: accidentally using = instead of : for type hints

    so this is legal but also a typo: def hex2rgb(color=str) -> tuple[int,int,int]

    wherein we accidentally wrote = instead of :.

    Just have to be careful for these things because the static analyser won’t point it out (since it’s legal).

  • Good Style:

    • No space between the parameter name and the :; one space after the :
    • Spaces on both sides of the = that precedes a default parameter value

    The author prefers the blue code formatter over black; it’s more aligned with Python’s idioms (e.g. it defaults to single quotes).

  • A Default Parameter Value
  • EXTRA NOTE: python prefers single quotes by default for strings

    ``using single quotes’’

  • Using None as a Default \(\implies\) use Optional

    the idea here is that None can be a better default value to use. So the type hinting should use an Optional

    It still needs a default value (of None) because typehints are meaningless at runtime.

    BTW, it’s not the annotation that makes the param optional, it’s the provision of a default value for that param (sketch below).
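
    A sketch of the None-default plus Optional-annotation idiom (the signature is mine, loosely modeled on the book's show_count example):

    ```python
    from typing import Optional

    def show_count(count: int, singular: str, plural: Optional[str] = None) -> str:
        if count == 1:
            return f'1 {singular}'
        if plural is None:            # the default value is what makes the argument optional...
            plural = singular + 's'   # ...the annotation only documents that None is allowed
        return f'{count} {plural}'

    print(show_count(1, 'bird'))               # '1 bird'
    print(show_count(2, 'bird'))               # '2 birds'
    print(show_count(3, 'child', 'children'))  # '3 children'
    ```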

Types Are Defined by Supported Operations

  • what’s a “type”?

    in a practical sense, see it as the set of supported operations

  • a supported operation here refers to whether the data object has the associated operator function defined or not.

    So the example given is abc.Sequence: it does not have __mul__ implemented, so if the parameter is annotated as in the function below, the type checker will complain about x * 2

      from collections import abc

      def double(x: abc.Sequence):
          return x * 2

Gradual Type System: an interplay b/w duck typing and nominal typing

``have the interplay of two different views of types:’’

  • the key idea is, when do we want to detect typing errors: if @ runtime, then it’s more aligned with duck typing. if @ compile time, then it’s aligned with nominal typing.
  • duck typing (implicitly, structural typing):

    • focuses on “behaviour”, only enforced at runtime

    • objects are “typed” but variables aren’t

    • what really matters is what operations are supported \(\implies\) that’s why it’s duck typing.

      “if it quacks like a duck” means if it has an implementation like that and the implementation supports the arguments provided

    • naturally this type checking is done at runtime

  • nominal typing:

    • focuses on “type identity”
    • “nominal” because it depends on the name, referring to the declaration that was made (like a label)
    • compatibility of type depends on what the explicitly-defined type is
  • NOTE: a static checker may complain about type errors even if the code will actually work and execute without issues.

  • there’s a duality to be balanced here

    This little experiment shows that duck typing is easier to get started with and is more flexible, but allows unsupported operations to cause errors at runtime. Nominal typing detects errors before runtime, but sometimes can reject code that actually runs—such as the call alert_bird(daffy). (See the sketch below.)
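
  A hedged sketch of the duck-vs-nominal contrast (class names follow the book's birds example, but this is my own reduced version):

  ```python
  class Bird:
      pass

  class Duck(Bird):
      def quack(self):
          print('Quack!')

  def alert(birdie):                       # no annotation: duck typing, checked only at runtime
      birdie.quack()

  def alert_duck(birdie: Duck) -> None:    # nominal typing: the checker verifies the declared type
      birdie.quack()

  def alert_bird(birdie: Bird) -> None:
      birdie.quack()                       # a static checker flags this: Bird declares no quack(),
                                           # even though passing a Duck at runtime works fine

  daffy = Duck()
  alert(daffy)        # OK at runtime
  alert_duck(daffy)   # OK for the checker and at runtime
  alert_bird(daffy)   # the checker complains about the body, yet this call runs fine
  ```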

Types Usable in Annotations

  • The Any Type

    • the purpose of defining an any type

      • more general types \(\implies\) narrower interfaces in the sense that they support fewer operations.

      • need for a special wildcard type: so you’d want to have something that can accept values of every type but not end up having a narrow interface \(\rightarrow\) that’s why we have any

      • so, Any is a magic type that sits at the bottom and at the top of the type hierarchy (from the POV of the typechecker).

      More general types have narrower interfaces, i.e., they support fewer operations. The object class implements fewer operations than abc.Sequence, which implements fewer operations than abc.MutableSequence, which implements fewer operations than list. But Any is a magic type that sits at the top and the bottom of the type hierarchy. It’s simultaneously the most general type—so that an argument n: Any accepts values of every type—and the most specialized type, supporting every possible operation. At least, that’s how the type checker understands Any

    • Contrasting subtype-of vs consistent-with relations

      In a gradual type-system there are elements of behavioural sub-typing (the classic one that adheres to LSP principle) as well as a more flexible compatibility notion in the form of consistent sub-typing.

      • subtype-of relationship: behavioural sub-typing adheres to LSP

        • LSP was actually defined in the context of supported operations:

          If an object of T2 substitutes an object of type T1 and the program still behaves correctly, then T2 is a subtype-of T1.

        i.e., an object of type T2 can be used wherever an object of type T1 is expected. This focus on supported operations is reflected in the name behavioral subtyping.

      • consistent-with relationship: that’s what Any is for

        this is the part where Any is consistent-with every other type, both up and down the hierarchy.

  • Simple Types and Classes

    • Can just directly use them for type-hinting.
    • for classes, consistent-with is defined like subtype-of: a subclass is consistent with all its superclasses.
    • exception: int is consistent-with complex
      • all the numeric built-in types are directly subclassed from object (there is no nominal hierarchy between them).
      • int is not actually a subclass of complex, yet it supports the operations you’d expect and is still treated as consistent-with complex (via int → float → complex)!
  • Optional and Union Types

    • even the Optional type is just syntactic sugar for Union[MyType, None].

    • the latest syntax allows us to use A | B instead of Union[A, B].

    • NOTE: we can actually define return types that are Unions, but this makes it ugly because the caller of this function now needs to handle the type checking at runtime.

    • Union is more useful with types that are not consistent among themselves.

      For example:

      Union[int, float] is redundant because int is consistent-with float.

      If you just use float to annotate the parameter, it will accept int values as well.

    • syntactic sugar for optional and union type: |

      Better Syntax for Optional and Union in Python 3.10: we can write str | bytes instead of Union[str, bytes] since Python 3.10. It’s less typing, and there’s no need to import Optional or Union from typing. Contrast the old and new syntax for the type hint of the plural parameter of show_count: plural: Optional[str] = None vs. plural: str | None = None.

      The | operator also works with isinstance and issubclass to build the second argument: isinstance(x, int | str). For more, see PEP 604—Complementary syntax for Union[].

    • try not to define return values with union types

      it means the responsibility of doing type checking on the return values is on the consumer of the function \(\rightarrow\) bad pattern

  • Generic Collections (defining types for collections like list[str])

    • python collections (container classes) are generally heterogeneous

    • Generic types can be declared with type parameters to specify the type of the items they can handle.

    • the simplest form of generic type hints is container[item] where container is any container type; examples being:

      • list
      • set
      • abc.MutableSet
    • references:

    • Situations that python’s type annotations won’t be able to handle:

      • unsupported 1 - can’t type check array.array typecode for python v 3.10
      • unsupported 2 - when collection defined with typecode, overflow is not checked for

        yet another reminder that these numerics in python are not fixed-width

        constructor argument, which determines whether integers or floats are stored in the array. An even harder problem is how to type check integer ranges to prevent OverflowError at runtime when adding elements to arrays. For example, an array with typecode='B' can only hold int values from 0 to 255. Currently, Python’s static type system is not up to this challenge.

  • Tuple Types

    There are 3 ways we can annotate tuple types:

    1. annotating them as records

    2. annotating them as records with named fields

    3. annotating them as immutable sequences

    • tuples as records

      Just use the builtin like e.g. def geohash(lat_lon: tuple[float,float]) -> str:

    • for tuples being used as records with named fields \(\implies\) using NamedTuple

      • can “alias” it using a named tuple – follows the consistent-with relationship

          from typing import NamedTuple
          from geolib import geohash as gh

          PRECISION = 9

          class Coordinate(NamedTuple):
              lat: float
              lon: float

          # NOTE: this wrapper prevents static checkers from complaining that the geohash lib does not have type hints.
          def geohash(lat_lon: Coordinate) -> str:
              return gh.encode(*lat_lon, PRECISION)
      • So here, Coordinate is consistent-with tuple[float,float]

        because of this consistency, if a fn signature was def display(lat_lon: tuple[float, float]) -> str:, then Coordinate NamedTuple will still work

    • for tuples to be used as immutable sequences

      Objective here is to annotate tuples of unspecified length that are used as immutable lists

      We specify a single type, followed by a comma and ...

      This ellipsis is useful to us.

      e.g. tuple[int, ...] is a tuple with int items.

      • note: we can’t tie down a particular length though

      Here’s a consolidated example:

       from collections.abc import Sequence

       def columnize(
           # 1: this is a sequence of strings
           sequence: Sequence[str], num_columns: int = 0
           # 2: the return type below means it's a list of tuples, and the tuples only contain strings
       ) -> list[tuple[str, ...]]:
           if num_columns == 0:
               num_columns = round(len(sequence) ** 0.5)

           num_rows, remainder = divmod(len(sequence), num_columns)
           num_rows += bool(remainder)
           return [tuple(sequence[i::num_rows]) for i in range(num_rows)]
  • Generic Mappings

    • the syntax is just MappingType[KeyType, ValueType]
    • we can annotate local variables! e.g. index: dict[str, set[str]] = {}
    • interesting example of an inverted index

      ``returning an inverted index’’

      There’s a whole bunch of literature on inverted indexes. This does a value -> key mapping.

      see wikipedia article on this

    • good example code

      In the example below:

      1. the local variable of index is annotated because the typechecker will complain otherwise

      2. the walrus operator := is used to define a name for an expression-output

        TO_HABIT: this is useful and I already use it for while loops, but I can potentially use it for other expressions as well.

        in the example from charindex.py:

           # we name the expression used as the if-predicate so that we can use it thereafter
           if name := unicodedata.name(char, ''):
               for word in tokenize(name):
                   index.setdefault(word, set()).add(char)

        More information about the walrus operator:

           The **walrus operator** (`:=`), introduced in Python 3.8, allows for assignment expressions, enabling you to assign a value to a variable as part of an expression. This feature can enhance code readability and efficiency in certain contexts. Here’s a detailed overview of its functionality, use cases, and implications.
        
           ### What is the Walrus Operator?
        
           - **Syntax**: The walrus operator is used as follows:
             ```python
             variable := expression
             ```
             This assigns the result of `expression` to `variable` and returns that value.
        
           - **Purpose**: The primary purpose of the walrus operator is to allow assignments to be made within expressions, reducing redundancy and improving code conciseness.
        
           ### Key Use Cases
        
           1. **Reducing Redundant Calculations**:
        ​      - The walrus operator can be particularly useful when you want to use a value multiple times without recalculating it.
              ```python
              # Without walrus operator
              result = [func(x) for x in data if func(x)]
        
              # With walrus operator
              result = [y for x in data if (y := func(x))]
              ```
              In this example, `func(x)` is called only once per iteration instead of twice.
        
           2. **Cleaner Loop Constructs**:
        ​      - It simplifies loops where the loop condition depends on user input or other calculations.
              ```python
              # Using walrus operator
              while (data := input("Enter your data: ")) != "":
                  print("You entered:", data)
              ```
              This eliminates the need for an initial assignment before entering the loop.
        
           3. **Conditional Assignments**:
        ​      - You can assign a value within an if statement, making the code more readable.
              ```python
              if (match := re.match(pattern, s)):
                  print("Match found:", match.group())
              ```
        
           4. **Accumulate Data In-Place**:
        ​      - The walrus operator can also be used to accumulate values while iterating.
              ```python
              c = 0
              print([(c := c + x) for x in [5, 4, 3, 2]])  # Output: [5, 9, 12, 14]
              ```
        
           ### Advantages
        
           - **Conciseness**: It reduces boilerplate code by allowing assignments within expressions.
        ​   - **Performance**: It can improve performance by avoiding repeated function calls or calculations.
        ​   - **Readability**: In certain contexts, it makes the code clearer by showing intent directly where values are being assigned and used.
        
           ### Considerations
        
           - **Readability vs. Complexity**: While it can enhance readability, excessive or inappropriate use may lead to complex and hard-to-read code. It's important to balance conciseness with clarity.
        ​   - **Avoiding Nested Expressions**: Using nested walrus operators can make code difficult to understand and maintain.
        
           ### Etymology of "Walrus"
        
           The term "walrus operator" is informal and comes from the resemblance of the `:=` symbol to a walrus's eyes and tusks. The playful name was popularized in discussions about its introduction and has since become widely accepted in the Python community.
        
           ### Mental Model
        
           To conceptualize the walrus operator:
        ​   - Think of it as a way to "capture" a value while simultaneously using it in an expression.
        ​   - Visualize it as a tool that allows you to hold onto something (the value) while you continue working with it immediately (the expression).
        
           ### Summary
        
           The walrus operator (`:=`) in Python provides a powerful way to assign values within expressions, enhancing code conciseness and performance in specific scenarios. While it offers significant advantages, careful consideration should be given to its use to maintain code clarity and avoid unnecessary complexity.
        
           Citations:
           [1] https://www.geeksforgeeks.org/walrus-operator-in-python-3-8/
           [2] https://martinheinz.dev/blog/79
           [3] https://www.kdnuggets.com/how-not-to-use-pythons-walrus-operator
           [4] https://realpython.com/python-walrus-operator/
           [5] https://www.reddit.com/r/Python/comments/jmnant/walrus_operator_good_or_bad/
           [6] https://stackoverflow.com/questions/73644898/why-is-python-walrus-operator-needed-instead-of-just-using-the-normal-assig
           [7] https://realpython.com/python-operator-module/
           [8] https://www.digitalocean.com/community/tutorials/how-to-use-args-and-kwargs-in-python-3
        
      3. the tokenize function is a generator. KIV for chapter 17 for a deep dive into this.

      Example 8-14. charindex.py

      import re
      import sys
      import unicodedata
      
      from collections.abc import Iterator
      
      RE_WORD = re.compile(r"\w+")
      STOP_CODE = sys.maxunicode + 1
      
      
      def tokenize(text: str) -> Iterator[str]:
          """
          return iterable of uppercased words
          """
          for match in RE_WORD.finditer(text):
              yield match.group().upper()
      
      
      def name_index(start: int = 32, end: int = STOP_CODE) -> dict[str, set[str]]:
          index: dict[str, set[str]] = {}
      
          for char in (chr(i) for i in range(start, end)):
              if name := unicodedata.name(char, ""):
                  for word in tokenize(name):
                      index.setdefault(word, set()).add(char)
          return index
      
  • Abstract Base Classes

    • PRINCIPLE: Robustness Principle / Postel’s Law:

      “Be conservative in what you send, be liberal in what you accept.”

    • it makes sense to define a generic type hint (of abstract classes) so that we can support many concrete implementations of it.

    • rule of thumb - better to use abc.Mapping or abc.MutableMapping instead of dict

      Because it will support more mapping types

      Therefore, in general it’s better to use abc.Mapping or abc.MutableMapping in parameter type hints, instead of dict (or typing.Dict in legacy code).
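
      A minimal sketch of this rule of thumb (name2hex here is just an illustrative function, loosely modeled on the book's example):

      ```python
      from collections.abc import Mapping

      def name2hex(name: str, color_map: Mapping[str, int]) -> str:
          # accepts dict, defaultdict, ChainMap, UserDict subclasses, etc.
          return f"#{color_map[name]:06x}"

      print(name2hex("red", {"red": 0xFF0000}))  # #ff0000
      ```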

    • fall of the “numeric tower” of numeric class-hierarchy

      • there used to be a bunch of ABCs for numeric types, but now it’s not useful because numeric types are special.

        they subclass object directly and are treated as consistent-with each other (int with float, float with complex).

      • this numeric tower is a linear hierarchy of ABCs with Number at the top

        1. Number
        2. Complex
        3. Real
        4. Rational
        5. Integral

      Point being that the static type checking of things within the numeric tower doesn’t work well – have to use the explicit types, KIV the solution for it, comes in a later chapter

      Those ABCs work perfectly well for runtime type checking, but they are not supported for static type checking. The “Numeric Tower” section of PEP 484 rejects the numbers ABCs and dictates that the built-in types complex, float, and int should be treated as special cases, as explained in “int Is Consistent-With complex” on page

    • 3 options to type-annotate numeric things

      1. use a concrete type instead e.g. int, float, complex

      2. declare a union type Union[float, Decimal, Fraction]

      3. Use numeric protocols e.g. SupportsFloat

        kiv numeric protocols for chapter 13
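
      A quick sketch of what the three options look like in signatures (the function names are made up):

      ```python
      from decimal import Decimal
      from fractions import Fraction
      from typing import SupportsFloat, Union

      def double_concrete(x: float) -> float:              # [1] concrete type
          return x * 2

      def double_union(x: Union[float, Decimal, Fraction]) -> Union[float, Decimal, Fraction]:
          return x * 2                                      # [2] union type

      def as_float(x: SupportsFloat) -> float:              # [3] numeric protocol
          return float(x)
      ```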

      In practice, if you want to annotate numeric arguments for static type checking, you have a few options:

      1. Use one of the concrete types int, float, or complex—as recommended by PEP
      2. Declare a union type like Union[float, Decimal, Fraction].
      3. If you want to avoid hardcoding concrete types, use numeric protocols like SupportsFloat, covered in “Runtime Checkable Static Protocols” on page 468.

      The upcoming section “Static Protocols” on page 286 is a prerequisite for understanding the numeric protocols. Meanwhile, let’s get to one of the most useful ABCs for type hints: Iterable.
  • Generic Iterables

    • Python Typeshed Project

      Not that important.

      Just for compatibility initially.

      It is a way to provide “headers” with type annotations.

      This is how type annotations are retrofitted onto the existing stdlib, since the stdlib functions themselves have no inline annotations.

      It relies on .pyi files, which work much like C header files.

      :NOTER_PAGE: (310 0.6666666666666667 . 0.2703549060542797)

      ``Stub Files and the Typeshed Project’’

    • Explicit Type Aliases are supported, they improve readability

      Read more in PEP613.

      Though it seems that there’s a separate syntax for this. FromTo: TypeAlias = tuple[str, str]

      from typing import TypeAlias
      FromTo: TypeAlias = tuple[str, str]
      
    • ⚠️ Danger of unbounded iterables on memory requirements

      • GOTCHA: iterable arguments need to be completely consumed. This poses a risk if we have infinite iterables (e.g. cyclic generators).

        this is something to keep in mind about.

      • however, the value of this is that it allows flexibility and the ability to inject in generators instead of prebuilt sequences

      return a result. Given an endless iterable such as the itertools.cycle generator as input, these functions would consume all memory and crash the Python process. Despite this potential danger, it is fairly common in modern Python to offer functions that accept an Iterable input even if they must process it completely to return a result.
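
      A hedged sketch of the gotcha (longest is a made-up function that must consume its whole Iterable argument):

      ```python
      import itertools
      from collections.abc import Iterable

      def longest(words: Iterable[str]) -> str:
          # must consume the entire iterable to produce a result
          return max(words, key=len)

      print(longest(["spam", "eggs", "bacon"]))       # fine: finite input
      # longest(itertools.cycle(["spam", "eggs"]))    # never returns: endless input
      ```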

  • Parameterized Generics and TypeVar

    • for us to refer to a generic type, we have to use TypeVars
    • KIV the fact that TypeVar also allows us to define covariants and contravariants in addition to bounds.
    • type var bound @ point of usage, is a reflection on the result type

      where T is a type variable that will be bound to a specific type with each usage. This allows a parameter type to be reflected on the result type.
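
      For instance, a generic mode function, sketched roughly after the book's example (details reconstructed from memory, so treat as illustrative):

      ```python
      from collections import Counter
      from collections.abc import Iterable
      from typing import TypeVar

      T = TypeVar('T')

      def mode(data: Iterable[T]) -> T:
          # the T bound by the argument is reflected on the result:
          # mode([1, 2, 2]) -> int, mode('abb') -> str
          pairs = Counter(data).most_common(1)
          if len(pairs) == 0:
              raise ValueError('no mode for empty data')
          return pairs[0][0]
      ```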

    • why TypeVar is needed (and unique to python)

      • TypeVar is a construct that is unique to the python language

        introduces the variable name in the current namespace as opposed to getting that variable declared beforehand

        it’s unique because languages like C, Java, and TypeScript don’t need the names of type variables to be declared beforehand, so they don’t need such a construct

      • mental model: it’s a variable representing a type instead of being a type by itself

      • see more on typevar:

          The concept of **TypeVar** in Python is a unique construct primarily used for creating generic types, allowing developers to write functions and classes that can operate on any data type while maintaining type safety. This feature is particularly useful in statically typed languages, but it has specific implications and uses in Python, which is dynamically typed. Here’s a breakdown of why TypeVar is significant in Python and how it differs from similar concepts in languages like JavaScript.
        
          ### Understanding TypeVar in Python
        
          1. **Generic Programming**:
        ​     - **TypeVar** allows you to define a placeholder for a type that can be specified later when the function or class is instantiated. This enables generic programming, where you can write code that works with any data type.
        ​     - Example:
               ```python
               from typing import TypeVar, Generic
        
               T = TypeVar('T')
        
               class Wrapper(Generic[T]):
                   def __init__(self, value: T):
                       self.value = value
        
               int_wrapper = Wrapper(10)  # T is inferred as int
               str_wrapper = Wrapper("Hello")  # T is inferred as str
               ```
        
          2. **Type Safety**:
        ​     - TypeVar enhances type safety by ensuring that the operations performed on the generic type are valid for the specific type passed during instantiation. This helps catch errors at development time rather than runtime.
        
          3. **Flexibility**:
        ​     - It allows for more flexible and reusable code. You can create functions and classes that can handle multiple types without duplicating code for each specific type.
        
          ### Comparison with JavaScript
        
          JavaScript does not have a direct equivalent to Python's TypeVar due to its dynamic typing system. Here are some key differences:
        
          1. **Dynamic vs. Static Typing**:
        ​     - JavaScript is dynamically typed, meaning types are determined at runtime and variables can hold values of any type without explicit declarations.
        ​     - In contrast, Python’s TypeVar allows for static type checking when using tools like `mypy`, enabling developers to specify expected types while still maintaining flexibility.
        
          2. **Lack of Generics**:
        ​     - While JavaScript supports some level of generics through its type systems (like TypeScript), it does not have built-in constructs like TypeVar that are part of the core language syntax.
        ​     - In TypeScript (a superset of JavaScript), generics are defined differently, using angle brackets (`<T>`), but they do not use a construct like `TypeVar` to define a variable type that can be reused across multiple functions or classes.
        
          3. **Type Inference**:
        ​     - Python's TypeVar allows for type inference based on context, which can help with readability and maintainability of code. JavaScript's dynamic nature means that developers often rely on documentation or comments to convey expected types.
        
          ### Etymology of "TypeVar" and Mental Model
        
          The term **TypeVar** combines "Type" (referring to data types) and "Var" (short for variable). This naming emphasizes that it acts as a variable representing a type rather than being a concrete type itself.
        
          #### Mental Model:
        ​  - Think of **TypeVar** as a placeholder or a template for a data type:
        ​    - Imagine it as an empty box labeled "T" where you can put different items (data types) later. When you define a function or class using TypeVar, you’re saying, “This box can hold anything; just tell me what it will hold when you use it.”
        ​  - This concept aligns with generic programming principles found in other languages but is uniquely adapted to Python's dynamic typing environment.
        
          ### Summary
        
          - **TypeVar** is a powerful construct in Python that enables generic programming by allowing developers to create flexible and reusable code while maintaining type safety.
        ​  - Unlike JavaScript, which lacks direct support for generics in its core syntax, Python provides TypeVar as part of its typing module, facilitating static type checking.
        ​  - The term "TypeVar" reflects its role as a variable representing types, allowing developers to think in terms of templates or placeholders when designing their functions and classes.
        
          Citations:
          [1] https://stackoverflow.com/questions/55345608/instantiate-a-type-that-is-a-typevar
          [2] https://discuss.python.org/t/non-uniqueness-of-typevar-on-python-versions-3-12-causes-resolution-issues/37350
          [3] https://guicommits.com/python-generic-type-function-class/
          [4] https://typing.readthedocs.io/en/latest/spec/generics.html
          [5] https://www.reddit.com/r/learnpython/comments/1adbgfp/should_i_use_a_typevar/
          [6] https://dagster.io/blog/python-type-hinting
          [7] https://docs.python.org/es/3.13/library/typing.html
          [8] https://www.typescriptlang.org/play/typescript/language-extensions/nominal-typing.ts.html
        

      make the [] operator work on classes like Sequence[T]. But the name of the T variable inside the brackets must be defined somewhere—otherwise the Python interpreter would need deep changes to support generic type notation as special use of []. That’s why the typing.TypeVar constructor is needed: to introduce the variable name in the current namespace. Languages such as Java, C#, and TypeScript don’t require the name of type variable to be declared beforehand,

    • Restricting/Bounding the TypeVar

      • there might be a need to explicitly restrict using a whitelist of types instead of letting the consistent-with subtyping do its job.

        Without the restriction, anything that is consistent-with T will work, but that’s not ideal, because the function consuming the type most likely needs it to be restricted

      • we have 2 ways to restrict the possible types assigned to T:

      • [1] restricted TypeVar – references a whitelist

        This is a fixed whitelist.

        Problem is that, it may not be easy to maintain if numerous items in the list.

        that’s where bounding can be done.

        ``NumberT = TypeVar('NumberT', float, Decimal, Fraction)''

      • [2] bounded TypeVar – defines an upper bound on the type, works on anything that is consistent-with

        • sets an upper boundary for the acceptable types.

          e.g. HashableT = TypeVar('HashableT', bound=Hashable) then the variable could be Hashable or any of its subtypes

        • careful not to get confused with the use of the word “bound” for that named param to TypeVar: here it just means an upper boundary on the acceptable types.

        • this becomes the same generics construct as in Java

        The solution is another optional parameter of TypeVar: the bound keyword parameter. It sets an upper boundary for the acceptable types. In Example 8-18, we have bound=Hashable, which means the type parameter may be Hashable or any subtype-of it.
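
      Side by side, the two forms look like this (reusing the names from the notes above):

      ```python
      from collections.abc import Hashable
      from decimal import Decimal
      from fractions import Fraction
      from typing import TypeVar

      # [1] restricted: NumberT may only be one of the listed types (a fixed whitelist)
      NumberT = TypeVar('NumberT', float, Decimal, Fraction)

      # [2] bounded: HashableT may be Hashable or any type consistent-with it
      HashableT = TypeVar('HashableT', bound=Hashable)
      ```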

    • Predefined TypeVars

      • AnyStr is an example of such a predefined type var; it supports both bytes and str.
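
        A tiny sketch of AnyStr in action (double_it is a made-up function):

        ```python
        from typing import AnyStr

        def double_it(s: AnyStr) -> AnyStr:
            # AnyStr = TypeVar('AnyStr', bytes, str): within one call, the argument
            # and the result are either both str or both bytes
            return s + s

        double_it("abc")    # str in, str out
        double_it(b"abc")   # bytes in, bytes out
        ```
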
  • Static Protocols via typing.Protocol

    A protocol in the historical sense is an informal interface. KIV proper introduction to Protocols till Chapter 13.

    In the context of type hints,

    A protocol is really all about structural typing. Types match if the behaviours are consistent-with each other.

    This feature is also known as “static duck typing”. It’s because we make duck typing explicit for static type checkers.

    the solution to annotate the series parameter of top was to say “The nominal type of series doesn’t matter, as long as it implements the __lt__ method.” Python’s duck typing always allowed us to say that implicitly, leaving static type checkers clueless. That’s the contrast with implicit duck typing that we have been seeing all along

    • protocol definition vs implementation

      • protocol can be defined by subclassing typing.Protocol

        it’s a class of its own,

        here’s an example:

          from typing import Protocol, Any
        
          class SupportsLessThan(Protocol):
                  def __lt__(self, other: Any) -> bool: ...
        

        and then we can use this protocol to define a TypeVar: LT = TypeVar('LT', bound=SupportsLessThan)

        NOTE:

        1. it subclasses typing.Protocol

        2. class body has one or more methods. the methods have ... in their bodies.

          this is sufficient to define the type signature for the protocol, and that’s what matters / is used to determine if something adheres to a protocol.

      • the implementer of a protocol doesn’t need to inherit, register or declare any relationship with the class that defines the protocol

      a protocol type is defined by specifying one or more methods, and the type checker verifies that those methods are implemented where that protocol type is required. In Python, a protocol definition is written as a typing.Protocol subclass. However, classes that implement a protocol don’t need to inherit, register, or declare any relationship with the class that defines the protocol. It’s up to the type checker to find the available protocol types and enforce their usage.

    • use case: when we can’t just define a boundary, but we want to define a protocol for the functions it supports

      were able to use typing.Hashable as the upper bound for the type parameter. But now there is no suitable type in typing or abc to use, so we need to create it.
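
      A sketch of that use case, roughly following the book's top example (reconstructed, so details may differ):

      ```python
      from collections.abc import Iterable
      from typing import Any, Protocol, TypeVar

      class SupportsLessThan(Protocol):
          def __lt__(self, other: Any) -> bool: ...

      LT = TypeVar('LT', bound=SupportsLessThan)

      def top(series: Iterable[LT], length: int) -> list[LT]:
          ordered = sorted(series, reverse=True)
          return ordered[:length]

      print(top([4, 1, 5, 2], 2))                 # [5, 4]
      print(top('mango pear apple'.split(), 2))   # ['pear', 'mango']
      ```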

    • T just needs to be checked if it’s consistent-with the protocol

      • disambiguation b/w protocols and abstract classes (as interfaces) in OOP (see perplexity thread)
          To clarify the concepts of **protocols** and **interfaces** in Python, and to compare them with Java's virtual interfaces, let's break down the definitions and functionalities.
        
          ### Disambiguation: Protocols vs. Interfaces in Python
        
          1. **Protocols**:
        ​     - **Definition**: A protocol in Python, introduced in PEP 484, defines a set of methods and properties that a class should implement, but it does not require explicit inheritance. Protocols allow for structural typing, meaning that if a class has the required methods and properties, it can be considered to implement the protocol.
        ​     - **Usage**: Protocols are defined using the `typing.Protocol` class. They are useful for static type checking with tools like `mypy` and allow for "duck typing" — if an object behaves like a certain type, it can be used as that type regardless of its actual class.
        ​     - **Example**:
               ```python
               from typing import Protocol
        
               class Drawable(Protocol):
                   def draw(self) -> None:
                       ...
        
               class Circle:
                   def draw(self) -> None:
                       print("Drawing a circle")
        
               def render(shape: Drawable) -> None:
                   shape.draw()
        
               circle = Circle()
               render(circle)  # This works because Circle has a draw method
               ```
        
          2. **Interfaces**:
        ​     - **Definition**: In Python, interfaces are often implemented using abstract base classes (ABCs) defined in the `abc` module. An interface specifies a contract that classes must adhere to by explicitly declaring methods that must be implemented.
        ​     - **Usage**: Interfaces require explicit inheritance from an ABC. This means that if a class claims to implement an interface, it must inherit from the ABC and implement all its abstract methods.
        ​     - **Example**:
               ```python
               from abc import ABC, abstractmethod
        
               class Drawable(ABC):
                   @abstractmethod
                   def draw(self) -> None:
                       pass
        
               class Circle(Drawable):
                   def draw(self) -> None:
                       print("Drawing a circle")
        
               def render(shape: Drawable) -> None:
                   shape.draw()
        
               circle = Circle()
               render(circle)  # This works because Circle implements Drawable
               ```
        
          ### Comparison with Java's Virtual Interfaces
        
          - In Java, interfaces are explicitly defined contracts that classes must implement. A class can implement multiple interfaces, providing flexibility similar to Python's protocols.
        ​  - Java's interfaces support default methods (methods with an implementation), which is not directly available in Python's protocols or ABCs without additional workarounds.
        ​  - Both Java interfaces and Python protocols allow for polymorphism; however, Python's protocols enable duck typing without requiring explicit declarations of interface implementation.
        
          ### Summary of Differences
        
          | Feature               | Protocols (Python)                  | Interfaces (Python via ABC)               | Java Interfaces                       |
          |-----------------------|-------------------------------------|-------------------------------------------|---------------------------------------|
          | Definition            | Structural typing                   | Explicit method contracts                  | Explicit method contracts              |
          | Inheritance           | No explicit inheritance required    | Requires explicit inheritance              | Requires explicit implementation       |
          | Flexibility           | More flexible (duck typing)        | Less flexible (strict adherence needed)   | Supports multiple interface implementation |
          | Default Methods       | No default methods                  | No default methods                        | Supports default methods               |
          | Use Case              | Type checking with mypy            | Enforcing method implementation            | Defining contracts for behavior       |
        
          ### Conclusion
        
          In summary, while both protocols and interfaces serve to define expected behaviors in Python, they do so in different ways. Protocols offer more flexibility through structural typing, while interfaces enforce strict adherence through inheritance. Understanding these distinctions helps in designing systems that leverage polymorphism effectively.
        
          If you have further questions or need more clarification on any specific points regarding protocols, interfaces, or their comparisons with other languages like Java, feel free to ask!
        
          Citations:
          [1] https://stackoverflow.com/questions/67233702/python-terminology-interface-vs-protocol
          [2] https://blog.glyph.im/2021/03/interfaces-and-protocols.html
          [3] https://sinavski.com/post/1_abc_vs_protocols/
          [4] https://testdriven.io/tips/9f452585-e673-4617-8f35-ac85ab413e14/
          [5] https://www.reddit.com/r/Python/comments/10ikape/interfaces_with_protocols_why_not_ditch_abc_for/
          [6] https://news.ycombinator.com/item?id=26488156
          [7] https://andrewbrookins.com/technology/building-implicit-interfaces-in-python-with-protocol-classes/
          [8] https://discuss.python.org/t/add-built-in-flatmap-function-to-functools/21137
        

      protocol has one or more method definitions, with … in their bodies. A type T is consistent-with a protocol P if T implements all the methods defined in P, with matching type signatures.

    • example

      the examples below use MyPy’s debugging facilities, take note.

      • typing.TYPE_CHECKING can be used to guard code that should only run during type checking, never at runtime

        • things like reveal_type() are Mypy debugging facilities, not regular functions.

        typing.TYPE_CHECKING constant is always False at runtime, but type checkers pretend it is True when they are type checking.

      • reveal_type() is a pseudofunction, a mypy debugging facility

        ``reveal_type() pseudofunction call, showing the inferred type of the argument.’’
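
      A small sketch of both facilities (half is a made-up function; reveal_type is left commented out because it only exists for the type checker):

      ```python
      from typing import TYPE_CHECKING

      if TYPE_CHECKING:
          # skipped at runtime, but type checkers treat it as True, so it's a cheap
          # place for imports that are needed only in annotations
          from decimal import Decimal

      def half(x: float) -> float:
          y = x / 2
          # reveal_type(y)  # un-comment when running Mypy: it reports the inferred type
          return y
      ```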

  • Callables via typing.Callable

    • allows us to hint the type of Higher Order Functions that are taking in callables

      parameterized like so: Callable[[ParamType1, ParamType2], ReturnType]

      The params list can have zero or more types.

    • if we need a type hint to match a function with a flexible signature, replace the whole parameter list with a ...

      Callable[..., ReturnType]

      other than that, there’s NO syntax to annotate optional or kwargs
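
      A sketch of annotating a higher-order function with Callable (repeat and shout are made-up names):

      ```python
      from typing import Callable

      def repeat(func: Callable[[str], str], text: str, times: int) -> str:
          # func must accept exactly one str argument and return a str
          result = text
          for _ in range(times):
              result = func(result)
          return result

      def shout(s: str) -> str:
          return s + '!'

      print(repeat(shout, 'hey', 2))   # hey!!

      FlexibleToStr = Callable[..., str]   # any signature, as long as it returns str
      ```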

    • Variance in Callable Types

      With generic type params, we now have to deal with type hierarchies and so we have to deal with type variance.

      KIV variance on Chapter 15

      • covariance

        example: Callable[[], int] is a subtype-of Callable[[], float] because int is a subtype of float

        \(\implies\) Callable is covariant on the return type because the subtype-of relationship of the types int and float points in the same direction as the relationship of the Callable types that use them as return types

      • most parameterized generic types are invariant
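
      A sketch of the covariance point above: a callable returning int is accepted where one returning float is expected (apply and make_int are made up):

      ```python
      from typing import Callable

      def apply(factory: Callable[[], float]) -> float:
          return factory()

      def make_int() -> int:
          return 42

      # fine for a static checker: Callable[[], int] is consistent-with
      # Callable[[], float], because int is consistent-with float
      print(apply(make_int))
      ```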

  • NoReturn via typing.NoReturn

    for functions that never return

    in practice it is used for functions that always raise instead of returning, e.g. sys.exit(), which raises SystemExit
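
    A minimal sketch (fatal is a made-up helper):

    ```python
    import sys
    from typing import NoReturn

    def fatal(msg: str) -> NoReturn:
        # never returns normally: sys.exit() raises SystemExit
        print(msg, file=sys.stderr)
        sys.exit(1)
    ```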

    • extra: typeshed-like stub files don’t define default values, so they use ... instead

      ``Stub files don’t spell out the default values, they use … instead.''

      :NOTER_PAGE: file:///Users/rtshkmr/org/future_vyapari/books/Luciano Ramalho - Fluent Python_ Clear, Concise, and Effective Programming-O’Reilly Media (2022).pdf :ID: ./Luciano Ramalho - Fluent Python_ Clear, Concise, and Effective Programming-O’Reilly Media (2022).pdf-annot-325-10

    • extra: the use of ellipsis operator ... is context-dependent

      The `...` operator in Python, known as the **ellipsis**, is a built-in singleton object of type `ellipsis`. Its role varies depending on context, and it’s often used where meaning can be ambiguous because Python itself does not mandate one specific use. Here are its main uses:
      
      1. **Placeholder for Incomplete Code**
         You can use `...` inside functions, classes, or other blocks to indicate "code not yet implemented" or "to be done later," similar to `pass`. For example:
         ```python
         def my_function():
             ...
         ```
         This works as a no-op placeholder, and some developers prefer it stylistically over `pass`.
      
      2. **Extended Slicing in Multi-Dimensional Arrays**
         In libraries like **NumPy**, `...` is very useful for slicing arrays with many dimensions. It stands for "expand all missing dimensions" in slice notation, allowing you to avoid explicitly typing many colons:
         ```python
         import numpy as np
         arr = np.arange(27).reshape(3, 3, 3)
         result = arr[..., 0]  # Equivalent to arr[:, :, 0]
         ```
         This slices all elements along all earlier dimensions and selects index 0 of the last dimension.
      
      3. **Unique Default or Sentinel Value**
         Sometimes `...` is used as a unique sentinel or default parameter value to distinguish "no argument provided" from `None` or other possible values:
         ```python
         def greet(name, greeting=...):
             if greeting is ...:
                 greeting = "Hello"
             print(f"{greeting}, {name}!")
         ```
      
      4. **Representation of Recursive or Circular References**
         When you print objects with circular references (e.g., a list containing itself), Python shows `[...]` in the output to avoid infinite recursion. This is a display convention, not an actual replacement of the object.
      
      5. **Type Hinting Uses**
         The ellipsis appears in some advanced type hinting scenarios, for example, to indicate variable-length tuples (`Tuple[int, ...]`) or unspecified arguments in callables.
      
      ### Summary Table of Ellipsis Roles
      
      | Role                                | Description & Example                         |
      |------------------------------------|-----------------------------------------------|
      | Placeholder for unimplemented code | Use `...` inside functions/classes instead of `pass` |
      | Extended slicing in multidimensional arrays | `arr[..., 0]` to slice across many dimensions |
      | Unique default/sentinel value      | Default function arguments different from None |
      | Circular reference representation  | Shown as `[...]` in printed recursive data to avoid infinite loops |
      | Type hinting                       | Used in tuple or callable annotations to indicate variable length |
      
      In essence, `...` is a versatile built-in object intended to serve as a **syntactic convenience and semantic marker** that gains meaning from context, especially in libraries like NumPy and in type hinting.
      
      If you are seeing it "everywhere" ambiguously, it’s because its meaning tends to be **context-dependent**, and it’s often used as a compact placeholder or slicing shorthand.
      
      Let me know if you want detailed examples or specific usage in any particular context like NumPy, type hints, or function placeholders.
      
      ---
      References:
      1. Scaler: Uses of Ellipsis in Python[1]
      2. Real Python: When to Use Ellipsis[2]
      3. Python Land: Python Ellipsis Explained[4]
      4. Explaining Ellipsis in Function Defaults and NumPy Slicing[3]
      
      [1] https://www.scaler.com/topics/python-ellipsis/
      [2] https://realpython.com/python-ellipsis/
      [3] https://www.kdnuggets.com/exploring-pythons-ellipsis-more-than-just-syntax-sugar
      [4] https://python.land/python-ellipsis
      [5] https://stackoverflow.com/questions/772124/what-does-the-ellipsis-object-do
      [6] https://www.reddit.com/r/learnpython/comments/12pqfz5/til_about_ellipses_in_python/
      [7] https://gist.github.com/promto-c/f51cc2c0eb8742ce5cc3e65601df2deb
      [8] https://www.geeksforgeeks.org/python/what-is-three-dots-or-ellipsis-in-python3/
      [9] https://mbizsoftware.com/to-what-purpose-does-a-python-ellipsis-perform/
      

Annotating Positional Only and Variadic Parameters

Consider this example:

from typing import Optional

def tag(
        name: str,
        /,  # marks the preceding arguments as positional-only
        *content: str,  # the arbitrary positional params
        class_: Optional[str] = None,
        **attrs: str,  # the variadic named kwargs
      ) -> str:
    ...

So what we see here is that:

  1. for the arbitrary positional params, it’s all fixed to str

  2. for the kwargs, it’s **attrs: <mytype>, where mytype is the type of the values and the keys will be str

Imperfect Typing and Strong Testing

Some limitations to the type hinting capabilities:

  • unsupported: useful things like argument unpacking

    handy features can’t be statically checked; for example, argument unpacking like config(**settings).

  • unsupported: advanced features like properties, descriptors, meta things

    properties, descriptors, metaclasses, and metaprogramming in general are poorly supported or beyond comprehension for type checkers.

  • since can’t hint data constraints, type hinting doesn’t help with correctness of business logic

    Common data constraints cannot be expressed in the type system—even simple ones. For example, type hints are unable to ensure “quantity must be an integer > 0” or “label must be a string with 6 to 12 ASCII letters.” In general, type hints are not helpful to catch errors in business logic.

  • conclusion - robustness of python codes comes mainly from quality unit-testing

    concluded: “If a Python program has adequate unit tests, it can be as robust as a C++, Java, or C# program with adequate unit tests (although the tests in Python will be faster to write).”

Chapter Summary

  • Protocol and how it enables static duck typing is useful because it’s very ‘pythonic’

    • in many ways, type hinting is very unPythonic, but typing.Protocol fits nicely.

    • protocols should be seen as the bridge between python’s duck-typed core and the nominal typing that allows static type checkers to catch bugs.

    3.8, Protocol is not widely used yet—but it is hugely important. Protocol enables static duck typing: the essential bridge between Python’s duck-typed core and the nominal typing that allows static type checkers to catch bugs.

  • using the term “generics” in python is kinda funny

    because it’s actually doing the opposite by tying-down exactly what the types can be instead of keeping it more “generic”

    Generics or Specifics? From a Python perspective, the typing usage of the term “generic” is backward. Common meanings of “generic” are “applicable to an entire class or group” or “without a brand name.” Consider list versus list[str]. The first is generic: it accepts any object. The second is specific: it only accepts str. The term makes sense in Java, though. Before Java 1.5, all Java collections (except the magic array) were “specific”: they could only hold Object references, so we had to cast the items that came out of a collection to

  • we can do runtime type-checking in python

    • for advanced versions of runtime typechecking (i.e. anything beyond doing guards for isinstance) we have to rely on some libraries

      See this

        Yes, you can perform runtime type checking in Python, and there are several ways to achieve this. Here’s a detailed overview based on the search results:
      
        ### Runtime Type Checking in Python
      
        1. **Dynamic Typing**: Python is a dynamically typed language, meaning that types are determined at runtime. This allows for flexibility but can lead to type-related errors if not managed properly.
      
        2. **Type Checking Tools**:
      ​     - **`isinstance()`**: The built-in function `isinstance()` can be used to check if an object is an instance of a specific class or a tuple of classes. However, it does not support checking against complex type annotations defined in the `typing` module.
             ```python
             x = 5
             if isinstance(x, int):
                 print("x is an integer")
             ```
      
        3. **TypeGuard Library**:
      ​     - Libraries like **TypeGuard** provide runtime type checking capabilities that can validate function arguments and return types based on type annotations.
      ​     - You can use the `@typechecked` decorator to automatically check types at runtime.
             ```python
             from typeguard import typechecked
      
             @typechecked
             def add(a: int, b: int) -> int:
                 return a + b
      
             add(1, 2)  # Works fine
             add(1, "2")  # Raises TypeError at runtime
             ```
      
        4. **Other Libraries**:
      ​     - **`runtime-type-checker`**: This library allows you to perform type checks at runtime using decorators or direct function calls like `check_type()`.
             ```python
             from runtime_type_checker import check_type
      
             check_type("hello", str)  # OK
             check_type(123, str)      # Raises TypeError
             ```
      
        5. **Pydantic and Marshmallow**: These libraries are often used for data validation and serialization, providing runtime checks based on type annotations.
      ​     - **Pydantic Example**:
             ```python
             from pydantic import BaseModel
      
             class User(BaseModel):
                 name: str
                 age: int
      
             user = User(name="Alice", age=30)  # Valid
             user = User(name="Alice", age="30")  # Raises validation error
             ```
      
        ### Summary of Key Points
      
        - **Type Annotations**: While Python supports type hints (introduced in PEP 484), they are primarily for static analysis tools like `mypy`. By default, Python does not enforce these types at runtime.
      ​  - **Runtime Type Checking Libraries**: To enforce type checks during execution, you can use libraries such as TypeGuard, runtime-type-checker, Pydantic, or Marshmallow.
      ​  - **Flexibility vs. Safety**: The dynamic nature of Python provides flexibility but may lead to errors that can be caught with proper type checking mechanisms.
      
        ### Conclusion
      
        In summary, while Python does not enforce type hints at runtime by default, you can implement runtime type checking using various libraries designed for this purpose. These tools help ensure that your code adheres to the expected types defined in your annotations, providing an additional layer of safety in your applications.
      
        Citations:
        [1] https://stackoverflow.com/questions/43646823/python-3-5-checking-type-annotation-at-runtime
        [2] https://github.com/agronholm/typeguard
        [3] https://pypi.org/project/runtime-type-checker/
        [4] https://testdriven.io/blog/python-type-checking/
        [5] https://discuss.python.org/t/proposal-for-enhancing-runtime-type-checking-in-python-using-pattern-matching-and-type-hints-in-function-definitions/59391
        [6] https://www.datacamp.com/tutorial/type-checking-in-python
        [7] https://www.infoworld.com/article/2268917/get-started-with-python-type-hints.html
        [8] https://realpython.com/python-type-checking/
      

    ``covers runtime type checking validation.''

Further Reading

:PROPERTIES: :NOTER_PAGE: (328 . 0.3852)

Chapter 9. Decorators and Closures

  • there’s a need to disambiguate the following:

    1. “decorators” in the context of design patterns vs

    2. “decorators” as a python source code syntax feature to do basic metaprogramming

    3. “decorators” in the context of source code that alters the syntax tree when walking and annotating it

      2 and 3 are related

      See elaboration here:

             Absolutely—the term “decorator” is overloaded in the programming lexicon, and disambiguating its different meanings is essential, especially for a technical audience steeped in first-principles thinking.
      
             ## 1. **Decorator as an Object-Oriented Design Pattern**
      
             **Definition:**
      ​       - The **Decorator Pattern** (from the Gang of Four design patterns) is a structural object-oriented pattern.
      ​       - **Mental Model:** It’s about composing behaviors at runtime by “wrapping” objects so you can add, remove, or alter responsibilities dynamically, without modifying the original object’s code.
      ​       - **Canonical Use:** You create a base interface, concrete components, and “decorator” classes implementing the same interface but containing an instance of the component (or another decorator). The decorator adds extra functionality before/after delegating calls to the wrapped object.
      
             **Example:**
             ```python
             class Coffee:
                 def cost(self):
                     return 5
      
             class MilkDecorator:
                 def __init__(self, coffee):
                     self._coffee = coffee
      
                 def cost(self):
                     return self._coffee.cost() + 2  # adds cost of milk
      
             coffee = Coffee()
             milk_coffee = MilkDecorator(coffee)
             print(milk_coffee.cost())  # Output: 7
             ```
             **Key Points:**
      ​       - The object’s interface remains constant.
      ​       - You can layer (nest) decorators to compose features.
      ​       - Pattern is about **runtime behavior extension** via object composition.
      
             ## 2. **Decorator as a Python Source Code Syntax Feature (Function/Method/Class Annotation)**
      
             **Definition:**
      ​       - In Python, a *decorator* is language syntax (the `@decorator` notation) that allows alteration or augmentation of functions, methods, or classes.
      ​       - **Mental Model:** This is a form of metaprogramming: you take a callable (“the decorated”) and replace it with the result of passing it to a decorator function (or class). The original entity is replaced—effectively rewiring its meaning at definition time.
      ​       - This mechanism operates during **definition time**—when the source code is executed—and can be used for logging, access control, transformation, memoization, etc.
      
             **Example:**
             ```python
             def my_decorator(func):
                 def wrapper(*args, **kwargs):
                     print("Before")
                     return func(*args, **kwargs)
                 return wrapper
      
             @my_decorator
             def greet():
                 print("Hello")
             ```
      ​       - At load-time, `greet = my_decorator(greet)`.
      
             **Key Points:**
      ​       - Not limited to adding or layering behavior: you can totally replace functionality, register, introspect, or annotate.
      ​       - Strongly linked to Python’s first-class, runtime-accessible function objects.
      ​       - Used pervasively in frameworks (Flask, Django, pytest, etc.).
      
             ## 3. **Decorator as an Abstract Syntax Tree (AST) Transformer**
      
             **Definition:**
      ​       - In advanced metaprogramming scenarios, *decorator* can also refer to code that inspects and modifies the **AST itself**—that is, the parse tree Python generates before executing code.
      ​       - **Mental Model:** These are meta-tools (not the `@decorator` feature) that programmatically traverse, annotate, or even rewrite code during, for example, static analysis, linting, or source-to-source transformation.
      ​       - Examples include tools like `ast` module in Python’s standard library, or third-party packages for code analysis, type-checking, or metaclass generation.
      
             **Example (AST transformation):**
             ```python
             import ast
      
             class FunctionAnnotator(ast.NodeTransformer):
                 def visit_FunctionDef(self, node):
                     # Insert or rewrite a decorator in the AST node
                     ...
                     return node
      
             tree = ast.parse(open('somefile.py').read())
             # Walk/transform the tree
             FunctionAnnotator().visit(tree)
             ```
             **Key Points:**
      ​       - This usage of “decorator” is about source-level transformations, not runtime behavior or object composition.
      ​       - Such tools might annotate or rewrite code to reflect design-time properties or enforce invariants.
      ​       - Operates at a different abstraction layer: **syntax tree instead of runtime objects**.
      
             ## **Disambiguating the Terms**
      
             - **Decorator Pattern (OOP):**
             Object composition to add functionality via wrappers, used in class/instance structures. “Decorator” is a *pattern* in the sense of design, independent of language syntax.
      ​       - **Python Decorator Syntax:**
             Metaprogramming at definition time (`@decorator`), altering functions/methods through higher-order functions or callables. This is a *language feature* deeply integrated into Python’s runtime and loading mechanisms.
      ​       - **AST/Syntax Tree Decorator/Transformer:**
             Tools that mutate or annotate the language’s syntax tree itself, often before code ever runs. This is a *meta-level code transformation* technique, not a runtime object operation.
      
             ## **Mental Model and Practical Takeaway**
      
             - Whenever we encounter the word “decorator,” first clarify: **Are we talking about a design pattern, a Python syntactic feature, or code transformation tools working with the syntax tree?**
      ​       - For a **tech leader**, it’s crucial to recognize **the same term can encode different mental models**, and context defines which conceptual layer we’re operating in—object structure, runtime function composition, or compile-time/source-level transformation.
      
             > **In summary:**
             > - The term **“decorator”** in software can mean (1) a design pattern for dynamic behavioral extension, (2) the Python syntax feature for code transformation at definition time, or (3) a concept/tool for manipulating code at the syntax-tree level. These meanings are distinct in logic, purpose, and implementation, and using their precise terminology and mental models prevents architectural ambiguities in code review, design discussions, and tooling selection.
      
             **References:**
             1. Gamma, Helm, Johnson, Vlissides. “Design Patterns: Elements of Reusable Object-Oriented Software.”
             2. Python Language Reference, section on function definitions and decorators.
             3. Python `ast` module documentation and real-world linter/analysis tools (e.g., Black, mypy internals).
      
  • closures are a foundational concept that help us reason about:

    • how to write decorators and use the nonlocal keyword

    • programming that uses callbacks, or when writing python code in a functional fashion.

precursor learning objectives

• How Python evaluates decorator syntax
• How Python decides whether a variable is local
• Why closures exist and how they work
• What problem is solved by nonlocal

With this grounding, we can tackle further decorator topics:

• Implementing a well-behaved decorator
• Powerful decorators in the standard library: @cache, @lru_cache, and @singledispatch
• Implementing a parameterized decorator

What’s New in This Chapter

Decorators 101

  • definition: “callable that takes another function as argument”

    • 3 facts to summarise it:

      1. a decorator is a function or another callable
      2. a decorator may replace the decorated function with a different one
      3. decorators are executed IMMEDIATELY when a module is LOADED (@ load-time)
    • When you “decorate” a function in Python (either with the @decorator syntax or by manual assignment), you are:

      • Passing the original function object to a decorator callable (function or class).

      • Receiving back the return value of the decorator, which is typically—but not always—an inner function that wraps (or sometimes replaces) the original.

      • Binding a variable (often the same name as the original function) to this new object.

    • class decorators also exist!
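
    i.e., the @ syntax is just shorthand for that rebinding; a toy sketch (shouter and greet are made up):

    ```python
    def shouter(func):
        def wrapper():
            return func().upper()
        return wrapper

    @shouter
    def greet():
        return "hello"

    # the decoration above is equivalent to:
    #   greet = shouter(greet)
    print(greet())  # HELLO
    ```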

When Python Executes Decorators - import-time vs run-time

  • import-time vs runtime: “import time” refers to when the module is loaded.
    • within a script \(\rightarrow\) when the script starts
    • within a different module \(\rightarrow\) when the importing is actually done

to emphasize that function decorators are executed as soon as the module is imported, but the decorated functions only run when they are explicitly invoked. This highlights the difference between what Pythonistas call import time and runtime.

Registration Decorators

  • typically decorators define an inner function that uses the decorated function and return that inner function.

    this in a way, supports the GOF decorator pattern as well
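
    For the registration case specifically, the decorator usually returns the function unchanged and just records it somewhere; a sketch loosely after the book's registration.py (reconstructed from memory):

    ```python
    registry = []

    def register(func):
        # runs at import time: records the function and returns it unchanged
        registry.append(func)
        return func

    @register
    def f1():
        print('running f1()')

    @register
    def f2():
        print('running f2()')

    print(registry)   # both functions are registered before either is ever called
    ```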

Variable Scope Rules

this is a fundamental concept:

  • compare with JS, if we have a variable defined local to the scope of a function,

    then in python, it will always assume that it’s a local variable and it won’t look further in the outer scopes to resolve that name.

    in JS, the name resolution may go to higher scopes and we might potentially modify a global variable unknowingly

    in python, refs to a locally defined name will never bubble out to a larger scope

    But the fact is, when Python compiles the body of the function, it decides that b is a local variable because it is assigned within the function. The generated bytecode reflects this decision and will try to fetch b from the local scope. Later, when the call f2(3) is made, the body of f2 fetches and prints the value of the local variable a, but when trying to fetch the value of local variable b, it discovers that b is unbound.
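
    The f2 mentioned in that passage looks roughly like this (reconstructed sketch): the mere presence of the assignment makes b local for the whole function body, so the earlier print fails:

    ```python
    b = 6

    def f2(a):
        print(a)
        print(b)   # UnboundLocalError: the assignment below makes b local to f2
        b = 9

    f2(3)
    ```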

  • so in this example, we have 3 scopes in action actually:

    1. module global scope

    2. the f3 function’s scope

    3. the nonlocal scope

      though this part is not super obvious yet

             b = 6
             def f3(a):
                 global b
                 print(a)
                 print(b)
      
                 b = 9
      
             print(f3(3))
      
             print(b)
      
  • the nature of the variable (local vs. global) can’t change in the body of a function

Closures

  • summary:

    a closure is a function that retains the bindings of the free variables that exist when the function is defined, so that they can be used later when the function is invoked and the defining scope is no longer available.

    the only situation in which a function may need to deal with external variables that are nonglobal is when it is nested in another function and those variables are part of the local scope of the outer function.

  • a closure is a function—let’s call it f—with an extended scope that encompasses variables referenced in the body of f that are not global variables or local variables of f.

    Such variables must come from the local scope of an outer function that encompasses f.

    It does not matter whether the function is anonymous or not; what matters is that it can access nonglobal variables that are defined outside of its body.

  • disambiguating closures from anon functions

    Actually, a closure is a function—let’s call it f—with an extended scope that encompasses variables referenced in the body of f that are not global variables or local variables of f. Such variables must come from the local scope of an outer function that encompasses f. It does not matter whether the function is anonymous or not; what matters is that it can access nonglobal variables that are defined outside of its body.

    • refer to this for the verbose disambiguation between closures and anon functions
        To disambiguate **closures** and **anonymous functions** in programming languages, particularly in the context of Python, we need to clarify their definitions, characteristics, and how they relate to each other.
      
        ### Definitions
      
        1. **Anonymous Functions**:
      ​     - An **anonymous function** (often referred to as a **lambda function** in Python) is simply a function that does not have a name. It is defined using the `lambda` keyword and can be used wherever function objects are required.
      ​     - Example in Python:
             ```python
             add = lambda x, y: x + y
             print(add(2, 3))  # Output: 5
             ```
      ​     - Anonymous functions are useful for short, throwaway functions that are not reused elsewhere.
      
        2. **Closures**:
      ​     - A **closure** is a function that captures the lexical scope in which it was defined, allowing it to access variables from that scope even when it is executed outside of that scope.
      ​     - In Python, closures are typically created when a nested function references variables from its enclosing scope.
      ​     - Example in Python:
             ```python
             def outer_function(x):
                 def inner_function(y):
                     return x + y  # 'x' is captured from the outer scope
                 return inner_function
      
             closure = outer_function(10)
             print(closure(5))  # Output: 15
             ```
      ​     - Here, `inner_function` forms a closure that captures the variable `x` from `outer_function`.
      
        ### Key Differences
      
        1. **Nature**:
      ​     - **Anonymous Function**: A function without a name; it can be defined and used inline.
      ​     - **Closure**: A function that retains access to its lexical scope even when invoked outside of that scope.
      
        2. **Creation**:
      ​     - **Anonymous Function**: Can be created anywhere, typically for short-lived operations (e.g., as arguments to higher-order functions).
      ​     - **Closure**: Created when a nested function references variables from its enclosing scope.
      
        3. **Scope Access**:
      ​     - **Anonymous Function**: Does not inherently capture any external state unless it is defined within another function.
      ​     - **Closure**: Specifically designed to capture and retain access to its surrounding state.
      
        4. **Use Cases**:
      ​     - **Anonymous Functions**: Often used for callbacks or functional programming constructs (like `map`, `filter`, etc.).
      ​     - **Closures**: Useful for data hiding and encapsulation, maintaining state across function calls.
      
        ### Python Context
      
        In Python, both anonymous functions and closures are supported:
      
        - **Anonymous Functions (Lambdas)**:
          ```python
          square = lambda x: x * x
          print(square(4))  # Output: 16
          ```
      
        - **Closures**:
          ```python
          def make_multiplier(factor):
              def multiplier(x):
                  return x * factor  # 'factor' is captured from the outer scope
              return multiplier
      
          double = make_multiplier(2)
          print(double(5))  # Output: 10
          ```
      
        ### Summary
      
  - A closure is not necessarily anonymous (named nested functions form closures too), and not every anonymous function creates a closure.
  - The distinction lies in the ability of closures to capture and maintain access to their lexical environment, which anonymous functions do not inherently possess unless they are part of a nested structure.
  - In Python, this distinction allows for powerful programming patterns such as decorators and higher-order functions while maintaining clean and concise code.
      
        By understanding these concepts, developers can effectively utilize both anonymous functions and closures in their programming practices, leveraging their unique characteristics for various scenarios.
      
      
  • free variables: variables that are not bound in the local scope

    Consider this example:

    def make_averager():
        series = []

        def averager(new_value):
            # series is a free variable here: it is never assigned inside averager,
            # only mutated via .append, so no local binding is created
            series.append(new_value)
            total = sum(series)
            return total / len(series)

        return averager
    

    so avg = make_averager() is a reference to the inner function averager, which has the outer series in its scope.

    within make_averager, series was originally a local variable because of the assignment series = [] within its function body

    however after make_averager returns, that local scope is gone.

    within averager, series is a free variable, a variable that is NOT bound in the local scope.

    the closure for averager extends the scope of that function to include the binding for the free variable, series

    Figure 9-1. The closure for averager extends the scope of that function to include the binding for the free variable series.

    Inspecting the returned averager object shows how Python keeps the names of local and free variables in the __code__ attribute that represents the compiled body of the function (Example 9-10, inspecting the function created by make_averager in Example 9-8):

        >>> avg.__code__.co_varnames
        ('new_value', 'total')
        >>> avg.__code__.co_freevars
        ('series',)

  • how free variables are kept within __closure__ attributes

    :NOTER_PAGE: (344 0.5105177993527509 . 0.33121019108280253)

    • closures are actually represented using cell items / attrs

      avg.__closure__[0].cell_contents

    • alternatively, we could inspect the __code__ for that variable, just to observe this.

      avg.__code__.co_freevars which gives the free variables for that function

    The binding for series is kept in the __closure__ attribute of the returned function avg. Each item in avg.__closure__ corresponds to a name in avg.__code__.co_freevars. These items are cells, and they have an attribute called cell_contents where the actual value can be found.
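
    A small runnable sketch putting the above together (the combination is mine; the printed values assume the three calls shown):

        def make_averager():
            series = []

            def averager(new_value):
                series.append(new_value)
                total = sum(series)
                return total / len(series)

            return averager

        avg = make_averager()
        avg(10); avg(11); avg(12)

        print(avg.__code__.co_varnames)          # ('new_value', 'total'): local names
        print(avg.__code__.co_freevars)          # ('series',): free variable names
        print(avg.__closure__[0].cell_contents)  # [10, 11, 12]: the value held in the cell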

  • the only situation in which a function may require access to nonglobal variables outside its scope

    …the only situation in which a function may need to deal with external variables that are nonglobal is when it is nested in another function and those variables are part of the local scope of the outer function.

The nonlocal Declaration

  • using nonlocal keyword to explicitly define free variables

    • lets you declare a variable as a free variable even when it is assigned within the function. If a new value is assigned to a nonlocal variable, the binding stored in the closure is changed.

    • prevents the “local variable referenced before assignment” error (UnboundLocalError), because the variable is regarded as a free variable once it is declared nonlocal

    To work around this, the nonlocal keyword was introduced in Python 3. It lets you declare a variable as a free variable even when it is assigned within the function. If a new value is assigned to a nonlocal variable, the binding stored in the closure is changed.
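
    A minimal sketch along the lines of the book’s leaner averager, showing why nonlocal is needed when we rebind the free variables:

        def make_averager():
            count = 0
            total = 0

            def averager(new_value):
                # without nonlocal, the += below would make count/total local
                # and raise UnboundLocalError on first use
                nonlocal count, total
                count += 1
                total += new_value
                return total / count

            return averager

        avg = make_averager()
        print(avg(10), avg(11))  # 10.0 10.5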

Variable Lookup Logic - how python resolves names

  • Rules for how Python fetches a variable x that appears in a function (see the sketch after this list):
    1. if we have a global x declaration, then x comes from and is assigned to the x global variable of the module

      NOTE: python doesn’t have a program global scope, only module global scopes

    2. if we have a nonlocal x declaration then x comes from and is assigned to the x local variable of the nearest surrounding function where x is defined

    3. If x is a parameter, or is assigned a value in the function body, then x is a local variable

    4. If x is referenced (read only), but not assigned (no modification) and is NOT a parameter:

      • x is looked up in the local scopes of the surrounding function bodies (nonlocal scopes) \(\rightarrow\) free variable!

      • if nothing, then will be read from the module global scope

      • if still nothing, then read from __builtins__.__dict__
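
    A tiny sketch of these rules in action (the names here are only illustrative):

        b = 6  # module-global

        def f():
            a = 3          # rule 3: assigned in the body, so a is local
            print(a, b)    # rule 4: b is never assigned here, so it is read from the module global scope

        def g():
            b = 0                # local to g
            def h():
                nonlocal b       # rule 2: b refers to g's local b
                b += 1
                return b
            return h

        f()           # 3 6
        print(g()())  # 1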

:NOTER_PAGE: (346 . 0.496385)

Implementing a Simple Decorator

  • typical behavior of a decorator: it replaces the decorated function with a new function that accepts the same arguments and (usually) returns whatever the decorated function was supposed to return, while also doing some extra processing.

  • this is partly aligned with the GOF Decorator Pattern in OOP.

    In both cases, we extend behaviour, but the way we do it is different.

      Your summary of a typical Python decorator—that it “replaces the decorated function with a new function that accepts the same arguments and (usually) returns whatever the decorated function was supposed to return, while also doing some extra processing”—captures how decorators are commonly used in Python. However, this pattern only partly aligns with the **Gang of Four (GOF) Decorator Pattern** as described in object-oriented design.
    
  ### GOF Decorator Pattern (Design Pattern)
  - The **GOF Decorator Pattern** is about **dynamically attaching new responsibilities to objects** at runtime by wrapping them in decorator objects[2][7].
  - In the traditional pattern, decorators are classes that implement the same interface as the objects they wrap. Additional behavior is layered by composition, and you can recursively chain decorators together[2][3][5].
  - The emphasis is on **object composition** and runtime extension, not on function replacement or metaprogramming.
    
  ### Python Decorators (Language Feature)
  - **Python decorators** (via the `@decorator` syntax) are a **language feature** for transforming functions or classes at definition time[1]. A Python decorator is usually a higher-order function or class that takes a function (or class) and returns a replacement, often a function that wraps the original, accepting the same parameters, returning the same output, and adding pre- or post-processing.
  - This approach is similar in spirit to the design pattern because it allows extension or modification of behavior without changing the original function directly[1]. The decorator often "wraps" the original so that calling the decorated function results in the new ("decorator") logic being executed, often around a call to the original.
    
  ### Key Distinctions
  - The **GOF pattern operates at the object/class level**, using composition and polymorphism.
  - The **Python feature generally operates at the function (or class) level**, using higher-order functions for metaprogramming and code transformation at definition time.
  - Both systems enable behavioral extension, and Python decorators can be used to implement the decorator pattern, but they're not always identical in mechanics or intention[1][5].
    
  ### In summary:
  - Your description of the typical behavior matches how Python decorators are commonly implemented and used.
  - This overlap is **inspired by the GOF Decorator Pattern**; the intent, to extend or alter behavior without modifying the original, is present in both.
  - However, **Python's decorator syntax and idioms are a language feature enabling, but not limited to, the classical design pattern**. The mechanics and use cases often go beyond, including metadata registration, memoization, method validation, and many scenarios not contemplated by the original OOP design pattern[1][3][7].
    
      So: **your summary matches common Python decorator behavior, which parallels the GOF Decorator Pattern but is not restricted to it—they share intent, but not all implementation details**[1][2][5].
    
      [1] https://stackoverflow.com/questions/17927129/is-python-decorator-related-to-the-decorator-design-pattern
      [2] https://www.digitalocean.com/community/tutorials/gangs-of-four-gof-design-patterns
      [3] https://refactoring.guru/design-patterns/decorator
      [4] https://www.reddit.com/r/Python/comments/r0apia/which_of_the_design_patterns_of_the_og_gang_of/
      [5] https://python-patterns.guide/gang-of-four/decorator-pattern/
      [6] https://www.geeksforgeeks.org/system-design/decorator-pattern/
      [7] https://en.wikipedia.org/wiki/Decorator_pattern
      [8] https://github.com/tuvo1106/python_design_patterns
    
  • TIP: using @functools.wraps() copies the decorated function’s metadata (__name__, __doc__, etc.) to the wrapper; combined with *args/**kwargs in the wrapper, the decoration preserves the docs and handles keyword arguments correctly

    this is an example of a standard, ready-to-use decorator

``def clock(func): @functools.wraps(func) def clocked(*args, **kwargs):''
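
A runnable completion of that fragment, following the book’s pattern (the exact output format below is my own choice):

    import functools
    import time

    def clock(func):
        @functools.wraps(func)  # copies __name__, __doc__, etc. from func to clocked
        def clocked(*args, **kwargs):
            t0 = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - t0
            arg_lst = [repr(arg) for arg in args]
            arg_lst += [f'{k}={v!r}' for k, v in kwargs.items()]
            print(f'[{elapsed:0.8f}s] {func.__name__}({", ".join(arg_lst)}) -> {result!r}')
            return result
        return clocked

    @clock
    def snooze(seconds):
        time.sleep(seconds)

    snooze(0.1)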

Decorators in the Standard Library

  • Memoization with functools.cache

    • stacking decorators composes them: the decorator closest to the function (bottom of the stack) is applied first, then each one above it wraps the result, outwards towards the top of the stack

        @alpha
        @beta
        def my_fn():
            ...
      

      which is equivalent to nested functions my_fn = alpha(beta(my_fn))

    • args to the decorated function needs to be hashable

      this is because the internal representation for the lru_cache is actually a dict and the keys are from the positional and kwargs used in the calls.

      arguments taken by the decorated function must be hashable, because the underlying lru_cache uses a dict to store the results, and the keys are made from the positional and keyword arguments used in the calls.

    • a good usecase for @cache is caching remote API calls!

    • careful that we don’t end up using all the available memory ( may happen since this cache is unrestricted ). we can use functools.lru_cache(maxsize=<mymaxsize>) instead

  • Using @lru_cache to keep the memory bounded

    • maxsize defaults to 128, i.e. at most 128 entries are kept in the cache
    • for optimal performance, the maxsize parameter should be a power of 2
    • typed parameter, if true, discriminates argument type within its store, so 1.0 and 1 are treated as two different entries in the store.
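
    A small sketch of both decorators (the fibonacci function mirrors the book’s classic memoization demo; the maxsize/typed values are just illustrative):

        import functools

        @functools.cache                 # unbounded cache; fine for small, hashable argument spaces
        def fibonacci(n):
            if n < 2:
                return n
            return fibonacci(n - 2) + fibonacci(n - 1)

        @functools.lru_cache(maxsize=128, typed=True)   # bounded; typed=True caches 1 and 1.0 separately
        def squared(x):
            return x * x

        print(fibonacci(30))             # fast, thanks to memoization
        print(squared(1), squared(1.0))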
  • Single Dispatch to Contribute to Generic Functions (aka Multimethods)

    • python doesn’t have method overloading (like in Java, for example)

    • in this example for htmlize, with multiple rules, we can:

      • let htmlize act as a dispatcher that dispatches multiple specific functions e.g. htmlize_str and so on
    • so, functools.singledispatch decorator:

      • allows different modules to contribute to the overall solution, and lets you easily provide specialized functions even for types that belong to third-party packages that you can’t edit.

      • a generic function here is a group of functions to perform the same operation in different ways, depending on the type of the first argument

      • it’s singledispatch because only the first argument is used as the determinant, else it would have been called “multiple dispatch”

      • it’s going to use a custom @<generic_fn_name>.register to do the bundling of the specific functions

    • implementing it:

      • the function that we apply the @singledispatch decorator to will end up being the name of the generic function that we want to bundle functionality for. Also, the base case implementation (e.g. something that works with a generic Object type), will be found there.

      • for the other specialised functions (contributors to the generic function) , we can use the custom @<generic_fn>.register to register them

        • the specialised function

        • only 1st argument matters

        • we can define them at varying levels of class-specificity

          e.g. bool is a subtype-of numbers.Integral, but the singledispatch logic seeks the implementation with the most specific matching type, regardless of the order they appear in the code.

        • we can also stack other decorators atop the register decorator because the @<base>.register decorator returns the undecorated function.

      • TIP: register specialised functions to handle ABCs or Protocols to enjoy more polymorphic behaviour

        Using ABCs or typing.Protocol with @singledispatch allows your code to support existing or future classes that are actual or virtual subclasses of those ABCs, or that implement those protocols.

        KIV virtual subclasses till chapter 13

      • remember that we can register functions that works with 3rd party libraries!

    this is analogous to function overloading in typed languages

    here’s the functools docs for it

    • NOTE it’s NOT supposed to be a parallel to java method overloading, it’s to support MODULAR EXTENSION

      • modular extension in the sense that each module can register a specialized function for each type it supports.

      @singledispatch is not designed to bring Java-style method overloading to Python. A single class with many overloaded variations of a method is better than a single function with a lengthy stretch of if/elif/elif/elif blocks. But both solutions are flawed because they concentrate too much responsibility in a single code unit—the class or the function. The advantage of @singledispatch is supporting modular extension: each module can register a specialized function for each type it supports. In a realistic use case, you would not have all the implementations of generic functions in the same module as in…
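
    A minimal sketch of the idea (a much-simplified htmlize; the book’s version handles more types and uses html.escape):

        from functools import singledispatch
        from collections import abc

        @singledispatch                      # base case: the most generic implementation
        def htmlize(obj: object) -> str:
            return f'<pre>{obj!r}</pre>'

        @htmlize.register                    # specialization chosen by the type of the first argument
        def _(text: str) -> str:
            return f'<p>{text}</p>'

        @htmlize.register                    # registering an ABC covers lists, tuples, etc.
        def _(seq: abc.Sequence) -> str:
            inner = '</li>\n<li>'.join(htmlize(item) for item in seq)
            return '<ul>\n<li>' + inner + '</li>\n</ul>'

        print(htmlize(42))        # falls back to the object base case
        print(htmlize('hello'))   # str wins over Sequence because it is more specific
        print(htmlize([1, 2]))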

Parameterized Decorators

  • Key Idea: use a decorator factory to create the decorator

  registry = set()

  # this here is a decorator factory
  def register(active=True):
      def decorate(func):
          print('running register'
                f'(active={active})->decorate({func})')
          if active:
              registry.add(func)
          else:
              registry.discard(func)
          return func
      return decorate

  @register(active=False)
  def f1():
      print('running f1()')

  @register()
  def f2():
      print('running f2()')

  def f3():
      print('running f3()')
    
  • NOTE: we can’t just call @name anymore, we have to actually call that factory function

    read more here:

      It is correct to say that a **decorator factory** in Python must always be called using `@name()` (with parentheses), because it is a function that returns a decorator. The result of calling the factory—i.e., `name()`—must itself be a callable that accepts the function being decorated, and only this returned callable is applied as the actual decorator.
    
  - Using `@name` is for *simple decorators*: the decorator is itself a callable that takes a function as input (e.g., `def decorator(fn): ...`), and is applied as `@decorator`.
  - A **decorator factory** is a function returning a decorator, often parameterized (e.g., `def decorator_factory(arg): ...`). You must invoke the factory, `@decorator_factory(args)`, so that what is returned will accept and wrap the underlying function[2][3][5].
    
      ### Does `@name` Ever Act as a Nullary Call?
    
      No, `@name` does **not** cause an implicit function call. The interpreter does *not* call `name()` for you. `@name` without parentheses applies exactly what’s bound to `name` as a decorator; it is not treated as `@name()`. This makes explicit invocation with parentheses (`@name()`) necessary if you intend to use a decorator factory, even with no arguments[2][3].
    
      - If you accidentally use `@name` (no parentheses) when `name` is actually a decorator factory, you will get a `TypeError` because the decorator factory is not the correct type—it hasn’t yet received its arguments and it doesn’t accept the function to be decorated as its first input.
    
      ### Example:
    
      ```python
      def decorator_factory(arg):
          def decorator(fn):
              ...
              return fn
          return decorator
    
      @decorator_factory(42)  # correct usage: applies the decorator returned from factory
      def foo(): ...
      ```
      If you write `@decorator_factory` (no parentheses) above, `foo` is actually passed to `decorator_factory`, not to the decorator it returns, and the protocol is broken.
    
      ### Summary Table
    
      | Decorator Syntax    | Used For              | Calls Made Implicitly  |
      |---------------------|-----------------------|------------------------|
      | `@name`             | Simple decorator      | No call (just binding) |
      | `@name()`           | Decorator factory     | Calls factory, supplies result as decorator |
      | `@name(arg1, arg2)` | Decorator factory     | Calls factory with arguments |
    
  In conclusion:
  - **Decorator factories must be used as `@name()`** (or with arguments: `@name(arg, ...)`).
  - **`@name` does NOT act as a nullary call** and does not call the factory.
  - Using `@name` when only a decorator factory is defined will result in errors or unexpected behavior[2][3][5].
    
      [1] https://stackoverflow.com/questions/44358027/why-is-using-decorator-factory-factory-functions-a-syntax-error
      [2] https://www.thepythoncodingstack.com/p/demystifying-decorators-parts-3-to-7
      [3] https://blog.devgenius.io/decorator-factory-by-example-e3f2774b0baa
      [4] https://www.geeksforgeeks.org/python/decorators-in-python/
      [5] https://realpython.com/primer-on-python-decorators/
      [6] https://www.freecodecamp.org/news/the-python-decorator-handbook/
      [7] http://simeonfranklin.com/blog/2012/jul/1/python-decorators-in-12-steps/
    
  • possible to call the decorator factory function directly too without the @

    • though we would have to call it like so: register(active=False)(f)

    If, instead of using the @ syntax, we used register as a regular function, the syntax needed to decorate a function f would be register()(f) to add f to the registry, or register(active=False)(f) to not add it (or remove it).

  • The Parameterized Clock Decorator

    import time
    
    DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'
    
    def clock(fmt=DEFAULT_FMT):
            def decorate(func):
                    def clocked(*_args):
                            t0 = time.perf_counter()
                            _result = func(*_args)
                            elapsed = time.perf_counter() - t0
                            name = func.__name__
                            args = ', '.join(repr(arg) for arg in _args)
                            result = repr(_result)
                            print(fmt.format(**locals())) # NB: see the locals passing like it's done here
                            return _result
                    return clocked
            return decorate
    
    
    if __name__ == '__main__':
            @clock()
            def snooze(seconds):
                    time.sleep(seconds)
    
    • we can feed all the local variables to the format string like so: print(fmt.format(**locals()))
  • A Class-Based Clock Decorator

    • these examples just used functions as a demo example
    • for non-trivial cases, better to think of decorators as classes with a __call__() method

    ``implemented as a class with __call__''

    import time
    
    DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'
    
    class clock:
            def __init__(self, fmt=DEFAULT_FMT):
                    self.fmt = fmt
            def __call__(self, func):
                    def clocked(*_args):
                            t0 = time.perf_counter()
                            _result = func(*_args)
                            elapsed = time.perf_counter() - t0
                            name = func.__name__
                            args = ', '.join(repr(arg) for arg in _args)
                            result = repr(_result)
                            print(self.fmt.format(**locals()))
                            return _result
                    return clocked
    
    1. the clock class here is our parameterized decorator factory

    KIV class decorators for chapter 24.

Chapter Summary

Further Reading

  1. general series on python decorators, starting with “how you implemented your Python decorator is wrong”

dynamic vs lexical scoping

  • lexical scope is the norm: free variables are evaluated considering the environment where the function is defined. Lexical scope complicates the implementation of languages with first-class functions, because it requires the support of closures. On the other hand, lexical scope makes source code easier to read.

``Dynamic Scope Versus Lexical Scope''

coming to terms b/w decorator pattern and python decorators

Note that I am not suggesting that function decorators should be used to implement the decorator pattern in Python programs. Although this can be done in specific sit‐ uations, in general the decorator pattern is best implemented with classes to represent the decorator and the components it will wrap.

Chapter 10. Design Patterns with First-Class Functions

  • design pattern: general recipe for solving common design problems

  • language independent; however practically some languages already have inbuilt recipes for some of these patterns

    e.g. Generators in python are the inbuilt version of the Iterator pattern.

  • in the context of languages that support first-class functions, the fact that we can leverage functions as first-class objects is useful to make code simpler.

    the following classic patterns might need a rethink because functions can do the same work as classes while improving readability and reducing boilerplate:

    1. Strategy Pattern

    2. Command Pattern

    3. Template Method

    4. Visitor Pattern

What’s New in This Chapter

Case Study: Refactoring Strategy

  • the objective of this case study is to see how we can leverage functions as first-class objects.
  • Classic Strategy

    • what it is:
      • “Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it.”
      • participants:
        1. Context

          • Provides a service by delegating some computation to interchangeable components that implement alternative algorithms.
          • in the example, this is the Order
        2. Strategy

          • interface common to the components that implement the different algorithms.

            it’s the Promotion abstract class

        3. Concrete Strategy

          • one of the concrete classes that implement the abstract class
  • Function-Oriented Strategy

    • useful characteristics from the class-based implementation:

      1. concrete strategies have a single useful method

      2. strategy instances are stateless and hold no instance attributes

      It’s because of these reasons, we can consider replacing the concrete strategies with simple functions, and removing the abstract class.

    • “strategy objects often make good flyweights” is the advice: a drawback of the Strategy pattern, the runtime cost of repeatedly instantiating strategy objects, is addressed by sharing them as flyweights.

      now we end up getting more boilerplate

    • the python way of using first class functions works well in general because:

      • in most cases, concrete strategies don’t need to hold internal state because they deal with data injected by the context \(\implies\) good enough to use plain old functions.

      • a function is more lightweight than an instance of a user-defined class + we can just create each function once and use it.

  • Choosing the Best Strategy using MetaStrategy: Simple Approach

    • Once you get used to the idea that functions are first-class objects, it naturally follows that building data structures holding functions often makes sense.
  • Finding Strategies in a Module

    • Modules are also first-class objects

    • globals() returns the current module’s global symbol table. Alternatively, we can inspect a separate promotions module and collect the functions defined in it like so: promos = [func for _, func in inspect.getmembers(promotions, inspect.isfunction)]

    • I see this as a “pull method” almost where we try to pull together attributes that might make sense. Naturally a registration decorator approach makes more sense already so that we can do a “pull method” approach

Decorator-Enhanced Strategy Pattern

Here’s the example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# Callable, Decimal, and Order (with LineItem) are imported/defined in the earlier parts of the book's full example
Promotion = Callable[[Order], Decimal]

promos: list[Promotion] = [] # NB: this is a module global

# this is a registration decorator, it simply registers this function
def promotion(promo: Promotion) -> Promotion:
        promos.append(promo)
        return promo

def best_promo(order: Order) -> Decimal:
        """Compute the best discount available"""
        return max(promo(order) for promo in promos)

@promotion
def fidelity(order: Order) -> Decimal:
        """5% discount for customers with 1000 or more fidelity points"""
        if order.customer.fidelity >= 1000:
                return order.total() * Decimal('0.05')
        return Decimal(0)

@promotion
def bulk_item(order: Order) -> Decimal:
        """10% discount for each LineItem with 20 or more units"""
        discount = Decimal(0)
        for item in order.cart:
                if item.quantity >= 20:
                        discount += item.total() * Decimal('0.1')
        return discount

@promotion
def large_order(order: Order) -> Decimal:
        """7% discount for orders with 10 or more distinct items"""
        distinct_items = {item.product for item in order.cart}
        if len(distinct_items) >= 10:
                return order.total() * Decimal('0.07')
        return Decimal(0)

Advantages of using this decorator:

  1. promo strategy functions don’t need special names, flexibility in naming

  2. the registration decorator also highlights the purpose of the decorated function

    • also makes it easy to just comment out the decorator
  3. registration can be done from any other module, anywhere in the system as long as we use the same registering decorator

The Command Pattern

  • The goal of Command Pattern is to decouple an object that invokes an operation (the invoker) from the provider object that implements it (the receiver).

    put a Command object between the two, implementing an interface with a single method, execute, which calls some method in the receiver to perform the desired operation.

    • Invoker doesn’t need to know the interface of the receiver

    • different receivers can be adapted through different Command subclasses

    • the invoker is configured with a concrete command and calls its execute method to operate it.

  • some pointers from the example:

    • we have commands and command receivers.

      Command receivers are the objects that implement the action specific to a command.

      There can be multiple receivers that may respond to a command.

  • “Commands are an object-oriented replacement for callbacks”. Nice. Depending on the use case, we could directly implement the callbacks if we want.

    How to use simple callback functions directly?

    Instead of giving the invoker a Command instance, we can simply give it a function. Instead of calling command.execute(), the invoker can just call command(). The MacroCommand can be implemented with a class implementing __call__. Instances of MacroCommand would be callables, each holding a list of functions for future invocation.
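
    A minimal sketch of that callable-based idea (MacroCommand is the name used in the book’s discussion; the body here is my own):

        class MacroCommand:
            """A command that executes a list of commands (any callables)."""

            def __init__(self, commands):
                self.commands = list(commands)  # build a copy so later changes to the source don't affect us

            def __call__(self):
                for command in self.commands:   # invoking the macro just calls each callable in order
                    command()

        # the invoker only needs something callable:
        macro = MacroCommand([lambda: print('open file'), lambda: print('save file')])
        macro()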

  • if we need more complex command usage (e.g. with undo) then we just need to keep necessary state, we could put it within classes like MacroCommand and we can use a closure to hold the internal state of a function between calls.

Chapter Summary

  • the GOF book’s patterns should be seen as steps in the design process of a system rather than end-points or structures that have to be implemented.

    this keeps us from mindlessly adding boilerplate or structures that would have better implementations if we thought about the language’s idioms

    • In python’s case, functions or callable objects provide a more natural way of implementing callbacks in Python than mimicking the Strategy or the Command patterns

Further Reading

  • not many options available for python and design patterns in pythonic fashion, there’s a list here in this book

  • see design patterns in python from europython 2011 talk

  • funfact: If functions have a __call__ method, and methods are also callable, do __call__ methods also have a __call__ method?

    YES!!!

Part III. Classes and Protocols

Chapter 11. A Pythonic Object

Learning Objectives:

  1. builtins that convert objects to other types
  2. alternative constructors
  3. extending the formatting mini language

What’s New in This Chapter

Object Representations

  • python has the following ways to get common string representations. everything except for bytes should return Unicode strings:
    • repr(), depends on __repr__: developer’s POV

    • str() depends on __str__: user’s POV

    • bytes() depends on __bytes__ for byte sequence

    • format() depends on __format__: for f-strings

      str.format() method also relies on this.

Vector Class Redux

this example is really useful

some notes from the example:

  • __iter__ is what makes the object iterable, so that we can do things like:

    • unpacking and so on

    we can implement this by using a generator expression to yield the components one after the other

  • kiv the implementation of __eq__ and operator overloading until later

An Alternative Constructor

  • @classmethod allows a method to be called on a class.
      @classmethod
      def frombytes(cls, octets):
              typecode = chr(octets[0])
              memv = memoryview(octets[1:]).cast(typecode)
    
              return cls(*memv)
    
    Naturally, this ISN’T a static method. Here’s an outline of the diffs. The next section is about this actually.
      Yes, class methods and static methods are different in Python; they serve distinct purposes and have different access to class and instance data.
    
      - **Class methods** are defined using the `@classmethod` decorator. Their first parameter is `cls`, which refers to the class itself (not an instance). Class methods can access and modify class-level attributes and are often used for factory methods or operations that affect the class as a whole. They cannot directly access or modify instance-specific data[1][2][5][6][8].
    
      - **Static methods** are defined using the `@staticmethod` decorator. They do not take `self` or `cls` as their first argument and have no automatic access to either instance or class state. Static methods are utility functions, included in a class for organizational purposes, but they behave like plain functions that happen to live in a class’s namespace. They cannot access or modify class or instance data[1][2][4][5][6][8].
    
      ### Key Differences
    
      | Aspect                     | Class Method                  | Static Method                      |
      |----------------------------|-------------------------------|------------------------------------|
      | Decorator                  | `@classmethod`                | `@staticmethod`                    |
      | First parameter            | `cls` (the class)             | None required                      |
      | Accesses class data?       | Yes, via `cls`                | No                                 |
      | Accesses instance data?    | No                            | No                                 |
      | Use case                   | Factory methods, class state  | Utility functions                  |
      | Can modify class state?    | Yes                           | No                                 |
      | Can modify instance state? | No                            | No                                 |
    
      #### Example Usage
    
      ```python
      class Example:
          count = 0
    
          @classmethod
          def increment_count(cls):
              cls.count += 1
    
          @staticmethod
          def add(x, y):
              return x + y
      ```
    
  - `increment_count()` can change `Example.count`.
  - `add(2, 3)` simply returns the sum and does not access or change any class or instance data[2][5][6][8].
    
      In summary: **Class methods operate on the class and its state; static methods are independent and do not touch class or instance data.**
    
      [1] https://realpython.com/instance-class-and-static-methods-demystified/
      [2] https://www.geeksforgeeks.org/python/class-method-vs-static-method-python/
      [3] https://stackoverflow.com/questions/136097/what-is-the-difference-between-staticmethod-and-classmethod-in-python
      [4] https://www.python-engineer.com/posts/difference-classmethod-and-staticmethod/
      [5] https://www.linkedin.com/pulse/exploring-differences-between-class-methods-static-python
      [6] https://accuweb.cloud/resource/articles/class-vs-static-method-in-python
      [7] https://www.boardinfinity.com/blog/class-and-static-method-in-python-differences/
      [8] https://flexiple.com/python/class-method-vs-static-method
      [9] https://www.youtube.com/watch?v=PIKiHq1O9HQ
    

@classmethod Versus @staticmethod

  • classmethod: to define a method that operates on the class and not on instances. classmethod changes the way the method is called, so it receives the class itself as the first argument, instead of an instance.

    a good use case is for alternative constructors like we see here in frombytes, or for operations that work at the class-level

    NOTE: having learnt java in school, this seems to directly map to the use of static functions in java.

    HOWEVER @classmethod in Python is not the same as Java static methods. In Python, classmethods are more powerful: they get the class as an argument, so they are aware of inheritance (subclasses), and can instantiate the correct subclasses, something Java static methods cannot directly do, since they are not polymorphic in the same way and are resolved at compile time (see the sketch after this list).

  • staticmethod: this changes a method so that it receives no special first argument. In essence, a static method is just like a plain function that happens to live in a class body, instead of being defined at the module level.

    this just has no access to the class or instance data.

    seems like there aren’t many good use cases for this.
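
A small sketch of that subclass-awareness (Point and from_pair are hypothetical names, not from the book):

    class Point:
        def __init__(self, x, y):
            self.x, self.y = float(x), float(y)

        @classmethod
        def from_pair(cls, pair):   # cls is whichever class the method was called on
            x, y = pair
            return cls(x, y)

    class FancyPoint(Point):
        pass

    print(type(Point.from_pair((1, 2))).__name__)       # Point
    print(type(FancyPoint.from_pair((1, 2))).__name__)  # FancyPoint: the subclass is instantiated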

Formatted Displays

  • the different invokers delegate the formatting logic to the dunder method __format__(<format_spec>)

    two ways to get the formatting specifier:

    1. the second arg in format(my_obj, format_spec)

    2. within replacement fields (delimited by {}) inside an f-string, or inside the fmt argument of str.format().

      e.g. '1 BRL = {rate:0.2f} USD'.format(rate=brl)

      in this example, the rate is not part of the specifier, it’s the kwarg for the replacement field. The actual specifier is just '0.2f'

      we could also directly use an expression in the field: f'1 USD = {1 / brl:0.2f} BRL' (notice the use of the expression 1 / brl inside the replacement field)

  • Replacement fields:

    A format string that looks like '{0.mass:5.3e}' has 2 separate notations:

    1. field name:

      0.mass to the left is the field_name for the replacement syntax.

      it can be an arbitrary expression in an f-string.

    2. formatting specifier:

      5.3e after the colon is the formatting specifier

      this is just the formatspec mini language (ref)

  • FormatSpec Structure

    • f-strings vs str.format()
      • f-strings allow in-place interpolation of arbitrary expressions

      • str.format() keeps the template separate from the data, which is great when we want to define the template elsewhere.

    let me do a bot dump for this.

    The **format specifier** (or `format_spec`) is a mini-language used inside replacement fields of Python format strings (e.g., in `'{0.mass:5.3e}'`) to control how the formatted value is presented.
    
    ### Breakdown of the Notation:
    
    - **Field name:**
    The part before the colon (`:`), like `0.mass` in your example, specifies what value to format. It can be a positional argument (`0`), an attribute access (`mass`), or an arbitrary expression in f-strings.
    
    - **Format specifier:**
    The part after the colon (`5.3e` in your example) is the **format spec mini-language** that instructs Python how to render the value. This includes alignment, width, precision, type (e.g., scientific notation), padding, and more.
    
    ### About the Format Spec Mini-Language
    
    The format spec language is defined in Python’s string formatting documentation and includes syntax like:
    
    ```
    [[fill]align][sign][#][0][width][,][.precision][type]
    ```
    
    Where:
    
    - **fill**: Any character used to fill the padding space.
    - **align**: `<` (left), `>` (right), `=` (pad after sign), `^` (center).
    - **sign**: `+`, `-`, or space for numeric sign display.
    - **#**: Alternate form (e.g., include `0x` for hex).
    - **0**: Enables zero-padding.
    - **width**: Minimum field width as a number.
    - **,**: Enables thousands separator.
    - **.precision**: Number of decimal places (or significant digits).
    - **type**: Type of conversion, such as:
    - `e` or `E` for scientific notation,
    - `f` or `F` for fixed-point,
    - `g` or `G` for general format,
    - `d` for decimal integer,
    - `s` for string, etc.
    
    Your example `{0.mass:5.3e}` formats the value `mass` of the first argument in scientific notation (`e`), with a total width of 5 characters and 3 digits after the decimal point.
    
    ### Practical Use
    
    - Format specs can control numeric formatting (precision, notation).
    - They can align text inside fixed-width fields.
    - Support thousand separators and padding.
    - Allow highly readable and localized string outputs.
    
    ### References and Deep Dive
    
    - Official Python docs on [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#formatspec)
    - Real Python tutorial on [Python’s Format Mini-Language](https://realpython.com/python-format-mini-language/)
    - Examples for format spec syntax are available in many Python formatting guides.
    
    
    • The Format Specification Mini-Language is extensible because each class gets to interpret the format_spec argument as it likes.
  • Enhancing/extending FormatSpec for our custom objects

    • extending is as simple as just having a bunch of if-elses based on the format spec that we get.

      • seems like it’s a good practice to remove the 'p' suffix from fmt_spec, so that the subsequent formatting calls on the individual components further down don’t get confused.

        Thus, to preserve compatibility with the built-in format machinery, the custom 'p' suffix must be removed before passing the fmt_spec down to format()

        elaboration:

            Great question! Let's analyze the code snippet and the rationale behind removing the custom `'p'` suffix from the `fmt_spec` in the `__format__` method.
        
            ### Context Recap: What Is Happening Here?
        
            The method `__format__` is being customized in a class that likely represents a 2D vector or complex number-like object with components and angle:
        
            ```python
            def __format__(self, fmt_spec=''):
                if fmt_spec.endswith('p'):
                    fmt_spec = fmt_spec[:-1]
                    coords = (abs(self), self.angle())
                    outer_fmt = '<{}, {}>'
                else:
                    coords = self
                    outer_fmt = '({}, {})'
                components = (format(c, fmt_spec) for c in coords)
                return outer_fmt.format(*components)
            ```
        
            - **`fmt_spec`**: The format specification string passed when formatting, e.g., in an f-string.
            - **Custom `'p'` suffix:** If the format spec ends with `'p'`, the method interprets it as a special request to format the vector in *polar coordinates*, providing `(r, θ)` instead of `(x, y)`.
            - If `'p'` is detected, it is **stripped off** (`fmt_spec = fmt_spec[:-1]`), so it doesn't confuse the subsequent formatting calls on the individual float components.
        
            ### Why Remove the Custom `'p'` Suffix from `fmt_spec`?
        
            #### 1. **Format specifiers need to be valid for the inner formatting calls**
        
            - The method eventually calls `format(c, fmt_spec)` on each component, where each `c` is a float (radius and angle) or float-like.
            - The Python built-in formatter (`format()` function) and the numeric format mini-language **do not recognize `'p'`** as a valid format code.
            - If `'p'` were left in the format spec string, calling `format(float_value, fmt_spec)` would raise a `ValueError` because `'p'` is not a defined type specifier.
        
            Thus, to **preserve compatibility with the built-in format machinery**, the custom `'p'` suffix must be removed **before** passing the `fmt_spec` down to `format()`.
        
            #### 2. **Maintaining a clean interface for users**
        
            - The `'p'` suffix acts as a *signal* to the `__format__` method that **polar formatting** should be used.
            - By removing it early, the code simply leverages the underlying float formatting logic for the numeric components while exposing a neat, domain-specific format feature.
            - This lets users write something like `format(vec, ".2fp")` -- meaning "format this vector in polar coordinates, with two decimal places," where `".2f"` is the standard float format and `'p'` is the custom extension.
        
            #### 3. **Separation of concerns**
        
            - The `'p'` flag is handled at the level of `__format__` **only** to choose the representation (Cartesian vs polar).
            - The output formatting of numeric components is delegated to the built-in or standard format spec mini-language.
            - Removing `'p'` keeps these concerns cleanly separated and avoids leaking the custom format logic into the lower formatting layers.
        
            ### Mental Model: Decorator Analogy to Formatter Chains
        
            We can think of this like a decorator on the `format` function:
        
            - The outer function (`__format__`) decorates or wraps the usual formatting logic.
            - It handles the special parameter (`'p'`) that changes *what* is formatted (polar vs Cartesian).
            - Then it **cleans** the format spec for the underlying calls that format numeric components, passing on only the valid portion (`.2f`, `.3e`, etc.).
            - This preserves **compatibility, safety, and separation** between the custom feature and standard behavior.
        
            ### Summary
        
            | Reason                                   | Explanation                                 |
            |------------------------------------------|---------------------------------------------|
            | Prevent invalid format spec errors       | `'p'` is **not valid** in built-in float format codes; must be removed before numeric formatting calls. |
            | Clean user interface for extended formatting | `'p'` is a **custom flag** for polar representation, distinct from numeric formatting codes. |
            | Separation of concerns                    | Handle polar/cartesian choice separately from numeric formatting of components. |
            | Avoid breaking the built-in `format()` contract | Passing `'p'` would cause exceptions or undefined behavior. |
        
            ### Optional: What Happens if You Don’t Strip `'p'`?
        
            If the code passes on the whole string including `'p'`, e.g., `"5.2fp"`, to `format(1.23, "5.2fp")` on a float:
        
            - Python raises a `ValueError` like:
        
              ```
              ValueError: Unknown format code 'p' for object of type 'float'
              ```
        
            Because `p` is an unknown format type in the standard format mini-language.
        
            See PEP 3101, which defined the formatting protocol behind `__format__`, for background on designing robust `__format__` implementations.
        
            **In essence, the removal of the custom 'p' suffix from the format spec string ensures that this domain-specific extension coexists safely with Python’s built-in formatting system, maintaining robustness and good user experience.**
        
    • we should try to avoid confusion for the users of the custom object. Just try not to reuse a code-letter that already exists in similar object types (make reference to the formatspec mini language to see what is already available)

A Hashable Vector2d

  • what hashability unlocks:

    • we can keep sets of our custom object
    • we can use our custom object as dict keys
  • how?

    • have to implement __hash__ and __eq__ and make the object immutable.

      technically, immutability (achieved by protecting access with private attrs and read-only properties) is not a strict requirement.

    • for immutability, fields can be converted to private fields (two leading underscores), with @property decorators added to define read-only getters for them.

Supporting Positional Pattern Matching

  • what are positional patterns?

    the examples here are non-positional because each value is provided:

      def keyword_pattern_demo(v: Vector2d) -> None:
              match v:
                      case Vector2d(x=0, y=0):
                              print(f'{v!r} is null')
                      case Vector2d(x=0):
                              print(f'{v!r} is vertical')
                      case Vector2d(y=0):
                              print(f'{v!r} is horizontal')
                      case Vector2d(x=x, y=y) if x==y:
                              print(f'{v!r} is diagonal')
                      case _:
                              print(f'{v!r} is awesome')
    

    so a positional pattern would look something like this: case Vector2d(_, 0):

  • have to add __match_args__: need to add a class attribute named __match_args__, listing the instance attributes in the order they will be used for positional pattern matching

    typically at least have the required args within __match_args__
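
    With __match_args__ = ('x', 'y') in place, the keyword demo above can be rewritten with positional patterns; a sketch mirroring it:

        def positional_pattern_demo(v: Vector2d) -> None:
            match v:
                case Vector2d(0, 0):
                    print(f'{v!r} is null')
                case Vector2d(0):
                    print(f'{v!r} is vertical')
                case Vector2d(_, 0):
                    print(f'{v!r} is horizontal')
                case Vector2d(x, y) if x == y:
                    print(f'{v!r} is diagonal')
                case _:
                    print(f'{v!r} is awesome')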

Complete Listing of Vector2d, Version 3

Just to put a pin on the v3 of this didatic example:

"""
A two-dimensional vector class

>>> v1 = Vector2d(3, 4)
>>> print(v1.x, v1.y)
3.0 4.0
>>> x, y = v1
>>> x, y
(3.0, 4.0)
>>> v1
Vector2d(3.0, 4.0)
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0)
>>> octets = bytes(v1)
>>> octets
b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector2d(0, 0))
(True, False)

Test of ``.frombytes()`` class method:
>>> v1_clone = Vector2d.frombytes(bytes(v1))
>>> v1_clone
Vector2d(3.0, 4.0)
>>> v1 == v1_clone
True

Tests of ``format()`` with Cartesian coordinates:
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'

Tests of the ``angle`` method:
>>> Vector2d(0, 0).angle()
0.0
>>> Vector2d(1, 0).angle()
0.0
>>> epsilon = 10**-8
>>> abs(Vector2d(0, 1).angle() - math.pi/2) < epsilon
True
>>> abs(Vector2d(1, 1).angle() - math.pi/4) < epsilon
True

Tests of ``format()`` with polar coordinates:
>>> format(Vector2d(1, 1), 'p')  # doctest:+ELLIPSIS
'<1.414213..., 0.785398...>'
>>> format(Vector2d(1, 1), '.3ep')
'<1.414e+00, 7.854e-01>'
>>> format(Vector2d(1, 1), '0.5fp')
'<1.41421, 0.78540>'

Tests of `x` and `y` read-only properties:
>>> v1.x, v1.y
(3.0, 4.0)
>>> v1.x = 123
Traceback (most recent call last):
...
AttributeError: can't set attribute 'x'

Tests of hashing:
>>> v1 = Vector2d(3, 4)
>>> v2 = Vector2d(3.1, 4.2)
>>> len({v1, v2})
2
"""

from array import array
import math


class Vector2d:
    __match_args__ = ('x', 'y')
    typecode = 'd'

    def __init__(self, x, y):
        self.__x = float(x)
        self.__y = float(y)

    @property
    def x(self):
        return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, self)))

    def __eq__(self, other):
        if isinstance(other, Vector2d):
            return tuple(self) == tuple(other)
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def angle(self):
        return math.atan2(self.y, self.x)

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]
            coords = (abs(self), self.angle())
            outer_fmt = '<{}, {}>'
        else:
            coords = self
            outer_fmt = '({}, {})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(*components)

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(*memv)
  • TO_HABIT: TIL that python has doctests too!

    damn.

Private and “Protected” Attributes in Python

  • there’s no way to actually make the variables private and immutable though

  • name mangling: why the double underscores is useful to use

    consider the case where if we don’t have a good mechanism for “private” attributes, it’s going to be possible for a child subclass to accidentally overwrite a parent-attribute

    with this in mind, the mechanism for using the double underscores is to mangle the name with the class name.

    Python stores the name in the instance __dict__ prefixed with a leading underscore and the class name, so in the Dog class, __mood (the “private” attr) becomes _Dog__mood, and in Beagle it’s _Beagle__mood.

    This language feature goes by the lovely name of name mangling.

    It’s a safety feature (to prevent accidentally access / modification) rather than a security feature.

  • “Protected attributes”:

    for those that don’t like the name mangling feature, they can just use a convention of a single underscore and explicitly naming the attribute in a “mangled” fashion.

    this has no special interpreter support though.

    • GOTCHA: In modules, a single _ in front of a top-level name does have an effect:

      if you write from mymod import *, the names with a _ prefix are not imported from mymod. However, you can still write from mymod import _privatefunc.
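
      A tiny sketch of that module-level behaviour (hypothetical mymod module):

```python
# mymod.py
def public(): ...
def _privatefunc(): ...

# client.py
from mymod import *             # binds only `public`; names with a leading _ are skipped
from mymod import _privatefunc  # an explicit import still works
```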

Saving Memory with __slots__

  • instance attributes’ storage model:

    • typically stored within a dict (__dict__), which has significant memory overhead

    • if we use __slots__ to hold a sequence of attribute names, Python switches to an alternative storage model:

      the attributes are stored in a hidden array of references that uses less memory than a dict

  • using slots:

    • the __slots__ attribute holds attrs in either a tuple (similar to the __match_args__) or list.

      Tuple is clearer since it implicitly shows that there’s no changing it.

    • __slots__ can only be set when the class is defined; the set of slotted names can’t be changed thereafter

    • COUNTER-INTUITIVE:

      a subclass only partially inherits the effect of the parent’s __slots__.

      Partial because:

      1. slot attrs defined in the parent will still be stored in the reference array

      2. attrs not stored in the parent will end up being stored in a dict.

      Slots of the superclasses are added to the slots of the current class (see the sketch after this list).

    • we can have both slots (fixed attributes) and dict (dynamic attributes)

  • if we need weak refs AND we’re using slots, then the slots attributes should include '__weakref__'

  • NOTE: __slots__ lists the internal storage names, while __match_args__ lists the public attribute names used for positional pattern matching.
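
    A minimal sketch of these behaviours, along the lines of the book’s Pixel example:

```python
class Pixel:
    __slots__ = ('x', 'y')            # instances get no __dict__

class OpenPixel(Pixel):               # no __slots__ here, so instances DO get a __dict__
    pass

class ColorPixel(Pixel):
    __slots__ = ('color',)            # the parent's slots still apply; only 'color' is added

p = Pixel()
p.x = 10
# p.z = 3  ->  AttributeError: 'Pixel' object has no attribute 'z'

op = OpenPixel()
op.x = 10   # stored in the slot inherited from Pixel
op.z = 3    # stored in op.__dict__  -- the "partial" inheritance effect

cp = ColorPixel()
cp.x, cp.color = 10, 'red'
# cp.__dict__  ->  AttributeError: 'ColorPixel' object has no attribute '__dict__'
```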

  • Simple Measure of slot Savings

    here’s how they did the checking of ram usage and such

    import importlib
    import sys
    import resource
    
    NUM_VECTORS = 10**7
    
    module = None
    if len(sys.argv) == 2:
        module_name = sys.argv[1].replace('.py', '')
        module = importlib.import_module(module_name)
    else:
        print(f'Usage: {sys.argv[0]} <vector-module-to-test>')
    
    if module is None:
        print('Running test with built-in `complex`')
        cls = complex
    else:
        fmt = 'Selected Vector2d type: {.__name__}.{.__name__}'
        print(fmt.format(module, module.Vector2d))
        cls = module.Vector2d
    
    mem_init = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f'Creating {NUM_VECTORS:,} {cls.__qualname__!r} instances')
    
    vectors = [cls(3.0, 4.0) for i in range(NUM_VECTORS)]
    
    mem_final = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f'Initial RAM usage: {mem_init:14,}')
    print(f'  Final RAM usage: {mem_final:14,}')
    

    Essentially, the slot version uses half as much RAM and is faster.

  • Summarizing the Issues with slots

    1. remember to redeclare __slots__ in each subclass to prevent their instances from having __dict__

    2. the instance slots are bound ONLY at the time the class is defined

    3. classes can’t use the @cached_property decorator unless '__dict__' is included in __slots__

    4. class instances can’t be targets of weak references unless we add '__weakref__' in the slots.

Overriding Class Attributes

  • a common use case for class attributes is to provide default values for instance attributes

  • self.typecode resolves to the class attribute by default, when there is no instance attribute of that name.

  • If you want to change a class attribute, you must set it on the class directly, not through an instance.

    So do Vector2d.typecode = 'f' instead of doing v1.typecode = 'f'

  • the more idiomatic way is to subclass just to customise a class data attribute

    subclass to customise
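
    A sketch of both behaviours, assuming the Vector2d class shown earlier:

```python
v1 = Vector2d(1.1, 2.2)
len(bytes(v1))          # 17 -- one typecode byte + two 8-byte doubles ('d')

v1.typecode = 'f'       # creates an INSTANCE attribute that shadows the class attribute
len(bytes(v1))          # 9  -- only this instance is affected
Vector2d.typecode       # 'd' -- the class attribute is untouched

class ShortVector2d(Vector2d):  # idiomatic: subclass just to customize a class data attribute
    typecode = 'f'
```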

Chapter Summary

  • we can add more control to our code (getters, read-only attributes) on an as-needed basis, since switching a public attribute to a property doesn’t change how consumers refer to the class.

    So we can stick to public attributes first.

    Implementing setters and getters up-front is a distraction from the goal. In Python, we can simply use public attributes, knowing we can change them to properties later, if the need arises.

  • extra: in java’s enforced privacy idioms, the privacy is only really enforced if there’s a Java Security Manager that is configured and deployed on top of the java app.

    this is because the reflections API (introspection tooling for Java) can be used to read the value of a private field.

Further Reading

Chapter 12. Special Methods for Sequences

  • Objectives for this chapter:

    1. Make the toy vector implementation behave like a standard Python immutable flat sequence with float elements

    2. supports the following:

      1. basic sequence protocol: __len__ and __getitem__

      2. safe representation of instances with many items

      3. slicing supported properly

      4. aggregate hashing that considers every element

      5. custom formatting extensions

  • Useful TRICKS:

    • we can get the class of an instance by doing cls = type(self)

What’s New in This Chapter

Vector: A User-Defined Sequence Type

  • Vector space benefits: use cases of vectors > 3 dims:

    • for implementing N-dimensional vectors for info retrieval
    • for vector space models, cosine similarity is usually the key metric for relevance.
  • the takes on the vector implementation behaviour are not mutually exclusive, they build on each other

Vector Take #1: Vector2d Compatible

  • the best practice for a sequence constructor is to take the data as an iterable argument in the constructor, like all built-in sequence types do.

  • remember the goal for a good implementation of __repr__ is that it should give serviceable output such that a user can have a chance of identifying the receiver (self).

  • the reprlib.repr() can be used to get a limited-length representation

Protocols and Duck Typing

  • Protocols:
    • in the context of object-oriented programming, a protocol is traditionally an informal interface, defined only in documentation and not in code (this is the dynamic kind).

    • typing.Protocol, by contrast, is a typing/tooling construct for static analysis: it supports structural subtyping / static duck typing (the static kind).

    • we can implement only part of a protocol if we wish, depending on the contextual requirements

    • there’s 2 kinds of protocols:

      • static protocols

        Definition:

        Static protocols in Python refer to protocol classes (from typing.Protocol) that exist solely for static type analysis during development—they have no effect at runtime unless specially marked.

        Purpose:

        To provide interfaces that static type checkers (like mypy or Pyright) can use for verifying whether an object “matches” a required set of methods/attributes, regardless of explicit inheritance.

        Behavior:

        A class matches a static protocol if it provides ALL required methods/attributes (matching names and type signatures).

        There is no runtime enforcement or validation by default—type conformance is only checked when tools like mypy analyze your code.

        Classes do not need to inherit from the protocol to be considered as conforming to it for static analysis

        Use case:

        Ensuring that different objects used in a function provide a required interface (“static duck typing”), enabling type-safe polymorphism and generic programming.

              from typing import Protocol
        
              class SupportsClose(Protocol):
                  def close(self) -> None: ...
        
              # Any class with a .close() method matches SupportsClose for type checking
        
      • dynamic protocols Definition:

        Dynamic protocols are protocol classes designed to support runtime checking of protocol conformance, in addition to static analysis.

        Purpose:

        To enable both static type checking and runtime assertions that an object supports a given protocol interface.

        How:

        Achieved by decorating the protocol class with @typing.runtime_checkable

        Behavior:

        At runtime, you can use isinstance(obj, ProtocolClass) to check if an object supports the protocol (i.e., implements the required methods/attributes).

        The protocol still does not require explicit inheritance—conformance is structural.

              from typing import Protocol, runtime_checkable
        
              @runtime_checkable
              class SupportsClose(Protocol):
                  def close(self) -> None: ...
        
              obj = open("file.txt")
              isinstance(obj, SupportsClose)  # True if .close() exists with correct signature
        

Vector Take #2: A Sliceable Sequence

  • delegation is an easy way to support the protocol.

    We have to ensure the return type stays our custom type for the operations that should return one. Slicing is the example here: in such cases we can’t just delegate blindly, we have to handle it explicitly.

  • How Slicing Works

    • some observations on how slicing is handled:
      1. the slice notation inside s[1:5] is turned into a slice object, which __getitem__ receives as its argument

      2. we can even pass multiple slices at once, e.g. s[1:5, 8:10], in which case __getitem__ receives something like this:

        (slice(1, 5, None), slice(8, 10, None))

        from which we conclude:

        1. __getitem__ receives a tuple (of slice objects)

        2. that tuple may contain multiple slice objects

      3. slice is a builtin type, with attrs start, stop, step and indices

        we found this by doing dir(slice)

        • indices exposes the tricky logic that’s implemented in the built-in sequences to gracefully handle missing or negative indices and slices that are longer than the original sequence. This method produces “normalized” tuples of non-negative start, stop, and stride integers tailored to a sequence of the given length.

          NOTE: we don’t need to implement this for the vector example here because we’ll be delegating it to the _components array
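
          A quick illustration of indices() normalizing against a sequence of length 5:

```python
# normalize slices against a sequence of length 5 (think 'ABCDE')
slice(None, 10, 2).indices(5)     # (0, 5, 2) -- 'ABCDE'[:10:2] == 'ABCDE'[0:5:2]
slice(-3, None, None).indices(5)  # (2, 5, 1) -- 'ABCDE'[-3:]   == 'ABCDE'[2:5:1]
```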

  • A Slice-Aware __getitem__

    • to make Vector behave as a sequence, we need __len__ and __getitem__

      both are essential to handle slicing correctly

      There’s 2 cases to handle:

      1. case 1: we’re accessing via a slice

        in this case, we have to extract out the class and then build another Vector instance from the slice of the components array.

        this is what allows us to properly return Vector classes on sliced accesses.

      2. case 2: we’re accessing via a single index

        then we can extract out the index from the key using operator.index(key)

        the operator.index() function calls the __index__ special method; both the function and the special method were defined in PEP 357.

        it’s different from int() in that operator.index() raises TypeError for arguments that can’t be used as indices, instead of coercing them.

        def __len__(self):
                return len(self._components)
      
        def __getitem__(self, key):
                # case 1: we're accessing via a slice
                if isinstance(key, slice):
                        cls = type(self)
                        return cls(self._components[key])
      
                # case 2: we're accessing via a single index
                index = operator.index(key)
                return self._components[index]
      

Vector Take #3: Dynamic Attribute Access

  • __getattr__ is the fallback invoked when attribute lookup fails through the usual path (not found in the instance, its class, or the inheritance graph)

    KIV part 4 of the textbook for more info on attribute lookups

      __match_args__ = ('x', 'y', 'z', 't') # allows positional pattern matching
    
      def __getattr__(self, name):
              cls = type(self)
              try:
                      pos = cls.__match_args__.index(name)
              except ValueError:
                      pos = -1
    
              if 0 <= pos < len(self._components):
                      return self._components[pos]
    
              msg = f'{cls.__name__!r} object has no attribute {name!r}'
              raise AttributeError(msg)
    
  • GOTCHA: since __getattr__ is only a fallback, assigning to one of these attributes makes the class behave inconsistently (a sketch of the problem follows below)

    this is because when we do v.x = 10, a new instance attribute x is created on v, so subsequent reads of v.x never reach the fallback (__getattr__)

    The implementation of __getattr__ also doesn’t guard against such names being set

    \(\implies\) we implement __setattr__, because the real problem here is in the attribute setting
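
    A sketch of the inconsistency (a Vector with __getattr__ but no custom __setattr__):

```python
v = Vector(range(5))
v.x          # 0.0  -- served by __getattr__ from self._components
v.x = 10     # silently creates an instance attribute named x
v.x          # 10   -- __getattr__ is no longer consulted; the instance attribute wins
print(v)     # (0.0, 1.0, 2.0, 3.0, 4.0) -- the components were never changed
```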

      def __setattr__(self, name, value):
              cls = type(self)
              if len(name) == 1:
                      if name in cls.__match_args__:
                              error = 'readonly attribute {attr_name!r}'
                      elif name.islower():
                              error = "can't set attributes 'a' to 'z' in {cls_name!r}"
                      else:
                              error = ''
                      if error:
                              msg = error.format(cls_name=cls.__name__, attr_name=name)
                              raise AttributeError(msg)
    
              # default: use the superclass's __setattr__
              super().__setattr__(name, value)
    

    For this example, we want the x and y to be readonly, that’s why we’re throwing attribute errors.

  • NOTE: usually getters and setters come together to ensure some consistency in the use of the objects.

    here, we had to implement both __getattr__ and __setattr__

  • NOTE: we shouldn’t use __slots__ as a shortcut to prevent instance attribute creation, they should be used only to save memory, when needed. In this case, we prevent readonly attribute overwrites by implementing the __setattr__ properly that handles this.

Vector Take #4: Hashing and a Faster ==

  • implementing the hash function that is performant

      import functools
      import operator
    
      def __eq__(self, other):
              return tuple(self) == tuple(other)
    
      def __hash__(self):
              # NOTE: use generator here for lazy operations.
              hashes = (hash(x) for x in self._components)
              return functools.reduce(operator.xor, hashes, 0)
    

    alternatively, hash could have been implemented as:

              def __hash__(self):
                      hashes = map(hash, self._components)
                      return functools.reduce(operator.xor, hashes)
    
    • so the fast hash here can use an XOR:

      functools.reduce(lambda a, b: a ^ b, range(n))

      or using operator.xor like so functools.reduce(operator.xor, range(n))

    • interesting: the initializer is ALSO the value returned for an empty sequence (in addition to being the first argument in the reducing loop); see the snippet after this list.

      for +, |, ^ the initializer should be 0, but for *, & it should be 1.

    • TO_HABIT: remember that operator provides the functionality of all Python infix operators in function form, so using it saves us from writing trivial lambdas

    • TO_HABIT: using functools.reduce for the fast compute of a hash with huge number of components is a good use case for using reduce.
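
    A tiny illustration of the initializer’s dual role:

```python
import functools
import operator

functools.reduce(operator.xor, [], 0)        # 0 -- the initializer is returned for an empty sequence
functools.reduce(operator.mul, [], 1)        # 1 -- identity for *
functools.reduce(operator.xor, range(6), 0)  # 1 -- 0^1^2^3^4^5
```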

  • improving the performance of __eq__: the tuple conversion would be expensive for large vectors.

    the better implementation reminds me of Java style:

      def __eq__(self, other):
              if len(self) != len(other):
                      return False
              for a, b in zip(self, other):
                      if a != b:
                              return False
              return True
    

    a one liner:

      def __eq__(self, other):
              return len(self) == len(other) and all(a == b for a, b in zip(self, other))
    

Vector Take #5: Formatting

Chapter Summary

So this is the final code, vector_v5.py:

"""
A multidimensional ``Vector`` class, take 5

A ``Vector`` is built from an iterable of numbers:
>>> Vector([3.1, 4.2])
Vector([3.1, 4.2])
>>> Vector((3, 4, 5))
Vector([3.0, 4.0, 5.0])
>>> Vector(range(10))
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])

Tests with two dimensions (same results as ``vector2d_v1.py``):
>>> v1 = Vector([3, 4])
>>> x, y = v1
>>> x, y
(3.0, 4.0)
>>> v1
Vector([3.0, 4.0])
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0)
>>> octets = bytes(v1)
>>> octets
b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector([0, 0]))
(True, False)

Test of ``.frombytes()`` class method:
>>> v1_clone = Vector.frombytes(bytes(v1))
>>> v1_clone
Vector([3.0, 4.0])
>>> v1 == v1_clone
True

Tests with three dimensions:
>>> v1 = Vector([3, 4, 5])
>>> x, y, z = v1
>>> x, y, z
(3.0, 4.0, 5.0)
>>> v1
Vector([3.0, 4.0, 5.0])
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0, 5.0)
>>> abs(v1) # doctest:+ELLIPSIS
7.071067811...
>>> bool(v1), bool(Vector([0, 0, 0]))
(True, False)

Tests with many dimensions:
>>> v7 = Vector(range(7))
>>> v7
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
>>> abs(v7) # doctest:+ELLIPSIS
9.53939201...

Test of ``.__bytes__`` and ``.frombytes()`` methods:
>>> v1 = Vector([3, 4, 5])
>>> v1_clone = Vector.frombytes(bytes(v1))
>>> v1_clone
Vector([3.0, 4.0, 5.0])
>>> v1 == v1_clone
True

Tests of sequence behavior:
>>> v1 = Vector([3, 4, 5])
>>> len(v1)
3
>>> v1[0], v1[len(v1)-1], v1[-1]
(3.0, 5.0, 5.0)

Test of slicing:
>>> v7 = Vector(range(7))
>>> v7[-1]
6.0
>>> v7[1:4]
Vector([1.0, 2.0, 3.0])
>>> v7[-1:]
Vector([6.0])
>>> v7[1,2]
Traceback (most recent call last):
...
TypeError: 'tuple' object cannot be interpreted as an integer

Tests of dynamic attribute access:
>>> v7 = Vector(range(10))
>>> v7.x
0.0
>>> v7.y, v7.z, v7.t
(1.0, 2.0, 3.0)

Dynamic attribute lookup failures:
>>> v7.k
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 'k'
>>> v3 = Vector(range(3))
>>> v3.t
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 't'
>>> v3.spam
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 'spam'

Tests of hashing:
>>> v1 = Vector([3, 4])
>>> v2 = Vector([3.1, 4.2])
>>> v3 = Vector([3, 4, 5])
>>> v6 = Vector(range(6))
>>> hash(v1), hash(v3), hash(v6)
(7, 2, 1)

Most hash codes of non-integers vary from a 32-bit to 64-bit CPython build:
>>> import sys
>>> hash(v2) == (384307168202284039 if sys.maxsize > 2**32 else 357915986)
True

Tests of ``format()`` with Cartesian coordinates in 2D:
>>> v1 = Vector([3, 4])
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'

Tests of ``format()`` with Cartesian coordinates in 3D and 7D:
>>> v3 = Vector([3, 4, 5])
>>> format(v3)
'(3.0, 4.0, 5.0)'
>>> format(Vector(range(7)))
'(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)'

Tests of ``format()`` with spherical coordinates in 2D, 3D and 4D:
>>> format(Vector([1, 1]), 'h') # doctest:+ELLIPSIS
'<1.414213..., 0.785398...>'
>>> format(Vector([1, 1]), '.3eh')
'<1.414e+00, 7.854e-01>'
>>> format(Vector([1, 1]), '0.5fh')
'<1.41421, 0.78540>'
>>> format(Vector([1, 1, 1]), 'h') # doctest:+ELLIPSIS
'<1.73205..., 0.95531..., 0.78539...>'
>>> format(Vector([2, 2, 2]), '.3eh')
'<3.464e+00, 9.553e-01, 7.854e-01>'
>>> format(Vector([0, 0, 0]), '0.5fh')
'<0.00000, 0.00000, 0.00000>'
>>> format(Vector([-1, -1, -1, -1]), 'h') # doctest:+ELLIPSIS
'<2.0, 2.09439..., 2.18627..., 3.92699...>'
>>> format(Vector([2, 2, 2, 2]), '.3eh')
'<4.000e+00, 1.047e+00, 9.553e-01, 7.854e-01>'
>>> format(Vector([0, 1, 0, 0]), '0.5fh')
'<1.00000, 1.57080, 0.00000, 0.00000>'
"""

from array import array
import reprlib
import math
import functools
import operator
import itertools

class Vector:
    typecode = 'd'
    __match_args__ = ('x', 'y', 'z', 't')

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find('['):-1]
        return f'Vector({components})'

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __eq__(self, other):
        return (len(self) == len(other) and
                all(a == b for a, b in zip(self, other)))

    def __hash__(self):
        hashes = (hash(x) for x in self)
        return functools.reduce(operator.xor, hashes, 0)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
        if 0 <= pos < len(self._components):
            return self._components[pos]
        msg = f"{cls.__name__!r} object has no attribute {name!r}"
        raise AttributeError(msg)

    def angle(self, n):
        r = math.hypot(*self[n:])
        a = math.atan2(r, self[n-1])
        if (n == len(self) - 1) and (self[-1] < 0):
            return math.pi * 2 - a
        else:
            return a

    def angles(self):
        return (self.angle(n) for n in range(1, len(self)))

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('h'):  # hyperspherical coordinates
            fmt_spec = fmt_spec[:-1]
            coords = itertools.chain([abs(self)], self.angles())
            outer_fmt = '<{}>'
        else:
            coords = self
            outer_fmt = '({})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(', '.join(components))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)
  1. uses itertools.chain for the __format__ function
  2. KIV the generator tricks until chapter 17

Further Reading

  • reduce has other names in the CS world!

    The powerful reduce higher-order function is also known as fold, accumulate, aggregate, compress, and inject.

    See the wiki link.

  • you can often tell when a protocol is being discussed when you see language like “a file-like object.” This is a quick way of saying “something that behaves sufficiently like a file, by implementing the parts of the file interface that are relevant in the context.”

  • it’s not sloppy to implement a protocol partially (for dynamic protocols)

    When implementing a class that emulates any built-in type, it is important that the emulation only be implemented to the degree that it makes sense for the object being modeled. For example, some sequences may work well with retrieval of individual elements, but extracting a slice may not make sense.

    this KISS-es it.

    for more strictness, we can make it a static protocol wherein everything needs to be implemented

Chapter 13. Interfaces, Protocols, and ABCs

  • python has 4 ways to define and use interfaces:

    1. Duck typing

    2. goose typing: using ABCs

      ^ focus of this chapter

    3. static typing: traditional static typing using the typing module

    4. static duck typing

      popularised by GoLang, supported by typing.Protocol

  • this chapter is about the typing that revolves around interfaces.

The Typing Map

The two dimensions introduced here:

  1. runtime vs static checking

  2. structural (based on the methods provided by the object) vs nominal (based on the name of its class/superclass)

What’s New in This Chapter

Two Kinds of Protocols

In both cases, we don’t need to do any sort of explicit registration for the protocol (or to use inheritance).

  • Dynamic Protocol

    Implicit, defined by convention as per documentation.

    A good example is the protocols within the interpreter, seen in the “Data Model” of the language ref. e.g. Sequence, Iterable

    Can’t be verified by type checkers

  • Static Protocol

    An explicit definition as a subclass of typing.Protocol

  • ABCs can be used to define an explicit interface (similar in outcome to static protocols).

Programming Ducks

  • Python Digs Sequences

    • this is pretty cool: Python manages to make iteration and the in operator work by invoking __getitem__ when __iter__ and __contains__ are unavailable.

      The interpreter uses special methods (__getitem__, __iter__, etc.) dynamically to support iteration and membership tests even without explicit ABCs. This is a classic Python dynamic-protocol idiom.

  • Monkey Patching: Implementing a Protocol at Runtime

    • Monkey patching is dynamically changing a module, class, or function at runtime, to add features or fix bugs.

    • in this example, we want a custom class to automatically work with random.shuffle() so that we can shuffle that sequence.

      We inspect random.shuffle() and find that its underlying requirement is the ability to assign items in place, i.e. it relies on the __setitem__ method.

      So we can monkey patch __setitem__ onto the class and achieve the desired outcome; this means we change the class at runtime (sketched below).

    • Monkey patching is powerful, but the code that does the actual patching is very tightly coupled with the program to be patched, often handling private and undocumented attributes.

    • Python does not let you monkey patch the built-in types. I actually consider this an advantage, because you can be certain that a str object will always have those same methods. This limitation reduces the chance that external libraries apply conflicting patches.
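
    A sketch of that patch, assuming the FrenchDeck class from Chapter 1 (which defines only __len__ and __getitem__) is importable:

```python
from random import shuffle
# from frenchdeck import FrenchDeck  # assuming Chapter 1's example lives in frenchdeck.py

def set_card(deck, position, card):   # the signature __setitem__ expects: self, key, value
    deck._cards[position] = card

FrenchDeck.__setitem__ = set_card     # monkey patch: attach it to the class at runtime
deck = FrenchDeck()
shuffle(deck)                         # now works, because shuffle can assign items in place
```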

  • Defensive Programming and “Fail Fast”

    TO_HABIT: the examples here show how to do a check by checking whether it can behave like a duck instead of checking whether it’s a duck. This is a superior way of doing meaningful type checks in my opinion but there’s some possible pitfalls into doing so.

    • we want to be able to detect dynamic protocols without explicit checks

    • Failing fast means raising runtime errors as soon as possible, for example, rejecting invalid arguments right at the beginning of a function body.

    • Duck type checking means we should check behaviour instead of doing explicit typechecks.

    • Some patterns:

      1. IDIOM: use a built-in function instead of doing type checking \(\implies\) check for method presence / behaviour

        • in the example, to check if the input arg is a list, instead of doing a type check at runtime, it’s suggested to use the list() constructor because that constructor will handle any iterable that fits in memory. Naturally, this copies the data.

        • If we can’t accept copying, then we can do runtime check using isinstance(x, abc.MutableSequence)

        • warning: what if infinite generator?

          eliminate that by calling len() on the arg first; tuples, arrays and the like will still pass this check, while an infinite generator won’t (it has no len())

      2. Defensive code leveraging duck types can also include logic to handle different types without using isinstance() or hasattr() tests.

        • suppose the requirement is that “field_names must be a string of identifiers separated by spaces or commas”,

          then our check could do something like this:

           # Example 13-5. Duck typing to handle a string or an iterable of strings
           try:  # first attempt: assume it's a string of names
               field_names = field_names.replace(',', ' ').split()
           except AttributeError:
               pass  # not a string: assume it's already an iterable of names

           # converting to a tuple makes sure it's iterable and gives us our own copy
           # (so we don't accidentally change the caller's input)
           field_names = tuple(field_names)
           if not all(s.isidentifier() for s in field_names):
               raise ValueError('field_names must all be valid identifiers')
          

          This is an expressive form of using duck typing to our advantage for type checking.

Goose Typing

  • ABCs help to define interfaces for explicit type checking at runtime (and also work for static type checking).
    • complement duck typing

    • introduce virtual subclasses:

      • classes that don’t inherit from a class but are still recognized by isinstance() and issubclass()
  • Waterfowl and ABCs

    • the strong analogy of duck typing to actual phenetics (i.e. phenotype-based) classification is great, mimics how we do duck typing (based on shape and behaviour)

    • how important is the explicit type checking depends on the usage-context of an object

    • parallel objects can produce similar traits and this is the case where we may have false positives on the classifications

    • that’s why we need a more “explicit” way of typechecking and that’s where “goose typing” comes into the picture.

    • python’s ABCs provide the register class-method which lets us “declare” that a certain class becomes a “virtual” subclass of an ABC (meets name, signature and semantic contract requirements)

      we can declare this even if the class was not developed with any awareness of the ABC (and doesn’t inherit from it) \(\implies\) what matters is that the class’s structure satisfies the contract, not its ancestry

    • for some ABCs, recognition can even be implicit (no register call needed): we just have to implement the right special methods.

    • key advice:
      1. When implementing a class that represents a concept from the standard library’s ABCs (e.g., Sequence, Mapping, Number, etc.):

        Explicitly inherit from or register with the appropriate ABC if your class fits the contract of that ABC.

        This helps make your class reliably compatible with tools, libraries, or Python code that expects these standard interfaces.

      2. If a library or framework you use defines classes but omits to formally subclass/register with the standard ABCs:

        Perform the ABC registration yourself at program startup (e.g., by manually registering the class with collections.abc.Sequence).

        This will ensure isinstance(obj, collections.abc.Sequence) checks work as intended, improving reliability and interoperability.

      3. When checking if an object matches a conceptual interface (such as being a “sequence”):

        Use isinstance(the_arg, collections.abc.Sequence) rather than checking for method presence (duck typing) or relying on type names.

        This is more future-proof and integrates with Python’s built-in and third-party tools.

      4. Avoid defining your own custom ABCs or metaclasses in production code:

        These advanced features are often overused and can lead to unnecessarily complex, harder-to-maintain code.

        The author likens custom ABCs/metaclasses to a “shiny new hammer”: appealing when you first learn them, but prone to misuse (“all problems look like a nail”).

        Emphasizes sticking to straightforward, simple code for better maintainability and happiness for you and future developers.

  • Tension between Duck Typing and Goose Typing

    They are a continuum, not an either/or. Use duck typing for everyday, flexible code, and goose typing (ABCs and explicit interface contracts) where precision, reliability, and maintainability matter (public APIs, reusable libraries, systems programming).

    See richer elaboration here:

    You've highlighted a key *tension* in Python between **duck typing** ("if it quacks like a duck, treat it as a duck") and **ABC-based explicit/virtual interface type checking** ("goose typing," using `isinstance` with domain-driven interfaces made explicit via ABCs). Let’s clarify how these approaches relate and when to use each.
    
    ### Duck Typing (Classic Pythonic Style)
    
    **Principle**:
    *Check for behavior, not type.*
    If you expect an object to behave like a sequence, just try to use it as such (e.g., iterate over it, index into it). If it fails at runtime, raise an error immediately (“fail fast”).
    
    **Code example:**
    ```python
    def head(seq):
        return seq[0]  # works for any object supporting indexing
    ```
    
    - If `seq` supports indexing (`__getitem__`), this just works.
    - This is *pure duck typing*—no explicit type checks or interface enforcement.
    
    ### Goose Typing: ABCs and Virtual Subclassing
    
    **Principle**:
    Use Python’s Abstract Base Classes (`collections.abc.Sequence`, `MutableMapping`, etc.) to define *explicit interfaces* for families of types (not just classes).
    
    - You can *register* a class as a "virtual subclass" of an ABC, even if it doesn't inherit from it.
    - Now, `isinstance(obj, collections.abc.Sequence)` will treat your class as a Sequence.
    
    **Benefits:**
    - Enables more robust and explicit type checks.
    - Useful when interoperating with other libraries, for documentation, for static-checking tools, or when you want to *fail fast* with a clear, predictable check.
    - Python’s standard library and third-party tools often use `isinstance(x, ABC)` checks internally, so registering your custom class improves **interoperability**.
    
    ### How do these approaches relate?
    
    - **Duck typing**: Write code that operates on any object with the required behaviors (methods/attributes), with no formal interface contract.
        - Pro: maximal flexibility; principle of least surprise for Pythonistas.
        - Con: can result in runtime errors that are sometimes hard to debug or anticipate.
    
    - **Goose typing**: Explicitly document and/or enforce the “duck” shape via ABCs—*preferably standard library ABCs*—and use `isinstance` checks for run-time or static confirmation.
        - Con: Slightly more rigid, but much clearer and more future-proof if you want predictable, explicit interfaces.
    
    **In modern large-scale or library codebases, combining duck typing for flexibility with ABCs for explicit interface definition (“goose typing”) gives you both power and predictability.**
    
    ### Practical Guidance from Your Notes
    
    - Use **duck typing** by default; just try the operation and handle exceptions—this is flexible, concise, and Pythonic.
    - For situations where:
        - You need to *fail fast*,
        - You want to document and defend API boundaries,
        - You’re interacting with other libraries/frameworks using ABCs,
    
      **it’s recommended to explicitly register your classes with appropriate ABCs and use `isinstance` checks**.
    
    - This isn't an either/or—it's a continuum. Use *duck typing* for everyday, flexible code, and *goose typing* (ABCs and explicit interface contracts) where precision, reliability, and maintainability matter (public APIs, reusable libraries, systems programming).
    
    ### Metaphor Recap
    
    - **Duck typing**: "Does it walk and quack like a duck? Use it as a duck."
    - **Goose typing**: "Let's agree on what waterfowl are. If you’re a duck, get your name on the official registry, so everyone knows for sure and tools can verify it."
    
    ### References
    - [Python's isinstance docs][1]
    - [Structural subtyping via ABCs and the register method][11]
    
    **Summary:**
    - *"Duck typing"* means acting on behaviors and handling failures.
    - *"Goose typing"* means using ABCs for explicit, reliable interface checks (`isinstance` and virtual subclassing).
    - Use each to the degree that your problem, API guarantees, and interoperability demand. Both are core to Pythonic code.
    
    
    [1] https://realpython.com/what-does-isinstance-do-in-python/
    [2] https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance
    [3] https://ioflood.com/blog/python-isinstance-function-guide-with-examples/
    [4] https://switowski.com/blog/type-vs-isinstance/
    [5] https://www.reddit.com/r/learnpython/comments/hg6fyd/do_not_compare_types_use_isinstance_instead/
    [6] https://github.com/python/typing/issues/1363
    [7] https://stackoverflow.com/questions/3111611/is-this-use-of-isinstance-pythonic-good
    [8] https://realpython.com/python-type-checking/
    [9] https://www.curiousefficiency.org/posts/2004/12/type-checking-in-python/
    [10] https://typing.python.org/en/latest/guides/type_narrowing.html
    [11] https://peps.python.org/pep-0544/
    
  • Subclassing an ABC

    This is about using goose-typing in practice.

    • we can use the subclassing as a benchmark for this section on subclassing:
      • needed to adhere strictly to the interface.

        therefore there’s a need to implement concrete versions of all the abstract methods defined in the abc.

      • concrete methods are implemented in terms of the public interface of the class, so it’s possible for us to subclass without any knowledge of the internal structure of the instances.

  • ABCs in the Standard Library

    • some places we can find useful ABCs: collections.abc module (most widely used), io package, numbers package

    • from collections.abc

      NOTE: the photo is outdated; since Python 3.6, Sequence, Mapping and Set subclass Collection, which is a child of Iterable, Container and Sized

      Remember that each of the immutable collections have a mutable subclass.

      • if isinstance(obj, Hashable) returns False, you can be certain that obj is not hashable. But if it returns True, it may be a false positive.

        Likewise, isinstance(obj, Iterable) can give false negatives: Python may still be able to iterate over obj using __getitem__.

      • TO_HABIT: duck typing is the most accurate way to determine whether an instance is hashable/iterable: just call hash(obj) / iter(obj) (see the sketch below)
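
      A quick sketch of the false positive / false negative cases:

```python
from collections.abc import Hashable, Iterable

t = (1, 2, [3, 4])
isinstance(t, Hashable)   # True  -- false positive: every tuple "is" Hashable...
# hash(t)                 # ...but this raises TypeError because of the list inside

class Squares:            # iterable only via the legacy __getitem__ fallback
    def __getitem__(self, i):
        return i * i

isinstance(Squares(), Iterable)   # False -- false negative
iter(Squares())                   # works anyway: Python falls back to __getitem__
```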

  • Defining and Using an ABC

    • this is only for learning purposes, we should avoid implementing our own ABCs and metaclasses.

      A good usecase for ABCs, descriptors, metaclasses are for building frameworks.

        # tag::TOMBOLA_ABC[]
      
        import abc
      
        class Tombola(abc.ABC):  # <1> subclass abc.ABC to define an ABC
      
            @abc.abstractmethod
            def load(self, iterable):  # <2> use this decorator, keep the body empty, can include in docstring
                """Add items from an iterable."""
      
            @abc.abstractmethod
            def pick(self):  # <3>
                """Remove item at random, returning it.
      
                This method should raise `LookupError` when the instance is empty.
                """
      
            def loaded(self):  # <4> ABC may include concrete methods.
                """Return `True` if there's at least 1 item, `False` otherwise."""
                return bool(self.inspect())  # <5>
      
            def inspect(self):
                """Return a sorted tuple with the items currently inside."""
                items = []
                while True:  # <6>
                    try:
                        items.append(self.pick())
                    except LookupError:
                        break
                self.load(items)  # <7>
                return tuple(items)
        # end::TOMBOLA_ABC[]
      

      some observations:

      1. since this is abstract, we can’t know how the concrete subclasses will actually implement things \(\implies\) the concrete methods can only rely on the other methods of the interface, not on assumptions about internal state.

        it’s OK to provide concrete methods in ABCs, as long as they only depend on other methods in the interface.

        For example, for inspect, we use the abstract pick function and load to return it to the original state.

      2. Before ABCs existed, abstract methods would raise NotImplementedError to signal that subclasses were responsible for their implementation.

      3. NOTE: an @abstractmethod method can have a base implementation. The subclass will still need to override it but the subclass will also be able to access it using super() and build onto / directly use the super functionality.

      4. LANG_LIMITATION: there’s no formal method for “adding” expected error types (exceptions) in Python Interfaces. No exception contracts.

        Docs are the only practical way to make expected exceptions explicit in Python interfaces today. Writing thoughtful docstrings and custom exception classes is the accepted best practice. ABCs enforce method presence not exception contracts.

        more elaboration here:

             When it comes to specifying or "adding" expected error types (exceptions) in Python interfaces like abstract base classes (ABCs) or general functions, **the language itself provides no formal mechanism** to declare which exceptions a method or function should raise, unlike some statically typed languages that have checked exceptions.
        
             ### How do we communicate expected error types in Python then?
        
             1. **Documentation is the de facto standard for specifying expected exceptions**
        ​        - Docstrings are the primary place to declare what errors a method can raise. This is how Python developers indicate usage interface contracts including possible exceptions.
        ​        - For example:
        
                  ```python
                  def divide(x, y):
                      """
                      Divide x by y.
        
                      Raises:
                          ZeroDivisionError: If y is zero.
                          TypeError: If inputs are not numbers.
                      """
                      return x / y
                  ```
        
             2. **ABCs and raising `NotImplementedError` for abstract methods**
        ​        - When defining abstract methods in ABCs, it is common to raise `NotImplementedError` to indicate subclasses *must* implement that method.
        ​        - This is the only *exception-related interface* that ABCs imply formally in code.
        
             3. **Static typing tools (e.g., MyPy) do not check for exceptions raised**
        ​        - Current Python type checkers mostly ignore exception flow or explicitly declared exceptions. There is no built-in or standard way to express exception contracts in type hints.
        
             4. **Custom exception classes for domain-specific errors**
        ​        - For clarity and maintainability, if your interface or library can raise expected errors, you should define and document custom exception classes.
        ​        - You communicate the valid exceptions by naming them in documentation and/or user guides.
        
             ### Why does Python not have explicit exception declarations on interfaces?
        
             - Python follows an **EAFP (Easier to Ask Forgiveness than Permission)** ethos.
        ​     - Explicit exception declarations would add verbosity and complexity.
        ​     - Pythonic style encourages **handling exceptions where you can recover** and letting others propagate upward naturally.
        ​     - **Fail-fast philosophy** encourages letting the system raise unexpected exceptions during development and handling them appropriately in higher layers.
        
             ### Summary table:
        
             | Approach               | Mechanism                    | Remarks                                         |
             |------------------------|------------------------------|------------------------------------------------|
             | Expected exceptions    | Documented in docstrings      | Widely accepted convention                      |
             | ABC interface contract | Raise `NotImplementedError`  | Defines required implementations, not errors raised in general |
             | Static typing          | No standard exception syntax | No checked exceptions like in Java, C#         |
             | Custom exceptions      | Define exception classes     | Clarifies error types, improves maintainability|
             | Runtime enforcement    | Try/except handlers          | Handle errors where recovery/alternative is feasible |
        
             ### Additional notes:
        
             - If you want to **make expected exceptions more discoverable**, consider tools that generate API docs (Sphinx, pdoc) that especially call out `:raises:` sections in your docstrings.
        ​     - In complex frameworks, **middleware or wrapper layers** may catch and re-raise or convert exceptions for clearer error handling without explicit declaration in the interface.
        ​     - Some third-party libraries or custom frameworks might support more formal error policies (contracts), but this is not core Python.
        
             **In essence:**
             **Docs are the only practical way to make expected exceptions explicit in Python interfaces today.** Writing thoughtful docstrings and custom exception classes is the accepted best practice. ABCs enforce method presence **not** exception contracts.
        
             If you want, I can help you draft a template for documenting expected exceptions clearly in your Python APIs.
        
             [1] https://docs.python.org/3/library/exceptions.html
             [2] https://realpython.com/python-built-in-exceptions/
             [3] https://stackoverflow.com/questions/57658862/making-an-abstract-base-class-that-inherits-from-exception
             [4] https://docs.python.org/3/library/abc.html
             [5] https://mypy.readthedocs.io/en/stable/error_code_list.html
             [6] https://labex.io/tutorials/python-how-to-handle-abstract-method-exceptions-437221
             [7] https://blog.sentry.io/practical-tips-on-handling-errors-and-exceptions-in-python/
             [8] https://accuweb.cloud/resource/articles/explain-python-valueerror-exception-handling-with-examples
        
  • ABC Syntax Details

    • we used to have other abstract decorators: @abstractclassmethod, @abstractstaticmethod, @abstractproperty, but they’re deprecated now because we can stack decorators instead

    • when decorator stacking, @abc.abstractmethod MUST be the innermost decorator

      the order of decorators matter.

        class MyABC(abc.ABC):
                @classmethod
                @abc.abstractmethod
                def an_abstract_classmethod(cls, *args):  # *args stands in for whatever signature subclasses need
                        pass
      
  • Subclassing an ABC

    • delegation of functions (e.g. init delegates to another ABC’s functions) seems to be a good idea to keep the code consistent

    • whether to override the concrete implementations from the ABC is our choice to make

  • A Virtual Subclass of an ABC

    Here’s an example of a subclass:

    from random import randrange
    
    from tombola import Tombola
    
    @Tombola.register  # <1> NOTE: being used as a decorator here, could have been a plain function invocation as well
    class TomboList(list):  # <2>
    
        def pick(self):
            if self:  # <3>
                position = randrange(len(self))
                return self.pop(position)  # <4>
            else:
                raise LookupError('pop from empty TomboList')
    
        load = list.extend  # <5>
    
        def loaded(self):
            return bool(self)  # <6>
    
        def inspect(self):
            return tuple(self)
    
    # Tombola.register(TomboList)  # <7>
    
    • it’s a “trust me bro” but if we lie, we still get caught by the usual runtime exceptions

    • issubclass and isinstance will work but there’s no real inheritance of any methods or attributes from the ABC

      • this happens because inheritance is guided by the __mro__ class attribute (method resolution order), and only “real” superclasses appear in the __mro__ (see the snippet after this list)
    • syntax:

      • usually a plain function invocation, can be done in a decorator style as well

        Tombola.register(TomboList) function invocation style (called after the class definition)

        @Tombola.register (decorator style)

  • Usage of register in Practice
  • Structural Typing with ABCs

    • typically we use nominal typing with ABCs: explicit inheritance records the parent in the subclass, linking the parent’s name to the subclass, and that’s what lets issubclass checks work at runtime.

    • duck typing (dynamic) and static duck typing are the two structural approaches to typing

      • we can do consistent-with structural subtyping as well if the class implements the methods defined in the type

      • this works because the ABC itself (e.g. abc.Sized) implements a special class method named __subclasshook__. The __subclasshook__ for Sized checks whether the class argument has an attribute named __len__ (usage sketched after this list)

        this is (roughly) the implementation of Sized.__subclasshook__, invoked by the ABCMeta machinery

            @classmethod
            def __subclasshook__(cls, C):
                if cls is Sized:
                    if any("__len__" in B.__dict__ for B in C.__mro__):
                            return True
                return NotImplemented
        
      • we shouldn’t implement __subclasshook__ in our own ABCs; relying on this implicit behaviour is not dependable.
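
      A sketch of that structural recognition in action, using the book’s Struggle example:

```python
from collections.abc import Sized

class Struggle:
    def __len__(self):
        return 23

isinstance(Struggle(), Sized)   # True -- no inheritance, no register() call
issubclass(Struggle, Sized)     # True, courtesy of Sized.__subclasshook__
```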

Static Protocols

  • The Typed double Function

    • duck typing allows us to write code that is future-compatible!
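
    A sketch of the idea, along the lines of the book’s Repeatable protocol for a typed double():

```python
from typing import TypeVar, Protocol

T = TypeVar('T')

class Repeatable(Protocol):
    def __mul__(self: T, repeat_count: int) -> T: ...

RT = TypeVar('RT', bound=Repeatable)   # any type whose * with an int returns the same type

def double(x: RT) -> RT:
    return x * 2

double(8.5)       # 17.0
double('ab')      # 'abab'
double([1, 2])    # [1, 2, 1, 2] -- and any future type supporting * keeps working
```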
  • Runtime Checkable Static Protocols (not to be confused with dynamic protocols)

    • typing.Protocol can be used for both static and runtime checking

      if we want to use it for runtime checking, then we need to add @runtime_checkable to the protocol definition

      how this works: typing.Protocol is built on ABC machinery, so it supports __subclasshook__; adding the @runtime_checkable decorator enables that machinery so the protocol supports isinstance / issubclass checks at runtime.

      NOTE: conceptually it’s still a consistent-with check, not a check that the object is exactly the same type.

    • caveat: performance/side-effect trade-offs

      be careful if the methods/attributes checked by __subclasshook__ trigger side effects or expensive operations

    • ready to use runtime checkables:

      • check numeric convertibility:
        • typing.SupportsComplex

                @runtime_checkable
                class SupportsComplex(Protocol):
                        """An ABC with one abstract method __complex__."""
                        __slots__ = ()
          
                        @abstractmethod
                        def __complex__(self) -> complex:
                                pass
          
          • RECIPE: TO_HABIT: if you want to test whether an object c is a complex or SupportsComplex, you can provide a tuple of types as the second arg to isinstance: isinstance(c, (complex, SupportsComplex))

            I had no idea this was a thing.

            alternatively, we can use the Complex ABC within the numbers module.

                    import numbers
                    isinstance(c, numbers.Complex)
            

            type checkers don’t seem to recognise the ABCs in the numbers module

          • typing.SupportsFloat

    • “Duck Typing is Your Friend”

      Often, duck typing is the better approach for runtime type checking: we just try the operations we need to perform on the object.

      So in the complex number situation, we have a few approaches we could take:

      • approach: runtime checkable static protocols

            if isinstance(o, (complex, SupportsComplex)):
                    # do something that requires `o` to be convertible to complex
            else:
                    raise TypeError('o must be convertible to complex')
        
      • approach: goose typing using numbers.Complex ABC

            if isinstance(o, numbers.Complex):
                    # do something with `o`, an instance of `Complex`
            else:
                    raise TypeError('o must be an instance of Complex')
        
      • approach:⭐️ duck typing and the EAFP (Easier to ask for forgiveness principle).

            try:
                    c = complex(o)
            except TypeError as exc:
                    raise TypeError('o must be convertible to complex') from exc
        
  • Limitations of Runtime Protocol Checks

    • @ runtime, type hints are generally ignored, and this also affects isinstance and issubclass checks against static protocols

    • problem: isinstance / issubclass checks only look at the presence or absence of methods, without checking their signatures, much less their type annotations. That would have been too costly.

      this is because full type checking is not just a matter of checking whether the type of x is T: it’s about determining that the type of x is consistent-with T, which may be expensive.

      since they only check method presence, we can end up getting false positives on these checks.

  • Supporting a Static Protocol

    • the point below is now deprecated. We can just run it as is.

    • using from __future__ import annotations allows type hints to be stored as strings, instead of being evaluated at import time when the def statements are executed.

      so if we were to annotate a return type with the very class that we’re defining, we would have to use this import; otherwise it’s a use-before-definition error (NameError).

      this is the postponed evaluation of annotations
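
      A minimal sketch of the use-before-definition issue (hypothetical class and method names):

        from __future__ import annotations  # annotations are kept as strings, not evaluated at import time

        class Deck:
            def shuffle_copy(self) -> Deck:  # without the __future__ import this raises NameError:
                ...                          # Deck is not bound yet while the class body is executing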

  • Designing a Static Protocol

    • trick: single-method protocols make static duck typing more useful and flexible

      After a while, if you realise a more complete protocol is required, then you can combine two or more protocols to define a new one

    • example: here’s the protocol definition; it has a single method

      
        from typing import Protocol, runtime_checkable, Any
      
        @runtime_checkable
        class RandomPicker(Protocol):
          # NOTE the ellipsis (...) used as the method body
            def pick(self) -> Any: ...
      

    and here are some tests written for it

    
    import random
    from typing import Any, Iterable, TYPE_CHECKING
    
    from randompick import RandomPicker  # <1>
    
    class SimplePicker:  # <2>
        def __init__(self, items: Iterable) -> None:
            self._items = list(items)
            random.shuffle(self._items)
    
        def pick(self) -> Any:  # <3>
            return self._items.pop()
    
    def test_isinstance() -> None:  # <4>
        popper: RandomPicker = SimplePicker([1])  # <5>
        assert isinstance(popper, RandomPicker)  # <6>
    
    def test_item_type() -> None:  # <7>
        items = [1, 2]
        popper = SimplePicker(items)
        item = popper.pick()
        assert item in items
        if TYPE_CHECKING:
            reveal_type(item)  # <8>
        assert isinstance(item, int)
    
    • observations:
      1. not necessary to import the static protocol to define a class that implements it
  • Best Practices for Protocol Design

    1. Align with the Interface Segregation Principle: clients should not be forced to depend on interfaces they don’t use. This gives the following two pieces of advice:

      1. Narrow interfaces (often with a single method) are more useful.

        Ref Martin Fowler post on role interfaces

      2. Client Code Protocols: Good to define the protocol near the “client code” (where it’s being used) instead of a library.

        Useful for extensibility and mock-testing.

    2. Naming:

      just use plain, minimal nouns that make sense; nothing too fancy here.

      • clear concept \(\rightarrow\) plain names (Iterator, Container)

      • provides callable methods \(\rightarrow\) SupportsX, e.g. SupportsRead

      • read/write attrs or getter/setter methods \(\rightarrow\) HasX, e.g. HasItems

    3. Create Minimalistic protocols and extend them later by creating derived protocols

  • Extending a Protocol

    
    from collections.abc import Iterable
    from typing import Protocol, runtime_checkable
    from randompick import RandomPicker
    
    @runtime_checkable  # <1> need to re-apply the decorator; it is not inherited
    class LoadableRandomPicker(RandomPicker, Protocol):  # <2> must list Protocol again as a base
        def load(self, items: Iterable) -> None: ...  # <3> OOP-like: only the new method is declared; the extended protocol's methods are "inherited"
    
    • instead of adding methods to the original protocol, it’s better to derive a new protocol from it.

      keeps protocols minimal and aligns with the Interface Segregation Principle: really narrow interfaces.

    • GOTCHA: not entirely the same as inheritance

      • the decorator @runtime_checkable needs to be re-applied

      • in the list of base classes, we still need to include Protocol along with the protocol(s) we are extending

      • similar to inheritance: the methods of the protocols being extended are inherited by the derived protocol. We only need to declare the new methods in the derived protocol.

  • The numbers ABCs and Numeric Protocols

    • Objective: we want to be able to support static type checking, and we want to be able to do this for external libraries that register their types as virtual subclasses of numbers ABCs.

    • Current Approach: use the numeric protocols within typing module

    • numbers.Number has no methods \(\implies\) numeric tower not useful for static type checking (it’s useful for runtime type checking though)

    • GOTCHA: decimal.Decimal is not registered as a virtual subclass of numbers.Real. The reason is that, if you need the precision of Decimal in your program, then you want to be protected from accidental mixing of decimals with floating-point numbers that are less precise.

      because floats are less precise, and we don’t want to mix them in and silently lose precision.
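
      Quick check of this (Decimal is registered as a virtual subclass of numbers.Number, but not of numbers.Real):

        import numbers
        from decimal import Decimal

        isinstance(Decimal('0.1'), numbers.Real)    # False: Decimal is not a virtual subclass of Real
        isinstance(Decimal('0.1'), numbers.Number)  # True
        isinstance(0.1, numbers.Real)               # True: float is a Real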

    • Takeaways:

      1. The numbers ABCs are fine for runtime type checking, but unsuitable for static typing.

      2. The numeric static protocols SupportsComplex, SupportsFloat, etc. work well for static typing, but are unreliable for runtime type checking when complex numbers are involved.

Chapter Summary

  1. contrasted dynamic protocols (that support duck typing) and static protocols (static duck typing)

    1. for both, just need to implement necessary methods, no explicit registration needed

    2. runtime effect:

      Static protocols have no runtime effect by themselves.

      Dynamic protocols are runtime checkable. When we apply @runtime_checkable to a static protocol, it also supports runtime isinstance / issubclass checks, so it behaves somewhat like a dynamic protocol.

    3. NOTE: this is a different contrast from Dynamic Duck Typing vs Static Duck typing

      Dynamic duck typing is the fail-fast approach, where we just try it and see

      Static Duck Typing is the contract based use of Protocols

      This is a subtle but often confusing distinction. Dynamic duck typing is Python’s inherent runtime behavior, while static duck typing reflects the formal contract via protocols at type-checking time

  2. Python interpreter’s support for sequence and iterable dynamic protocols.

    The interpreter uses special methods (__getitem__, __iter__, etc.) dynamically to support iteration and membership tests even without explicit ABCs. This is a classic Python dynamic protocol idiom.

  3. monkey patching: adhering to the protocol @ runtime

  4. defensive programming: detect structural types using try/except and fail fast, instead of explicit isinstance or hasattr checks

    IDIOM: This is a widely advocated Python idiom: “EAFP (Easier to Ask Forgiveness than Permission)”,

  5. Goose typing:

    • creating and using ABCs

    • traditional subclassing and registration

    • the __subclasshook__ special method as a way for ABCs to support structural typing based on methods that fulfill the interface defined in the ABC (without explicit registration)

  6. Static protocols

    • are kind of the structural interface of the Python world.

    • @runtime_checkable actually leverages __subclasshook__ to support structural typing at runtime,

      though the best use of these protocols is with static type checkers.

      type hints make structural typing more reliable.

    • design of static protocol:

      • keep the narrow interface
      • keep the definition near to usage
      • extend it when you need to add functionality; in line with interface segregation principle.
  7. Numbers ABCs and Numeric Protocols:

    • numeric static protocols (e.g. SupportsFloat) have shortcomings
  8. main message of this chapter is that we have four complementary ways of programming with interfaces in modern Python, each with different advantages and drawbacks.

    You are likely to find suitable use cases for each typing scheme in any modern Python codebase of significant size.

    Rejecting any one of these approaches will make your work as a Python programmer harder than it needs to be.

  • Possible Misconceptions

    Adjacent Gotchas and Difficult Concepts You Might Misconstrue or Overlook

    1. Runtime Checking Limits of Dynamic Protocols: Runtime `isinstance` checks with `@runtime_checkable` protocols are limited to checking presence of attributes/methods (using `hasattr` internally) and do not verify method signatures, argument types, or behavior correctness. This can give false positives if method signatures do not match—only static type checkers guarantee that.

    2. `__subclasshook__` Complexity and Pitfalls: While powerful, implementing or overriding `__subclasshook__` can be tricky because it must handle all subclass checks gracefully and correctly, respecting caching and fallback behaviors to avoid subtle bugs. Excessive or ill-considered use may confuse the MRO and class hierarchy assumptions.

    3. Difference Between ABC Registration and Protocol Conformance: Registering a class as a virtual subclass of an ABC influences `isinstance` checks but does not affect static type checking, whereas protocols influence static (and optionally runtime) interface conformance. Bridging these self-consistently in a codebase can sometimes be confusing.

    4. Protocols and Inheritance vs Nominal Typing: Protocols enable structural typing, eschewing nominal inheritance for interface compatibility, but this can lead to subtle type checking behaviors where classes unintentionally conform just by method names, masking incorrect assumptions. This requires developers to design protocols and type hints thoughtfully.

    5. Static Type Checking Requires Adoption of Tooling: The benefits of static protocols are realized only when using type checkers; pure runtime execution won’t enforce protocols unless combined with runtime checkable features. Adoption means introducing additional tooling and some learning curve for teams.

    6. Monkey Patching Risks: While useful at runtime for dynamic protocol adherence, monkey patching comes with maintainability and debugging risks, especially when changing behaviors of widely used or critical classes. It can also mask design flaws if overused.

    7. Difference Between Static and Runtime Failure Modes: Static protocols help catch interface noncompliance early, but dynamic duck typing detects mismatches only at runtime, often deeper within program flow, affecting error locality and debuggability.

    Supporting References

    • Real Python: Python Protocols: Leveraging Structural Subtyping (2024)
    • The Turing Taco Tales: Static Duck Typing With Python’s Protocols (2024)
    • Xebia: Protocols In Python: Why You Need Them (2022)
    • PEP 544 – Protocols: Structural Subtyping (2017) (Historical and spec source)
    • Python official docs on typing and Abstract Base Classes

    Mental Model Summary for You as a Tech Leader

    Your notes effectively capture the layered nature of interface programming in Python:

    • At the lowest layer, Python runtime embraces dynamic duck typing: just try it and fail fast.
    • To improve runtime type recognition and interoperability, Python uses ABCs with virtual subclassing (`register`) and `__subclasshook__` (“goose typing”), enabling `isinstance` semantics on structural grounds.
    • To further support static analysis tooling, Python offers static protocols that check structure without inheritance, giving formal contracts for type checkers.
    • Finally, runtime-checkable protocols bridge these worlds, allowing runtime `isinstance` checks on protocols designed primarily for static typing.

    Together, these patterns compose a robust, hybrid approach adaptable to many scales and requirements—**rejecting any will unnecessarily limit your Python design flexibility and safety guarantees**

Further Reading

Chapter 14. Inheritance: For Better or for Worse

Focus areas for this chapter:

  • The super() function
  • The pitfalls of subclassing from built-in types
  • Multiple inheritance and method resolution order
  • Mixin classes

The chapter introduces multiple inheritance for those who have never used it, and provides some guidance on how to cope with single or multiple inheritance if you must use it.

What’s New in This Chapter

The super() Function

  • example use cases

    • when a subclass overrides a method of a superclass

      and we want to let the superclass method do its job then add more logic to it

    • when we let the superclasses do their part in init fns

  • LANG_LIMITATION / IDIOM: unlike Java, where a constructor automatically calls the superclass’s no-arg constructor, Python doesn’t do this, so we ALWAYS need to write the super().__init__() call ourselves.

    
      def __init__(self, a, b):
          super().__init__(a, b)
          ...  # more initialization code
    
    • it will work (but is not recommended) to hardcode the base class and call that base class’s method directly.

      It also won’t play well with multiple inheritance; see the sketch below.
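
      A sketch contrasting the two calls, along the lines of the book’s LastUpdatedOrderedDict example (reproduced from memory):

        from collections import OrderedDict

        class LastUpdatedOrderedDict(OrderedDict):
            """Store items in the order they were last updated."""

            def __setitem__(self, key, value):
                super().__setitem__(key, value)
                # OrderedDict.__setitem__(self, key, value) would also work,
                # but it hardcodes the base class and breaks cooperative multiple inheritance
                self.move_to_end(key)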

Subclassing Built-In Types Is Tricky

  • Main takeaway:

    Subclassing built-in types like dict or list or str directly is error-prone because the built-in methods mostly ignore user-defined overrides. Instead of subclassing the built-ins, derive your classes from the collections module using UserDict, UserList, and UserString, which are designed to be easily extended.

    • it’s a flaw in method delegation within the C code of the built-in types (it only affects classes derived directly from those types).
  • Major Caveat: bypassing behaviour

    the code of the built-ins (written in C) usually does not call methods overridden by user-defined subclasses.

    this also applies to other dunder methods that internally call the overridden method.

    calling the overriding method directly still works; see the DoppelDict-style sketch at the end of this section.

  • This built-in behavior is a violation of a basic rule of object-oriented programming: the search for methods should always start from the class of the receiver (self), even when the call happens inside a method implemented in a superclass.

  • virtual vs nonvirtual methods

    virtual: late-bound

    non-virtual: bound at compile time

    in Python, every method is late-bound, like a virtual method

    built-ins written in C seem to behave as nonvirtual by default (at least in CPython).
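
  Here is a sketch of that bypassing behaviour, along the lines of the book’s DoppelDict example (reproduced from memory):

    import collections

    class DoppelDict(dict):
        def __setitem__(self, key, value):
            super().__setitem__(key, [value] * 2)  # store every value duplicated

    dd = DoppelDict(one=1)   # {'one': 1} -- dict.__init__ (C code) ignores our override
    dd['two'] = 2            # {'one': 1, 'two': [2, 2]} -- the [] operator does use it
    dd.update(three=3)       # {'one': 1, 'two': [2, 2], 'three': 3} -- update() ignores it again

    class DoppelDict2(collections.UserDict):
        def __setitem__(self, key, value):
            super().__setitem__(key, [value] * 2)

    dd2 = DoppelDict2(one=1)  # {'one': [1, 1]} -- UserDict routes everything through our override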

Multiple Inheritance and Method Resolution Order

  • guiding question:

    if we do multiple inheritance and both super classes have overlapping method names, how to make reference to the correct super function from the subclass

    \(\implies\) this is the diamond problem and we wanna see how python solves this

  • 2 factors that determine the activation sequence:

    1. MRO of the leaf class

      Goes from the current class all the way up to the object class

      Defines the activation order

    2. use of super() in each method

      Determines whether a particular method will be activated.

      So if the method calls super() then we move to the next class in the MRO order and execute that.

      How?

      It’s not necessarily a BFS; it uses the C3 linearization algorithm (not important to understand unless you need to wrangle complex hierarchies).

  • MRO accounts for inheritance graph. Amongst siblings, it’s determined by the subclass declaration.

    e.g. Leaf(B, A), Leaf(A, B) are two different subclass declarations.

  • Cooperative Methods: methods that call super()

    Cooperative methods enable cooperative multiple inheritance. These terms are intentional: in order to work, multiple inheritance in Python requires the active cooperation of the methods involved.

    GOTCHA: a noncooperative method can be a cause of subtle bugs. \(\implies\) That’s why it is recommended that every method m of a nonroot class should call super().m().

A noncooperative method can be the cause of subtle bugs.

Many coders reading Example 14-4 may expect that when method A.pong calls super().pong(), that will ultimately activate Root.pong. But if B.pong (which does not call super()) is activated next in the MRO, it drops the ball; see the sketch below.

That’s why it is recommended that every method m of a nonroot class should call super().m().
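
A condensed sketch of that diamond, modeled on the book’s Root/A/B/Leaf example (from memory):

  class Root:
      def pong(self):
          print('pong in Root')

  class A(Root):
      def pong(self):
          print('pong in A')
          super().pong()        # cooperative: hands off to the next class in the MRO

  class B(Root):
      def pong(self):
          print('pong in B')    # noncooperative: no super().pong(), so it "drops the ball"

  class Leaf(A, B):
      pass

  # Leaf.__mro__ == (Leaf, A, B, Root, object)
  # Leaf().pong() prints 'pong in A' then 'pong in B' and stops: Root.pong is never reached.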

Mixin Classes

  • definition:

    • designed to be sub classed together with at least one other class as part of a multiple inheritance arrangement

    • won’t provide all the functionality of a concrete object

    • it’s supposed to be functionality mixins \(\implies\) customizes the behaviour of child or sibling classes.

      so naturally will have some concrete methods implemented

    • are a convention that has no explicit language support in python/cpp

  • Mixins must appear first in the tuple of base classes in a class declaration

    mixins typically depend on sibling classes that implement / inherit methods with the same signature

    therefore, they must appear early in the MRO of a subclass that uses it

  • Case-Insensitive Mappings

    See this beautiful example

    
    """
    Short demos
    ===========
    
    ``UpperDict`` behaves like a case-insensitive mapping::
    
    # tag::UPPERDICT_DEMO[]
        >>> d = UpperDict([('a', 'letter A'), (2, 'digit two')])
        >>> list(d.keys())
        ['A', 2]
        >>> d['b'] = 'letter B'
        >>> 'b' in d
        True
        >>> d['a'], d.get('B')
        ('letter A', 'letter B')
        >>> list(d.keys())
        ['A', 2, 'B']
    
    # end::UPPERDICT_DEMO[]
    
    And ``UpperCounter`` is also case-insensitive::
    
    # tag::UPPERCOUNTER_DEMO[]
        >>> c = UpperCounter('BaNanA')
        >>> c.most_common()
        [('A', 3), ('N', 2), ('B', 1)]
    
    # end::UPPERCOUNTER_DEMO[]
    
    Detailed tests
    ==============
    
    UpperDict uppercases all string keys.
    
        >>> d = UpperDict([('a', 'letter A'), ('B', 'letter B'), (2, 'digit two')])
    
    
    Tests for item retrieval using `d[key]` notation::
    
        >>> d['A']
        'letter A'
        >>> d['b']
        'letter B'
        >>> d[2]
        'digit two'
    
    
    Tests for missing key::
    
        >>> d['z']
        Traceback (most recent call last):
          ...
        KeyError: 'Z'
        >>> d[99]
        Traceback (most recent call last):
          ...
        KeyError: 99
    
    
    Tests for item retrieval using `d.get(key)` notation::
    
        >>> d.get('a')
        'letter A'
        >>> d.get('B')
        'letter B'
        >>> d.get(2)
        'digit two'
        >>> d.get('z', '(not found)')
        '(not found)'
    
    Tests for the `in` operator::
    
        >>> ('a' in d, 'B' in d, 'z' in d)
        (True, True, False)
    
    Test for item assignment using lowercase key::
    
        >>> d['c'] = 'letter C'
        >>> d['C']
        'letter C'
    
    Tests for update using a `dict` or a sequence of pairs::
    
        >>> d.update({'D': 'letter D', 'e': 'letter E'})
        >>> list(d.keys())
        ['A', 'B', 2, 'C', 'D', 'E']
        >>> d.update([('f', 'letter F'), ('G', 'letter G')])
        >>> list(d.keys())
        ['A', 'B', 2, 'C', 'D', 'E', 'F', 'G']
        >>> d  # doctest:+NORMALIZE_WHITESPACE
        {'A': 'letter A', 'B': 'letter B', 2: 'digit two',
        'C': 'letter C', 'D': 'letter D', 'E': 'letter E',
        'F': 'letter F', 'G': 'letter G'}
    
    UpperCounter uppercases all `str` keys.
    
    Test for initializer: keys are uppercased.
    
        >>> d = UpperCounter('AbracAdaBrA')
        >>> sorted(d.keys())
        ['A', 'B', 'C', 'D', 'R']
    
    Tests for count retrieval using `d[key]` notation::
    
        >>> d['a']
        5
        >>> d['z']
        0
    
    """
    # tag::UPPERCASE_MIXIN[]
    import collections
    
    def _upper(key):  # <1>
        try:
            return key.upper()
        except AttributeError:
            return key
    
    class UpperCaseMixin:  # <2>
        def __setitem__(self, key, item):
            super().__setitem__(_upper(key), item)
    
        def __getitem__(self, key):
            return super().__getitem__(_upper(key))
    
        def get(self, key, default=None):
            return super().get(_upper(key), default)
    
        def __contains__(self, key):
            return super().__contains__(_upper(key))
    # end::UPPERCASE_MIXIN[]
    
    # tag::UPPERDICT[]
    class UpperDict(UpperCaseMixin, collections.UserDict):  # <1>
        pass
    
    class UpperCounter(UpperCaseMixin, collections.Counter):  # <2>
        """Specialized 'Counter' that uppercases string keys"""  # <3>
    # end::UPPERDICT[]
    

Multiple Inheritance in the Real World

Usage of multiple inheritance is not really the norm in C++ or Python.

There are some known good use cases though.

  • ABCs Are Mixins Too

    • Just like Java, we support multiple inheritance of interfaces

    • for those ABCs that have concrete methods, they play 2 roles:

      1. they play the role of an interface definition

      2. they also play the role of mixin classes

  • ThreadingMixIn and ForkingMixIn
  • Django Generic Views Mixins

    • the original generic views were functions so they were not extensible

      have to start from scratch if we want to do something similar but not the same

    • Concrete subclasses of View are supposed to implement the handler methods, so why aren’t those methods part of the View interface? The reason: subclasses are free to implement just the handlers they want to support. A TemplateView is used only to display content, so it only implements get

      reminder on why:

      
        Great question! What you’re touching on is a common and important aspect of Python’s interface and subclassing philosophy, especially in frameworks like Django (which uses **View** classes).
      
        ### In Python, do you have to implement *all* methods of an interface/abstract base class?
      
        **Short answer:**
        **Not necessarily.** Unlike some statically typed languages with strict interface contracts, Python’s interfaces and abstract base classes (ABCs) often allow (or even expect) subclasses to implement only the parts of the interface that make sense for them.
      
        ### Why?
      
        1. **Python’s dynamic and flexible nature**
      ​     - Python favors *duck typing* and optional implementation over strict enforcement.
      ​     - ABCs in Python can define optional methods, or provide default no-op or generic implementations in base classes.
      
        2. **Partial method implementation is common in frameworks**
      ​     - Consider a `View` base class that defines many “handler methods” like `get()`, `post()`, `put()`, etc.
      ​     - Concrete subclasses (e.g., `TemplateView`, `CreateView`) implement only the handlers relevant to the use case.
      ​     - This lets you write minimal classes that do exactly what you need, without burden.
      
        3. **Interface methods can be *optional* or *protocol-like***
      ​     - Instead of forcing every subclass to implement an entire interface, the base class or the framework’s machinery calls a method *if it exists*.
      ​     - If the method is not implemented, either a default behavior occurs, or a runtime error only arises if the method is actually called.
      
        4. **Explicit design: selective handler implementation**
      ​     - The `TemplateView` subclass only implements the `get()` method because it only responds to HTTP GET requests.
      ​     - If you called `post()` on a `TemplateView`, it might raise a `NotImplementedError` or return a “method not allowed” response.
      ​     - This selective implementation is a common design pattern in web frameworks.
      
        ### Contrast to stricter languages or interfaces
      
        In languages like Java or C#, interfaces typically require **all** declared methods to be implemented. Python’s philosophy is more permissive:
      
        - **You can think of many interfaces or ABCs in Python as “soft” contracts.**
      ​  - **You implement the “part you use.”**
      ​  - This works well because Python’s runtime dispatch is dynamic: calls to methods that don’t exist will fail only if you exercise them.
      
        ### Summary
      
        - In Python, you **do not always have to implement every method** defined by an interface or abstract base class.
      ​  - Interfaces or base classes often define *optional* or *selective* methods.
      ​  - Subclasses implement only the methods relevant to their behavior.
      ​  - This design promotes flexibility and simplicity in code, especially in toolkits like Django where views handle different HTTP methods.
      ​  - Calling a method that’s not implemented typically leads to a runtime error only if that method is invoked.
      
        If you want, I can provide a small Python example illustrating this partial implementation pattern (e.g., a base class with several methods, and subclasses implementing only some of them). Would that be helpful?
      
    • It does take some time to learn how to leverage class-based views and how to extend them to fulfill specific application needs, but I found that it was worthwhile to study them.

      Value that the mixins give for Django:

      They eliminate a lot of boilerplate code, make it easier to reuse solutions, and even improve team communication—for example, by defining standard names to templates, and to the variables passed to template contexts. Class-based views are Django views “on rails.”

  • Multiple Inheritance in Tkinter

    • the class hierarchy SHOULD NOT be very deep

      usually it’s around 3 or 4 levels of concrete classes

    • GUI toolkits are where inheritance is most useful. The hierarchies can get really deep in them.

⭐️ Coping with Inheritance

These are the rules of thumb that we need to rely on.

We have to do so because there’s no general theory about inheritance that can guide us against creating incomprehensible, brittle designs.

  • Favor Object Composition over Class Inheritance

    • do composition and delegation

      it can even replace the use of mixins and make behaviours available to different classes.

    • subclassing is a form of tight coupling and tall inheritance trees tend to be brittle.

  • Understand Why Inheritance Is Used in Each Case

    Reasons FOR using inheritance:

    1. creates a subtype, so it’s an is-a relationship; best done with ABCs

    2. avoids code duplication by reuse, Mixins are useful for this too

    The realisation here is that for code reuse, inheritance is only an implementation detail; we can use composition & delegation instead. Interface inheritance, however, is a separate matter.

  • Make Interfaces Explicit with ABCs

    1. Multiple inheritance of ABCs is not problematic.
      • An ABC should subclass only abc.ABC or other ABCs.
      • if a class is intended to define an interface, it should be an explicit ABC or a typing.Protocol subclass.
  • Use Explicit Mixins for Code Reuse

    • for reuse by multiple unrelated subclasses, without implying an “is-a” relationship,
    • not to be instantiated
    • since there’s no formal convention, try to suffix the mixin name with …Mixin
  • Provide Aggregate Classes to Users

    • A class that is constructed primarily by inheriting from mixins and does not add its own structure or behavior is called an aggregate class.

    • group together combinations of ABCs or mixins

      we can now just use the aggregate class without having to figure out in which order they should be declared to work as intended.

    • typically just has an empty body (with docstring / pass)
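
      A minimal sketch with made-up mixins (hypothetical names), just to show the shape of an aggregate class:

        class AsDictMixin:
            """Hypothetical mixin: expose instance state as a dict."""
            def as_dict(self):
                return dict(self.__dict__)

        class ReprMixin:
            """Hypothetical mixin: generic repr based on instance state."""
            def __repr__(self):
                return f'{type(self).__name__}({self.__dict__!r})'

        class Record:
            def __init__(self, **fields):
                self.__dict__.update(fields)

        class FriendlyRecord(AsDictMixin, ReprMixin, Record):
            """Aggregate class: adds nothing of its own, just bundles the mixins in a working order."""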

  • Subclass Only Classes Designed for Subclassing

    • some superclass methods may ignore the subclass overrides in unexpected ways.

      \(\implies\) we should subclass only those that are intended to be extended.

    • how to check?

      • check the docs: if the class is described or named as a base class, that hints at it

      • the docs will also indicate which of the methods are intended to be overridden.

      • see if the @final decorator exists on the method (then it’s not intended for extension by overriding that method)

  • Avoid Subclassing from Concrete Classes

    • if you do this, any internal state within a concrete class might get corrupted

      even if we cooperate by calling super(), there are still many ways bugs can be introduced

    • If you must use subclassing for code reuse, then the code intended for reuse should be in mixin methods of ABCs or in explicitly named mixin classes.

  • Tkinter: The Good, the Bad, and the Ugly

Chapter Summary

Further Reading

  • Smalltalk has traits which are a language construct that serves the role that a mixin class does, while avoiding some of the issues with multiple inheritance.

    Scala also has traits.

  • while working as an application developer, you find yourself building multilevel class hierarchies, it’s likely that one or more of the following applies:

    1. You are reinventing the wheel. Go look for a framework or library that provides components you can reuse in your application.

    2. You are using a badly designed framework. Go look for an alternative.

    3. You are overengineering. Remember the KISS principle.

    4. You became bored coding applications and decided to start a new framework.

      Congratulations and good luck!

  • Subclassing in Python Redux

Chapter 15. More About Type Hints

This is a new chapter in this edition of the book.

What’s New in This Chapter

Overloaded Signatures

  • it’s the signatures that we are overloading, not the function.

    remember that python doesn’t allow function overloading!

  • implementation:

    • the actual function will need no type hints, because the overloads take care of them

    • can be implemented within the same module:

      
              import functools
              import operator
              from collections.abc import Iterable
              from typing import overload, Union, TypeVar
      
              T = TypeVar('T')
              S = TypeVar('S')  # <1> for the second overload
      
              @overload
              def sum(it: Iterable[T]) -> Union[T, int]: ...  # <2>
              @overload
              def sum(it: Iterable[T], /, start: S) -> Union[T, S]: ...  # <3>
              def sum(it, /, start=0):  # <4>
                  return functools.reduce(operator.add, it, start)
      
  • Max Overload

    • Pythonic APIs are hard to annotate, because they strongly leverage the powerful dynamic features of Python

      this section demonstrates what it takes to annotate the max function.

  • Takeaways from Overloading max

    • the expressiveness of annotation markings is very limited, compared to that of python

TypedDict

  • gotcha: remember that for JSON objects we’ll need to do runtime checking; the pydantic package is great for this.

    Static type checking is unable to prevent errors with code that is inherently dynamic, such as json.loads()

  • objective: we want to be able to define the structure of a container type (heterogeneous)

    we should be able to provide a type specific to a key

  • TypedDict has no runtime effect; it’s only for static analysis

    Gives:

    1. Class-like syntax to annotate a dict with type hints for the value of each “field.”

    2. A constructor that tells the type checker to expect a dict with the keys and values as specified.

      
             from typing import TypedDict
      
             class BookDict(TypedDict):
                     isbn: str
                     title: str
                     authors: list[str]
                     pagecount: int
      

      looks very similar to a dataclass builder like a typing.NamedTuple but it isn’t.

  • @ runtime, the constructor just ends up creating a plain dict. No instance attributes, no init functions for the class, no method definitions.

    none of the types will be enforced, “illegal” assignments can happen
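
    Quick illustration (the wrong-typed values below are deliberate; a static checker would flag them, the runtime won’t):

      from typing import TypedDict

      class BookDict(TypedDict):
          isbn: str
          title: str
          authors: list[str]
          pagecount: int

      pp = BookDict(title='Programming Pearls',
                    authors='Jon Bentley',   # should be list[str]; no runtime complaint
                    isbn='0201657880',
                    pagecount=256)
      type(pp)             # <class 'dict'> -- just a plain dict at runtime
      pp['weight'] = 4.5   # unknown key, wrong value type: still fine at runtime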

Type Casting

  • type casting is for type checkers to get assisted by us

    typing.cast() special function provides one way to handle type checking malfunctions or incorrect type hints in code we can’t fix.

    Casts are used to silence spurious type checker warnings and give the type checker a little help when it can’t quite understand what is going on.

    Does absolutely nothing @ runtime

    
      # tag::CAST[]
      from typing import cast
    
      def find_first_str(a: list[object]) -> str:
          index = next(i for i, x in enumerate(a) if isinstance(x, str))
          # We only get here if there's at least one string
          return cast(str, a[index])
      # end::CAST[]
    
    
      from typing import TYPE_CHECKING
    
      l1 = [10, 20, 'thirty', 40]
      if TYPE_CHECKING:
          reveal_type(l1)
    
      print(find_first_str(l1))
    
      l2 = [0, ()]
      try:
          find_first_str(l2)
      except StopIteration as e:
          print(repr(e))
    
  • too many uses of cast is likely a code-smell; Mypy is not that useless!

  • why casts still have some purpose:

    1. the other workarounds are worse:
      • # type: ignore is less informative

      • Any is contagious, it will have cascading effects through type inference and undermine the type checker’s ability to detect errors in other parts of the code

Reading Type Hints at Runtime

  • the __annotations__ attribute is a dict mapping parameter names to their types

    the return type has the key "return"

  • annotations are evaluated by the interpreter at import time, just like param default values

  • Problems with Annotations at Runtime

    • extra CPU and memory load when importing

    • types not yet defined are strings instead of actual types \(\implies\) the forward-reference-problem

    • we can use introspection helpers for this

      e.g. typing.get_type_hints (or inspect.get_annotations on Python ≥ 3.10)

      these are the recommended ways to read type hints at runtime (typing.get_type_hints also resolves forward references)
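
      A small sketch (hypothetical names) of how get_type_hints resolves a forward reference that __annotations__ leaves as a string:

        from typing import get_type_hints

        def fetch(key: str) -> 'Record':   # 'Record' is a forward reference: the class is defined below
            ...

        class Record:
            pass

        fetch.__annotations__
        # {'key': <class 'str'>, 'return': 'Record'}  -- the forward reference is still a string

        get_type_hints(fetch)
        # {'key': <class 'str'>, 'return': <class 'Record'>}  -- resolved against the module namespace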

  • Dealing with the Problem

    • just keep an eye out on how to handle this, it’s likely to change from 3.10 onwards

Implementing a Generic Class

  • have to concretise the generic type by giving a type parameter: machine = LottoBlower[int](range(1, 11))

    here’s a generic LottoBlower:

    
      import random
    
      from collections.abc import Iterable
      from typing import TypeVar, Generic
    
      from tombola import Tombola
    
      T = TypeVar('T')
    
      class LottoBlower(Tombola, Generic[T]):  # <1> have to subclass Generic to declare the formal type params
    
          def __init__(self, items: Iterable[T]) -> None:  # <2>
              self._balls = list[T](items)
    
          def load(self, items: Iterable[T]) -> None:  # <3>
              self._balls.extend(items)
    
          def pick(self) -> T:  # <4>
              try:
                  position = random.randrange(len(self._balls))
              except ValueError:
                  raise LookupError('pick from empty LottoBlower')
              return self._balls.pop(position)
    
          def loaded(self) -> bool:  # <5>
              return bool(self._balls)
    
          def inspect(self) -> tuple[T, ...]:  # <6>
              return tuple(self._balls)
    
  • Basic Jargon for Generic Types

    • Generic type: type with 1 or more type vars

    • Formal Type Parameter: the generic type var used to define a generic type

    • Parameterized type: type declared with actual type parameters (resolved)

    • Actual type param: the actual types given as parameters when a parameterized type is declared

Variance

  • useful to know if we want to support generic container types or provide callback-based APIs.

    Practically speaking, most cases are covered if we just stick to invariant containers

  • the following sections use a concrete analogy to drive the point:

    Imagine that a school cafeteria has a rule that only juice dispensers can be installed. General beverage dispensers are not allowed because they may serve sodas, which are banned by the school board.

    code:

    
      from typing import TypeVar, Generic
    
    
      class Beverage:
          """Any beverage."""
    
    
      class Juice(Beverage):
          """Any fruit juice."""
    
    
      class OrangeJuice(Juice):
          """Delicious juice from Brazilian oranges."""
    
    
      T_co = TypeVar('T_co', covariant=True)
    
    
      class BeverageDispenser(Generic[T_co]):
          def __init__(self, beverage: T_co) -> None:
              self.beverage = beverage
    
          def dispense(self) -> T_co:
              return self.beverage
    
    
      class Garbage:
          """Any garbage."""
    
    
      class Biodegradable(Garbage):
          """Biodegradable garbage."""
    
    
      class Compostable(Biodegradable):
          """Compostable garbage."""
    
    
      T_contra = TypeVar('T_contra', contravariant=True)
    
    
      class TrashCan(Generic[T_contra]):
          def put(self, trash: T_contra) -> None:
              """Store trash until dumped."""
    
    
      class Cafeteria:
          def __init__(
              self,
              dispenser: BeverageDispenser[Juice],
              trash_can: TrashCan[Biodegradable],
          ):
              """Initialize..."""
    
    
      ################################################ exact types
    
      juice_dispenser = BeverageDispenser(Juice())
      bio_can: TrashCan[Biodegradable] = TrashCan()
    
      arnold_hall = Cafeteria(juice_dispenser, bio_can)
    
    
      ################################################ covariant dispenser
    
      orange_juice_dispenser = BeverageDispenser(OrangeJuice())
    
      arnold_hall = Cafeteria(orange_juice_dispenser, bio_can)
    
    
      ################################################ non-covariant dispenser
    
      beverage_dispenser = BeverageDispenser(Beverage())
    
      ## Argument 1 to "Cafeteria" has
      ## incompatible type "BeverageDispenser[Beverage]"
      ##          expected "BeverageDispenser[Juice]"
      # arnold_hall = Cafeteria(beverage_dispenser, bio_can)
    
    
      ################################################ contravariant trash
    
      trash_can: TrashCan[Garbage] = TrashCan()
    
      arnold_hall = Cafeteria(juice_dispenser, trash_can)
    
    
      ################################################ non-contravariant trash
    
      compost_can: TrashCan[Compostable] = TrashCan()
    
      ## Argument 2 to "Cafeteria" has
      ## incompatible type "TrashCan[Compostable]"
      ##          expected "TrashCan[Biodegradable]"
      # arnold_hall = Cafeteria(juice_dispenser, compost_can)
    
  • An Invariant Dispenser

    
    # tag::BEVERAGE_TYPES[]
    from typing import TypeVar, Generic
    
    class Beverage:  # <1> we establish a type hierarchy
        """Any beverage."""
    
    class Juice(Beverage):
        """Any fruit juice."""
    
    class OrangeJuice(Juice):
        """Delicious juice from Brazilian oranges."""
    
    T = TypeVar('T')  # <2> simple typevar
    
    class BeverageDispenser(Generic[T]):  # <3> Parameterised on the beverage type
        """A dispenser parameterized on the beverage type."""
        def __init__(self, beverage: T) -> None:
            self.beverage = beverage
    
        def dispense(self) -> T:
            return self.beverage
    
    def install(dispenser: BeverageDispenser[Juice]) -> None:  # <4> module-global function
        """Install a fruit juice dispenser."""
    # end::BEVERAGE_TYPES[]
    
    ################################################ exact type
    
    # tag::INSTALL_JUICE_DISPENSER[]
    juice_dispenser = BeverageDispenser(Juice())
    install(juice_dispenser)
    # end::INSTALL_JUICE_DISPENSER[]
    
    
    ################################################ variant dispenser
    
    # tag::INSTALL_BEVERAGE_DISPENSER[]
    beverage_dispenser = BeverageDispenser(Beverage())
    install(beverage_dispenser)
    ## mypy: Argument 1 to "install" has
    ## incompatible type "BeverageDispenser[Beverage]"
    ##          expected "BeverageDispenser[Juice]"
    # end::INSTALL_BEVERAGE_DISPENSER[]
    
    
    ################################################ variant dispenser
    
    # tag::INSTALL_ORANGE_JUICE_DISPENSER[]
    orange_juice_dispenser = BeverageDispenser(OrangeJuice())
    install(orange_juice_dispenser)
    ## mypy: Argument 1 to "install" has
    ## incompatible type "BeverageDispenser[OrangeJuice]"
    ##          expected "BeverageDispenser[Juice]"
    # end::INSTALL_ORANGE_JUICE_DISPENSER[]
    
    • BeverageDispenser(Generic[T]) is invariant when BeverageDispenser[OrangeJuice] is not compatible with BeverageDispenser[Juice] — despite the fact that OrangeJuice is a subtype-of Juice.

    • It depends on how we have defined the typevar

      In this case, install was declared with an actual type parameter: def install(dispenser: BeverageDispenser[Juice]) -> None:

  • A Covariant Dispenser

    
    from typing import TypeVar, Generic
    
    
    class Beverage:
        """Any beverage."""
    
    
    class Juice(Beverage):
        """Any fruit juice."""
    
    
    class OrangeJuice(Juice):
        """Delicious juice from Brazilian oranges."""
    
    
    # tag::BEVERAGE_TYPES[]
    T_co = TypeVar('T_co', covariant=True)  # <1> convention to suffix it like that.
    
    
    class BeverageDispenser(Generic[T_co]):  # <2> we use the typevar as the param for the generic class
        def __init__(self, beverage: T_co) -> None:
            self.beverage = beverage
    
        def dispense(self) -> T_co:
            return self.beverage
    
    def install(dispenser: BeverageDispenser[Juice]) -> None:  # <3>
        """Install a fruit juice dispenser."""
    # end::BEVERAGE_TYPES[]
    
    ################################################ covariant dispenser
    
    # tag::INSTALL_JUICE_DISPENSERS[]
    
    # both Juice and OrangeJuice are valid in a covariant BeverageDispenser:
    juice_dispenser = BeverageDispenser(Juice())
    install(juice_dispenser)
    
    orange_juice_dispenser = BeverageDispenser(OrangeJuice())
    install(orange_juice_dispenser)
    # end::INSTALL_JUICE_DISPENSERS[]
    
    ################################################ more general dispenser not acceptable
    
    # tag::INSTALL_BEVERAGE_DISPENSER[]
    beverage_dispenser = BeverageDispenser(Beverage())
    install(beverage_dispenser)
    ## mypy: Argument 1 to "install" has
    ## incompatible type "BeverageDispenser[Beverage]"
    ##          expected "BeverageDispenser[Juice]"
    # end::INSTALL_BEVERAGE_DISPENSER[]
    
    • covariance: the subtype relationship of the parameterized dispensers varies in the same direction as the subtype relationship of the type parameters.

      • two kinds of types are in play: (A) the actual type parameters, and (B) the parameterized dispenser types built from them

        The question is whether we allow their subtype relationships to vary in the same direction (covariant).

    • Supports the declared type and ALSO its subtypes

    • Implementation notes:

      • by convention, the typevar should be suffixed with _co
      • just need to set covariant=True when we declare the typevar
  • A Contravariant Trash Can

    
    # tag::TRASH_TYPES[]
    from typing import TypeVar, Generic
    
    class Refuse:  # <1>
        """Any refuse."""
    
    class Biodegradable(Refuse):
        """Biodegradable refuse."""
    
    class Compostable(Biodegradable):
        """Compostable refuse."""
    
    T_contra = TypeVar('T_contra', contravariant=True)  # <2>
    
    class TrashCan(Generic[T_contra]):  # <3>
        def put(self, refuse: T_contra) -> None:
            """Store trash until dumped."""
    
    def deploy(trash_can: TrashCan[Biodegradable]):
        """Deploy a trash can for biodegradable refuse."""
    # end::TRASH_TYPES[]
    
    
    ################################################ contravariant trash can
    
    
    # tag::DEPLOY_TRASH_CANS[]
    bio_can: TrashCan[Biodegradable] = TrashCan()
    deploy(bio_can)
    
    trash_can: TrashCan[Refuse] = TrashCan()
    deploy(trash_can)
    # end::DEPLOY_TRASH_CANS[]
    
    
    ################################################ more specific trash can
    
    # tag::DEPLOY_NOT_VALID[]
    compost_can: TrashCan[Compostable] = TrashCan()
    deploy(compost_can)
    ## mypy: Argument 1 to "deploy" has
    ## incompatible type "TrashCan[Compostable]"
    ##          expected "TrashCan[Biodegradable]"
    # end::DEPLOY_NOT_VALID[]
    
    • implementation:

      • use _contra suffix for the contravariant typevar
    • in our example, TrashCan is contravariant on the type of refuse.

  • Variance Review

    • Invariant Types

      • if L (generic type) is invariant, then L[A] (parameterised type) is not a supertype or a subtype of L[B]

        This is regardless of the relationships between A and B (the actual types)

      • examples: mutable collections in Python are invariant, so list[int] is not consistent-with list[float], and vice versa

      • if a formal type param (T) appears in the type hints of the method args AS WELL AS the return types, then the parameter must be invariant

        this ensures type safety

      • by default, TypeVar creates invariant types
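
        A tiny sketch of list invariance (mypy would reject the call; the exact error wording is approximate):

          def average(numbers: list[float]) -> float:
              return sum(numbers) / len(numbers)

          ints: list[int] = [1, 2, 3]
          average(ints)
          ## mypy: incompatible type "list[int]"; expected "list[float]"
          ## list is mutable, hence invariant: average() could legally append a float into `ints`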

    • Covariant Types

      • nomenclature:

        • X :> Y means that X is a supertype of, or the same as, Y; X <: Y means the reverse (X is a subtype of, or the same as, Y)
      • Covariant generic types follow the subtype relationship of the actual type parameters.

        if A :> B (type B is a subtype of type A) and we consider a generic type C: C is covariant iff C[A] :> C[B].

        Here, A and B are the actual type params.

      • examples:

        • Frozen set

          float :> int and frozenset[float] :> frozenset[int]

          SAME DIRECTION

        • Iterators

          Any code expecting an abc.Iterator[float] yielding floats can safely use an abc.Iterator[int] yielding integers.

      • Callable types are covariant on the return type

        this is so that the subclass will also work

    • Contravariant Types

      • A :> B, a generic type K is contravariant if K[A] <: K[B]

        reverses the subtype relationship of the actual type parameters (opposite direction)

      • A contravariant container is usually a write-only data structure, aka a sink

        Callable[[ParamType, …], ReturnType] is contravariant on the parameter types, but covariant on the ReturnType

      • examples:

        1. Refuse :> Biodegradable so TrashCan[Refuse] <: TrashCan[Biodegradable]

        2. for callbacks,

          • contravariant formal parameter defines the type of args used to invoke / send data to this object \(\implies\) the callback object is therefore the sink

          • covariant formal params define the types of outputs produced by the object (yield type or return type) \(\implies\) the callback object acts as a source that gives outputs.

    • Rules of Thumb

      • producing:

        If a formal type parameter defines a type for data that comes out of the object, it can be covariant. So we can produce more specific (narrower) results than strictly required.

      • consuming:

        If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant. So we can supply a consumer that accepts more general (wider) inputs than strictly required.

      • consuming == producing \(\implies\) invariant

        If a formal type parameter defines a type for data that comes out of the object and the same parameter defines a type for data that goes into the object, it must be invariant.

      • To err on the safe side, make formal type parameters invariant.

      • Callbacks are special because they accept inputs and return output

        • Python’s typing system with Callable is directly designed to reflect this.

        See this elaboration:

        
        You're absolutely on the right track! **Callbacks** (like callables or function-type objects) in Python's typing system **are indeed special** because **they play both roles:**
        - They **accept inputs** (they are a "sink" for arguments), and
        - They **return outputs** (they are a "source" for results)
        
        This duality is exactly why variance in callables is *split*:
        - **Parameter types are contravariant** (inputs; the callback can accept _more general_ arguments than strictly required)
        - **Return types are covariant** (outputs; the callback can produce _more specific_ results than strictly required)
        
        Let me show this with an example.
        
        ---
        ### Example: Callbacks, Variance, and Typing
        
        Imagine you define this type:
        
        ```python
        from typing import Callable
        
        # A callback that takes a Biodegradable and returns a Juice
        CallbackType = Callable[[Biodegradable], Juice]
        ```
        
        This means:
        - You need something that can accept a `Biodegradable` (or *any* base/parent of Biodegradable, i.e., `Refuse`)
        - And it must return a `Juice` (or any *subtype*, e.g., `OrangeJuice`)
        
        #### Function Matching the Signature
        
        ```python
        def process_refuse(refuse: Refuse) -> OrangeJuice:
            """Accepts any refuse, returns orange juice (a subtype of Juice)."""
            return OrangeJuice()
        ```
        
        This function:
        - Accepts a `Refuse` (broader than just `Biodegradable`) ⇒ **contravariant**
        - Returns an `OrangeJuice` (narrower than `Juice`) ⇒ **covariant**
        
        **We can use process_refuse wherever CallbackType is required, thanks to variance:**
        
        ```python
        def handle_event(cb: CallbackType) -> None:
            refuse = Biodegradable()
            juice = cb(refuse)
            print(f"{juice!r} dispensed.")
        
        handle_event(process_refuse)  # This typechecks!
        ```
        
        ---
        ### Why?
        
        - **Contravariant parameter**:
          A function that can handle any `Refuse` is good enough anywhere that needs just `Biodegradable` — it won't ever receive a non-biodegradable!
        - **Covariant return**:
          If someone expects a `Juice`, it's fine to give them something more specific (like `OrangeJuice`).
        
        ---
        ## Summary Table
        
        | Callback signature                      | Description               | Variance      |
        |----------------------------------------- |---------------------------|-------------- |
        | `Callable[[ParamType], ReturnType]`      | callback taking input(s), returning output | Parameter: **Contravariant**Return: **Covariant** |
        
        - Think of input parameters as arguments being "sent in" (*sink*, contravariant)
        - Think of return values as outputs "sent out" (*source*, covariant)
        
        ---
        ## Intuitive Analogy
        
        - **Input arguments**: You want callbacks that can accept *at least* what you have (so a broader parameter is better).
        - **Output/Return**: You want callbacks to promise to return *at least* what you asked for (so a narrower/specific return type is better).
        
        ---
        ### Expanded Example
        
        #### Class hierarchy
        
        ```python
        class Refuse: pass
        class Biodegradable(Refuse): pass
        class Compostable(Biodegradable): pass
        
        class Beverage: pass
        class Juice(Beverage): pass
        class OrangeJuice(Juice): pass
        ```
        
        #### Callbacks
        
        ```python
        from typing import Callable
        
        # Expected callback: takes a Biodegradable, returns a Juice
        CallbackType = Callable[[Biodegradable], Juice]
        
        def cb1(arg: Refuse) -> OrangeJuice:
            ... # Accepts any Refuse, returns a specific Juice
        
        def cb2(arg: Compostable) -> Beverage:
            ... # Accepts only Compostable, returns a general Beverage
        
        def cb3(arg: Biodegradable) -> Juice:
            ... # Accepts Biodegradable, returns Juice
        
        f: CallbackType
        
        f = cb1  # OK! Wider input, narrower output
        f = cb2  # mypy: error (input too narrow, output too broad)
        f = cb3  # OK! exact types
        ```
        
        ---
        ## Conclusion
        So your summary is **correct**:
        - **Callbacks _are both_ a sink (inputs) and a source (outputs)**.
            - **Input arguments:** Contravariant (wider is OK)
            - **Return type:** Covariant (narrower is OK)
        
        **Python’s typing system with Callable is directly designed to reflect this.**
        

Implementing a Generic Static Protocol

There’s some examples here for reference, nothing special though

Chapter Summary

We have covered:

  1. type @overload, including for the max function

  2. typing.TypedDict, which is not really a class builder despite using class syntax

    this is useful for defining the type of a dict (keys and values) when a dict is used as a record, often with the handling of JSON data

    can give a false sense of security though, since it has no runtime effect

  3. typing.cast as a way to handle some issues with type checkers. If overdone, it’s a code smell.

  4. Runtime access to type hints, including how forward references are handled

  5. GENERICS!!

    back to typed world

  6. Generic Static Protocol

    • allows us to be specific in the original protocol form

Further Reading

  1. remember to keep up with Mypy’s docs because the official python docs on typing might lag because of the rate at which new features for typing are introduced.

  2. “Covariance or contravariance is not a property of a type variable, but a property of a generic class defined using this variable.”

    \(\implies\) this is why I was finding it so mind-boggling, even though the topic of variance in generics is not new to me.

    In Python, the TypeVar is what the notion of co/contra-variance is bound to (see the sketch after this list). This happened because the authors worked under the severe self-imposed constraint that type hints had to be supported without any change to the interpreter.

    • that’s why the variance is tied to the TypeVar declaration

    • that’s why the [] is used instead of <> for defining the type param
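    A minimal sketch (mine, not from the book) of how the variance declaration sits on the TypeVar but only takes effect through the generic class that uses it:

    ```python
    from typing import TypeVar, Generic

    T_co = TypeVar('T_co', covariant=True)            # variance is declared here, on the TypeVar...
    T_contra = TypeVar('T_contra', contravariant=True)

    class Producer(Generic[T_co]):                    # ...but it is really a property of the generic class
        def produce(self) -> T_co: ...                # T_co appears only in output positions

    class Consumer(Generic[T_contra]):
        def consume(self, item: T_contra) -> None: ...  # T_contra appears only in input positions
    ```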

Chapter 16. Operator Overloading

  • There’s a value in allowing infix operators to handle any arbitrary type (not just primitive types):

    • readable code that allows the non-primitive types to help with exactness of operations

    This is why operator overloading is important.

  • Objectives:

    1. how to overload properly
    2. How an infix operator method should signal it cannot handle an operand
    3. Using duck typing or goose typing to deal with operands of various types
    4. The special behaviour of the rich comparison operators (e.g., ==, !=, >, <, etc.)
    5. The default handling of augmented assignment operators such as +=, and how to overload them

What’s New in This Chapter

Operator Overloading 101

  • objective: interoperability of unary/infix/other operators with user defined objects

    other operators includes (), ., [] in python

  • LANG_LIMITATIONS: Python Limitations on operator overloading (to protect us):

    1. can’t change the meaning of the operators for built-in types

    2. can’t create new operators, only can overload existing ones

    3. some operators can’t be overloaded: is, and, or, not

      the bitwise versions can be overloaded though (so &, |, ~)

Unary Operators

  • random notes on these:

    • usually x == +x, but not in some cases
    • bitwise NOT: ~x == -(x + 1); if x is 2, then ~x == -3
  • easy to implement the appropriate unary function, just make the function pure and immutable

    if the receiver itself is immutable, then we can just return self.

  • when is x and +x not equal?

    • e.g. when precision matters. E.g. when using Decimal you can set x based on a particular arithmetic precision, then change the precision and compute x=+x and because the precisions will be different we will get back a False

    • e.g. when using collections.Counter

      TRICK: Unary + produces a new Counter without zeroed or negative tallies. So we can use it to copy (and remove the negatives / zeros).
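      A quick check of this trick (my own snippet):

      ```python
      from collections import Counter

      ct = Counter('abracadabra')
      ct['r'] = -3
      ct['d'] = 0
      print(ct)    # Counter({'a': 5, 'b': 2, 'c': 1, 'd': 0, 'r': -3})
      print(+ct)   # Counter({'a': 5, 'b': 2, 'c': 1}) -- new Counter, zero/negative tallies dropped
      ```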

Overloading + for Vector Addition

  • typically, sequences should support the + operator for concatenation and * for repetition.

  • when we have operands of different types, Python looks for __add__ or __radd__ and takes a best-effort approach:

    To support operations involving objects of different types, Python implements a special dispatching mechanism for the infix operator special methods:

    1. If a has __add__, call a.__add__(b) and return result unless it’s NotImplemented.

    2. If a doesn’t have __add__, or calling it returns NotImplemented, check if b has __radd__, then call b.__radd__(a) and return result unless it’s NotImplemented.

    3. If b doesn’t have __radd__, or calling it returns NotImplemented, raise TypeError with an unsupported operand types message.

  • GOTCHA: NotImplemented is a singleton, not the same as NotImplementedError

    Do not confuse NotImplemented with NotImplementedError. The first, NotImplemented, is a special singleton value that an infix operator special method should return to tell the interpreter it cannot handle a given operand. In contrast, NotImplementedError is an exception that stub methods in abstract classes may raise to warn that subclasses must implement them.

  • note that if an overloaded operator method cannot handle its operand (e.g., a TypeError occurs), it should return NotImplemented rather than letting the error propagate (see the sketch below).

    this is so that the dispatch mechanism is not aborted prematurely
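    A minimal sketch (my own toy class, not the book's Vector example) of the __add__/__radd__ dispatch and NotImplemented:

    ```python
    class Money:
        def __init__(self, cents: int):
            self.cents = cents

        def __add__(self, other):
            if isinstance(other, Money):
                return Money(self.cents + other.cents)
            if isinstance(other, int):           # allow Money + int (treated as cents)
                return Money(self.cents + other)
            return NotImplemented                # let Python try other.__radd__(self)

        def __radd__(self, other):
            return self + other                  # int + Money ends up here

        def __repr__(self):
            return f'Money({self.cents})'

    print(Money(10) + Money(5))   # Money(15) via __add__
    print(25 + Money(5))          # Money(30) via __radd__, after int.__add__ returns NotImplemented
    ```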

Overloading * for Scalar Multiplication

Using @ as an Infix Operator

  • it’s been used for matrix multiplication, has both reflected version and an in-place version
  • this is a useful goose typing example as well, both the ABCs implement the __subclasshook__ methods so we don’t need explicit subclassing / registration

Wrapping-Up Arithmetic Operators

Rich Comparison Operators

  • differs from the arithmetic operators in these ways:
    1. same set of methods is used in forward and reverse operator calls (with the arguments changed as expected)
    2. for != and ==, if NotImplemented then fallback to id() checks.
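    A tiny sketch (my own) of the identity fallback for == and !=:

    ```python
    class Box:
        def __eq__(self, other):
            return NotImplemented     # always decline the comparison

    b1, b2 = Box(), Box()
    print(b1 == b1)  # True  -- both sides declined, so Python falls back to comparing ids
    print(b1 == b2)  # False -- same fallback, different objects
    print(b1 != b2)  # True  -- != falls back to 'not (b1 is b2)'
    ```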

Augmented Assignment Operators

  • for immutable objects, the augment assignment operators are just syntactic sugar for the expanded version, that’s why they return new objects
  • for mutable objects, depends on whether we implemented the dunder methods or not
  • Very important: augmented assignment special methods of mutable objects must return self. That’s what users expect.
  • IDIOM: In general, if a forward infix operator method (e.g., __mul__) is designed to work only with operands of the same type as self, it's useless to implement the corresponding reverse method (e.g., __rmul__), because that, by definition, will only be invoked when dealing with an operand of a different type.
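A small illustration (my own) of how the default += on a mutable sequence is more liberal about operand types than plain +:

```python
l = [1, 2, 3]
l += (4, 5)            # works: list.__iadd__ behaves like list.extend, accepting any iterable
print(l)               # [1, 2, 3, 4, 5]

try:
    l = l + (6, 7)     # plain + is stricter: list + tuple raises TypeError
except TypeError as e:
    print(e)           # can only concatenate list (not "tuple") to list
```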

Chapter Summary

  • when handling mixed operands, we have 2 choices:

    • use duck typing:

      this is useful and flexible but the error messages may be less useful or even misleading

    • use goose typing:

      this is useful as a compromise between flexibility and safety because existing / future user-defined types can be declared as actual or virtual subclasses of an ABC

      Also if ABC implements the __subclasshook__ then it’s even more convenient because no need explicit subclassing or registration.

  • the in-place operators (e.g., +=) are usually more liberal about operand types than the corresponding plain infix operators (e.g., +).

Further Reading

Part IV. Control Flow

Chapter 17. Iterators, Generators, and Classic Coroutines

  • iterator design pattern is builtin to python.

  • Every standard collection in Python is iterable. An iterable is an object that provides an iterator.

What’s New in This Chapter

A Sequence of Words

Why Sequences Are Iterable: The iter Function

  1. the dispatch flow is like so:

    1. need to iterate on x \(\implies\) calls iter(x) builtin
    2. try __iter__ implementation
    3. elif try __getitem__, if present then fetch items by index, start from 0-index
    4. fail and raise TypeError
  2. all Python sequences are iterable:

    by definition, they all implement __getitem__ (especially for backward compatibility).

    std sequences also implement __iter__ and custom ones should also have this

  3. this is an extreme form of duck typing:

    an object is considered iterable not only when it implements the special method __iter__, but also when it implements __getitem__

  4. goose typing approach: it's just checking for the existence of the __iter__ method. No registration needed because abc.Iterable implements the __subclasshook__

  5. the duck-typed approach to type checking for an iterable (try iter(x) and catch TypeError) is better than the goose-typing approach, because isinstance(x, abc.Iterable) misses objects that are iterable only via __getitem__ (see the sketch below)
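    A minimal sketch (my own) of why the duck-typed check is more accurate: an object can be iterable through __getitem__ alone, which the Iterable ABC check does not detect:

    ```python
    from collections.abc import Iterable

    class Spam:
        def __getitem__(self, i):          # no __iter__ at all
            if i > 2:
                raise IndexError
            return i * 10

    print(list(Spam()))                    # [0, 10, 20] -- iter() fell back to __getitem__
    print(isinstance(Spam(), Iterable))    # False -- the ABC check only looks for __iter__
    ```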

  • Using iter with a Callable

    • when used with a callable, second arg is a sentinel value for detecting the stop iteration.

      sentinel value will never really be yielded because that’s the sentinel.

    • iterators may get exhausted.

    • the callable given to iter() MUST NOT require arguments. If necessary, remember to convert it to a partial function (where the arguments are pre-bound) so that it's effectively a nullary function (see the example below).
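      A small example (my own) of the two-argument iter(callable, sentinel) form, with functools.partial pre-binding the argument:

      ```python
      from functools import partial
      from random import randint

      def roll(sides):
          return randint(1, sides)

      # keep calling roll(6) until it returns 6; the sentinel 6 itself is never yielded
      for value in iter(partial(roll, 6), 6):
          print(value)
      ```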

Iterables Versus Iterators

  • python obtains iterators from iterables

  • any obj for which the iter() builtin can get an iterator is an iterable

    • either gets it from __iter__ or indirectly from __getitem__
  • an iterator raises a StopIteration when there are no further items. there’s no way to check for empty other than this, and there’s no way to reset an iterator other than to create it again.

  • __subclasshook__ implementation within Iterator:

    
      @classmethod
      def __subclasshook__(cls, C):
              if cls is Iterator:
                      return _check_methods(C, '__iter__', '__next__')
              return NotImplemented
    

    the _check_methods is provided by the abc module

    it traverses the MRO for the class and checks if methods implemented or not

    MISCONCEPTION: virtual subclassing doesn't ONLY happen through explicit registration. A __subclasshook__ that relies on _check_methods is an example of implicit virtual subclassing

  • easiest way to typecheck for an iterator is to do a goose-type check: isinstance(x, abc.Iterator)

Sentence Classes with iter

  • iterators are supposed to implement both __next__ and __iter__. The __iter__ dunder method is so that they work well in places that expect an iterable.
  • Sentence Take #2: A Classic Iterator

    • this is just a didactic example: it uses a custom class that keeps track of a cursor for the next index to yield a value from, and raises StopIteration when out of bounds.
  • Don’t Make the Iterable an Iterator for Itself

    • iterators are also iterable (because they have the __iter__ method that returns self) but iterables are NOT iterators (they can create iterators)

    • common source of error is to confuse the two.

      common antipattern:

      to implement __next__ for an iterable so that an iterable is also an iterator over itself.

    • so a proper implementation of the pattern requires each call to iter(my_iterable) to create a new, independent iterator (see the sketch below).
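      A toy sketch (my own) contrasting the antipattern with the proper pattern:

      ```python
      class BadDeck:                     # antipattern: the iterable is its own iterator
          def __init__(self, cards):
              self.cards = cards
              self._i = 0
          def __iter__(self):
              return self
          def __next__(self):
              if self._i >= len(self.cards):
                  raise StopIteration
              self._i += 1
              return self.cards[self._i - 1]

      class GoodDeck:                    # proper pattern: each iter() call returns a fresh iterator
          def __init__(self, cards):
              self.cards = cards
          def __iter__(self):
              return iter(self.cards)

      bad, good = BadDeck([1, 2, 3]), GoodDeck([1, 2, 3])
      print(list(bad), list(bad))        # [1, 2, 3] [] -- exhausted after the first pass
      print(list(good), list(good))      # [1, 2, 3] [1, 2, 3]
      ```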

  • Sentence Take #3: A Generator Function

    
    """
    Sentence: iterate over words using a generator function
    """
    
    # tag::SENTENCE_GEN[]
    import re
    import reprlib
    
    RE_WORD = re.compile(r'\w+')
    
    
    class Sentence:
    
        def __init__(self, text):
            self.text = text
            self.words = RE_WORD.findall(text)
    
        def __repr__(self):
            return 'Sentence(%s)' % reprlib.repr(self.text)
    
        def __iter__(self):
            for word in self.words:  # <1>
                yield word  # <2>
            # <3>
    
    # done! <4>
    
    # end::SENTENCE_GEN[]
    
    • pythonic way is to use a generator instead of a custom class that acts as the iterator

    • here, __iter__ is a generator function

    • a generator function doesn’t raise StopIteration, it just exits when it gets exhausted

  • How a Generator Works

    • a generator function is a generator factory

      it is a function, when called, returns a generator object

      generator function generates generator objects

      generator function and generator objects are not the same

    • not necessary to have just a single yield (typically within a loop construct), we can have as many yield s as we like in our generator function

    • on each next() applied to the generator object, we’ll just end up continuing the control flow until the next yield statement

    • the fallthrough at the end of a generator function is for the generator object to raise StopIteration

      the consumer of the generator object may handle things cleanly

      When the generator function runs to the end, the generator object raises StopIteration. The for loop machinery catches that exception, and the loop terminates cleanly.

    • Language:

      • functions “return” values, generators “yield” values

        generator functions return generator objects

Lazy Sentences

  • Sentence Take #4: Lazy Generator

    • we know that the regex findall method is eager, so we use the lazy version: re.finditer. It returns an iterator yielding re.Match instances on demand \(\implies\) it is lazy.

      finditer builds an iterator over the matches of RE_WORD on self.text, yielding MatchObject instances.

      code:

      
        """
        Sentence: iterate over words using a generator function
        """
      
        # tag::SENTENCE_GEN2[]
        import re
        import reprlib
      
        RE_WORD = re.compile(r'\w+')
      
      
        class Sentence:
      
            def __init__(self, text):
                self.text = text  # <1>
      
            def __repr__(self):
                return f'Sentence({reprlib.repr(self.text)})'
      
            def __iter__(self):
                for match in RE_WORD.finditer(self.text):  # <2>
                    yield match.group()  # <3>
      
        # end::SENTENCE_GEN2[]
      
  • Sentence Take #5: Lazy Generator Expression

    • intent here is to replace generator functions with generator expressions. should be seen as syntactic sugar.

    • we can write generator expressions using generator objects that do not directly consume the generator objects, thereby preserving the lazy behaviour

    • code:

      
        """
        Sentence: iterate over words using a generator expression
        """
      
        # tag::SENTENCE_GENEXP[]
        import re
        import reprlib
      
        RE_WORD = re.compile(r'\w+')
      
      
        class Sentence:
      
            def __init__(self, text):
                self.text = text
      
            def __repr__(self):
                return f'Sentence({reprlib.repr(self.text)})'
      
            def __iter__(self):
                return (match.group() for match in RE_WORD.finditer(self.text))
        # end::SENTENCE_GENEXP[]
      
      
        def main():
            import sys
            import warnings
            try:
                filename = sys.argv[1]
                word_number = int(sys.argv[2])
            except (IndexError, ValueError):
                print(f'Usage: {sys.argv[0]} <file-name> <word-number>')
                sys.exit(2)  # command line usage error
            with open(filename, 'rt', encoding='utf-8') as text_file:
                s = Sentence(text_file.read())
            for n, word in enumerate(s, 1):
                if n == word_number:
                    print(word)
                    break
            else:
                warnings.warn(f'last word is #{n}, {word!r}')
      
        if __name__ == '__main__':
            main()
      
      • the __iter__ method here is no longer a generator function (since it has no yield), it uses a generator expression to build a generator object and returns it

        same outcome though, both cases return a generator object

When to Use Generator Expressions

  • should be seen as a syntactic shortcut to create a generator without defining and calling a function.

  • syntax stuff:

    • if we’re passing in a genexpr as the only argument to a function, we can omit the surrounding () and it will work. This doesn’t work if we’re supplying more than one argument, though (see the snippet below).
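      For example (my own snippet):

      ```python
      total = sum(n * n for n in range(10))                       # sole argument: no extra () needed
      ordered = sorted((n * n for n in range(10)), reverse=True)  # second argument present: the genexpr needs its own ()
      ```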
  • compared with generator functions:

    • generator functions can be seen as coroutines even, supports complex logic with multiple statements

    • should use generator functions when the genexpr looks too complex.

  • Contrasting Iterators and Generators

    • iterators:

      • anything implementing __next__ method
      • produce data for client code consumption:
        • consumed via drivers such as for loops
        • consumed via the explicit calling of next(it)
      • practically, most iterators in Python are generators.
    • Generators

      • an iterator that the python compiler builds

      • ways to create a generator:

        1. implement a generator function, with a yield keyword. this is a factory of generator objects

        2. use a generator expression to build a generator object

      • it’s the generator objects that provide __next__ so that they are iterators. The generators (generator functions) don’t need to implement __next__

      • we can have async generators

An Arithmetic Progression Generator

  • TRICK: we can see the range function as a built in that generates a bounded arithmetic progression of integers!

  • TRICK: Python 3 doesn’t have an explicit type coercion function, but we can work around this:

    
      def __iter__(self):
              result_type = type(self.begin + self.step)
              # NOTE: by keeping the target result type, we can then coerce it like so:
              result = result_type(self.begin)
              forever = self.end is None
              index = 0
    
              while forever or result < self.end:
                      yield result
                      index += 1
                      result = self.begin + self.step * index
    
  • if the whole point of a class is to build a generator by implementing __iter__, we can replace the class with a generator function. A generator function is, after all, a generator factory.

    code:

    
      """
      Arithmetic progression generator function::
    
          >>> ap = aritprog_gen(1, .5, 3)
          >>> list(ap)
          [1.0, 1.5, 2.0, 2.5]
          >>> ap = aritprog_gen(0, 1/3, 1)
          >>> list(ap)
          [0.0, 0.3333333333333333, 0.6666666666666666]
          >>> from fractions import Fraction
          >>> ap = aritprog_gen(0, Fraction(1, 3), 1)
          >>> list(ap)
          [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]
          >>> from decimal import Decimal
          >>> ap = aritprog_gen(0, Decimal('.1'), .3)
          >>> list(ap)
          [Decimal('0'), Decimal('0.1'), Decimal('0.2')]
    
      """
    
    
      # tag::ARITPROG_GENFUNC[]
      def aritprog_gen(begin, step, end=None):
          result = type(begin + step)(begin)
          forever = end is None
          index = 0
          while forever or result < end:
              yield result
              index += 1
              result = begin + step * index
      # end::ARITPROG_GENFUNC[]
    
  • Arithmetic Progression with itertools

    • ready to use generators in itertools, which we can combine

    • some useful ones:

      1. itertools.count is infinite generator, accepts a start and a step

      2. itertools.takewhile function: it returns a generator that consumes another generator and stops when a given predicate evaluates to False

        example: gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))

    • code:

      
        # tag::ARITPROG_ITERTOOLS[]
        import itertools
      
        def aritprog_gen(begin, step, end=None):
            first = type(begin + step)(begin)
            ap_gen = itertools.count(first, step)
            if end is None:
                return ap_gen
            return itertools.takewhile(lambda n: n < end, ap_gen)
        # end::ARITPROG_ITERTOOLS[]
      

      NOTE: aritprog_gen here is not a generator function because it has no yield in its body; it still returns a generator, though, just like a generator function does.

      
    • when implementing generators, know what is available in the standard library, otherwise there’s a good chance you’ll reinvent the wheel.

Generator Functions in the Standard Library

This section focuses on general-purpose functions that take arbitrary iterables as arguments and return generators that yield selected, computed, or rearranged items.

Some groups of standard generators:

  1. Filtering generator functions:

    they yield a subset of items produced by the input iterable, without changing the items themselves.

  2. Mapping generators (mapping in the sense of the map function, not dict-like mappings):

    they yield items computed from each individual item in the input iterable—or iterables,

    • starmap is cool. it does an unpacking from an iterator that yields tuples e.g. this gives us a running average: list(itertools.starmap(lambda a, b: b / a, enumerate(itertools.accumulate(sample), 1)))
      
           The **`itertools.starmap` function** in Python is a specialized iterator that applies a given function to elements from an iterable, where each element is itself an iterable (typically a tuple) and is unpacked as arguments to the function.
      
           ### Detailed Explanation
      
           - `starmap(function, iterable)` is similar to the built-in `map()` function, but while `map()` passes each element as a whole argument to the function, **`starmap` unpacks each element (tuple) as positional arguments to the function**.
      ​     - This means if you have an iterable of tuples like `[(a1, b1), (a2, b2), ...]`, `starmap` calls `function(a1, b1)`, `function(a2, b2)`, etc.
      ​     - It returns an iterator that yields the results of these function calls lazily, i.e., items are computed only as you iterate.
      
           ### Mental Model
      
           - Think of `starmap` as a clever iterator transform that "unzips" or unpacks arguments for you when you have multiple parameters.
      ​     - It is particularly useful when working with functions or operations that expect multiple positional arguments but you have your data already grouped as tuples in your iterable.
      ​     - This differs from `map`, which treats each element as a single argument.
      
           ### Prototype of `starmap` behavior (simplified)
      
           ```python
           def starmap(function, iterable):
               for args in iterable:
                   yield function(*args)  # unpack each tuple as arguments
           ```
      
           ### Practical Example
      
           ```python
           from itertools import starmap
      
           def multiply(x, y):
               return x * y
      
           pairs = [(2, 3), (4, 5), (6, 7)]
           result = starmap(multiply, pairs)
      
           print(list(result))  # Output: [6, 20, 42]
           ```
      
           In this example, `multiply` requires two arguments; `pairs` contains tuples with two elements each, and `starmap` calls `multiply` with unpacked tuple elements.
      
           ### Comparison with `map`
      
           - Using `map(pow, [(2, 3), (4, 5)])` would fail because `pow` expects two separate arguments, but `map` passes the entire tuple as one argument.
      ​     - Using `starmap(pow, [(2, 3), (4, 5)])` works because it unpacks the tuples as separate arguments automatically.
      
           ### Use Cases Beyond Simple Functions
      
           - It is often leveraged to apply functions like `pow`, arithmetic operators, or user-defined functions that take multiple arguments.
      ​     - Useful in multiprocessing scenarios (e.g., `multiprocessing.Pool.starmap`) for applying functions with multiple inputs concurrently.
      
           ### Summary Table
      
           | Aspect                    | Description                                                         |
           |---------------------------|---------------------------------------------------------------------|
           | Function signature        | `itertools.starmap(function, iterable_of_arg_tuples)`              |
           | Functional behavior       | Applies function as `function(*args)` for each tuple in iterable    |
           | Returns                   | An iterator yielding results lazily                                |
           | Difference from `map`     | `map` passes each element as-is; `starmap` unpacks tuple arguments  |
           | Use case                  | Applying multi-argument functions over an iterable of argument tuples|
      
           ### References
      
           - Python official docs for itertools: `starmap` applies a function to unpacked arguments from tuples in an iterable.
      ​     - Tutorialspoint, GeeksforGeeks, and Educative.io provide practical examples demonstrating the use and difference from `map`.
      ​     - Multiprocessing's `Pool.starmap()` uses exactly the same concept to map multi-argument functions in parallel.
      
           This understanding helps senior engineers grasp how `starmap` elegantly bridges the gap between iterable data structures and multi-argument function applications in Python’s iterator toolkit.
      
           [1] https://www.tutorialspoint.com/python/python_itertools_starmap_function.htm
           [2] https://www.geeksforgeeks.org/python/python-itertools-starmap/
           [3] https://www.educative.io/answers/what-is-the-itertoolsstarmap-method-in-python
           [4] https://www.mybluelinux.com/python-map-and-starmap-functions/
           [5] https://superfastpython.com/multiprocessing-pool-starmap/
           [6] https://docs.python.org/3/library/multiprocessing.html
           [7] https://indhumathychelliah.com/2020/09/14/exploring-map-vs-starmap-in-python/
           [8] https://stackoverflow.com/questions/56672348/applying-the-pool-starmap-function-with-multiple-arguments-on-a-dict-which-are
           [9] https://www.youtube.com/watch?v=aUUJRF6Zako
      
  3. Merging Generators: yield items from multiple input iterables

    • chain.from_iterable: It’s almost like flattening.
  4. Generator functions that expand each input into multiple output items:

    • pairwise is interesting: each item in the input, pairwise yields a 2-tuple with that item and the next — if there is a next item.

      list(itertools.pairwise(range(7)))

  5. TRICK: Combinatorics Generators see the elaboration here:

    
       The **combinatorics generators in the `itertools` module** of Python are a suite of powerful, memory-efficient iterator-based functions designed to generate combinatorial collections such as permutations, combinations, and Cartesian products systematically without building them fully in memory. These functions are foundational for algorithmic tasks involving combinatorial enumeration, sampling, or search.
    
       Here is a detailed outline tailored for a senior engineer who values mental models, rigor, and first principles:
    
       ***
    
       ### 1. Overview of Combinatorics Generators in `itertools`
    
   Python's `itertools` module offers **four primary combinatoric iterators** that generate combinatorial constructs lazily:
    
       | Iterator Name                  | Purpose                                     | Key Characteristics                                      |
       |-------------------------------|---------------------------------------------|----------------------------------------------------------|
       | `product()`                   | Cartesian product of input iterables      | Generates tuples combining every element with every other (with optional repetition) |
       | `permutations()`               | All possible orderings (permutations)      | Generates all possible ordered arrangements of a specified length |
       | `combinations()`               | Combinations without replacement            | Generates all possible selections of a specified length without regard to order |
       | `combinations_with_replacement()` | Combinations allowing repeated elements     | Like combinations but allows repeated elements in output  |
    
       ***
    
       ### 2. Detailed Description with Mental Models
    
       #### a. `itertools.product(*iterables, repeat=1)`
    
       - **Conceptual model:** The Cartesian product can be thought of as the "all pairs/all tuples" construction, where you combine every element of iterable 1 with every element of iterable 2, and so forth.
       - **Use case:** Explores all possible selections when repetition and order matter.
       - **Implementation detail:** Produces tuples where each position corresponds to one iterable element. The `repeat` argument simulates self cartesian products.
       - **Example:**
    
         ```python
         from itertools import product
    
         list(product([1, 2], repeat=2))
         # Output: [(1, 1), (1, 2), (2, 1), (2, 2)]
         ```
    
    - **Scaling note:** The output size grows multiplicatively; use caution with large inputs.
    
       #### b. `itertools.permutations(iterable, r=None)`
    
       - **Conceptual model:** All possible orderings of `r` distinct elements from the iterable, where order matters.
       - **Use case:** Problems requiring permutations without replacement.
       - **Implementation:** Yields tuples of length `r` (default `r` equals length of iterable).
       - **Example:**
    
         ```python
         from itertools import permutations
    
         list(permutations('ABC', 2))
         # Output: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
         ```
    
       - **Key mental model:** Unlike combinations, `'AB'` and `'BA'` are distinct.
    
       #### c. `itertools.combinations(iterable, r)`
    
       - **Conceptual model:** Select `r` elements from iterable, ignoring order and disallowing repetitions.
       - **Use case:** Choosing subsets or unique groupings.
       - **Implementation:** Yields sorted tuples of length `r`.
       - **Example:**
    
         ```python
         from itertools import combinations
    
         list(combinations('ABC', 2))
         # Output: [('A', 'B'), ('A', 'C'), ('B', 'C')]
         ```
    
       - **Mental model:** For `['A', 'B', 'C']` picking 2 is like choosing pairs regardless of arrangement.
    
       #### d. `itertools.combinations_with_replacement(iterable, r)`
    
       - **Conceptual model:** Like combinations but elements can appear multiple times.
       - **Use case:** Selecting combinations where repeats are allowed.
       - **Implementation:** Yields sorted tuples where elements can recur.
       - **Example:**
    
         ```python
         from itertools import combinations_with_replacement
    
         list(combinations_with_replacement('AB', 2))
         # Output: [('A', 'A'), ('A', 'B'), ('B', 'B')]
         ```
    
       ***
    
       ### 3. Functional and Performance Notes
    
       - All these combinatoric iterators **return generator objects**, supporting lazy evaluation to handle potentially large combinatorial spaces without memory blowup.
       - Outputs are tuples representing fixed-size sequences or selections.
       - The generation order and structure adhere to mathematical combinatorics rules (e.g., lexicographic ordering for combinations).
       - Be mindful of the combinatorial explosion, where output size can grow factorially or exponentially with input size.
    
       ***
    
       ### 4. Related Tools and Concepts in `functools` and Others
    
       - While `functools` does not include combinatorics generators directly, its role complements iterator tools by providing **function composition**, **partial application** (`partial`), and caching mechanisms (`lru_cache`) common in higher-order functional programming.
       - For complex pipeline building involving combinatorics generators, these tools enhance composability and performance in functional-style codebases.
    
       ***
    
       ### Mental Model Summary Table
    
       | Function                           | Category           | Usage Mental Model                              | Output                         | Notes                          |
       |----------------------------------|--------------------|------------------------------------------------|-------------------------------|--------------------------------|
       | `product(*iterables, repeat=1)`  | Cartesian product  | Cross-combine elements, repeat allows self-product | Tuples representing Cartesian product elements | Size = $$\prod |iterables_i|$$ or $$|iterable|^{repeat}$$ |
       | `permutations(iterable, r=None)` | Permutations       | Order-sensitive arrangements of length r       | Tuples of length r             | Number of permutations = n!/(n-r)! |
       | `combinations(iterable, r)`       | Combinations       | Unordered selection of r items without replacement | Tuples of length r             | Number = C(n, r)               |
       | `combinations_with_replacement(iterable, r)` | Combinations with replacement | Unordered selection allowing duplicates        | Tuples of length r             | Number = C(n+r-1, r)           |
    
       ***
    
       ### References
    
   - Python official documentation: `itertools` [combinatoric iterators section](https://docs.python.org/3/library/itertools.html#itertools.combinations)
   - GeeksforGeeks "Combinatoric Iterators in Python" (2025)[1][2]
   - Blog posts and overviews such as Juha-Matti Santala's "Combinatoric iterators from itertools" (2024)[3]
       - Real Python and Towards Data Science tutorials on `itertools`
    
       This combinatorics toolkit in `itertools` forms the backbone for algorithmic exploration in exhaustive, sampled, or random combinatorial problems while preserving efficiency and clarity in Python's iterator model. Let me know if you'd like me to dive into usage patterns, performance characteristics, or idiomatic compositions with these combinatorics generators.
    
       [1] https://www.geeksforgeeks.org/python-itertools/
       [2] https://www.geeksforgeeks.org/combinatoric-iterators-in-python/
       [3] https://hamatti.org/posts/combinatoric-iterators-from-itertools/
       [4] https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Itertools.html
       [5] https://towardsdatascience.com/a-guide-to-python-itertools-like-no-other-454da1ddd5b8/
       [6] https://realpython.com/python-itertools/
       [7] https://mathspp.com/blog/module-itertools-overview
       [8] https://docs.python.org/3/library/itertools.html
       [9] https://labex.io/tutorials/python-how-to-use-itertools-combinations-in-python-398083
       [10] https://arjancodes.com/blog/python-itertools-module-tutorial-for-efficient-data-handling/
    
  6. Rearranger Generators: yield all items in the input iterables, in some rearranged ways

    They all accept at most one input iterable.

    • itertools.groupby and itertools.tee return multiple generators

      • GOTCHA: itertools.groupby assumes that the input iterable is sorted by the grouping criterion, or at least that the items are clustered by that criterion — even if not completely sorted.

        e.g. use case: you can sort the datetime objects chronologically, then groupby weekday to get a group of Monday data, followed by Tuesday data, etc., then Monday (of the next week) again, and so on (see the sketch below).
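        A small sketch (my own) of the sorted-input requirement:

        ```python
        from itertools import groupby

        animals = ['rat', 'bat', 'duck', 'bear', 'lion', 'eel', 'dolphin']
        animals.sort(key=len)                  # groupby only clusters consecutive items with equal keys
        for length, group in groupby(animals, key=len):
            print(length, '->', list(group))
        # 3 -> ['rat', 'bat', 'eel']
        # 4 -> ['duck', 'bear', 'lion']
        # 7 -> ['dolphin']
        ```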

    • itertools.tee similar to unix tee, gives us multiple generators to consume the yielded values independently.

      which has a unique behavior: it yields multiple generators from a single input iterable, each yielding every item from the input. Those generators can be consumed independently,

  • reversed only works with sequences (or objects that implement __reversed__)

Iterable Reducing Functions

  • given an iterable, they return a single result \(\implies\) “reducing”/ “folding” / “accumulating” functions.

    Naturally, they have to work with bounded iterables, won’t work with infinite iterables.

  • all and any have the ability to short-circuit!
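    A quick demonstration (my own) of the short-circuiting:

    ```python
    def noisy(n):
        print('checking', n)
        return n > 2

    # any() stops pulling from the generator as soon as it sees a truthy result
    print(any(noisy(n) for n in range(10)))   # prints 'checking 0' ... 'checking 3', then True
    ```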

Subgenerators with yield from

  • objective is to let a generator delegate to a subgenerator
  • uses yield from
  • Reinventing chain

    Here’s the implementation without yield from

    
    def chain(*iterables):
        for it in iterables:
            for i in it:
                yield i

    s = 'ABC'
    r = range(3)

    list(chain(s, r))  # ['A', 'B', 'C', 0, 1, 2]
    

    here’s how we can implement itertools.chain using yield from

    
    def chain(*iterables):
        for it in iterables:
            yield from it

    s = 'ABC'
    r = range(3)

    list(chain(s, r))  # ['A', 'B', 'C', 0, 1, 2]
    
  • Traversing a Tree

    • Step 2: using a subgenerator for the subtrees

      
      def tree(cls):
          yield cls.__name__, 0
          yield from sub_tree(cls)              # <1> here's the delegation from tree to sub_tree. here, the tree generator is suspended, and sub_tree takes over yielding values
      
      
      def sub_tree(cls):
          for sub_cls in cls.__subclasses__():
              yield sub_cls.__name__, 1         # <2>
      
      
      def display(cls):
          for cls_name, level in tree(cls):     # <3>
              indent = ' ' * 4 * level
              print(f'{indent}{cls_name}')
      
      
      if __name__ == '__main__':
          display(BaseException)
      
      • the delegation from generator to sub-generator is interesting

        here, the tree generator is suspended, and sub_tree takes over yielding values

      • we soon observe the following pattern:

        We do a for loop to get the subclasses of level N. Each time around the loop, we yield a subclass of level N, then start another for loop to visit level N+1.

    • Step 5

      • we use the pattern seen before and call the same generator function again as a subgenerator:

        
          def tree(cls):
              yield cls.__name__, 0
              yield from sub_tree(cls, 1)
        
        
          def sub_tree(cls, level):
              for sub_cls in cls.__subclasses__():
                  yield sub_cls.__name__, level
                  yield from sub_tree(sub_cls, level+1)
        
        
          def display(cls):
              for cls_name, level in tree(cls):
                  indent = ' ' * 4 * level
                  print(f'{indent}{cls_name}')
        
        
          if __name__ == '__main__':
              display(BaseException)
        

        This is limited only by Python’s recursion limit. The default limit allows 1,000 pending functions.

        This also has an implicit base case:

        sub_tree has no if, but there is an implicit conditional in the for loop: if cls.__subclasses__() returns an empty list, the body of the loop is not executed, therefore no recursive call happens. The base case is when the cls class has no subclasses. In that case, sub_tree yields nothing. It just returns.

    • Step 6: merge into a single generator

      
      def tree(cls, level=0):
          yield cls.__name__, level
          for sub_cls in cls.__subclasses__():
              yield from tree(sub_cls, level+1)
      
      
      def display(cls):
          for cls_name, level in tree(cls):
              indent = ' ' * 4 * level
              print(f'{indent}{cls_name}')
      
      
      if __name__ == '__main__':
          display(BaseException)
      

      yield from connects the subgenerator directly to the client code, bypassing the delegating generator. That connection becomes really important when generators are used as coroutines and not only produce but also consume values from the client code,

Generic Iterable Types

  • Mypy, reveals that the Iterator type is really a simplified special case of the Generator type.
  • Iterator[T] is a shortcut for Generator[T, None, None]. Both annotations mean “a generator that yields items of type T, but that does not consume or return values.”
  • Generators can consume and return values \(\implies\) they are classic coroutines

Classic Coroutines via Enhanced Generators

  • “generators that can consume and return values”

  • these are not supported by asyncio

  • the modern, native coroutines are just called “coroutines” now.

  • 2 ways to typehint generators:

    Underlying C implementation is the same, they are just USED differently.

    1. as an iterator, e.g. readings: Iterator[float]

      Bound to an iterator / generator object that yields float items

    2. as a coroutine: sim_taxi: Generator[Event, float, int]

      The `sim_taxi` variable can be bound to a coroutine representing a taxi cab in a discrete event simulation. It yields events, receives `float` timestamps, and returns the number of trips made during the simulation

  • The type is named Generator, when in fact it describes the API of a generator object intended to be used as a coroutine, while generators are more often used as simple iterators.

    Generator[YieldType, SendType, ReturnType]

    Generator type has the same type parameters as typing.Coroutine: Coroutine[YieldType, SendType, ReturnType] (deprecated in favour of collections.abc.Coroutine) which is to annotate only native co-routines, not classic coroutines.

  • Some guidelines to avoid confusion:

    • Generators produce data for iteration
    • Coroutines are consumers of data
    • To keep your brain from exploding, don’t mix the two concepts together
    • Coroutines are not related to iteration
    • Note: There is a use of having `yield` produce a value in a coroutine, but it’s not tied to iteration.
  • Example: Coroutine to Compute a Running Average

    Old example of running average using closures. This is a higher order function.

    
    def make_averager():
            count = 0
            total = 0
    
            def averager(new_value):
                    nonlocal count, total
                    count += 1
                    total += new_value
                    return total / count
            return averager
    

    yield statement here suspends the coroutine, yields a result to the client, and — later — gets a value sent by the caller to the coroutine, starting another iteration of the infinite loop.

    The coroutine can keep internal state without needing any instance attrs or closures. They keep local state between activations \(\implies\) attractive replacement for callbacks in async programming

    
    """
    A coroutine to compute a running average
    
    # tag::CORO_AVERAGER_TEST[]
        >>> coro_avg = averager()  # <1>
        >>> next(coro_avg)  # <2>
        0.0
        >>> coro_avg.send(10)  # <3>
        10.0
        >>> coro_avg.send(30)
        20.0
        >>> coro_avg.send(5)
        15.0
    
    # end::CORO_AVERAGER_TEST[]
    # tag::CORO_AVERAGER_TEST_CONT[]
    
        >>> coro_avg.send(20)  # <1>
        16.25
        >>> coro_avg.close()  # <2>
        >>> coro_avg.close()  # <3>
        >>> coro_avg.send(5)  # <4>
        Traceback (most recent call last):
          ...
        StopIteration
    
    # end::CORO_AVERAGER_TEST_CONT[]
    
    """
    
    # tag::CORO_AVERAGER[]
    from collections.abc import Generator
    
    def averager() -> Generator[float, float, None]:  # <1> yields float, accepts float, nothing useful returned
        total = 0.0
        count = 0
        average = 0.0
        while True:  # <2> will keep accepting as long as there are values sent to this coroutine
            term = yield average  # <3> =yield= statement here suspends the coroutine, yields a result to the client, and — later — gets a value sent by the caller to the coroutine, starting another iteration of the infinite loop.
            total += term
            count += 1
            average = total/count
    # end::CORO_AVERAGER[]
    
    • Priming/Starting the Coroutine

      We can do an initial next(my_coroutine)

      OR, we can send(None) to start it off. Only None works here because a coroutine can’t accept a sent value, unless it is suspended at a yield line.

    • Multiple activations

      • After each activation, the coroutine is suspended precisely at the yield keyword, waiting for a value to be sent.

      • coro_avg.send(10): yield expression resolves to the value 10, assigning it to the term variable. The rest of the loop updates the total, count, and average variables. The next iteration in the while loop yields the average, and the coroutine is again suspended at the yield keyword.

      • I notice that there are 2 states to the coroutine: active and suspended.

    • Terminating a coroutine

      • can just stop referring to it and the coroutine can be garbage collected

      • for explicit termination, we can call coro_avg.close()

      • .close() method raises GeneratorExit at the suspended yield expression. If not handled in the coroutine function, the exception terminates it. GeneratorExit is caught by the generator object that wraps the coroutine, which is why it doesn't propagate to the caller.

      • calling close on a closed coroutine does nothing, but sending to a closed coroutine raises StopIteration

  • Returning a Value from a Coroutine

    
    """
    A coroutine to compute a running average.
    
    Testing ``averager2`` by itself::
    
    # tag::RETURNING_AVERAGER_DEMO_1[]
    
        >>> coro_avg = averager2()
        >>> next(coro_avg)
        >>> coro_avg.send(10)  # <1>
        >>> coro_avg.send(30)
        >>> coro_avg.send(6.5)
        >>> coro_avg.close()  # <2>
    
    # end::RETURNING_AVERAGER_DEMO_1[]
    
    Catching `StopIteration` to extract the value returned by
    the coroutine::
    
    # tag::RETURNING_AVERAGER_DEMO_2[]
    
        >>> coro_avg = averager2()
        >>> next(coro_avg)
        >>> coro_avg.send(10)
        >>> coro_avg.send(30)
        >>> coro_avg.send(6.5)
        >>> try:
        ...     coro_avg.send(STOP)  # <1>
        ... except StopIteration as exc:
        ...     result = exc.value  # <2>
        ...
        >>> result  # <3>
        Result(count=3, average=15.5)
    
    # end::RETURNING_AVERAGER_DEMO_2[]
    
    Using `yield from`:
    
    
    # tag::RETURNING_AVERAGER_DEMO_3[]
    
    NOTE: this uses a delegating generator
    
        >>> def compute():
        ...     res = yield from averager2(True)  # <1>
        ...     print('computed:', res)  # <2>
        ...     return res  # <3>
        ...
        >>> comp = compute()  # <4>
        >>> for v in [None, 10, 20, 30, STOP]:  # <5>
        ...     try:
        ...         comp.send(v)  # <6>
        ...     except StopIteration as exc:  # <7> catch StopIteration to retrieve the value returned by the coroutine (exc.value)
        ...         result = exc.value
        received: 10
        received: 20
        received: 30
        received: <Sentinel>
        computed: Result(count=3, average=20.0)
        >>> result  # <8>
        Result(count=3, average=20.0)
    
    # end::RETURNING_AVERAGER_DEMO_3[]
    """
    
    # tag::RETURNING_AVERAGER_TOP[]
    from collections.abc import Generator
    from typing import Union, NamedTuple
    
    class Result(NamedTuple):  # <1>
        count: int  # type: ignore  # <2>
        average: float
    
    class Sentinel:  # <3>
        def __repr__(self):
            return f'<Sentinel>'
    
    STOP = Sentinel()  # <4>
    
    SendType = Union[float, Sentinel]  # <5> in modern Python, write SendType: TypeAlias = float | Sentinel, or use the | union directly in the Generator's SendType type param
    
    # end::RETURNING_AVERAGER_TOP[]
    # tag::RETURNING_AVERAGER[]
    def averager2(verbose: bool = False) -> Generator[None, SendType, Result]:  # <1> None data yielded, returns Result type, which is a named tuple (subclass of tuple)
        total = 0.0
        count = 0
        average = 0.0
        while True:
            term = yield  # <2> this consumes data (when resuming, assigned to variable named "term")
            if verbose:
                print('received:', term)
            if isinstance(term, Sentinel):  # <3> if received the Sentinel, break from infinite loop.
                break
            total += term  # <4>
            count += 1
            average = total / count
        return Result(count, average)  # <5> reachable only if Sentinel is sent to the coroutine
    
    # end::RETURNING_AVERAGER[]
    
    • coroutines can consume data (based on the SendType type param), so it makes sense to use a bare yield (term = yield), which yields None and just receives the sent value.

    • Calling .close() in this coroutine makes it stop but does not return a result, because the GeneratorExit exception is raised at the yield line in the coroutine, so the return statement is never reached.

      Instead, we do:

      
        try:
            coro_avg.send(STOP)
        except StopIteration as exc:
            result = exc.value
        return result
      
      1. STOP sentinel makes the coroutine break from the loop and return a Result The generator object that wraps the coroutine then raises StopIteration.

      2. The instance of StopIteration has a value attribute bound to the value of the return statement that terminated the coroutine.

  • Generic Type Hints for Classic Coroutines

    • stick to my existing mental model:

      • producers can produce more specific types, i.e. can be narrower \(\implies\) they are covariant \(\implies\) they are output types

        so this corresponds to the YieldType and ReturnType

                   float :> int
        

        Generator[float, Any, float] :> Generator[int, Any, int]

        Corresponds to variance rule of thumb 1:

        If a formal type parameter defines a type for data that comes out of the object, it can be covariant.

      • consumers can accept more generic types i.e. they can accept wider \(\implies\) they are contravariant \(\implies\) they are input types

        so this corresponds to SendType

                 float :> int
        

        Generator[Any, float, Any] <: Generator[Any, int, Any]

        Corresponds to variance rule of thumb 2:

        If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant.

    • elaboration on why my mental model is correct:

        Your **mental model about generic type hints for classic coroutines in Python is fundamentally correct** and aligns well with the design principles of variance in type systems and how they apply to the parameters of the `Generator` or `Coroutine` generic types.
      
        To unpack this rigorously:
      
        ***
      
        ### Context: Generic Type Parameters for Classic Coroutines
      
        Classic (generator-based) coroutines and generators can be annotated using the generic `Generator` type hint which has three type parameters:
      
        $$
        Generator[YieldType, SendType, ReturnType]
        $$
      
        - **YieldType** — the type of values **produced** (yielded) by the generator/coroutine.
        - **SendType** — the type of values the generator/coroutine **accepts** via `.send()` (like parameters you can pass in).
        - **ReturnType** — the type of the value returned upon completion (from the `return` statement inside the generator).
      
        ***
      
        ### Your mental model restated
      
        - **Producers produce more specific types (narrower types).** This corresponds to output types, and hence these are **covariant types**.
          This includes `YieldType` (values yielded) and `ReturnType` (final returned value).
      
        - **Consumers accept more generic types (wider types).** These correspond to input types, hence they are **contravariant types**.
          This corresponds to `SendType` — the values you send **into** the coroutine, which can be more general.
      
        ***
      
        ### What does this mean in terms of type variance?
      
        - **Covariance** means that if \(A\) is a subtype of \(B\), then a producer of \(A\) can be used as a producer of \(B\) (safe to substitute). For example, if you expect `YieldType` to be an `Animal`, then yielding `Dog` (a subclass of `Animal`) is safe.

        - **Contravariance** means that consumers are the opposite — a consumer of \(B\) can be used where a consumer of \(A\) is expected, if \(A\) is a subtype of \(B\) (i.e., input types vary contravariantly). The coroutine can be declared to accept wider types for sending.

        **Consequently:**

        - `YieldType` and `ReturnType` are **covariant** (output positions).
        - `SendType` is **contravariant** (input position).
      
        ***
      
        ### Supporting evidence from Python typing docs and references:
      
        From the official `typing` module documentation and the typing system used in Python:
      
        - The `Generator` and `Coroutine` generic types define variance as follows:
      
        ```python
        from typing import Generator
      
        # Generator[YieldType, SendType, ReturnType]
        # YieldType is covariant
        # SendType is contravariant
        # ReturnType is covariant
        ```
      
        This is also explicitly mentioned in the Python 3.12 typing docs [typing — Support for type hints](https://docs.python.org/3/library/typing.html):
      
        > "Note that unlike many other generic classes in the standard library, the `SendType` of `Generator` behaves contravariantly, not covariantly or invariantly."
      
        ***
      
        ### Mental model mapping for classic coroutines:
      
        | Role                     | Type Parameter     | Type Variance        | Reason / Interpretation                          |
        |--------------------------|--------------------|---------------------|------------------------------------------------|
        | Values *output* (yielded) | `YieldType`        | Covariant (produced) | Coroutine produces values of this type          |
        | Value passed *in* to `.send()` | `SendType`      | Contravariant (consumed) | Coroutine consumes values of this type from caller |
        | Value *returned* upon completion | `ReturnType`     | Covariant (produced) | Coroutine returns this type after finishing     |
      
        ***
      
        ### Intuition:
      
        - A coroutine can be declared to **accept** (via `.send()`) values that are more general (wider) than what its callers actually pass in — it must handle all those general cases safely, so it can stand in for a coroutine expecting a narrower send type.
        - When a coroutine **yields** or **returns** values, it provides more specific types (subclasses or narrower types) back to the caller.
      
        ***
      
        ### Example to make it concrete
      
        Suppose you have:
      
        ```python
        from typing import Generator
      
        def coro() -> Generator[int, str, float]:
            # Yield ints
            val = yield 1
            # Accept strings sent via send
            ...
            return 3.14
        ```
      
        - `YieldType` = `int`: you produce integers.
        - `SendType` = `str`: you accept strings sent to the coroutine.
        - `ReturnType` = `float`: coroutine returns a float on completion.
      
        This fits the mental model that:
      
        - You produce (yield) integers (specific concrete type).
        - You consume (accept via `.send()`) strings (input).
        - You return a float (specific type).
      
        ***
      
        ### Summary
      
        Your mental model:
      
        - **Producers = covariant = YieldType, ReturnType**
        - **Consumers = contravariant = SendType**
      
        is **correct and aligns perfectly with the Python typing system** for classic coroutines and generators.
      
        ***
      
        ### Supporting references for further deep-dive:
      
        - [Python typing module documentation - Annotating generators and coroutines](https://docs.python.org/3/library/typing.html#annotating-generators-and-coroutines) (2025-08-08)
        - Fluent Python by Luciano Ramalho, "Classic Coroutines" chapter, which discusses these typing details around classic coroutines and variance.
        - Python typing source and PEP 484 (Type Hints) discussing variance in generic types.
        - Stack Overflow [The right way to type hint a Coroutine function?](https://stackoverflow.com/questions/73240620/the-right-way-to-type-hint-a-coroutine-function) detailing coroutine generic parameters and variance.
      
      

Chapter Summary

  • KIV: native coroutines come soon; the classic yield from is just await in the native coroutine syntax.

Further Reading

Chapter 18. with, match, and else Blocks

  • this chapter is about control flow structures that are especially powerful in python

  • magic of with statements and how the context manager gives safety

  • the magic of match statements and how that is expressive for languages (including custom DSLs)

What’s New in This Chapter

Context Managers and with Blocks

  • context managers exist to control a with statement

  • analogous to for statements controlled by iterators

  • MISCONCEPTIONS:

    this is correct: a finally block is always guaranteed to run, even if the try block has a return, sys.exit() or an exception raised.

    I just never paid attention to this.

    That’s why it’s good for cleanup: resource release / reverting or undoing temporary state changes

  • with blocks don’t define a new scope like how functions do, that’s why the names are accessible outside of the block
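
    a quick check of this (assuming a mirror.py file is present in the working directory, as in the book's example):

      with open('mirror.py') as fp:
          src = fp.read(60)

      # both names survive the block: the with statement does not create a new scope
      print(len(src), fp.closed)  # fp was closed by __exit__, but the name is still bound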

  • syntax:

    • in with open('mirror.py') as fp:,

      • evaluating the expression after the with gives the context manager object, i.e. open('mirror.py')

        the context manager object here is an instance of TextIOWrapper, this is what the open() function returns.

        the __enter__ method of TextIOWrapper returns self

      • the target variable within the as clause is bound to something

        the as clause is optional

      • that something is the result returned by the __enter__ method of the context manager object (TextIOWrapper), which we determined was self (i.e. the context manager instance)

    • whenever control flow exits the with block, for any reason, __exit__ is called on the context manager object.

      This is NOT called on whatever that was returned by __enter__ and stored by the target variable.
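
    • roughly (a simplified sketch, not the interpreter's exact expansion), the open() example above behaves like this hand-written equivalent:

        import sys

        manager = open('mirror.py')                    # expression after =with= -> the context manager object
        fp = manager.__enter__()                       # the =as= target is bound to whatever __enter__ returns
        try:
            print(fp.readline())                       # body of the with block
        except BaseException:
            if not manager.__exit__(*sys.exc_info()):  # falsy result -> the exception propagates
                raise
        else:
            manager.__exit__(None, None, None)         # normal exit: no exception info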

  • example code Custom Context Manager for mirror

      """
      A "mirroring" ``stdout`` context.
    
      While active, the context manager reverses text output to
      ``stdout``::
    
      # tag::MIRROR_DEMO_1[]
    
          >>> from mirror import LookingGlass
          >>> with LookingGlass() as what:  # <1>
          ...      print('Alice, Kitty and Snowdrop')  # <2>
          ...      print(what)
          ...
          pordwonS dna yttiK ,ecilA
          YKCOWREBBAJ
          >>> what  # <3>
          'JABBERWOCKY'
          >>> print('Back to normal.')  # <4>
          Back to normal.
    
      # end::MIRROR_DEMO_1[]
    
    
      This exposes the context manager operation::
    
      # tag::MIRROR_DEMO_2[]
    
          >>> from mirror import LookingGlass
          >>> manager = LookingGlass()  # <1>
          >>> manager  # doctest: +ELLIPSIS
          <mirror.LookingGlass object at 0x...>
          >>> monster = manager.__enter__()  # <2>
          >>> monster == 'JABBERWOCKY'  # <3>
          eurT
          >>> monster
          'YKCOWREBBAJ'
          >>> manager  # doctest: +ELLIPSIS
          >... ta tcejbo ssalGgnikooL.rorrim<
          >>> manager.__exit__(None, None, None)  # <4>
          >>> monster
          'JABBERWOCKY'
    
      # end::MIRROR_DEMO_2[]
    
      The context manager can handle and "swallow" exceptions.
    
      # tag::MIRROR_DEMO_3[]
    
          >>> from mirror import LookingGlass
          >>> with LookingGlass():
          ...      print('Humpty Dumpty')
          ...      x = 1/0  # <1>
          ...      print('END')  # <2>
          ...
          ytpmuD ytpmuH
          Please DO NOT divide by zero!
          >>> with LookingGlass():
          ...      print('Humpty Dumpty')
          ...      x = no_such_name  # <1>
          ...      print('END')  # <2>
          ...
          Traceback (most recent call last):
            ...
          NameError: name 'no_such_name' is not defined
    
      # end::MIRROR_DEMO_3[]
    
      """
    
    
      # tag::MIRROR_EX[]
      import sys
    
      class LookingGlass:
    
          def __enter__(self):  # <1>
              self.original_write = sys.stdout.write  # <2>
              sys.stdout.write = self.reverse_write  # <3>
              return 'JABBERWOCKY'  # <4>
    
          def reverse_write(self, text):  # <5>
              self.original_write(text[::-1])
    
          def __exit__(self, exc_type, exc_value, traceback):  # <6>
              sys.stdout.write = self.original_write  # <7>
              if exc_type is ZeroDivisionError:  # <8>
                  print('Please DO NOT divide by zero!')
                  return True  # <9>
              # <10> NOTE: if exit returns None or any falsy value, any exception raised in the =with= block will be propagated.
      # end::MIRROR_EX[]
    
  • enter and exit:

    1. __enter__ is called without any arguments other than the implicit self (which is the context manager instance)

    2. __exit__ is called with 3 arguments:

      • exc_type
      • exc_value the actual exception instance
      • traceback

      these 3 args received by __exit__ are the same as what we get by calling sys.exc_info() in the finally block of a try/finally. In the past, calling that was necessary to determine how to do the cleanup.

  • we can now do parenthesized context managers:

    1
    2
    3
    4
    5
    6
    
      with (
              CtxManager1() as example1,
              CtxManager2() as example2,
              CtxManager3() as example3,
      ):
              ...
    

    thanks to a new parser from python 3.10 onwards

  • The contextlib Utilities

    • first-reach before writing custom context managers
    • things that look useful:
      • using @contextmanager to build a context manager from a generator function

      • ContextDecorator to define class-based context managers

      • the async versions of all of them

  • Using @contextmanager

    • just implement a generator with a single yield statement that should produce whatever you want the __enter__ method to return

    • the yield splits the function body into two parts:

      • before yield: gets executed at the beginning of the with block when interpreter calls __enter__

      • after yield: gets executed when __exit__ is called at the end of the block

    • correct example:

        """
        A "mirroring" ``stdout`` context manager.
      
        While active, the context manager reverses text output to
        ``stdout``::
      
        # tag::MIRROR_GEN_DEMO_1[]
      
            >>> from mirror_gen import looking_glass
            >>> with looking_glass() as what:  # <1>
            ...      print('Alice, Kitty and Snowdrop')
            ...      print(what)
            ...
            pordwonS dna yttiK ,ecilA
            YKCOWREBBAJ
            >>> what
            'JABBERWOCKY'
      
        # end::MIRROR_GEN_DEMO_1[]
      
      
        This exposes the context manager operation::
      
        # tag::MIRROR_GEN_DEMO_2[]
      
            >>> from mirror_gen import looking_glass
            >>> manager = looking_glass()  # <1>
            >>> manager  # doctest: +ELLIPSIS
            <contextlib._GeneratorContextManager object at 0x...>
            >>> monster = manager.__enter__()  # <2>
            >>> monster == 'JABBERWOCKY'  # <3>
            eurT
            >>> monster
            'YKCOWREBBAJ'
            >>> manager  # doctest: +ELLIPSIS
            >...x0 ta tcejbo reganaMtxetnoCrotareneG_.biltxetnoc<
            >>> manager.__exit__(None, None, None)  # <4>
            False
            >>> monster
            'JABBERWOCKY'
      
        # end::MIRROR_GEN_DEMO_2[]
      
        The context manager can handle and "swallow" exceptions.
        The following test does not pass under doctest (a
        ZeroDivisionError is reported by doctest) but passes
        if executed by hand in the Python 3 console (the exception
        is handled by the context manager):
      
        # tag::MIRROR_GEN_DEMO_3[]
      
            >>> from mirror_gen_exc import looking_glass
            >>> with looking_glass():
            ...      print('Humpty Dumpty')
            ...      x = 1/0  # <1>
            ...      print('END')  # <2>
            ...
            ytpmuD ytpmuH
            Please DO NOT divide by zero!
      
        # end::MIRROR_GEN_DEMO_3[]
      
            >>> with looking_glass():
            ...      print('Humpty Dumpty')
            ...      x = no_such_name  # <1>
            ...      print('END')  # <2>
            ...
            Traceback (most recent call last):
              ...
            NameError: name 'no_such_name' is not defined
      
      
      
        """
      
      
        # tag::MIRROR_GEN_EXC[]
        import contextlib
        import sys
      
        @contextlib.contextmanager
        def looking_glass():
            original_write = sys.stdout.write
      
            def reverse_write(text):
                original_write(text[::-1])
      
            sys.stdout.write = reverse_write
            msg = ''  # <1>
            try:
                yield 'JABBERWOCKY'
            except ZeroDivisionError:  # <2>
                msg = 'Please DO NOT divide by zero!'
            finally:
                sys.stdout.write = original_write  # <3>
                if msg:
                    print(msg)  # <4>
        # end::MIRROR_GEN_EXC[]
      
      • it’s unavoidable to wrap the yield in try/except (or try/finally) when using @contextmanager, since we never know what the users of the context manager will do inside their with block.

      • GOTCHA:

        Generally,

        if __exit__ returns truthy even if there’s an exception, then the exception is suppressed. If it’s not truthy, then the exception is propagated out

        HOWEVER, with @contextmanager the default behaviour is inverted: the __exit__ method provided by the decorator assumes any exception sent into the generator is handled and should be suppressed, so we must explicitly re-raise inside the generator if we want the exception to propagate.

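        a minimal sketch of that inversion (hypothetical suppressing_cm): if the generator catches the exception and does not re-raise, the __exit__ supplied by @contextmanager reports it as handled, and the with block swallows it:

          import contextlib

          @contextlib.contextmanager
          def suppressing_cm():
              try:
                  yield 'resource'
              except Exception as exc:
                  print(f'swallowed: {exc!r}')  # no re-raise -> @contextmanager suppresses it
                  # raise                       # uncomment to let it propagate instead

          with suppressing_cm():
              raise ValueError('boom')
          print('still running')                # reached: the ValueError never escaped
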
    • (flawed) example
        """
        A "mirroring" ``stdout`` context manager.
      
        While active, the context manager reverses text output to
        ``stdout``::
      
        # tag::MIRROR_GEN_DEMO_1[]
      
            >>> from mirror_gen import looking_glass
            >>> with looking_glass() as what:  # <1>
            ...      print('Alice, Kitty and Snowdrop')
            ...      print(what)
            ...
            pordwonS dna yttiK ,ecilA
            YKCOWREBBAJ
            >>> what
            'JABBERWOCKY'
            >>> print('back to normal')
            back to normal
      
      
        # end::MIRROR_GEN_DEMO_1[]
      
      
        This exposes the context manager operation::
      
        # tag::MIRROR_GEN_DEMO_2[]
      
            >>> from mirror_gen import looking_glass
            >>> manager = looking_glass()  # <1>
            >>> manager  # doctest: +ELLIPSIS
            <contextlib._GeneratorContextManager object at 0x...>
            >>> monster = manager.__enter__()  # <2>
            >>> monster == 'JABBERWOCKY'  # <3>
            eurT
            >>> monster
            'YKCOWREBBAJ'
            >>> manager  # doctest: +ELLIPSIS
            >...x0 ta tcejbo reganaMtxetnoCrotareneG_.biltxetnoc<
            >>> manager.__exit__(None, None, None)  # <4>
            False
            >>> monster
            'JABBERWOCKY'
      
        # end::MIRROR_GEN_DEMO_2[]
      
        The decorated generator also works as a decorator:
      
      
        # tag::MIRROR_GEN_DECO[]
            >>> @looking_glass()
            ... def verse():
            ...     print('The time has come')
            ...
            >>> verse()  # <1>
            emoc sah emit ehT
            >>> print('back to normal')  # <2>
            back to normal
      
        # end::MIRROR_GEN_DECO[]
      
        """
      
      
        # tag::MIRROR_GEN_EX[]
        import contextlib
        import sys
      
        @contextlib.contextmanager  # <1>
        def looking_glass():
            original_write = sys.stdout.write  # <2>
      
            def reverse_write(text):  # <3>
                original_write(text[::-1])
      
            sys.stdout.write = reverse_write  # <4>
            yield 'JABBERWOCKY'  # <5>
            sys.stdout.write = original_write  # <6>
        # end::MIRROR_GEN_EX[]
      

    this is flawed because if an exception is raised in the body of the with block, the Python interpreter will catch it and raise it again in the yield expression inside looking_glass. But there is no error handling there, so the looking_glass generator will terminate without ever restoring the original sys.stdout.write method, leaving the system in an invalid state.

    Cleanup not done if there’s an exception raised within the with block.

    • TRICK: generators decorated with it can also be used as decorators themselves.

      happens because @contextmanager is implemented with the contextlib.ContextDecorator class.

        @looking_glass()
        def verse():
            print("the time has come")
      
        verse()  # prints its output reversed
      

      Here, looking_glass does its job before and after the body of verse runs.

Pattern Matching in lis.py: A Case Study

  • Scheme Syntax
  • Imports and Types
  • The Parser
  • The Environment
  • The REPL
  • The Evaluator
  • Procedure: A Class Implementing a Closure
  • Using OR-patterns

Do This, Then That: else Blocks Beyond if

  • use cases: avoids the need to set up extra control flags or code extra if statements

    • raising pattern:

          for item in my_list:
              if item.flavor == 'banana':
                  break
          else:
              raise ValueError('No banana flavor found!')
      
    • keep the try blocks lean in try/except

      The body of the try block should only have the statements that generate the expected exceptions.

      Instead of doing this:

          try:
                  dangerous_call()
                  after_call()
          except OSError:
                  log('OSError...')
      

      We should do:

          try:
                  dangerous_call()
          except OSError:
                  log('OSError...')
          else:
                after_call()
      

    This gives clarity.

    try block is guarding against possible errors in dangerous_call() and not in after_call(). It’s also explicit that after_call() will only execute if no exceptions are raised in the try block.

    IDIOM: try/except is NOT only for error handling; it can be used for control flow as well, e.g. duck-typing checks (attempt the operation and catch the exception instead of testing the type first).

    Follows EAFP

    EAFP

    Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

    this contrasts LBYL:

    LBYL

    Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements. In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping.” For example, the code, if key in mapping: return mapping[key] can fail if another thread removes key from mapping after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.
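
    side by side, with a hypothetical mapping/key/default (the LBYL version is the one that can race under threads):

      mapping = {'spam': 1}
      key, default = 'eggs', 0

      # EAFP: attempt the lookup and handle the failure
      try:
          value = mapping[key]
      except KeyError:
          value = default

      # LBYL: test first, then leap (another thread could delete key in between)
      if key in mapping:
          value = mapping[key]
      else:
          value = default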

  • else blocks apply to most control flow constructs, they are closely related to each other but very different from if/else

    LANGUAGE_LIMITATION: GOTCHA: the else keyword here means more of a “run this loop, then do that” rather than “run this loop, otherwise do that”

    • for: else block will run only if and when the for loop runs to completion (i.e., not if the for is aborted with a break).

    • while: else block will run only if and when the while loop exits because the condition became falsy (i.e., not if the while is aborted with a break).

    • try: The else block will run only if no exception is raised in the try block.

      NOTE: “Exceptions in the else clause are not handled by the preceding except clauses.”

  • the else clause is also skipped if an exception or a return, break, or continue statement causes control to jump out of the main block of the compound statement.

Chapter Summary

  • an insight:

    subroutines are the most important invention in the history of computer languages. If you have sequences of operations like A;B;C and P;B;Q, you can factor out B in a subroutine. It’s like factoring out the filling in a sandwich: using tuna with different breads. But what if you want to factor out the bread, to make sandwiches with wheat bread, using a different filling each time? That’s what the with statement offers. It’s the complement of the subroutine.

Further Reading

Chapter 19. Concurrency Models in Python

  • concurrency vs parallelism; informally speaking

    • concurrency: dealing with many things at once \(\implies\) it’s about the structure of a solution

      the structure provided by concurrent solutions may help solve a problem (though not necessarily) in a parallelized fashion.

    • parallelism: doing lots of things at once \(\implies\) execution of the solution

      in this informal view, it’s a special case of concurrency, so parallel \(\implies\) concurrent

  • Python’s three approaches to concurrency: threads, processes, and native coroutines.

  • python’s fitness for concurrent and parallel computing is not limited to what the std lib provides. Python can scale.

What’s New in This Chapter

The Big Picture

  • factor of difficulty when writing concurrent programs: starting threads or processes is easy enough, but how do you keep track of them?

  • non concurrent programs, function call is blocking so useful for us

  • concurrent programs, non blocking, need to rely on some form of communication to get back results or errors

  • starting a thread is not cheap \(\implies\) amortize costs by using “worker” threads/procs \(\implies\) coordinating them is tough e.g. how to terminate?

    resolved using messages and queues still

  • coroutines are useful:

    • cheap to start
    • returns values
    • can be safely cancelled
    • specific area to catch exceptions

    But they have problems:

    • they’re handled by the async framework \(\implies\) hard to monitor as threads / procs

    • not good for CPU-intensive tasks

A Bit of Jargon

  • Concurrency: ability to handle multiple pending tasks (each eventually succeeding or failing) \(\implies\) can multitask

  • Parallelism: ability to compute multiple computations at the same time \(\implies\) multicore CPU, multiple CPU, GPU, multiple computers in a cluster

  • Execution Unit: objects executing concurrent code. Each has independent state and call stack

    Python execution units:

    1. processes

      • definition:

        an instance of a computer program while it’s running, using memory and CPU time-slices; each process has its own private memory space

      • communication:

        objects communicated as raw bytes (so must be serialised) to pass from one proc to another. Communicated via pipes, sockets or memory-mapped files

      • spawning:

        can spawn child procs which are all isolated from the parent

      • scheduling:

        pre-emptively scheduled by the OS, so a frozen proc won’t freeze the whole system

    2. threads

      • definition:

        execution unit within a single process

        consumes less resources than a process (if they both did the same job)

      • lifecycle:

        @ start of process, there’s a single thread. Procs can create more threads by calling OS APIs

      • Shared Memory management:

        Threads within a process share the same memory space, which holds live Python objects \(\implies\) shared data may be corrupted via read/write race conditions

      • Supervision:

        Also supervised by OS Scheduler, threads can enable pre-emptive multitasking

    3. coroutines

      • Definition:

        A function that can suspend itself and resume later.

        Classic Coroutines: built from generator functions

        Native Coroutines: defined using async def

      • Supervising coroutines:

        Typically, coroutines run within a single thread, supervised by an event loop that is in the same thread.

        Async frameworks provide an event loop and supporting libraries for nonblocking, coroutine-based I/O

      • Scheduling & Cooperative Multitasking:

        each coroutine must explicitly cede control with the yield or await keyword, so that another may proceed concurrently (but not in parallel).

        so if there’s any blocking code in a coroutine block, it would block the execution of the event loop and hence all other coroutines

        this contrasts preemptive multitasking supported by procs and threads.

        nevertheless, coroutine consumes less resources than a thread or proc doing the same job

  • Mechanisms useful to us:

    1. Queue:

      • purpose:

        allow separate execution units to exchange application data and control messages, such as error codes and signals to terminate.

      • implementation:

        depends on concurrency model:

        • python stdlib queue gives queue classes to support threads

          this also provides non-FIFO queues like LifoQueue and PriorityQueue

        • multiprocessing, asyncio packages have their own queue classes

          asyncio also provides non-FIFO queues like LifoQueue and PriorityQueue

    2. Lock:

      • purpose:

        Sync mechanism object for execution units to sync actions and avoid data corruption

        While updating a shared data structure, the running code should hold an associated lock.

      • implementation:

        depends on the concurrency model

        simplest form of a lock is just a mutex

    3. Contention: dispute over a limited asset

      • Resource Contention

        When multiple execution units try to access a shared resource (e.g. a lock / storage)

      • CPU Contention

        Compute-intensive procs / threads must wait for the OS scheduler to give them a share of CPU time

  • Processes, Threads, and Python’s Infamous GIL

    Here are 10 points that consolidate info about Python’s concurrency support:

    1. Instance of python interpreter \(\implies\) a process

      We can create additional Python processes \(\leftarrow\) use multiprocessing / concurrent.futures libraries

      We can also start sub-processes that run any other external programs. \(\leftarrow\) using subprocess library

    2. Interpreter runs user program and the GC in a single thread. We can start additional threads using threading / concurrent.futures libraries.

    3. GIL (Global Interpreter Lock) controls internal interpreter state (process state shared across threads) and access to object ref counts.

      Only one python thread can hold the GIL at any time \(\implies\) only one thread can execute Python code at any time, regardless the number of CPU cores.

      GIL is NOT part of the python language definition, it’s a CPython Implementation detail. This is critical for portability reasons.

    4. Default release of the GIL @ an interval:

      Prevents any particular thread from holding the GIL indefinitely.

      It’s the bytecode interpreter that pauses the current thread every 5 ms by default (configurable), and the OS scheduler picks which thread gets the GIL next (it could be the same thread that just released it).

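      relating to point 4: the interval is exposed by the sys module (a small sketch; 0.005 is the CPython default):

        import sys

        print(sys.getswitchinterval())  # 0.005 -> the default 5 ms switch interval
        sys.setswitchinterval(0.01)     # ask the interpreter to switch threads less often
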
    5. Python source code can’t control the GIL but extension / builtin written in C (or lang that interfaces at the Python/C API level) can release the GIL when it’s running time-consuming tasks.

    6. Every python stdlib that does a syscall (for kernel services) will release the GIL. This avoids contention of resources (mem as well as CPU)

      • functions that perform I/O operations (disk, network, sleep)

      • functions that are CPU-intensive (e.g. NumPy / SciPy), compressing/decompressing functions (e.g. zlib, bz2)

    7. GIL-free threads:

      • can only be launched by extensions that integrate at the Python/C API level

      • can’t change python objects generally, but can R/W to memory objects that support buffer protocols (bytearray, array.array, NumPy arrays)

      • GIL-free python is under experimentation at the moment (but not mainstream)

    8. Network I/O is GIL-insensitive

      GIL minimally affects network programming because Network I/O is higher latency than memory I/O.

      Each individual thread would have spent a long time waiting anyway, so interleaving their execution doesn’t significantly impact the overall throughput.

    9. Compute-intensive python threads \(\implies\) will be slowed down by GIL contention.

      Better to use sequential, single-threaded code here. Faster and simpler.

    10. CPU-intensive python code to be ran on multiple cores requires multiple python processes.

    • Extra Notes:

      1. Coroutines are not affected by the GIL

        by default they share the same Python thread among themselves and with the supervising event loop provided by an asynchronous framework, therefore the GIL does not affect them.

      2. We technically can use multiple threads in an async program, but this is not best practice.

        Typically, we have one coordinating thread running the event loops, which delegates to additional threads that carry out specific tasks.

        KIV “delegating tasks to executors”

A Concurrent Hello World

  • a demo of how python can “walk and chew gum”, using multiple approaches: multiprocessing, threading, asyncio
  • Spinner with Threads

    # spinner_thread.py
    
    # credits: Adapted from Michele Simionato's
    # multiprocessing example in the python-list:
    # https://mail.python.org/pipermail/python-list/2009-February/675659.html
    
    # tag::SPINNER_THREAD_TOP[]
    import itertools
    import time
    from threading import Thread, Event
    
    def spin(msg: str, done: Event) -> None:  # <1> this fn runs in a separate thread, Event instance is for syncing of threads
        for char in itertools.cycle(r'\|/-'):  # <2> infinite loop because infinite iterator
            status = f'\r{char} {msg}'  # <3> the carriage return TRICK
            print(status, end='', flush=True)
            if done.wait(.1):  # <4> ??? the timeout value sets the "framerate" of the animation (0.1s => 10FPS)
                break  # <5> break inf loop
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')  # <6> clears the status line
    
    def slow() -> int: # called by the main thread
        time.sleep(3)  # <7> this is a blocking syscall, so GIL is released, which will allow other threads to be executed
        return 42
    # end::SPINNER_THREAD_TOP[]
    
    # tag::SPINNER_THREAD_REST[]
    def supervisor() -> int:  # <1> eventually returns the result of =slow=
        done = Event()  # <2> to coordinate =main= and =spinner= thread
        spinner = Thread(target=spin, args=('thinking!', done))  # <3> spawn thread
        print(f'spinner object: {spinner}')  # <4> displays as <Thread(Thread-1, initial)> ; initial means the thread not started yet
        spinner.start()  # <5>
        result = slow()  # <6> call slow, blocks the =main= thread, while the secondary =spinner= thread still runs the animation
        done.set()  # <7> signals the spin function to exit, terminating the for loop inside spin
        spinner.join()  # <8> wait until spinner finishes (fork-join!)
        return result
    
    def main() -> None:
        result = supervisor()  # <9> just for didactic purposes, to keep it similar to the asyncio version
        print(f'Answer: {result}')
    
    if __name__ == '__main__':
        main()
    
    # end::SPINNER_THREAD_REST[]
    

    Notes:

    • within slow(), time.sleep blocks the calling thread but releases the GIL, so other Python threads (in this case our secondary thread for spinner) can run.

    • spin and slow executed concurrently, the supervisor coordinates the threads using an instance of threading.Event

    • creating threads:

      create a new Thread, providing a function as the target keyword argument and positional arguments for the target as a tuple passed via args, as in: spinner = Thread(target=spin, args=('thinking!', done))  # <3> spawn thread

      we can also pass in kwargs using kwargs named parameter to Thread constructor

    • threading.Event:

      1. Python’s simplest signalling mechanism to coordinate threads.

      2. Event instance has an internal boolean flag that starts as False. Calling Event.set() sets the flag to True.

        • when flag is False (unset):
          • if a thread calls Event.wait(), the thread is blocked until another thread calls Event.set(). When this happens, Event.wait() returns True

          • If a timeout is provided, Event.wait(timeout) returns False when the timeout elapses.

            As soon as another thread calls Event.set(), the wait call returns True.

    • TRICK: for text-mode animation: move the cursor back to the start of the line with the carriage return ASCII control character ('\r').

  • Spinner with Processes

    # spinner_proc.py
    
    # credits: Adapted from Michele Simionato's
    # multiprocessing example in the python-list:
    # https://mail.python.org/pipermail/python-list/2009-February/675659.html
    
    # tag::SPINNER_PROC_IMPORTS[]
    import itertools
    import time
    from multiprocessing import Process, Event  # <1>
    from multiprocessing import synchronize     # <2> this import supports the type hinting for the Event
    
    def spin(msg: str, done: synchronize.Event) -> None:  # <3> attention to the typehint
    # end::SPINNER_PROC_IMPORTS[]
        for char in itertools.cycle(r'\|/-'):
            status = f'\r{char} {msg}'
            print(status, end='', flush=True)
            if done.wait(.1):
                break
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')
    
    def slow() -> int:
        time.sleep(3)
        return 42
    
    # tag::SPINNER_PROC_SUPER[]
    def supervisor() -> int:
        done = Event()
        spinner = Process(target=spin,               # <4>
                          args=('thinking!', done))
        print(f'spinner object: {spinner}')          # <5> displays <Process name='Process-1' parent=14868 initial> so it tells you the PID and the initial state.
        spinner.start()
        result = slow()
        done.set()
        spinner.join()
        return result
    # end::SPINNER_PROC_SUPER[]
    
    def main() -> None:
        result = supervisor()
        print(f'Answer: {result}')
    
    
    if __name__ == '__main__':
        main()
    
    • multiprocessing package supports running concurrent tasks in separate Python processes instead of threads.

      each process runs its own instance of the Python interpreter; the worker procs run in the background.

      Each proc has its own GIL \(\implies\) we can exploit our multicore CPU well because of this (depends on the OS scheduler though)

    • multiprocessing API emulates the threading API \(\implies\) can easily convert between them.

      Comparing the multiprocessing and threading APIs

      • similarities

        1. Event objects are similar in how they function with the bit setting / unsetting

        2. Event objects can wait on timeouts

      • differences:

        1. Event is a different type in each API: multiprocessing.Event is a function (not a class like threading.Event)

        2. multiprocessing has a larger API because it’s more complex

          e.g. python objects that would need to be communicated across process need to be serialized/deserialized because it’s an OS-level isolation (of processes). This adds overhead.

          the Event state is the only cross-process state being shared; it’s implemented via an OS semaphore

          memory sharing can be done via multiprocessing.shared_memory. Only raw bytes are shared; alternatively, a ShareableList (a mutable sequence of fixed length) can hold items of primitive types (str, bytes, numbers, bool, None), up to 10 MB per item.

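          a tiny ShareableList sketch (Python 3.8+; shown in a single process just to illustrate the API):

            from multiprocessing import shared_memory

            sl = shared_memory.ShareableList(['prime', 42, 3.14, None, True])  # fixed length, primitive items only
            print(sl[1], len(sl))  # 42 5
            sl[1] = 43             # items can be overwritten in place (str/bytes can't outgrow their original size)
            sl.shm.close()         # release this process's handle to the block
            sl.shm.unlink()        # free the underlying shared memory
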
  • Spinner with Coroutines

    # spinner_async.py
    
    # credits: Example by Luciano Ramalho inspired by
    # Michele Simionato's multiprocessing example in the python-list:
    # https://mail.python.org/pipermail/python-list/2009-February/675659.html
    
    # tag::SPINNER_ASYNC_TOP[]
    import asyncio
    import itertools
    
    async def spin(msg: str) -> None:  # <1> no need Event as a coordinating mechanism
        for char in itertools.cycle(r'\|/-'):
            status = f'\r{char} {msg}'
            print(status, flush=True, end='')
            try:
                await asyncio.sleep(.1)  # <2> this is a non-blocking pause
            except asyncio.CancelledError:  # <3> when the cancelled method is called on the task that is controlling this coroutine
                break
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')
    
    async def slow() -> int:
        await asyncio.sleep(3)  # <4> also uses the non blocking sleep
        return 42
    # end::SPINNER_ASYNC_TOP[]
    
    # tag::SPINNER_ASYNC_START[]
    def main() -> None:  # <1> only regular function here, rest are coroutines
        result = asyncio.run(supervisor())  # <2> supervisor coroutine will block the main function
        print(f'Answer: {result}')
    
    async def supervisor() -> int:  # <3> native coroutine definition
        spinner = asyncio.create_task(spin('thinking!'))  # <4> returns a Task, schedules the eventual execution of spin
        print(f'spinner object: {spinner}')  # <5> <Task pending name='Task-2' coro=<spin() running at /path/to/spinner_async.py:11>>
        result = await slow()  # <6> =await= calls slow, blocks =supervisor= until =slow= returns
        spinner.cancel()  # <7> Task.cancel() raises =CancelledError= inside the coro task
        return result
    
    if __name__ == '__main__':
        main()
    # end::SPINNER_ASYNC_START[]
    
    • who manages the event loop?

      • for threads and processes, it’s the OS Scheduler

      • for coroutines, it’s app-level event loop

        drives coroutines one by one, manages queue of pending coroutines, passes control back to corresponding coroutine when each event happens

        all of these execute in a single thread: event loop, library coroutines, user coroutines

        that’s why any blocking code inside a coroutine blocks the event loop and every other coroutine

    • Concurrency is achieved by control passing from one coroutine to another.

      Python code using asyncio has only one flow of execution, unless you’ve explicitly started additional threads or processes.

      means only one coroutine executes at any point in time.

      Concurrency is achieved by control passing from one coroutine to another. This happens when we use the await keyword.

    • Remember: inside asyncio coroutines, whenever we need to pause, use the non-blocking sleep (await asyncio.sleep(DELAY)) instead of the blocking time.sleep()

    • explaining the example

      • asyncio.run starts the event loop, drives the coroutine (supervisor) that sets other coroutines in motion.

        supervisor will block the main function until it’s done

        asyncio.run returns what supervisor returns

      • await calls slow, blocks supervisor until slow returns

        I think it’s easier to see it as a control flow handover to slow. That’s why it’s blocking and that’s why when the control flow returns, we carry on with the assignment operator.

      • Task.cancel() raises CancelledError inside the coro task

    • NOTE: if we directly invoke a coro like coro(), it immediately returns a coroutine object; the body of the coro function does not run

      the coro needs to be driven by an event loop.

    • We see 3 ways to run a coro (driven by an event loop):

      1. asyncio.run(coro())

        • a regular function will call this

        • usually the first coro is the entry point, that supervisor

        • return value of run is whatever the body of coro returns

      2. asyncio.create_task(coro())

        • called from a coroutine, returns a Task instance. Task wraps the coro and provides methods to control and query its state.

        • schedules another coroutine to be eventually run

        • does not suspend current coroutine

      3. await coro()

        • transfers control from current coro to coro returned by coro()
        • suspends the current coro until the other coro returns
        • value of await expression is whatever the body of the coro returns
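
      a compact sketch tying the three together (hypothetical add / main_coro coroutines):

        import asyncio

        async def add(a: int, b: int) -> int:
            await asyncio.sleep(0)                 # cede control to the event loop once
            return a + b

        async def main_coro() -> None:
            task = asyncio.create_task(add(1, 2))  # 2. schedule it; does not suspend main_coro
            result = await add(3, 4)               # 3. suspend main_coro until add returns
            print(result, await task)              # 7 3

        asyncio.run(main_coro())                   # 1. regular code starts the event loop and drives main_coro
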
  • Supervisors Side-by-Side

    • asyncio.Task vs threading.Thread (roughly equivalent)

      • Task drives a coroutine object, Thread invokes a callable

      • yielding control: coroutine yields explicitly with await

      • we don’t instantiate Task objects ourselves , we get them by using asyncio.create_task()

      • explicit scheduling:

        • create_task gives a Task object that is already scheduled to run; a Thread instance must be explicitly told to run via .start()
      • Termination:

        • threads can’t be terminated from the outside, we can only pass in a signal (eg. setting done in Event)

        • tasks Task.cancel() can be cancelled from the outside, raises CancelledError at the await expression where the coro body is currently suspended

          this works because only one coroutine runs at any time, so the task is known to be suspended at an await expression; the outside can cancel it directly rather than merely suggesting termination via a signal.

    • Instead of holding locks to synchronize the operations of multiple threads, coroutines are “synchronized” by definition: only one of them is running at any time.

      with coroutines, code is protected against interruption by default: control only changes hands where we explicitly await

The Real Impact of the GIL

  • Quick Quiz

    • the main question here is whether the execution units can be interrupted by the entity that coordinates the control flow.

    • processes are controlled by OS scheduler so this is interruptable \(\implies\) the multiprocessing version will still carry on as usual

    • threads are controlled by the OS scheduler as well and the GIL lock can be released at a default interval, so this is useful to us \(\implies\) the threading approach will not have a noticeable difference.

      the effect is negligible here only because the number of threads is minimal (2); with more threads, the slowdown becomes visible.

    • the asyncio coroutine version will be blocked by this compute-intensive call.

      we can try doing this hack though: make the is_prime a coroutine and await asyncio.sleep(0) to yield control flow.

      This is slow though

        # spinner_prime_async_nap.py
      
        # credits: Example by Luciano Ramalho inspired by
        # Michele Simionato's multiprocessing example in the python-list:
        # https://mail.python.org/pipermail/python-list/2009-February/675659.html
      
        import asyncio
        import itertools
        import math
        import functools
      
        # tag::PRIME_NAP[]
        async def is_prime(n):
            if n < 2:
                return False
            if n == 2:
                return True
            if n % 2 == 0:
                return False
      
            root = math.isqrt(n)
            for i in range(3, root + 1, 2):
                if n % i == 0:
                    return False
                if i % 100_000 == 1:
                    await asyncio.sleep(0)  # <1>
            return True
        # end::PRIME_NAP[]
      
      
        async def spin(msg: str) -> None:
            for char in itertools.cycle(r'\|/-'):
                status = f'\r{char} {msg}'
                print(status, flush=True, end='')
                try:
                    await asyncio.sleep(.1)
                except asyncio.CancelledError:
                    break
            blanks = ' ' * len(status)
            print(f'\r{blanks}\r', end='')
      
        async def check(n: int) -> int:
            return await is_prime(n)
      
        async def supervisor(n: int) -> int:
            spinner = asyncio.create_task(spin('thinking!'))
            print('spinner object:', spinner)
            result = await check(n)
            spinner.cancel()
            return result
      
        def main() -> None:
            n = 5_000_111_000_222_021
            result = asyncio.run(supervisor(n))
            msg = 'is' if result else 'is not'
            print(f'{n:,} {msg} prime')
      
        if __name__ == '__main__':
            main()
      

      Using await asyncio.sleep(0) should be considered a stopgap measure before you refactor your asynchronous code to delegate CPU-intensive computations to another process.

A Homegrown Process Pool

  • Process-Based Solution

    • starts a number of worker processes equal to the number of CPU cores, as determined by multiprocessing.cpu_count()
    • some overhead in spinning up processes and in inter-process communication
  • Understanding the Elapsed Times
  • Code for the Multicore Prime Checker

    #!/usr/bin/env python3
    
    """
    procs.py: shows that multiprocessing on a multicore machine
    can be faster than sequential code for CPU-intensive work.
    """
    
    # tag::PRIMES_PROC_TOP[]
    import sys
    from time import perf_counter
    from typing import NamedTuple
    from multiprocessing import Process, SimpleQueue, cpu_count  # <1> use the SimpleQueue to build a queue
    from multiprocessing import queues  # <2> use queues.SimpleQueue for typing
    
    from primes import is_prime, NUMBERS
    
    class PrimeResult(NamedTuple):  # <3>
        n: int
        prime: bool
        elapsed: float
    
    JobQueue = queues.SimpleQueue[int]  # <4> TypeAlias for a simple queue to send numbers to the procs that will do the job
    
    ResultQueue = queues.SimpleQueue[PrimeResult]  # <5> TypeAlias for simple queue for building the results
    
    def check(n: int) -> PrimeResult:  # <6>
        t0 = perf_counter()
        res = is_prime(n)
        return PrimeResult(n, res, perf_counter() - t0)
    
    def worker(jobs: JobQueue, results: ResultQueue) -> None:  # <7> gets a queue to read from (jobs) and a queue to write to (results)
        while n := jobs.get():  # <8> uses n = 0 as the poison pill
            results.put(check(n))  # <9> primality check then enqueue the result
        results.put(PrimeResult(0, False, 0.0))  # <10> signals that the worker is done
    
    def start_jobs(
        procs: int, jobs: JobQueue, results: ResultQueue  # <11> procs: number of parallel prime checks
    ) -> None:
        for n in NUMBERS:
            jobs.put(n)  # <12> enqueue number to be checked
        for _ in range(procs):
            proc = Process(target=worker, args=(jobs, results))  # <13> Fork a child proc for each worker, runs until it fetches a 0 from jobs queue
            proc.start()  # <14> starting the child proc
        jobs.put(0)  # <15> poison pill it after starting; it will be read only after all the actual jobs have been read
    # end::PRIMES_PROC_TOP[]
    
    # tag::PRIMES_PROC_MAIN[]
    def main() -> None:
        if len(sys.argv) < 2:  # <1>
            procs = cpu_count()
        else:
            procs = int(sys.argv[1])
    
        print(f'Checking {len(NUMBERS)} numbers with {procs} processes:')
        t0 = perf_counter()
        jobs: JobQueue = SimpleQueue()  # <2>
        results: ResultQueue = SimpleQueue()
        start_jobs(procs, jobs, results)  # <3> starts the workers
        checked = report(procs, results)  # <4>
        elapsed = perf_counter() - t0
        print(f'{checked} checks in {elapsed:.2f}s')  # <5>
    
    def report(procs: int, results: ResultQueue) -> int: # <6>
        checked = 0
        procs_done = 0
        while procs_done < procs:  # <7>
            n, prime, elapsed = results.get()  # <8>
            if n == 0:  # <9>
                procs_done += 1
            else:
                checked += 1  # <10>
                label = 'P' if prime else ' '
                print(f'{n:16}  {label} {elapsed:9.6f}s')
        return checked
    
    if __name__ == '__main__':
        main()
    # end::PRIMES_PROC_MAIN[]
    
    • when delegating computing to threads / procs, code doesn’t call the worker function directly

      the worker is driven by the thread or proc library

      the worker eventually produces a result that is stored somewhere

      worker coordination & result collection are common uses of queues in concurrent programming

    • IDIOM: loops, sentinels and poison pills:

      • worker function useful for showing common concurrent programming pattern:
        • we loop indefinitely while taking items from a queue and processing each with a fn that does the actual work (check)

        • we end the loop when the queue produces a sentinel value

          the sentinel value that shuts down a worker is often called a poison pill

          • TRICK/IDIOM: poison pilling to signal the worker to finish

            notice the use of the poison-pill in point 8 of the code above

          • common sentinels: (here’s a comment thread on sentinels)

            • None, but may not work if the data stream legitimately may produce None

            • object() is a common sentinel, but Python objects must be serialised for IPC, so when we pickle.dump and pickle.load an object, the unpickled instance is distinct from the original and doesn't compare equal.

            • ⭐️ ... the Ellipsis builtin is a good option: it survives serialisation without losing its identity (see the pickle sketch below)

    • Debugging concurrent code is always hard, and debugging multiprocessing is even harder because of all the complexity behind the thread-like façade.
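      A quick sketch (my own check, not from the book) of why identity-based sentinels are fragile across process boundaries while Ellipsis is safe:

        import pickle

        SENTINEL = object()
        clone = pickle.loads(pickle.dumps(SENTINEL))
        print(clone is SENTINEL)   # False: unpickling creates a new, distinct object

        clone = pickle.loads(pickle.dumps(...))
        print(clone is ...)        # True: Ellipsis is a singleton, identity survives IPC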

  • Experimenting with More or Fewer Processes

    • typically, once we go beyond the number of cores available to us, we should expect runtime to increase because of CPU contention
  • Thread-Based Nonsolution

    • Due to the GIL and the compute-intensive nature of is_prime, the threaded version is slower than the sequential code

      it gets slower as the number of threads increase, because of CPU contention and the cost of context switching.

      OS contention: the context switches (saving and restoring each thread's stack and registers) are what cause the extra overhead

    • KIV managing threads and processes using concurrent.futures (chapter 20) and doing async programming using asyncio (chapter 21)

Python in the Multicore World

  • GIL makes the interpreter faster when running on a single core, and its implementation simpler. It was a no-brainer when CPU performance didn’t hinge on concurrency.

  • Despite the GIL, Python is thriving in applications that require concurrent or parallel execution, thanks to libraries and software architectures that work around the limitations of CPython.

  • System Administration

    • use cases: manage hardware like NAS, use it for SDN (software defined networking), hacking

    • python scripts help with these tasks, commanding remote machines \(\implies\) aren’t really CPU bound operations \(\implies\) Threads & Coroutines are Good for this

      we can use concurrent.futures to perform the same operation on multiple remote machines at the same time without much complexity (hypothetical sketch below)
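      A hypothetical sketch of that idea (host names and the ssh command are made up, not from the book):

        from concurrent import futures
        import subprocess

        HOSTS = ['web1', 'web2', 'db1']   # hypothetical remote machines

        def uptime(host: str) -> tuple[str, str]:
            # run the same command on one remote machine; not CPU bound, so threads are fine
            proc = subprocess.run(['ssh', host, 'uptime'],
                                  capture_output=True, text=True, timeout=10)
            return host, proc.stdout.strip()

        with futures.ThreadPoolExecutor() as executor:
            for host, report in executor.map(uptime, HOSTS):
                print(host, report)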

  • Data Science

    • compute-intensive applications, supported by an ecosystem of libs that can leverage multicore machines, GPUs, and distributed parallel computing in heterogeneous clusters
    • some libs:
      • project jupyter
      • tensorflow (Google) and pytorch (Facebook)
      • dask: parallel computing lib to coordinate work on clusters
  • Server-Side Web/Mobile Development

    • caching matters a lot here, both app-level caches and HTTP caches (CDNs)
  • WSGI Application Servers

    • WSGI is a standard API for a Python framework or application to receive requests from an HTTP server and send responses to it.

      WSGI application servers manage one or more procs running your application, maximising the use of available CPUs

    • main point: all of these application servers can potentially use all CPU cores on the server by forking multiple Python processes to run traditional web apps written in good old sequential code in Django, Flask, Pyramid, etc. This explains why it’s been possible to earn a living as a Python web developer without ever studying the threading, multiprocessing, or asyncio modules: the application server handles concurrency transparently.

  • Distributed Task Queues

    • Distributed Task Queues wrap a message queue and offer a high-level API for delegating tasks to workers, possibly running on different machines.

      use cases:

      • run background jobs

      • trigger jobs after responding to the web request

      • async retries to ensure something is done

      • scheduled jobs

      e.g. Django view handler produces job requests which are put in the queue to be consumed by one or more PDF rendering processes

    • Supports horizontal scalability

      producers and consumers are decoupled

    • I’ve used Celery before!!
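      A tiny sketch of the idea (my own, not from the book; the broker URL and task are made up):

        from celery import Celery

        app = Celery('tasks', broker='redis://localhost:6379/0')  # hypothetical broker

        @app.task
        def render_pdf(order_id: int) -> str:
            ...  # slow work runs in a worker process, possibly on another machine
            return f'order-{order_id}.pdf'

        # a Django view (the producer) would just enqueue the job and return at once:
        # render_pdf.delay(42)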

Chapter Summary

  • the demo on the effect of the GIL

    demonstrated graphically that CPU-intensive functions must be avoided in asyncio, as they block the event loop.

  • the prime demo highlighted the difference between multiprocessing and threading, proving that only processes allow Python to benefit from multicore CPUs.

    GIL makes threads worse than sequential code for heavy computations.

Further Reading

  • The GIL
  • Concurrency and Scalability Beyond Python

Chapter 20. Concurrent Executors

  • concurrent.futures.Executor classes that encapsulate the pattern of “spawning a bunch of independent threads and collecting the results in a queue,” described by Michele Simionato.

    can be used with threads as well as processes

  • introduces futures, similar to JS promises. futures are the low level objects here

  • this chapter is more demo, less theoretical

What’s New in This Chapter

Concurrent Web Downloads

  • the concurrent scripts are about 5x faster

  • typically when well done, concurrent scripts can outpace sequential ones by a factor of 20x or more

  • TRICK: I didn't know that the HTTPX library is the more modern go-to compared to the requests lib. HTTPX gives both async and sync functions, whereas requests only gives sync versions.

  • for server-side, servers that may be hit by many clients, there is a difference between what concurrency primitive we use (threading vs coroutines):

    coroutines scale better because they use much less memory than threads, and also reduce the cost of context switching

  • A Sequential Download Script

    #!/usr/bin/env python3
    
    """Download flags of top 20 countries by population
    
    Sequential version
    
    Sample runs (first with new domain, so no caching ever)::
    
        $ ./flags.py
        BD BR CD CN DE EG ET FR ID IN IR JP MX NG PH PK RU TR US VN
        20 downloads in 26.21s
        $ ./flags.py
        BD BR CD CN DE EG ET FR ID IN IR JP MX NG PH PK RU TR US VN
        20 downloads in 14.57s
    
    
    """
    
    # tag::FLAGS_PY[]
    import time
    from pathlib import Path
    from typing import Callable
    
    import httpx  # <1> non stdlib import, conventionally comes after stdlib imports
    
    POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
                'MX PH VN ET EG DE IR TR CD FR').split()  # <2>
    
    BASE_URL = 'https://www.fluentpython.com/data/flags'  # <3>
    DEST_DIR = Path('downloaded')                         # <4>
    
    def save_flag(img: bytes, filename: str) -> None:     # <5> saving bytes to file
        (DEST_DIR / filename).write_bytes(img)
    
    def get_flag(cc: str) -> bytes:  # <6> downloads the thing, returns byte contents of the response
        url = f'{BASE_URL}/{cc}/{cc}.gif'.lower()
        resp = httpx.get(url, timeout=6.1,       # <7> good to have timeouts if we are making blocking calls like in this demo
                         follow_redirects=True)  # <8>
        resp.raise_for_status()  # <9> prevents silent failures because of non 2XX responses
        return resp.content
    
    def download_many(cc_list: list[str]) -> int:  # <10> sequential version, to be compared across the other examples
        for cc in sorted(cc_list):                 # <11> to observe that the order will be preserved
            image = get_flag(cc)
            save_flag(image, f'{cc}.gif')
            print(cc, end=' ', flush=True)         # <12> flush=True flushes stdout so each code shows up immediately
        return len(cc_list)
    
    def main(downloader: Callable[[list[str]], int]) -> None:  # <13> allows the downloader to be injectable, to be used for the other examples
        DEST_DIR.mkdir(exist_ok=True)                          # <14> create dir if necessary
        t0 = time.perf_counter()                               # <15>
        count = downloader(POP20_CC)
        elapsed = time.perf_counter() - t0
        print(f'\n{count} downloads in {elapsed:.2f}s')
    
    if __name__ == '__main__':
        main(download_many)     # <16>
    # end::FLAGS_PY[]
    
  • Downloading with concurrent.futures

    #!/usr/bin/env python3
    
    """Download flags of top 20 countries by population
    
    ThreadPoolExecutor version
    
    Sample run::
    
        $ python3 flags_threadpool.py
        DE FR BD CN EG RU IN TR VN ID JP BR NG MX PK ET PH CD US IR
        20 downloads in 0.35s
    
    """
    
    # tag::FLAGS_THREADPOOL[]
    from concurrent import futures
    
    from flags import save_flag, get_flag, main  # <1> reusing things
    
    def download_one(cc: str):  # <2> single downloader, this is what each worker will execute
        image = get_flag(cc)
        save_flag(image, f'{cc}.gif')
        print(cc, end=' ', flush=True)
        return cc
    
    def download_many(cc_list: list[str]) -> int:
        with futures.ThreadPoolExecutor() as executor:         # <3> ThreadPoolExecutor is the context manager here, exit method will be blocking until all threads are done
            res = executor.map(download_one, sorted(cc_list))  # <4> map is similar in style to the map builtin; returns a generator that we must iterate to get the value from each function call
    
        return len(list(res))                                  # <5>
    
    if __name__ == '__main__':
        main(download_many)  # <6>
    # end::FLAGS_THREADPOOL[]
    
    • The context manager is the ThreadPoolExecutor; its executor.__exit__ method calls executor.shutdown(wait=True), which blocks until all the threads are done.

    • executor.map() similar to map builtin,

      the function is called concurrently from multiple threads

      it returns a generator that we need to iterate to retrieve the value returned by each function call

      any exception raised by a particular call is re-raised when we retrieve its value from that generator.

    • concurrent.futures makes it easy for us to add concurrent execution atop legacy sequential code

    • Other useful args to ThreadPoolExecutor:

      • max_workers

        the default is max_workers = min(32, os.cpu_count() + 4); the extra workers are there for I/O-bound tasks

        Also it will try to reuse idle workers instead of spawning new ones. (lmao meeting rooms II leetcode question be like)

  • Where Are the Futures?

    • purpose: an instance of either Future class represents a deferred computation that may or may not have completed.

      like Promise in JS

      both async frameworks give us futures: concurrent.futures.Future and asyncio.Future

    • allows us to put them in queues and check if they’re done

      HOWEVER, it is the job of the concurrency framework to handle futures, WE DON’T create them directly. This is because a future represents something that will eventually run, so it must be scheduled to run and that’s the role of the framework

      e.g. Executor.submit(<callable>) does the scheduling and returns a Future

    • Who can change the state of a future?

      Only the concurrency framework, never the application code.

      We are NOT in control of the state of a future.

    • push/pull method to determine completion:

      pull: Future.done() where the applogic keeps polling

      push: Future.add_done_callback() to register a callback that will be invoked when the future is done. NOTE: the callback callable will run in the same worker thread or process that ran the function wrapped in the future.

    • futures have a result()

      • when done, it works the same for both libs

      • when not done, it works differently for the two libs:

        concurrent.futures.Future: calling f.result() will block the caller's thread until the result is ready (we can pass a timeout to avoid blocking forever)

        asyncio.Future: f.result() takes no timeout and never blocks; if the future isn't done yet it raises an exception, so the idiomatic way is to await it (see the small sketch after the demo below)

    • demo:
        #!/usr/bin/env python3
      
        """Download flags of top 20 countries by population
      
        ThreadPoolExecutor example with ``as_completed``.
        """
        from concurrent import futures
      
        from flags import main
        from flags_threadpool import download_one
      
      
        # tag::FLAGS_THREADPOOL_AS_COMPLETED[]
        def download_many(cc_list: list[str]) -> int:
            cc_list = cc_list[:5]  # <1> smaller sample
            with futures.ThreadPoolExecutor(max_workers=3) as executor:  # <2> attempt to see pending futures in the output
                to_do: list[futures.Future] = []
                for cc in sorted(cc_list):  # <3>
                    future = executor.submit(download_one, cc)  # <4> schedules the  callable to be executed, returns a future representing this pending operation
                    to_do.append(future)  # <5> just storing it for inspection
                    print(f'Scheduled for {cc}: {future}')  # <6> we'll see something like this: Scheduled for BR: <Future at 0x100791518 state=running>
      
                for count, future in enumerate(futures.as_completed(to_do), 1):  # <7> yields futures as they are completed
                    res: str = future.result()  # <8> retrieving the result
                    print(f'{future} result: {res!r}')  # <9> will look something like this: IN <Future at 0x101807080 state=finished returned str> result: 'IN'
      
            return count
        # end::FLAGS_THREADPOOL_AS_COMPLETED[]
      
        if __name__ == '__main__':
            main(download_many)
      
      In this example, because we're getting the futures from as_completed, the future.result() call never blocks: as_completed only yields futures that are already done.
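      A small sketch (mine, not from the book) of the pull/push styles and result() on a concurrent.futures.Future; `slow` is a made-up helper:

        import time
        from concurrent import futures

        def slow(x):
            time.sleep(0.5)
            return x * 2

        with futures.ThreadPoolExecutor() as executor:
            fut = executor.submit(slow, 21)          # the framework creates and schedules the Future
            fut.add_done_callback(lambda f: print('push:', f.result()))  # push: callback runs on completion
            while not fut.done():                    # pull: keep polling
                time.sleep(0.1)
            print('pull:', fut.result(timeout=1))    # result() with a timeout to avoid blocking forever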

Launching Processes with concurrent.futures

  • Both ProcessPoolExecutor and ThreadPoolExecutor implement the Executor interface

    this allows us to switch from thread-based to process-based concurrency using concurrent.futures

  • so we can use process-based primitives just like thread-based ones; we just have to call a different pool executor

  • main use case for process-based concurrency is CPU-intensive jobs

    put simply, going process-based lets us get around the GIL and use multiple CPU cores

    Remember processes use more memory and take longer to start than threads

    Main use case for thread-based concurrency is I/O-intensive applications.

  • Multicore Prime Checker Redux

    #!/usr/bin/env python3
    
    """
    proc_pool.py: a version of the procs.py example from the previous chapter,
    but using `concurrent.futures.ProcessPoolExecutor`.
    """
    
    # tag::PRIMES_POOL[]
    import sys
    from concurrent import futures  # <1> no need to import the lower level abstractions (multiprocessing, SimpleQueue)
    from time import perf_counter
    from typing import NamedTuple
    
    from primes import is_prime, NUMBERS
    
    class PrimeResult(NamedTuple):  # <2> we don't need the queues or the worker function anymore
        n: int
        flag: bool
        elapsed: float
    
    def check(n: int) -> PrimeResult:
        t0 = perf_counter()
        res = is_prime(n)
        return PrimeResult(n, res, perf_counter() - t0)
    
    def main() -> None:
        if len(sys.argv) < 2:
            workers = None      # <3> setting it to None lets the ProcessPoolExecutor decide for us
        else:
            workers = int(sys.argv[1])
    
        executor = futures.ProcessPoolExecutor(workers)  # <4> build the executor here so we have access to the actual number of workers selected
        actual_workers = executor._max_workers  # type: ignore  # <5>
    
        print(f'Checking {len(NUMBERS)} numbers with {actual_workers} processes:')
    
        t0 = perf_counter()
    
        numbers = sorted(NUMBERS, reverse=True)  # <6>
        with executor:  # <7> we use the executor as the context manager
            for n, prime, elapsed in executor.map(check, numbers):  # <8> returns PrimeResult instances that =check= returns in the same order as the numbers argument
                label = 'P' if prime else ' '
                print(f'{n:16}  {label} {elapsed:9.6f}s')
    
        time = perf_counter() - t0
        print(f'Total time: {time:.2f}s')
    
    if __name__ == '__main__':
        main()
    # end::PRIMES_POOL[]
    
    • iterating over executor.map() blocks until all child processes are done, and the results are yielded in the order in which the calls were submitted.

      the blocking is on the overall, in-order iteration, not on each call individually; once a slow early result arrives, the later results that have already been computed come back almost instantly

Experimenting with Executor.map

"""
Experiment with ``ThreadPoolExecutor.map``
"""
# tag::EXECUTOR_MAP[]
from time import sleep, strftime
from concurrent import futures

def display(*args):  # <1> just echoes back with timestamp
    print(strftime('[%H:%M:%S]'), end=' ')
    print(*args)

def loiter(n):  # <2>
    msg = '{}loiter({}): doing nothing for {}s...'
    display(msg.format('\t'*n, n, n))
    sleep(n)
    msg = '{}loiter({}): done.'
    display(msg.format('\t'*n, n))
    return n * 10  # <3>

def main():
    display('Script starting.')
    executor = futures.ThreadPoolExecutor(max_workers=3)  # <4> 3 threads
    results = executor.map(loiter, range(5))  # <5> the first 3 tasks will start immediately (nonblocking)
    display('results:', results)  # <6>
    display('Waiting for individual results:')
    for i, result in enumerate(results):  # <7>
        display(f'result {i}: {result}')

if __name__ == '__main__':
    main()
# end::EXECUTOR_MAP[]
  • the output is printed incrementally, as each loiter call starts and finishes.

  • enumerate call in the for loop will implicitly invoke next(results), which in turn will invoke _f.result() on the (internal) _f future representing the first call, loiter(0)

    the _f.result() will block until the future is done

  • consuming the results of Executor.map() blocks until all the jobs are done, since they are yielded strictly in submission order.

    Alternatively, to get each result as soon as it is ready, we can use Executor.submit and futures.as_completed

    TRICK: this is more flexible than executor.map because you can submit different callables and arguments, while executor.map is designed to run the same callable on different arguments.

  • TRICK: we can pass futures to futures.as_completed even if they come from different pool executors (including different types of pool executors); see the sketch below
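    A sketch of both tricks together (my own example, not from the book): different callables submitted to different executors, all drained with a single futures.as_completed:

      from concurrent import futures

      def square(n):          # module level, so ProcessPoolExecutor can pickle it
          return n * n

      def main():
          with futures.ThreadPoolExecutor() as tpe, \
               futures.ProcessPoolExecutor() as ppe:
              fs = [tpe.submit(square, 2),        # different arguments...
                    ppe.submit(square, 3),        # ...different executor types...
                    tpe.submit(pow, 2, 10)]       # ...and even different callables
              for fut in futures.as_completed(fs):    # yields whichever finishes first
                  print(fut.result())

      if __name__ == '__main__':
          main()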

Downloads with Progress Display and Error Handling

  • common functions: just a reference to the support code (flags2_common.py)
      """Utilities for second set of flag examples.
      """
    
      import argparse
      import string
      import sys
      import time
      from collections import Counter
      from enum import Enum
      from pathlib import Path
    
      DownloadStatus = Enum('DownloadStatus', 'OK NOT_FOUND ERROR')
    
      POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
                  'MX PH VN ET EG DE IR TR CD FR').split()
    
      DEFAULT_CONCUR_REQ = 1
      MAX_CONCUR_REQ = 1
    
      SERVERS = {
          'REMOTE': 'https://www.fluentpython.com/data/flags',
          'LOCAL':  'http://localhost:8000/flags',
          'DELAY':  'http://localhost:8001/flags',
          'ERROR':  'http://localhost:8002/flags',
      }
      DEFAULT_SERVER = 'LOCAL'
    
      DEST_DIR = Path('downloaded')
      COUNTRY_CODES_FILE = Path('country_codes.txt')
    
    
      def save_flag(img: bytes, filename: str) -> None:
          (DEST_DIR / filename).write_bytes(img)
    
    
      def initial_report(cc_list: list[str],
                         actual_req: int,
                         server_label: str) -> None:
          if len(cc_list) <= 10:
              cc_msg = ', '.join(cc_list)
          else:
              cc_msg = f'from {cc_list[0]} to {cc_list[-1]}'
          print(f'{server_label} site: {SERVERS[server_label]}')
          plural = 's' if len(cc_list) != 1 else ''
          print(f'Searching for {len(cc_list)} flag{plural}: {cc_msg}')
          if actual_req == 1:
              print('1 connection will be used.')
          else:
              print(f'{actual_req} concurrent connections will be used.')
    
    
      def final_report(cc_list: list[str],
                       counter: Counter[DownloadStatus],
                       start_time: float) -> None:
          elapsed = time.perf_counter() - start_time
          print('-' * 20)
          plural = 's' if counter[DownloadStatus.OK] != 1 else ''
          print(f'{counter[DownloadStatus.OK]:3} flag{plural} downloaded.')
          if counter[DownloadStatus.NOT_FOUND]:
              print(f'{counter[DownloadStatus.NOT_FOUND]:3} not found.')
          if counter[DownloadStatus.ERROR]:
              plural = 's' if counter[DownloadStatus.ERROR] != 1 else ''
              print(f'{counter[DownloadStatus.ERROR]:3} error{plural}.')
          print(f'Elapsed time: {elapsed:.2f}s')
    
    
      def expand_cc_args(every_cc: bool,
                         all_cc: bool,
                         cc_args: list[str],
                         limit: int) -> list[str]:
          codes: set[str] = set()
          A_Z = string.ascii_uppercase
          if every_cc:
              codes.update(a+b for a in A_Z for b in A_Z)
          elif all_cc:
              text = COUNTRY_CODES_FILE.read_text()
              codes.update(text.split())
          else:
              for cc in (c.upper() for c in cc_args):
                  if len(cc) == 1 and cc in A_Z:
                      codes.update(cc + c for c in A_Z)
                  elif len(cc) == 2 and all(c in A_Z for c in cc):
                      codes.add(cc)
                  else:
                      raise ValueError('*** Usage error: each CC argument '
                                       'must be A to Z or AA to ZZ.')
          return sorted(codes)[:limit]
    
    
      def process_args(default_concur_req):
          server_options = ', '.join(sorted(SERVERS))
          parser = argparse.ArgumentParser(
              description='Download flags for country codes. '
                          'Default: top 20 countries by population.')
          parser.add_argument(
              'cc', metavar='CC', nargs='*',
              help='country code or 1st letter (eg. B for BA...BZ)')
          parser.add_argument(
              '-a', '--all', action='store_true',
              help='get all available flags (AD to ZW)')
          parser.add_argument(
              '-e', '--every', action='store_true',
              help='get flags for every possible code (AA...ZZ)')
          parser.add_argument(
              '-l', '--limit', metavar='N', type=int, help='limit to N first codes',
              default=sys.maxsize)
          parser.add_argument(
              '-m', '--max_req', metavar='CONCURRENT', type=int,
              default=default_concur_req,
              help=f'maximum concurrent requests (default={default_concur_req})')
          parser.add_argument(
              '-s', '--server', metavar='LABEL', default=DEFAULT_SERVER,
              help=f'Server to hit; one of {server_options} '
                   f'(default={DEFAULT_SERVER})')
          parser.add_argument(
              '-v', '--verbose', action='store_true',
              help='output detailed progress info')
          args = parser.parse_args()
          if args.max_req < 1:
              print('*** Usage error: --max_req CONCURRENT must be >= 1')
              parser.print_usage()
              # "standard" exit status codes:
              # https://stackoverflow.com/questions/1101957/are-there-any-standard-exit-status-codes-in-linux/40484670#40484670
              sys.exit(2)  # command line usage error
          if args.limit < 1:
              print('*** Usage error: --limit N must be >= 1')
              parser.print_usage()
              sys.exit(2)  # command line usage error
          args.server = args.server.upper()
          if args.server not in SERVERS:
              print(f'*** Usage error: --server LABEL '
                    f'must be one of {server_options}')
              parser.print_usage()
              sys.exit(2)  # command line usage error
          try:
              cc_list = expand_cc_args(args.every, args.all, args.cc, args.limit)
          except ValueError as exc:
              print(exc.args[0])
              parser.print_usage()
              sys.exit(2)  # command line usage error
    
          if not cc_list:
              cc_list = sorted(POP20_CC)[:args.limit]
          return args, cc_list
    
    
      def main(download_many, default_concur_req, max_concur_req):
          args, cc_list = process_args(default_concur_req)
          actual_req = min(args.max_req, max_concur_req, len(cc_list))
          initial_report(cc_list, actual_req, args.server)
          base_url = SERVERS[args.server]
          DEST_DIR.mkdir(exist_ok=True)
          t0 = time.perf_counter()
          counter = download_many(cc_list, base_url, args.verbose, actual_req)
          final_report(cc_list, counter, t0)
    
  • Error Handling in the flags2 Examples

    • sequential version: uses a sequential HTTPX client
        #!/usr/bin/env python3
      
        """Download flags of countries (with error handling).
      
        Sequential version
      
        Sample run::
      
            $ python3 flags2_sequential.py -s DELAY b
            DELAY site: http://localhost:8001/flags
            Searching for 26 flags: from BA to BZ
            1 concurrent connection will be used.
            --------------------
            17 flags downloaded.
            9 not found.
            Elapsed time: 13.36s
      
        """
      
        # tag::FLAGS2_BASIC_HTTP_FUNCTIONS[]
        from collections import Counter
        from http import HTTPStatus
      
        import httpx
        import tqdm  # type: ignore  # <1>
      
        from flags2_common import main, save_flag, DownloadStatus  # <2> get the commons
      
        DEFAULT_CONCUR_REQ = 1
        MAX_CONCUR_REQ = 1
      
        def get_flag(base_url: str, cc: str) -> bytes:
            url = f'{base_url}/{cc}/{cc}.gif'.lower()
            resp = httpx.get(url, timeout=3.1, follow_redirects=True)
            resp.raise_for_status()  # <3> raises if HTTP status code not in range(200, 300)
            return resp.content
      
        def download_one(cc: str, base_url: str, verbose: bool = False) -> DownloadStatus:
            try:
                image = get_flag(base_url, cc)
            except httpx.HTTPStatusError as exc:  # <4> handles the 404 errors specifically
                res = exc.response
                if res.status_code == HTTPStatus.NOT_FOUND:
                    status = DownloadStatus.NOT_FOUND  # <5> replaces it with an internal download status
                    msg = f'not found: {res.url}'
                else:
                    raise  # <6> re-propagate any other errors other than 404
            else:
                save_flag(image, f'{cc}.gif')
                status = DownloadStatus.OK
                msg = 'OK'
      
            if verbose:  # <7> verbosity flag
                print(cc, msg)
      
            return status
        # end::FLAGS2_BASIC_HTTP_FUNCTIONS[]
      
        # tag::FLAGS2_DOWNLOAD_MANY_SEQUENTIAL[]
        def download_many(cc_list: list[str],
                          base_url: str,
                          verbose: bool,
                          _unused_concur_req: int) -> Counter[DownloadStatus]:
            counter: Counter[DownloadStatus] = Counter()  # <1> to tally the download outcomes
            cc_iter = sorted(cc_list)  # <2>
            if not verbose:
                cc_iter = tqdm.tqdm(cc_iter)  # <3> tqdm returns an iterator yielding the items in cc_iter and also animating the progress bar
            for cc in cc_iter:
                try:
                    status = download_one(cc, base_url, verbose)  # <4> successive calls to the singular function
                except httpx.HTTPStatusError as exc:  # <5> the non 404 errors handled here
                    error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
                    error_msg = error_msg.format(resp=exc.response)
                except httpx.RequestError as exc:  # <6>
                    error_msg = f'{exc} {type(exc)}'.strip()
                except KeyboardInterrupt:  # <7> managing keyboard interrupts
                    break
                else:  # <8> clear the error msg if there's no error that came down
                    error_msg = ''
      
                if error_msg:
                    status = DownloadStatus.ERROR  # <9> local status check based on the internal enum
                counter[status] += 1           # <10>
                if verbose and error_msg:      # <11>
                    print(f'{cc} error: {error_msg}')
      
            return counter  # <12>
        # end::FLAGS2_DOWNLOAD_MANY_SEQUENTIAL[]
      
        if __name__ == '__main__':
            main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
      
  • Using futures.as_completed

    • threadpool

      Uses concurrent HTTP client based on futures.ThreadPoolExecutor to show error handling

        #!/usr/bin/env python3
      
        """Download flags of countries (with error handling).
      
        ThreadPool version
      
        Sample run::
      
            $ python3 flags2_threadpool.py -s ERROR -e
            ERROR site: http://localhost:8002/flags
            Searching for 676 flags: from AA to ZZ
            30 concurrent connections will be used.
            --------------------
            150 flags downloaded.
            361 not found.
            165 errors.
            Elapsed time: 7.46s
      
        """
      
        # tag::FLAGS2_THREADPOOL[]
        from collections import Counter
        from concurrent.futures import ThreadPoolExecutor, as_completed
      
        import httpx
        import tqdm  # type: ignore
      
        from flags2_common import main, DownloadStatus
        from flags2_sequential import download_one  # <1>
      
        DEFAULT_CONCUR_REQ = 30  # <2> defaults for max num of concurrent requests, size of threadpool
        MAX_CONCUR_REQ = 1000  # <3> max num concurrent reqs
      
      
        def download_many(cc_list: list[str],
                          base_url: str,
                          verbose: bool,
                          concur_req: int) -> Counter[DownloadStatus]:
            counter: Counter[DownloadStatus] = Counter()
            with ThreadPoolExecutor(max_workers=concur_req) as executor:  # <4>
                to_do_map = {}  # <5> maps each Future instance (representing one download) with the cc for error reporting
                for cc in sorted(cc_list):  # <6> output order depends mostly on the timing of the HTTP responses, not on submission order
                    future = executor.submit(download_one, cc,
                                             base_url, verbose)  # <7> each submission does the scheduling and returns a Future
                    to_do_map[future] = cc  # <8>  Future instances are hashable
                done_iter = as_completed(to_do_map)  # <9> returns an iterator that yields futures as each task is done
                if not verbose:
                    done_iter = tqdm.tqdm(done_iter, total=len(cc_list))  # <10> wrap the iterator within the progress bar
                for future in done_iter:  # <11> iterates on futures as they are completed
                    try:
                        status = future.result()  # <12> this could have been blocking but NOT in this case because it's handled by the as_completed()
                    except httpx.HTTPStatusError as exc:  # <13> error handling
                        error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
                        error_msg = error_msg.format(resp=exc.response)
                    except httpx.RequestError as exc:
                        error_msg = f'{exc} {type(exc)}'.strip()
                    except KeyboardInterrupt:
                        break
                    else:
                        error_msg = ''
      
                    if error_msg:
                        status = DownloadStatus.ERROR
                    counter[status] += 1
                    if verbose and error_msg:
                        cc = to_do_map[future]  # <14>
                        print(f'{cc} error: {error_msg}')
      
            return counter
      
      
        if __name__ == '__main__':
           main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
        # end::FLAGS2_THREADPOOL[]
      
      • NOTE: Future instances are hashable; that's why we can use them as keys in a dictionary

      • IDIOM: use a map to store futures to do follow up processing

        building a dict to map each future to other data that may be useful when the future is completed.

    • asyncio

      Concurrent HTTPX client

        #!/usr/bin/env python3
      
        """Download flags of countries (with error handling).
      
        asyncio async/await version
      
        """
        # tag::FLAGS2_ASYNCIO_TOP[]
        import asyncio
        from collections import Counter
        from http import HTTPStatus
        from pathlib import Path
      
        import httpx
        import tqdm  # type: ignore
      
        from flags2_common import main, DownloadStatus, save_flag
      
        # low concurrency default to avoid errors from remote site,
        # such as 503 - Service Temporarily Unavailable
        DEFAULT_CONCUR_REQ = 5
        MAX_CONCUR_REQ = 1000
      
        async def get_flag(client: httpx.AsyncClient,  # <1>
                           base_url: str,
                           cc: str) -> bytes:
            url = f'{base_url}/{cc}/{cc}.gif'.lower()
            resp = await client.get(url, timeout=3.1, follow_redirects=True)   # <2>
            resp.raise_for_status()
            return resp.content
      
        async def download_one(client: httpx.AsyncClient,
                               cc: str,
                               base_url: str,
                               semaphore: asyncio.Semaphore,
                               verbose: bool) -> DownloadStatus:
            try:
                async with semaphore:  # <3>
                    image = await get_flag(client, base_url, cc)
            except httpx.HTTPStatusError as exc:  # <4>
                res = exc.response
                if res.status_code == HTTPStatus.NOT_FOUND:
                    status = DownloadStatus.NOT_FOUND
                    msg = f'not found: {res.url}'
                else:
                    raise
            else:
                await asyncio.to_thread(save_flag, image, f'{cc}.gif')  # <5>
                status = DownloadStatus.OK
                msg = 'OK'
            if verbose and msg:
                print(cc, msg)
            return status
        # end::FLAGS2_ASYNCIO_TOP[]
      
        # tag::FLAGS2_ASYNCIO_START[]
        async def supervisor(cc_list: list[str],
                             base_url: str,
                             verbose: bool,
                             concur_req: int) -> Counter[DownloadStatus]:  # <1>
            counter: Counter[DownloadStatus] = Counter()
            semaphore = asyncio.Semaphore(concur_req)  # <2>
            async with httpx.AsyncClient() as client:
                to_do = [download_one(client, cc, base_url, semaphore, verbose)
                         for cc in sorted(cc_list)]  # <3>
                to_do_iter = asyncio.as_completed(to_do)  # <4>
                if not verbose:
                    to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list))  # <5>
                error: httpx.HTTPError | None = None  # <6>
                for coro in to_do_iter:  # <7>
                    try:
                        status = await coro  # <8>
                    except httpx.HTTPStatusError as exc:
                        error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
                        error_msg = error_msg.format(resp=exc.response)
                        error = exc  # <9>
                    except httpx.RequestError as exc:
                        error_msg = f'{exc} {type(exc)}'.strip()
                        error = exc  # <10>
                    except KeyboardInterrupt:
                        break
                    else:
                        error = None
      
                    if error:
                        status = DownloadStatus.ERROR  # <11>
                        if verbose:
                            url = str(error.request.url)  # <12>
                            cc = Path(url).stem.upper()   # <13>
                            print(f'{cc} error: {error_msg}')
                    counter[status] += 1
      
            return counter
      
        def download_many(cc_list: list[str],
                          base_url: str,
                          verbose: bool,
                          concur_req: int) -> Counter[DownloadStatus]:
            coro = supervisor(cc_list, base_url, verbose, concur_req)
            counts = asyncio.run(coro)  # <14>
      
            return counts
      
        if __name__ == '__main__':
            main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
        # end::FLAGS2_ASYNCIO_START[]
      

Chapter Summary

Further Reading

Chapter 21. Asynchronous Programming

  • async constructs

  • Objects supporting async constructs

    • includes other constructs enabled by the async/await keywords: async generator functions, async comprehensions, async genexps

      these aren’t tied to asyncio!

  • async libraries like asyncio

What’s New in This Chapter

A Few Definitions

  • native coroutines

    only defined using async def

    delegation from coroutine to coroutine is done only with await; it's not required that a coroutine delegates at all

  • classic coroutines

    actually a generator function that consumes data (data that is sent to it via my_coro.send(data) calls)

    can delegate to other classic coroutines using yield from. Ref “Meaning of yield from”

    no longer supported by asyncio and doesn’t support await keyword

  • generator-based coroutines (decorated using @types.coroutine)

    a decorated generator function (@types.coroutine), which makes the generator compatible with await keyword

    this is NOT supported by asyncio, but used in low-level code in other frameworks like Curio and Trio

  • async generator (function)

    generator function defined with async def that uses yield in its body

    returns an async generator object that provides __anext__, which is a coroutine method to retrieve the next item.
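    A tiny sketch of one (my own, not from the book), driven with async for:

      import asyncio

      async def ticker(n):            # async def + yield => async generator function
          for i in range(n):
              await asyncio.sleep(0)  # may await between yields
              yield i

      async def main():
          async for i in ticker(3):   # async for drives __anext__ under the hood
              print(i)

      asyncio.run(main())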

An asyncio Example: Probing Domains

  • async operations are interleaved \(\implies\) the total time is practically the same as the time for the single slowest DNS response, instead of the sum of the times of all responses.
#!/usr/bin/env python3
import asyncio
import socket
from keyword import kwlist

MAX_KEYWORD_LEN = 4  # <1>


async def probe(domain: str) -> tuple[str, bool]:  # <2> returns tuple of domain name and bool
    loop = asyncio.get_running_loop()  # <3> have a ref to the =asyncio= event loop, so that we can use it
    try:
        await loop.getaddrinfo(domain, None)  # <4> await the coroutine; it returns 5-part tuples of connection parameters that we ignore here: if the call succeeds, the domain resolves
    except socket.gaierror: # get addr info err
        return (domain, False)
    return (domain, True)


async def main() -> None:  # <5> has to be a coroutine so that we can use =await= within it
    names = (kw for kw in kwlist if len(kw) <= MAX_KEYWORD_LEN)  # <6> gen
    domains = (f'{name}.dev'.lower() for name in names)  # <7> gen
    coros = [probe(domain) for domain in domains]  # <8> build list of coros
    for coro in asyncio.as_completed(coros):  # <9> generator that yields coroutines that return the results of the coros passed to it in the order they are completed (not order of submission), similar to =futures.as_completed=
        domain, found = await coro  # <10> this await is non-blocking because it's guarded by the as_completed above
        mark = '+' if found else ' '
        print(f'{mark} {domain}')


if __name__ == '__main__':
    asyncio.run(main())  # <11> starts the event loop until the loop exits
  • loop.getaddrinfo() is the async version of socket.getaddrinfo()

    this returns 5-part tuples of parameters to connect to the given address using a socket

  • asyncio.get_running_loop is designed to be used from within coroutines.

    If there is no running event loop, it raises a RuntimeError. The event loop must already have been started before execution reaches this point.

  • for coro in asyncio.as_completed(coros):

    the asyncio.as_completed(coros) generator that yields coroutines that return the results of the coros passed to it in the order they are completed (not order of submission), similar to futures.as_completed

  • the await coro is non-blocking because it’s guarded by the as_completed above

    if coro raises an exception, then it gets re-raised here

  • event loop:

    • started using asyncio.run()

    • IDIOM: for scripts, the common pattern is to make the main function a coroutine as well. The main coroutine is driven with asyncio.run()

  • Guido’s Trick to Read Asynchronous Code

    • squint and pretend the async and await keywords are not there. If you do that, you’ll realize that coroutines read like plain old sequential functions.

New Concept: Awaitable

  • await expression:

    • uses the yield from implementation with an extra step of validating its argument

    • only accepts an awaitable

  • for \(\rightarrow\) iterables, await \(\rightarrow\) awaitables

  • from asyncio, we typically work with these awaitables:

    • a native coroutine object that we get by calling a native coroutine function e.g. coro() where coro is the coroutine function

    • asyncio.Task that we get when we pass a coroutine object to asyncio.create_task()

      • remember that coro_obj = coro(), so the overall call is usually asyncio.create_task(one_coro()); note the invocation of the native coroutine function

      • Whether to keep a handle to the task or not depends on whether we need to use it (e.g. to cancel the task or wait for it)

    • lower-level awaitables: (something we might encounter if we work with lower level abstractions)

      • an obj with __await__ method that returns an iterator (e.g. asyncio.Future, by the way, asyncio.Task <: asyncio.Future)

      • objs written in other langs that use the Python/C API with a tp_as_async.am_await function, returning an iterator (similar to the __await__ method)

      • soon to be deprecated: generator-based-coroutine objects
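  A small sketch (mine, not from the book) touching the three awaitables we meet most often: a coroutine object, a Task, and a Future:

    import asyncio

    async def one_coro():
        await asyncio.sleep(0.1)
        return 42

    async def main():
        print(await one_coro())                    # awaiting a coroutine object directly
        task = asyncio.create_task(one_coro())     # wrap in a Task: scheduled immediately
        print(await task)                          # awaiting the Task
        fut = asyncio.get_running_loop().create_future()  # low-level Future
        fut.set_result('done')                     # normally the framework sets the result
        print(await fut)

    asyncio.run(main())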

Downloading with asyncio and HTTPX

#!/usr/bin/env python3

"""Download flags of top 20 countries by population

asyncio + httpx version

Sample run::

    $ python3 flags_asyncio.py
    EG VN IN TR RU ID US DE CN MX JP BD NG ET FR BR PH PK CD IR
    20 flags downloaded in 1.07s
"""
# tag::FLAGS_ASYNCIO_TOP[]
import asyncio

from httpx import AsyncClient  # <1> have to install httpx

from flags import BASE_URL, save_flag, main  # <2>

async def download_one(client: AsyncClient, cc: str):  # <3> has to be a native coro so that we can await on get_flag
    image = await get_flag(client, cc)
    save_flag(image, f'{cc}.gif')
    print(cc, end=' ', flush=True)
    return cc

async def get_flag(client: AsyncClient, cc: str) -> bytes:  # <4> needs the client to make the http request
    url = f'{BASE_URL}/{cc}/{cc}.gif'.lower()
    resp = await client.get(url, timeout=6.1,
                                  follow_redirects=True)  # <5> awaiting client.get() yields a Response; the network I/O is driven asynchronously by the =asyncio= event loop
    return resp.read()  # <6> the body is just lazily fetched from the response object. This fully consumes the response body into memory.
# end::FLAGS_ASYNCIO_TOP[]

# tag::FLAGS_ASYNCIO_START[]
def download_many(cc_list: list[str]) -> int:    # <1> has to be a plain function, not a coroutine (so it can be called by the main function from flags.py)
    return asyncio.run(supervisor(cc_list))      # <2> starts the event loop and drives the supervisor(cc_list) coroutine object until it returns; remember supervisor is a coroutine function, so calling it returns a coroutine object. This call blocks until the event loop exits.

async def supervisor(cc_list: list[str]) -> int:
    async with AsyncClient() as client:          # <3> an async context manager (AsyncClient) is being used here
        to_do = [download_one(client, cc)
                 for cc in sorted(cc_list)]      # <4> build list of coros here
        res = await asyncio.gather(*to_do)       # <5> pass the awaitables so that they can be gathered after completion, so that we get a list of results. Gathers in the order of submission of the coros.

    return len(res)                              # <6> supervisor returns length of list

if __name__ == '__main__':
    main(download_many)
# end::FLAGS_ASYNCIO_START[]
  • asyncio directly supports TCP and UDP, without relying on external packages

  • res = await asyncio.gather(*to_do):

    Here, we pass the awaitables so that they can be gathered after completion, so that we get a list of results. Gathers in the order of submission of the coros.

  • AsyncClient is the async context manager that is used here. It’s a context manager that has async setup and teardown functions KIV

    In this snippet of the get_flag coroutine:

      async def get_flag(client: AsyncClient, cc: str) -> bytes:  # <4> needs the client to make the http request
          url = f'{BASE_URL}/{cc}/{cc}.gif'.lower()
          resp = await client.get(url, timeout=6.1,
                                        follow_redirects=True)  # <5> awaiting client.get() yields a Response; the network I/O is driven asynchronously by the =asyncio= event loop
          return resp.read()  # <6> the body is just lazily fetched from the response object. This fully consumes the response body into memory.
    
    • Implicit delegation of coroutines via async context managers:

      the get method of an httpx.AsyncClient instance is an awaitable that returns a Response object

      by the way, a Response can also be used as a context manager when streaming! If it were, then resp.read() would be an I/O operation that may yield to the event loop again while draining the response body stream from the socket

  • the await yields control flow to the event loop while the network I/O happens (DNS resolution, TCP connect, handshake, waiting for response headers). During that suspension, other tasks can run.

    so by the end of point 5, resp is a proper Response object and not a coroutine. The connection is ready.

  • LANG_LIMITATION: However, asyncio does not provide an asynchronous filesystem API at this time like Node.js does.

    there's OS-level support for it (io_uring on Linux), but nothing in Python's stdlib/asyncio exposes it yet
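    The stopgap used later in the flags2 asyncio example is asyncio.to_thread; a minimal sketch of it here (file name is made up):

      import asyncio
      from pathlib import Path

      async def save(data: bytes) -> None:
          # blocking filesystem write, pushed to a worker thread so the event loop keeps running
          await asyncio.to_thread(Path('out.bin').write_bytes, data)

      asyncio.run(save(b'hello'))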

  • The Secret of Native Coroutines: Humble Generators

    • classic vs native coroutines: the native ones don’t rely on a visible .send() call or yield expressions

    • mechanistic model for async programs and how they drive async libraries:

      inline image

      Here, we see how in an async program:

      • a user’s function starts the event loop, scheduling an initial coroutine with asyncio.run

      • Each user’s coroutine drives the next with an await expression, which is when the control flow is yielded to the next coroutine

        this forms a channel that enables communication between a library like HTTPX and the event loop.

        await chain eventually reaches a low-level awaitable, which returns a generator that the event loop can drive in response to events such as timers or network I/O. The low-level awaitables and generators at the end of these await chains are implemented deep into the libraries, are not part of their APIs, and may be Python/C extensions.

    • await borrows most of its implementation from yield from (classic coroutines), which also makes .send calls to drive coroutines.

    • with functions like asyncio.gather and asyncio.create_task, you can start multiple concurrent await channels, enabling concurrent execution of multiple I/O operations driven by a single event loop, in a single thread.

  • The All-or-Nothing Problem

    • had to replace I/O functions with their async versions so that they could be activated with await or asyncio.create_task

    • if no choice, have to delegate to separate thread/proc

      If you can’t rewrite a blocking function as a coroutine, you should run it in a separate thread or process

Asynchronous Context Managers via async with

  • asynchronous context managers: objects implementing the __aenter__ and __aexit__ methods as coroutines.
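  A bare-bones sketch (mine, not from the book) of one such object:

    import asyncio

    class AsyncResource:
        async def __aenter__(self):          # coroutine: may await during setup
            await asyncio.sleep(0)           # stand-in for e.g. opening a connection
            return self

        async def __aexit__(self, exc_type, exc, tb):   # coroutine: may await during teardown
            await asyncio.sleep(0)

    async def main():
        async with AsyncResource() as res:
            print('using', res)

    asyncio.run(main())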

Enhancing the asyncio Downloader

  • caution (asyncio vs threading): asyncio can fire off requests faster, so it's more likely to be suspected of a DoS attack by the HTTP server.
  • Using asyncio.as_completed and a Thread

    #!/usr/bin/env python3
    
    """Download flags of countries (with error handling).
    
    asyncio async/await version
    
    """
    # tag::FLAGS2_ASYNCIO_TOP[]
    import asyncio
    from collections import Counter
    from http import HTTPStatus
    from pathlib import Path
    
    import httpx
    import tqdm  # type: ignore
    
    from flags2_common import main, DownloadStatus, save_flag
    
    # low concurrency default to avoid errors from remote site,
    # such as 503 - Service Temporarily Unavailable
    DEFAULT_CONCUR_REQ = 5
    MAX_CONCUR_REQ = 1000
    
    async def get_flag(client: httpx.AsyncClient,  # <1> similar to the sequential version, just that here it requires a client param
                       base_url: str,
                       cc: str) -> bytes:
        url = f'{base_url}/{cc}/{cc}.gif'.lower()
        resp = await client.get(url, timeout=3.1, follow_redirects=True)   # <2> we await the coroutine from client.get()
        resp.raise_for_status()
        return resp.content
    
    async def download_one(client: httpx.AsyncClient,
                           cc: str,
                           base_url: str,
                           semaphore: asyncio.Semaphore,
                           verbose: bool) -> DownloadStatus:
        try:
            async with semaphore:  # <3> semaphore as an asynchronous context manager so that the program as a whole is not blocked; only this coroutine is suspended when the semaphore counter is zero.
                image = await get_flag(client, base_url, cc)
        except httpx.HTTPStatusError as exc:  # <4> familiar error handling logic
            res = exc.response
            if res.status_code == HTTPStatus.NOT_FOUND:
                status = DownloadStatus.NOT_FOUND
                msg = f'not found: {res.url}'
            else:
                raise
        else:
            await asyncio.to_thread(save_flag, image, f'{cc}.gif')  # <5> FileSystem I/O, don't let it block us by running it in a thread
            status = DownloadStatus.OK
            msg = 'OK'
        if verbose and msg:
            print(cc, msg)
        return status
    # end::FLAGS2_ASYNCIO_TOP[]
    
    # tag::FLAGS2_ASYNCIO_START[]
    async def supervisor(cc_list: list[str],
                         base_url: str,
                         verbose: bool,
                         concur_req: int) -> Counter[DownloadStatus]:  # <1> since it's a coroutine, it can't be invoked directly from main.
        counter: Counter[DownloadStatus] = Counter()
        semaphore = asyncio.Semaphore(concur_req)  # <2> creates the semaphore to be shared across the coros we will have
        async with httpx.AsyncClient() as client:
            to_do = [download_one(client, cc, base_url, semaphore, verbose)
                     for cc in sorted(cc_list)]  # <3> list of coro objs, one per call to download_one coro fn
            to_do_iter = asyncio.as_completed(to_do)  # <4> get an iter, receives in the order of completion, allows the iter to be wrapped by tqdm
            if not verbose:
                to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list))  # <5> wrap iter w tqdm
            error: httpx.HTTPError | None = None  # <6> init error
            for coro in to_do_iter:  # <7> iter over completed coro objs
                try:
                    status = await coro  # <8> this is a nonblocking await because implicitly guarded by the =as_completed=
                except httpx.HTTPStatusError as exc:
                    error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
                    error_msg = error_msg.format(resp=exc.response)
                    error = exc  # <9> to preserve the exc
                except httpx.RequestError as exc:
                    error_msg = f'{exc} {type(exc)}'.strip()
                    error = exc  # <10> preserve the exc
                except KeyboardInterrupt:
                    break
                else:
                    error = None
    
                if error:
                    status = DownloadStatus.ERROR  # <11> use our internal error enum
                    if verbose:
                        url = str(error.request.url)  # <12>
                        cc = Path(url).stem.upper()   # <13>
                        print(f'{cc} error: {error_msg}')
                counter[status] += 1
    
        return counter
    
    def download_many(cc_list: list[str],
                      base_url: str,
                      verbose: bool,
                      concur_req: int) -> Counter[DownloadStatus]:
        coro = supervisor(cc_list, base_url, verbose, concur_req)
        counts = asyncio.run(coro)  # <14> drives the event loop, passes coro to event loop and returns when the event loop ends.
    
        return counts
    
    if __name__ == '__main__':
        main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
    # end::FLAGS2_ASYNCIO_START[]
    
    • the asyncio.semaphore is being used as an asynchronous context manager so that the program as a whole is not blocked; only this coroutine is suspended when the semaphore counter is zero.
    • notice how we delegate the file I/O in point 5 to a thread pool provided by asyncio using asyncio.to_thread: we just await it, yielding control so other coroutines can run while the file I/O happens in a worker thread
  • Throttling Requests with a Semaphore

    • throwback to OS mods in school: a semaphore is a numbered “mutex” \(\implies\) more flexibility than just a binary mutex lock.

    • we can share the semaphore between multiple coroutines with a configured max number in order to throttle our Network I/O

    • why? because we should avoid spamming a server with too many concurrent requests \(\implies\) we need to throttle the Network I/O

    • previously, we did the throttling in a coarse manner by setting the max_workers for the download_many in the demo code

    • Python’s Semaphores

      • all the 3 different concurrency structures (threading, multiprocessing, asyncio) have their own semaphore classes

      • the initial value is set at the point of creating the semaphore; the semaphore is then passed to every coroutine that needs it to synchronize: semaphore = asyncio.Semaphore(concur_req)

      • the counter decrements when we await the .acquire() coroutine, and increments when we call the .release() method (nonblocking, not a coroutine)

      • if not ready (count = 0), =.acquire()= suspends the awaiting coroutine until some other coroutine calls .release() on the same Semaphore, thus incrementing the counter.

      • asyncio.Semaphore used as an async context manager:

        • instead of using semaphore.acquire() and semaphore.release() directly, we can rely on the async context manager to acquire (the Semaphore.__aenter__ coroutine method awaits .acquire()) and release the semaphore (Semaphore.__aexit__ calls .release()); see the equivalence sketch below

        • this guarantees that no more than concur_req instances of the get_flag coroutine will be active at any time
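          A minimal sketch of the equivalence (hypothetical worker coroutines; asyncio.sleep stands in for the throttled work):

            import asyncio

            async def worker_ctx(sem: asyncio.Semaphore) -> None:
                async with sem:               # __aenter__ awaits sem.acquire()
                    await asyncio.sleep(0.1)
                # __aexit__ calls sem.release()

            async def worker_explicit(sem: asyncio.Semaphore) -> None:
                await sem.acquire()           # may suspend this coroutine while the counter is 0
                try:
                    await asyncio.sleep(0.1)
                finally:
                    sem.release()             # plain method call, not a coroutine

            async def main() -> None:
                sem = asyncio.Semaphore(2)    # at most 2 workers inside the guarded block at once
                await asyncio.gather(worker_ctx(sem), worker_explicit(sem), worker_ctx(sem))

            asyncio.run(main())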

  • Making Multiple Requests for Each Download

    • our objective now is to make 2 requests per country (the flag image and the metadata JSON). In a sequential pattern, we would just call one after the other. The async version isn’t directly the same.

      We can drive the asynchronous requests one after the other, sharing the local scope of the driving coroutine.

    • here’s the v3 using asyncio

      some changes:

      1. new coroutine get_country is a new coroutine for the .json fetch

      2. in download_one we now use await to delegate to get_flag and the new get_country

    #!/usr/bin/env python3
    
    """Download flags of countries (with error handling).
    
    asyncio async/await version
    
    """
    # tag::FLAGS2_ASYNCIO_TOP[]
    import asyncio
    from collections import Counter
    from http import HTTPStatus
    from pathlib import Path
    
    import httpx
    import tqdm  # type: ignore
    
    from flags2_common import main, DownloadStatus, save_flag
    
    # low concurrency default to avoid errors from remote site,
    # such as 503 - Service Temporarily Unavailable
    DEFAULT_CONCUR_REQ = 5
    MAX_CONCUR_REQ = 1000
    
    async def get_flag(client: httpx.AsyncClient,  # <1>
                       base_url: str,
                       cc: str) -> bytes:
        url = f'{base_url}/{cc}/{cc}.gif'.lower()
        resp = await client.get(url, timeout=3.1, follow_redirects=True)   # <2>
        resp.raise_for_status()
        return resp.content
    
    # tag::FLAGS3_ASYNCIO_GET_COUNTRY[]
    async def get_country(client: httpx.AsyncClient,
                          base_url: str,
                          cc: str) -> str:    # <1> returns string with country name
        url = f'{base_url}/{cc}/metadata.json'.lower()
        resp = await client.get(url, timeout=3.1, follow_redirects=True)
        resp.raise_for_status()
        metadata = resp.json()  # <2> is a python dict
        return metadata['country']  # <3> returns the country name
    # end::FLAGS3_ASYNCIO_GET_COUNTRY[]
    
    # tag::FLAGS3_ASYNCIO_DOWNLOAD_ONE[]
    async def download_one(client: httpx.AsyncClient,
                           cc: str,
                           base_url: str,
                           semaphore: asyncio.Semaphore,
                           verbose: bool) -> DownloadStatus:
        try:
            async with semaphore:  # <1> hold the semaphore to =await= (delegate) to =get_flag=
                image = await get_flag(client, base_url, cc)
            async with semaphore:  # <2> hold the semaphore again to delegate to the next step
                country = await get_country(client, base_url, cc)
        except httpx.HTTPStatusError as exc:
            res = exc.response
            if res.status_code == HTTPStatus.NOT_FOUND:
                status = DownloadStatus.NOT_FOUND
                msg = f'not found: {res.url}'
            else:
                raise
        else:
            filename = country.replace(' ', '_')  # <3>
            await asyncio.to_thread(save_flag, image, f'{filename}.gif')
            status = DownloadStatus.OK
            msg = 'OK'
        if verbose and msg:
            print(cc, msg)
        return status
    # end::FLAGS3_ASYNCIO_DOWNLOAD_ONE[]
    
    # tag::FLAGS2_ASYNCIO_START[]
    async def supervisor(cc_list: list[str],
                         base_url: str,
                         verbose: bool,
                         concur_req: int) -> Counter[DownloadStatus]:  # <1>
        counter: Counter[DownloadStatus] = Counter()
        semaphore = asyncio.Semaphore(concur_req)  # <2>
        async with httpx.AsyncClient() as client:
            to_do = [download_one(client, cc, base_url, semaphore, verbose)
                     for cc in sorted(cc_list)]  # <3>
            to_do_iter = asyncio.as_completed(to_do)  # <4>
            if not verbose:
                to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list))  # <5>
            error: httpx.HTTPError | None = None  # <6>
            for coro in to_do_iter:  # <7>
                try:
                    status = await coro  # <8>
                except httpx.HTTPStatusError as exc:
                    error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}'
                    error_msg = error_msg.format(resp=exc.response)
                    error = exc  # <9>
                except httpx.RequestError as exc:
                    error_msg = f'{exc} {type(exc)}'.strip()
                    error = exc  # <10>
                except KeyboardInterrupt:
                    break
    
                if error:
                    status = DownloadStatus.ERROR  # <11>
                    if verbose:
                        url = str(error.request.url)  # <12>
                        cc = Path(url).stem.upper()   # <13>
                        print(f'{cc} error: {error_msg}')
                counter[status] += 1
    
        return counter
    
    def download_many(cc_list: list[str],
                      base_url: str,
                      verbose: bool,
                      concur_req: int) -> Counter[DownloadStatus]:
        coro = supervisor(cc_list, base_url, verbose, concur_req)
        counts = asyncio.run(coro)  # <14>
    
        return counts
    
    if __name__ == '__main__':
        main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
    # end::FLAGS2_ASYNCIO_START[]
    
    • NOTE: point 1 & 2 in download_one: it’s good practice to hold semaphores and locks for the shortest possible time.
    • One challenge is to know when you have to use await and when you can’t use it.

      The answer in principle is easy: you await coroutines and other awaitables, such as asyncio.Task instances.

      The reality is that APIs can be confusingly named, e.g. StreamWriter

Delegating Tasks to Executors

  • problem: unlike NodeJS where ALL I/O has async APIs, python doesn’t have async APIs for all I/O. Notably, File I/O is NOT async.

    This means that in our async code, file I/O can severely bottleneck performance if the main thread is blocked.

  • delegating to an executor is a good idea then

    • we can use asyncio.to_thread e.g. await asyncio.to_thread(save_flag, image, f'{cc}.gif')

      under the hood, it uses loop.run_in_executor, so the equivalent to the above statement would be:

          loop = asyncio.get_running_loop()  # gets a reference to the running event loop
          await loop.run_in_executor(None, save_flag, image, f'{cc}.gif')  # awaited, to match asyncio.to_thread
          # 1st Arg: Executor to use. None => default => ThreadPoolExecutor (always available in asyncio event loop)
      

      when using run_in_executor, the 1st Arg is the Executor to use. None \(\implies\) default \(\implies\) ThreadPoolExecutor (always available in asyncio event loop)

      CAUTION: run_in_executor accepts only positional args; we have to use functools.partial if we wish to pass kwargs. Or just use the newer asyncio.to_thread, which accepts kwargs (see the sketch below).
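      A minimal sketch of both options (hypothetical save() function with a keyword-only path argument):

        import asyncio
        import functools

        def save(data: bytes, *, path: str) -> None:  # hypothetical blocking function with a kwarg
            with open(path, 'wb') as fp:
                fp.write(data)

        async def main() -> None:
            loop = asyncio.get_running_loop()
            # run_in_executor only forwards positional args, so bind the kwarg with functools.partial
            await loop.run_in_executor(None, functools.partial(save, b'abc', path='out.bin'))
            # asyncio.to_thread forwards kwargs directly
            await asyncio.to_thread(save, b'abc', path='out.bin')

        asyncio.run(main())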

    • IDIOM: this is a common pattern in async APIs:

      wrap blocking calls that are implementation details in coroutines using run_in_executor internally. That way, you provide a consistent interface of coroutines to be driven with await, and hide the threads you need to use for pragmatic reasons.

    • loop.run_in_executor’s explicit Executor argument allows us to use a process-based approach for CPU-intensive tasks, so the work runs in a different Python process and we avoid GIL contention.

    • TRICK / IDIOM: prime the ProcessPoolExecutor in the supervisor and then pass it to the coroutines that need it to reduce the effect of the high startup costs
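      A minimal sketch of that pattern (hypothetical cpu_bound() function), assuming only the explicit-executor form of loop.run_in_executor:

        import asyncio
        from concurrent.futures import ProcessPoolExecutor

        def cpu_bound(n: int) -> int:  # hypothetical CPU-intensive work
            return sum(i * i for i in range(n))

        async def worker(executor: ProcessPoolExecutor, n: int) -> int:
            loop = asyncio.get_running_loop()
            # runs in another Python process, so no GIL contention with the event loop
            return await loop.run_in_executor(executor, cpu_bound, n)

        async def supervisor() -> None:
            with ProcessPoolExecutor() as executor:  # "primed" once here, shared by the coroutines
                results = await asyncio.gather(*(worker(executor, n) for n in (10_000, 20_000)))
                print(results)

        if __name__ == '__main__':
            asyncio.run(supervisor())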

  • WARNING / LANG_LIMITATION: Coroutines that use executors give the pretense of cancellation because the underlying thread/proc has no cancellation mechanism.

    Using run_in_executor can produce hard-to-debug problems since cancellation doesn’t work the way one might expect. Coroutines that use executors give merely the pretense of cancellation: the underlying thread (if it’s a ThreadPoolExecutor) has no cancellation mechanism.

    For example, a long-lived thread that is created inside a run_in_executor call may prevent your asyncio program from shutting down cleanly:

    asyncio.run will wait for the executor to fully shut down before returning, and it will wait forever if the executor jobs don’t stop somehow on their own.

    My greybeard inclination is to want that function to be named run_in_executor_uncancellable.

Writing asyncio Servers

  • A FastAPI Web Service

    from pathlib import Path
    from unicodedata import name
    
    from fastapi import FastAPI
    from fastapi.responses import HTMLResponse
    from pydantic import BaseModel
    
    from charindex import InvertedIndex
    
    STATIC_PATH = Path(__file__).parent.absolute() / 'static'  # <1> beautiful overloading of =/= for pathlib
    
    app = FastAPI(  # <2> defines the ASGI app, params are for autogen docs
        title='Mojifinder Web',
        description='Search for Unicode characters by name.',
    )
    
    class CharName(BaseModel):  # <3> pydantic schema for runtime type checking
        char: str
        name: str
    
    def init(app):  # <4> attach to app state for later use
        app.state.index = InvertedIndex()
        app.state.form = (STATIC_PATH / 'form.html').read_text()
    
    init(app)  # <5>
    
    @app.get('/search', response_model=list[CharName])  # <6> search endpoint, response_model uses the CharName pydantic model to describe the response format
    async def search(q: str):  # <7> non-path params within the coro signature
        chars = sorted(app.state.index.search(q))
        return ({'char': c, 'name': name(c)} for c in chars)  # <8> an iterable of dicts compatible with response_model schema => FastAPI can build the JSON response according to the response model that we supplied in the @app.get decorator
    
    @app.get('/', response_class=HTMLResponse, include_in_schema=False)
    def form():  # <9> can use regular functions to handle endpoints as well, not just coros
        return app.state.form
    
    # no main function  # <10>
    
    • endpoint handlers can be coros or plain functions like we see here.

    • there’s no main function, it’s loaded and driven by the ASGI server (uvicorn).

    • we don’t have return type hints here because we allow the pydantic schema to do the job

      this is like schema casting when defining changesets in elixir

      the response model is declared in this parameter instead of as a function return type annotation, because the path function may not actually return that response model but rather a dict, a database object, or some other model; response_model then performs the field limiting and serialization.

      response_model in FastAPI + Pydantic plays the role of both serialization and field-whitelisting — taking arbitrary Python objects/dicts and producing clean, predictable outputs according to the model definition

    by the way, the inverted index was implemented like so:

    #!/usr/bin/env python
    
    """
    Class ``InvertedIndex`` builds an inverted index mapping each word to
    the set of Unicode characters which contain that word in their names.
    
    Optional arguments to the constructor are ``first`` and ``last+1``
    character codes to index, to make testing easier. In the examples
    below, only the ASCII range was indexed.
    
    The `entries` attribute is a `defaultdict` with uppercased single
    words as keys::
    
        >>> idx = InvertedIndex(32, 128)
        >>> idx.entries['DOLLAR']
        {'$'}
        >>> sorted(idx.entries['SIGN'])
        ['#', '$', '%', '+', '<', '=', '>']
        >>> idx.entries['A'] & idx.entries['SMALL']
        {'a'}
        >>> idx.entries['BRILLIG']
        set()
    
    The `.search()` method takes a string, uppercases it, splits it into
    words, and returns the intersection of the entries for each word::
    
        >>> idx.search('capital a')
        {'A'}
    
    """
    
    import sys
    import unicodedata
    from collections import defaultdict
    from collections.abc import Iterator
    
    STOP_CODE: int = sys.maxunicode + 1
    
    Char = str
    Index = defaultdict[str, set[Char]]
    
    
    def tokenize(text: str) -> Iterator[str]:
        """return iterator of uppercased words"""
        for word in text.upper().replace('-', ' ').split():
            yield word
    
    
    class InvertedIndex:
        entries: Index
    
        def __init__(self, start: int = 32, stop: int = STOP_CODE):
            entries: Index = defaultdict(set)
            for char in (chr(i) for i in range(start, stop)):
                name = unicodedata.name(char, '')
                if name:
                    for word in tokenize(name):
                        entries[word].add(char)
            self.entries = entries
    
        def search(self, query: str) -> set[Char]:
            if words := list(tokenize(query)):
                found = self.entries[words[0]]
                return found.intersection(*(self.entries[w] for w in words[1:]))
            else:
                return set()
    
    
    def format_results(chars: set[Char]) -> Iterator[str]:
        for char in sorted(chars):
            name = unicodedata.name(char)
            code = ord(char)
            yield f'U+{code:04X}\t{char}\t{name}'
    
    
    def main(words: list[str]) -> None:
        if not words:
            print('Please give one or more words to search.')
            sys.exit(2)  # command line usage error
        index = InvertedIndex()
        chars = index.search(' '.join(words))
        for line in format_results(chars):
            print(line)
        print('─' * 66, f'{len(chars)} found')
    
    
    if __name__ == '__main__':
        main(sys.argv[1:])
    
  • An asyncio TCP Server (no deps, just asyncio streams)

    • this demo uses plain TCP to communicate with a telnet/netcat client, using asyncio directly without any external dependencies!
    #!/usr/bin/env python3
    
    # tag::TCP_MOJIFINDER_TOP[]
    import asyncio
    import functools
    import sys
    from asyncio.trsock import TransportSocket
    from typing import cast
    
    from charindex import InvertedIndex, format_results  # <1> formatting useful for TUI via CLI telnet session
    
    CRLF = b'\r\n'
    PROMPT = b'?> '
    
    async def finder(index: InvertedIndex,          # <2> server expects a coro / function that only takes in the reader and writer args. That's why we need to wrap it up in a partial
                     reader: asyncio.StreamReader,
                     writer: asyncio.StreamWriter) -> None:
        client = writer.get_extra_info('peername')  # <3> remote client addr
        while True:  # <4> handles a dialog until we get a control char (see break stmt below)
            writer.write(PROMPT)  # can't await!  # <5> this is not a CORO, just a plain function
            await writer.drain()  # must await!  # <6> flushes writer buffer, it's a coro that's why needs to be driven with =await=
            data = await reader.readline()  # <7> coro that returns bytes
            if not data:  # <8> no bytes => client closed the connection ==> break the loop
                break
            try:
                query = data.decode().strip()  # <9> byte to string decoding
            except UnicodeDecodeError:  # <10> replace with null char for simplicity (e.g. when keyboard interrupt then we get control bytes that can't be decoded into str)
                query = '\x00'
            print(f' From {client}: {query!r}')  # <11> log stmt
            if query:
                if ord(query[:1]) < 32:  # <12> kill loop if control or nullchar
                    break
                results = await search(query, index, writer)  # <13> delegate to searching coro
                print(f'   To {client}: {results} results.')  # <14>
    
        writer.close()  # <15> close the writer stream
        await writer.wait_closed()  # <16> wait for closing of stream
        print(f'Close {client}.')  # <17> log
    # end::TCP_MOJIFINDER_TOP[]
    
    # tag::TCP_MOJIFINDER_SEARCH[]
    async def search(query: str,  # <1> has to be a coro because we have to write to a StreamWriter and use its =.drain()= coro method
                     index: InvertedIndex,
                     writer: asyncio.StreamWriter) -> int:
        chars = index.search(query)  # <2> query inverted index
        lines = (line.encode() + CRLF for line  # <3> genexp gives char, name and CRLF
                    in format_results(chars))
        writer.writelines(lines)  # <4> SURPRISE! this is NOT a coro
        await writer.drain()      # <5> SURPRISE! this is a coro
        status_line = f'{"─" * 66} {len(chars)} found'  # <6> status line to be written
        writer.write(status_line.encode() + CRLF)
        await writer.drain()
        return len(chars)
    # end::TCP_MOJIFINDER_SEARCH[]
    
    # tag::TCP_MOJIFINDER_MAIN[]
    async def supervisor(index: InvertedIndex, host: str, port: int) -> None:
        server = await asyncio.start_server(    # <1> gets an instance of the server, creates and starts it so that it's ready to receive conns
            functools.partial(finder, index),   # <2> =client_connected_cb=, a cb that is either a fn/coro needs to be supplied a stream reader and stream writer
            host, port)                         # <3>
    
        socket_list = cast(tuple[TransportSocket, ...], server.sockets)  # <4> because typeshed type is outdated
        addr = socket_list[0].getsockname()
        print(f'Serving on {addr}. Hit CTRL-C to stop.')  # <5>
        await server.serve_forever()  # <6> suspends the supervisor. without this supervisor returns immediately
    
    def main(host: str = '127.0.0.1', port_arg: str = '2323'):
        port = int(port_arg)
        print('Building index.')
        index = InvertedIndex()                         # <7> index gets built
        try:
            asyncio.run(supervisor(index, host, port))  # <8> starts the event loop that will drive the supervisor coro
        except KeyboardInterrupt:                       # <9> catch CTRL-C
            print('\nServer shut down.')
    
    if __name__ == '__main__':
        main(*sys.argv[1:])
    # end::TCP_MOJIFINDER_MAIN[]
    
    • IDIOM @ finder point number 2:

      Use functools.partial to bind that parameter and obtain a callable that takes the reader and writer. Adapting user functions to callback APIs is the most common use case for functools.partial

    • how multiple clients can be served at once:

      While the event loop is alive, a new instance of the finder coroutine will be started for each client that connects to the server.

    • how the keyboard interrupt works

      the interrupt signal will cause the raising of KeyboardInterrupt exception from within the supervisor::server.serve_forever.

      event loop dies also.

      This propagates out into the main function that had been driving the event loop.

    • GOTCHA: StreamWriter.write is not a coro, StreamWriter.drain is a coro

      some of the I/O methods are coroutines and must be driven with await, while others are simple functions. For example, StreamWriter.write is a plain function, because it writes to a buffer. On the other hand, StreamWriter.drain — which flushes the buffer and performs the network I/O — is a coroutine, as is StreamReader.readline —but not StreamWriter.writelines!

Asynchronous Iteration and Asynchronous Iterables and using async for

  • async with \(\implies\) works with Async Context Managers

    async for \(\implies\) asynchronous iterables:

    • __aiter__ returns an async iterator, BUT __aiter__ is NOT a coro method, it’s a regular method

    async iterator provides __anext__ coro method that returns an awaitable, usually a coro object. Just like the sync counterparts, expected to implement __aiter__ which trivially returns self

  • Remember same point about NOT mixing iterables and iterators

  • example: aiopg async postgres driver :

      async def go():
              pool = await aiopg.create_pool(dsn)
    
              async with pool.acquire() as conn:
                      async with conn.cursor() as cur: # the cursor is the async iterator here
                              await cur.execute("SELECT 1")
                              ret = []
                              async for row in cur: # important to NOT block the event loop while cursor may be waiting for additional rows
                                      ret.append(row)
                              assert ret == [(1,)]
    
    • By implementing the cursor as an asynchronous iterator, aiopg may yield to the event loop at each __anext__ call, and resume later when more rows arrive from PostgreSQL.
  • Asynchronous Generator Functions

    • Implementing and Using an async generator

      • Implementing an Async Iterator

        • class-implementation for async iterator: implement a class with __anext__ and __aiter__

        • simpler way to implement an async iterator: as a generator function that is async \(\implies\) async generator

          write a function declared with async def and use yield in its body. This parallels how generator functions simplify the classic Iterator pattern.
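          A minimal sketch contrasting the two forms (hypothetical Countdown example; asyncio.sleep stands in for real async work):

            import asyncio
            from collections.abc import AsyncIterator

            class Countdown:  # class-based async iterator
                def __init__(self, start: int):
                    self.n = start

                def __aiter__(self):  # regular method, returns the async iterator
                    return self

                async def __anext__(self) -> int:  # coroutine method
                    if self.n == 0:
                        raise StopAsyncIteration
                    await asyncio.sleep(0)
                    self.n -= 1
                    return self.n + 1

            async def countdown(start: int) -> AsyncIterator[int]:  # same behavior as an async generator
                for i in range(start, 0, -1):
                    await asyncio.sleep(0)
                    yield i

            async def main() -> None:
                print([i async for i in Countdown(3)])   # [3, 2, 1]
                print([i async for i in countdown(3)])   # [3, 2, 1]

            asyncio.run(main())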

      • Usage of async generators:

        • Async generators can be used with async for \(\Leftarrow\) driven by async for:

          • as a block statement
          • as async comprehensions
        • We can’t use typical for loops because async generators implement __aiter__ and NOT __iter__

      • Demo example

          import asyncio
          import socket
          from collections.abc import Iterable, AsyncIterator
          from typing import NamedTuple, Optional
        
        
          class Result(NamedTuple):  # <1> convenience: easier to read and debug
              domain: str
              found: bool
        
        
          OptionalLoop = Optional[asyncio.AbstractEventLoop]  # <2> typealias to clean up the hinting below
        
        
          async def probe(domain: str, loop: OptionalLoop = None) -> Result:  # <3>
              if loop is None: # no current event loop handle in scope
                  loop = asyncio.get_running_loop()
              try:
                  await loop.getaddrinfo(domain, None)
              except socket.gaierror:
                  return Result(domain, False)
              return Result(domain, True)
        
        
          async def multi_probe(domains: Iterable[str]) -> AsyncIterator[Result]:  # <4> Async Generator function returns an async generator object, that's why it's typed like that
              loop = asyncio.get_running_loop()
              coros = [probe(domain, loop) for domain in domains]  # <5> list of probe coros
              for coro in asyncio.as_completed(coros):  # <6> this is a classic generator, that's why we can drive it using =for= and not =async for=
                  result = await coro  # <7> guarded by the =as_completed= not to worry that it will be actually blocking.
                  yield result  # <8> this is what makes multi_probe an async generator
        
      • The result is yielded by multi_probe, which is what makes multi_probe an async generator

        Shortcut to the for loop:

            for coro in asyncio.as_completed(coros):
                    yield await coro
        
      • TRICK: The .invalid top-level domain is reserved for testing.

        see elaboration here:

            Yes, the statement is **true**.
        
            The **`.invalid` top-level domain (TLD) is reserved specifically for testing and use in examples or scenarios where a guaranteed invalid domain is needed**. It is defined as a special-use domain name by the Internet Engineering Task Force (IETF) in [RFC 2606 (1999)](https://www.rfc-editor.org/rfc/rfc2606.html) and officially reserved by the Internet Assigned Numbers Authority (IANA).
        
            ### Key points on `.invalid` TLD reservation:
        
            - The `.invalid` TLD **cannot appear in the global DNS root zone** to avoid conflicts with existing or future valid TLDs.
            - It is intended to be used in tests, documentation, or example scenarios where domain names must be constructed clearly as invalid or guaranteed to not resolve.
            - Alongside `.invalid`, other reserved TLDs for similar "safe" use are `.test`, `.example`, and `.localhost`.
            - Using `.invalid` in software or test settings helps catch or demonstrate domain resolution failures without accidentally affecting real domains.
            - Because of this reservation, any use of `.invalid` as a real domain name should not expect it to resolve on the public internet.
        
            ### Supporting authoritative references:
        
            - **RFC 2606 (Reserved Top Level DNS Names)** states:
              > "`.invalid` is intended for use in online construction of domain names that are sure to be invalid and which it is obvious at a glance are invalid."
            - [Wikipedia: .invalid](https://en.wikipedia.org/wiki/.invalid) also confirms this reservation by IETF for such use.
            - IANA maintains `.invalid` as one of the reserved special-use domain names unlikely to ever be delegated.
        
            ### Summary table
        
            | Domain         | Purpose                             | Delegated in global DNS? | Use Case                                         |
            |----------------|-----------------------------------|-------------------------|-------------------------------------------------|
            | `.invalid`     | Reserved for invalid/test domains | No                      | Testing, documentation, avoiding domain clashes |
            | `.test`        | Reserved for testing               | No                      | Test environments                               |
            | `.example`     | Reserved for examples              | No                      | Documentation and examples                       |
            | `.localhost`   | Reserved for loopback services     | No                      | Localhost network reference                      |
        
            ***
        
            In conclusion, your quoted **TRICK** that `.invalid` is a top-level domain reserved for testing is **correct and reflects Internet standards**.
        
        
      • Using the async generator:

            #!/usr/bin/env python3
            import asyncio
            import sys
            from keyword import kwlist
        
            from domainlib import multi_probe
        
        
            async def main(tld: str) -> None:
                tld = tld.strip('.')
                names = (kw for kw in kwlist if len(kw) <= 4)  # <1>
                domains = (f'{name}.{tld}'.lower() for name in names)  # <2>
                print('FOUND\t\tNOT FOUND')  # <3>
                print('=====\t\t=========')
                async for domain, found in multi_probe(domains):  # <4> async iterate over the async generator
                    indent = '' if found else '\t\t'  # <5>
                    print(f'{indent}{domain}')
        
        
            if __name__ == '__main__':
                if len(sys.argv) == 2:
                    asyncio.run(main(sys.argv[1]))  # <6>
                else:
                    print('Please provide a TLD.', f'Example: {sys.argv[0]} COM.BR')
        
    • Async generators as context managers

      • Generators (sync and async versions) have one extra use unrelated to iteration: they can be made into context managers.

      • We can use the @asynccontextmanager decorator within the contextlib module

        Similar to its sync counterpart @contextmanager

          from contextlib import asynccontextmanager
        
          @asynccontextmanager
          async def web_page(url): # the function to be decorated has to be an async generator
                  loop = asyncio.get_running_loop()
                  data = await loop.run_in_executor(
                          None, download_webpage, url) # we run in a separate thread in case this is a blocking function; keeps our event loop unblocked
        
                  yield data # this makes it an async generator
        
                  await loop.run_in_executor(None, update_stats, url)
        
          async with web_page('google.com') as data:
                  process(data)
        
        • Outcome

          similar to the sync version: all lines before the yield become the entry code, i.e. the __aenter__ coro method of the async context manager built by the decorator. So, when control flow comes back from the yield, the yielded value of data is bound to the target variable of the async with statement below.

          All lines after yield become the __aexit__ coro method. Another possibly blocking call is delegated to the thread executor.

    • Asynchronous generators versus native coroutines

      • Similarities

        • async def for both
      • Differences

        • async generator has a yield in its body but not a native coroutine

        • an async generator can ONLY have empty return statements, BUT a native coro may return a value other than None

        • Async generators are NOT awaitable, they are iterables so are driven by async for or async comprehensions

          meanwhile, native coros are awaitable. Therefore:

          • can be driven by await expressions

          • can be passed to asyncio functions that consume awaitables (e.g. create_task)

  • Async Comprehensions and Async Generator Expressions

    • Async generator expressions

      Here’s how we can define and use one:

      gen_found = (name async for name, found in multi_probe(names) if found) # the async genexpr builds the async generator (async iterator) obj
      
      async for name in gen_found: # driven by the async for
          print(name)
      
      • an asynchronous generator expression can be defined anywhere in your program, but it can only be consumed inside a native coroutine or asynchronous generator function.
    • Async comprehensions

      • we can have the usual kind of comprehensions done async! just need to make sure that it’s within an async context i.e. within an async def or within an async REPL console.

      • async listcomps: result = [i async for i in aiter() if i % 2]; a listcomp that awaits coroutines is similar in effect to asyncio.gather(), just a little less flexible, since the gather function allows us to do better exception handling.

      • async dictcomps: {name: found async for name, found in multi_probe(names)}

      • async setcomps: {name for name in names if (await probe(name)).found}

        the extra parentheses are needed because the attribute access operator . binds more tightly than await, so we must parenthesize (await probe(name)) before accessing .found

async Beyond asyncio: Curio

  • async/await constructs are library agnostic

  • curio blogdom demo example:

      #!/usr/bin/env python3
      from curio import run, TaskGroup
      import curio.socket as socket
      from keyword import kwlist
    
      MAX_KEYWORD_LEN = 4
    
    
      async def probe(domain: str) -> tuple[str, bool]:  # <1> no need to receive event loop
          try:
              await socket.getaddrinfo(domain, None)  # <2> getaddrinfo is top-level fn of the curio.socket, it's not a method of a loop object like it is in asyncio
          except socket.gaierror:
              return (domain, False)
          return (domain, True)
    
      async def main() -> None:
          names = (kw for kw in kwlist if len(kw) <= MAX_KEYWORD_LEN)
          domains = (f'{name}.dev'.lower() for name in names)
          async with TaskGroup() as group:  # <3> core concept in curio: monitors and controls a group of tasks (coros)
              for domain in domains:
                  await group.spawn(probe, domain)  # <4> we spawn to start a coro, managed by a particular TaskGroup instance. Coro is wrapped by a Task within the TaskGroup
              async for task in group:  # <5> yields as it's completed, like =as_completed=
                  domain, found = task.result
                  mark = '+' if found else ' '
                  print(f'{mark} {domain}')
    
      if __name__ == '__main__':
          run(main())  # <6> sensible syntax
    
  • TaskGroup

    • Curio TaskGroup is an asynchronous context manager that replaces several ad hoc APIs and coding patterns in asyncio.

    • above we saw how we can just drive the group and we get things in the order of completion, analogous to asyncio.as_completed

    • we can also gather them all easily:

        async with TaskGroup(wait=all) as g:
                await g.spawn(coro1)
                await g.spawn(coro2)
    
  • TaskGroup as a support for structured concurrency:

    • adds a constraint to concurrent programming:

      a group of async tasks should have a single entry and single exit point.

      as an asynchronous context manager, a TaskGroup ensures that all tasks spawned inside are completed or cancelled, and any exceptions raised, upon exiting the enclosed block.

    • just like how structured programming advised against the use of GOTO statements

  • seems like asyncio has some partial support for structured concurrency since 3.11, e.g. with TaskGroups…
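    A minimal sketch using asyncio.TaskGroup (Python 3.11+; hypothetical work() coroutine):

      import asyncio

      async def work(n: int) -> int:
          await asyncio.sleep(0.1)  # stands in for real async work
          return n * 2

      async def main() -> None:
          async with asyncio.TaskGroup() as tg:  # single entry and exit point for the group
              t1 = tg.create_task(work(1))
              t2 = tg.create_task(work(2))
          # on exiting the block, all tasks are done (or cancelled if one raised)
          print(t1.result(), t2.result())  # 2 4

      asyncio.run(main())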

  • Curio also provides a UniversalQueue that can be used to coordinate the work among threads, Curio coroutines, and asyncio coroutines.

Type Hinting Asynchronous Objects

  • the return type of native coroutine == the type of result it spits out when you await on it

  • different from annotations for classic coroutines, where it’s the 3-paramed Generator type

  • 3 points about typing:

    • the async generic types are all covariant on the first type parameter, which is the type of the items yielded from these objects. Aligns with the rule that “producer” / output types are covariant.

    • AsyncGenerator and Coroutine are contravariant on the SendType parameter. That’s because the send type is an input type, and input types are contravariant.

    • AsyncGenerator has no return type

      when we saw typing.Generator, we realised how we could return values by hacking the StopIteration(value) and that’s how generator-enhanced classic coroutines were hacked out, which is why we could make generators operate as classic coroutines and support yield from

      No such thing for AsyncGenerator

      AsyncGenerator objects don’t return values, and are completely separate from native coroutine objects, which are annotated with typing.Coroutine
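      A minimal sketch of the hints (hypothetical functions), using the collections.abc generics:

        import asyncio
        from collections.abc import AsyncGenerator

        async def fetch_value() -> int:  # native coroutine: annotate the result of awaiting it
            await asyncio.sleep(0)
            return 42

        async def squares(n: int) -> AsyncGenerator[int, None]:  # yield type, send type; no return type
            for i in range(n):
                await asyncio.sleep(0)
                yield i * i

        async def main() -> None:
            value: int = await fetch_value()
            print(value, [s async for s in squares(3)])

        asyncio.run(main())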

How Async Works and How It Doesn’t

  • Running Circles Around Blocking Calls

    • I/O is god damn slow; if we use async in a disciplined manner, our servers can be high-performance
  • The Myth of I/O-Bound Systems

    • there are “I/O bound functions” but no “I/O bound systems”

    • any nontrivial system will have CPU-bound functions, dealing with them is the key to success in async programming

  • Avoiding CPU-Bound Traps

    • should have performance regression tests
    • important with async code, but also relevant to threaded Python code because of the GIL
    • we should not wait to OBSERVE a slowdown (by that time it’s too late): the direct performance hit of bad patterns is unlikely to be humanly observable until it’s too late.
    • What to do if we see a CPU-hogging bottleneck:

      • delegate task to a python proc pool
      • delegate task to external task queue
      • avoid GIL constraints, rewrite code in Cython, C, Rust – anything that interfaces with the Python/C API
      • choose to do nothing

Chapter Summary

  • don’t block the event loop, delegate to different processing unit (thread, proc, task queue)

Further Reading

Part V. Metaprogramming

Chapter 22. Dynamic Attributes and Properties

What’s New in This Chapter

Data Wrangling with Dynamic Attributes

  • Exploring JSON-Like Data with Dynamic Attributes
  • The Invalid Attribute Name Problem
  • Flexible Object Creation with new

Computed Properties

  • Step 1: Data-Driven Attribute Creation
  • Step 2: Property to Retrieve a Linked Record
  • Step 3: Property Overriding an Existing Attribute
  • Step 4: Bespoke Property Cache
  • Step 5: Caching Properties with functools

Using a Property for Attribute Validation

  • LineItem Take #1: Class for an Item in an Order
  • LineItem Take #2: A Validating Property

A Proper Look at Properties

  • Properties Override Instance Attributes
  • Property Documentation

Coding a Property Factory

Handling Attribute Deletion

Essential Attributes and Functions for Attribute Handling

  • Special Attributes that Affect Attribute Handling
  • Built-In Functions for Attribute Handling
  • Special Methods for Attribute Handling

Chapter Summary

Further Reading

Chapter 23. Attribute Descriptors

What’s New in This Chapter

Descriptor Example: Attribute Validation

  • LineItem Take #3: A Simple Descriptor
  • LineItem Take #4: Automatic Naming of Storage Attributes
  • LineItem Take #5: A New Descriptor Type

Overriding Versus Nonoverriding Descriptors

  • Overriding Descriptors
  • Overriding Descriptor Without get
  • Nonoverriding Descriptor
  • Overwriting a Descriptor in the Class

Methods Are Descriptors

Descriptor Usage Tips

Descriptor Docstring and Overriding Deletion

Chapter Summary

Further Reading

Chapter 24. Class Metaprogramming

What’s New in This Chapter

Classes as Objects

type: The Built-In Class Factory

A Class Factory Function

Introducing init_subclass

  • Why init_subclass Cannot Configure slots

Enhancing Classes with a Class Decorator

What Happens When: Import Time Versus Runtime

  • Evaluation Time Experiments

Metaclasses 101

  • How a Metaclass Customizes a Class
  • A Nice Metaclass Example
  • Metaclass Evaluation Time Experiment

A Metaclass Solution for Checked

Metaclasses in the Real World

  • Modern Features Simplify or Replace Metaclasses
  • Metaclasses Are Stable Language Features
  • A Class Can Only Have One Metaclass
  • Metaclasses Should Be Implementation Details

A Metaclass Hack with prepare

Wrapping Up

Chapter Summary

Further Reading

Legend

In these notes, I add some tags once in a while:

  1. TRICK: pythonic tricks

  2. IDIOM: Pythonic Idioms that are great

    sometimes, I also use the tag RECIPE for similar points.

  3. TO_HABIT: for things I should add to my own habits when I’m writing python

  4. MISCONCEPTION: some misconception that I had had.

  5. LANG_LIMITATION: is a point that outlines a limitation in python

TODO pending tasks

TODO Skipped Parts of the Book [0/2]

[ ] ControlFlow::Chapter18::lis.py

[ ] Metaprogramming

TODO watch list:

[ ] what makes python awesome by Raymond Hettinger

[ ] “Fear and Awaiting in Async” PyOhio 2016 by David Beazley

[ ] Advanced asyncio: Solving Real-world Production Problems using python (staff eng @ spotify)

TODO add in some recipes that will help for Leetcode [0/1]

[ ] Table slicing

Extras blog post for the book (ref)

this is where a bunch of extra material has been included that couldn’t be included in the book

GitHub organisation for the Fluent Python book

Videos:

  1. design patterns in python from europython 2011 talk