My unedited org-noter notes from the classic book “Fluent Python – Clear, Concise, and Effective Programming” by Luciano Ramalho.
The notes for this are messy, sorry about that. There are some chapters I could not find time to finish, so they are left as TODOs for now.
Outline and Notes
Each chapter's summary page is really useful. When reviewing these topics in the future (having already read them at least once), we should always start with the summary.
For the things that are useful and that I want to turn into a habit, I write a comment with the word "TO_HABIT" so that we can search for it easily.
Part I. Data Structures
Chapter 1. The Python Data Model
Seeing python as a “framework”
This gives us some use cases / a purpose for implementing special methods: to interface with Python as a "framework"
the special methods are dunder methods
We implement special methods when we want our objects to support and interact with fundamental language constructs such as:
- Collections
- Attribute access
- Iteration (including asynchronous iteration using async for)
- Operator overloading
- Function and method invocation
- String representation and formatting
- Asynchronous programming using await
- Object creation and destruction
- Managed contexts using the with or async with statements
What’s New in This Chapter
A Pythonic Card Deck
this is a demonstrative example of how we can adapt to the "interface" of the "framework" that is Python.
Class Composition and how Delegation pattern in the data model helps
because our __getitem__ delegates to the [] operator of self._cards, our deck automatically supports slicing (see the sketch below)
- The use of base classes allows OOP benefits for us such as being able to delegate functionality.
- Delegation is different from forwarding
- this python example is closer to the concept of “forwarding” actually
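A minimal sketch in the spirit of the book's FrenchDeck example: by implementing __len__ and delegating __getitem__ to an internal list, the class gets len(), indexing, slicing, iteration, and even random.choice "for free".

```python
import collections
from random import choice

Card = collections.namedtuple('Card', ['rank', 'suit'])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        # delegating to the list's [] gives us indexing AND slicing
        return self._cards[position]

deck = FrenchDeck()
print(len(deck))      # 52 -- via __len__
print(deck[0])        # first card -- via __getitem__
print(deck[:3])       # slicing works because the list handles the slice object
print(choice(deck))   # random.choice works on any sequence-like object
```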
How Special Methods Are Used
- NOTE: built-ins that are variable-sized under the hood have an ob_size field that holds the size of that collection. This makes len(my_object) fast since it's not really a method call; the interpreter just reads that field directly off the C struct.
- Emulating Numeric Types
- it’s all about implementing the number-class related dunder methods, then anything can behave like a number
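A small sketch of that idea, loosely following the book's Vector example: implement the arithmetic dunders and instances work with +, * and abs().

```python
import math

class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Vector({self.x!r}, {self.y!r})'

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

v = Vector(3, 4)
print(v + Vector(1, 1))   # Vector(4, 5)
print(v * 2)              # Vector(6, 8)
print(abs(v))             # 5.0
```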
- String Representation
__repr__
- repr is different from string in the sense that it’s supposed to be a visual representation of the creation of that object. Therefore, it should be unambiguous, and if possible, match source code necessary to recreate the represented object
- repr is not really for display purposes; that's what the str built-in is for
- implement the special method __repr__ first, then __str__ (see the sketch below)
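A quick sketch of the convention (hypothetical Point class): __repr__ aims to be unambiguous, ideally matching the source needed to recreate the object; __str__ is the display form. If __str__ is missing, Python falls back to __repr__.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # unambiguous; ideally valid source code to recreate the object
        return f'Point({self.x!r}, {self.y!r})'

    def __str__(self):
        # friendly display form
        return f'({self.x}, {self.y})'

p = Point(1, 2)
print(repr(p))   # Point(1, 2)
print(str(p))    # (1, 2)
print(p)         # print uses __str__
```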
- Boolean Value of a Custom Type
default: bool(x) uses __bool__ if implemented, else falls back to __len__
By default, instances of user-defined classes are considered truthy, unless either __bool__ or __len__ is implemented. Basically, bool(x) calls x.__bool__() and uses the result. If __bool__ is not implemented, Python tries to invoke x.__len__(), and if that returns zero, bool returns False. Otherwise bool returns True.
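A tiny sketch of that fallback rule (hypothetical classes):

```python
class Box:
    def __init__(self, items):
        self.items = items

    def __len__(self):          # no __bool__ here, so bool() falls back to len()
        return len(self.items)

print(bool(Box([])))    # False -- __len__ returned 0
print(bool(Box([1])))   # True

class Plain:
    pass

print(bool(Plain()))    # True -- no __bool__ or __len__, so always truthy
```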
- Collection API
The Collection ABC (new in Python 3.6) unifies the three following interfaces:
- Iterable, to support for, unpacking, and other forms of iteration
- Sized, to support the len built-in function
- Container, to support the in operator

There's no need to inherit from these ABCs specifically; as long as the dunder methods are implemented, the class is considered to satisfy the ABC (see the check below).
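A quick check of that claim, assuming Python >= 3.6 (Bag is a made-up class): implementing __len__, __iter__, and __contains__ is enough for the ABC's __subclasshook__ to recognize it, with no inheritance or registration.

```python
from collections.abc import Collection

class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

# structural recognition via the ABC's __subclasshook__
print(isinstance(Bag('abc'), Collection))   # True
print(issubclass(Bag, Collection))          # True
```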
Specialisations of Collection
Three very important specializations of Collection are:
- Sequence, formalizing the interface of built-ins like list and str
- Mapping, implemented by dict, collections.defaultdict, etc.
- Set, the interface of the set and frozenset built-in types
I want to use the vocabulary here when describing what primitives I want to use.
python dicts are “ordered” in the sense that the insertion order is preserved
- there's nothing else we can do with the ordering property (e.g. no rearranging of the order and such)
Overview of Special Methods
- there are a bunch; the latest ones are mostly on the async support side, and they will be covered throughout the book
Why len Is Not a Method
“Practicality beats purity”.
- there's no method call for len(x) when x is a CPython built-in, because it's a direct read of a field in a C struct
- for custom objects, we can implement the dunder method __len__
- it kinda looks like a functional style (since len is a function) in an OOP-styled language. To reconcile this, we can think of abs and len as unary functions!
Chapter Summary
Further Reading
Python’s DataModel can be seen as a MetaObject Protocol
Metaobjects The Art of the Metaobject Protocol (AMOP) is my favorite computer book title. But I mention it because the term metaobject protocol is useful to think about the Python Data Model and similar features in other languages. The metaobject part refers to the objects that are the building blocks of the language itself. In this context, protocol is a synonym of interface. So a metaobject protocol is a fancy synonym for object model: an API for core language constructs.
Chapter 2. An Array of Sequences
What’s New in This Chapter
Overview of Built-in Sequences
- two factors to group sequences by:
- by container (heterogeneous) / flat (homogeneous) sequences
- Container sequences: can be heterogeneous
- holds references (“pointers”)
- Flat sequences: are homogeneous
- holds values
- by mutability / immutability
- things like generators can be seen in the context of sequences too: they can be used "to fill up sequences of any type"
Memory representation of Python objects: each object has a header (with metadata) and a value
example of meta fields (using float as a reference):
- refcount
- type
- value
Every Python object in memory has a header with metadata. The simplest Python object, a float, has a value field and two metadata fields: • ob_refcnt: the object’s reference count • ob_type: a pointer to the object’s type • ob_fval: a C double holding the value of the float On a 64-bit Python build, each of those fields takes 8 bytes. That’s why an array of floats is much more compact than a tuple of floats: the array is a single object holding the raw values of the floats, while the tuple consists of several objects—the tuple itself and each float object contained in it.
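A rough way to see this on a typical 64-bit CPython build (exact sizes may vary across versions and builds):

```python
import sys
from array import array

x = 1.5
# a float object = ob_refcnt + ob_type + ob_fval, 8 bytes each on 64-bit CPython
print(sys.getsizeof(x))        # typically 24

floats = array('d', (0.1 * i for i in range(1000)))
print(floats.itemsize)         # 8 -- each item is a raw machine double
print(sys.getsizeof(floats))   # roughly 8 bytes per item plus a small header

as_list = list(floats)
# the list holds ~1000 references; each referenced float object costs ~24 bytes more
print(sys.getsizeof(as_list))  # size of the reference array only
```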
List Comprehensions and Generator Expressions
List Comprehensions and Readability
- a loop has generic purpose, but a listcomp’s purpose is always singular: to build a list
- we should stick to this purpose and not abuse listcomps by adding side effects to their evaluation
- List comprehensions build lists from sequences or any other iterable type by filtering and transforming items.
Scope: listcomps have their own local scope; use the walrus operator to expand a binding to the enclosing scope
"Local Scope Within Comprehensions and Generator Expressions"
if that name is modified using global or nonlocal, then the scope is accordingly set
defines the scope of the target of := as the enclosing function, unless there is a global or nonlocal declaration for that target.
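A short sketch of that scoping rule, following the book's `last := ...` example: the comprehension variable stays local, but the walrus target leaks to the enclosing scope.

```python
x = 'ABC'
codes = [last := ord(c) for c in x]

print(codes)   # [65, 66, 67]
print(last)    # 67 -- the := target is bound in the enclosing scope
# print(c)     # NameError: c is local to the comprehension
```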
- Listcomps Versus map and filter
Cartesian Products
This is the part where we have more than one iterable within the listcomp
- Generator Expressions
Tuples Are Not Just Immutable Lists
The immutable list part is definitely one of the main features.
It should also be seen as a nameless record.
Tuples as Records
- some examples of tuple unpacking:
- the loop constructs automatically support unpacking; we can assign multiple variables on each iteration of the loop
- the % formatting operator will also unpack values within the tuple when doing string formatting
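A small illustration of both points, in the spirit of the book's traveler_ids example (the data is made up):

```python
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]

# the for-loop target unpacks each tuple; _ is a dummy variable
for country, _ in traveler_ids:
    print(country)

# the % operator unpacks the tuple into the format slots
for passport in sorted(traveler_ids):
    print('%s/%s' % passport)
```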
Tuples as Immutable Lists
2 benefits:
- clarity: the length of tuple is fixed thanks to its immutability
- performance: memory use is a little better, also allows for some optimisations
Warning: the immutability is w.r.t references contained within the tuple, not values
So tuples containing mutable items can be a source of bugs. Also, an unhashable tuple can't be used as a dict key or set element.
Tuple’s Performance Efficiency Reasons
Tuples are more efficient because:
- bytecode: tuple has simpler bytecode required: Python compiler generates bytecode for a tuple constant in one operation; but for a list literal, the generated bytecode pushes each element as a separate constant to the data stack, and then builds the list.
- constructor:
tuple construction from an existing tuple doesn't need any copying; it's the same reference (see the snippet after this list):
- the list constructor returns a copy of a given list: list(l) is a new object
- the tuple constructor returns a reference to the same t if we do tuple(t) (they're immutable anyway, so why not reuse the same reference)
- amortisation: a tuple has a fixed size, so it doesn't need to over-allocate space to amortise future growth the way a list does
- no extra layer of indirection: The references to the items in a tuple are stored in an array in the tuple struct, while a list holds a pointer to an array of references stored elsewhere. The indirection is necessary because when a list grows beyond the space currently allocated, Python needs to reallocate the array of references to make room. The extra indirection makes CPU caches less effective.
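A quick CPython check of the constructor point above (implementation behaviour, so treat it as illustrative):

```python
t = (10, 'alpha', (1, 2))
l = [10, 'alpha', (1, 2)]

print(tuple(t) is t)   # True  -- same object returned, no copy
print(t[:] is t)       # True  -- slicing a whole tuple also reuses it
print(list(l) is l)    # False -- list() always builds a new list
```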
- Comparing Tuple and List Methods
Unpacking Sequences and Iterables
- safer extraction of elements from sequences
- works with any iterable object as the datasource, including iterators.
- for the iterable case, it works as long as the iterable yields exactly one item per variable on the receiving end (or * is used to do a glob capture)
Parallel assignment
These are the multi-name assignments that we do; they rely on sequence unpacking
The most visible form of unpacking is parallel assignment; that is, assigning items from an iterable to a tuple of variables, as you can see in this example:
>>> lax_coordinates = (33.9425, -118.408056)
>>> latitude, longitude = lax_coordinates  # unpacking
>>> latitude
Using * to Grab Excess Items
- the classic case is using the grabbing part for varargs
- in the context of parallel assignment, the * prefix can be applied to exactly one variable, but it can appear in any position
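A couple of lines showing both points (grabbing excess items, and the * target in any position):

```python
a, b, *rest = range(5)
print(a, b, rest)        # 0 1 [2, 3, 4]

*head, last = range(3)   # the starred target may appear anywhere, but only once
print(head, last)        # [0, 1] 2
```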
Unpacking with * in Function Calls and Sequence Literals
- the use of the unpacking operator is context-dependent: in the context of function calls and the creation of sequence literals, it can be used multiple times. In the context of parallel assignment, it's a single use (otherwise there would be ambiguity about how to partition the values in the sequence)
- Nested Unpacking
GOTCHA: single-item tuple syntax may have silent bugs if used improperly
Both of these could be written with tuples, but don’t forget the syntax quirk that single-item tuples must be written with a trailing comma. So the first target would be (record,) and the second ((field,),). In both cases you get a silent bug if you forget a comma.
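A tiny illustration of the trailing-comma quirk (the values here are made up):

```python
(record,) = ['only item']       # ok: single-item tuple target
[field] = ('only field',)       # ok: single-item list target
print(record, field)

whole = ('only field',)         # one-item tuple
oops = ('only field')           # silent bug: just a parenthesized string
print(type(whole), type(oops))
```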
Pattern Matching with Sequences
- here’s the OG writeup for structural pattern matching. Some points from it:
- Therefore, an important exception is that patterns don’t match iterators. Also, to prevent a common mistake, sequence patterns don’t match strings.
- the matching primitives allow us to use guards on the match conditions (see here)
- there’s support for defining sub-patterns like so:
case (Point(x1, y1), Point(x2, y2) as p2): ...
- here’s a more comprehensive tutorial PEP 636 - Structural Pattern Matching
Pattern-matching is declarative
Pattern matching is an example of declarative programming: the code describes "what" you want to match, instead of "how" to match it. The shape of the code follows the shape of the data, as Table 2-2 illustrates.
python's match goes beyond just being a switch statement because it supports destructuring, similar to Elixir
- random thought: this feature would be really useful for writing a toy interpreter for some source code. Here's lis.py
On the surface, match/case may look like the switch/case statement from the C language—but that's only half the story. One key improvement of match over switch is destructuring—a more advanced form of unpacking. Destructuring is a new word in the Python vocabulary, but it is commonly used in the documentation of languages that support pattern matching—like Scala and Elixir. As a first example of destructuring, Example 2-10 shows part of Example 2-8 rewritten with match/case.
class patterns give us the ability to do runtime type checks
case [str(name), _, _, (float(lat), float(lon))]: the constructor-like syntax is not a constructor call, it's a runtime check
- the names (name, lat, lon) are bound here and are available for reference thereafter within the case block
- this is really interesting: it's only in the context of patterns that this syntax does runtime type checking
The expressions str(name) and float(lat) look like constructor calls, which we’d use to convert name and lat to str and float. But in the context of a pattern, that syntax performs a runtime type check: the preceding pattern will match a four-item sequence in which item 0 must be a str, and item 3 must be a pair of floats. Additionally, the str in item 0 will be bound to the name variable, and the floats in item 3 will be bound to lat and lon, respectively. So, although str(name) borrows the syntax of a constructor call, the semantics are completely different in the context of a pattern. Using arbitrary classes in patterns is covered in “Pattern Matching Class Instances” on page 192.
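A sketch in the spirit of the book's metro-areas example (Example 2-10), showing the runtime type checks, the name binding, and a guard; requires Python >= 3.10:

```python
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
]

for record in metro_areas:
    match record:
        # item 0 must be a str, item 3 a pair of floats; name/lat/lon get bound
        case [str(name), _, _, (float(lat), float(lon))] if lon <= 0:
            print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
        case [str(name), *_]:
            print(f'{name}: not in the western hemisphere')
```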
Pattern Matching Sequences in an Interpreter
it's interesting how the Python 2 code was described as "a fan of pattern matching": it matches on the first element and then the tree of control-flow paths does the rest, so it's really like a switch
this switch-like pattern-matching style is an abstract idea, even more so than the concrete programming-language implementations we have been discussing so far
the catch-all is used for error-handling purposes here. In general there should always be a fallthrough case that fails loudly, instead of a silent no-op that hides problems
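A hypothetical mini-dispatcher sketch of that advice (parse_command and its commands are made up, not from the book):

```python
def parse_command(text: str) -> str:
    match text.split():
        case ['go', direction]:
            return f'moving {direction}'
        case ['look']:
            return 'looking around'
        case _:
            # catch-all: raise instead of silently doing nothing
            raise ValueError(f'unknown command: {text!r}')

print(parse_command('go north'))
print(parse_command('look'))
# parse_command('fly')  # would raise ValueError
```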
Slicing
Why Slices and Ranges Exclude the Last Item
this refers to the fact that one end of the range is closed (inclusive) and the other is open (exclusive).
- easy to calculate lengths
- easy to split / partition without creating overlaps
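Two quick demonstrations of those two conveniences:

```python
l = [10, 20, 30, 40, 50, 60]

print(len(l[1:4]))     # 3 == stop - start, easy length arithmetic
print(l[:2], l[2:])    # [10, 20] [30, 40, 50, 60] -- split with no overlap
```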
Slice Objects
- useful to know this because it lets you assign names to slices, like spreadsheets allow the naming of cell-ranges
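A sketch of naming slices (the flat-file layout below is made up, in the spirit of the book's invoice example):

```python
invoice_line = "1909  Pimoroni PiBrella     $17.50    3    $52.50"

SKU = slice(0, 6)
DESCRIPTION = slice(6, 28)
UNIT_PRICE = slice(28, 38)

print(invoice_line[SKU].strip())          # 1909
print(invoice_line[DESCRIPTION].strip())  # Pimoroni PiBrella
print(invoice_line[UNIT_PRICE].strip())   # $17.50
```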
Multidimensional Slicing and Ellipsis
This is more useful in the context of the numpy lib; the book doesn't include stdlib-only examples here
- built-ins are single-dim, except for memoryview: "Except for memoryview, the built-in sequence types in Python are one-dimensional, so they support only one index or slice, and not a tuple of them."
- Multiple indexes or slices get passed in as tuples
a[i, j] is evaluated as a.__getitem__((i, j)), e.g. numpy multi-dim array accesses
- the ellipsis class is a singleton, the sole instance being Ellipsis
  - a similar case is the bool class and True, False
- so in numpy, if x is a four-dimensional array, x[i, ...] is a shortcut for x[i, :, :, :]
Assigning to Slices
Applies to mutable sequences.
Gotcha: when LHS of assignment is slice, the RHS must be iterable
In the example below, we’re trying to graft some sequence to another. With that intent, we can only graft an iterable onto another sequence, not a single element. Hence, the requirement that the RHS must be iterable.
```python
l = list(range(10))
try:
    # so this is wrong:
    l[2:5] = 100
except:
    print("this will throw an error, we aren't passing in an iterable for the grafting.")
finally:
    # and this is right
    l[2:5] = [100]
    print(l)
```
Using + and * with Sequences
- both + and * create new objects without modifying their operands
Building Lists of Lists
Gotcha: pitfall of references to mutable objects – using a * n where a contains mutable items can be problematic
- Actually applies to other mutable sequences as well; in this case it's just a list that we're using
- Just be careful about the properties of the contained elements.
```python
my_mutable_elem = ['apple', 'banana']
print(f"my mutable elem ref: {id(my_mutable_elem)}")

list_of_lists = [my_mutable_elem] * 2
print(f"This creates 2 repeats \n{list_of_lists}")
print(f"(first ref, second ref) = {[id(elem) for elem in list_of_lists]}")

list_of_lists[0][0] = 'strawberry'
print(f"This mods all 2 repeated refs \n{list_of_lists}")
print(f"(first ref, second ref) = {[id(elem) for elem in list_of_lists]}")
```

Here's the same gotcha using tic-tac-toe as an example:
```python
good_board = [['_'] * 3 for i in range(3)]
bad_board = [['_'] * 3] * 3

print(f"BEFORE, the boards look like this:\n\
\tGOOD Board:\n\
\t{[row for row in good_board]}\n\
\tBAD Board:\n\
\t{[row for row in bad_board]}\n")

# now we make a mark on the boards:
good_board[1][2] = 'X'
bad_board[1][2] = 'X'

print(f"AFTER, the boards look like this:\n\
\tGOOD Board:\n\
\t{[row for row in good_board]}\n\
\tBAD Board:\n\
\t{[row for row in bad_board]}\n")
```
Augmented Assignment with Sequences
This refers to the in-place versions of the sequence operators. With +=, there are 2 cases:
Case A: Identity of a changes
- the dunder method __iadd__ was not available for use
- so a + b had to be evaluated and stored as a new object
- and that new object was then bound to a as part of the assignment
Case B: Identity of a does not change
- this would mean that a is actually mutated in place
- it would have used the dunder method __iadd__
In other words, the identity of the object bound to a may or may not change, depending on the availability of __iadd__.
In general, for mutable sequences, it is a good bet that __iadd__ is implemented and that += happens in place.
Doing += for repeated concatenations of immutable sequences is inefficient.
However, str concatenation has been optimised in CPython, so it's alright to do that there: extra space is allocated to amortise the repeated allocations.
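A quick identity check of the two cases (CPython behaviour):

```python
l = [1, 2, 3]
before = id(l)
l += [4]                   # list implements __iadd__: mutated in place
print(id(l) == before)     # True

t = (1, 2, 3)
before = id(t)
t += (4,)                  # no __iadd__ on tuple: t + (4,) builds a new object
print(id(t) == before)     # False
```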
A += Assignment Puzzler
Learnings!
I take three lessons from this:
• Avoid putting mutable items in tuples.
• Augmented assignment is not an atomic operation—we just saw it throwing an exception after doing part of its job.
• Inspecting Python bytecode is not too difficult, and can be helpful to see what is going on under the hood.
Example
it’s a peculiarity in the += operator!
```python
t = (1, 2, [30, 40])
print(t)

try:
    t[2] += [50, 60]
except:
    print("LMAO complaints")
finally:
    print(t)

try:
    t[2].extend([90, 100])
except:
    print("this won't error out though")
finally:
    print(t)
```
list.sort Versus the sorted Built-In
in-place functions should return None as a convention
- there's a drawback to this: we can't cascade (chain) calls to such methods (see the snippet below)
- python’s sorting uses timsort!
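The contrast in a few lines (this is essentially the book's fruits example):

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

print(sorted(fruits))   # new list returned; fruits untouched
print(fruits.sort())    # None -- in-place, so calls can't be chained
print(fruits)           # ['apple', 'banana', 'grape', 'raspberry']
```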
Managing Ordered Sequences with bisect (extra ref from textbook)
When a List Is Not the Answer
Arrays: best for containing numbers
- an array of float values does not hold full-fledged float instances, but only the packed bytes representing their machine values—similar to an array of double in the C language.
- examples:
- typecode 'b' => signed byte => 8 bits => [-128, 127] range of representation (the unsigned counterpart 'B' covers [0, 255])
- for special cases of numeric arrays holding binary data (e.g. raster images), the bytes and bytearray types are more appropriate!
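A small sketch of typecodes in practice ('d' for doubles as in the book's example; the 'b' values are made up):

```python
from array import array
from random import random

floats = array('d', (random() for _ in range(10**6)))  # packed 8-byte doubles
print(floats.itemsize, floats[-1])

small = array('b', [-128, 0, 127])   # 'b': signed byte, range -128..127
print(small.itemsize, small.tolist())
# array('b', [128])  # would raise OverflowError
```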
Memory Views
Examples
id vs context
The learning from this is that the memoryview objects and the memory they provide a view of are two different things: object identity vs the underlying buffer. So here, m1, m2, m3 all have different id references, but the memory region that they give a view of is the same.
That’s why we can mutate using one memory view and every other view also reflects that change.
```python
from array import array

# just some bytes; the sequence is buffer-protocol-adherent
octets = array("B", range(6))
print(octets)

# builds a new memoryview from the array
m1 = memoryview(octets)
print(m1)

# exporting a memory view to a list; this creates a new list (a copy!)
print(m1.tolist())

# builds a new memoryview, with 2 rows and 3 columns
m2 = m1.cast('B', [2, 3])
print(m2)
print(m2.tolist())

m3 = m1.cast('B', [3, 2])
print(m3)
print(m3.tolist())

# overwrite byte
m2[1, 1] = 22
# overwrite byte
m3[1, 1] = 33

print(f"original memory has been changed: \n\t{octets} ")
print(f"m1 has been changed:\n\t {m1.tolist()}")
print(f"m2 has been changed:\n\t {m2.tolist()}")
print(f"m3 has been changed:\n\t {m3.tolist()}")
```
corruption
```python
from array import array
from sys import byteorder

print(byteorder)

numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
print(len(memv))
print(memv[0])

# cast the 2-byte shorts as bytes, so the resultant sequence has double the elements:
memv_oct = memv.cast('B')
# the numbers are stored in little-endian format
print(memv_oct.tolist())
# so -2 as a 2-byte signed short is (little-endian) 0xfe 0xff (254, 255)
# so we get:
# -2: 0xfe 0xff (254, 255)
# -1: 0xff 0xff (255, 255)
#  0: 0x00 0x00 (0, 0)
#  1: 0x01 0x00 (1, 0)
#  2: 0x02 0x00 (2, 0)

# assigns the value 4 to byte-offset 5
memv_oct[5] = 4
print(numbers)
# this change is to the 2nd byte of the third element of numbers
# byte index 5 is the high byte (little endian: bytes go low -> high)
# so the 3rd element is now 0 + (4 * 256) = 1024 in decimal
# NOTE: a 4 in the most significant byte of a 2-byte unsigned integer is 1024.
```
Extra: “Parsing binary records with struct”
Some takeaways:
- Proprietary binary records in the real world are brittle and can be corrupted easily. examples:
- string parsing: paddings, null terminated, size limits?
- endianness problem: what byteorder was used for representing integers and floats (CPU-architecture-dependent)?
- always explore pre-built solutions first instead of building yourself:
- for data exchange, the pickle module works great, but you have to ensure Python versions align since the default binary formats may differ. Reading a pickle may also run arbitrary code.
- if the binary exchange uses multiple programming languages, standardise the serialisation. Serial forms:
- multi-platform binary serialisation formats:
- JSON
bot assisted concept mapping
Here’s a bot-assisted concept map between unix
mmapandmemoryviews:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64Memory mapping a file is a powerful technique that allows access to file data as if it were in memory, and the concepts connect naturally between the Unix world (via `mmap` system calls) and Python (via the `mmap` module and `memoryview` objects). **Unix World: mmap** - **Definition:** The Unix `mmap` system call maps files or devices into a process's address space, enabling file I/O by reading and writing memory. This is efficient for large files because data is loaded on demand, and multiple processes can share the same mapped region[1]. - **Usage:** After opening a file, `mmap` associates a region of virtual memory with the file. Reading and writing to this memory behaves as if you were reading and writing to the file itself. The system manages when data is actually read from or written to disk, often using demand paging[1]. - **Types:** Both file-backed (mapping a file) and anonymous (not backed by a file, similar to dynamic allocation) mappings are supported. Shared mappings allow interprocess communication, while private mappings isolate changes to one process[1]. **Python World: mmap Module** - **Definition:** Python’s `mmap` module provides a high-level interface to memory-mapped files, closely mirroring the Unix `mmap` functionality. You must provide a file descriptor (from `fileno()` or `os.open()`) and specify the mapping size and access mode[2][3]. - **Usage:** Memory-mapped file objects behave like both file objects and mutable byte arrays, allowing random access and slicing. You can read and write data by indexing or slicing, and you can seek through the file as if it were a standard file object[2][3]. - **Access Modes:** You can specify read-only, write-through (changes go directly to the file), or copy-on-write (changes are local)[4][5]. **Python World: memoryview** - **Definition:** The `memoryview` object provides a safe way to access the internal data of an object that supports the buffer protocol, such as `bytes`, `bytearray`, or a memory-mapped file object[6]. - **Usage:** With a `memoryview`, you can manipulate large data structures without copying them, which is especially useful for memory-mapped files. It allows you to pass a view of the data to functions or slice it efficiently, avoiding unnecessary memory duplication[6]. - **Connection to mmap:** When you create a memory-mapped file in Python, you can use `memoryview` to work with the data as a buffer, enabling efficient manipulation and sharing of large data sets. **Linking the Concepts** - **Memory Mapping:** Both Unix `mmap` and Python’s `mmap` module allow you to treat file data as if it were in memory, enabling efficient random access and sharing between processes[1][2][3]. - **Efficient Data Access:** Using `memoryview` in Python is analogous to working directly with the mapped memory region in Unix, as both avoid copying large chunks of data and allow efficient manipulation of file contents[6]. - **Interprocess Communication:** In Unix, shared memory mappings (`MAP_SHARED`) allow processes to communicate by reading and writing the same memory region. In Python, you can achieve similar effects by sharing a memory-mapped file object between processes[1][2]. - **Performance:** Both approaches leverage the operating system’s memory management to reduce I/O overhead and enable fast, random access to file data. 
**Summary Table** | Concept | Unix (`mmap`) | Python (`mmap` module) | Python (`memoryview`) | |------------------------|------------------------------|-----------------------------------|-------------------------------| | Purpose | Map files to memory | Map files to memory | View memory as buffer | | Access Method | System call | Module/object | Object | | Sharing | Shared/private mappings | Shared via file object | View of existing buffer | | Efficiency | Demand paging, no copy | Demand paging, no copy | No copy, efficient slicing | | Use Case | IPC, efficient file I/O | Efficient file I/O, IPC | Efficient data manipulation | By understanding these connections, you can leverage memory mapping for efficient file handling and data sharing across both Unix and Python environments. [1] https://en.wikipedia.org/wiki/Mmap [2] https://docs.python.org/3/library/mmap.html [3] https://github.com/python/cpython/blob/master/Doc/library/mmap.rst [4] https://pymotw.com/3/mmap/ [5] https://realpython.com/python-mmap/ [6] https://smart-spatial.com/data%20science/2017/09/22/MemoryView/ [7] https://stackoverflow.com/questions/63553692/how-to-use-memory-mapped-file-in-python-linux/63554607 [8] https://pymotw.com/3/mmap/index.html [9] https://unix.stackexchange.com/questions/712651/does-mmap-allow-creating-a-mapping-that-is-much-larger-than-the-amount-of-physic [10] https://deepaksood619.github.io/computer-science/operating-system/memory-mapping-mmap/ [11] https://www.cmi.ac.in/~madhavan/courses/prog2-2012/docs/python-3.2.1-docs-html/library/mmap.html [12] https://www.blopig.com/blog/2024/08/memory-mapped-files-for-efficient-data-processing/ [13] https://stackoverflow.com/questions/4991533/sharing-memory-between-processes-through-the-use-of-mmap/4991631 [14] https://documentation.help/Python-2.4/module-mmap.html [15] https://docs.python.org/3.4/library/mmap.html?highlight=mmap [16] https://www.ibm.com/docs/en/zos/2.4.0?topic=functions-mmap-map-pages-memory [17] https://man7.org/linux/man-pages/man2/mmap.2.html [18] https://programmingappliedai.substack.com/p/what-is-mmap-in-linux-and-how-it [19] https://w3.cs.jmu.edu/kirkpams/OpenCSF/Books/csf/html/MMap.html [20] https://www.ibm.com/docs/ssw_ibm_i_74/apis/mmap.htm [21] https://www.unix.com/man-page/linux/3/mmap/ [22] https://stackoverflow.com/questions/47437481/python-can-i-get-a-memoryview-or-a-bytearray-mapping-to-a-mmap [23] https://docs.vultr.com/python/built-in/memoryview [24] https://pymotw.com/2/mmap/ [25] https://www.youtube.com/watch?v=ky1n6luzL3Y
“casting” in memoryview vs Elixir ecto “casting”
The mental models are different.
“casting” with memoryview is about interpreting memory as a different type, not about automatic type conversion or validation as in Elixir changesets. The term is used more loosely in Python, while in Elixir, it is a formal, declarative operation for data preparation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37The use of the term **"cast"** in the context of Python `memoryview` is not the same as in Elixir changesets, even though both involve types or transformations. **Python `memoryview` and Casting** - **Casting in Python `memoryview`:** When people refer to "casting" with `memoryview`, they usually mean taking a slice of a memoryview or viewing the underlying buffer as a different type (sometimes using modules like `ctypes` or `numpy` to achieve type reinterpretation). However, the built-in `memoryview` object itself does not perform type conversion; it provides a view into the memory buffer of an object that supports the buffer protocol, such as `bytes`, `bytearray`, or a NumPy array[5][6][1]. - **Type Safety:** The `memoryview` object allows you to manipulate the data as if it were a sequence of bytes or another type (if the underlying buffer supports it), but it does not automatically convert the data to a different type. If you want to reinterpret the data as a different type (e.g., casting bytes to an integer array), you typically use additional libraries or manual conversion, not `memoryview` itself[5][1]. - **No Implicit Type Conversion:** The `memoryview` simply provides a window into the existing data, preserving the memory layout and avoiding data copying[6][8]. Any "casting" is more about how you interpret the bytes, not a formal declaration or transformation of types. **Elixir Changesets and Casting** - **Casting in Elixir Changesets:** In Elixir's Ecto changesets, "casting" refers to the explicit declaration of how data (often from a web form or external source) should be converted or validated into a specific type before being inserted into a database. This is a declarative process that ensures data integrity and type safety. - **Type Conversion:** Casting in Elixir changesets involves actual type conversion (e.g., from a string to an integer or date), and is a core part of the data validation and preparation process. **Comparison Table** | Context | Meaning of "Cast" | Type Conversion | Purpose | |------------------------|------------------------------------------|-----------------|--------------------------------| | Python `memoryview` | Viewing/reinterpreting memory as a type | No (manual only)| Efficient data access/manipulation | | Elixir changesets | Declaring/validating/converting types | Yes | Data validation, preparation | **Summary** In Python, "casting" with `memoryview` is about interpreting memory as a different type, not about automatic type conversion or validation as in Elixir changesets. The term is used more loosely in Python, while in Elixir, it is a formal, declarative operation for data preparation[5][6][1]. [1] https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html [2] https://stackoverflow.com/questions/71028367/how-to-make-a-custom-python-class-to-support-memoryview-conversion [3] https://stackoverflow.com/questions/48136122/how-do-i-cast-a-slice-of-memoryview-into-c-string-unsigned-char [4] https://docs.python.org/3/library/stdtypes.html [5] https://blog.finxter.com/converting-python-bytes-to-memoryview/ [6] https://realpython.com/ref/builtin-functions/memoryview/ [7] https://www.stratascratch.com/blog/understanding-data-types-in-python-with-examples/ [8] https://docs.python.org/3/c-api/memoryview.html
NumPy
python’s Global Interpreter Lock (GIL) and how releasing it unlocks better parallelisation
it’s a mutex that protects access to python objects and therefore prevents multiple native threads from executing Python bytecode simultaneously within the same process.
It was intended as a mechanism to make memory handling simpler, but having this mutex also limits parallelism.
Typical workarounds:
multi-processing, separate processes, each with their own GIL
offload CPU-intensive work to C-extensions or libs that release the GIL
here’s a bot-written outline on it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50The **Global Interpreter Lock (GIL)** is a core mechanism in CPython, the reference implementation of Python, that ensures only one thread executes Python bytecode at a time, even on multi-core processors[2][4][5]. Here’s a detailed overview: ## **What Is the GIL?** - **Definition:** The GIL is a mutex (mutual exclusion lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode simultaneously within the same process[2][4][7]. - **Purpose:** It exists primarily to simplify CPython’s memory management, especially reference counting, which is not thread-safe by default. Without the GIL, concurrent access to Python objects could lead to race conditions and memory corruption[5][7]. ## **How Does the GIL Work?** - **Single Thread Execution:** Only one thread holds the GIL at any moment, meaning only one thread can execute Python code at a time, even if you have multiple threads running[2][4][6]. - **Thread Switching:** The interpreter periodically releases the GIL, allowing other threads to acquire it and execute Python code. This switching happens frequently, but it means that CPU-bound multithreaded Python programs do not benefit from multiple cores for parallel execution of Python code[2][4]. - **Non-Python Code:** Operations that do not require the Python interpreter (such as I/O or some C extensions like NumPy) can release the GIL, allowing other threads to run Python code or the process to use multiple cores for those operations[2][4]. ## **Why Does the GIL Exist?** - **Memory Management:** Simplifies reference counting and garbage collection by ensuring thread safety for Python objects[5][7]. - **C Extensions:** Makes it easier to write and use C extensions by providing a stable, single-threaded environment for their execution[1][3][7]. - **Implementation Simplicity:** Using a single lock is easier to implement and maintain than fine-grained locking for all Python objects[1][7]. ## **Implications of the GIL** - **Limited Parallelism:** The GIL prevents true parallel execution of Python code in multi-threaded programs, making it a bottleneck for CPU-bound tasks[2][4][5]. - **Workarounds:** For parallelism, Python developers often use multiprocessing (which uses separate processes, each with its own GIL) or offload CPU-intensive work to C extensions or libraries that release the GIL[1][4]. - **Performance Impact:** The GIL can degrade performance in multi-threaded, CPU-bound applications. However, for I/O-bound or single-threaded programs, its impact is minimal[2][4][6]. ## **Future of the GIL** - **Potential Removal:** The Python Steering Council has indicated support for PEP 703, which proposes making a version of CPython without the GIL. This could enable true multi-threaded parallelism in Python in the future[3]. - **Challenges:** Removing the GIL is complex due to backward compatibility and the reliance of many extensions on its guarantees[3][2]. 
## **Summary Table**

| Feature | Description |
|------------------------|-----------------------------------------------------------------------------|
| Purpose | Protect Python objects, simplify memory management, enable C extensions |
| Execution Model | Only one thread executes Python bytecode at a time |
| Impact on Parallelism | Limits CPU-bound parallelism in multi-threaded Python code |
| Workarounds | Multiprocessing, C extensions, I/O-bound operations |
| Future | Potential removal via PEP 703, but challenges remain |

The GIL is a key part of Python's design, balancing simplicity and safety with some limitations for parallel execution[2][4][5].

[1] https://en.wikipedia.org/wiki/Global_interpreter_lock
[2] https://wiki.python.org/moin/GlobalInterpreterLock
[3] https://developer.vonage.com/en/blog/removing-pythons-gil-its-happening
[4] https://realpython.com/python-gil/
[5] https://dev.to/adityabhuyan/understanding-pythons-global-interpreter-lock-gil-and-its-impact-on-concurrency-2da6
[6] https://realpython.com/videos/global-interpreter-lock-overview/
[7] https://dev.to/ohdylan/understanding-pythons-global-interpreter-lock-gil-mechanism-benefits-and-limitations-4aha
[8] https://www.pubnub.com/blog/understanding-pythons-global-interpreter-lock/

"NumPy and SciPy are formidable libraries, and are the foundation of other awesome tools such as Pandas—which implements efficient array types that can hold nonnumeric data and provides import/export functions for many different formats, like .csv, .xls, SQL dumps, HDF5, etc.—and scikit-learn, currently the most widely used Machine Learning toolset. Most NumPy and SciPy functions are implemented in C or C++, and can leverage all CPU cores because they release Python's GIL (Global Interpreter Lock). The Dask project supports parallelizing NumPy, Pandas, and scikit-learn processing across clusters of machines. These packages deserve entire books about them."
Deques and Other Queues
issues with list methods
although we can use a list as a stack/queue (by using .append() or .pop()), inserting and removing from the head of the list (the 0-index end) is costly because the entire list must be shifted in memory => this is why just re-purposing lists as queues is not a good idea.
Characteristics:
- when bounded, every mutation will adhere to the deque capacity for sure.
- hidden cost is that removing items from the middle of a deque is not fast
- append and popleft are atomic, so they can be used in multi-threaded applications without needing locks (see the sketch below)
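A short sketch of a bounded deque: maxlen is enforced on every mutation, and appends/pops are cheap at both ends.

```python
from collections import deque

dq = deque(range(5), maxlen=5)
print(dq)             # deque([0, 1, 2, 3, 4], maxlen=5)

dq.append(5)          # 0 falls off the left to respect maxlen
print(dq)

dq.appendleft(-1)     # 5 falls off the right
print(dq)

dq.popleft()          # O(1) at the head, unlike list.pop(0)
print(dq)
```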
alternative queues in stdlib
- asyncio provides async-programming focused queues
Chapter Summary
Further Reading
- “numpy is all about vectorisation”: operations on array elements without explicit loops
More on Flat vs Container Sequences
"Flat Versus Container Sequences"
Chapter 3. Dictionaries and Sets
What’s New in This Chapter
Extra: Internals of sets and dicts
This info is found on the fluentpython.com website. It considers the strengths and limitations of container types (dict, set) and how they're linked to the use of hash tables.
Running performance tests
the trial example of needle in haystack has beautiful ways of writing it
```python
found = 0
for n in needles:
    if n in haystack:
        found += 1
```

when using sets, because it's directly related to set theory, we can use a one-liner to count the needles that occur in the haystack by doing an intersection:

```python
found = len(needles & haystack)
```

This intersection approach is the fastest in the tests that the textbook runs.
the worst times are if we use the list data structure for the haystack
- "If your program does any kind of I/O, the lookup time for keys in dicts or sets is negligible, regardless of the dict or set size (as long as it fits in RAM)."
Hashes & Equality
the usual uniform random distribution assumption as the goal to reach for hashing functions, just described in a different way: to be effective as hash table indexes, hash codes should scatter around the index space as much as possible. This means that, ideally, objects that are similar but not equal should have hash codes that differ widely.
- here's the official docs on the hash function
A hashcode for an object usually has less info than the object that the hashcode is for.
- 64-bit CPython hashcodes is a 64-bit number => \(2^{64}\) possible values
- consider an ascii string of 10 characters (and that there are 100 possible values in ascii) => \(100^{10}\) which is bigger than the possible values for the hashcode.
By the way, it's actually salted; there are some nuances in how the salt is derived, but each Python process gets its own salt.
The modern hash function is the siphash implementation
Hash Implementation
- each row in the table is traditionally a “bucket”. In the case of sets, it’s just a single item that the bucket will hold
- For 64-bit CPython,
- each bucket holds a 64-bit hash code and a 64-bit pointer to the element value
- so the table doesn’t need to keep track of indices, offsets work fine since they are fixed-width.
- Also, it keeps at least 1/3 of the buckets empty; when the table gets too crowded it is resized, so there's some amortisation happening there as well.
Hash Table Algo for sets

in the flowchart, notice that the first step includes a modulo operation; this is why insertion order is not preserved: the result of taking the hash values modulo the table size doesn't follow insertion order, it spreads the entries around.
on hash collisions, the probing can be done in various ways. CPython uses linear probing but also mitigates the harms of using linear probing: Incrementing the index after a collision is called linear probing. This can lead to clusters of occupied buckets, which can degrade the hash table performance, so CPython counts the number of linear probes and after a certain threshold, applies a pseudo random number generator to obtain a different index from other bits of the hash code. This optimization is particularly important in large sets.
the last step is to actually do an equality check on the value. this is why for something to be hashable, two dunder functions must be implemented:
__hash__ and __eq__
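A minimal hashable-class sketch (Coordinate is a made-up class): __hash__ and __eq__ must agree, so equal objects hash to the same value.

```python
class Coordinate:
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def __eq__(self, other):
        return (isinstance(other, Coordinate)
                and (self.lat, self.lon) == (other.lat, other.lon))

    def __hash__(self):
        # hash the same fields that __eq__ compares
        return hash((self.lat, self.lon))

print({Coordinate(55.76, 37.62)} == {Coordinate(55.76, 37.62)})  # True
```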
Hash table usage for dicts
Dictionary implementation benefits from 2 memory optimisations. Here’s a summary of it:
Here's a summary of the **two major memory optimizations** for modern Python dictionaries, as described in the referenced Fluent Python article:

1. **Key-Sharing Dictionaries (PEP 412)**
   - Introduced in Python 3.3, this optimization allows multiple dictionaries that share the same set of keys (such as instance `__dict__` for objects of the same class) to share a single "keys table."
   - Only the values are stored separately for each dictionary; the mapping from keys to indices is shared.
   - This greatly reduces memory usage for objects of the same type, especially when many objects have the same attributes[1].

2. **Compact Dictionaries**
   - Modern Python dictionaries use a split-table design, separating the storage of keys and values from the hash table itself.
   - The hash table stores indices into a compact array of keys and values, rather than storing the full key-value pairs directly in the hash table.
   - This reduces memory overhead, improves cache locality, and keeps insertion order predictable and efficient[1].

**In summary:**
- **Key-sharing dictionaries** save memory by sharing the key structure among similar dicts.
- **Compact dicts** store keys and values in separate, dense arrays, minimizing wasted space and improving performance.

[1] https://www.fluentpython.com/extra/internals-of-sets-and-dicts/
[2] https://www.geeksforgeeks.org/python/minimizing-dictionary-memory-usage-in-python/
[3] https://python.plainenglish.io/optimizing-python-dictionaries-a-comprehensive-guide-f9b04063467a
[4] https://stackoverflow.com/questions/10264874/python-reducing-memory-usage-of-dictionary
[5] https://labex.io/tutorials/python-how-to-understand-python-dict-memory-scaling-450842
[6] https://www.youtube.com/watch?v=aJpk5miPaA8
[7] https://www.reddit.com/r/pythontips/comments/149qlts/some_quick_and_useful_python_memory_optimization/
[8] https://www.tutorialspoint.com/How-to-optimize-Python-dictionary-access-code
[9] https://labex.io/tutorials/python-how-to-understand-python-dictionary-sizing-435511
[10] https://www.joeltok.com/posts/2021-06-memory-dataframes-vs-json-like/
[11] https://www.linkedin.com/advice/0/what-strategies-can-you-use-optimize-python-dictionaries-fqcuf

Original implementation

- there are 3 fields to keep, 64 bits each
- first two fields play the same role as they do in the implementation of sets. To find a key, Python computes the hash code of the key, derives an index from the key, then probes the hash table to find a bucket with a matching hash code and a matching key object. The third field provides the main feature of a dict: mapping a key to an arbitrary value
Optimisation 1: Compact implementation

- there's an extra indices table that has a smaller width (hence compact)
- Raymond Hettinger observed that significant savings could be made if the hash code and pointers to key and value were held in an entries array with no empty rows, and the actual hash table were a sparse array with much smaller buckets holding indexes into the entries array
Optimisation 2: Key-Sharing Dictionary ⭐️
The combined-table layout is still the default when you create a dict using literal syntax or call dict(). A split-table dictionary is created to fill the __dict__ special attribute of an instance, when it is the first instance of a class. The keys table is then cached in the class object. This leverages the fact that most Object Oriented Python code assigns all instance attributes in the __init__ method. That first instance (and all instances after it) will hold only its own value array. If an instance gets a new attribute not found in the shared keys table, then this instance's __dict__ is converted to combined-table form. However, if this instance is the only one in its class, the __dict__ is converted back to split-table, since it is assumed that further instances will have the same set of attributes and key sharing will be useful.
Practical Consequences
of how sets work:
- need to implement the __hash__ and __eq__ functions
- efficient membership testing; the possible overhead is the small number of probes that might be needed to find a matching element or an empty bucket
- Memory overhead:
- compared to an array of pointers (the most compact layout), sets have significant memory overhead: the hash table adds a hash code per entry, and at least 1/3 of the buckets are kept empty to minimize collisions
- Insertion order is somewhat preserved but it’s not reliable.
- Adding elements to a set may change the order of other elements. That’s because, as the hash table is filled, Python may need to recreate it to keep at least ⅓ of the buckets empty. When this happens, elements are reinserted and different collisions may occur.
of how dicts work:
- need to implement both the dunder methods __hash__ and __eq__
- Item ordering preserved in the entries table
- To save memory, avoid creating instance attributes outside of the init method. If all instance attributes are created in init, the dict of your instances will use the split-table layout, sharing the same indices and key entries array stored with the class.
Modern dict Syntax
- dict Comprehensions
Unpacking Mappings
- we can use the unpacking operator ** when keys are all strings
- if there are any duplicates in the keys, the later entries overwrite the earlier ones
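A couple of lines showing both points (later duplicates win):

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

print({**d1, **d2})   # {'a': 1, 'b': 3, 'c': 4} -- later 'b' wins

def show(**kwargs):   # ** in a call requires all-string keys
    print(kwargs)

show(**d1, c=5)       # {'a': 1, 'b': 2, 'c': 5}
```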
Merging Mappings with | (the union operator)
- there's an in-place merge |= and a normal merge | that creates a new mapping
- it's supposed to look like the set union operator, and you're doing a union of two mappings
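The same pair of dicts with the union operators (requires Python >= 3.9):

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

print(d1 | d2)   # new mapping: {'a': 1, 'b': 3, 'c': 4}

d1 |= d2         # in-place merge into d1
print(d1)        # {'a': 1, 'b': 3, 'c': 4}
```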
Syntax & Structure: Pattern Matching with Mappings cool
this will work with anything that is a subclass or virtual subclass of Mapping
we can use the usual tools for this:
can use partial matching
```python
data = {"a": 1, "b": 2, "c": 3}

match data:
    case {"a": 1}:
        print("Matched 'a' only")
    case {"a": 1, "b": 2}:
        print("Matched 'a' and 'b'")
    case _:
        print("No match")
# in this case, the order of the cases matters; the first match is evaluated
```

can capture the remaining keys using the **rest syntax

```python
match data:
    case {"a": 1, **rest}:
        print(f"Matched 'a', rest: {rest}")
```

can be arbitrarily deeply nested
```python
data = {
    "user": {
        "id": 42,
        "profile": {
            "name": "Alice",
            "address": {"city": "Wonderland"}
        }
    }
}

match data:
    case {
        "user": {
            "profile": {
                "address": {"city": city_name}
            }
        }
    }:
        print(f"City is {city_name}")
    case _:
        print("No match")
```
Keys in the pattern must be literals (not variables), but values can be any valid pattern, including captures, literals, or even further nested patterns
Pattern matching works with any mapping type (not just dict), as long as it implements the mapping protocol
- Guards (if clauses) can be used to add extra conditions to a match.
More on virtual sub-classes (and how it’s similar to mixins)
should be used when we can’t control the class (e.g. it’s an external module) but we want to adapt it
allows the indication that a class conforms to the interface of another – to adapt to multiple interfaces
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80A **virtual subclass** in Python refers to a class that is recognized as a subclass of an abstract base class (ABC) without actually inheriting from it in the traditional sense. This mechanism is provided by the `abc` module and is achieved by *registering* a class as a virtual subclass of an ABC using the `register()` method[4][5][8]. ### Core Mental Model - **Traditional subclassing**: A class (the subclass) inherits from another (the superclass), forming a direct relationship. Methods and attributes are inherited, and `issubclass()` and `isinstance()` reflect this relationship[3]. - **Virtual subclassing**: A class is *declared* to be a subclass of an ABC at runtime, without modifying its inheritance tree or MRO (Method Resolution Order). This is done by calling `ABC.register(SomeClass)`. After registration, `issubclass(SomeClass, ABC)` and `isinstance(instance, ABC)` will return `True`, but `SomeClass` does not actually inherit from `ABC`[4][5][8]. ### Why Use Virtual Subclasses? - **Third-party integration**: If you want to treat classes from external libraries as conforming to your interface (ABC), but you cannot or do not want to modify their source code to inherit from your ABC, you can register them as virtual subclasses[1][8]. - **Interface compliance**: Virtual subclassing is a way to declare that a class “conforms to” an interface, even if it doesn’t inherit from it, as long as it implements the required methods (i.e., it follows the protocol)[2][5]. - **Decoupling**: It allows you to decouple interface definition (the ABC) from implementation, enabling more flexible and extensible designs. ### Example Suppose you have an ABC and an external class: ```python from abc import ABC class Car(ABC): def drive(self): pass class Tesla: def drive(self): print("Driving in Tesla") ``` You want to use `isinstance(obj, Car)` to check if an object can be driven, but `Tesla` does not inherit from `Car`. You can register it: ```python Car.register(Tesla) print(issubclass(Tesla, Car)) # True print(isinstance(Tesla(), Car)) # True ``` Now, `Tesla` is a *virtual subclass* of `Car`, even though it doesn't inherit from it[4][5][8]. ### Key Properties - **No inheritance**: Virtual subclasses do not inherit methods or properties from the ABC. Registration only affects `issubclass()` and `isinstance()` checks[4][8]. - **No MRO change**: The ABC does not appear in the virtual subclass’s MRO, so `super()` calls and method resolution are unaffected[4]. - **Runtime declaration**: Registration can be done at runtime, providing flexibility for dynamic systems[4][5]. 
### Relationship to Other Python Typing Models

| Model | Relationship Mechanism | Example Use Case |
|-----------------------|-------------------------------|---------------------------------------------------|
| Duck typing | Implements required interface | Any object with `drive()` method can be used |
| Virtual subclass | Registered with ABC | External class made compatible with ABC interface |
| Classical inheritance | Inherits from superclass | Subclass extends or customizes base functionality |

Virtual subclassing sits between **duck typing** (no explicit relationship, just interface compliance) and **classical inheritance** (explicit parent-child relationship), allowing you to *declare* interface compliance without direct inheritance[1][6].

### Reference Implementation

The mechanism is built into the `abc.ABCMeta` metaclass, which provides the `register()` method for this purpose[4][5].

> "You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as 'virtual subclasses' – these and their descendants will be considered subclasses of the registering ABC by the built-in function, but the registering ABC won't show up in their MRO nor will method implementations defined by the registering ABC be callable..."[4]

### Practical Implications

- **Type checks**: Enables more expressive and flexible type checks in frameworks and libraries.
- **Backward compatibility**: Allows legacy or third-party classes to be integrated into new type hierarchies without code changes.
- **Protocol enforcement**: Encourages protocol-oriented design in a language that is otherwise dynamically typed.

In summary, a **virtual subclass** in Python is a class that, through explicit registration, is treated as a subclass of an abstract base class for type-checking purposes, without requiring direct inheritance or MRO modification[4][5][8]. This is a powerful tool for interface-oriented design and decoupled architectures.

[1] https://stackoverflow.com/questions/51666120/whats-the-usage-of-a-virtual-subclass
[2] https://www.reddit.com/r/learnpython/comments/9fgdib/virtual_subclasses/
[3] https://realpython.com/ref/glossary/subclass/
[4] https://docs.python.org/3/library/abc.html
[5] https://www.demo2s.com/python/python-virtual-subclasses.html
[6] https://stackoverflow.com/questions/51666120/whats-the-usage-of-a-virtual-subclass/51666808
[7] https://en.wikipedia.org/wiki/Virtual_class
[8] https://kavianam.ir/Virtual-Subclass-in-Python
[9] https://stackoverflow.com/questions/57602862/what-is-indirect-and-virtual-subclass
[10] https://hynek.me/articles/python-subclassing-redux/

- a virtual subclass should be seen more like a mixin rather than multiple inheritance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54**Virtual subclassing** in Python is *not* the canonical or idiomatic mechanism for achieving "multiple extends" (i.e., multiple inheritance) or adapting to multiple interfaces in the way you might do in statically-typed languages like Java or C#. Instead, Python supports **multiple inheritance** natively and directly through its class definition syntax, and this is the primary mechanism for a class to inherit from multiple parents and thus "implement multiple interfaces"[1][5][6]. ### Multiple Inheritance: The Pythonic Way Python allows a class to inherit from any number of parent classes simply by listing them in the class definition: ```python class MyClass(InterfaceA, InterfaceB, SomeBase): pass ``` This means `MyClass` will inherit all methods and attributes from `InterfaceA`, `InterfaceB`, and `SomeBase`, and will be considered a subclass of each for purposes of `issubclass()` and `isinstance()` checks[1][5][6]. This is the direct and idiomatic way to "extend" multiple interfaces or base classes in Python, and is how you would typically express that a class conforms to multiple protocols or contracts. ### Virtual Subclassing: Purpose and Limitations **Virtual subclassing** via `ABC.register(SomeClass)` is a mechanism for declaring that a class should be *recognized* as a subclass of an abstract base class (ABC) for the purposes of type checking, without actually inheriting from it. This is especially useful when you want to adapt a class (often from a third-party library) to an interface you control, but cannot or do not want to modify its source code[7]. - **Virtual subclassing does not provide code sharing**: The registered class does not inherit any methods or properties from the ABC; it is only recognized as a subclass for `issubclass()` and `isinstance()` checks. - **No method resolution order (MRO) change**: The ABC does not appear in the registered class's MRO, so method resolution and `super()` calls are unaffected. - **No attribute/method adaptation**: If the registered class does not implement the required methods, you get runtime errors. ### Adapting to Multiple Interfaces If you want a class to *adapt* to multiple interfaces: - **If you control the class**: Use multiple inheritance directly. This is idiomatic and gives you both interface conformance and code sharing[1][5][6]. - **If you do not control the class**: Use virtual subclassing to declare interface conformance, but be aware this is only for type checks and does not provide actual method implementations[7]. ### Mixin Pattern A related idiom is the **mixin**: a class designed to be used as a base class that provides additional methods, typically without defining its own state. Mixins are commonly used in Python to modularly add functionality via multiple inheritance, and are a way to "compose" behaviors without the pitfalls of deep inheritance hierarchies[6]. ### Summary Table | Mechanism | Code Sharing | Interface Conformance | Idiomatic Use | |---------------------|-------------|----------------------|----------------------| | Multiple Inheritance| Yes | Yes | Implementing multiple interfaces, mixins | | Virtual Subclassing | No | Yes (type checks) | Adapting foreign classes to your ABCs | **In summary:** Python's primary and idiomatic way to "extend" multiple interfaces is through direct multiple inheritance. 
**Virtual subclassing** is a supplementary mechanism for adapting existing classes to an interface for type checks, but it does not provide code sharing or actual interface adaptation[1][5][6][7]. [1] https://www.digitalocean.com/community/tutorials/understanding-class-inheritance-in-python-3 [2] https://stackoverflow.com/questions/66849568/issubclass-and-multiple-inheritance [3] https://stackoverflow.com/questions/27545888/is-multiple-inheritance-from-the-same-base-class-via-different-parent-classes-re [4] https://softwareengineering.stackexchange.com/questions/291061/is-there-a-better-pattern-than-multiple-inheritance-here [5] https://dataplatform.cloud.ibm.com/docs/content/wsd/nodes/scripting_guide/clementine/jython/clementine/python_inheritance.html?audience=wdp&context=dph&context=analytics&context=analytics&context=analytics&context=analytics&context=analytics&context=analytics&context=cpdaas [6] https://openstax.org/books/introduction-python-programming/pages/13-5-multiple-inheritance-and-mixin-classes [7] https://hynek.me/articles/python-subclassing-redux/ [8] https://docs.python.org/3/tutorial/classes.html [9] https://realpython.com/inheritance-composition-python/ [10] https://www.geeksforgeeks.org/python/multiple-inheritance-in-python/
Standard API of Mapping Types
The recommendation is to wrap a dict by composition instead of subclassing the Collection, Mapping, MutableMapping ABCs.
Note that because everything ultimately relies on the hash table, the keys must be hashable (values do not need to be hashable).
What Is Hashable
- ✅ User-defined types:
  for user-defined types, the hash code is the `id()` of the object, and the `__eq__` method from the `object` parent class compares the object ids.
gotcha: there’s a salt applied to hashing
And the salt differs across python processes.
The hash code of an object may be different depending on the version of Python, the machine architecture, and because of a salt added to the hash computation for security reasons. The hash code of a correctly implemented object is guaranteed to be constant only within one Python process.
Overview of Common Mapping Methods: using
dict,defaultdictandOrderedDict:NOTER_PAGE: (115 . 0.580146)
Inserting or Updating Mutable Values: when to use setdefault
Should use setdefault when you want to mutate the mapping and there’s nothing there
E.g. you wanna fill in empty default values
so instead of doing this, which performs at least two searches through the dict index (three if the key is missing) ⛔️
```python
import re
import sys

WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# display in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
```

we could instead do just a single search within the dict index:
```python
"""Build an index mapping word -> list of occurrences"""

import re
import sys

WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index.setdefault(word, []).append(location)

# display in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
```

setdefault returns the value, so it can be updated without requiring a second search.
Automatic Handling of Missing Keys
We have 2 options here.
defaultdict: Another Take on Missing Keys
- it's actually a callable that we pass as an argument, so when we write `bool` or `list` we're actually passing the constructor of those built-ins
- the callable is stored in `default_factory`, and we can replace the factory as we wish!
- interesting: a membership check on a key that doesn't exist will not call the default factory; only `__getitem__` triggers it (see the sketch below)
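A minimal sketch of the points above, using only the standard library: `list` is the `default_factory`, and it only fires on `__getitem__` misses.

```python
from collections import defaultdict

index = defaultdict(list)     # list is the default_factory callable
index['new'].append(1)        # missing key: list() is called, then we append
print(index['new'])           # [1]
print('other' in index)       # False -- a membership check does not create the key
print(index.get('other'))     # None -- .get() does not trigger the factory either
```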
The __missing__ Method
:PROPERTIES: :NOTER_PAGE: (121 . 0.519175)
TLDR: subclass `UserDict` instead of `dict` to avoid these issues.

Take note of the nuances in the implementation shown, because they avoid infinite recursion.

It's important to think about how method delegation may introduce chances of infinite recursion.

Also, be aware of which fallback methods the built-in methods use (see the sketch below).
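A sketch along the lines of the book's `StrKeyDict0` (a `dict` subclass that looks up nonstring keys as `str`); the `isinstance` guard and the `self.keys()` lookups are exactly the recursion-avoiding nuances the note refers to.

```python
class StrKeyDict0(dict):
    """Sketch: look up missing nonstring keys by converting them to str."""

    def __missing__(self, key):
        if isinstance(key, str):        # without this guard, a missing str key
            raise KeyError(key)         # would recurse forever via self[str(key)]
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]            # delegate to __getitem__, which uses __missing__
        except KeyError:
            return default

    def __contains__(self, key):
        # use self.keys(), not `in self`, to avoid recursing into __contains__
        return key in self.keys() or str(key) in self.keys()
```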
note: `k in my_dict` is faster than `k in my_dict.keys()`

Technically, `k in my_dict` is faster than `k in my_dict.keys()` because it avoids the attribute lookup to find the `.keys` method.
Inconsistent Usage of __missing__ in the Standard Library
TLDR: subclass `UserDict` instead of `dict` to avoid these issues; subclassing built-in types is tricky! (will come up later in the book).

Basically, this dunder method is used inconsistently across the standard library. Be careful if you want to subclass anything here; it may result in infinite recursion.
Variations of dict
collections.OrderedDict
- mostly, the modern implementation of `dict` is good enough
- `OrderedDict` has some minor differences from the modern `dict`:
  - it handles frequent reordering ops better than `dict`, which is useful to track recent accesses, as in an LRU cache
- use it for simple implementations of an LRU cache
collections.ChainMap
- chains together multiple mappings by holding references to them (no copying)
- any insertions or updates only affect the first mapping in the chain
- it's useful for hierarchical search (look in d1, then d2, then d3), prioritising results found in the earlier mappings (see the sketch below)
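A minimal sketch with two plain dicts, just to show the lookup priority and where writes land:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}
chain = ChainMap(d1, d2)

print(chain['b'])   # 2 -- the earlier mapping wins the lookup
chain['c'] = -1     # updates and insertions only touch the first mapping
print(d1)           # {'a': 1, 'b': 2, 'c': -1}
print(d2)           # {'b': 20, 'c': 30} -- unchanged
```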
collections.Counter
- counters are great!!!
- GOTCHA: when things are tied, only one is returned.
example: if the counter looks like `Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})`, then `ct.most_common(3)` returns `[('a', 10), ('z', 3), ('b', 2)]` — even though 'b' and 'r' are tied, only 'b' makes the cut. Bot answer on why (abridged):

- `most_common(n)` returns exactly the first `n` items of the elements sorted by count (descending); ties at the cutoff are not expanded.
- for equal counts, the order is the insertion order (dicts preserve insertion order since Python 3.7), so whichever tied element was seen first takes the last slot.
- to include all tied elements, post-process: take the count of the nth item and keep everything with a count >= that cutoff:

```python
from collections import Counter

ct = Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
n = 3
cutoff = ct.most_common(n)[-1][1]
all_tied = [(k, v) for k, v in ct.items() if v >= cutoff]
print(all_tied)   # includes both ('b', 2) and ('r', 2)
```
shelve.Shelf
- shelves are for storing pickle jars
- shelves are persistent storage for a mapping of strings to pickle objects
- A Shelf instance is a context manager, so you can use a with block to make sure it is closed after use.
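A minimal sketch of the context-manager usage; the filename `'inventory'` is just a placeholder.

```python
import shelve

# keys must be str; values can be any picklable object
with shelve.open('inventory') as db:      # placeholder filename
    db['spam'] = {'qty': 10, 'price': 2.5}

with shelve.open('inventory') as db:      # reopened later: the data was persisted
    print(db['spam'])
```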
Ref “Pickle’s nine flaws”
And here’s a bot summary of it:
Bot summary of Ned Batchelder's "Pickle's nine flaws" (abridged; https://nedbatchelder.com/blog/202006/pickles_nine_flaws.html):

1. Insecure: unpickling untrusted data can execute arbitrary code, since the process can invoke any callable in the environment.
2. Old pickles look like old code: pickles store the object structure as it was at pickling time, so unpickled data may not match the current classes.
3. Implicit: you can't control how objects are serialized (e.g. no way to store a datetime as an ISO 8601 string instead).
4. Over-serializes: everything on the object is pickled, including caches or attributes you'd rather exclude; an unpicklable attribute raises an exception.
5. `__init__` isn't called: unpickled objects can skip essential setup and end up inconsistent.
6. Python only: cross-language support is impractical, so pickles are hard to share with non-Python systems.
7. Unreadable: binary blobs that standard tools can't inspect or search.
8. Appears to pickle code: functions and classes are stored by name only, so the code must already exist when unpickling.
9. Slow: slower and less compact than many alternative formats.

Some of this can be mitigated with `__getstate__`/`__reduce__`, but at that point alternative formats (JSON, protocol buffers) are usually more robust. Pickle is best reserved for trusted, Python-only, short-lived data interchange.
Subclassing UserDict Instead of dict
- key idea: it uses composition and keeps an internal dict in the `data` attribute
- any methods we implement as we extend it should operate on `self.data` (see the sketch below)
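A sketch in the spirit of the book's `UserDict`-based `StrKeyDict`: because `UserDict` composes an internal dict, the overrides below simply read and write `self.data` with no recursion worries.

```python
import collections

class StrKeyDict(collections.UserDict):
    """Sketch: store and look up every key as str."""

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data       # self.data is the internal dict

    def __setitem__(self, key, item):
        self.data[str(key)] = item         # normalise keys to str on insertion
```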
Immutable Mappings
we can use a read-only `MappingProxyType` from the `types` module to expose a read-only proxy

the constructor in a concrete Board subclass would fill a private mapping with the pin objects, and expose it to clients of the API via a public .pins attribute implemented as a mappingproxy. That way the clients would not be able to add, remove, or change pins by accident.
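A minimal sketch of the read-only, but live, proxy:

```python
from types import MappingProxyType

pins = {1: 'A'}
pins_proxy = MappingProxyType(pins)

print(pins_proxy[1])     # 'A' -- reads pass through
pins[2] = 'B'            # mutate the underlying dict...
print(pins_proxy[2])     # 'B' -- ...and the proxy reflects it (it is dynamic)
# pins_proxy[3] = 'C'    # TypeError: 'mappingproxy' object does not support item assignment
```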
Dictionary Views
- the views are proxies, so they stay up to date: any changes to the original mapping are visible through them
- because they are view objects, not sequences, they are not subscriptable, so something like `myvals[0]` won't work. We could convert a view to a list, but then it's a copy, no longer a live, dynamic, read-only proxy.
Practical Consequences of How dict Works
why we should NOT add instance attrs outside of
__init__ functions

That last tip about instance attributes comes from the fact that Python's default behavior is to store instance attributes in a special `__dict__` attribute, which is a dict attached to each instance. Since PEP 412—Key-Sharing Dictionary was implemented in Python 3.3, instances of a class can share a common hash table, stored with the class. That common hash table is shared by the `__dict__` of each new instance that has the same attribute names as the first instance of that class when `__init__` returns. Each instance `__dict__` can then hold only its own attribute values as a simple array of pointers. Adding an instance attribute after `__init__` forces Python to create a new hash table just for the `__dict__` of that one instance.
also KIV the implementation of `__slots__` and how that is an even better optimisation.
Set Theory
As we found out from the extension write-up, the intersection operator makes a great one-liner: `found = len(needles & haystack)`, or `found = len(set(needles) & set(haystack))` to be more general (though there is the overhead of building the sets).
Set Literals
- using the set literal (`{1, 2, 3}`) for construction is faster than using the constructor (`set([1, 2, 3])`), because the constructor requires a name lookup to fetch `set`, a list build, and a call
- the literal compiles directly to a `BUILD_SET` bytecode (see the `dis` sketch below)
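A quick way to see this for yourself with the `dis` module (the exact opcodes vary by Python version, but the literal form goes straight to `BUILD_SET`, while the constructor form needs a name lookup, a list build, and a call):

```python
from dis import dis

dis('{1}')         # look for the BUILD_SET opcode in the output
dis('set([1])')    # LOAD_NAME set, build the list, then call it
```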
Set Comprehensions
- looks almost the same as dictcomps
Practical Consequences of How Sets Work
- Set Operations
Set Operations on dict Views
- `.keys()` and `.items()` views behave like `frozenset` (`.items()` only if all the values in the dict are hashable); `.values()` views do not support set operations
- Even better: the set operators on dictionary views are compatible with `set` instances (see the sketch below).
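A minimal sketch of set operations on key views, including mixing them with a real set:

```python
d1 = dict(a=1, b=2, c=3)
d2 = dict(b=20, d=40)
s = {'a', 'e'}

print(d1.keys() & d2.keys())   # {'b'} -- key views behave like frozenset
print(d1.keys() | s)           # {'a', 'b', 'c', 'e'} -- and mix with real sets
```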
Chapter Summary
Further Reading
Chapter 4. Unicode Text Versus Bytes
What’s New in This Chapter
Character Issues
- “string as a sequence of characters” needs the term “character” to be defined well
- in Python 3, a "character" means a Unicode character
- Unicode char separates:
- identity of the char => refers to its code point
- the byte representation for the char => dependent on the encoding used (codec between code points and byte sequences)
Byte Essentials
- binary sequences: there are 2 built-in types:
  - mutable: `bytearray`
  - immutable: `bytes`
- mutable:
- Each item in bytes or bytearray is an integer from 0 to 255
- literal notation depends on the byte value (it's just a display convention):
  - bytes in the printable ASCII range are displayed as the ASCII characters themselves
  - special chars like tab, newline, carriage return, and `\` use escape sequences (`\t`, `\n`, `\r`, `\\`)
  - quote delimiters appearing inside the sequence are escaped
  - everything else uses hex escape notation, e.g. `\x00` for the null byte
- most `str` methods work the same on binary sequences, except those that do formatting and those that depend on Unicode data, which won't work:
  - e.g. `casefold`
- regexes work the same only if the regex is compiled from a binary sequence instead of a `str`
- how to build `bytes` or `bytearray` (see the sketch below):
  - use `bytes.fromhex()`
  - call the constructor with a `str` plus an encoding, e.g. `bytes('mystr', encoding='utf_8')` (equivalent to `'mystr'.encode('utf_8')`)
  - use something that implements the buffer protocol to create the new binary sequence from a source object (e.g. an `array.array`, or a `memoryview`)
    - this is a low-level operation and may need an explicit type cast
    - generally, this will always copy the bytes from the source — except for `memoryview`, which is a direct proxy to the memory
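A short sketch of the construction options above (the `array` output assumes a little-endian machine):

```python
import array

print(bytes.fromhex('31 4B CE A9'))           # b'1K\xce\xa9' -- parse pairs of hex digits
print(bytes('café', encoding='utf_8'))        # b'caf\xc3\xa9' -- str + encoding

numbers = array.array('h', [-2, -1, 0, 1, 2]) # typecode 'h': signed 16-bit ints
octets = bytes(numbers)                       # copies the raw buffer of the array
print(octets)                                 # b'\xfe\xff\xff\xff\x00\x00\x01\x00\x02\x00'
```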
Indexing vs slicing & why
stris specialStrings are really the special kind here.
For bytes, bytearray, list, etc., `seq[0]` returns a single element (e.g., an int), while `seq[:1]` returns a new sequence of the same type containing just that element. This is different from str, where both return a string. This distinction is important when working with binary data or non-string sequences in Python.
More descriptive:
Bot answer (abridged): for every sequence type except `str`, one item is not the same as a slice of length 1 — indexing returns a single element, slicing returns a new sequence of the same type.

```python
cafe = bytes('café', encoding='utf_8')
print(cafe[0])    # 99 -- an int: the value of the first byte
print(cafe[:1])   # b'c' -- a bytes object of length 1

lst = [10, 20, 30]
print(lst[0])     # 10
print(lst[:1])    # [10]

s = 'hello'
print(s[0], s[:1])   # 'h' 'h' -- str is the special case: both return a str
```

Mental model: indexing (`seq[i]`) returns the element at position `i` (its type depends on the sequence type); slicing (`seq[i:j]`) returns a new sequence of the same type containing those elements.
Basic Encoders/Decoders
Highlight on page 153
Contents
Each codec has a name, like ‘utf_8’
Comment
so utf-8 is a codec here
Highlight on page 154
Contents
like ASCII and even the multibyte GB2312, cannot represent every Unicode character. The UTF encodings, however, are designed to handle every Unicode code point.
Understanding Encode/Decode Problems
Errors that we can expect:
- (generic) `UnicodeError`, `UnicodeDecodeError`, `UnicodeEncodeError`
- When loading libraries, we might end up facing a `SyntaxError` as well, because of encoding issues
Coping with UnicodeEncodeError
- the error handlers for encoding errors include `'xmlcharrefreplace'`: any character that can't be encoded is replaced with an XML numeric character reference (`&#<code point>;`), so no information is lost. Bot answer (abridged):
  - characters that fit the target encoding are kept as-is; the rest become `&#NNNN;` references, which can later be parsed back into the original characters, so the round trip is lossless
  - e.g. `'Café'.encode('ascii', errors='xmlcharrefreplace')` gives `b'Caf&#233;'`
  - contrast with the other handlers: `'replace'` swaps in `?` (data loss), `'ignore'` drops the character (data loss), `'backslashreplace'` uses Python escape sequences (reversible, but not standard XML/HTML)
  - so if you must encode to a limited charset (ASCII, Latin-1) and cannot afford to lose data, `'xmlcharrefreplace'` is the option to reach for
Coping with UnicodeDecodeError
Highlight on page 156
Contents
On the other hand, many legacy 8-bit encodings like ‘cp1252’, ‘iso8859_1’, and ‘koi8_r’ are able to decode any stream of bytes, including random noise, without reporting errors. Therefore, if your program assumes the wrong 8-bit encoding, it will silently decode garbage.
Comment
utf8/16 will sound off because it’s a strict error check
the older 8bit codecs will do it silently
Highlight on page 157
Contents
“�” (code point U+FFFD), the official Unicode REPLACEMENT CHARACTER intended to represent unknown characters.
Comment
there’s an official REPLACEMENT CHARACTER
SyntaxError When Loading Modules with Unexpected Encoding
- utf8 default for python source code
- fix this by declaring the encoding explicitly in a magic comment at the top of the file, OR just convert the file to UTF-8:

```python
# coding: cp1252
```
How to Discover the Encoding of a Byte Sequence
- you can't, but you can make a good guess
- `chardet` exists for this reason: it does an estimated (statistical) detection of the encoding
Highlight on page 159
Contents
human languages also have their rules and restrictions, once you assume that a stream of bytes is human plain text, it may be possible to sniff out its encoding using heuristics and statistics. For example, if b'\x00' bytes are common, it is probably a 16- or 32-bit encoding, and not an 8-bit scheme, because null characters in plain text are bugs. When the byte sequence b'\x20\x00' appears often, it is more likely to be the space character (U+0020) in a UTF-16LE encoding, rather than the obscure U+2000 EN QUAD character—whatever that is. That is how the package "Chardet—The Universal Character Encoding Detector" works to guess one of more than 30 supported encodings. Chardet is a Python library that you can use in your programs, but also includes a command-line utility, chardetect.
Comment
typically an encoding is declared – so you have to be told what encoding it is
however, it’s possible to guess probabilistically what the encoding could be.
there are packages for that (Chardet)
BOM: A Useful Gremlin
- Byte-Order Mark: helps us know if the machine that the encoding was performed on is little or big endian.
- endianness becomes a problem only for any encoding format that takes more than a byte (so for UTF-16 and UTF-32) ==> so BOM only matters for them
- so BOM not needed for UTF-8
- but it can still be added in (discouraged though)
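A tiny check of the BOM behaviour (the `b'\xff\xfe'` prefix assumes a little-endian build; the explicit-endian codecs emit no BOM):

```python
u16 = 'El Niño'.encode('utf_16')
print(u16[:2])                           # b'\xff\xfe' -- the BOM (ZERO WIDTH NO-BREAK SPACE)
print('El Niño'.encode('utf_16le')[:2])  # b'E\x00' -- utf_16le/utf_16be add no BOM
```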
Highlight on page 160
Contents
UTF-16 encoding prepends the text to be encoded with the special invisible character ZERO WIDTH NO-BREAK SPACE (U+FEFF).
Highlight on page 160
Contents
This whole issue of endianness only affects encodings that use words of more than one byte, like UTF-16 and UTF-32
Highlight on page 161
Contents
using UTF-8 for general interoperability. For example, Python scripts can be made executable in Unix systems if they start with the comment: #!/usr/bin/env python3. The first two bytes of the file must be b'#!' for that to work, but the BOM breaks that convention. If you have a specific requirement to export data to apps that need the BOM, use UTF-8-SIG but be aware that Python's codecs documentation says: "In UTF-8, the use of the BOM is discouraged and should generally be avoided."
Comment
use UTF-8-SIG because will be harmless
also note that the python codecs documentation says that in utf8, using a BOM (byte order mark) is discouraged.
Handling Text Files & the “Unicode Sandwich”
Here’s the gist of why it’s “unicode sandwich”
- decode bytes on input
- process text only (the meat of the sandwich is the business logic that should use strings)
- encode text on output
The best practice for handling text I/O is the “Unicode sandwich” (Figure 4-2).5 This means that bytes should be decoded to str as early as possible on input (e.g., when opening a file for reading). The “filling” of the sandwich is the business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or decoding in the middle of other processing. On output, the str are encoded to bytes as late as possible.
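A minimal sketch of the sandwich with explicit encodings on both slices (the filename is a placeholder):

```python
# encode as late as possible, on output
with open('cafe.txt', 'w', encoding='utf_8') as fp:   # placeholder filename
    fp.write('café')

# decode as early as possible, on input; the middle of the program sees only str
with open('cafe.txt', encoding='utf_8') as fp:
    text = fp.read()
print(text == 'café')   # True
```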
Highlight on page 161
Contents
The best practice for handling text I/O is the "Unicode sandwich" (Figure 4-2). This means that bytes should be decoded to str as early as possible on input (e.g., when opening a file for reading). The "filling" of the sandwich is the business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or decoding in the middle of other processing. On output, the str are encoded to bytes as late as possible.
Comment
The Unicode sandwich is the best practice for handling text files and their encoding:

bytes -> str (decode bytes as early as possible, i.e. on input)
process text only in the business logic
encode text on output only
Highlight on page 162
Contents
Code that has to run on multiple machines or on multiple occasions should never depend on encoding defaults. Always pass an explicit encoding= argument
Comment
cross-platform code should always explicitly define the encoding value!

Unix machines will default to UTF-8, but on, say, a Windows machine there might be an encoding issue because the default encoding differs.
Highlight on page 163
Contents
TextIOWrapper with the encoding set to a default from the locale
Beware of Encoding Defaults
even within say windows itself, not every application would have the same encoding.
for unix it’s more standardised, so it’s most likely expected to be utf-8
Defaults
Main thing to remember: the most important encoding setting is the one returned by `locale.getpreferredencoding()`.

It can be changed by changing the environment variables.
Normalizing Unicode for Reliable Comparisons
- canonical equivalents exist, but they have different code points under the hood.
- there are several different normalisation forms; for extra safety, normalise strings before saving them (using NFC normalisation, for example)
- gotcha: normalisation can turn some single characters into other, visually identical characters, so strings that look the same may still compare unequal unless both sides are normalised
- string normalisation can be lossy (the compatibility forms especially), so repeated decode/normalise/encode round trips can cause actual data loss
- NFKC and NFKD are examples of such normalisation forms - these forms should only be used for intermediate representations for search & index
- NFC is not sufficient for search and indexing because it preserves compatibility distinctions that are irrelevant (and even counterproductive) for matching.
NFKC/NFKD are used as intermediate representations for search and indexing because they erase these distinctions, enabling robust, user-friendly search behavior—at the cost of losing some original form information, which is why they are not used for storage or display.
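A small demo of canonical vs. compatibility normalization with `unicodedata.normalize`:

```python
from unicodedata import normalize

s1 = 'café'                              # composed 'é' (single code point)
s2 = 'cafe\N{COMBINING ACUTE ACCENT}'    # decomposed: 'e' + combining accent
print(s1 == s2)                                      # False -- different code points
print(normalize('NFC', s1) == normalize('NFC', s2))  # True -- canonical equivalence
print(normalize('NFKC', '½'))                        # '1⁄2' -- compatibility form, lossy
```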
See more info here:
Bot answer (abridged) on why NFC is not enough for search/indexing, and why NFKC/NFKD are used as intermediate forms:

- NFC composes canonically equivalent sequences, but preserves *compatibility* distinctions (ligatures, superscripts, full-width vs. half-width forms); it is essentially lossless.
- NFKC/NFKD also fold compatibility equivalents (the 'ﬁ' ligature becomes 'fi', full-width Latin becomes plain Latin), which is lossy.
- For search you usually want text containing the 'ﬁ' ligature to match a query typed as 'fi', so you normalise both the indexed data and the queries with NFKC/NFKD.
- Because they are lossy, the compatibility forms are only for the internal search/index representation, never for storage or display.

| Use case | Recommended normalization | Rationale |
|-----------------|---------------------------|-----------------------------------------------------------|
| Storage/display | NFC | Preserves distinctions, lossless for most text |
| Search/index | NFKC/NFKD | Folds away compatibility differences for robust matching |
Notes for page 140 V: 39% H: 25%
sequences like ‘é’ and ’e\u0301’ are called “canonical equivalents,” and applications are supposed to treat them as the same. But Python sees two different sequences of code points, and considers them not equal.
Notes for page 140 V: 82% H: 50%
it may be good to normalize strings with normalize(‘NFC’, user_text) before saving.
Case Folding (normalisation transformation)
- folding everything into lowercase
- NOTE: casefold() and str.lower() have ~ 300 code points that return different results
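Two of the well-known cases where the results differ:

```python
print('ß'.lower())     # 'ß'  -- lower() keeps the German sharp s
print('ß'.casefold())  # 'ss' -- casefold() is more aggressive
print('µ'.casefold())  # 'μ'  -- MICRO SIGN folds to Greek small letter mu
```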
Utility Functions for Normalized Text Matching
util functions that might help:
`nfc_equal`, `fold_equal`
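A sketch of those two helpers along the lines of the book's versions:

```python
from unicodedata import normalize

def nfc_equal(str1, str2):
    """Compare after canonical (NFC) normalization only."""
    return normalize('NFC', str1) == normalize('NFC', str2)

def fold_equal(str1, str2):
    """Normalize, then case-fold: for case-insensitive matching."""
    return (normalize('NFC', str1).casefold() ==
            normalize('NFC', str2).casefold())

print(nfc_equal('café', 'cafe\u0301'))   # True
print(fold_equal('Straße', 'strasse'))   # True
```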
Extreme “Normalization”: Taking Out Diacritics
- google search uses this aggressive normalisation based on real world attention that people give to diacritics
- also helps for readable URLs (e.g for latin-based languages)
- one way to call this transformation is “shaving”. We “shave” the diacritics
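A sketch of the "shaving" transformation in the spirit of the book's `shave_marks`: decompose with NFD, drop the combining marks, recompose.

```python
import unicodedata

def shave_marks(txt):
    """Remove all diacritic marks."""
    norm_txt = unicodedata.normalize('NFD', txt)          # split base chars from marks
    shaved = ''.join(c for c in norm_txt
                     if not unicodedata.combining(c))     # drop the combining marks
    return unicodedata.normalize('NFC', shaved)           # recompose what is left

print(shave_marks('café açaí'))   # 'cafe acai'
```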
Sorting Unicode Text
- python sorts by comparing sequences one by one
- for strings, it compares code points
- so to sort non-ASCII text in Python, we have to use `locale.strxfrm` as the sort key to get locale-aware comparisons
Sorting with the Unicode Collation Algorithm
stdlib solution: there's a `locale.strxfrm` to do locale-specific comparisons

The standard way in Python is to use the locale.strxfrm function which, according to the locale module docs, "transforms a string to one that can be used in locale-aware comparisons."
```python
import locale

my_locale = locale.setlocale(locale.LC_COLLATE, 'pt_BR.UTF-8')
print(my_locale)
fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']
sorted_fruits = sorted(fruits, key=locale.strxfrm)
print(sorted_fruits)
```
- use the Unicode Collation Algorithm via the `pyuca` lib
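A sketch with the third-party `pyuca` package (`pip install pyuca`), which sorts by the Unicode Collation Algorithm without touching the locale:

```python
import pyuca  # third-party

coll = pyuca.Collator()
fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']
print(sorted(fruits, key=coll.sort_key))
# ['açaí', 'acerola', 'atemoia', 'cajá', 'caju']
```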
The Unicode Database
Db is in the form of multiple text files.
Contains:
- code point to char name mappings
- metadata about the individual characters and how they are related.
That’s how the str methods isalpha, isprintable, isdecimal, and isnumeric work.
Finding Characters by Name
use the `name()` function from the `unicodedata` library
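For example:

```python
from unicodedata import name

print(name('A'))   # 'LATIN CAPITAL LETTER A'
print(name('€'))   # 'EURO SIGN'
```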
Numeric Meaning of Characters
Some useful string functions here:
`.isnumeric()`, `.isdecimal()`

these compare against the human (Unicode) meaning of the character rather than just its code point.
common string functions may lookup this unicode database
This is responsible for string functions like `isdecimal`, `isnumeric`, …

…the Unicode database records whether a character is printable, is a letter, is a decimal digit, or is some other numeric symbol. That's how the str methods isalpha, isprintable, isdecimal, and isnumeric work. str.casefold also uses information from a Unicode table.
Dual-Mode str and bytes APIs
str Versus bytes in Regular Expressions
- with bytes patterns, `\d` and `\w` only match ASCII characters
- with str patterns, `\d` and `\w` also match digits and word characters beyond ASCII
to make one point: you can use regular expressions on str and bytes, but in the second case, bytes outside the ASCII range are treated as nondigits and nonword characters.
regex patterns using bytes will treat outside-ASCII range chars as nondigits and nonword chars
trivial example to make one point: you can use regular expressions on str and bytes, but in the second case, bytes outside the ASCII range are treated as nondigits and nonword characters.
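A small sketch of the same point, using Arabic-Indic digits as the non-ASCII example:

```python
import re

re_numbers_str = re.compile(r'\d+')      # str pattern: Unicode-aware
re_numbers_bytes = re.compile(rb'\d+')   # bytes pattern: ASCII only

text_str = 'digits: ٥٦٧ and 123'         # includes Arabic-Indic digits
text_bytes = text_str.encode('utf_8')

print(re_numbers_str.findall(text_str))      # ['٥٦٧', '123']
print(re_numbers_bytes.findall(text_bytes))  # [b'123'] -- non-ASCII digits don't match
```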
str Versus bytes in os Functions
- os functions actually abide by the Unicode sandwich: given `str` arguments, they call `sys.getfilesystemencoding()` and encode/decode as soon as they can
Chapter Summary
- remember that 1 char == 1 byte only holds for ASCII and other single-byte encodings; UTF-8 and the other UTF encodings are variable-length, and there's more than one encoding out there
- just always be explicit about encodings when reading them Follow the unicode sandwich and ensure the encoding is explicit always.
- Unicode provides multiple ways of representing some characters, so normalizing is a prerequisite for text matching.
Further Reading
Chapter 5. Data Class Builders
:NOTER_PAGE: (193 . 0.108844)
I think my stance on using data classes is that it should help mock things easily to come up with scaffolds which are easy to replace.
It’s interesting that the type hinting for class vs instance attributes ended up needing to use pseudoclasses specific for this purpose (ClassVar, InitVar)
Link on page 194: typing module documentation
What’s New in This Chapter
Overview of Data Class Builders
- Problem posed:
the `__init__` constructor becomes repetitive boilerplate if all it does is assign attributes from constructor parameters
- 3 options:
  - `collections.namedtuple`
  - `typing.NamedTuple` — newer than `namedtuple`
  - the `@dataclass` decorator from the `dataclasses` module
- How they work:
- they don't rely on inheritance
- type hints are there if we use `NamedTuple` or `dataclass`
- some of them are subclasses of `tuple`
- all of them use metaprogramming techniques to inject methods and data attributes into the class under construction
- Some of them are more updated ways of doing things:
  - `typing.NamedTuple` is newer than `namedtuple`
- Examples:
Named tuple:
define inline
Coordinate = typing.NamedTuple('Coordinate', lat=float, lon=float)defined with a
classstatement Although here, NamedTuple is not a superclass, it’s actually a metaclass1 2 3 4 5 6 7 8 9 10from typing import NamedTuple class Coordinate(NamedTuple): lat: float lon: float def __str__(self): ns = 'N' if self.lat >= 0 else 'S' we = 'E' if self.lon >= 0 else 'W' return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
Using dataclass
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
```
Main Features
- Link on page 198: inspect.get_annotations(MyClass)
- Link on page 198: typing.get_type_hints(MyClass)
Mutability
Out of the 3, only `@dataclass` lets us keep the class mutable (and if we need to, we can mark it as frozen).

The other two are subclasses of `tuple`, so they are immutable.

For the immutable ones, we can build a changed copy using the replace functions (see the sketch below).
NamedTuple as a metaclass customization of a class def
Although NamedTuple appears in the class statement as a superclass, it's actually not. typing.NamedTuple uses the advanced functionality of a metaclass to customize the creation of the user's class.
Correctly reading type hints @ runtime
It will be discussed in more detail later in the book
reading from `__annotations__` directly is not recommended. Instead, the recommended best practice to get that information is to call inspect.get_annotations(MyClass) (added in Python 3.10) or typing.get_type_hints(MyClass) (Python 3.5 to 3.9). That's because those functions provide extra services, like resolving forward references in type hints.
Classic Named Tuples
collections.namedtuple is a factory function
So it’s possible to hack things by adding functions to this subclass.
collections.namedtuple function is a factory that builds subclasses of tuple enhanced with field names, a class name, and an informative repr.
Memory Use by collections.namedtuple
There’s no excess mem usage because it’s the class that will store the attribute names
So it’s same space usage as a tuple.
Each instance of a class built by namedtuple takes exactly the same amount of memory as a tuple because the field names are stored in the class.
Injecting methods into the subclass
this is a hack, shouldn’t be relied upon.
NOTE: No need to name the first arg as self if you’re hacking things by injecting methods
the first argument doesn’t need to be named self. Anyway, it will get the receiver when called as a method.
normal classes method definition,
selfattribute is the receiverjust some extra information about what the receiver is in the context of defining class methods in python
Bot answer (abridged): `self` is just the conventional Python name for the receiver — the instance a method is called on. `obj.method(args)` is effectively `Class.method(obj, args)`, so the instance arrives as the first parameter, and `self.attribute` reads or writes that particular instance's data. Retrieving `obj.method` produces a bound method with the receiver already attached, so the call only needs the remaining arguments. Other OO languages (Ruby, and OOP theory in general) use the term "receiver" explicitly; without it, a method would not know which instance's state to operate on.
Typed Named Tuples
compile-time type annotations: the main feature of named tuples
Classes built by typing.NamedTuple don't have any methods beyond those that collections.namedtuple also generates—and those that are inherited from tuple. The only difference is the presence of the `__annotations__` class attribute—which Python completely ignores at runtime.
Typed Named Tuples
- the type annotations are ignored by python at runtime
Type Hints 101
No Runtime Effect
Type hints not enforced by compiler & interpreter
- main intent is for use by static analysis tools, at rest
The first thing you need to know about type hints is that they are not enforced at all by the Python bytecode compiler and interpreter.
Works at import time!
that’s why importing libraries may fail.
Variable Annotation Syntax
- variable here refers to the fact that variables are being annotated, not that the type hint is variable.
- the syntax is just
var_name: some_type = a_default_value
The Meaning of Variable Annotations
For classic class definitions, survival of annotations & survival of attributes within annotations
:NOTER_PAGE: (206 . 0.086168)
This applies to the classic class definitions, without the named tuples and such.
This makes sense because there’s no reason to keep the annotations.
surviving of annotation <== if there’s a type hint given
surviving of the attribute in the class <== if there’s a value assignable
Note that the `__annotations__` special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class. The `a` survives only as an annotation. It doesn't become a class attribute because no value is bound to it. The `b` and `c` are stored as class attributes because they are bound to values.
Annotations are type annotations for immutable attributes
This is because NT is extended from Tuple class.
Contents
If you try to assign values to nt.a, nt.b, nt.c, or even nt.z, you'll get AttributeError exceptions with subtly different error messages. Try that and reflect on the messages.
Comment
Because it’s read-only instance attribute and it’s expected to be immutable
using the
@dataclassdecorator allows the attrs to persist as instance attributes:NOTER_PAGE: (208 . 0.488788)
Contents
However, there is no attribute named a in DemoDataClass—in contrast with DemoNTClass from Example 5-11, which has a descriptor to get a from the instances as read-only attributes (that mysterious <_collections._tuplegetter>). That's because the a attribute will only exist in instances of DemoDataClass. It will be a public attribute that we can get and set, unless the class is frozen. But b and c exist as class attributes, with b holding the default value for the b instance attribute, while c is just a class attribute that will not be bound to the instances.
Comment
with the `@dataclass` decorator, an attribute that is ONLY type-hinted (no value assigned) exists only on concrete instances of that class, not on the class itself.

the `__annotations__` special attribute is for type hints

The `__annotations__` special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class.
More About @dataclass
Don’t set a custom attribute outside of its constructor function!
:NOTER_PAGE: (209 . 0.862182)
Contents
Setting an attribute after `__init__` defeats the `__dict__` key-sharing memory optimization mentioned in "Practical Consequences of How dict Works" on page 102.
Comment
Reminder: all the attrs for a class should really just be defined within the class itself to benefit from the memory optimisation that it comes with by default
immutability is emulated by methods

Which means it can be bypassed by overriding the implementation of these functions! (the `__setattr__` and `__delattr__` dunder methods)

emulates immutability by generating `__setattr__` and `__delattr__`, which raise `dataclasses.FrozenInstanceError`
Field Options
WARNING: mutable defaults are NOT allowed.
similar to the assignment gotchas where, if we do `my_arr = [[]] * 3`, reusing a mutable reference (the inner list) means that the 3 items all point to the same object; we can see how that would be a problematic bug

therefore, it's illegal to set mutable default values (list, dict, set) when we use dataclasses.
- we can use `default_factory` as a solution to this
`default_factory` helps prevent mutability bugs: if a mutable default value were allowed, many instances would share and edit the same mutable object — a problematic bug. That's why the only supported option is to pass a factory callable when you want a mutable default, so each instance gets its own separate object (see the sketch below).

but note that `@dataclass` only rejects `list`, `dict`, and `set` defaults; instances of other mutable classes are not caught, which is why this remains a common source of mutable-default bugs
The default_factory parameter lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created. This way, each instance of ClubMember will have its own list—instead of all instances sharing the same list from the class, which is rarely what we want and is often a bug.
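A minimal sketch, reusing the book's `ClubMember` idea: each instance gets its own fresh list.

```python
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)   # a new list per instance

a = ClubMember('Ann')
b = ClubMember('Bob')
a.guests.append('Zoe')
print(b.guests)   # [] -- nothing is shared between instances
```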
mental model for sentinel values
``sentinel value’’
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55### Understanding Sentinel Values **Sentinel values** are special values used in programming to signify that a variable or parameter is missing, not set, or has a specific condition that needs to be distinguished from valid values. They serve as markers that help identify states in data structures or function arguments without conflicting with legitimate data. #### Key Characteristics of Sentinel Values 1. **Uniqueness**: A sentinel value is typically unique and not used as a regular value in the context where it is applied. This uniqueness allows it to serve as a clear indicator of absence or a specific condition. 2. **Common Usage**: - **Function Arguments**: In functions, sentinel values can indicate that an optional parameter was not provided. For example, in Python, you might use `None` or a custom sentinel object to differentiate between "no value" and "a value of None". - **Data Structures**: In data structures like linked lists or trees, sentinel nodes may be used to simplify boundary conditions (e.g., head and tail pointers). 3. **Examples**: ```python # Using None as a sentinel def fetch_data(key, default=None): if key not in my_dict: return default # Return the sentinel if key is missing return my_dict[key] # Using a custom sentinel MISSING = object() def get_value(key, default=MISSING): if key not in my_dict: if default is MISSING: raise KeyError(f"{key} not found") return default return my_dict[key] ``` ### Mental Model for Sentinel Values To form a mental model for sentinel values, consider the **concept of markers or flags** in various contexts: 1. **Etymological Basis**: The term "sentinel" originates from the Latin word "sentinella," meaning "to watch." Just as sentinels guard and signal the presence or absence of something important, sentinel values act as indicators in programming. They help you "watch" for specific conditions in your code. 2. **Mental Model**: Think of sentinel values as **flags on a map**: - Imagine a treasure map where certain locations are marked with flags indicating whether they contain treasure (valid data) or are empty (missing data). - Just like those flags help you navigate the map without confusion, sentinel values guide your logic by clearly indicating when something is absent or when a specific condition applies. 3. **Practical Application**: When designing functions or data structures, consider how you can implement sentinel values to handle edge cases gracefully. This approach can prevent errors and make your code more robust by explicitly managing conditions that would otherwise lead to ambiguity. ### Summary Sentinel values are special markers used to indicate missing or special conditions in programming. They provide clarity and help manage edge cases effectively. By conceptualizing them as flags on a map, you can better understand their role in guiding logic and maintaining code integrity. This mental model emphasizes the importance of clear indicators in navigating complex data scenarios, much like sentinels watch over critical points of interest. 
Post-init Processing
Allows us to insert logic after the constructor, to do things like calculations and such.
Highlight on page 213
Contents
method generated by @dataclass only takes the arguments passed and assigns them—or their default values, if missing—to
Highlight on page 213
Contents
Common use cases for __post_init__ are validation and computing field values based on other fields.
- Link on page 214: “Inheritance” section of the dataclasses module documentation
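A small made-up sketch of __post_init__ doing both validation and a computed field (Rect and its fields are hypothetical, not from the book):
```python
from dataclasses import dataclass, field

@dataclass
class Rect:
    width: float
    height: float
    area: float = field(init=False)  # computed, so not an __init__ argument

    def __post_init__(self):
        # validation
        if self.width <= 0 or self.height <= 0:
            raise ValueError('width and height must be positive')
        # computing a field value based on other fields
        self.area = self.width * self.height

r = Rect(3, 4)
print(r)  # Rect(width=3, height=4, area=12)
```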
Typed Class Attributes
Need to use ClassVar from the typing module. This is a pseudotype. Read more here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66Yes, **this is still the standard and recommended way to declare class variables with type hints in the most recent versions of Python (as of Python 3.13)**. ### Why is `ClassVar` Needed? - By default, when you annotate a variable inside a class, **type checkers (like mypy, pyright, or the built-in `dataclasses` module) assume it is an instance variable**—that is, each instance of the class will have its own copy. - If you want to indicate that a variable is a **class variable** (shared across all instances), you must use `typing.ClassVar` in your annotation[1][2]. - This is especially important for tools like `@dataclass`, which will otherwise treat all annotated variables as fields to be included in the instance's `__init__`. ### The Syntax ```python from typing import ClassVar class Example: all_handles: ClassVar[set[str]] = set() ``` - This tells both the type checker and `@dataclass` (if used) that `all_handles` is a class attribute, not an instance attribute. - The type inside `ClassVar[...]` can be any valid type, including generics like `set[str]`. ### Why is this necessary/ugly? - The reason for this "pseudotype" is that Python's variable annotations (PEP 526) are designed to be explicit and unambiguous for static analysis and code generation tools. - Without `ClassVar`, type checkers and dataclasses would treat the variable as an instance field, which is almost always wrong for shared state. ### Is this still the case in Python 3.13+? **Yes.** - The [official Python docs](https://docs.python.org/3/library/typing.html) (as of Python 3.13.5) still require `ClassVar` for class variables in type annotations[2]. - This is also reflected in community best practices and in the implementation of type checkers and the standard library[1][2][3][4]. ### Example with `@dataclass` ```python from dataclasses import dataclass from typing import ClassVar @dataclass class Example: all_handles: ClassVar[set[str]] = set() # Not included as a dataclass field name: str # This is an instance variable ``` - `all_handles` is not included in the generated `__init__`, nor in the instance's `__dict__`. ### Related: `Final` and `ClassVar` - If you want a class variable to be a constant, you can use `Final` from `typing`, but **you cannot combine `Final` and `ClassVar`** in the annotation (see [1]). You must choose one or the other, as combining them is not supported by type checkers. ### References to Search Results - [PEP 526 and usage of ClassVar](https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python)[1] - [Python 3.13 typing documentation](https://docs.python.org/3/library/typing.html)[2] - [Class variables in Python](https://pynative.com/python-class-variables/)[5] **In summary:** Declaring class variables with type hints using `ClassVar[...]` is still required and is the correct, modern, and supported approach in all current and foreseeable future versions of Python. This ensures type checkers and dataclasses handle your class variables as intended. 
[1] https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python [2] https://docs.python.org/3/library/typing.html [3] https://github.com/python/typing/discussions/1424 [4] https://programming-25.mooc.fi/part-9/5-class-attributes/ [5] https://pynative.com/python-class-variables/ [6] https://docs.python.org/3/tutorial/classes.html [7] https://www.digitalocean.com/community/tutorials/understanding-class-and-instance-variables-in-python-3 [8] https://realpython.com/python-variables/ [9] https://programming-25.mooc.fi/part-8/3-defining-classes/ [10] https://github.com/python/typing/discussions/1636
- Initialization Variables That Are Not Fields
@dataclass Example: Dublin Core Resource Record
- Link on page 217: Dublin Core
- Link on page 217: Dublin Core
Data Class as a Code Smell
- Data Class as Scaffolding
value: dataclasses are a good way to do temp wire-ups / stubs
In this scenario, the data class is an initial, simplistic implementation of a class to jump-start a new project or module. With time, the class should get its own methods, instead of relying on methods of other classes to operate on its instances. Scaffolding is temporary;
- Data Class as Intermediate Representation
Pattern Matching Class Instances
pattern matching in python is new
It’s very similar to elixir’s pattern matching and is different from a typical case-switch construct
it typically uses the __match_args__ dunder declaration; this will be discussed in a later part of the book
Here’s some details on it
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74Python's `match-case` syntax, introduced in Python 3.10, offers a powerful alternative to traditional `switch-case` constructs found in other programming languages like C, C++, and Java. Here’s a detailed comparison of Python's `match-case` with traditional `switch-case` syntax. ### Key Differences Between Python's `match-case` and Traditional `switch-case` 1. **Pattern Matching vs. Value Matching**: - **Python's `match-case`**: Supports **pattern matching**, which means it can match complex data structures, such as lists, tuples, and even class instances. It allows for destructuring and extracting values from these structures. ```python match some_value: case (x, y): # Matches a tuple with two elements print(f"Matched a tuple with x={x} and y={y}") case _: print("No match") ``` - **Traditional `switch-case`**: Typically only matches against scalar values (like integers or strings) and does not support destructuring. It evaluates the expression and compares it against constant cases. ```c switch (value) { case 1: printf("One"); break; case 2: printf("Two"); break; default: printf("Default case"); } ``` 2. **Wildcards and Default Cases**: - **Python's `match-case`**: Uses the underscore (`_`) as a wildcard to catch all unmatched cases, similar to an `else` statement. - **Traditional `switch-case`**: Uses a `default` case for handling unmatched values, but it requires explicit declaration. 3. **Multiple Patterns**: - **Python's `match-case`**: Allows combining multiple patterns using the pipe operator (`|`) for cases that should execute the same block of code. ```python match day: case "Saturday" | "Sunday": print("It's the weekend!") case _: print("It's a weekday.") ``` - **Traditional `switch-case`**: Requires separate cases for each value or uses fall-through behavior (if not explicitly handled with `break`). 4. **No Break Statements Needed**: - **Python's `match-case`**: Automatically exits after executing the matched case block, eliminating the need for `break` statements to prevent fall-through. - **Traditional `switch-case`**: Requires explicit use of `break` to prevent fall-through to subsequent cases. 5. **Guard Conditions**: - **Python's `match-case`**: Supports guard conditions using an `if` statement within the case clause to add additional checks. ```python match details: case [amt, duration] if amt < 10000: return amt * 0.1 * duration case [amt, duration] if amt >= 10000: return amt * 0.15 * duration ``` - **Traditional `switch-case`**: Does not natively support guard conditions; you would need to use additional if-else statements. ### Summary - Python's `match-case` syntax is more flexible and powerful than traditional `switch-case`, allowing for complex pattern matching and destructuring of data structures. - It simplifies code by removing the need for break statements and supports more expressive patterns through guards and multiple patterns. - While both constructs serve similar purposes in controlling flow based on variable values, Python's approach aligns more closely with modern programming paradigms that emphasize readability and expressiveness. 
In conclusion, while Python's `match-case` serves a similar purpose to traditional switch-case statements in other languages, it introduces significant enhancements that make it more versatile and easier to use in many scenarios.
Designed to match class instances by type and by attributes
Contents
Class patterns are designed to match class instances by type and—optionally—by attributes. The subject of a class pattern can be any class instance, not only instances of data classes.
Simple Class Patterns
- Link on page 222: PEP 634—
- Link on page 222: “Class Patterns”
- Link on page 222: Structural Pattern Matching: Specification
Keyword Class Patterns
Captures also work with this syntax
Contents
Keyword class patterns are very readable, and work with any class that has public instance attributes, but they are somewhat verbose.
Positional Class Patterns
The pattern for an attribute can be defined positionally as well.
Named collectors / captures still work with this.
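A hedged sketch combining keyword and positional class patterns (requires Python 3.10+; the City dataclass is reconstructed from memory of the kind of example the book uses, and the positional form works because @dataclass generates __match_args__):
```python
from dataclasses import dataclass

@dataclass
class City:
    continent: str
    name: str
    country: str

def describe(city):
    match city:
        # keyword class pattern: match by type + attribute value, capture `name`
        case City(continent='Asia', name=name):
            return f'Asian city: {name}'
        # positional class pattern: relies on the __match_args__ tuple
        # that @dataclass generates: ('continent', 'name', 'country')
        case City('North America', name, _):
            return f'North American city: {name}'
        case _:
            return 'somewhere else'

print(describe(City('Asia', 'Tokyo', 'JP')))                 # Asian city: Tokyo
print(describe(City('North America', 'Mexico City', 'MX')))  # North American city: Mexico City
```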
Chapter Summary
- Link on page 225: PEP 526—Syntax for Variable Annotations
Dataclasses as a code smell
Contents
warned against possible abuse of data classes defeating a basic principle of object-oriented programming: data and the functions that touch it should be together in the same class. Classes with no logic may be a sign of misplaced logic.
Further Reading
- Link on page 226: PEP 557—Data Classes
- Link on page 226: dataclasses
- Link on page 226: PEP 557
- Link on page 226: “Why not just use typing.NamedTuple?”
- Link on page 226: “Why not just use namedtuple?”
- Link on page 226: “Rationale” section
- Link on page 226: “Ultimate guide to data classes in Python 3.7” on RealPython.com
- Link on page 226: “Dataclasses: The code generator to end all code generators” (video)
- Link on page 226: attrs project
- Link on page 227: discussion of alternatives
- Link on page 227: “The One Python Library Everyone Needs”
- Link on page 227: cluegen
- Link on page 227: data class code smell
- Link on page 227: Refactoring Guru
- Link on page 227: “The Jargon File”
- Link on page 227: “Guido”
- Link on page 228: PEP 484—Type Hints
- Link on page 228: attrs
- Link on page 228: PEP 526
Highlight on page 228
Contents
Finally, if you want to annotate that class attribute with a type, you can’t use regular types because then it will become an instance attribute. You must resort to that pseudotype ClassVar annotation:
Underline on page 228
Contents
Here we are
- Link on page 228: PEP 557—Data Classes
Chapter 6. Object References, Mutability, and Recycling
What’s New in This Chapter
Variables Are Not Boxes, they are labels
updated mental model and language
We should see it as a “to bind” instead of “to assign” whereby a name is bound to an object.
A sticky note is a better image rather than a box.
Identity, Equality, and Aliases
id() checking
the is operator does id checking; the == operator uses whatever __eq__ is defined as (typically value-based checking).
Identity checks are most often done with the is operator, which compares the object IDs, so our code doesn’t need to call id() explicitly.
Choosing Between == and is
- for equality checks, use ==
- for identity checks, use is
This avoids the direct use of id(), since is is used when comparing with singletons – typically just None. So the correct way to do a None check is against the singleton: x is None.
However, if you are comparing a variable to a singleton, then it makes sense to use is. By far, the most common case is checking whether a variable is bound to None. This is the recommended way to do it: x is None And the proper way to write its negation is: x is not None None is the most common singleton we test with is.
- Default to == if unsure
- is is faster than == because it can’t be overloaded
The Relative Immutability of Tuples
this is why some tuples are unhashable: the tuple itself is immutable, but it is a container type, and its containees may not be immutable – a tuple holding a mutable (unhashable) item is itself unhashable
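Quick illustration (values are just examples): a tuple is hashable only if all of its items are hashable:
```python
tf = (10, 'alpha', (1, 2))   # all items immutable -> hashable
tm = (10, 'alpha', [1, 2])   # holds a list -> unhashable

print(hash(tf))              # some integer
try:
    hash(tm)
except TypeError as e:
    print(e)                 # unhashable type: 'list'
```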
Copies Are Shallow by Default
shallow copying is more of a problem if there are mutable items within the inner nestings
This saves memory and causes no problems if all the items are immutable. But if there are mutable items, this may lead to unpleasant surprises.
shallow-copy negative example
the example below will demonstrate how when the inner element is mutable, then only the reference is copied, so if we modify that, then the original mutable entity gets mutated.
l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = list(l1)          # shallow copy
l1.append(100)
l1[1].remove(55)       # removal affects both, since both outer lists share the same inner list
print('l1:', l1)
print('l2:', l2)
l2[1] += [33, 22]      # in-place extend: also visible through l1
l2[2] += (10, 11)      # tuples are immutable: l2[2] is rebound to a new tuple
print('l1:', l1)
print('l2:', l2)
- Deep and Shallow Copies of Arbitrary Objects
complexity in cyclical references
a naive implementation could loop forever on circular references, but the standard copy.deepcopy handles them fine, not to worry
this is because deepcopy does a graph-traversal of the original object and uses a memo table to keep track of references.
Note that making deep copies is not a simple matter in the general case. Objects may have cyclic references that would cause a naïve algorithm to enter an infinite loop. The deepcopy function remembers the objects already copied to handle cyclic references gracefully.
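A small sketch contrasting copy.copy and copy.deepcopy, plus a self-referencing list to show that deepcopy's memo handles cycles (example data made up):
```python
import copy

inner = [1, 2]
outer = [inner, 'x']

shallow = copy.copy(outer)
deep = copy.deepcopy(outer)

inner.append(3)
print(shallow[0])   # [1, 2, 3] -- still shares the inner list
print(deep[0])      # [1, 2]    -- fully independent

# a cyclic reference: a list that contains itself
a = [10, 20]
a.append(a)
b = copy.deepcopy(a)   # no infinite loop: deepcopy memoizes objects already copied
print(b[2] is b)       # True -- the cycle is reproduced in the copy
```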
Function Parameters as References
“Call by sharing” is the only mode of parameter passing in Python: each formal parameter gets a copy of each reference in the arguments. It is not classic “pass by reference”, nor pass-by-value of the object.
This is usually the case for OOP languages in general (JS, Ruby, Java – though in Java, primitive types are passed by value).
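A minimal sketch of call by sharing: the parameter is an alias to the caller's object, so in-place mutation is visible outside, but rebinding the local name is not:
```python
def mutate(lst):
    lst.append(99)     # mutates the shared object seen by the caller

def rebind(lst):
    lst = [1, 2, 3]    # rebinds only the local alias; the caller is unaffected

nums = [7]
mutate(nums)
print(nums)  # [7, 99]
rebind(nums)
print(nums)  # [7, 99] -- unchanged
```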
Mutable Types as Parameter Defaults: Bad Idea
the default params, if mutable and used, will all point to the same SHARED mutable obj since the params are just aliases to it
the issue with mutable defaults explains why None is commonly used as the default value for parameters that may receive mutable values.
As the example demonstrates, when a HauntedBus is instantiated with passengers, it works as expected. Strange things happen only when a HauntedBus starts empty, because then self.passengers becomes an alias for the default value of the passengers parameter. The problem is that each default value is evaluated when the function is defined—i.e., usually when the module is loaded—and the default values become attributes of the function object. So if a default value is a mutable object, and you change it, the change will affect every future call of the function.
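A stripped-down sketch of the problem and the usual fix (HauntedBus is the book's example name; the code here is reconstructed, not copied):
```python
class HauntedBus:
    def __init__(self, passengers=[]):       # BAD: one list shared by every "empty" bus
        self.passengers = passengers

class Bus:
    def __init__(self, passengers=None):      # the common None-default idiom
        self.passengers = [] if passengers is None else list(passengers)

h1, h2 = HauntedBus(), HauntedBus()
h1.passengers.append('Carrie')
print(h2.passengers)   # ['Carrie'] -- the default list is shared

b1, b2 = Bus(), Bus()
b1.passengers.append('Carrie')
print(b2.passengers)   # [] -- each bus gets its own list
```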
- Defensive Programming with Mutable Parameters
Principle of Least Astonishment == no surprising side-effects
TwilightBus violates the “Principle of least astonishment,” a best practice of interface design. It surely is astonishing that when the bus drops a student, their name is removed from the basketball team roster.
### TL;DR: Principle of Least Astonishment (POLA)
The **Principle of Least Astonishment** (POLA), also known as the **Principle of Least Surprise**, is a design guideline in user interface and software design that emphasizes creating systems that behave in ways users expect. The main idea is to minimize confusion and surprises, ensuring that interactions are intuitive and predictable.
#### Key Points:
- **User Expectations**: Systems should align with users' mental models and past experiences to reduce cognitive load and learning curves.
- **Behavior Consistency**: Components of a system should operate consistently, following common conventions to avoid unexpected behavior.
- **Applications**: POLA applies across various aspects of design, including user interfaces, API design, and error handling.
- **Benefits**: Adhering to POLA leads to improved usability, reduced development time, enhanced maintainability, and increased user satisfaction.
By following the Principle of Least Astonishment, designers can create more intuitive and user-friendly applications that enhance overall user experience.
Citations: [1] https://pointjupiter.com/ultimate-guide-principle-of-least-astonishment-pola/ [2] https://en.wikipedia.org/wiki/Least_surprise [3] https://deviq.com/principles/principle-of-least-astonishment/ [4] https://usertimes.io/2018/12/07/the-principle-of-least-astonishment/ [5] https://www.centercode.com/glossary/principle-of-least-surprise [6] https://www.linkedin.com/pulse/principle-least-surprise-incus-data-pty-ltd [7] https://dovetail.com/ux/principle-of-least-surprise/
rule of thumb on when to alias vs make a copy on mutable args
Just make a copy if you’re not sure (when you’re going to be consuming a mutable argument).
Unless a method is explicitly intended to mutate an object received as an argument, you should think twice before aliasing the argument object by simply assigning it to an instance variable in your class. If in doubt, make a copy. Your clients will be happier. Of course, making a copy is not free: there is a cost in CPU and memory. However, an API that causes subtle bugs is usually a bigger problem than one that is a little slower or uses more resources.
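A sketch of the defensive-copy fix described above (SafeBus is a made-up name; the book's fix for TwilightBus is the same idea, copying the argument):
```python
class SafeBus:
    def __init__(self, passengers=None):
        # copy instead of aliasing, so drop() never touches the caller's list
        self.passengers = [] if passengers is None else list(passengers)

    def drop(self, name):
        self.passengers.remove(name)

team = ['Sue', 'Tina', 'Maya']
bus = SafeBus(team)
bus.drop('Tina')
print(team)            # ['Sue', 'Tina', 'Maya'] -- the roster is untouched
print(bus.passengers)  # ['Sue', 'Maya']
```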
del and Garbage Collection
del is a statement and not a function; that’s why we don’t do del(x), we do del x (though the former will work too)
Weak references are useful to have pointers but not affect refcount for an obj
- good to do monitoring / caching activities using weak references
- see this for more elaboration: Weak References | Fluent Python, the lizard book
- To inspect whether an object is still alive without holding a strong reference, Python provides the weakref module. A weakref to an object returns None if the object has been garbage collected, effectively giving you a safe way to test “dangling-ness”:
import weakref

class MyClass:
    pass

obj = MyClass()
weak_obj = weakref.ref(obj)

print(weak_obj())  # <__main__.MyClass object at ...>

del obj
print(weak_obj())  # None, indicating the original object was garbage collected
This works because finalize holds a weak reference to {1, 2, 3}. Weak references to an object do not increase its reference count. Therefore, a weak reference does not prevent the target object from being garbage collected. Weak references are useful in caching applications because you don’t want the cached objects to be kept alive just because they are referenced by the cache.
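Reconstructing (from memory, so details may differ) the kind of weakref.finalize demo the quote refers to; the callback fires only once no strong references remain:
```python
import weakref

s1 = {1, 2, 3}
s2 = s1                                   # a second strong reference

ender = weakref.finalize(s1, lambda: print('set collected'))
print(ender.alive)   # True

del s1
print(ender.alive)   # True -- s2 still keeps the set alive

s2 = 'spam'          # now the set is unreachable; CPython collects it immediately
print(ender.alive)   # False (the callback printed above)
```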
we can actually use ctypes to read memory spaces directly!
this memory location will have to be casted first though.
import ctypes

x = 42
address = id(x)  # in CPython, id() is the object's memory address

# Use ctypes to cast the address back to a Python object and get its value
value = ctypes.cast(address, ctypes.py_object).value
print(value)  # Output: 42
the __del__ method is more like a finalizer / teardown hook
it’s unlikely that we will actually need to implement it.
if implemented for a class, it gets called by the interpreter before freeing up the memory.
also kind of depends on the implementation of python itself, e.g. some might keep track of more than just refcounts.
Tricks Python Plays with Immutables \(\rightarrow\) Interned Immutables
Interning as an optimisation technique for the internal python implementation
Basically, some strings and common ints share memory; this avoids unnecessary duplication.
- won’t work if you use .copy()
- won’t work if you use [:]
NOTE: What is interned or not can’t always be determined, because that implementation detail is undocumented.
NOTE: therefore, for immutables, always check equality using == instead of is
The sharing of string literals is an optimization technique called interning. CPython uses a similar technique with small integers to avoid unnecessary duplication of numbers that appear frequently in programs like 0, 1, –1, etc. Note that CPython does not intern all strings or integers, and the criteria it uses to do so is an undocumented implementation detail. Never depend on str or int interning! Always use == instead of is to compare strings or integers for equality. Interning is an optimization for internal use of the Python interpreter.
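A quick illustration of why is is unreliable here; the True/False results are CPython implementation details and may vary:
```python
a = 'Python'
b = 'Python'
print(a is b)     # often True: short literals may be interned

c = ''.join(['Py', 'thon'])
print(c == a)     # True  -- equality is what we actually mean
print(c is a)     # usually False: built at runtime, not interned

x, y = 256, 256
print(x is y)     # True in CPython (small-int cache)
m, n = 1000, 1000
print(m is n)     # may be True or False depending on version/context -- never rely on it
```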
Chapter Summary
Practical Consequences of using references
- simple assignment doesn’t create copies
- for augmented assignments e.g.
+=,*=, it depends on the LHS variable:- if bound to immutable object, then it creates new objects
- if bound to mutable object, then it modifies that object in place
- re-binding: assigning a new value to an existing variable doesn’t change the object previously bound to it; the var is just bound to a different object.
- function params are passed as aliases
- mutable objects may get mutated unless the consumer function copies it
- it’s dangerous to use mutable default values \(\implies\) that’s why the convention is to use
Noneinstead.
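A minimal sketch of the augmented-assignment point from the list above:
```python
t = (1, 2)
tid = id(t)
t += (3,)                 # immutable target: a new tuple is created and t is rebound
print(id(t) == tid)       # False

l = [1, 2]
lid = id(l)
l += [3]                  # mutable target: extended in place via __iadd__
print(id(l) == lid)       # True -- same list object
```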
Further Reading
Object identity becomes important only when objects are mutable
- if everything was immutable, it would make no difference whether variables held actual objects or references to shared (interned) objects. Just comparing them by value would have been sufficient.
mutable objects end up being the reason why threaded programming is hard
- if multiple threads mutate objects and the synchronization is not handled correctly, then it leads to corrupted data
GC used to be just refcounts, but that can leak memory (e.g. when there are refcycles to unreachable objects, leading to cyclic garbage). Current GC is a generational GC.
Mental Model:
The memory is thought of as having generations: each generation is a collection of objects grouped by how long they’ve existed.
Younger generations (new objects) are collected (checked for unreachable cycles) frequently. Older generations (objects that survived earlier collections) are checked less often.
More elaboration:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54### Generational Garbage Collector in CPython: Mental Model and Rationale #### 1. **Reference Counting Only: Its Limits and Memory Leaks** - **Reference Counting** (the core memory management scheme in CPython) works by keeping a count of how many references exist to each object. When the count drops to zero, memory is released immediately. - **Primary shortcoming:** If objects reference each other (e.g., two lists referencing each other), but nothing outside references them, neither’s count drops to zero. They become **"garbage"**—unreachable—but their counts never reach zero. This is a **classic memory leak**: unused memory that cannot be reclaimed. #### 2. **Generational GC: Solving Cycles and Leaks** To address cyclical references—and reduce overhead—CPython complements refcounting with a **generational garbage collector** (`gc` module). **Mental Model:** - The memory is thought of as having *generations*: each generation is a collection of objects grouped by how long they've existed. - **Younger generations** (new objects) are *collected* (checked for unreachable cycles) frequently. **Older generations** (objects that survived earlier collections) are checked *less* often. #### 3. **Why Generational GC Is Effective** - **Empirical observation:** Most objects in Python die young (they become unreachable soon after they're created). Therefore, checking *new* objects often is efficient. - **Cyclic collection:** - During collection, the GC looks for reference cycles—sets of objects referring only to each other but not from elsewhere. - The GC can safely reclaim all objects in such cycles. - By extending beyond simple refcounting, the cycle detector enables memory occupied by unreachable cycles to be safely released. - **Old objects that survive collections are promoted to older generations**; these are checked less frequently, reducing unnecessary overhead. #### 4. **Generational Structure in CPython** CPython typically uses *three generations*: - **Generation 0**: Collected (checked) most frequently; new objects start here. - **Generation 1**: Objects promoted from gen 0 if they survive one collection. - **Generation 2**: The oldest and least frequently collected generation; objects promoted from gen 1 after surviving further collections. Collections trigger: - Automatically based on allocation thresholds. - Explicitly via the `gc.collect()` API. #### 5. **Memory Leak Solution: How It Works** - **Pure reference counting** cannot detect cyclic garbage, leading to leaks. - **Generational GC** *detects* and *collects* cyclically-linked groups of unreachable objects, returning their memory to the system. - Thus, even if the reference count of an object never drops to zero due to a reference cycle, the GC will eventually detect and collect it if it has become unreachable. #### 6. **Practical Takeaways for Tech Leaders** - **Mental Model:** CPython’s memory management is twofold—reference count for immediacy and generational GC for cycle detection. - **Leak prevention:** Programmers need not (and usually cannot) manually break all cycles; the GC rescues memory otherwise lost in cycles. - **Performance:** The generational design reduces overhead by focusing frequent scans on objects most likely to be garbage. #### 7. 
**Further Reading and References** - The CPython documentation for the `gc` module provides details and empirical thresholds for collection. - Deep dives into Python’s memory management explain the symbiosis of refcounting and generational GC as a pragmatic solution balancing immediacy, overhead, and completeness (detection of cycles). **In summary:** A generational garbage collector in CPython efficiently manages memory by combining reference counting (for immediate reclamation) with cycle detection (generational collection). This hybrid approach solves the memory leak issue inherent in pure reference-counted systems—cycles are detected and collected—making Python both safe and performant for real-world programs.
Rebinding a ref within a fn body doesn’t effect changes outside the fn because it’s a copy of the ref
because the function gets a copy of the reference in an argument, rebinding it in the function body has no effect outside of the function.
Part II. Functions as Objects
Chapter 7. Functions as First-Class Objects
Definition of a First Class Object
Programming language researchers define a “first-class object” as a program entity that can be:
- Created at runtime
- Assigned to a variable or element in a data structure
- Passed as an argument to a function
- Returned as the result of a function
What’s New in This Chapter
Treating a Function Like an Object
the __doc__ attribute is typically used for the help(<fn_name>) output
having first-class functions enables programming in a functional style
Higher-Order Functions
a higher-order function takes a function as an argument and/or returns a function as its result
apply has been deprecated (and removed in Python 3) because we can unpack *args and **kwargs directly now
so a call can be written as fn(*args, **kwargs)
Modern Pythonic Replacements for map, filter, and reduce
map and filter return generators (lazy iterators) \(\implies\) we can just directly write genexps now instead.
reduce is no longer a builtin (it lives in functools); KIV this for a later chapter in this book.
Anonymous Functions
The best use of anonymous functions is in the context of an argument list for a higher-order function.
e.g.
sorted(fruits, key=lambda word:word[::-1])
the body of a Python anonymous function must be a pure expression; only limited language features are available in a lambda definition
However, the simple syntax of Python limits the body of lambda functions to be pure expressions. In other words, the body cannot contain other Python statements such as while, try, etc. Assignment with = is also a statement, so it cannot occur in a lambda.
:= assignment syntax exists
The new assignment expression syntax using := can be used—but if you need it, your lambda is probably too complicated and hard to read, and it should be refactored into a regular function using def.
⭐️ The Nine Flavors of Callable Objects
() is the call operator
- generators, native coroutines, and async generator functions are different from the rest of the callable types in that their return values are never application data; they return objects that need further processing to yield application data / do useful work.
- callable: User-defined Functions
- callable: builtin functions
e.g.
len
- callable: builtin methods
methods implemented in C e.g.
dict.get
- callable: methods:
fns defined in the body of a class
- callable: Class
classes are callable: calling one runs __new__ \(\rightarrow\) __init__
we can actually override __new__
there’s no new operator in Python
When invoked, a class runs its new method to create an instance, then init to initialize it, and finally the instance is returned to the caller. Because there is no new operator in Python, calling a class is like calling a function.2
- callable: class instances
if we define a __call__ method in the class
- callable: Generator Functions
- they have
yieldin the function body – when called, they return a generator object
Generator functions Functions or methods that use the yield keyword in their body. When called, they return a generator object.
- callable: native coroutine functions
- functions that are
async def\(\rightarrow\) when called, they return a coroutine object
Native coroutine functions Functions or methods defined with async def. When called, they return a coroutine object.
- callable: async generator functions
- have a yield in the body -> when called, they return (asynchronous) generators
- are async def -> the generators are to be used with async for
Asynchronous generator functions Functions or methods defined with async def that have yield in their body. When called, they return an asynchronous generator for use with async for. Added in Python 3.6.
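A tiny sketch of these three "lazy" flavors; calling them yields objects to iterate or await rather than application data (names are made up):
```python
import asyncio

def gen():                 # generator function
    yield 1
    yield 2

async def coro():          # native coroutine function
    return 42

async def agen():          # asynchronous generator function
    yield 'a'
    yield 'b'

print(gen())               # <generator object gen at 0x...>
print(list(gen()))         # [1, 2]

c = coro()
print(type(c))             # <class 'coroutine'>
print(asyncio.run(c))      # 42

async def consume():
    return [x async for x in agen()]

print(asyncio.run(consume()))  # ['a', 'b']
```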
User-Defined Callable Types
this relates to class instances that work as callable objects.
usecases:
the main idea is really to have objects that can store their own internal state and are callable.
Here’s two concrete cases:
A class implementing __call__ is an easy way to create function-like objects that have some internal state that must be kept across invocations, like the remaining items in the BingoCage
Decorators that need to remember state between calls of the decorator, e.g. the @cache decorator
this is also useful to split a complex implementation into staged, multi-decorator implementations.
we’re familiar with this from using frameworks like flask or django, where functions can have many decorators applied that do some pre-processing.
Practical Takeaways
Mental Model: Think of the decorator stack as a pipeline: each decorator acts before or after your main logic, transforming input, output, or context as needed.
Framework idioms: Nearly every Flask or Django extension adds its value via decorators to manage permissions, caching, rate limits, etc.
Custom use: You can easily define your own decorators to factor out repeated steps in your app for preprocessing (e.g., parameter parsing), postprocessing (e.g., formatting responses), or injecting cross-cutting security checks.
In summary, the practice of splitting implementation using decorators (as in Flask/Django) is a core idiom for composing, reusing, and organizing web application logic in Python frameworks—and is a concrete, high-impact use of the broader decorator pattern described previously
elaboration on the splitting up part:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80Splitting a concrete implementation into separate methods in Python using decorators typically refers to using decorators as a modular way to extend, preprocess, or postprocess your functions—effectively breaking up a monolithic function into composable steps. Decorators allow you to separate concerns and add reusable behaviors before or after the core logic, without modifying the original function directly. ## Key Concepts - **Decorators** are functions (or classes) that take another function/method, wrap it with additional behavior, and return the wrapped function. - This is useful for **"splitting" implementation concerns**: e.g., one decorator handles logging, another argument validation, another result transformation, etc. - Using multiple decorators, you can stack behaviors, creating a pipeline of processing stages for your function[1][5]. ## Example: Splitting String Processing Steps Suppose you have a function that returns a string and you want to: 1. Convert the result to uppercase. 2. Then split it into words. You can write two decorators and apply them: ```python import functools def uppercase_decorator(func): @functools.wraps(func) def wrapper(*args, **kwargs): result = func(*args, **kwargs) return result.upper() return wrapper def split_string_decorator(func): @functools.wraps(func) def wrapper(*args, **kwargs): result = func(*args, **kwargs) return result.split() return wrapper @split_string_decorator @uppercase_decorator def greet(): return 'hello world' print(greet()) # Output: ['HELLO', 'WORLD'] ``` Here, the function’s logic is split: first uppercase conversion, then string splitting—each in its own decorator[1][5]. ## Why Use This Pattern? - **Separation of concerns**: Each decorator encapsulates a single aspect, making your code easier to maintain, test, and reuse[1][6][9]. - **Composable behaviors**: You can stack/reorder behaviors as needed. - **Cleaner implementation**: The core function stays focused on its business logic. ## Common Uses - **Argument or result preprocessing/postprocessing** (e.g., stripping strings, timing execution, error handling)[7]. - **Cross-cutting concerns** like authentication, logging, caching. - **Extending methods in classes** with shared behaviors (see below caveat when decorating class methods)[3][4][8]. ## Important Caveats - When applying decorators to class methods, the decorator receives either a function or a method, depending on implementation. You may need to ensure your decorator handles `self`/`cls` correctly, or else prefer using `functools.wraps` and testing thoroughly[3][4][8]. - The order of decorator application matters, especially if each processes return values or arguments—always document and manage dependencies between them[1][5]. ## Summary Table | Use Case | Approach | Notes | |------------------|------------------------------|---------------------------------| | Preprocessing | Decorator before function | E.g., strip, validate inputs | | Postprocessing | Decorator after function | E.g., format, cache results | | Cross-cutting | Multiple stacked decorators | E.g., logging + auth + cache | In summary, splitting a concrete implementation using decorators lets you modularize, extend, and compose functionality in clear, reusable units[1][6][9]. 
This is a foundational pattern in Python for writing clean, maintainable, and flexible code.
Citations: [1] https://www.datacamp.com/tutorial/decorators-python [2] https://www.thepythoncodingstack.com/p/demystifying-python-decorators [3] https://stackoverflow.com/questions/62857278/different-behavior-when-implementing-a-decorator-in-python-with-a-function-or-a [4] http://blog.dscpl.com.au/2014/01/how-you-implemented-your-python.html [5] https://www.youtube.com/watch?v=NyHuBGqCBII [6] https://www.codingem.com/what-are-python-decorators/ [7] https://stackoverflow.com/questions/13698549/python-decorator-for-function-argument-preprocessing [8] https://github.com/GrahamDumpleton/wrapt/blob/develop/blog/10-performance-overhead-when-applying-decorators-to-methods.md [9] https://itnext.io/python-decorators-demystified-cef81d7e986d [10] https://refactoring.guru/design-patterns/decorator/python/example
elaboration linked to flask / django:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78Your observation is highly relevant: **Flask and Django make extensive use of Python decorators to split and modularize concrete implementations of web functionality.** This pattern is foundational to how these frameworks are built and used. ### Contextualized Explanation with Flask/Django - **In Flask:** Every view (endpoint) is just a function, and decorators are used to transparently inject additional behavior before or after your function executes. The classic example is `@app.route`, which registers the function as a route handler. Beyond routing, decorators are used extensively for **authentication (`@login_required`), templating, caching, API argument validation, CORS, response formatting, and more**.[1][2][5] - For example, a "login required" decorator checks authentication before the underlying view runs: ```python from functools import wraps from flask import g, request, redirect, url_for def login_required(f): @wraps(f) def decorated_function(*args, **kwargs): if g.user is None: return redirect(url_for('login', next=request.url)) return f(*args, **kwargs) return decorated_function @app.route('/secret') @login_required def secret_view(): ... ``` Here, **the authentication check is split out from your business logic** into a reusable decorator[5]. - Another example, templating can be split using a decorator. Rather than explicitly rendering a template in every view, you can return just the context dictionary, and a decorator handles the rendering: ```python def templated(template): def decorator(f): @wraps(f) def decorated_function(*args, **kwargs): ctx = f(*args, **kwargs) return render_template(template, **ctx) return decorated_function return decorator @app.route('/') @templated('index.html') def index(): return dict(value=42) ``` This **splits view organization (data preparation) from view rendering**[1][5]. - **In Django:** The pattern is similar, although Django adds a distinction between **middleware** (which wraps the whole request/response pipeline) and **decorators** (which wrap only specific views). - Decorators like `@login_required`, `@require_POST`, `@csrf_exempt`, etc., **modularize cross-cutting concerns** so they can be composed and reused across multiple views. - Django middleware shares some similarities with decorators, acting at a broader scope[4]. ### Why Frameworks Encourage This - **Separation of concerns:** Decorators allow you to extract generic, orthogonal logic (like permissions, logging, or caching) from the core business function, improving maintainability and testability. - **Composable behaviors:** By stacking multiple decorators (order matters), you can chain checks, transformations, or wrappers around your views in a clear, declarative fashion[2]. - **Minimizing boilerplate:** You avoid repetition—rather than implementing access checks, error handling, or formatting logic in each view, you write these as reusable decorators once. ### Practical Takeaways - **Mental Model:** Think of the decorator stack as a pipeline: each decorator acts before or after your main logic, transforming input, output, or context as needed. - **Framework idioms:** Nearly every Flask or Django extension adds its value via decorators to manage permissions, caching, rate limits, etc. 
- **Custom use:** You can easily define your own decorators to factor out repeated steps in your app for preprocessing (e.g., parameter parsing), postprocessing (e.g., formatting responses), or injecting cross-cutting security checks. In summary, **the practice of splitting implementation using decorators (as in Flask/Django) is a core idiom for composing, reusing, and organizing web application logic in Python frameworks**—and is a concrete, high-impact use of the broader decorator pattern described previously[1][2][5]. --- [1]: View Decorators — Flask Documentation (3.1.x) [2]: Advanced patterns for views and routing - Explore Flask [4]: Are middlewares an implementation of the Decorator ... [5]: 12. View Decorators [1] https://flask.palletsprojects.com/en/stable/patterns/viewdecorators/ [2] http://explore-flask.readthedocs.org/en/latest/views.html [3] https://www.geeksforgeeks.org/python/decorator-method-python-design-patterns/ [4] https://stackoverflow.com/questions/48696631/are-middlewares-an-implementation-of-the-decorator-pattern [5] https://tedboy.github.io/flask/patterns/viewdecorators.html [6] https://flask.palletsprojects.com/en/stable/design/ [7] https://moldstud.com/articles/p-flask-and-restful-design-patterns-a-practical-overview-for-developers [8] https://stackoverflow.com/questions/48248771/design-patterns-for-flask-api-implementation/63376123 [9] https://python.plainenglish.io/the-magic-of-python-decorators-from-basics-to-advanced-techniques-94d25dc5f920 [10] https://dev.to/senichimaro/flask-decorator-something-57gd
checking if something is callable
just use the callable() builtin and it will give us a boolean on whether it’s callable
for a custom obj to be callable, it just has to implement __call__
arbitrary Python objects may also be made to behave like functions. Implementing a __call__ instance method is all it takes.
- usecase - example use case could be having an alias to a default func within the class e.g.
bingo.pick() and bingo()
usecase - implementing decorators that hold state across decorator invocation
:NOTER_PAGE: (270 0.4916420845624386 . 0.35096774193548386)
Another good use case for __call__ is implementing decorators. Decorators must be callable, and it is sometimes convenient to “remember” something between calls of the decorator (e.g., for memoization—caching the results of expensive computations for later use) or to split a complex implementation into separate methods. The functional approach to creating functions with internal state is to use closures. Closures, as well as decorators, are the subject of Chapter 9.
closures are the functional approach to having functions with internal state (in contrast to decorators being used for this same purpose)
KIV this for a future chapter (chapter 9)
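A minimal sketch of a function-like object that keeps state across invocations via __call__ (a running-average accumulator; similar in spirit to the book's examples but written from scratch here):
```python
class Averager:
    """Callable object that remembers the values it has seen."""
    def __init__(self):
        self.values = []

    def __call__(self, new_value):
        self.values.append(new_value)
        return sum(self.values) / len(self.values)

avg = Averager()
print(callable(avg))  # True
print(avg(10))        # 10.0
print(avg(11))        # 10.5
print(avg(12))        # 11.0
```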
From Positional to Keyword-Only Parameters
Python’s argument declaration features.
This part is about argument captures when defining functions.
There are two interesting ways to pass variadic params:
- positional: *args
- keyword based: **kwargs
NOTE: the usual params can always be passed in keyword fashion (unless it’s positional-only)
Keyword-only parameters
There are two types of argument passing to consider when writing functions:
variable positional args
accepting multiple positional arguments as tuple using
*<arg_name>; this allows us to have variadic params
this gives us a positional tuple for these args
``def tag(name, *content, class_=None, **attrs):’’
we can define a keyword-only argument signature for functions
To specify keyword-only arguments when defining a function, name them after the argument prefixed with
*.
- if you don't want to support variable positional args, then do
def f(a, *, b):
Keyword-only arguments are a feature of Python 3. In Example 7-9, the class_ parameter can only be given as a keyword argument—it will never capture unnamed positional arguments. To specify keyword-only arguments when defining a function, name them after the argument prefixed with *. If you don’t want to support variable positional arguments, then just do this: def f(a, *, b):
- if don’t want to support variable position args, then do
Positional-Only Parameters (using /)
defining positional-only params to a function
e.g.
def divmod(a, b, /)
after the / we can specify other args as per usual
To define a function requiring positional-only parameters, use / in the parameter list. This example from “What’s New In Python 3.8” shows how to emulate the divmod built-in function: def divmod(a, b, /): return (a // b, a % b) All arguments to the left of the / are positional-only. After the /, you may specify other arguments, which work as usual.
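A made-up signature combining the parameter kinds discussed in this section (positional-only, regular, variadic positional, keyword-only, variadic keyword):
```python
def f(a, b, /, c, *rest, kw_only, **extra):
    return a, b, c, rest, kw_only, extra

print(f(1, 2, 3, 4, 5, kw_only=6, tag='x'))
# (1, 2, 3, (4, 5), 6, {'tag': 'x'})

# f(a=1, b=2, c=3, kw_only=6)  # TypeError: a and b are positional-only
# f(1, 2, 3, 6)                # TypeError: kw_only must be passed by keyword
```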
Packages for Functional Programming
The intent wasn’t to support the typical functional paradigms but there is support.
The operator module is for useful operations; the functools module is for useful higher-order functions.
functools also has some higher order functions that can be used as decorators (e.g. cache, singledispatch)
The operator Module
the operator module has a bunch of callable functions that we can use as params for higher order functions.
TO_HABIT: use
itemgetter and attrgetter more when accessing nested objects!
provides function equivalents to operators for trivial things
These are callable factory functions, e.g.:
itemgetter: Essentially, itemgetter(1) creates a function that, given a collection, returns the item at index 1. That’s easier to write and read than lambda fields: fields[1], which does the same thing.
attrgetter: attrgetter supports nested accesses
:NOTER_PAGE: (275 0.24311023622047243 . 0.09819121447028424)
A sibling of itemgetter is attrgetter, which creates functions to extract object attributes by name. If you pass attrgetter several attribute names as arguments, it also returns a tuple of values. In addition, if any argument name contains a . (dot), attrgetter navigates through nested objects to retrieve the attribute.
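A quick itemgetter illustration to go with the quote above (data made up):
```python
from operator import itemgetter

pairs = [('c', 3), ('a', 1), ('b', 2)]
print(sorted(pairs, key=itemgetter(1)))  # [('a', 1), ('b', 2), ('c', 3)]

get_both = itemgetter(1, 0)              # several indexes -> returns a tuple
print(get_both(pairs[0]))                # (3, 'c')
```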
attrgetter demo
:NOTER_PAGE: (275 0.4183070866141732 . 0.09819121447028424)
Example 7-14. Demo of attrgetter to process a previously defined list of namedtuple called metro_data (the same list that appears in Example 7-13)
>>> from collections import namedtuple
>>> LatLon = namedtuple('LatLon', 'lat lon')
>>> Metropolis = namedtuple('Metropolis', 'name cc pop coord')
>>> metro_areas = [Metropolis(name, cc, pop, LatLon(lat, lon))
...     for name, cc, pop, (lat, lon) in metro_data]
>>> metro_areas[0]
Metropolis(name='Tokyo', cc='JP', pop=36.933, coord=LatLon(lat=35.689722, lon=139.691667))
>>> metro_areas[0].coord.lat
35.689722
>>> from operator import attrgetter
>>> name_lat = attrgetter('name', 'coord.lat')
>>>
>>> for city in sorted(metro_areas, key=attrgetter('coord.lat')):
...     print(name_lat(city))
...
('São Paulo', -23.547778)
('Mexico City', 19.433333)
('Delhi NCR', 28.613889)
('Tokyo', 35.689722)
('New York-Newark', 40.808611)
also can refer to this perplexity overview of what attrgetter can do:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107The `operator.attrgetter` function in Python is a convenient way to create a callable that retrieves attributes from objects. It can be particularly useful when working with deeply nested dictionaries or objects. Below are examples demonstrating how to use `operator.attrgetter` for accessing values within a deeply nested dictionary structure. ### Example 1: Accessing Nested Attributes in Objects Suppose you have a class structure representing a person and their address: ```python from operator import attrgetter class Address: def __init__(self, city, state): self.city = city self.state = state class Person: def __init__(self, name, address): self.name = name self.address = address # Create instances of Address and Person address1 = Address("New York", "NY") address2 = Address("Los Angeles", "CA") person1 = Person("Alice", address1) person2 = Person("Bob", address2) # List of persons people = [person1, person2] # Use attrgetter to access nested attributes get_city = attrgetter('address.city') cities = list(map(get_city, people)) print(cities) # Output: ['New York', 'Los Angeles'] ``` ### Example 2: Accessing Nested Values in a Dictionary If you have a nested dictionary structure, you can use `attrgetter` to retrieve values from it. Here’s an example: ```python from operator import itemgetter # Nested dictionary structure data = [ {"name": "Alice", "location": {"city": "New York", "state": "NY"}}, {"name": "Bob", "location": {"city": "Los Angeles", "state": "CA"}}, ] # Accessing the city using itemgetter for nested dictionaries get_city = itemgetter('location') cities = [itemgetter('city')(get_city(person)) for person in data] print(cities) # Output: ['New York', 'Los Angeles'] ``` ### Example 3: Combining Multiple Attributes You can also use `attrgetter` to fetch multiple attributes at once. For instance, if you want to get both the city and state from the nested structure: ```python from operator import attrgetter class Address: def __init__(self, city, state): self.city = city self.state = state class Person: def __init__(self, name, address): self.name = name self.address = address # Create instances of Address and Person address1 = Address("New York", "NY") address2 = Address("Los Angeles", "CA") person1 = Person("Alice", address1) person2 = Person("Bob", address2) # List of persons people = [person1, person2] # Use attrgetter to access both city and state get_location = attrgetter('address.city', 'address.state') locations = list(map(get_location, people)) print(locations) # Output: [('New York', 'NY'), ('Los Angeles', 'CA')] ``` ### Summary - **`operator.attrgetter`**: This function is useful for creating callables that can retrieve attributes from objects. It works well with nested structures. - **Accessing Nested Attributes**: You can easily access deeply nested attributes using `attrgetter`, which simplifies your code and improves readability. - **Combining Attributes**: You can retrieve multiple attributes at once by passing them as arguments to `attrgetter`. These examples illustrate how `operator.attrgetter` can be effectively used for accessing nested attributes in both object-oriented and dictionary-based data structures in Python. 
methodcaller vs defining a lambda for it
``Example 7-15. Demo of methodcaller: second test shows the binding of extra’’
usecases:
it allows us to partially freeze some args to an existing function
e.g.
hyphenate = methodcaller('replace', ' ', '-')
methodcaller is likely better than using a lambda because it produces a single reusable callable bound to the method name and its frozen arguments, as opposed to defining a new lambda for the same purpose
references: SO post and the perplexity dump below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73The `operator.methodcaller` function in Python provides a way to create a callable that invokes a method on an object, using the method's name specified as a string. This can be particularly useful in scenarios where you need to call the same method repeatedly on different objects or when working with higher-order functions like `map` or `filter`. ### Value Proposition of Using `operator.methodcaller` 1. **Readability**: Using `methodcaller` can improve readability by clearly indicating that you are calling a method, especially when the method name is dynamic or when you are applying it across multiple items. 2. **Performance**: `methodcaller` can be more efficient than using a lambda function because it avoids the overhead of creating a new function object for each call. Instead, it creates a single callable that can be reused. 3. **Simplicity**: It simplifies code when you need to apply the same method call to multiple objects, making your intentions clearer without having to define multiple lambda functions. 4. **Consistency**: It provides a consistent interface for calling methods, especially when dealing with collections of objects where the method names might be determined at runtime. ### Example Usage of `operator.methodcaller` Here's how you can use `operator.methodcaller` in practice: #### Example 1: Basic Usage ```python from operator import methodcaller # Define a list of strings strings = ["hello", "world", "python"] # Use methodcaller to call the upper() method on each string upper_case_strings = list(map(methodcaller('upper'), strings)) print(upper_case_strings) # Output: ['HELLO', 'WORLD', 'PYTHON'] ``` In this example, `methodcaller('upper')` creates a callable that converts strings to uppercase. The `map` function applies this callable to each string in the list. #### Example 2: Calling Methods with Arguments You can also use `methodcaller` to call methods that require arguments: ```python from operator import methodcaller # Define a list of strings strings = ["hello", "world", "python"] # Use methodcaller to call the rjust() method on each string right_justified_strings = list(map(methodcaller('rjust', 10, '.'), strings)) print(right_justified_strings) # Output: ['.....hello', '.....world', '....python'] ``` In this case, `methodcaller('rjust', 10, '.')` creates a callable that right-justifies each string to a width of 10 characters, filling with dots. ### Comparison with Lambda Functions While you could achieve similar functionality using lambda functions, there are notable differences: - **Performance**: As mentioned earlier, using `methodcaller` can be faster than using a lambda because it avoids the overhead associated with creating new function objects for each call. - **Conciseness**: For simple method calls, `methodcaller` can make your code cleaner and more concise compared to writing out lambda expressions. - **Readability**: Using `methodcaller` explicitly indicates that you're invoking a method, which may enhance code clarity compared to lambda functions that might require additional context to understand what they do. ### Conclusion Using `operator.methodcaller` provides several advantages over defining lambdas for method calls, including improved readability, performance benefits, and simplicity. 
It is particularly useful in functional programming contexts where you need to apply methods across collections of objects efficiently and clearly. Citations: [1] https://stackoverflow.com/questions/2705104/lambda-vs-operator-attrgetterxxx-as-a-sort-key-function [2] https://florian-dahlitz.de/articles/introduction-to-pythons-operator-module [3] https://martinheinz.dev/blog/54 [4] https://treyhunner.com/2018/09/stop-writing-lambda-expressions/ [5] https://note.nkmk.me/en/python-operator-usage/ [6] https://realpython.com/python-operator-module/ [7] https://news.ycombinator.com/item?id=28104674 [8] https://discuss.python.org/t/operator-attrgetter-itemgetter-for-setting/20544
Freezing Arguments with functools.partial
it’s a partial application of another function, where we freeze some params
e.g.
nfc = functools.partial(unicodedata.normalize, 'NFC')
Another is partial: given a callable, it produces a new callable with some of the arguments of the original callable bound to predetermined values. This is useful to adapt a function that takes one or more arguments to an API that requires a callback with fewer arguments.
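Another tiny partial sketch, the classic int-with-base example (adapted from the functools docs):
```python
from functools import partial

basetwo = partial(int, base=2)          # freeze the keyword argument base=2
print(basetwo('10010'))                 # 18
print(basetwo.func, basetwo.keywords)   # <class 'int'> {'base': 2}
```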
Chapter Summary
main ideas are that you can assign functions to variables, pass them to other functions, store them in data structures, and access function attributes, allowing frameworks and tools to act on that information.
Python uses a statement-oriented syntax in which expressions cannot contain statements, and many language constructs are statements—including try/except, which is what I miss most often when writing lambdas. This is the price to pay for Python’s highly readable syntax.
it’s a good thing that the “functional features” are limited in python because it helps make the language easier to use.
NOTE: python doesn’t do tail recursion optimisation (TCO)
reasons:
hard to do stack tracing, makes it harder to debug
not aligned with python’s idioms
- typically TCO necessary to avoid stack overflow, in languages that are fundamentally functional where deep recursion is idiomatic
hard to implement for the python interpreter.
the current interpreter is supposed to:
- allow for dynamic typing
- allow for introspection
it’s hard to implement the TCO stuff AND also follow these principles.
biggest obstacle to wider adoption of functional programming idioms in Python is the lack of tail-call elimination, an optimization that allows memory-efficient computation of a function that makes a recursive call at the “tail” of its body. In another blog post, “Tail Recursion Elimination”, Guido gives several reasons why
here’s a more comprehensive perplexity output:
Further Reading
Chapter 8. Type Hints in Functions
What’s New in This Chapter
About Gradual Typing
what it means by a gradual type system
“type hints are optional at all levels”
:NOTER_PAGE: (285 0.308300395256917 . 0.14267185473411154)
it has an interplay between duck typing and nominal typing
a type system that is optional
- by optional it means that we need to be able to silence it
- we can silence it at varying levels of abstraction
a type system that doesn’t catch type errors @ runtime
- only for static analysis
doesn’t enhance performance
Type hints are optional at all levels: you can have entire packages with no type hints, you can silence the type checker when you import one of those packages into a mod‐ ule where you use type hints, and you can add special comments to make the type checker ignore specific lines in your code.
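A small sketch of what "silencing at different levels" looks like with mypy; legacy_parse and somepackage below are made-up names for illustration:

```python
from typing import Any

def legacy_parse(text) -> Any:   # an untyped helper standing in for legacy code
    return text.split()

# per-line silencing: the trailing comment tells the checker to skip this assignment
words: list[int] = legacy_parse("a b c")  # type: ignore[assignment]

# per-module silencing goes in mypy configuration, e.g. in mypy.ini:
#   [mypy-somepackage.*]
#   ignore_missing_imports = True
print(words)
```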
Gradual Typing in Practice
- Starting with Mypy
- Making Mypy More Strict
GOTCHA: accidentally using = instead of : for type hints. So this is legal but also a typo:
def hex2rgb(color=str) -> tuple[int,int,int]
wherein we accidentally wrote = instead of :. Just have to be careful about these because the static analyser won’t point it out (since it’s legal).
Good Style:
- No space between the parameter name and the :; one space after the :
- Spaces on both sides of the = that precedes a default parameter value
Use blue instead of black as the code formatter; it’s more aligned with python’s idioms (it defaults to single quotes).
- A Default Parameter Value
EXTRA NOTE: python prefers single quotes by default for strings
``using single quotes’’
Using None as a Default \(\implies\) use Optional
the idea here is that None can be a better default value to use, so the type hinting should use an Optional.
It still needs a default value (of None) because type hints are meaningless at runtime.
BTW, it’s not the annotation that makes the param optional, it’s the provisioning of a default value for that param.
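A small sketch, modeled loosely on the book's show_count example (the exact body here is my reconstruction):

```python
from typing import Optional

def show_count(count: int, singular: str, plural: Optional[str] = None) -> str:
    if count == 1:
        return f'1 {singular}'
    # the annotation says "str or None"; the "= None" is what actually makes it optional
    if plural is None:
        plural = singular + 's'
    return f'{count} {plural}'

print(show_count(1, 'bird'))           # '1 bird'
print(show_count(3, 'mouse', 'mice'))  # '3 mice'
```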
Types Are Defined by Supported Operations
what’s a “type”?
in a practical sense, see it as the set of supported operations
a supported operation here refers to whether the data object has the associated operator function defined or not.
So the example given is abc.Sequence, and it does not have __mul__ implemented, so if the function is annotated like this then the type checker will complain:

```python
from collections import abc

def double(x: abc.Sequence):
    return x * 2
```
Gradual Type System: an interplay b/w duck typing and nominal typing
``have the interplay of two different views of types:’’
- the key idea is, when do we want to detect typing errors: if @ runtime, then it’s more aligned with duck typing. if @ compile time, then it’s aligned with nominal typing.
duck typing (implicitly, structural typing):
focuses on “behaviour”, only enforced at runtime
objects are “typed” but variables aren’t
what really matters is what operations are supported \(\implies\) that’s why it’s duck typing.
“if it quacks like a duck” means if it has an implementation like that and the implementation supports the arguments provided
naturally this type checking is done at runtime
nominal typing:
- focuses on “type identity”
- “nominal” because it depends on the name, referring to the declaration that was made (like a label)
- compatibility of type depends on what the explicitly-defined type is
NOTE: a static checker may complain about type errors even if the code will actually work and execute without issues.
there’s a duality to be balanced here
This little experiment shows that duck typing is easier to get started and is more flexible, but allows unsupported operations to cause errors at runtime. Nominal typing detects errors before runtime, but sometimes can reject code that actually runs—such as the call alert_bird(daffy)
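A sketch of the experiment being described, roughly the book's Bird/Duck example, reconstructed from memory:

```python
class Bird:
    pass

class Duck(Bird):
    def quack(self):
        print('Quack!')

def alert(birdie):                      # no hint: duck typing, checked only at runtime
    birdie.quack()

def alert_bird(birdie: Bird) -> None:   # nominal typing: mypy flags birdie.quack()
    birdie.quack()                      # ...because Bird declares no quack method

daffy = Duck()
alert(daffy)       # runs fine
alert_bird(daffy)  # also runs fine at runtime, but a static checker rejects the body above
```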
Types Usable in Annotations
The Any Type
the purpose of defining an Any type
more general types \(\implies\) narrower interfaces, in the sense that they support fewer operations.
need for a special wildcard type: you’d want something that can accept values of every type but not end up having a narrow interface \(\rightarrow\) that’s why we have Any
so, Any is a magic type that sits at the bottom and at the top of the type hierarchy (from the POV of the typechecker).
More general types have narrower interfaces, i.e., they support fewer operations. The object class implements fewer operations than abc.Sequence, which implements fewer operations than abc.MutableSequence, which implements fewer operations than list. But Any is a magic type that sits at the top and the bottom of the type hierarchy. It’s simultaneously the most general type—so that an argument n: Any accepts values of every type—and the most specialized type, supporting every possible operation. At least, that’s how the type checker understands Any
Contrasting subtype-of vs consistent-with relations
In a gradual type system there are elements of behavioural subtyping (the classic one that adheres to the LSP principle) as well as a more flexible compatibility notion in the form of consistent subtyping.
subtype-of relationship: behavioural sub-typing adheres to LSP
LSP was actually defined in the context of supported operations:
If an object of T2 substitutes an object of type T1 and the program still behaves correctly, then T2 is a subtype-of T1.
T2 is expected. This focus on supported operations is reflected in the name behavioral subtyping,
consistent-with relationship: that’s what Any is for
this is the part where Any is consistent with both up and down the hierarchy.
Simple Types and Classes
- Can just directly use them for type-hinting.
- for classes, consistent-with is defined like subtype-of: a subclass is consistent-with all its superclasses.
- exception: int is consistent-with complex
  - all the numeric types are directly subclassed from object. int has a superset of the supported operations, and it’s not really a subclass of complex, but it is still consistent-with complex!
Optional and Union Types
even the Optional type is just syntax sugar for Union[MyType, None].
the latest syntax allows us to use A | B instead of Union[A, B].
NOTE: we can actually define return types that are Unions, but this makes it ugly because the caller of this function now needs to handle the type checking at runtime.
Union is more useful with types that are not consistent among themselves.
For example:
Union[int, float]is redundant becauseintis consistent-withfloat.If you just use
floatto annotate the parameter, it will acceptintvalues as well.
syntactic sugar for Optional and Union types: |
Better Syntax for Optional and Union in Python 3.10: we can write str | bytes instead of Union[str, bytes] since Python 3.10. It’s less typing, and there’s no need to import Optional or Union from typing. Contrast the old and new syntax for the type hint of the plural parameter of show_count:
plural: Optional[str] = None
plural: str | None = None
The | operator also works with isinstance and issubclass to build the second argument: isinstance(x, int | str). For more, see PEP 604—Complementary syntax for Union[].
try not to define return values with union types
it means the responsibility of doing type checking on the return values is on the consumer of the function \(\rightarrow\) bad pattern
Generic Collections (defining types for collections like list[str])
python collections (container classes) are generally heterogeneous
Generic types can be declared with type parameters to specify the type of the items they can handle.
the simplest form of generic type hints is container[item], where container is any container type; examples being:
- list
- set
- abc.MutableSet
references:
- see the official docs on GenericAlias
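A minimal illustration of container[item] in a signature, roughly the book's small tokenize example, reconstructed here:

```python
def tokenize(text: str) -> list[str]:
    """Return a list of uppercased words."""
    return text.upper().split()

print(tokenize('the quick brown fox'))  # ['THE', 'QUICK', 'BROWN', 'FOX']
```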
Situations that python’s type annotations won’t be able to handle:
- unsupported 1 - can’t type check the array.array typecode as of Python 3.10
unsupported 2 - when collection defined with typecode, overflow is not checked for
yet another reminder that these numerics in python are not fixed-width
constructor argument, which determines whether integers or floats are stored in the array. An even harder problem is how to type check integer ranges to prevent OverflowError at runtime when adding elements to arrays. For example, an array with typecode='B' can only hold int values from 0 to 255. Currently, Python’s static type system is not up to this challenge.
Tuple Types
There are 3 ways we can annotate tuple types:
annotating them as records
annotating them as records with named fields
annotating them as immutable sequences
tuples as records
Just use the builtin like e.g.
def geohash(lat_lon: tuple[float,float]) -> str:
for tuples being used as records with named fields \(\implies\) using NamedTuple
can “alias” it using a named tuple – follows the consistent-with relationship

```python
from typing import NamedTuple

from geolib import geohash as gh  # third-party geolib package

PRECISION = 9

class Coordinate(NamedTuple):
    lat: float
    lon: float

# NOTE: this wrapper prevents static checkers from complaining that
# the geohash lib does not have type hints.
def geohash(lat_lon: Coordinate) -> str:
    return gh.encode(*lat_lon, PRECISION)
```

So here, Coordinate is consistent-with tuple[float, float].
Because of this consistency, if a fn signature was def display(lat_lon: tuple[float, float]) -> str:, then the Coordinate NamedTuple will still work
for tuples to be used as immutable sequences
Objective here is to annotate tuples of unspecified length that are used as immutable lists
We specify a single type, followed by a comma and ... . This ellipsis is useful to us.
e.g. tuple[int, ...] is a tuple with int items.
- note: we can’t tie down a particular length though
Here’s a consolidated example:
```python
from collections.abc import Sequence

def columnize(
    sequence: Sequence[str],  # 1: this is a sequence of strings
    num_columns: int = 0,
    # 2: return type below means it's a list of tuples,
    #    and the tuples only contain strings
) -> list[tuple[str, ...]]:
    if num_columns == 0:
        num_columns = round(len(sequence) ** 0.5)
    num_rows, reminder = divmod(len(sequence), num_columns)
    num_rows += bool(reminder)
    return [tuple(sequence[i::num_rows]) for i in range(num_rows)]
```
Generic Mappings
- the syntax is just MappingType[KeyType, ValueType]
- we can annotate local variables!
e.g.
index: dict[str, set[str]] = {}
interesting example of an inverted index
``returning an inverted index’’
There’s a whole bunch of literature on inverted indexes. This does a value -> key mapping.
good example code
In the example below:
the local variable index is annotated because the typechecker will complain otherwise
the walrus operator := is used to define a name for an expression’s output
TO_HABIT: this is useful and I already use it for while loops, but I can potentially use it for other expressions as well.
in the example for charidx.py:
```python
# we name the expression for the if-predicate so that we can use it thereafter
if name := unicodedata.name(char, ''):
    for word in tokenize(name):
        index.setdefault(word, set()).add(char)
```

More information about the walrus operator:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82The **walrus operator** (`:=`), introduced in Python 3.8, allows for assignment expressions, enabling you to assign a value to a variable as part of an expression. This feature can enhance code readability and efficiency in certain contexts. Here’s a detailed overview of its functionality, use cases, and implications. ### What is the Walrus Operator? - **Syntax**: The walrus operator is used as follows: ```python variable := expression ``` This assigns the result of `expression` to `variable` and returns that value. - **Purpose**: The primary purpose of the walrus operator is to allow assignments to be made within expressions, reducing redundancy and improving code conciseness. ### Key Use Cases 1. **Reducing Redundant Calculations**: - The walrus operator can be particularly useful when you want to use a value multiple times without recalculating it. ```python # Without walrus operator result = [func(x) for x in data if func(x)] # With walrus operator result = [y for x in data if (y := func(x))] ``` In this example, `func(x)` is called only once per iteration instead of twice. 2. **Cleaner Loop Constructs**: - It simplifies loops where the loop condition depends on user input or other calculations. ```python # Using walrus operator while (data := input("Enter your data: ")) != "": print("You entered:", data) ``` This eliminates the need for an initial assignment before entering the loop. 3. **Conditional Assignments**: - You can assign a value within an if statement, making the code more readable. ```python if (match := re.match(pattern, s)): print("Match found:", match.group()) ``` 4. **Accumulate Data In-Place**: - The walrus operator can also be used to accumulate values while iterating. ```python c = 0 print([(c := c + x) for x in [5, 4, 3, 2]]) # Output: [5, 9, 12, 14] ``` ### Advantages - **Conciseness**: It reduces boilerplate code by allowing assignments within expressions. - **Performance**: It can improve performance by avoiding repeated function calls or calculations. - **Readability**: In certain contexts, it makes the code clearer by showing intent directly where values are being assigned and used. ### Considerations - **Readability vs. Complexity**: While it can enhance readability, excessive or inappropriate use may lead to complex and hard-to-read code. It's important to balance conciseness with clarity. - **Avoiding Nested Expressions**: Using nested walrus operators can make code difficult to understand and maintain. ### Etymology of "Walrus" The term "walrus operator" is informal and comes from the resemblance of the `:=` symbol to a walrus's eyes and tusks. The playful name was popularized in discussions about its introduction and has since become widely accepted in the Python community. ### Mental Model To conceptualize the walrus operator: - Think of it as a way to "capture" a value while simultaneously using it in an expression. - Visualize it as a tool that allows you to hold onto something (the value) while you continue working with it immediately (the expression). ### Summary The walrus operator (`:=`) in Python provides a powerful way to assign values within expressions, enhancing code conciseness and performance in specific scenarios. 
While it offers significant advantages, careful consideration should be given to its use to maintain code clarity and avoid unnecessary complexity. Citations: [1] https://www.geeksforgeeks.org/walrus-operator-in-python-3-8/ [2] https://martinheinz.dev/blog/79 [3] https://www.kdnuggets.com/how-not-to-use-pythons-walrus-operator [4] https://realpython.com/python-walrus-operator/ [5] https://www.reddit.com/r/Python/comments/jmnant/walrus_operator_good_or_bad/ [6] https://stackoverflow.com/questions/73644898/why-is-python-walrus-operator-needed-instead-of-just-using-the-normal-assig [7] https://realpython.com/python-operator-module/ [8] https://www.digitalocean.com/community/tutorials/how-to-use-args-and-kwargs-in-python-3

the tokenize function is a generator. KIV for chapter 17 for a deep dive into this.
Example 8-14. charindex.py
```python
import re
import sys  # needed for sys.maxunicode (missing in the original listing)
import unicodedata
from collections.abc import Iterator

RE_WORD = re.compile(r"\w+")
STOP_CODE = sys.maxunicode + 1

def tokenize(text: str) -> Iterator[str]:
    """return iterable of uppercased words"""
    for match in RE_WORD.finditer(text):
        yield match.group().upper()

def name_index(start: int = 32, end: int = STOP_CODE) -> dict[str, set[str]]:
    index: dict[str, set[str]] = {}
    for char in (chr(i) for i in range(start, end)):
        if name := unicodedata.name(char, ""):
            for word in tokenize(name):
                index.setdefault(word, set()).add(char)
    return index
```
Abstract Base Classes
PRINCIPLE: Robustness Principle / Postel’s Law:
“Be conservative in what you send, be liberal in what you accept.”
it makes sense to define a generic type hint (of abstract classes) so that we can support many concrete implementations of it.
rule of thumb - better to use abc.Mapping or abc.MutableMapping instead of dict
Because it will support more mapping types
Therefore, in general it’s better to use abc.Mapping or abc.MutableMapping in parameter type hints, instead of dict (or typing.Dict in legacy code).
fall of the “numeric tower” of numeric class-hierarchy
there used to be a bunch of ABCs for numeric types, but now it’s not useful because numeric types are special.
they are directly subclassed from the object type and are consistent-with each other.
this numeric tower is a linear hierarchy of ABCs with Number at the top
- Number
- Complex
- Real
- Rational
- Integral
Point being that the static type checking of things within the numeric tower doesn’t work well – have to use the explicit types, KIV the solution for it, comes in a later chapter
Those ABCs work perfectly well for runtime type checking, but they are not supported for static type checking. The “Numeric Tower” section of PEP 484 rejects the numbers ABCs and dictates that the built-in types complex, float, and int should be treated as special cases, as explained in “int Is Consistent-With complex” on page
3 options to type-annotate numeric things
use a concrete type instead e.g. int, float, complex
declare a union type
Union[float, Decimal, Fraction]
use numeric protocols e.g. SupportsFloat
kiv numeric protocols for chapter 13
In practice, if you want to annotate numeric arguments for static type checking, you have a few options:
- Use one of the concrete types int, float, or complex—as recommended by PEP
> 3. Declare a union type like Union[float, Decimal, Fraction].
> 4. If you want to avoid hardcoding concrete types, use numeric protocols like SupportsFloat, covered in “Runtime Checkable Static Protocols” on page 468.
> The upcoming section “Static Protocols” on page 286 is a prerequisite for understanding the numeric protocols.
> Meanwhile, let’s get to one of the most useful ABCs for type hints: Iterable.
Generic Iterables
Python Typeshed Project
Not that important.
Just for compatibility initially.
It is a way to provide “headers” with type annotations.
This is how the type annotations are retrofitted onto the existing stdlib, because the stdlib functions have no annotations.
It relies on a .pyi file that’s basically like a c-header file.
:NOTER_PAGE: (310 0.6666666666666667 . 0.2703549060542797)
``Stub Files and the Typeshed Project’’
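A hypothetical sketch of what such a stub looks like; the module and function names are made up, only the .pyi shape follows the typeshed convention of signatures with ... bodies:

```python
# fibo.py -- an untyped implementation (hypothetical module)
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# fibo.pyi -- the companion stub a type checker would read: signatures only
#
#   def fib(n: int) -> int: ...
```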
Explicit Type Aliases are supported, they improve readability
Though it seems that there’s a separate syntax for this.
```python
from typing import TypeAlias

FromTo: TypeAlias = tuple[str, str]
```
⚠️ Danger of unbounded iterables on memory requirements
GOTCHA: iterable arguments need to be completely consumed. This poses a risk if we have infinite iterables (e.g. cyclic generators).
this is something to keep in mind about.
however, the value of this is that it allows flexibility and the ability to inject in generators instead of prebuilt sequences
return a result. Given an endless iterable such as the itertools.cycle generator as input, these functions would consume all memory and crash the Python process. Despite this potential danger, it is fairly common in modern Python to offer functions that accept an Iterable input even if they must process it completely to return a result.
Parameterized Generics and TypeVar
- for us to refer to a generic type, we have to use TypeVars
- KIV the fact that TypeVar also allows us to define covariants and contravariants in addition to bounds.
type var bound @ point of usage, is a reflection on the result type
where T is a type variable that will be bound to a specific type with each usage. This allows a parameter type to be reflected on the result type.
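A sketch of that "reflection", close to the book's sample function but reconstructed here:

```python
from collections.abc import Sequence
from random import shuffle
from typing import TypeVar

T = TypeVar('T')

def sample(population: Sequence[T], size: int) -> list[T]:
    if size < 1:
        raise ValueError('size must be >= 1')
    result = list(population)
    shuffle(result)
    return result[:size]

# T is bound per call: the checker infers list[int] and list[str] respectively
ints: list[int] = sample([1, 2, 3, 4], 2)
strs: list[str] = sample('abcdef', 3)  # a str is a Sequence[str]
```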
why TypeVar is needed (and unique to python)
TypeVar is a construct that is unique to the python language
introduces the variable name in the current namespace as opposed to getting that variable declared beforehand
it’s unique because languages like C, Java, TypeScript don’t need the name of type variables to be declared beforehand, so they don’t need such a construct
mental model: it’s a variable representing a type instead of being a type by itself
see more on typevar:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65The concept of **TypeVar** in Python is a unique construct primarily used for creating generic types, allowing developers to write functions and classes that can operate on any data type while maintaining type safety. This feature is particularly useful in statically typed languages, but it has specific implications and uses in Python, which is dynamically typed. Here’s a breakdown of why TypeVar is significant in Python and how it differs from similar concepts in languages like JavaScript. ### Understanding TypeVar in Python 1. **Generic Programming**: - **TypeVar** allows you to define a placeholder for a type that can be specified later when the function or class is instantiated. This enables generic programming, where you can write code that works with any data type. - Example: ```python from typing import TypeVar, Generic T = TypeVar('T') class Wrapper(Generic[T]): def __init__(self, value: T): self.value = value int_wrapper = Wrapper(10) # T is inferred as int str_wrapper = Wrapper("Hello") # T is inferred as str ``` 2. **Type Safety**: - TypeVar enhances type safety by ensuring that the operations performed on the generic type are valid for the specific type passed during instantiation. This helps catch errors at development time rather than runtime. 3. **Flexibility**: - It allows for more flexible and reusable code. You can create functions and classes that can handle multiple types without duplicating code for each specific type. ### Comparison with JavaScript JavaScript does not have a direct equivalent to Python's TypeVar due to its dynamic typing system. Here are some key differences: 1. **Dynamic vs. Static Typing**: - JavaScript is dynamically typed, meaning types are determined at runtime and variables can hold values of any type without explicit declarations. - In contrast, Python’s TypeVar allows for static type checking when using tools like `mypy`, enabling developers to specify expected types while still maintaining flexibility. 2. **Lack of Generics**: - While JavaScript supports some level of generics through its type systems (like TypeScript), it does not have built-in constructs like TypeVar that are part of the core language syntax. - In TypeScript (a superset of JavaScript), generics are defined differently, using angle brackets (`<T>`), but they do not use a construct like `TypeVar` to define a variable type that can be reused across multiple functions or classes. 3. **Type Inference**: - Python's TypeVar allows for type inference based on context, which can help with readability and maintainability of code. JavaScript's dynamic nature means that developers often rely on documentation or comments to convey expected types. ### Etymology of "TypeVar" and Mental Model The term **TypeVar** combines "Type" (referring to data types) and "Var" (short for variable). This naming emphasizes that it acts as a variable representing a type rather than being a concrete type itself. #### Mental Model: - Think of **TypeVar** as a placeholder or a template for a data type: - Imagine it as an empty box labeled "T" where you can put different items (data types) later. 
When you define a function or class using TypeVar, you’re saying, “This box can hold anything; just tell me what it will hold when you use it.” - This concept aligns with generic programming principles found in other languages but is uniquely adapted to Python's dynamic typing environment. ### Summary - **TypeVar** is a powerful construct in Python that enables generic programming by allowing developers to create flexible and reusable code while maintaining type safety. - Unlike JavaScript, which lacks direct support for generics in its core syntax, Python provides TypeVar as part of its typing module, facilitating static type checking. - The term "TypeVar" reflects its role as a variable representing types, allowing developers to think in terms of templates or placeholders when designing their functions and classes. Citations: [1] https://stackoverflow.com/questions/55345608/instantiate-a-type-that-is-a-typevar [2] https://discuss.python.org/t/non-uniqueness-of-typevar-on-python-versions-3-12-causes-resolution-issues/37350 [3] https://guicommits.com/python-generic-type-function-class/ [4] https://typing.readthedocs.io/en/latest/spec/generics.html [5] https://www.reddit.com/r/learnpython/comments/1adbgfp/should_i_use_a_typevar/ [6] https://dagster.io/blog/python-type-hinting [7] https://docs.python.org/es/3.13/library/typing.html [8] https://www.typescriptlang.org/play/typescript/language-extensions/nominal-typing.ts.html
make the [] operator work on classes like Sequence[T]. But the name of the T variable inside the brackets must be defined somewhere—otherwise the Python interpreter would need deep changes to support generic type notation as special use of []. That’s why the typing.TypeVar constructor is needed: to introduce the variable name in the current namespace. Languages such as Java, C#, and TypeScript don’t require the name of type variable to be declared beforehand,
Restricting/Bounding the TypeVar
there might be a need to explicitly restrict using a whitelist of types instead of letting the consistent-with subtyping do its job.
Without the restriction, anything that is consistent with T will work, but that’s unideal because the function that’s consuming the type most likely needs this to be restricted
we have 2 ways to restrict the possible types assigned to T:
[1] restricted TypeVar – references a whitelist
This is a fixed whitelist.
Problem is that, it may not be easy to maintain if numerous items in the list.
that’s where bounding can be done.
``NumberT = TypeVar(‘NumberT’, float, Decimal, Fraction)’’
[2] bounded TypeVar – defines an upper bound on the type, works on anything that is consistent-with
sets an upper boundary for the acceptable types.
e.g. HashableT = TypeVar('HashableT', bound=Hashable); then the variable could be Hashable or any of its subtypes.
careful not to get confused with the use of the word “bound” for that named param to TypeVar: it just sets an upper bound on the acceptable types.
this becomes the same generics construct as in Java
The solution is another optional parameter of TypeVar: the bound keyword parameter. It sets an upper boundary for the acceptable types. In Example 8-18, we have bound=Hashable, which means the type parameter may be Hashable or any subtype-of it.
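A sketch contrasting the two forms; the mode function is adapted from the book's examples, with the details reconstructed:

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from decimal import Decimal
from fractions import Fraction
from typing import TypeVar

# [1] restricted: only the listed types are acceptable
NumberT = TypeVar('NumberT', float, Decimal, Fraction)

# [2] bounded: anything consistent-with Hashable is acceptable
HashableT = TypeVar('HashableT', bound=Hashable)

def mode(data: Iterable[HashableT]) -> HashableT:
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]

print(mode([1, 1, 2, 3, 3, 3, 3, 4]))  # 3
```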
Predefined TypeVars
AnyStr is an example of such a predefined type var; it supports both bytes and str.
Static Protocols via typing.Protocol
A protocol in the historical sense is an informal interface. KIV proper introduction to Protocols till Chapter 13.
In the context of type hints,
A protocol is really all about structural typing. Types match if the behaviours are consistent-with each other.
This feature is also known as “static duck typing”. It’s because we make duck typing explicit for static type checkers.
the solution to annotate the series parameter of top was to say “The nominal type of series doesn’t matter, as long as it implements the __lt__ method.” Python’s duck typing always allowed us to say that implicitly, leaving static type checkers clueless. That’s the contrast with the implicit duck typing that we have been seeing all along.
protocol definition vs implementation
a protocol can be defined by subclassing typing.Protocol; it’s a class of its own.
here’s an example:

```python
from typing import Protocol, Any

class SupportsLessThan(Protocol):
    def __lt__(self, other: Any) -> bool: ...
```

and then we can use this protocol to define a TypeVar:
LT = TypeVar('LT', bound=SupportsLessThan)
NOTE:
it subclasses typing.Protocol
the class body has one or more methods; the methods have ... in their bodies.
this is sufficient to define the type signature for the protocol, and that’s what matters / is used to determine if something adheres to a protocol.
the implementer of a protocol doesn’t need to inherit, register or declare any relationship with the class that defines the protocol
a protocol type is defined by specifying one or more methods, and the type checker verifies that those methods are implemented where that protocol type is required. In Python, a protocol definition is written as a typing.Protocol subclass. However, classes that implement a protocol don’t need to inherit, register, or declare any relationship with the class that defines the protocol. It’s up to the type checker to find the available protocol types and enforce their usage.
use case: when we can’t just define a boundary, but we want to define a protocol for the functions it supports
were able to use typing.Hashable as the upper bound for the type parameter. But now there is no suitable type in typing or abc to use, so we need to create it.
T just needs to be checked if it’s consistent-with the protocol
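The usage side looks roughly like the book's top function, reconstructed and made self-contained here:

```python
from collections.abc import Iterable
from typing import Any, Protocol, TypeVar

class SupportsLessThan(Protocol):
    def __lt__(self, other: Any) -> bool: ...

LT = TypeVar('LT', bound=SupportsLessThan)

def top(series: Iterable[LT], length: int) -> list[LT]:
    return sorted(series, reverse=True)[:length]

# works for any item type that implements __lt__; no nominal relationship needed
print(top([4, 1, 5, 2, 6, 7, 3], 3))            # [7, 6, 5]
print(top('mango pear apple kiwi'.split(), 2))  # ['pear', 'mango']
```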
- disambiguation b/w protocols and abstract classes (as interfaces) in OOP (see perplexity thread)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80To clarify the concepts of **protocols** and **interfaces** in Python, and to compare them with Java's virtual interfaces, let's break down the definitions and functionalities. ### Disambiguation: Protocols vs. Interfaces in Python 1. **Protocols**: - **Definition**: A protocol in Python, introduced in PEP 484, defines a set of methods and properties that a class should implement, but it does not require explicit inheritance. Protocols allow for structural typing, meaning that if a class has the required methods and properties, it can be considered to implement the protocol. - **Usage**: Protocols are defined using the `typing.Protocol` class. They are useful for static type checking with tools like `mypy` and allow for "duck typing" — if an object behaves like a certain type, it can be used as that type regardless of its actual class. - **Example**: ```python from typing import Protocol class Drawable(Protocol): def draw(self) -> None: ... class Circle: def draw(self) -> None: print("Drawing a circle") def render(shape: Drawable) -> None: shape.draw() circle = Circle() render(circle) # This works because Circle has a draw method ``` 2. **Interfaces**: - **Definition**: In Python, interfaces are often implemented using abstract base classes (ABCs) defined in the `abc` module. An interface specifies a contract that classes must adhere to by explicitly declaring methods that must be implemented. - **Usage**: Interfaces require explicit inheritance from an ABC. This means that if a class claims to implement an interface, it must inherit from the ABC and implement all its abstract methods. - **Example**: ```python from abc import ABC, abstractmethod class Drawable(ABC): @abstractmethod def draw(self) -> None: pass class Circle(Drawable): def draw(self) -> None: print("Drawing a circle") def render(shape: Drawable) -> None: shape.draw() circle = Circle() render(circle) # This works because Circle implements Drawable ``` ### Comparison with Java's Virtual Interfaces - In Java, interfaces are explicitly defined contracts that classes must implement. A class can implement multiple interfaces, providing flexibility similar to Python's protocols. - Java's interfaces support default methods (methods with an implementation), which is not directly available in Python's protocols or ABCs without additional workarounds. - Both Java interfaces and Python protocols allow for polymorphism; however, Python's protocols enable duck typing without requiring explicit declarations of interface implementation. 
### Summary of Differences | Feature | Protocols (Python) | Interfaces (Python via ABC) | Java Interfaces | |-----------------------|-------------------------------------|-------------------------------------------|---------------------------------------| | Definition | Structural typing | Explicit method contracts | Explicit method contracts | | Inheritance | No explicit inheritance required | Requires explicit inheritance | Requires explicit implementation | | Flexibility | More flexible (duck typing) | Less flexible (strict adherence needed) | Supports multiple interface implementation | | Default Methods | No default methods | No default methods | Supports default methods | | Use Case | Type checking with mypy | Enforcing method implementation | Defining contracts for behavior | ### Conclusion In summary, while both protocols and interfaces serve to define expected behaviors in Python, they do so in different ways. Protocols offer more flexibility through structural typing, while interfaces enforce strict adherence through inheritance. Understanding these distinctions helps in designing systems that leverage polymorphism effectively. If you have further questions or need more clarification on any specific points regarding protocols, interfaces, or their comparisons with other languages like Java, feel free to ask! Citations: [1] https://stackoverflow.com/questions/67233702/python-terminology-interface-vs-protocol [2] https://blog.glyph.im/2021/03/interfaces-and-protocols.html [3] https://sinavski.com/post/1_abc_vs_protocols/ [4] https://testdriven.io/tips/9f452585-e673-4617-8f35-ac85ab413e14/ [5] https://www.reddit.com/r/Python/comments/10ikape/interfaces_with_protocols_why_not_ditch_abc_for/ [6] https://news.ycombinator.com/item?id=26488156 [7] https://andrewbrookins.com/technology/building-implicit-interfaces-in-python-with-protocol-classes/ [8] https://discuss.python.org/t/add-built-in-flatmap-function-to-functools/21137
protocol has one or more method definitions, with … in their bodies. A type T is consistent-with a protocol P if T implements all the methods defined in P, with matching type signatures.
example
the examples below use MyPy’s debugging facilities, take note.
typing.TYPE_CHECKING can be used to guard against runtime function calls
- things like reveal_type() are a Mypy debugging facility, not regular functions.
typing.TYPE_CHECKING constant is always False at runtime, but type checkers pretend it is True when they are type checking.
- reveal_type() is a pseudofunction, a mypy debugging facility
``reveal_type() pseudofunction call, showing the inferred type of the argument.’’
Callables via typing.Callable
allows us to hint the type of higher-order functions that take in callables
parameterized like so: Callable[[ParamType1, ParamType2], ReturnType]
The params list can have zero or more types.
if we need a type hint to match a function with a flexible signature, replace the whole parameter list with ...: Callable[..., ReturnType]
other than that, there’s NO syntax to annotate optional or keyword arguments
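A small sketch of annotating a higher-order function with Callable (a toy example of my own, not from the book):

```python
from collections.abc import Callable

def apply_twice(func: Callable[[int], int], value: int) -> int:
    return func(func(value))

def increment(n: int) -> int:
    return n + 1

print(apply_twice(increment, 3))  # 5

# flexible signature: any callable returning str, regardless of its parameters
def describe(factory: Callable[..., str]) -> str:
    return factory()

print(describe(lambda: 'hello'))  # hello
```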
Variance in Callable Types
With generic type params, we now have to deal with type hierarchies and so we have to deal with type variance.
KIV variance on Chapter 15
covariance
example:
Callable[[], int] is a subtype-of Callable[[], float], because int is a subtype-of float
\(\implies\) Callable is covariant on the return type, because the subtype-of relationship of the types int and float is in the same direction as the relationship of the Callable types that use them as return types
most parameterized generic types are invariant
NoReturn via typing.NoReturn
for functions that never return
actually used for functions that never return normally, e.g. sys.exit(), which raises SystemExit
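A tiny sketch of my own showing the annotation:

```python
from typing import NoReturn

def fatal(msg: str) -> NoReturn:
    # never returns normally: always raises
    raise SystemExit(msg)

def run(flag: bool) -> int:
    if not flag:
        fatal('flag must be set')
    # the checker knows fatal() never returns, so this line is reached only when flag is True
    return 42

print(run(True))  # 42
```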
extra: typeshed-like stub files don’t define default values, so they use ... instead
``Stub files don’t spell out the default values, they use … instead.''
extra: the use of ellipsis operator
...is context-dependent1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66The `...` operator in Python, known as the **ellipsis**, is a built-in singleton object of type `ellipsis`. Its role varies depending on context, and it’s often used where meaning can be ambiguous because Python itself does not mandate one specific use. Here are its main uses: 1. **Placeholder for Incomplete Code** You can use `...` inside functions, classes, or other blocks to indicate "code not yet implemented" or "to be done later," similar to `pass`. For example: ```python def my_function(): ... ``` This works as a no-op placeholder, and some developers prefer it stylistically over `pass`. 2. **Extended Slicing in Multi-Dimensional Arrays** In libraries like **NumPy**, `...` is very useful for slicing arrays with many dimensions. It stands for "expand all missing dimensions" in slice notation, allowing you to avoid explicitly typing many colons: ```python import numpy as np arr = np.arange(27).reshape(3, 3, 3) result = arr[..., 0] # Equivalent to arr[:, :, 0] ``` This slices all elements along all earlier dimensions and selects index 0 of the last dimension. 3. **Unique Default or Sentinel Value** Sometimes `...` is used as a unique sentinel or default parameter value to distinguish "no argument provided" from `None` or other possible values: ```python def greet(name, greeting=...): if greeting is ...: greeting = "Hello" print(f"{greeting}, {name}!") ``` 4. **Representation of Recursive or Circular References** When you print objects with circular references (e.g., a list containing itself), Python shows `[...]` in the output to avoid infinite recursion. This is a display convention, not an actual replacement of the object. 5. **Type Hinting Uses** The ellipsis appears in some advanced type hinting scenarios, for example, to indicate variable-length tuples (`Tuple[int, ...]`) or unspecified arguments in callables. ### Summary Table of Ellipsis Roles | Role | Description & Example | |------------------------------------|-----------------------------------------------| | Placeholder for unimplemented code | Use `...` inside functions/classes instead of `pass` | | Extended slicing in multidimensional arrays | `arr[..., 0]` to slice across many dimensions | | Unique default/sentinel value | Default function arguments different from None | | Circular reference representation | Shown as `[...]` in printed recursive data to avoid infinite loops | | Type hinting | Used in tuple or callable annotations to indicate variable length | In essence, `...` is a versatile built-in object intended to serve as a **syntactic convenience and semantic marker** that gains meaning from context, especially in libraries like NumPy and in type hinting. If you are seeing it "everywhere" ambiguously, it’s because its meaning tends to be **context-dependent**, and it’s often used as a compact placeholder or slicing shorthand. Let me know if you want detailed examples or specific usage in any particular context like NumPy, type hints, or function placeholders. --- References: 1. Scaler: Uses of Ellipsis in Python[1] 2. Real Python: When to Use Ellipsis[2] 3. Python Land: Python Ellipsis Explained[4] 4. 
Explaining Ellipsis in Function Defaults and NumPy Slicing[3] [1] https://www.scaler.com/topics/python-ellipsis/ [2] https://realpython.com/python-ellipsis/ [3] https://www.kdnuggets.com/exploring-pythons-ellipsis-more-than-just-syntax-sugar [4] https://python.land/python-ellipsis [5] https://stackoverflow.com/questions/772124/what-does-the-ellipsis-object-do [6] https://www.reddit.com/r/learnpython/comments/12pqfz5/til_about_ellipses_in_python/ [7] https://gist.github.com/promto-c/f51cc2c0eb8742ce5cc3e65601df2deb [8] https://www.geeksforgeeks.org/python/what-is-three-dots-or-ellipsis-in-python3/ [9] https://mbizsoftware.com/to-what-purpose-does-a-python-ellipsis-perform/
Annotating Positional Only and Variadic Parameters
Consider this example:
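The original snippet did not survive in these notes; below is a sketch reconstructing the book's tag example with the annotations the notes describe (the body details are my reconstruction):

```python
from typing import Optional

def tag(
    name: str,            # positional-only (everything before the /)
    /,
    *content: str,        # arbitrary positional args: each one must be a str
    class_: Optional[str] = None,
    **attrs: str,         # arbitrary keyword args: keys are str, values annotated as str
) -> str:
    if class_ is not None:
        attrs['class'] = class_
    attr_pairs = ''.join(f' {k}="{v}"' for k, v in sorted(attrs.items()))
    if content:
        elements = (f'<{name}{attr_pairs}>{c}</{name}>' for c in content)
        return '\n'.join(elements)
    return f'<{name}{attr_pairs} />'

print(tag('p', 'hello', 'world', class_='sidebar'))
```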
So what we see here is that:
for the arbitrary positional params, the annotation fixes each of them to str
for the kwargs, it’s **attrs: <mytype>, where mytype is the type of the values and the keys will be str
Imperfect Typing and Strong Testing
Some limitations to the type hinting capabilities:
unsupported: useful things like argument unpacking
handy features can’t be statically checked; for example, argument unpacking like config(**settings).
unsupported: advanced features like properties, descriptors, meta things
properties, descriptors, metaclasses, and metaprogramming in general are poorly supported or beyond comprehension for type checkers.
since can’t hint data constraints, type hinting doesn’t help with correctness of business logic
Common data constraints cannot be expressed in the type system—even simple ones. For example, type hints are unable to ensure “quantity must be an integer > 0” or “label must be a string with 6 to 12 ASCII letters.” In general, type hints are not helpful to catch errors in business logic.
conclusion - robustness of python codes comes mainly from quality unit-testing
concluded: “If a Python program has adequate unit tests, it can be as robust as a C++, Java, or C# program with adequate unit tests (although the tests in Python will be faster to write).”
Chapter Summary
Protocol and how it enables static duck typing is useful because it’s very ‘pythonic’
in many ways, type hinting is very unPythonic, but typing.Protocol fits nicely.
protocols should be seen as the bridge between python’s duck-typed core and the nominal typing that allows static type checkers to catch bugs.
3.8, Protocol is not widely used yet—but it is hugely important. Protocol enables static duck typing: the essential bridge between Python’s duck-typed core and the nominal typing that allows static type checkers to catch bugs.
using the term “generics” in python is kinda funny
because it’s actually doing the opposite by tying-down exactly what the types can be instead of keeping it more “generic”
Generics or Specifics? From a Python perspective, the typing usage of the term “generic” is backward. Common meanings of “generic” are “applicable to an entire class or group” or “without a brand name.” Consider list versus list[str]. The first is generic: it accepts any object. The second is specific: it only accepts str. The term makes sense in Java, though. Before Java 1.5, all Java collections (except the magic array) were “specific”: they could only hold Object references, so we had to cast the items that came out of a collection to
for a deeper dive into variance
we can do runtime type-checking in python
for advanced versions of runtime typechecking (i.e. anything beyond doing guards for
isinstance) we have to rely on some libraries. See this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69Yes, you can perform runtime type checking in Python, and there are several ways to achieve this. Here’s a detailed overview based on the search results: ### Runtime Type Checking in Python 1. **Dynamic Typing**: Python is a dynamically typed language, meaning that types are determined at runtime. This allows for flexibility but can lead to type-related errors if not managed properly. 2. **Type Checking Tools**: - **`isinstance()`**: The built-in function `isinstance()` can be used to check if an object is an instance of a specific class or a tuple of classes. However, it does not support checking against complex type annotations defined in the `typing` module. ```python x = 5 if isinstance(x, int): print("x is an integer") ``` 3. **TypeGuard Library**: - Libraries like **TypeGuard** provide runtime type checking capabilities that can validate function arguments and return types based on type annotations. - You can use the `@typechecked` decorator to automatically check types at runtime. ```python from typeguard import typechecked @typechecked def add(a: int, b: int) -> int: return a + b add(1, 2) # Works fine add(1, "2") # Raises TypeError at runtime ``` 4. **Other Libraries**: - **`runtime-type-checker`**: This library allows you to perform type checks at runtime using decorators or direct function calls like `check_type()`. ```python from runtime_type_checker import check_type check_type("hello", str) # OK check_type(123, str) # Raises TypeError ``` 5. **Pydantic and Marshmallow**: These libraries are often used for data validation and serialization, providing runtime checks based on type annotations. - **Pydantic Example**: ```python from pydantic import BaseModel class User(BaseModel): name: str age: int user = User(name="Alice", age=30) # Valid user = User(name="Alice", age="30") # Raises validation error ``` ### Summary of Key Points - **Type Annotations**: While Python supports type hints (introduced in PEP 484), they are primarily for static analysis tools like `mypy`. By default, Python does not enforce these types at runtime. - **Runtime Type Checking Libraries**: To enforce type checks during execution, you can use libraries such as TypeGuard, runtime-type-checker, Pydantic, or Marshmallow. - **Flexibility vs. Safety**: The dynamic nature of Python provides flexibility but may lead to errors that can be caught with proper type checking mechanisms. ### Conclusion In summary, while Python does not enforce type hints at runtime by default, you can implement runtime type checking using various libraries designed for this purpose. These tools help ensure that your code adheres to the expected types defined in your annotations, providing an additional layer of safety in your applications. Citations: [1] https://stackoverflow.com/questions/43646823/python-3-5-checking-type-annotation-at-runtime [2] https://github.com/agronholm/typeguard [3] https://pypi.org/project/runtime-type-checker/ [4] https://testdriven.io/blog/python-type-checking/ [5] https://discuss.python.org/t/proposal-for-enhancing-runtime-type-checking-in-python-using-pattern-matching-and-type-hints-in-function-definitions/59391 [6] https://www.datacamp.com/tutorial/type-checking-in-python [7] https://www.infoworld.com/article/2268917/get-started-with-python-type-hints.html [8] https://realpython.com/python-type-checking/
``covers runtime type checking validation.''
Further Reading
:PROPERTIES: :NOTER_PAGE: (328 . 0.3852)
Chapter 9. Decorators and Closures
there’s a need to disambiguate the following:
“decorators” in the context of design patterns vs
“decorators” as a python source code syntax feature to do basic metaprogramming
“decorators” in the context of source code that alters the syntax tree when walking and annotating it
b and c are related
See elaboration here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104Absolutely—the term “decorator” is overloaded in the programming lexicon, and disambiguating its different meanings is essential, especially for a technical audience steeped in first-principles thinking. ## 1. **Decorator as an Object-Oriented Design Pattern** **Definition:** - The **Decorator Pattern** (from the Gang of Four design patterns) is a structural object-oriented pattern. - **Mental Model:** It’s about composing behaviors at runtime by “wrapping” objects so you can add, remove, or alter responsibilities dynamically, without modifying the original object’s code. - **Canonical Use:** You create a base interface, concrete components, and “decorator” classes implementing the same interface but containing an instance of the component (or another decorator). The decorator adds extra functionality before/after delegating calls to the wrapped object. **Example:** ```python class Coffee: def cost(self): return 5 class MilkDecorator: def __init__(self, coffee): self._coffee = coffee def cost(self): return self._coffee.cost() + 2 # adds cost of milk coffee = Coffee() milk_coffee = MilkDecorator(coffee) print(milk_coffee.cost()) # Output: 7 ``` **Key Points:** - The object’s interface remains constant. - You can layer (nest) decorators to compose features. - Pattern is about **runtime behavior extension** via object composition. ## 2. **Decorator as a Python Source Code Syntax Feature (Function/Method/Class Annotation)** **Definition:** - In Python, a *decorator* is language syntax (the `@decorator` notation) that allows alteration or augmentation of functions, methods, or classes. - **Mental Model:** This is a form of metaprogramming: you take a callable (“the decorated”) and replace it with the result of passing it to a decorator function (or class). The original entity is replaced—effectively rewiring its meaning at definition time. - This mechanism operates during **definition time**—when the source code is executed—and can be used for logging, access control, transformation, memoization, etc. **Example:** ```python def my_decorator(func): def wrapper(*args, **kwargs): print("Before") return func(*args, **kwargs) return wrapper @my_decorator def greet(): print("Hello") ``` - At load-time, `greet = my_decorator(greet)`. **Key Points:** - Not limited to adding or layering behavior: you can totally replace functionality, register, introspect, or annotate. - Strongly linked to Python’s first-class, runtime-accessible function objects. - Used pervasively in frameworks (Flask, Django, pytest, etc.). ## 3. **Decorator as an Abstract Syntax Tree (AST) Transformer** **Definition:** - In advanced metaprogramming scenarios, *decorator* can also refer to code that inspects and modifies the **AST itself**—that is, the parse tree Python generates before executing code. - **Mental Model:** These are meta-tools (not the `@decorator` feature) that programmatically traverse, annotate, or even rewrite code during, for example, static analysis, linting, or source-to-source transformation. - Examples include tools like `ast` module in Python’s standard library, or third-party packages for code analysis, type-checking, or metaclass generation. 
**Example (AST transformation):** ```python import ast class FunctionAnnotator(ast.NodeTransformer): def visit_FunctionDef(self, node): # Insert or rewrite a decorator in the AST node ... return node tree = ast.parse(open('somefile.py').read()) # Walk/transform the tree FunctionAnnotator().visit(tree) ``` **Key Points:** - This usage of “decorator” is about source-level transformations, not runtime behavior or object composition. - Such tools might annotate or rewrite code to reflect design-time properties or enforce invariants. - Operates at a different abstraction layer: **syntax tree instead of runtime objects**. ## **Disambiguating the Terms** - **Decorator Pattern (OOP):** Object composition to add functionality via wrappers, used in class/instance structures. “Decorator” is a *pattern* in the sense of design, independent of language syntax. - **Python Decorator Syntax:** Metaprogramming at definition time (`@decorator`), altering functions/methods through higher-order functions or callables. This is a *language feature* deeply integrated into Python’s runtime and loading mechanisms. - **AST/Syntax Tree Decorator/Transformer:** Tools that mutate or annotate the language’s syntax tree itself, often before code ever runs. This is a *meta-level code transformation* technique, not a runtime object operation. ## **Mental Model and Practical Takeaway** - Whenever we encounter the word “decorator,” first clarify: **Are we talking about a design pattern, a Python syntactic feature, or code transformation tools working with the syntax tree?** - For a **tech leader**, it’s crucial to recognize **the same term can encode different mental models**, and context defines which conceptual layer we’re operating in—object structure, runtime function composition, or compile-time/source-level transformation. > **In summary:** > - The term **“decorator”** in software can mean (1) a design pattern for dynamic behavioral extension, (2) the Python syntax feature for code transformation at definition time, or (3) a concept/tool for manipulating code at the syntax-tree level. These meanings are distinct in logic, purpose, and implementation, and using their precise terminology and mental models prevents architectural ambiguities in code review, design discussions, and tooling selection. **References:** 1. Gamma, Helm, Johnson, Vlissides. “Design Patterns: Elements of Reusable Object-Oriented Software.” 2. Python Language Reference, section on function definitions and decorators. 3. Python `ast` module documentation and real-world linter/analysis tools (e.g., Black, mypy internals).
closures are a foundational concept that help us reason about:
how to write decorators and use the nonlocal keyword
programming that uses callbacks, or when writing python code in a functional fashion.
precursor learning objectives
• How Python evaluates decorator syntax
• How Python decides whether a variable is local
• Why closures exist and how they work
• What problem is solved by nonlocal
With this grounding, we can tackle further decorator topics:
• Implementing a well-behaved decorator
• Powerful decorators in the standard library: @cache, @lru_cache, and @singledispatch
• Implementing a parameterized decorator
What’s New in This Chapter
Decorators 101
definition: “callable that takes another function as argument”
3 facts to summarise it:
- a decorator is a function or another callable
- a decorator may replace the decorated function with a different one
- decorators are executed IMMEDIATELY when a module is LOADED (@ load-time)
When you “decorate” a function in Python (either with the @decorator syntax or by manual assignment), you are:
Passing the original function object to a decorator callable (function or class).
Receiving back the return value of the decorator, which is typically—but not always—an inner function that wraps (or sometimes replaces) the original.
Binding a variable (often the same name as the original function) to this new object.
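A minimal sketch of those three steps, essentially the book's first decorator demo:

```python
def deco(func):
    def inner():
        print('running inner()')
    return inner  # the decorator returns a replacement function

@deco
def target():
    print('running target()')

# the name target is now bound to inner, not to the original function
target()       # prints 'running inner()'
print(target)  # <function deco.<locals>.inner at 0x...>

# the @ syntax is equivalent to:
#   target = deco(target)
```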
class decorators also exist!
When Python Executes Decorators - import-time vs run-time
- import-time vs runtime
import time would refer to when the module is loaded.
- within a script \(\rightarrow\) when the script starts
- within a different module \(\rightarrow\) when the importing is actually done
to emphasize that function decorators are executed as soon as the module is imported, but the decorated functions only run when they are explicitly invoked. This highlights the difference between what Pythonistas call import time and runtime.
Registration Decorators
typically decorators define an inner function that uses the decorated function and return that inner function.
this in a way, supports the GOF decorator pattern as well
Variable Scope Rules
this is a fundamental concept:
compare with JS, if we have a variable defined local to the scope of a function,
then in python, it will always assume that it’s a local variable and it won’t look further in the outer scopes to resolve that name.
in JS, the name resolution may go to higher scopes and we might potentially modify a global variable unknowingly
in python, refs to a locally defined name will never bubble out to a larger scope
But the fact is, when Python compiles the body of the function, it decides that b is a local variable because it is assigned within the function. The generated bytecode reflects this decision and will try to fetch b from the local scope. Later, when the call f2(3) is made, the body of f2 fetches and prints the value of the local variable a, but when trying to fetch the value of local variable b, it discovers that b is unbound.
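A hedged reconstruction of the situation that quote describes (the name `f2` comes from the quote; the values are illustrative):

```python
b = 6

def f2(a):
    print(a)
    print(b)  # fails: the assignment below makes b local for the whole function body
    b = 9

try:
    f2(3)
except UnboundLocalError as exc:
    print(exc)  # e.g. "local variable 'b' referenced before assignment"
```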
so in the `f3` example below (with `global b`), we have 3 scopes in action actually:
- module global scope
- the `f3` function's local scope
- the `nonlocal` scope (though this part is not super obvious yet)

```python
b = 6

def f3(a):
    global b
    print(a)
    print(b)
    b = 9

f3(3)     # prints 3, then 6
print(b)  # prints 9: f3 rebound the module-global b
```
the nature of the variable (local vs. global) can’t change in the body of a function
Closures
summary:
a closure is a function that retains the bindings of the free variables that exist when the function is defined, so that they can be used later when the function is invoked and the defining scope is no longer available.
the only situation in which a function may need to deal with external variables that are nonglobal is when it is nested in another function and those variables are part of the local scope of the outer function.
a closure is a function—let’s call it f—with an extended scope that encompasses variables referenced in the body of f that are not global variables or local variables of f.
Such variables must come from the local scope of an outer function that encompasses f.
It does not matter whether the function is anonymous or not; what matters is that it can access nonglobal variables that are defined outside of its body.
disambiguating closures from anon functions
Actually, a closure is a function—let’s call it f—with an extended scope that encompasses variables referenced in the body of f that are not global variables or local variables of f. Such variables must come from the local scope of an outer function that encompasses f. It does not matter whether the function is anonymous or not; what matters is that it can access nonglobal variables that are defined outside of its body.
- refer to this for the verbose disambiguation between closures and anon functions
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84To disambiguate **closures** and **anonymous functions** in programming languages, particularly in the context of Python, we need to clarify their definitions, characteristics, and how they relate to each other. ### Definitions 1. **Anonymous Functions**: - An **anonymous function** (often referred to as a **lambda function** in Python) is simply a function that does not have a name. It is defined using the `lambda` keyword and can be used wherever function objects are required. - Example in Python: ```python add = lambda x, y: x + y print(add(2, 3)) # Output: 5 ``` - Anonymous functions are useful for short, throwaway functions that are not reused elsewhere. 2. **Closures**: - A **closure** is a function that captures the lexical scope in which it was defined, allowing it to access variables from that scope even when it is executed outside of that scope. - In Python, closures are typically created when a nested function references variables from its enclosing scope. - Example in Python: ```python def outer_function(x): def inner_function(y): return x + y # 'x' is captured from the outer scope return inner_function closure = outer_function(10) print(closure(5)) # Output: 15 ``` - Here, `inner_function` forms a closure that captures the variable `x` from `outer_function`. ### Key Differences 1. **Nature**: - **Anonymous Function**: A function without a name; it can be defined and used inline. - **Closure**: A function that retains access to its lexical scope even when invoked outside of that scope. 2. **Creation**: - **Anonymous Function**: Can be created anywhere, typically for short-lived operations (e.g., as arguments to higher-order functions). - **Closure**: Created when a nested function references variables from its enclosing scope. 3. **Scope Access**: - **Anonymous Function**: Does not inherently capture any external state unless it is defined within another function. - **Closure**: Specifically designed to capture and retain access to its surrounding state. 4. **Use Cases**: - **Anonymous Functions**: Often used for callbacks or functional programming constructs (like `map`, `filter`, etc.). - **Closures**: Useful for data hiding and encapsulation, maintaining state across function calls. ### Python Context In Python, both anonymous functions and closures are supported: - **Anonymous Functions (Lambdas)**: ```python square = lambda x: x * x print(square(4)) # Output: 16 ``` - **Closures**: ```python def make_multiplier(factor): def multiplier(x): return x * factor # 'factor' is captured from the outer scope return multiplier double = make_multiplier(2) print(double(5)) # Output: 10 ``` ### Summary - All closures can be considered anonymous functions if they are defined without a name, but not all anonymous functions create closures. - The distinction lies in the ability of closures to capture and maintain access to their lexical environment, which anonymous functions do not inherently possess unless they are part of a nested structure. - In Python, this distinction allows for powerful programming patterns such as decorators and higher-order functions while maintaining clean and concise code. 
By understanding these concepts, developers can effectively utilize both anonymous functions and closures in their programming practices, leveraging their unique characteristics for various scenarios.
free variables: variables that are not bound in the local scope
Consider this example:
```python
def make_averager():
    series = []  # series is a local variable of make_averager (assigned here)

    def averager(new_value):
        # series is a *free* variable inside averager: it is never assigned here,
        # only mutated via .append(), so it keeps referring to the outer binding
        series.append(new_value)
        total = sum(series)
        return total / len(series)

    return averager
```

- so `avg = make_averager()` is a reference to the inner function, `averager`, which has the outer `series` in its scope.
- `series` was originally a local variable of `make_averager` because of the assignment `series = []` in its body.
- however, after `make_averager` returns, that local scope is gone.
- within `averager`, `series` is a free variable: a variable that is NOT bound in the local scope.
- the closure for `averager` extends the scope of that function to include the binding for the free variable `series` (Figure 9-1).
- Inspecting the returned `averager` object shows how Python keeps the names of local and free variables in the `__code__` attribute that represents the compiled body of the function (Example 9-10):

```
>>> avg.__code__.co_varnames
('new_value', 'total')
>>> avg.__code__.co_freevars
('series',)
```
how free variables are kept within `__closure__` attributes:
:NOTER_PAGE: (344 0.5105177993527509 . 0.33121019108280253)
- closures are actually represented using `cell` items / attrs: `avg.__closure__[0].cell_contents`
- alternatively, we could inspect the `__code__` for that variable, just to observe this: `avg.__code__.co_freevars`, which gives the free variables for that function
- "The binding for series is kept in the `__closure__` attribute of the returned function avg. Each item in `avg.__closure__` corresponds to a name in `avg.__code__.co_freevars`. These items are cells, and they have an attribute called `cell_contents` where the actual value can be found."
only situation for why a function may require access to variables outside its scope
the only situation in which a function may need to deal with external variables that are nonglobal is when it is nested in another function and those variables are part of the local scope of the outer function.
The nonlocal Declaration
using the `nonlocal` keyword to explicitly declare free variables:
- it "lets you declare a variable as a free variable even when it is assigned within the function. If a new value is assigned to a nonlocal variable, the binding stored in the closure is changed."
- prevents the `UnboundLocalError` (use-before-assignment) because the variable will be regarded as a free variable if it's declared `nonlocal`
To work around this, the nonlocal keyword was introduced in Python 3. It lets you declare a variable as a free variable even when it is assigned within the function. If a new value is assigned to a nonlocal variable, the binding stored in the closure is changed.
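A minimal sketch of the problem `nonlocal` solves, assuming the running-average example this chapter builds on:

```python
def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        # without nonlocal, the += below would rebind count/total as locals
        # and raise UnboundLocalError on first use
        nonlocal count, total
        count += 1
        total += new_value
        return total / count

    return averager

avg = make_averager()
print(avg(10), avg(11), avg(12))  # 10.0 10.5 11.0
```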
Variable Lookup Logic - how python resolves names
- Rules for how Python fetches a variable `x` that appears in a function:
  - if there is a `global x` declaration, then `x` comes from, and is assigned to, the `x` global variable of the module
    - NOTE: Python doesn't have a program global scope, only module global scopes
  - if there is a `nonlocal x` declaration, then `x` comes from, and is assigned to, the `x` local variable of the nearest surrounding function where `x` is defined
  - if `x` is a parameter, or is assigned a value in the function body, then `x` is a local variable
  - if `x` is referenced (read only) but not assigned and is not a parameter:
    - `x` is looked up in the local scopes of the surrounding function bodies (nonlocal scopes) \(\rightarrow\) free variable!
    - if not found there, it is read from the module global scope
    - if still not found, it is read from `__builtins__.__dict__`
:NOTER_PAGE: (346 . 0.496385)
Implementing a Simple Decorator
typical behavior of a decorator: it replaces the decorated function with a new function that accepts the same arguments and (usually) returns whatever the decorated function was supposed to return, while also doing some extra processing.
this is partly aligned with the GOF Decorator Pattern in OOP.
In both cases, we extend behaviour, but the way we do it is different.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31Your summary of a typical Python decorator—that it “replaces the decorated function with a new function that accepts the same arguments and (usually) returns whatever the decorated function was supposed to return, while also doing some extra processing”—captures how decorators are commonly used in Python. However, this pattern only partly aligns with the **Gang of Four (GOF) Decorator Pattern** as described in object-oriented design. ### GOF Decorator Pattern (Design Pattern) - The **GOF Decorator Pattern** is about **dynamically attaching new responsibilities to objects** at runtime by wrapping them in decorator objects[2][7]. - In the traditional pattern, decorators are classes that implement the same interface as the objects they wrap. Additional behavior is layered by composition, and you can recursively chain decorators together[2][3][5]. - The emphasis is on **object composition** and runtime extension, not on function replacement or metaprogramming. ### Python Decorators (Language Feature) - **Python decorators** (via the `@decorator` syntax) are a **language feature** for transforming functions or classes at definition time[1]. A Python decorator is usually a higher-order function or class that takes a function (or class) and returns a replacement—often a function that wraps the original, accepting the same parameters, returning the same output, and adding pre- or post-processing. - This approach is similar in spirit to the design pattern because it allows extension or modification of behavior without changing the original function directly[1]. The decorator often "wraps" the original so that calling the decorated function results in the new ("decorator") logic being executed, often around a call to the original. ### Key Distinctions - The **GOF pattern operates at the object/class level**, using composition and polymorphism. - The **Python feature generally operates at the function (or class) level**, using higher-order functions for metaprogramming and code transformation at definition time. - Both systems enable behavioral extension, and Python decorators can be used to implement the decorator pattern, but they're not always identical in mechanics or intention[1][5]. ### In summary: - Your description of the typical behavior matches how Python decorators are commonly implemented and used. - This overlap is **inspired by the GOF Decorator Pattern**; the intent—to extend or alter behavior without modifying the original—is present in both. - However, **Python's decorator syntax and idioms are a language feature enabling, but not limited to, the classical design pattern**. The mechanics and use cases often go beyond, including metadata registration, memoization, method validation, and many scenarios not contemplated by the original OOP design pattern[1][3][7]. So: **your summary matches common Python decorator behavior, which parallels the GOF Decorator Pattern but is not restricted to it—they share intent, but not all implementation details**[1][2][5]. 
[1] https://stackoverflow.com/questions/17927129/is-python-decorator-related-to-the-decorator-design-pattern [2] https://www.digitalocean.com/community/tutorials/gangs-of-four-gof-design-patterns [3] https://refactoring.guru/design-patterns/decorator [4] https://www.reddit.com/r/Python/comments/r0apia/which_of_the_design_patterns_of_the_og_gang_of/ [5] https://python-patterns.guide/gang-of-four/decorator-pattern/ [6] https://www.geeksforgeeks.org/system-design/decorator-pattern/ [7] https://en.wikipedia.org/wiki/Decorator_pattern [8] https://github.com/tuvo1106/python_design_patterns

TIP: using `@functools.wraps()` lets the decorated function keep its original docstring, name, and support for variadic args/kwargs and such
- `functools.wraps` is itself an example of a standard, ready-to-use decorator; the book's `clock` example starts like this (see the fuller sketch below):

```python
def clock(func):
    @functools.wraps(func)
    def clocked(*args, **kwargs):
        ...
```
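A sketch of how that `clock` decorator typically continues; this is reconstructed from memory, so details may differ from the book's listing:

```python
import functools
import time

def clock(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def clocked(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        arg_lst = [repr(arg) for arg in args]
        arg_lst.extend(f'{k}={v!r}' for k, v in kwargs.items())
        print(f'[{elapsed:0.8f}s] {func.__name__}({", ".join(arg_lst)}) -> {result!r}')
        return result
    return clocked
```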
Decorators in the Standard Library
Memoization with `functools.cache`
- stacking decorators composes them, starting from the bottom of the stack (closest to the function) and applying outwards towards the top
```python
@alpha
@beta
def my_fn():
    ...
```
which is equivalent to the nested calls `my_fn = alpha(beta(my_fn))`
- args to the decorated function need to be hashable
- this is because the internal representation for the `lru_cache` is actually a `dict`, and the keys are built from the positional and keyword args used in the calls.
- book quote: "arguments taken by the decorated function must be hashable, because the underlying lru_cache uses a dict to store the results, and the keys are made from the positional and keyword arguments used in the calls."
- a good use case for `@cache` is caching remote API calls!
- careful that we don't end up using all the available memory (may happen since this cache is unbounded); we can use `functools.lru_cache(maxsize=<mymaxsize>)` instead
Using `@lru_cache` to keep the memory bounded
- there's a default of `maxsize=128`, i.e. 128 entries in the cache
- for optimal performance, the `maxsize` parameter should be a power of 2
- the `typed` parameter, if true, discriminates by argument type within its store, so `1.0` and `1` are treated as two different entries in the store.
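A minimal sketch of bounding the cache; the `maxsize` value here is just illustrative:

```python
import functools

@functools.lru_cache(maxsize=2**10, typed=True)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

print(fibonacci(30))           # fast thanks to memoization
print(fibonacci.cache_info())  # hits/misses/maxsize/currsize stats
```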
Single Dispatch to Contribute to Generic Functions (aka Multimethods)
python doesn’t have method overloading (like in Java, for example)
in this example for `htmlize`, with multiple rules, we can:
- let `htmlize` act as a dispatcher that dispatches to multiple specific functions, e.g. `htmlize_str` and so on
so, the `functools.singledispatch` decorator:
- "allows different modules to contribute to the overall solution, and lets you easily provide specialized functions even for types that belong to third-party packages that you can't edit."
- a generic function here is a group of functions to perform the same operation in different ways, depending on the type of the first argument
- it's *single* dispatch because only the first argument is used as the determinant, else it would have been called "multiple dispatch"
- it uses a custom `@<generic_fn_name>.register` to do the bundling of the specific functions
implementing it (see the sketch below):
- the function that we apply the `@singledispatch` decorator to becomes the name of the generic function that bundles the functionality. The base case implementation (e.g. something that works with a generic `object` type) also lives there.
- for the other specialised functions (contributors to the generic function), we use the custom `@<generic_fn>.register` to register them
- for the specialised functions:
  - only the 1st argument matters
  - we can define them at varying levels of class-specificity
    - e.g. `bool` is a subtype of `numbers.Integral`, but the singledispatch logic seeks the implementation with the most specific matching type, regardless of the order they appear in the code.
  - we can also stack other decorators atop the register decorator because the `@<base>.register` decorator returns the undecorated function.
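A compressed sketch of that shape (a simplified `htmlize`, not the book's full listing):

```python
import html
from functools import singledispatch

@singledispatch
def htmlize(obj: object) -> str:       # base implementation for generic objects
    return f'<pre>{html.escape(repr(obj))}</pre>'

@htmlize.register                      # specialization selected via the type hint
def _(text: str) -> str:
    return f'<p>{html.escape(text)}</p>'

@htmlize.register(bool)                # or by passing the type explicitly
def _(flag: bool) -> str:
    return f'<pre>{flag}</pre>'

print(htmlize({1, 2, 3}))
print(htmlize('hello & goodbye'))
print(htmlize(True))
```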
TIP: register specialised functions to handle ABCs or Protocols to enjoy more polymorphic behaviour
Using ABCs or typing.Protocol with @singledispatch allows your code to support existing or future classes that are actual or virtual subclasses of those ABCs, or that implement those protocols.
KIV virtual subclasses till chapter 13
remember that we can register functions that works with 3rd party libraries!
this is analogous to function overloading in typed languages
here’s the functools docs for it
NOTE it’s NOT supposed to be a parallel to java method overloading, it’s to support MODULAR EXTENSION
- modular extension in the sense that each module can register a specialized function for each type it supports.
@singledispatch is not designed to bring Java-style method overloading to Python. A single class with many overloaded variations of a method is better than a single function with a lengthy stretch of if/elif/elif/elif blocks. But both solutions are flawed because they concentrate too much responsibility in a single code unit—the class or the function. The advantage of @singledispatch is supporting modular extension: each module can register a specialized function for each type it supports. In a realistic use case, you would not have all the implementations of generic functions in the same module as in…
Parameterized Decorators
Key Idea: use a decorator factory to create the decorator
```python
registry = set()

# this here is a decorator factory
def register(active=True):
    def decorate(func):
        print('running register'
              f'(active={active})->decorate({func})')
        if active:
            registry.add(func)
        else:
            registry.discard(func)
        return func
    return decorate

@register(active=False)
def f1():
    print('running f1()')

@register()
def f2():
    print('running f2()')

def f3():
    print('running f3()')
```
NOTE: we can't just write `@register` anymore, we have to actually call the factory function, e.g. `@register()`
read more here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45It is correct to say that a **decorator factory** in Python must always be called using `@name()` (with parentheses), because it is a function that returns a decorator. The result of calling the factory—i.e., `name()`—must itself be a callable that accepts the function being decorated, and only this returned callable is applied as the actual decorator. - Using `@name` is for *simple decorators*: the decorator is itself a callable that takes a function as input (e.g., `def decorator(fn): ...`), and is applied as `@decorator`. - A **decorator factory** is a function returning a decorator, often parameterized (e.g., `def decorator_factory(arg): ...`). You must invoke the factory—`@decorator_factory(args)`—so that what is returned will accept and wrap the underlying function[2][3][5]. ### Does `@name` Ever Act as a Nullary Call? No, `@name` does **not** cause an implicit function call. The interpreter does *not* call `name()` for you. `@name` without parentheses applies exactly what’s bound to `name` as a decorator; it is not treated as `@name()`. This makes explicit invocation with parentheses (`@name()`) necessary if you intend to use a decorator factory, even with no arguments[2][3]. - If you accidentally use `@name` (no parentheses) when `name` is actually a decorator factory, you will get a `TypeError` because the decorator factory is not the correct type—it hasn’t yet received its arguments and it doesn’t accept the function to be decorated as its first input. ### Example: ```python def decorator_factory(arg): def decorator(fn): ... return fn return decorator @decorator_factory(42) # correct usage: applies the decorator returned from factory def foo(): ... ``` If you write `@decorator_factory` (no parentheses) above, `foo` is actually passed to `decorator_factory`, not to the decorator it returns, and the protocol is broken. ### Summary Table | Decorator Syntax | Used For | Calls Made Implicitly | |---------------------|-----------------------|------------------------| | `@name` | Simple decorator | No call (just binding) | | `@name()` | Decorator factory | Calls factory, supplies result as decorator | | `@name(arg1, arg2)` | Decorator factory | Calls factory with arguments | In conclusion: - **Decorator factories must be used as `@name()`** (or with arguments: `@name(arg, ...)`). - **`@name` does NOT act as a nullary call** and does not call the factory. - Using `@name` when only a decorator factory is defined will result in errors or unexpected behavior[2][3][5]. [1] https://stackoverflow.com/questions/44358027/why-is-using-decorator-factory-factory-functions-a-syntax-error [2] https://www.thepythoncodingstack.com/p/demystifying-decorators-parts-3-to-7 [3] https://blog.devgenius.io/decorator-factory-by-example-e3f2774b0baa [4] https://www.geeksforgeeks.org/python/decorators-in-python/ [5] https://realpython.com/primer-on-python-decorators/ [6] https://www.freecodecamp.org/news/the-python-decorator-handbook/ [7] http://simeonfranklin.com/blog/2012/jul/1/python-decorators-in-12-steps/
possible to call the decorator factory function directly too without the @
- though we would have to call it like so:
register(active=False)(f)
If, instead of using the @ syntax, we used register as a regular function, the syntax needed to decorate a function f would be register()(f) to add f to the registry, or register(active=False)(f) to not add it (or remove it).
The Parameterized Clock Decorator
```python
import time

DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'

def clock(fmt=DEFAULT_FMT):
    def decorate(func):
        def clocked(*_args):
            t0 = time.perf_counter()
            _result = func(*_args)
            elapsed = time.perf_counter() - t0
            name = func.__name__
            args = ', '.join(repr(arg) for arg in _args)
            result = repr(_result)
            print(fmt.format(**locals()))  # NB: see the locals() passing done here
            return _result
        return clocked
    return decorate

if __name__ == '__main__':
    @clock()
    def snooze(seconds):
        time.sleep(seconds)

    for i in range(3):
        snooze(.123)  # exercise the decorated function
```
- we can pass the local variables to `fmt` like so: `print(fmt.format(**locals()))`
A Class-Based Clock Decorator
- these examples just used functions as a demo example
- for non-trivial cases, it's better to implement decorators as classes with a `__call__()` method ("implemented as a class with `__call__`")
```python
import time

DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'

class clock:
    def __init__(self, fmt=DEFAULT_FMT):
        self.fmt = fmt

    def __call__(self, func):
        def clocked(*_args):
            t0 = time.perf_counter()
            _result = func(*_args)
            elapsed = time.perf_counter() - t0
            name = func.__name__
            args = ', '.join(repr(arg) for arg in _args)
            result = repr(_result)
            print(self.fmt.format(**locals()))
            return _result
        return clocked
```
- the `clock` class here is our parameterized decorator factory
KIV class decorators for chapter 24.
Chapter Summary
Further Reading
- general series on python decorators, starting with “how you implemented your Python decorator is wrong”
dynamic vs lexical scoping
- lexical scope is the norm: free variables are evaluated considering the environment where the function is defined. Lexical scope complicates the implementation of languages with first-class functions, because it requires the support of closures. On the other hand, lexical scope makes source code easier to read.
``Dynamic Scope Versus Lexical Scope''
coming to terms b/w decorator pattern and python decorators
Note that I am not suggesting that function decorators should be used to implement the decorator pattern in Python programs. Although this can be done in specific situations, in general the decorator pattern is best implemented with classes to represent the decorator and the components it will wrap.
Chapter 10. Design Patterns with First-Class Functions
design pattern: general recipe for solving common design problems
language independent; however practically some languages already have inbuilt recipes for some of these patterns
e.g. Generators in python are the inbuilt version of the Iterator pattern.
in the context of languages that support first-class functions, the fact that we can leverage functions as first-class objects is useful to make code simpler.
the following classic patterns might need a rethink because functions can do the same work as classes while improving readability and reducing boilerplate:
Strategy Pattern
Command Pattern
Template Method
Visitor Pattern
What’s New in This Chapter
Case Study: Refactoring Strategy
- the objective of this case study is to see how we can leverage functions as first-class objects.
Classic Strategy
- what it is:
- “Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it.”
- participants:
Context
- Provides a service by delegating some computation to interchangeable components that implement alternative algorithms.
- in the example, this is the `Order` class
Strategy
interface common to the components that implement the different algorithms.
it's the `Promotion` abstract class
Concrete Strategy
- one of the concrete classes that implement the abstract class
Function-Oriented Strategy
useful characteristics from the class-based implementation:
concrete strategies have a single useful method
strategy instances are stateless and hold no instance attributes
It’s because of these reasons, we can consider replacing the concrete strategies with simple functions, and removing the abstract class.
“strategy objects often make good flyweights” is the advice, wherein the cons of the Strategy pattern, which is its runtime cost (e.g. when instantiating the strategy) is addressed by using a Flyweight pattern.
but now we end up with even more boilerplate
the python way of using first class functions works well in general because:
in most cases, concrete strategies don’t need to hold internal state because they deal with data injected by the context \(\implies\) good enough to use plain old functions.
a function is more lightweight than an instance of a user-defined class + we can just create each function once and use it.
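A compressed sketch of the function-oriented version (a simplified `Order`; the attribute names are illustrative, not the book's exact listing):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Order:                                    # the Context
    total: float
    promotion: Optional[Callable[['Order'], float]] = None

    def due(self) -> float:
        discount = self.promotion(self) if self.promotion else 0.0
        return self.total - discount

def fidelity_promo(order: Order) -> float:      # a concrete strategy is just a function
    return order.total * 0.05

print(Order(100.0, fidelity_promo).due())  # 95.0
print(Order(100.0).due())                  # 100.0
```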
Choosing the Best Strategy using MetaStrategy: Simple Approach
- Once you get used to the idea that functions are first-class objects, it naturally follows that building data structures holding functions often makes sense.
Finding Strategies in a Module
Modules are also first-class objects
- `globals()` returns the current global symbol table.
- alternatively, we can inspect the attributes of a dedicated module object (e.g. a `promotions` module) and collect the functions defined within it like so: `promos = [func for _, func in inspect.getmembers(promotions, inspect.isfunction)]`
- I see this as a "pull" approach, where we try to pull together attributes that might fit; a registration-decorator approach (next section) already makes more sense.
Decorator-Enhanced Strategy Pattern
Here’s the example:
| |
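The listing above didn't survive the export, so here's a minimal reconstruction of the idea (simplified promo logic, not the book's exact code):

```python
promos = []

def promotion(promo_func):
    """Registration decorator: record the strategy and return it unchanged."""
    promos.append(promo_func)
    return promo_func

@promotion
def fidelity_promo(order):
    return order.total * 0.05

@promotion
def bulk_item_promo(order):
    return 0.0  # placeholder logic

def best_promo(order):
    """Apply the best available discount."""
    return max(promo(order) for promo in promos)
```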
Advantages of using this decorator:
promo strategy functions don’t need special names, flexibility in naming
the registration decorator also becomes a highlighting of the purpose of the function being decorated
- also makes it easy to just comment out the decorator
registration can be done from any other module, anywhere in the system as long as we use the same registering decorator
The Command Pattern
The goal of Command Pattern is to decouple an object that invokes an operation (the invoker) from the provider object that implements it (the receiver).
put a Command object between the two, implementing an interface with a single method, `execute`, which calls some method in the receiver to perform the desired operation.
- the invoker doesn't need to know the interface of the receiver
- different receivers can be adapted through different `Command` subclasses
- the invoker is configured with a concrete command and calls its `execute` method to operate it.
some pointers from the example:
we have commands and command receivers.
Command receivers are the objects that implement the action specific to a command.
There can be multiple receivers that may respond to a command.
“Commands are an object-oriented replacement for callbacks”. Nice. Depends on the use case, but we could directly implement the callbacks if we want.
How to use simple callback functions directly?
- Instead of giving the invoker a `Command` instance, we can simply give it a function. Instead of calling `command.execute()`, the invoker can just call `command()`.
- The `MacroCommand` can be implemented with a class implementing `__call__`. Instances of `MacroCommand` would be callables, each holding a list of functions for future invocation (see the sketch below).
- if we need more complex command usage (e.g. with undo), then we just need to keep the necessary state: we could put it within classes like `MacroCommand`, or use a closure to hold the internal state of a function between calls.
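A minimal sketch of that callable `MacroCommand` idea:

```python
class MacroCommand:
    """A command that runs a list of commands (any callables)."""
    def __init__(self, commands):
        self.commands = list(commands)

    def __call__(self):
        for command in self.commands:
            command()

macro = MacroCommand([lambda: print('open file'),
                      lambda: print('save file')])
macro()  # the invoker just calls it like a plain function
```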
Chapter Summary
the GOF book’s patterns should be seen as steps in the design process of a system rather than end-points or structures that have to be implemented.
this will allow us to not mindlessly add in boilerplate or structures that actually would have better ways of getting implemented if we had thought about the language’s idioms
- In python’s case, functions or callable objects provide a more natural way of implementing callbacks in Python than mimicking the Strategy or the Command patterns
Further Reading
not many options available for python and design patterns in pythonic fashion, there’s a list here in this book
funfact: If functions have a `__call__` method, and methods are also callable, do `__call__` methods also have a `__call__` method? YES!!!
Part III. Classes and Protocols
Chapter 11. A Pythonic Object
Learning Objectives:
- builtins that convert objects to other types
- alternative constructors
- extending the formatting mini language
What’s New in This Chapter
Object Representations
- python has the following ways to get common string representations. everything except for bytes should return Unicode strings:
- `repr()`, depends on `__repr__`: developer's POV
- `str()`, depends on `__str__`: user's POV
- `bytes()`, depends on `__bytes__`: for byte sequences
- `format()`, depends on `__format__`: for f-strings; the `str.format()` method also relies on this.
Vector Class Redux
this example is really useful
some notes from the example:
`__iter__` is what makes the object iterable, so that we can do things like:
- unpacking and so on
we can implement this by using a generator expression to yield the components one after the other
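A minimal sketch of that, assuming the Vector2d-style attributes:

```python
class Vector2d:
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        # a generator expression yields the components one after the other
        return (i for i in (self.x, self.y))

x, y = Vector2d(3, 4)  # unpacking works because instances are iterable
print(x, y)            # 3.0 4.0
```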
KIV the implementation of `__eq__` and operator overloading until later chapters
An Alternative Constructor
@classmethodallows a method to be called on a class.Naturally, this ISN’T a static method. Here’s an outline of the diffs. The next section is about this actually.1 2 3 4 5 6@classmethod def frombytes(cls, octets): typecode = chr(octets[0]) memv = memoryview(octets[1:]).cast(typecode) return cls(*memv)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47Yes, class methods and static methods are different in Python; they serve distinct purposes and have different access to class and instance data. - **Class methods** are defined using the `@classmethod` decorator. Their first parameter is `cls`, which refers to the class itself (not an instance). Class methods can access and modify class-level attributes and are often used for factory methods or operations that affect the class as a whole. They cannot directly access or modify instance-specific data[1][2][5][6][8]. - **Static methods** are defined using the `@staticmethod` decorator. They do not take `self` or `cls` as their first argument and have no automatic access to either instance or class state. Static methods are utility functions, included in a class for organizational purposes, but they behave like plain functions that happen to live in a class’s namespace. They cannot access or modify class or instance data[1][2][4][5][6][8]. ### Key Differences | Aspect | Class Method | Static Method | |----------------------------|-------------------------------|------------------------------------| | Decorator | `@classmethod` | `@staticmethod` | | First parameter | `cls` (the class) | None required | | Accesses class data? | Yes, via `cls` | No | | Accesses instance data? | No | No | | Use case | Factory methods, class state | Utility functions | | Can modify class state? | Yes | No | | Can modify instance state? | No | No | #### Example Usage ```python class Example: count = 0 @classmethod def increment_count(cls): cls.count += 1 @staticmethod def add(x, y): return x + y ``` - `increment_count()` can change `Example.count`. - `add(2, 3)` simply returns the sum and does not access or change any class or instance data[2][5][6][8]. In summary: **Class methods operate on the class and its state; static methods are independent and do not touch class or instance data.** [1] https://realpython.com/instance-class-and-static-methods-demystified/ [2] https://www.geeksforgeeks.org/python/class-method-vs-static-method-python/ [3] https://stackoverflow.com/questions/136097/what-is-the-difference-between-staticmethod-and-classmethod-in-python [4] https://www.python-engineer.com/posts/difference-classmethod-and-staticmethod/ [5] https://www.linkedin.com/pulse/exploring-differences-between-class-methods-static-python [6] https://accuweb.cloud/resource/articles/class-vs-static-method-in-python [7] https://www.boardinfinity.com/blog/class-and-static-method-in-python-differences/ [8] https://flexiple.com/python/class-method-vs-static-method [9] https://www.youtube.com/watch?v=PIKiHq1O9HQ
@classmethod Versus @staticmethod
- `classmethod`: defines a method that operates on the class and not on instances. classmethod changes the way the method is called, so it receives the class itself as the first argument, instead of an instance.
- a good use case is for alternative constructors like we see here in `frombytes`, or for operations that work at the class level
- NOTE: having learnt Java in school, this seems to directly map to the use of static functions in Java.
HOWEVER
- `@classmethod` in Python is not the same as Java static methods. In Python, classmethods are more powerful: they get the class as an argument, so they are aware of inheritance (subclasses), and can instantiate correct subclasses—something Java static methods cannot directly do, since they are not polymorphic in the same way and are resolved at compile time.
- `staticmethod`: changes a method so that it receives no special first argument. In essence, a static method is just like a plain function that happens to live in a class body, instead of being defined at the module level.
- this just has no access to the class or instance data.
seems like there aren't many good use cases for this.
Formatted Displays
the different invokers delegate the formatting logic to the dunder method `__format__(format_spec)`

two ways to get the formatting specifier:
- the second arg in `format(my_obj, format_spec)`
- within replacement fields (the `{}`) that delimit the specifier inside an f-string, or inside the `fmt` of `fmt.str.format()`
  - e.g. `'1 BRL = {rate:0.2f} USD'.format(rate=brl)`
  - in this example, the `rate` is not part of the specifier; it's the kwarg for the replacement field. The actual specifier is just `'0.2f'`
  - we could also reference the value directly: `f'1 USD = {1 / brl:0.2f} BRL'` (notice the use of the expression `1 / brl` right inside the replacement field)
Replacement fields:
A format string that looks like `'{0.mass:5.3e}'` has 2 separate notations:
- field name: `0.mass` to the left of the colon is the `field_name` for the replacement syntax.
  - it can be an arbitrary expression in an f-string.
- formatting specifier: `5.3e` after the colon is the formatting specifier
  - this is just the Format Spec Mini-Language (ref)
FormatSpec Structure
`f-strings` vs `str.format()`:
- f-strings allow in-place replacement (expressions directly inside the string literal)
- the others separate the template from the values, making it more of an interpolation. This is great when we want to keep them separate.
let me do a bot dump for this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61The **format specifier** (or `format_spec`) is a mini-language used inside replacement fields of Python format strings (e.g., in `'{0.mass:5.3e}'`) to control how the formatted value is presented. ### Breakdown of the Notation: - **Field name:** The part before the colon (`:`), like `0.mass` in your example, specifies what value to format. It can be a positional argument (`0`), an attribute access (`mass`), or an arbitrary expression in f-strings. - **Format specifier:** The part after the colon (`5.3e` in your example) is the **format spec mini-language** that instructs Python how to render the value. This includes alignment, width, precision, type (e.g., scientific notation), padding, and more. ### About the Format Spec Mini-Language The format spec language is defined in Python’s string formatting documentation and includes syntax like: ``` [[fill]align][sign][#][0][width][,][.precision][type] ``` Where: - **fill**: Any character used to fill the padding space. - **align**: `` (right), `=` (pad after sign), `^` (center). - **sign**: `+`, `-`, or space for numeric sign display. - **#**: Alternate form (e.g., include `0x` for hex). - **0**: Enables zero-padding. - **width**: Minimum field width as a number. - **,**: Enables thousands separator. - **.precision**: Number of decimal places (or significant digits). - **type**: Type of conversion, such as: - `e` or `E` for scientific notation, - `f` or `F` for fixed-point, - `g` or `G` for general format, - `d` for decimal integer, - `s` for string, etc. Your example `{0.mass:5.3e}` formats the value `mass` of the first argument in scientific notation (`e`), with a total width of 5 characters and 3 digits after the decimal point. ### Practical Use - Format specs can control numeric formatting (precision, notation). - They can align text inside fixed-width fields. - Support thousand separators and padding. - Allow highly readable and localized string outputs. ### References and Deep Dive - Official Python docs on [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#formatspec) - Real Python tutorial on [Python’s Format Mini-Language](https://realpython.com/python-format-mini-language/) - Examples for format spec syntax are available in many Python formatting guides. If you want, I can provide a detailed summary or a "bot dump" of the format spec mini-language with examples—just let me know! [1] https://realpython.com/python-format-mini-language/ [2] https://docs.python.org/3/library/string.html [3] https://realpython.com/python-string-formatting/ [4] https://stackoverflow.com/questions/44551535/access-the-cpython-string-format-specification-mini-language-parser [5] https://www.telerik.com/blogs/string-formatting-python [6] https://docs.python.org/3/tutorial/inputoutput.html [7] https://learnpython.com/blog/python-string-formatting/ [8] https://pyformat.info [9] https://mimo.org/glossary/python/formatted-strings- The Format Specification Mini-Language is extensible because each class gets to interpret the format_spec argument as it likes.
Enhancing/extending FormatSpec for our custom objects
extending is as simple as just having a bunch of if-elses based on the format spec that we get.
- it's good practice to remove the custom `'p'` suffix from `fmt_spec`, so that the subsequent formatting calls on the individual components downstream of this part of the code don't get confused.
- thus, to preserve compatibility with the built-in format machinery, the custom `'p'` suffix must be removed before passing the `fmt_spec` down to `format()`
- elaboration:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78Great question! Let's analyze the code snippet and the rationale behind removing the custom `'p'` suffix from the `fmt_spec` in the `__format__` method. ### Context Recap: What Is Happening Here? The method `__format__` is being customized in a class that likely represents a 2D vector or complex number-like object with components and angle: ```python def __format__(self, fmt_spec=''): if fmt_spec.endswith('p'): fmt_spec = fmt_spec[:-1] coords = (abs(self), self.angle()) outer_fmt = '' else: coords = self outer_fmt = '({}, {})' components = (format(c, fmt_spec) for c in coords) return outer_fmt.format(*components) ``` - **`fmt_spec`**: The format specification string passed when formatting, e.g., in an f-string. - **Custom `'p'` suffix:** If the format spec ends with `'p'`, the method interprets it as a special request to format the vector in *polar coordinates* — providing `(r, θ)` instead of `(x, y)`. - If `'p'` is detected, it is **stripped off** (`fmt_spec = fmt_spec[:-1]`), so it doesn't confuse the subsequent formatting calls on the individual float components. ### Why Remove the Custom `'p'` Suffix from `fmt_spec`? #### 1. **Format specifiers need to be valid for the inner formatting calls** - The method eventually calls `format(c, fmt_spec)` on each component, where each `c` is a float (radius and angle) or float-like. - The Python built-in formatter (`format()` function) and the numeric format mini-language **do not recognize `'p'`** as a valid format code. - If `'p'` were left in the format spec string, calling `format(float_value, fmt_spec)` would raise a `ValueError` because `'p'` is not a defined type specifier. Thus, to **preserve compatibility with the built-in format machinery**, the custom `'p'` suffix must be removed **before** passing the `fmt_spec` down to `format()`. #### 2. **Maintaining a clean interface for users** - The `'p'` suffix acts as a *signal* to the `__format__` method that **polar formatting** should be used. - By removing it early, the code simply leverages the underlying float formatting logic for the numeric components while exposing a neat, domain-specific format feature. - This lets users write something like `format(vec, ".2fp")` -- meaning "format this vector in polar coordinates, with two decimal places," where `" .2f"` is the standard float format and `'p'` is the custom extension. #### 3. **Separation of concerns** - The `'p'` flag is handled at the level of `__format__` **only** to choose the representation (Cartesian vs polar). - The output formatting of numeric components is delegated to the built-in or standard format spec mini-language. - Removing `'p'` keeps these concerns cleanly separated and avoids leaking the custom format logic into the lower formatting layers. ### Mental Model: Decorator Analogy to Formatter Chains We can think of this like a decorator on the `format` function: - The outer function (`__format__`) decorates or wraps the usual formatting logic. - It handles the special parameter (`'p'`) that changes *what* is formatted (polar vs Cartesian). - Then it **cleans** the format spec for the underlying calls that format numeric components, passing on only the valid portion (`.2f`, `.3e`, etc.). - This preserves **compatibility, safety, and separation** between the custom feature and standard behavior. 
### Summary | Reason | Explanation | |------------------------------------------|---------------------------------------------| | Prevent invalid format spec errors | `'p'` is **not valid** in built-in float format codes; must be removed before numeric formatting calls. | | Clean user interface for extended formatting | `'p'` is a **custom flag** for polar representation, distinct from numeric formatting codes. | | Separation of concerns | Handle polar/cartesian choice separately from numeric formatting of components. | | Avoid breaking the built-in `format()` contract | Passing `'p'` would cause exceptions or undefined behavior. | ### Optional: What Happens if You Don’t Strip `'p'`? If the code passes on the whole string including `'p'`, e.g., `"5.2fp"`, to `format(1.23, "5.2fp")` on a float: - Python raises a `ValueError` like: ``` ValueError: Unknown format code 'p' for object of type 'float' ``` Because `p` is an unknown format type in the standard format mini-language. If you want, I can provide more detailed examples or links to the Python format spec mini-language documentation or even discuss how this relates to PEP 3101 formatting protocol and how to design robust `__format__` implementations. **In essence, the removal of the custom 'p' suffix from the format spec string ensures that this domain-specific extension coexists safely with Python’s built-in formatting system, maintaining robustness and good user experience.**
we should try to avoid confusion for the users of the custom object. Just try not to reuse a code-letter that already exists in similar object types (make reference to the formatspec mini language to see what is already available)
A Hashable Vector2d
what hashability unlocks:
- we can keep sets of our custom object
- we can use our custom object as
dictkeys
how?
- have to implement `__hash__` and `__eq__`, and make the object immutable.
- technically the immutability (protecting access by using private attrs) is not a strict requirement.
- for immutability, fields can be converted to private fields (two leading underscores), adding `@property` decorators to define read-only getters for them (see the sketch below).
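A compressed sketch of those pieces (read-only properties plus `__eq__`/`__hash__`); simplified relative to the book's listing:

```python
class Vector2d:
    def __init__(self, x, y):
        self.__x = float(x)   # name-mangled "private" storage
        self.__y = float(y)

    @property
    def x(self):              # read-only accessor
        return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __hash__(self):
        return hash((self.x, self.y))

v = Vector2d(3, 4)
print(hash(v))                          # deterministic hash built from the components
print({v: 'usable as a dict key now'})
```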
Supporting Positional Pattern Matching
what are positional patterns?
the examples here are non-positional because each value is provided:
```python
def keyword_pattern_demo(v: Vector2d) -> None:
    match v:
        case Vector2d(x=0, y=0):
            print(f'{v!r} is null')
        case Vector2d(x=0):
            print(f'{v!r} is vertical')
        case Vector2d(y=0):
            print(f'{v!r} is horizontal')
        case Vector2d(x=x, y=y) if x == y:
            print(f'{v!r} is diagonal')
        case _:
            print(f'{v!r} is awesome')
```
so a positional pattern would look something like this: `case Vector2d(_, 0):`
- have to add `__match_args__`: a class attribute listing the instance attributes in the order they will be used for positional pattern matching
- typically at least have the required args within `__match_args__`
Complete Listing of Vector2d, Version 3
Just to put a pin on the v3 of this didactic example:
| |
TO_HABIT: TIL that python has doctests too!
damn.
Private and “Protected” Attributes in Python
there’s no way to actually make the variables private and immutable though
name mangling: why the double underscores are useful
- consider the case where, if we don't have a good mechanism for "private" attributes, it's possible for a child subclass to accidentally overwrite a parent attribute
with this in mind, the mechanism for using the double underscores is to mangle the name with the class name.
Python stores the name in the instance `__dict__` prefixed with a leading underscore and the class name, so in the `Dog` class, `__mood` (the "private" attr) becomes `_Dog__mood`, and in `Beagle` it's `_Beagle__mood`. This language feature goes by the lovely name of name mangling.
It’s a safety feature (to prevent accidentally access / modification) rather than a security feature.
“Protected attributes”:
for those that don’t like the name mangling feature, they can just use a convention of a single underscore and explicitly naming the attribute in a “mangled” fashion.
this has no special interpreter support though.
GOTCHA: In modules, a single _ in front of a top-level name does have an effect:
if you write `from mymod import *`, the names with a `_` prefix are not imported from mymod. However, you can still write `from mymod import _privatefunc`.
Saving Memory with __slots__
instance attributes’ storage model:
- typically stored within a dict (`__dict__`), which has significant memory overhead
- if we use `__slots__` to hold a sequence of attr names, then it's an alternative storage model:
  - stored in a hidden array of references that uses less memory than a `dict`
using slots:
- the `__slots__` attribute holds attrs in either a `tuple` (similar to `__match_args__`) or a `list`. A tuple is clearer since it implicitly shows that there's no changing it.
can only be defined when the class is defined, can’t be updated (referring to its shape) thereafter
COUNTER-INTUITIVE:
a subclass only partially inherits the effect of the parent's `__slots__`. Partial because:
- slot attrs defined in the parent will still be stored in the reference array
- attrs not declared in the parent's slots will end up being stored in a dict.
- slots of the superclasses are added to the slots of the current class.
we can have both slots (fixed attributes) and dict (dynamic attributes)
- if we need weak refs AND we're using slots, then the slots attributes should include `'__weakref__'`
- NOTE: slots will end up referring to the internal storage variables, while `__match_args__` refers to the public attribute names for positional pattern matching.
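A minimal sketch of slots plus the counter-intuitive subclass behaviour (the attribute names are illustrative):

```python
class Vector2d:
    __slots__ = ('__x', '__y')       # fixed storage, no per-instance __dict__

    def __init__(self, x, y):
        self.__x = float(x)
        self.__y = float(y)

v = Vector2d(3, 4)
try:
    v.color = 'red'                  # rejected: no __dict__ to put it in
except AttributeError as exc:
    print(exc)

class OpenVector2d(Vector2d):        # no __slots__ here, so __dict__ is back
    pass

OpenVector2d(1, 2).color = 'blue'    # dynamic attributes work again
```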
Simple Measure of slot Savings
here’s how they did the checking of ram usage and such
```python
import importlib
import sys
import resource

NUM_VECTORS = 10**7

module = None
if len(sys.argv) == 2:
    module_name = sys.argv[1].replace('.py', '')
    module = importlib.import_module(module_name)
else:
    print(f'Usage: {sys.argv[0]} <vector-module-to-test>')

if module is None:
    print('Running test with built-in `complex`')
    cls = complex
else:
    fmt = 'Selected Vector2d type: {.__name__}.{.__name__}'
    print(fmt.format(module, module.Vector2d))
    cls = module.Vector2d

mem_init = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'Creating {NUM_VECTORS:,} {cls.__qualname__!r} instances')

vectors = [cls(3.0, 4.0) for i in range(NUM_VECTORS)]

mem_final = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'Initial RAM usage: {mem_init:14,}')
print(f'  Final RAM usage: {mem_final:14,}')
```
Essentially, the slot version uses half as much RAM and is faster.
Summarizing the Issues with slots
- remember to redeclare `__slots__` in each subclass to prevent their instances from having `__dict__`
- the instance slots are bound ONLY at the time when the class is defined
- classes can't use the `@cached_property` decorator unless the slot attributes include `'__dict__'`
- class instances can't be targets of weak references unless we add `'__weakref__'` to the slots.
Overriding Class Attributes
a common use case for class attributes is to provide default values for instance attributes
- what happens is that `self.typecode` gets resolved by falling back to the class attribute (in the case that there's no such instance attribute).
- "If you want to change a class attribute, you must set it on the class directly, not through an instance."
  - so do `Vector2d.typecode = 'f'` instead of doing `v1.typecode = 'f'`
- the more idiomatic way is to subclass just to customise a class data attribute (see the sketch below)
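A minimal sketch of that idiom, assuming the typecode example from this chapter:

```python
from array import array

class Vector2d:
    typecode = 'd'                      # class attribute used as a default

    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, [self.x, self.y])))

class ShortVector2d(Vector2d):          # subclass exists only to override the class attribute
    typecode = 'f'

print(len(bytes(Vector2d(1, 2))))       # 17 bytes: 1 + 2 * 8
print(len(bytes(ShortVector2d(1, 2))))  # 9 bytes:  1 + 2 * 4
```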
Chapter Summary
we can add in more control to our code (adding getters, keeping attributes immutable) on a need-basis since the consumers / places the class will be referred from will not change.
So we can stick to public attributes first.
Implementing setters and getters up-front is a distraction from the goal. In Python, we can simply use public attributes, knowing we can change them to properties later, if the need arises.
extra: in java’s enforced privacy idioms, the privacy is only really enforced if there’s a Java Security Manager that is configured and deployed on top of the java app.
this is because the reflections API (introspection tooling for Java) can be used to read the value of a private field.
Further Reading
Chapter 12. Special Methods for Sequences
Objectives for this chapter:
Make the toy vector implementation behave like a standard Python immutable flat sequence with float elements.
supports the following:
- basic sequence protocol: `__len__` and `__getitem__`
- safe representation of instances with many items
slicing supported properly
aggregate hashing that considers every element
custom formatting extensions
Useful TRICKS:
- we can get the class of an instance by doing
cls = type(self)
What’s New in This Chapter
Vector: A User-Defined Sequence Type
Vector space benefits: use cases of vectors > 3 dims:
- for implementing N-dimensional vectors for info retrieval
- for vector space models, cosine similarity is usually the key metric for relevance.
the successive takes on the vector implementation's behaviour are not mutually exclusive; they build on each other
Vector Take #1: Vector2d Compatible
the best practice for a sequence constructor is to take the data as an iterable argument in the constructor, like all built-in sequence types do.
- remember, the goal for a good implementation of `__repr__` is that it should give serviceable output such that a user has a chance of identifying the receiver (self).
- the `reprlib.repr()` function can be used to get a limited-length representation (see the sketch below)
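A minimal sketch of a limited-length `__repr__` via `reprlib` (close in spirit to the chapter's listing, reconstructed here):

```python
import reprlib
from array import array

class Vector:
    typecode = 'd'

    def __init__(self, components):
        self._components = array(self.typecode, components)  # iterable constructor argument

    def __repr__(self):
        components = reprlib.repr(self._components)       # "array('d', [0.0, 1.0, ...])"
        components = components[components.find('['):-1]  # keep just the bracketed part
        return f'Vector({components})'

print(repr(Vector(range(10))))  # Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
```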
Protocols and Duck Typing
- Protocols:
- in the context of object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code.
- it's ONLY a typing/tooling construct for static analysis; it supports structural subtyping / static duck typing (this applies to the static-protocol flavour described below).
we can partially implement part of a protocol if we wish, depending on the contextual requirements
there’s 2 kinds of protocols:
static protocols
Definition:
Static protocols in Python refer to protocol classes (from typing.Protocol) that exist solely for static type analysis during development—they have no effect at runtime unless specially marked.
Purpose:
To provide interfaces that static type checkers (like mypy or Pyright) can use for verifying whether an object “matches” a required set of methods/attributes, regardless of explicit inheritance.
Behavior:
A class matches a static protocol if it provides ALL required methods/attributes (matching names and type signatures).
There is no runtime enforcement or validation by default—type conformance is only checked when tools like mypy analyze your code.
Classes do not need to inherit from the protocol to be considered as conforming to it for static analysis
Use case:
Ensuring that different objects used in a function provide a required interface (“static duck typing”), enabling type-safe polymorphism and generic programming.
```python
from typing import Protocol

class SupportsClose(Protocol):
    def close(self) -> None: ...

# Any class with a .close() method matches SupportsClose for type checking
```
dynamic protocols
Definition:
Dynamic protocols are protocol classes designed to support runtime checking of protocol conformance, in addition to static analysis.
Purpose:
To enable both static type checking and runtime assertions that an object supports a given protocol interface.
How:
Achieved by decorating the protocol class with `@typing.runtime_checkable`
Behavior:
At runtime, you can use `isinstance(obj, ProtocolClass)` to check if an object supports the protocol (i.e., implements the required methods/attributes).
The protocol still does not require explicit inheritance: conformance is structural.
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...

obj = open("file.txt")
isinstance(obj, SupportsClose)  # True if a .close() method exists (signatures are not checked at runtime)
```
Vector Take #2: A Sliceable Sequence
delegation is an easy way to support the protocol.
We have to make sure the return types don't change for the operations that are supposed to return our custom type. The example used here is slicing: in such cases we can't just delegate and have to handle it explicitly.
How Slicing Works
- some observations on how slicing is handled:
the accessor `s[1:5]` passes a `slice` object to `__getitem__`
we can have multiple slices in one access, e.g. `s[1:5, 8:10]`, in which case `__getitem__` receives `(slice(1, 5, None), slice(8, 10, None))`, from which we conclude:
it's a tuple (of `slice` objects) that is being passed
the tuple may contain multiple `slice` objects
`slice` is a builtin type, with attrs `start`, `stop`, `step` and the method `indices` (we found this by doing `dir(slice)`)
`indices` exposes the tricky logic that's implemented in the built-in sequences to gracefully handle missing or negative indices and slices that are longer than the original sequence. This method produces "normalized" tuples of non-negative start, stop, and stride integers tailored to a sequence of the given length.
NOTE: we don't need to implement this for the vector example here because we'll be delegating it to the `_components` array
A Slice-Aware `__getitem__`
to make `Vector` behave as a sequence, we need `__len__` and `__getitem__`; both are essential to handle slicing correctly
There’s 2 cases to handle:
case 1: we’re accessing via a slice
in this case, we have to extract out the class and then build another Vector instance from the slice of the components array.
this is what allows us to properly return Vector classes on sliced accesses.
case 2: we’re accessing via a single index
then we can extract the index from the key using `operator.index(key)`
`operator.index()` calls the `__index__` special method; the function and the special method are defined in PEP 357
it's different from `int()` in the sense that `operator.index()` raises a `TypeError` for non-integer arguments supplied as an index.
```python
def __len__(self):
    return len(self._components)

def __getitem__(self, key):
    # case 1: we're accessing via a slice
    if isinstance(key, slice):
        cls = type(self)
        return cls(self._components[key])
    # case 2: we're accessing via a single index
    index = operator.index(key)
    return self._components[index]
```
Vector Take #3: Dynamic Attribute Access
`__getattr__` is the fallback function if a name is not found through the usual lookup (not in the instance, not in the class, not in the inheritance graph)
KIV part 4 of the textbook for more info on attribute lookups
```python
__match_args__ = ('x', 'y', 'z', 't')  # allows positional pattern matching

def __getattr__(self, name):
    cls = type(self)
    try:
        pos = cls.__match_args__.index(name)
    except ValueError:
        pos = -1
    if 0 <= pos < len(self._components):
        return self._components[pos]
    msg = f'{cls.__name__!r} object has no attribute {name!r}'
    raise AttributeError(msg)
```
GOTCHA: since `__getattr__` is only a fallback, assigning to `v.x` behaves inaccurately
this is because the assignment creates a new instance attribute `x` on `v`; after that, the name resolution never reaches the fallback (`__getattr__`)
the `__getattr__` implementation also doesn't guard against such names
\(\implies\) we implement `__setattr__`, because the real problem is in the attribute setting, which is not behaving properly here.
```python
def __setattr__(self, name, value):
    cls = type(self)
    if len(name) == 1:
        if name in cls.__match_args__:
            error = 'readonly attribute {attr_name!r}'
        elif name.islower():
            error = "can't set attributes 'a' to 'z' in {cls_name!r}"
        else:
            error = ''
        if error:
            msg = error.format(cls_name=cls.__name__, attr_name=name)
            raise AttributeError(msg)
    # default: use the superclass's __setattr__
    super().__setattr__(name, value)
```
For this example, we want `x` and `y` to be readonly; that's why we're raising attribute errors.
NOTE: usually getters and setters come together to ensure some consistency in the use of the objects.
here, we had to implement both `__getattr__` and `__setattr__`
NOTE: we shouldn't use `__slots__` as a shortcut to prevent instance attribute creation; it should be used only to save memory, when needed. In this case, we prevent readonly attribute overwrites by implementing a `__setattr__` that handles this properly.
Vector Take #4: Hashing and a Faster ==
implementing the hash function that is performant
```python
import functools
import operator

def __eq__(self, other):
    return tuple(self) == tuple(other)

def __hash__(self):
    # NOTE: use a generator here for lazy operation
    hashes = (hash(x) for x in self._components)
    return functools.reduce(operator.xor, hashes, 0)
```
alternatively, hash could have been implemented as:
```python
def __hash__(self):
    hashes = map(hash, self._components)
    return functools.reduce(operator.xor, hashes)
```
so the fast hash here can use an XOR: `functools.reduce(lambda a, b: a ^ b, range(n))`, or using `operator.xor` like so: `functools.reduce(operator.xor, range(n))`
interesting: we can see the initializer ALSO as the value to return on an empty sequence (in addition to the usual "first argument in the reducing loop").
for `+`, `|`, `^` the initializer should be `0`, but for `*`, `&` it should be `1`.
TO_HABIT: remember that `operator` provides the functionality of all Python infix operators in function form, so using it will prevent custom lambda definitions
TO_HABIT: the fast compute of a hash over a huge number of components is a good use case for `functools.reduce`
improving the performance of `__eq__`
doing the tuple conversion will be expensive for large vectors.
the better implementation reminds me of Java style:
```python
def __eq__(self, other):
    if len(self) != len(other):
        return False
    for a, b in zip(self, other):
        if a != b:
            return False
    return True
```
a one liner:
```python
def __eq__(self, other):
    return len(self) == len(other) and all(a == b for a, b in zip(self, other))
```
Vector Take #5: Formatting
Chapter Summary
So this is the final code, vector_v5.py:
- uses `itertools.chain` for the `__format__` function
- KIV the generator tricks until chapter 17
Further Reading
`reduce` has other names in the CS world! The powerful `reduce` higher-order function is also known as `fold`, `accumulate`, `aggregate`, `compress`, and `inject`. See the wiki link.
you can often tell when a protocol is being discussed when you see language like “a file-like object.” This is a quick way of saying “something that behaves sufficiently like a file, by implementing the parts of the file interface that are relevant in the context.”
it’s not sloppy to implement a protocol partially (for dynamic protocols)
When implementing a class that emulates any built-in type, it is important that the emulation only be implemented to the degree that it makes sense for the object being modeled. For example, some sequences may work well with retrieval of individual elements, but extracting a slice may not make sense.
this KISS-es it.
for more strictness, we can make it a static protocol wherein everything needs to be implemented
Chapter 13. Interfaces, Protocols, and ABCs
python has 4 ways to define and use interfaces:
Duck typing
goose typing: using ABCs
^ focus of this chapter
static typing: traditional static typing using the `typing` module
static duck typing
popularised by GoLang, supported by `typing.Protocol`
this chapter is about the typing that revolves around interfaces.
The Typing Map

The two dimensions introduced here:
runtime vs static checking
structural (based on the methods provided by the object) vs nominal (based on the name of its class/superclass)
What’s New in This Chapter
Two Kinds of Protocols
In both cases, we don’t need to do any sort of explicit registration for the protocol (or to use inheritance).
Dynamic Protocol
Implicit, defined by convention as per documentation.
A good example is the protocols within the interpreter, seen in the “Data Model” of the language ref. e.g. Sequence, Iterable
Can’t be verified by type checkers
Static Protocol
An explicit definition as a subclass of `typing.Protocol`
ABCs can be used to define an explicit interface (similar in outcome to static protocols).
Programming Ducks
Python Digs Sequences
- this is pretty cool: Python manages to make iteration and the `in` operator work by invoking `__getitem__` when `__iter__` and `__contains__` are unavailable.
The interpreter uses special methods (`__getitem__`, `__iter__`, etc.) dynamically to support iteration and membership tests even without explicit ABCs. This is a classic Python dynamic protocol idiom.
Monkey Patching: Implementing a Protocol at Runtime
Monkey patching is dynamically changing a module, class, or function at runtime, to add features or fix bugs.
in this example, we want a custom class to automatically work with `random.shuffle()` so that we can shuffle that sequence.
We inspect `random.shuffle()` and figure out what its underlying functionality is, which is to rely on the `__setitem__` method.
So we can monkey patch `__setitem__` and achieve our desired outcome. This means that we change the class at runtime.
Monkey patching is powerful, but the code that does the actual patching is very tightly coupled with the program to be patched, often handling private and undocumented attributes.
Python does not let you monkey patch the built-in types. I actually consider this an advantage, because you can be certain that a str object will always have those same methods. This limitation reduces the chance that external libraries apply conflicting patches.
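A minimal sketch of the monkey-patching idea; the `Cards` class below is a hypothetical stand-in for a sequence-like class that initially lacks `__setitem__`:

```python
import random

class Cards:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, position):
        return self._items[position]

def set_item(deck, position, value):
    deck._items[position] = value  # touches a "private" attribute: tight coupling

deck = Cards('AKQJ')
Cards.__setitem__ = set_item   # monkey patch: the class gains __setitem__ at runtime
random.shuffle(deck)           # shuffle only needs __len__, __getitem__, __setitem__
print(list(deck))
```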
Defensive Programming and “Fail Fast”
TO_HABIT: the examples here show how to do a check by checking whether it can behave like a duck instead of checking whether it’s a duck. This is a superior way of doing meaningful type checks in my opinion but there’s some possible pitfalls into doing so.
we want to be able to detect dynamic protocols without explicit checks
Failing fast means raising runtime errors as soon as possible, for example, rejecting invalid arguments right at the beginning of a function body.
Duck type checking means we should check behaviour instead of doing explicit typechecks.
Some patterns:
IDIOM: use a builtin function instead of doing type-checking \(\implies\) check for method presence
in the example, to check if the input arg is a list, instead of doing a type check at runtime, it's suggested to use the `list()` constructor, because that constructor will handle any iterable that fits in memory. Naturally, this copies the data.
If we can't accept copying, then we can do a runtime check using `isinstance(x, abc.MutableSequence)`
warning: what if it's an infinite generator?
eliminate that by calling `len()` on the arg; tuples, arrays and such will still pass this check (see the sketch below)
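A small sketch of those options (the function names are just illustrative):

```python
from collections import abc

def to_list(items):
    """Option 1: accept any iterable that fits in memory by copying it."""
    return list(items)

def check_mutable_sequence(items):
    """Option 2: no copying allowed, so check behaviour rather than exact type."""
    if not isinstance(items, abc.MutableSequence):
        raise TypeError('items must be a mutable sequence')

def reject_unsized(items):
    """len() fails fast on (possibly infinite) generators; tuples, arrays, lists pass."""
    len(items)
```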
Defensive code leveraging duck types can also include logic to handle different types without using `isinstance()` or `hasattr()` tests.
suppose we want to type hint that "`field_names` must be a string of identifiers separated by spaces or commas",
then our check could do something like this:
Example 13-5. Duck typing to handle a string or an iterable of strings
```python
try:
    # this is an attempt, assumes that it's a string
    field_names = field_names.replace(',', ' ').split()
except AttributeError:
    pass  # if not a string, we can't continue testing, just pass it along
# converting to a tuple ensures that it's iterable and that we test our own copy
# (to prevent accidentally changing the input)
field_names = tuple(field_names)
if not all(s.isidentifier() for s in field_names):
    raise ValueError('field_names must all be valid identifiers')
```
This is an expressive form of using duck typing to our advantage for type checking.
Goose Typing
- ABCs help to define interfaces for explicit type checking at runtime (and also work for static type checking).
complement duck typing
introduce virtual subclasses:
- classes that don't inherit from a class but are still recognized by `isinstance()` and `issubclass()`
Waterfowl and ABCs
the strong analogy of duck typing to actual phenetics (i.e. phenotype-based) classification is great, mimics how we do duck typing (based on shape and behaviour)
how important is the explicit type checking depends on the usage-context of an object
parallel objects can produce similar traits and this is the case where we may have false positives on the classifications
that’s why we need a more “explicit” way of typechecking and that’s where “goose typing” comes into the picture.
python's ABCs provide the `register` classmethod, which lets us "declare" that a certain class becomes a "virtual" subclass of an ABC (meets name, signature and semantic contract requirements)
we can declare this even if the class was not developed with any awareness of the ABC (and wouldn't have inherited from it) \(\implies\) this is structural subtyping with ABCs, where the structure is sufficient
registration can also be implicit (without us needing to register custom classes): just implement the special methods.
- key advice:
When implementing a class that represents a concept from the standard library’s ABCs (e.g., Sequence, Mapping, Number, etc.):
Explicitly inherit from or register with the appropriate ABC if your class fits the contract of that ABC.
This helps make your class reliably compatible with tools, libraries, or Python code that expects these standard interfaces.
If a library or framework you use defines classes but omits to formally subclass/register with the standard ABCs:
Perform the ABC registration yourself at program startup (e.g., by manually registering the class with `collections.abc.Sequence`); see the sketch after this list.
This will ensure `isinstance(obj, collections.abc.Sequence)` checks work as intended, improving reliability and interoperability.
When checking if an object matches a conceptual interface (such as being a "sequence"):
Use isinstance(the_arg, collections.abc.Sequence) rather than checking for method presence (duck typing) or relying on type names.
This is more future-proof and integrates with Python’s built-in and third-party tools.
Avoid defining your own custom ABCs or metaclasses in production code:
These advanced features are often overused and can lead to unnecessarily complex, harder-to-maintain code.
The author likens custom ABCs/metaclasses to a “shiny new hammer”: appealing when you first learn them, but prone to misuse (“all problems look like a nail”).
Emphasizes sticking to straightforward, simple code for better maintainability and happiness for you and future developers.
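A small sketch of the registration advice; `ThirdPartyDeck` is a hypothetical class from a library that implements the sequence methods but never registered with the ABC:

```python
import collections.abc

class ThirdPartyDeck:
    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

# at program startup, register it as a virtual subclass of Sequence
collections.abc.Sequence.register(ThirdPartyDeck)

deck = ThirdPartyDeck('AKQJ')
print(isinstance(deck, collections.abc.Sequence))  # True
```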
Tension between Duck Typing and Goose Typing
They are a continuum, not an either/or. Use duck typing for everyday, flexible code, and goose typing (ABCs and explicit interface contracts) where precision, reliability, and maintainability matter (public APIs, reusable libraries, systems programming).
See richer elaboration here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80You've highlighted a key *tension* in Python between **duck typing** ("if it quacks like a duck, treat it as a duck") and **ABC-based explicit/virtual interface type checking** ("goose typing," using `isinstance` with domain-driven interfaces made explicit via ABCs). Let’s clarify how these approaches relate and when to use each. ### Duck Typing (Classic Pythonic Style) **Principle**: *Check for behavior, not type.* If you expect an object to behave like a sequence, just try to use it as such (e.g., iterate over it, index into it). If it fails at runtime, raise an error immediately (“fail fast”). **Code example:** ```python def head(seq): return seq[0] # works for any object supporting indexing ``` - If `seq` supports indexing (`__getitem__`), this just works. - This is *pure duck typing*—no explicit type checks or interface enforcement. ### Goose Typing: ABCs and Virtual Subclassing **Principle**: Use Python’s Abstract Base Classes (`collections.abc.Sequence`, `MutableMapping`, etc.) to define *explicit interfaces* for families of types (not just classes). - You can *register* a class as a "virtual subclass" of an ABC, even if it doesn't inherit from it. - Now, `isinstance(obj, collections.abc.Sequence)` will treat your class as a Sequence. **Benefits:** - Enables more robust and explicit type checks. - Useful when interoperating with other libraries, for documentation, for static-checking tools, or when you want to *fail fast* with a clear, predictable check. - Python’s standard library and third-party tools often use `isinstance(x, ABC)` checks internally, so registering your custom class improves **interoperability**. ### How do these approaches relate? - **Duck typing**: Write code that operates on any object with the required behaviors (methods/attributes), with no formal interface contract. - Pro: maximal flexibility; principle of least surprise for Pythonistas. - Con: can result in runtime errors that are sometimes hard to debug or anticipate. - **Goose typing**: Explicitly document and/or enforce the “duck” shape via ABCs—*preferably standard library ABCs*—and use `isinstance` checks for run-time or static confirmation. - Con: Slightly more rigid, but much clearer and more future-proof if you want predictable, explicit interfaces. **In modern large-scale or library codebases, combining duck typing for flexibility with ABCs for explicit interface definition (“goose typing”) gives you both power and predictability.** ### Practical Guidance from Your Notes - Use **duck typing** by default; just try the operation and handle exceptions—this is flexible, concise, and Pythonic. - For situations where: - You need to *fail fast*, - You want to document and defend API boundaries, - You’re interacting with other libraries/frameworks using ABCs, **it’s recommended to explicitly register your classes with appropriate ABCs and use `isinstance` checks**. - This isn't an either/or—it's a continuum. Use *duck typing* for everyday, flexible code, and *goose typing* (ABCs and explicit interface contracts) where precision, reliability, and maintainability matter (public APIs, reusable libraries, systems programming). ### Metaphor Recap - **Duck typing**: "Does it walk and quack like a duck? Use it as a duck." - **Goose typing**: "Let's agree on what waterfowl are. 
If you’re a duck, get your name on the official registry, so everyone knows for sure and tools can verify it." ### References - [Python's isinstance docs][1] - [Structural subtyping via ABCs and the register method][11] **Summary:** - *"Duck typing"* means acting on behaviors and handling failures. - *"Goose typing"* means using ABCs for explicit, reliable interface checks (`isinstance` and virtual subclassing). - Use each to the degree that your problem, API guarantees, and interoperability demand. Both are core to Pythonic code. Let me know if you want idiomatic examples for when to use each or more detailed pros/cons for specific scenarios! [1] https://realpython.com/what-does-isinstance-do-in-python/ [2] https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance [3] https://ioflood.com/blog/python-isinstance-function-guide-with-examples/ [4] https://switowski.com/blog/type-vs-isinstance/ [5] https://www.reddit.com/r/learnpython/comments/hg6fyd/do_not_compare_types_use_isinstance_instead/ [6] https://github.com/python/typing/issues/1363 [7] https://stackoverflow.com/questions/3111611/is-this-use-of-isinstance-pythonic-good [8] https://realpython.com/python-type-checking/ [9] https://www.curiousefficiency.org/posts/2004/12/type-checking-in-python/ [10] https://typing.python.org/en/latest/guides/type_narrowing.html [11] https://peps.python.org/pep-0544/
Subclassing an ABC
This is about using goose-typing in practice.
- we can use the subclassing as a benchmark for this section on subclassing:
needed to adhere strictly to the interface.
therefore there’s a need to implement concrete versions of all the abstract methods defined in the abc.
concrete methods are implemented in terms of the public interface of the class, so it’s possible for us to subclass without any knowledge of the internal structure of the instances.
ABCs in the Standard Library
some places we can find useful ABCs:
`collections.abc` module (most widely used), `io` package, `numbers` package
from `collections.abc`:
NOTE: photo is outdated, from python 3.6 Sequence, Mapping and Set are subclassed from Collection, which is a child of Iterable, Container, Sized
Remember that each of the immutable collections have a mutable subclass.
if `isinstance(obj, Hashable)` returns `False`, you can be certain that obj is not hashable. But if the return is `True`, it may be a false positive.
also, for `isinstance(obj, Iterable)` we might have false negatives. This is because Python may still be able to iterate over `obj` using `__getitem__`
TO_HABIT: duck typing is the most accurate way to determine if an instance is hashable/iterable: just call `hash(obj)` / `iter(obj)`
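A tiny sketch of that habit (plain try/except around `hash()` / `iter()`):

```python
def is_hashable(obj):
    try:
        hash(obj)
    except TypeError:
        return False
    return True

def is_iterable(obj):
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(is_hashable([]), is_iterable([]))    # False True
print(is_hashable('a'), is_iterable(42))   # True False
```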
Defining and Using an ABC
this is only for learning purposes, we should avoid implementing our own ABCs and metaclasses.
A good usecase for ABCs, descriptors, metaclasses are for building frameworks.
```python
# tag::TOMBOLA_ABC[]
import abc

class Tombola(abc.ABC):  # <1> subclass abc.ABC to define an ABC

    @abc.abstractmethod
    def load(self, iterable):  # <2> use this decorator; keep the body empty, can include a docstring
        """Add items from an iterable."""

    @abc.abstractmethod
    def pick(self):  # <3>
        """Remove item at random, returning it.

        This method should raise `LookupError` when the instance is empty.
        """

    def loaded(self):  # <4> an ABC may include concrete methods
        """Return `True` if there's at least 1 item, `False` otherwise."""
        return bool(self.inspect())  # <5>

    def inspect(self):
        """Return a sorted tuple with the items currently inside."""
        items = []
        while True:  # <6>
            try:
                items.append(self.pick())
            except LookupError:
                break
        self.load(items)  # <7>
        return tuple(items)
# end::TOMBOLA_ABC[]
```
some observations:
since this is abstract, we can’t know what the concrete subclasses will actually use for the implementation \(\implies\) we end up trying to use the other abstract functions more so than assuming things.
it’s OK to provide concrete methods in ABCs, as long as they only depend on other methods in the interface.
For example, for `inspect`, we use the abstract `pick` function and `load` to return it to the original state.
Before ABCs existed, abstract methods would raise `NotImplementedError` to signal that subclasses were responsible for their implementation.
NOTE: an `@abstractmethod` method can have a base implementation. The subclass will still need to override it, but it will also be able to access it using `super()` and build onto / directly use the super functionality.
LANG_LIMITATION: there's no formal mechanism for declaring expected error types (exceptions) in Python interfaces. No exception contracts.
Docs are the only practical way to make expected exceptions explicit in Python interfaces today. Writing thoughtful docstrings and custom exception classes is the accepted best practice. ABCs enforce method presence not exception contracts.
more elaboration here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67When it comes to specifying or "adding" expected error types (exceptions) in Python interfaces like abstract base classes (ABCs) or general functions, **the language itself provides no formal mechanism** to declare which exceptions a method or function should raise, unlike some statically typed languages that have checked exceptions. ### How do we communicate expected error types in Python then? 1. **Documentation is the de facto standard for specifying expected exceptions** - Docstrings are the primary place to declare what errors a method can raise. This is how Python developers indicate usage interface contracts including possible exceptions. - For example: ```python def divide(x, y): """ Divide x by y. Raises: ZeroDivisionError: If y is zero. TypeError: If inputs are not numbers. """ return x / y ``` 2. **ABCs and raising `NotImplementedError` for abstract methods** - When defining abstract methods in ABCs, it is common to raise `NotImplementedError` to indicate subclasses *must* implement that method. - This is the only *exception-related interface* that ABCs imply formally in code. 3. **Static typing tools (e.g., MyPy) do not check for exceptions raised** - Current Python type checkers mostly ignore exception flow or explicitly declared exceptions. There is no built-in or standard way to express exception contracts in type hints. 4. **Custom exception classes for domain-specific errors** - For clarity and maintainability, if your interface or library can raise expected errors, you should define and document custom exception classes. - You communicate the valid exceptions by naming them in documentation and/or user guides. ### Why does Python not have explicit exception declarations on interfaces? - Python follows an **EAFP (Easier to Ask Forgiveness than Permission)** ethos. - Explicit exception declarations would add verbosity and complexity. - Pythonic style encourages **handling exceptions where you can recover** and letting others propagate upward naturally. - **Fail-fast philosophy** encourages letting the system raise unexpected exceptions during development and handling them appropriately in higher layers. ### Summary table: | Approach | Mechanism | Remarks | |------------------------|------------------------------|------------------------------------------------| | Expected exceptions | Documented in docstrings | Widely accepted convention | | ABC interface contract | Raise `NotImplementedError` | Defines required implementations, not errors raised in general | | Static typing | No standard exception syntax | No checked exceptions like in Java, C# | | Custom exceptions | Define exception classes | Clarifies error types, improves maintainability| | Runtime enforcement | Try/except handlers | Handle errors where recovery/alternative is feasible | ### Additional notes: - If you want to **make expected exceptions more discoverable**, consider tools that generate API docs (Sphinx, pdoc) that especially call out `:raises:` sections in your docstrings. - In complex frameworks, **middleware or wrapper layers** may catch and re-raise or convert exceptions for clearer error handling without explicit declaration in the interface. - Some third-party libraries or custom frameworks might support more formal error policies (contracts), but this is not core Python. 
**In essence:** **Docs are the only practical way to make expected exceptions explicit in Python interfaces today.** Writing thoughtful docstrings and custom exception classes is the accepted best practice. ABCs enforce method presence **not** exception contracts.
ABC Syntax Details
we used to have the other abstract decorators: `@abstractclassmethod`, `@abstractstaticmethod`, `@abstractproperty`, but they're deprecated now because we can stack decorators
when stacking decorators, `@abc.abstractmethod` MUST be the innermost decorator
the order of decorators matters.
```python
class MyABC(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def an_abstract_classmethod(cls, ...):
        pass
```
Subclassing an ABC
delegation of functions (e.g. init delegates to another ABC’s functions) seems to be a good idea to keep the code consistent
whether to override the concrete implementations from the ABC is our choice to make
A Virtual Subclass of an ABC
Here’s an example of a subclass:
```python
from random import randrange

from tombola import Tombola

@Tombola.register  # <1> NOTE: used as a decorator here; could have been a plain function invocation as well
class TomboList(list):  # <2>

    def pick(self):
        if self:  # <3>
            position = randrange(len(self))
            return self.pop(position)  # <4>
        else:
            raise LookupError('pop from empty TomboList')

    load = list.extend  # <5>

    def loaded(self):
        return bool(self)  # <6>

    def inspect(self):
        return tuple(self)

# Tombola.register(TomboList)  # <7>
```
it's a "trust me bro", but if we lie, we still get caught by the usual runtime exceptions
`issubclass` and `isinstance` will work, but there's no real inheritance of any methods or attributes from the ABC
- this happens because inheritance is guided by the `__mro__` class attribute (for Method Resolution Order), and in this case only "real" superclasses appear in the `__mro__`
syntax:
usually a plain function invocation, can be done in a decorator style as well
`Tombola.register(TomboList)`: function invocation style (called after the class definition); `@Tombola.register`: decorator style
- Usage of register in Practice
Structural Typing with ABCs
typically we use nominal typing for ABCs: it happens when we have explicit inheritance, which registers a class with its parent; this links the name of the parent to the subclass, and that's how at runtime we can do `issubclass` checks.
Dynamic and static duck typing are two approaches to structural typing
we can do consistent-with structural subtyping as well if the class implements the methods defined in the type
this works because the parent ABC (`abc.Sized`) implements a special class method named `__subclasshook__`. The `__subclasshook__` for `Sized` checks whether the class argument has an attribute named `__len__`
this is the implementation in `collections.abc`:
```python
@classmethod
def __subclasshook__(cls, C):
    if cls is Sized:
        if any("__len__" in B.__dict__ for B in C.__mro__):
            return True
    return NotImplemented
```
we shouldn't add the hook to our own custom ABCs; it's not dependable to rely on this implicit behaviour.
Static Protocols
The Typed double Function
- duck typing allows us to write code that is future-compatible!
Runtime Checkable Static Protocols aka Dynamic Protocol
typing.Protocolcan be used for both static and runtime checkingif we want to use it for runtime checking, then we need to add
@runtime_checkableto the protocol definitionhow this works is that
typing.Protocolis an ABC and so it supports__subclass__hook and adding the runtime checkable decorator allows us to make the protocol supportisinstance/issubclasschecks. Because Protocol inherits from ABC-related machinery,@runtime_checkableallows the__subclasshook__to behave accordingly for runtime isinstance and issubclass checks.NOTE: it’s still checking for consistent-with to check if it’s the same type.
caveat: performance/side-effect trade-offs
Be careful if the methods checked by `__subclasshook__` have side effects or are expensive to evaluate.
ready to use runtime checkables:
- check numeric convertibility: `typing.SupportsComplex`
```python
@runtime_checkable
class SupportsComplex(Protocol):
    """An ABC with one abstract method __complex__."""
    __slots__ = ()

    @abstractmethod
    def __complex__(self) -> complex:
        pass
```
RECIPE: TO_HABIT: if you want to test whether an object `c` is a `complex` or `SupportsComplex`, you can provide a tuple of types as the second arg to `isinstance`: `isinstance(c, (complex, SupportsComplex))`. I had no idea this was a thing.
alternatively, we can use the `Complex` ABC within the `numbers` module:
```python
import numbers
isinstance(c, numbers.Complex)
```
type checkers don't seem to recognise the ABCs within the `numbers` module
`typing.SupportsFloat`
“Duck Typing is Your Friend”
Often, duck typing is the better approach for runtime type checking: we just try the operations we need to do on the object.
So in the complex number situation, we have a few approaches we could take:
approach: runtime checkable static protocols
```python
if isinstance(o, (complex, SupportsComplex)):
    ...  # do something that requires `o` to be convertible to complex
else:
    raise TypeError('o must be convertible to complex')
```
approach: goose typing using the `numbers.Complex` ABC
```python
if isinstance(o, numbers.Complex):
    ...  # do something with `o`, an instance of `Complex`
else:
    raise TypeError('o must be an instance of Complex')
```
approach: ⭐️ duck typing and EAFP (Easier to Ask Forgiveness than Permission)
```python
try:
    c = complex(o)
except TypeError as exc:
    raise TypeError('o must be convertible to complex') from exc
```
Limitations of Runtime Protocol Checks
at runtime, type hints are ignored; the same goes for `isinstance` and `issubclass` checks against static protocols (unless they are runtime checkable)
problem: `isinstance`/`issubclass` checks only look at the presence or absence of methods, without checking their signatures, much less their type annotations. That would have been too costly.
this is because real type checking is not just a matter of checking whether the type of `x` is `T`: it's about determining that the type of `x` is consistent-with `T`, which may be expensive.
since they only check method presence, we can end up getting false positives on these type checks.
Supporting a Static Protocol
the point below is now deprecated. We can just run it as is.
using `from __future__ import annotations` allows type hints to be stored as strings, without being evaluated at import time when function definitions are executed.
so if we were to define the return type as the same class that we're building, then we would have to use this import, else it's a use-before-definition error.
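A small sketch of the use-before-definition issue (the `Vector2d` name is just illustrative):

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

class Vector2d:
    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

    # Without the __future__ import (or quotes around 'Vector2d'), this annotation
    # would be evaluated while the class body is still executing, before the name
    # Vector2d is bound, raising NameError.
    def __add__(self, other: Vector2d) -> Vector2d:
        return Vector2d(self.x + other.x, self.y + other.y)
```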
Designing a Static Protocol
trick: single-method protocols make static duck typing more useful and flexible
After a while, if you realise a more complete protocol is required, then you can combine two or more protocols to define a new one
example: here's the protocol definition, it has a single function
```python
from typing import Protocol, runtime_checkable, Any

@runtime_checkable
class RandomPicker(Protocol):
    # NOTE the ellipsis (...) used as the body
    def pick(self) -> Any: ...
```
and here are some tests written for it
```python
import random
from typing import Any, Iterable, TYPE_CHECKING

from randompick import RandomPicker  # <1>

class SimplePicker:  # <2>
    def __init__(self, items: Iterable) -> None:
        self._items = list(items)
        random.shuffle(self._items)

    def pick(self) -> Any:  # <3>
        return self._items.pop()

def test_isinstance() -> None:  # <4>
    popper: RandomPicker = SimplePicker([1])  # <5>
    assert isinstance(popper, RandomPicker)  # <6>

def test_item_type() -> None:  # <7>
    items = [1, 2]
    popper = SimplePicker(items)
    item = popper.pick()
    assert item in items
    if TYPE_CHECKING:
        reveal_type(item)  # <8>
    assert isinstance(item, int)
```
- observations:
- not necessary to import the static protocol to define a class that implements it
Best Practices for Protocol Design
Align with the Interface Segregation Principle: clients should not be forced to depend on interfaces they don't use. This gives the following two pieces of advice:
Narrow interfaces (often with a single method) are more useful.
Client Code Protocols: Good to define the protocol near the “client code” (where it’s being used) instead of a library.
Useful for extensibility and mock-testing.
Naming:
just name based on nouns that make sense and is minimalistic, nothing too fancy here.
clear concept \(\rightarrow\) plain names (Iterator, Container)
provides callable methods \(\rightarrow\) `SupportsX`, e.g. SupportsRead
read/write attrs or getter/setter methods \(\rightarrow\) `HasX`, e.g. HasItems
Create Minimalistic protocols and extend them later by creating derived protocols
Extending a Protocol
```python
from typing import Protocol, runtime_checkable
from randompick import RandomPicker

@runtime_checkable  # <1> needs to be re-applied; it won't get inherited
class LoadableRandomPicker(RandomPicker, Protocol):  # <2> have to list Protocol again
    def load(self, Iterable) -> None: ...  # <3> OOP-like: only the new method is declared; the super protocol's methods are "inherited"
```
instead of adding methods to the original protocol, it's better to derive a new protocol from it.
keeps protocols minimal and aligns with Interface Segregation Principle – is really narrow interfaces here.
GOTCHA: not entirely the same as inheritance
the decorator `@runtime_checkable` needs to be re-applied
in the superclass list, we still need to add `Protocol` along with the rest of the protocols that we are extending
similar to inheritance: the functions being extended will be inherited by the derived class. We only need to declare the new functions in the derived class.
The numbers ABCs and Numeric Protocols
Objective: we want to be able to support static type checking, and we want to be able to do this for external libraries that register their types as virtual subclasses of the `numbers` ABCs.
Current approach: use the numeric protocols within the `typing` module
`numbers.Number` has no methods \(\implies\) the numeric tower is not useful for static type checking (it's useful for runtime type checking though)
GOTCHA: `decimal.Decimal` is not registered as a virtual subclass of `numbers.Real`. The reason is that, if you need the precision of Decimal in your program, then you want to be protected from accidental mixing of decimals with floating-point numbers that are less precise.
floats are less precise and we don't wanna interchange with them and have information losses.
Takeaways:
The numbers ABCs are fine for runtime type checking, but unsuitable for static typing.
The numeric static protocols SupportsComplex, SupportsFloat, etc. work well for static typing, but are unreliable for runtime type checking when complex numbers are involved.
Chapter Summary

contrasted dynamic protocols (that support duck typing) and static protocols (static duck typing)
for both, just need to implement necessary methods, no explicit registration needed
runtime effect:
Static protocol no runtime effect.
Dynamic protocol is runtime checkable. Aka when we apply `@runtime_checkable` to a static protocol, then it becomes a dynamic protocol.
NOTE: this is a different contrast from Dynamic Duck Typing vs Static Duck Typing
Dynamic Duck typing is the fail fast approach, where we “try and see it”
Static Duck Typing is the contract based use of Protocols
This is a subtle but often confusing distinction. Dynamic duck typing is Python’s inherent runtime behavior, while static duck typing reflects the formal contract via protocols at type-checking time
Python interpreter’s support for sequence and iterable dynamic protocols.
The interpreter uses special methods (`__getitem__`, `__iter__`, etc.) dynamically to support iteration and membership tests even without explicit ABCs. This is a classic Python dynamic protocol idiom.
monkey patching: adhering to the protocol at runtime
defensive programming: detect structural types using `try/except` and failing fast, instead of explicit checks using `isinstance` or `hasattr`
IDIOM: this is a widely advocated Python idiom: "EAFP (Easier to Ask Forgiveness than Permission)"
Goose typing:
creating and using ABCs
traditional subclassing and registration
`__subclasshook__` special method as a way for ABCs to support structural typing based on methods that fulfill the interface defined in the ABC (without a direct registration)
Static protocols
is kind of the structural interface in the python world.
`@runtime_checkable` actually leverages `__subclasshook__` to support structural typing at runtime, though the best use of these protocols is with static type checkers.
type hints make structural typing more reliable.
design of static protocol:
- keep the narrow interface
- keep the definition near to usage
- extend it when you need to add functionality; in line with interface segregation principle.
Numbers ABCs and Numeric Protocols:
- numeric static protocols (e.g. SupportsFloat) have shortcomings
main message of this chapter is that we have four complementary ways of programming with interfaces in modern Python, each with different advantages and drawbacks.
You are likely to find suitable use cases for each typing scheme in any modern Python codebase of significant size.
Rejecting any one of these approaches will make your work as a Python programmer harder than it needs to be.
Possible Misconceptions
Adjacent Gotchas and Difficult Concepts You Might Misconstrue or Overlook
Runtime Checking Limits of Dynamic Protocols: Runtime `isinstance` checks with `@runtime_checkable` protocols are limited to checking presence of attributes/methods (using `hasattr` internally) and do not verify method signatures, argument types, or behavior correctness. This can give false positives if method signatures do not match—only static type checkers guarantee that.
`__subclasshook__` Complexity and Pitfalls: While powerful, implementing or overriding `__subclasshook__` can be tricky because it must handle all subclass checks gracefully and correctly, respecting caching and fallback behaviors to avoid subtle bugs. Excessive or ill-considered use may confuse the MRO and class hierarchy assumptions.
Difference Between ABC Registration and Protocol Conformance: Registering a class as a virtual subclass of an ABC influences `isinstance` checks but does not affect static type checking, whereas protocols influence static (and optionally runtime) interface conformance. Bridging these self-consistently in a codebase can sometimes be confusing.
Protocols and Inheritance vs Nominal Typing: Protocols enable structural typing, eschewing nominal inheritance for interface compatibility, but this can lead to subtle type checking behaviors where classes unintentionally conform just by method names, masking incorrect assumptions. This requires developers to design protocols and type hints thoughtfully.
Static Type Checking Requires Adoption of Tooling: The benefits of static protocols are realized only when using type checkers; pure runtime execution won’t enforce protocols unless combined with runtime checkable features. Adoption means introducing additional tooling and some learning curve for teams.
Monkey Patching Risks: While useful at runtime for dynamic protocol adherence, monkey patching comes with maintainability and debugging risks, especially when changing behaviors of widely used or critical classes. It can also mask design flaws if overused.
Difference Between Static and Runtime Failure Modes: Static protocols help catch interface noncompliance early, but dynamic duck typing detects mismatches only at runtime, often deeper within program flow, affecting error locality and debuggability.
Supporting References
- Real Python: Python Protocols: Leveraging Structural Subtyping (2024)
- The Turing Taco Tales: Static Duck Typing With Python’s Protocols (2024)
- Xebia: Protocols In Python: Why You Need Them (2022)
- PEP 544 – Protocols: Structural Subtyping (2017) (Historical and spec source)
- Python official docs on typing and Abstract Base Classes
Mental Model Summary for You as a Tech Leader
Your notes effectively capture the layered nature of interface programming in Python:
- At the lowest layer, Python runtime embraces dynamic duck typing: just try it and fail fast.
- To improve runtime type recognition and interoperability, Python uses ABCs with virtual subclassing (`register`) and `__subclasshook__` ("goose typing"), enabling `isinstance` semantics on structural grounds.
- To further support static analysis tooling, Python offers static protocols that check structure without inheritance, giving formal contracts for type checkers.
- Finally, runtime-checkable protocols bridge these worlds, allowing runtime `isinstance` checks on protocols designed primarily for static typing.
Together, these patterns compose a robust, hybrid approach adaptable to many scales and requirements—**rejecting any will unnecessarily limit your Python design flexibility and safety guarantees**
Further Reading
Chapter 14. Inheritance: For Better or for Worse
Focus areas for this chapter:
- The super() function
- The pitfalls of subclassing from built-in types
- Multiple inheritance and method resolution order
- Mixin classes
chapter introduces multiple inheritance for those who have never used it, and provides some guidance on how to cope with single or multiple inheritance if you must use it.
What’s New in This Chapter
The super() Function
example use cases
when a subclass overrides a method of a superclass
and we want to let the superclass method do its job then add more logic to it
when we let the superclasses do their part in init fns
LANG_LIMITATION / IDIOM: unlike a Java constructor, which automatically calls the nullary super constructor, Python doesn't do this, so we need to ALWAYS write the call in manually.
```python
def __init__(self, a, b):
    super().__init__(a, b)
    ...  # more initialization code
```
it will work (but is not recommended) to hardcode the base class and call that base class's function directly.
that also won't work well with multiple inheritance.
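A minimal sketch of both use cases, assuming a `collections.OrderedDict` base (the names are illustrative, not the book's exact example):

```python
from collections import OrderedDict

class LastUpdated(OrderedDict):
    """Override a method, let the superclass do its job, then add logic."""
    def __setitem__(self, key, value):
        super().__setitem__(key, value)  # superclass does the real work
        self.move_to_end(key)            # extra behaviour added by the subclass

class Tagged(LastUpdated):
    def __init__(self, tag, *args, **kwargs):
        super().__init__(*args, **kwargs)  # must be called explicitly, unlike Java
        self.tag = tag

d = Tagged('demo', a=1)
d['b'] = 2
d['a'] = 3
print(list(d), d.tag)  # ['b', 'a'] demo
```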
Subclassing Built-In Types Is Tricky
Main takeaway:
Subclassing built-in types like dict or list or str directly is error-prone because the built-in methods mostly ignore user-defined overrides. Instead of subclassing the built-ins, derive your classes from the collections module using UserDict, UserList, and UserString, which are designed to be easily extended.
- it’s a flaw in method delegation within the C Language code of the builtin types (only affects classes derived directly from those types).
Major Caveat: bypassing behaviour
the code of the built-ins (written in C) usually does not call methods overridden by user-defined classes.
this also applies when other dunder methods call the overridden method.
using the overriding method directly is likely to work still.
This built-in behavior is a violation of a basic rule of object-oriented programming: the search for methods should always start from the class of the receiver (self), even when the call happens inside a method implemented in a superclass.
virtual vs nonvirtual methods
virtual: late-bound
non-virtual: bound at compile time
in Python, every method is late-bound, like a virtual method
builtins written in C seem to be nonvirtual by default (at least in CPython).
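A short sketch of the caveat, in the spirit of the book's examples: a `dict` subclass whose `__setitem__` doubles values is bypassed by the C-level constructor and `update()`, while the `UserDict` version behaves consistently:

```python
import collections

class DoppelDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

d = DoppelDict(one=1)   # the built-in constructor ignores our override
d['two'] = 2            # the [] operator does call it
d.update(three=3)       # update() ignores it as well
print(d)                # {'one': 1, 'two': [2, 2], 'three': 3}

class DoppelDict2(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

d2 = DoppelDict2(one=1)
d2['two'] = 2
d2.update(three=3)
print(d2)               # {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}
```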
Multiple Inheritance and Method Resolution Order
guiding question:
if we do multiple inheritance and both super classes have overlapping method names, how to make reference to the correct super function from the subclass
\(\implies\) this is the diamond problem and we wanna see how python solves this
2 factors that determined the activation sequences:
MRO of the leaf class
Goes all the way from current class all the way to the
objectclassDefines the activation order
use of
super()in each methodDetermines whether a particular method will be activated.
So if the method calls super() then we move to the next class in the MRO order and execute that.
How?
It’s not necessarily a BFS, it uses the C3 Algorithm (not important to understand unless need to wrangle complex hierarchies.)
MRO accounts for inheritance graph. Amongst siblings, it’s determined by the subclass declaration.
e.g. `Leaf(B, A)` and `Leaf(A, B)` are two different subclass declarations.
Cooperative Methods: methods that call `super()`
Cooperative methods enable cooperative multiple inheritance. These terms are intentional: in order to work, multiple inheritance in Python requires the active cooperation of the methods involved.
GOTCHA: noncooperative methods can be a cause of subtle bugs. \(\implies\) That's why it is recommended that every method `m` of a nonroot class should call `super().m()`.
A noncooperative method can be the cause of subtle bugs.
Many coders reading Example 14-4 may expect that when method A.pong calls super().pong(), that will ultimately activate Root.pong. But if B.pong is activated before, it drops the ball.
That’s why it is recommended that every method m of a nonroot class should call super().m().
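A compact sketch of the cooperation rule, loosely following the Root/A/B example discussed above (B's `pong` is deliberately noncooperative):

```python
class Root:
    def pong(self):
        print('pong in Root')

class A(Root):
    def pong(self):
        print('pong in A')
        super().pong()   # cooperative: hands off to the next class in the MRO

class B(Root):
    def pong(self):
        print('pong in B')
        # no super().pong(): the chain stops here ("drops the ball")

class Leaf(A, B):
    pass

print([c.__name__ for c in Leaf.__mro__])  # ['Leaf', 'A', 'B', 'Root', 'object']
Leaf().pong()  # prints "pong in A", then "pong in B"; Root.pong is never reached
```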
Mixin Classes
definition:
designed to be sub classed together with at least one other class as part of a multiple inheritance arrangement
won’t provide all the functionality of a concrete object
it’s supposed to be functionality mixins \(\implies\) customizes the behaviour of child or sibling classes.
so naturally will have some concrete methods implemented
are a convention that has no explicit language support in python/cpp
Mixins must appear first in the tuple of base classes in a class declaration
mixins typically depend on sibling classes that implement / inherit methods with the same signature
therefore, they must appear early in the MRO of a subclass that uses it
Case-Insensitive Mappings
See this beautiful example
```python
"""
Short demos
===========

``UpperDict`` behaves like a case-insensitive mapping::

# tag::UPPERDICT_DEMO[]
    >>> d = UpperDict([('a', 'letter A'), (2, 'digit two')])
    >>> list(d.keys())
    ['A', 2]
    >>> d['b'] = 'letter B'
    >>> 'b' in d
    True
    >>> d['a'], d.get('B')
    ('letter A', 'letter B')
    >>> list(d.keys())
    ['A', 2, 'B']

# end::UPPERDICT_DEMO[]

And ``UpperCounter`` is also case-insensitive::

# tag::UPPERCOUNTER_DEMO[]
    >>> c = UpperCounter('BaNanA')
    >>> c.most_common()
    [('A', 3), ('N', 2), ('B', 1)]

# end::UPPERCOUNTER_DEMO[]

Detailed tests
==============

UpperDict uppercases all string keys.

    >>> d = UpperDict([('a', 'letter A'), ('B', 'letter B'), (2, 'digit two')])

Tests for item retrieval using `d[key]` notation::

    >>> d['A']
    'letter A'
    >>> d['b']
    'letter B'
    >>> d[2]
    'digit two'

Tests for missing key::

    >>> d['z']
    Traceback (most recent call last):
      ...
    KeyError: 'Z'
    >>> d[99]
    Traceback (most recent call last):
      ...
    KeyError: 99

Tests for item retrieval using `d.get(key)` notation::

    >>> d.get('a')
    'letter A'
    >>> d.get('B')
    'letter B'
    >>> d.get(2)
    'digit two'
    >>> d.get('z', '(not found)')
    '(not found)'

Tests for the `in` operator::

    >>> ('a' in d, 'B' in d, 'z' in d)
    (True, True, False)

Test for item assignment using lowercase key::

    >>> d['c'] = 'letter C'
    >>> d['C']
    'letter C'

Tests for update using a `dict` or a sequence of pairs::

    >>> d.update({'D': 'letter D', 'e': 'letter E'})
    >>> list(d.keys())
    ['A', 'B', 2, 'C', 'D', 'E']
    >>> d.update([('f', 'letter F'), ('G', 'letter G')])
    >>> list(d.keys())
    ['A', 'B', 2, 'C', 'D', 'E', 'F', 'G']
    >>> d  # doctest:+NORMALIZE_WHITESPACE
    {'A': 'letter A', 'B': 'letter B', 2: 'digit two', 'C': 'letter C',
    'D': 'letter D', 'E': 'letter E', 'F': 'letter F', 'G': 'letter G'}

UpperCounter uppercases all `str` keys.

Test for initializer: keys are uppercased.

    >>> d = UpperCounter('AbracAdaBrA')
    >>> sorted(d.keys())
    ['A', 'B', 'C', 'D', 'R']

Tests for count retrieval using `d[key]` notation::

    >>> d['a']
    5
    >>> d['z']
    0

"""

# tag::UPPERCASE_MIXIN[]
import collections

def _upper(key):  # <1>
    try:
        return key.upper()
    except AttributeError:
        return key

class UpperCaseMixin:  # <2>
    def __setitem__(self, key, item):
        super().__setitem__(_upper(key), item)

    def __getitem__(self, key):
        return super().__getitem__(_upper(key))

    def get(self, key, default=None):
        return super().get(_upper(key), default)

    def __contains__(self, key):
        return super().__contains__(_upper(key))
# end::UPPERCASE_MIXIN[]

# tag::UPPERDICT[]
class UpperDict(UpperCaseMixin, collections.UserDict):  # <1>
    pass

class UpperCounter(UpperCaseMixin, collections.Counter):  # <2>
    """Specialized 'Counter' that uppercases string keys"""  # <3>
# end::UPPERDICT[]
```
Multiple Inheritance in the Real World
Usage of multiple inheritance is not the norm in cpp or python really.
There are some known good use cases though.
ABCs Are Mixins Too
Just like Java, we support multiple inheritance of interfaces
for those ABCs that have concrete methods, they play 2 roles:
they play the role of an interface definition
they also play the role of mixin classes
- ThreadingMixIn and ForkingMixIn
Django Generic Views Mixins
the original generic views were functions so they were not extensible
have to start from scratch if we want to do something similar but not the same
Concrete subclasses of View are supposed to implement the handler methods, so why aren’t those methods part of the View interface? The reason: subclasses are free to implement just the handlers they want to support. A TemplateView is used only to display content, so it only implements get
reminder on why:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44Great question! What you’re touching on is a common and important aspect of Python’s interface and subclassing philosophy, especially in frameworks like Django (which uses **View** classes). ### In Python, do you have to implement *all* methods of an interface/abstract base class? **Short answer:** **Not necessarily.** Unlike some statically typed languages with strict interface contracts, Python’s interfaces and abstract base classes (ABCs) often allow (or even expect) subclasses to implement only the parts of the interface that make sense for them. ### Why? 1. **Python’s dynamic and flexible nature** - Python favors *duck typing* and optional implementation over strict enforcement. - ABCs in Python can define optional methods, or provide default no-op or generic implementations in base classes. 2. **Partial method implementation is common in frameworks** - Consider a `View` base class that defines many “handler methods” like `get()`, `post()`, `put()`, etc. - Concrete subclasses (e.g., `TemplateView`, `CreateView`) implement only the handlers relevant to the use case. - This lets you write minimal classes that do exactly what you need, without burden. 3. **Interface methods can be *optional* or *protocol-like*** - Instead of forcing every subclass to implement an entire interface, the base class or the framework’s machinery calls a method *if it exists*. - If the method is not implemented, either a default behavior occurs, or a runtime error only arises if the method is actually called. 4. **Explicit design: selective handler implementation** - The `TemplateView` subclass only implements the `get()` method because it only responds to HTTP GET requests. - If you called `post()` on a `TemplateView`, it might raise a `NotImplementedError` or return a “method not allowed” response. - This selective implementation is a common design pattern in web frameworks. ### Contrast to stricter languages or interfaces In languages like Java or C#, interfaces typically require **all** declared methods to be implemented. Python’s philosophy is more permissive: - **You can think of many interfaces or ABCs in Python as “soft” contracts.** - **You implement the “part you use.”** - This works well because Python’s runtime dispatch is dynamic: calls to methods that don’t exist will fail only if you exercise them. ### Summary - In Python, you **do not always have to implement every method** defined by an interface or abstract base class. - Interfaces or base classes often define *optional* or *selective* methods. - Subclasses implement only the methods relevant to their behavior. - This design promotes flexibility and simplicity in code, especially in toolkits like Django where views handle different HTTP methods. - Calling a method that’s not implemented typically leads to a runtime error only if that method is invoked. If you want, I can provide a small Python example illustrating this partial implementation pattern (e.g., a base class with several methods, and subclasses implementing only some of them). Would that be helpful?It does take some time to learn how to leverage class-based views and how to extend them to fulfill specific application needs, but I found that it was worthwhile to study them.
Value that the mixins give for Django:
They eliminate a lot of boilerplate code, make it easier to reuse solutions, and even improve team communication—for example, by defining standard names to templates, and to the variables passed to template contexts. Class-based views are Django views “on rails.”
Multiple Inheritance in Tkinter
the class hierarchy SHOULD NOT be very deep
usually it’s around 3 or 4 levels of concrete classes
GUI toolkits are where inheritance is most useful. The hierarchies can get really deep in them.
⭐️ Coping with Inheritance
These are the rules of thumb that we need to rely on.
We have to do so because there’s no general theory about inheritance that can guide us against creating incomprehensible, brittle designs.
Favor Object Composition over Class Inheritance
do composition and delegation
it can even replace the use of mixins and make behaviours available to different classes.
subclassing is a form of tight coupling and tall inheritance trees tend to be brittle.
Understand Why Inheritance Is Used in Each Case
Reasons FOR using inheritance:
creates a subtype, so it's an is-a relationship, best done with ABCs
avoids code duplication through reuse; mixins are useful for this too
The realisation here is that for code reuse, inheritance is only an implementation detail; we can use composition & delegation instead. However, interface inheritance is a separate matter.
Make Interfaces Explicit with ABCs
- Multiple inheritance of ABCs is not problematic.
- An ABC should subclass only abc.ABC or other ABCs.
- if a class is intended to define an interface, it should be an explicit ABC or a typing.Protocol subclass.
Use Explicit Mixins for Code Reuse
- for reuse by multiple unrelated subclasses, without implying an “is-a” relationship,
- not to be instantiated
- since there's no formal convention, suffix the mixin class name with
Mixin
Provide Aggregate Classes to Users
A class that is constructed primarily by inheriting from mixins and does not add its own structure or behavior is called an aggregate class.
group together combinations of ABCs or mixins
we can now just use the aggregate class without having to figure out in which order they should be declared to work as intended.
typically just has an empty body (with docstring / pass)
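A small sketch of the mixin + aggregate-style idea (reconstructed from memory, so treat the names UpperCaseMixin / UpperDict as illustrative): the concrete class just glues the mixin and UserDict together and adds nothing of its own.

```python
import collections

class UpperCaseMixin:
    """Mixin: case-fold keys before delegating to the next class in the MRO."""
    def __setitem__(self, key, item):
        super().__setitem__(key.upper(), item)

    def __getitem__(self, key):
        return super().__getitem__(key.upper())


class UpperDict(UpperCaseMixin, collections.UserDict):
    """Glue class: just combines the mixin with UserDict, empty body."""


d = UpperDict()
d['spam'] = 42
print(d['SPAM'])   # 42
print(dict(d))     # {'SPAM': 42}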
Subclass Only Classes Designed for Subclassing
some superclass methods may ignore the subclass overrides in unexpected ways.
\(\implies\) we should subclass only those that are intended to be extended.
how to check?
see the docs, if it’s a base class named, that hints at it
the docs will also indicate which of the methods are intended to be overridden.
see if the
@finaldecorator exists on the method (then it’s not intended for extension by overriding that method)
Avoid Subclassing from Concrete Classes
if you do this, any internal state within a concrete class might get corrupted
even if we cooperate by calling super(), there are still many ways bugs can be introduced.
If you must use subclassing for code reuse, then the code intended for reuse should be in mixin methods of ABCs or in explicitly named mixin classes.
- Tkinter: The Good, the Bad, and the Ugly
Chapter Summary
Further Reading
Smalltalk has traits which are a language construct that serves the role that a mixin class does, while avoiding some of the issues with multiple inheritance.
Scala also has traits.
If, while working as an application developer, you find yourself building multilevel class hierarchies, it's likely that one or more of the following applies:
You are reinventing the wheel. Go look for a framework or library that provides
components you can reuse in your application.
You are using a badly designed framework. Go look for an alternative.
You are overengineering. Remember the KISS principle.
You became bored coding applications and decided to start a new framework.
Congratulations and good luck!
Chapter 15. More About Type Hints
This is a new chapter in this edition of the book.
What’s New in This Chapter
Overloaded Signatures
it’s the signatures that we are overloading, not the function.
remember that python doesn’t allow function overloading!
implementation:
the actual function will need no type hints, because the overloads will take care of it
can be implemented within the same module:
```python
import functools
import operator
from collections.abc import Iterable
from typing import overload, Union, TypeVar

T = TypeVar('T')
S = TypeVar('S')  # <1> for the second overload

@overload
def sum(it: Iterable[T]) -> Union[T, int]: ...  # <2>
@overload
def sum(it: Iterable[T], /, start: S) -> Union[T, S]: ...  # <3>
def sum(it, /, start=0):  # <4>
    return functools.reduce(operator.add, it, start)
```
Max Overload
pythonic apis are hard to annotate. this is because they strongly leverage the powerful dynamic features of python
this section demonstrates what it takes to annotate the
maxfunction.
Takeaways from Overloading max
- the expressiveness of annotation markings is very limited, compared to that of python
TypedDict
gotcha: remember that for JSON objects we'll need to do runtime checking; the pydantic package is great for this.
Static type checking is unable to prevent errors with code that is inherently dynamic, such as json.loads()
objective: we want to be able to define the structure of a container type (heterogeneous)
we should be able to provide a type specific to a key
TypedDict has no runtime effect; it's only for static analysis.
Gives:
Class-like syntax to annotate a dict with type hints for the value of each “field.”
A constructor that tells the type checker to expect a dict with the keys and values as specified.
```python
from typing import TypedDict

class BookDict(TypedDict):
    isbn: str
    title: str
    authors: list[str]
    pagecount: int
```
looks very similar to a dataclass builder like a
typing.NamedTuplebut it isn’t.
@ runtime, the constructor just ends up creating a plain dict. No instance attributes, no init functions for the class, no method definitions.
none of the types will be enforced, “illegal” assignments can happen
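Quick illustrative check of that point (made-up field values): at runtime BookDict just builds a plain dict and enforces nothing.

```python
# uses the BookDict definition above; the values here are placeholders
book = BookDict(isbn='9780000000000', title='Some Book',
                authors=['A. Author'], pagecount=100)
print(type(book))            # <class 'dict'> -- just a plain dict at runtime
book['pagecount'] = 'many'   # no runtime error; only a static checker would flag this
```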
Type Casting
type casting is for type checkers to get assisted by us
typing.cast() special function provides one way to handle type checking malfunctions or incorrect type hints in code we can't fix.
Casts are used to silence spurious type checker warnings and give the type checker a little help when it can't quite understand what is going on.
Does absolutely nothing @ runtime
```python
# tag::CAST[]
from typing import cast

def find_first_str(a: list[object]) -> str:
    index = next(i for i, x in enumerate(a) if isinstance(x, str))
    # We only get here if there's at least one string
    return cast(str, a[index])
# end::CAST[]

from typing import TYPE_CHECKING

l1 = [10, 20, 'thirty', 40]
if TYPE_CHECKING:
    reveal_type(l1)
print(find_first_str(l1))

l2 = [0, ()]
try:
    find_first_str(l2)
except StopIteration as e:
    print(repr(e))
```
too many uses of cast is likely a code-smell; Mypy is not that useless!
why casts still have some purpose:
- the other workarounds are worse:
  - # type: ignore is less informative
  - Any is contagious: it will have cascading effects through type inference and undermine the type checker's ability to detect errors in other parts of the code
Reading Type Hints at Runtime
within the __annotations__ attribute; it's a dict that maps the names to their types
the return type has the key "return"
annotations are evaluated by the interpreter at import time, just like param default values
Problems with Annotations at Runtime
extra CPU and memory load when importing
types not yet defined are strings instead of actual types \(\implies\) the forward-reference-problem
we can use introspection helpers for this
e.g.
typing.get_type_hints (or inspect.get_annotations from Python 3.10); this is the recommended way to read type hints at runtime
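Small sketch of reading hints at runtime (my own toy function):

```python
from typing import get_type_hints

def clip(text: str, max_len: int = 80) -> str:
    return text[:max_len]

print(clip.__annotations__)   # {'text': <class 'str'>, 'max_len': <class 'int'>, 'return': <class 'str'>}
print(get_type_hints(clip))   # same mapping, but forward-reference strings would be resolved to real types
```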
Dealing with the Problem
- just keep an eye out on how to handle this, it’s likely to change from 3.10 onwards
Implementing a Generic Class
have to concretise the generic type by giving a type parameter:
machine = LottoBlower[int](range(1, 11))
here's a generic LottoBlower:
```python
import random
from collections.abc import Iterable
from typing import TypeVar, Generic

from tombola import Tombola

T = TypeVar('T')

class LottoBlower(Tombola, Generic[T]):  # <1> have to subclass Generic to declare the formal type params

    def __init__(self, items: Iterable[T]) -> None:  # <2>
        self._balls = list[T](items)

    def load(self, items: Iterable[T]) -> None:  # <3>
        self._balls.extend(items)

    def pick(self) -> T:  # <4>
        try:
            position = random.randrange(len(self._balls))
        except ValueError:
            raise LookupError('pick from empty LottoBlower')
        return self._balls.pop(position)

    def loaded(self) -> bool:  # <5>
        return bool(self._balls)

    def inspect(self) -> tuple[T, ...]:  # <6>
        return tuple(self._balls)
```
Basic Jargon for Generic Types
Generic type: type with 1 or more type vars
Formal Type Parameter: the generic type var used to define a generic type
Parameterized type: type declared with actual type parameters (resolved)
Actual type param: the actual types given as params when a parameterized type is declared
Variance
useful to know if we want to support generic container types or provide callback-based APIs.
Practically speaking, most cases are supported if we just provide invariant containers
the following sections use a concrete analogy to drive the point:
Imagine that a school cafeteria has a rule that only juice dispensers can be installed. General beverage dispensers are not allowed because they may serve sodas, which are banned by the school board.
code:
```python
from typing import TypeVar, Generic

class Beverage:
    """Any beverage."""

class Juice(Beverage):
    """Any fruit juice."""

class OrangeJuice(Juice):
    """Delicious juice from Brazilian oranges."""

T_co = TypeVar('T_co', covariant=True)

class BeverageDispenser(Generic[T_co]):
    def __init__(self, beverage: T_co) -> None:
        self.beverage = beverage

    def dispense(self) -> T_co:
        return self.beverage

class Garbage:
    """Any garbage."""

class Biodegradable(Garbage):
    """Biodegradable garbage."""

class Compostable(Biodegradable):
    """Compostable garbage."""

T_contra = TypeVar('T_contra', contravariant=True)

class TrashCan(Generic[T_contra]):
    def put(self, trash: T_contra) -> None:
        """Store trash until dumped."""

class Cafeteria:
    def __init__(
        self,
        dispenser: BeverageDispenser[Juice],
        trash_can: TrashCan[Biodegradable],
    ):
        """Initialize..."""

################################################ exact types

juice_dispenser = BeverageDispenser(Juice())
bio_can: TrashCan[Biodegradable] = TrashCan()
arnold_hall = Cafeteria(juice_dispenser, bio_can)

################################################ covariant dispenser

orange_juice_dispenser = BeverageDispenser(OrangeJuice())
arnold_hall = Cafeteria(orange_juice_dispenser, bio_can)

################################################ non-covariant dispenser

beverage_dispenser = BeverageDispenser(Beverage())
## Argument 1 to "Cafeteria" has
## incompatible type "BeverageDispenser[Beverage]"
## expected "BeverageDispenser[Juice]"
# arnold_hall = Cafeteria(beverage_dispenser, bio_can)

################################################ contravariant trash

trash_can: TrashCan[Garbage] = TrashCan()
arnold_hall = Cafeteria(juice_dispenser, trash_can)

################################################ non-contravariant trash

compost_can: TrashCan[Compostable] = TrashCan()
## Argument 2 to "Cafeteria" has
## incompatible type "TrashCan[Compostable]"
## expected "TrashCan[Biodegradable]"
# arnold_hall = Cafeteria(juice_dispenser, compost_can)
```
An Invariant Dispenser
```python
# tag::BEVERAGE_TYPES[]
from typing import TypeVar, Generic

class Beverage:  # <1> we establish a type hierarchy
    """Any beverage."""

class Juice(Beverage):
    """Any fruit juice."""

class OrangeJuice(Juice):
    """Delicious juice from Brazilian oranges."""

T = TypeVar('T')  # <2> simple typevar

class BeverageDispenser(Generic[T]):  # <3> parameterized on the beverage type
    """A dispenser parameterized on the beverage type."""
    def __init__(self, beverage: T) -> None:
        self.beverage = beverage

    def dispense(self) -> T:
        return self.beverage

def install(dispenser: BeverageDispenser[Juice]) -> None:  # <4> module-global function
    """Install a fruit juice dispenser."""
# end::BEVERAGE_TYPES[]

################################################ exact type

# tag::INSTALL_JUICE_DISPENSER[]
juice_dispenser = BeverageDispenser(Juice())
install(juice_dispenser)
# end::INSTALL_JUICE_DISPENSER[]

################################################ variant dispenser

# tag::INSTALL_BEVERAGE_DISPENSER[]
beverage_dispenser = BeverageDispenser(Beverage())
install(beverage_dispenser)
## mypy: Argument 1 to "install" has
## incompatible type "BeverageDispenser[Beverage]"
## expected "BeverageDispenser[Juice]"
# end::INSTALL_BEVERAGE_DISPENSER[]

################################################ variant dispenser

# tag::INSTALL_ORANGE_JUICE_DISPENSER[]
orange_juice_dispenser = BeverageDispenser(OrangeJuice())
install(orange_juice_dispenser)
## mypy: Argument 1 to "install" has
## incompatible type "BeverageDispenser[OrangeJuice]"
## expected "BeverageDispenser[Juice]"
# end::INSTALL_ORANGE_JUICE_DISPENSER[]
```
BeverageDispenser(Generic[T]) is invariant when BeverageDispenser[OrangeJuice] is not compatible with BeverageDispenser[Juice], despite the fact that OrangeJuice is a subtype-of Juice.
It depends on how we have defined the typevar
In this case, the function was defined with an actual type var:
def install(dispenser: BeverageDispenser[Juice]) -> None:
A Covariant Dispenser
```python
from typing import TypeVar, Generic

class Beverage:
    """Any beverage."""

class Juice(Beverage):
    """Any fruit juice."""

class OrangeJuice(Juice):
    """Delicious juice from Brazilian oranges."""

# tag::BEVERAGE_TYPES[]
T_co = TypeVar('T_co', covariant=True)  # <1> convention to suffix it like that

class BeverageDispenser(Generic[T_co]):  # <2> we use the typevar as the param for the generic class
    def __init__(self, beverage: T_co) -> None:
        self.beverage = beverage

    def dispense(self) -> T_co:
        return self.beverage

def install(dispenser: BeverageDispenser[Juice]) -> None:  # <3>
    """Install a fruit juice dispenser."""
# end::BEVERAGE_TYPES[]

################################################ covariant dispenser

# tag::INSTALL_JUICE_DISPENSERS[]
# both Juice and OrangeJuice are valid in a covariant BeverageDispenser:
juice_dispenser = BeverageDispenser(Juice())
install(juice_dispenser)

orange_juice_dispenser = BeverageDispenser(OrangeJuice())
install(orange_juice_dispenser)
# end::INSTALL_JUICE_DISPENSERS[]

################################################ more general dispenser not acceptable

# tag::INSTALL_BEVERAGE_DISPENSER[]
beverage_dispenser = BeverageDispenser(Beverage())
install(beverage_dispenser)
## mypy: Argument 1 to "install" has
## incompatible type "BeverageDispenser[Beverage]"
## expected "BeverageDispenser[Juice]"
# end::INSTALL_BEVERAGE_DISPENSER[]
```
covariance: the subtype relationship of the parameterized dispensers varies in the same direction as the subtype relationship of the type parameters.
two subtype relationships are involved: (A) between the actual type params (e.g., Juice and OrangeJuice) and (B) between the parameterized dispenser types built from them
The question is whether we allow the two relationships to vary in the same direction (covariant).
A covariant dispenser accepts the declared type parameter and ALSO its subtypes
Implementation notes:
- by convention, the typevar should be suffixed with _co
- just need to set covariant=True when we declare the typevar
A Contravariant Trash Can
```python
# tag::TRASH_TYPES[]
from typing import TypeVar, Generic

class Refuse:  # <1>
    """Any refuse."""

class Biodegradable(Refuse):
    """Biodegradable refuse."""

class Compostable(Biodegradable):
    """Compostable refuse."""

T_contra = TypeVar('T_contra', contravariant=True)  # <2>

class TrashCan(Generic[T_contra]):  # <3>
    def put(self, refuse: T_contra) -> None:
        """Store trash until dumped."""

def deploy(trash_can: TrashCan[Biodegradable]):
    """Deploy a trash can for biodegradable refuse."""
# end::TRASH_TYPES[]

################################################ contravariant trash can

# tag::DEPLOY_TRASH_CANS[]
bio_can: TrashCan[Biodegradable] = TrashCan()
deploy(bio_can)

trash_can: TrashCan[Refuse] = TrashCan()
deploy(trash_can)
# end::DEPLOY_TRASH_CANS[]

################################################ more specific trash can

# tag::DEPLOY_NOT_VALID[]
compost_can: TrashCan[Compostable] = TrashCan()
deploy(compost_can)
## mypy: Argument 1 to "deploy" has
## incompatible type "TrashCan[Compostable]"
## expected "TrashCan[Biodegradable]"
# end::DEPLOY_NOT_VALID[]
```
implementation:
- use the _contra suffix for the contravariant typevar
in our example,
TrashCanis contravariant on the type of refuse.
Variance Review
Invariant Types
if L (generic type) is invariant, then L[A] (parameterised type) is not a supertype or a subtype of L[B]
This is regardless of the relationships between A and B (the actual types)
examples: mutable collections in python are invariant, so list[int] is not consistent-with list[float] and vice-versa
if a formal type param (T) appears in the type hints of the method args AS WELL AS the return types, then the parameter must be invariant
this ensures type safety
by default, TypeVar creates invariant types
Covariant Types
nomenclature:
X :> Y means that X is a supertype of, or the same as, Y; X <: Y means the reverse
Covariant generic types follow the subtype relationship of the actual type parameters.
if A :> B (type B is a subclass of type A) and we consider a generic type C: iff C is covariant, then C[A] :> C[B].
Here, A and B are the actual type params.
examples:
Frozen set
float :> int and frozenset[float] :> frozenset[int], SAME DIRECTION
Iterators
Any code expecting an
abc.Iterator[float]yielding floats can safely use anabc.Iterator[int]yielding integers.
Callable types are covariant on the return type
this is so that a callable returning a subtype of the declared return type also works
Contravariant Types
given A :> B, a generic type K is contravariant if K[A] <: K[B]
this reverses the subtype relationship of the actual type parameters (opposite direction)
A contravariant container is usually a write-only data structure, aka a sink
Callable[[ParamType, ...], ReturnType] is contravariant on the parameter types, but covariant on the ReturnType
examples:
Refuse :> Biodegradable so TrashCan[Refuse] <: TrashCan[Biodegradable]
for callbacks,
contravariant formal parameter defines the type of args used to invoke / send data to this object \(\implies\) the callback object is therefore the sink
covariant formal params define the types of outputs produced by the object (yield type or return type) \(\implies\) the callback object acts as a source that gives outputs.
Rules of Thumb
producing:
If a formal type parameter defines a type for data that comes out of the object, it can be covariant. So we can produce more specific (narrower) results than strictly required.
consuming:
If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant. The object we supply can then accept/consume more general (wider) inputs than strictly required.
consuming == producing \(\implies\) invariant
If a formal type parameter defines a type for data that comes out of the object and the same parameter defines a type for data that goes into the object, it must be invariant.
To err on the safe side, make formal type parameters invariant.
Callbacks are special because they accept inputs and return output
- Python’s typing system with Callable is directly designed to reflect this.
See this elaboration:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120You're absolutely on the right track! **Callbacks** (like callables or function-type objects) in Python's typing system **are indeed special** because **they play both roles:** - They **accept inputs** (they are a "sink" for arguments), and - They **return outputs** (they are a "source" for results) This duality is exactly why variance in callables is *split*: - **Parameter types are contravariant** (inputs; the callback can accept _more general_ arguments than strictly required) - **Return types are covariant** (outputs; the callback can produce _more specific_ results than strictly required) Let me show this with an example. --- ### Example: Callbacks, Variance, and Typing Imagine you define this type: ```python from typing import Callable # A callback that takes a Biodegradable and returns a Juice CallbackType = Callable[[Biodegradable], Juice] ``` This means: - You need something that can accept a `Biodegradable` (or *any* base/parent of Biodegradable, i.e., `Refuse`) - And it must return a `Juice` (or any *subtype*, e.g., `OrangeJuice`) #### Function Matching the Signature ```python def process_refuse(refuse: Refuse) -> OrangeJuice: """Accepts any refuse, returns orange juice (a subtype of Juice).""" return OrangeJuice() ``` This function: - Accepts a `Refuse` (broader than just `Biodegradable`) ⇒ **contravariant** - Returns an `OrangeJuice` (narrower than `Juice`) ⇒ **covariant** **We can use process_refuse wherever CallbackType is required, thanks to variance:** ```python def handle_event(cb: CallbackType) -> None: refuse = Biodegradable() juice = cb(refuse) print(f"{juice!r} dispensed.") handle_event(process_refuse) # This typechecks! ``` --- ### Why? - **Contravariant parameter**: A function that can handle any `Refuse` is good enough anywhere that needs just `Biodegradable` — it won't ever receive a non-biodegradable! - **Covariant return**: If someone expects a `Juice`, it's fine to give them something more specific (like `OrangeJuice`). --- ## Summary Table | Callback signature | Description | Variance | |----------------------------------------- |---------------------------|-------------- | | `Callable[[ParamType], ReturnType]` | callback taking input(s), returning output | Parameter: **Contravariant**Return: **Covariant** | - Think of input parameters as arguments being "sent in" (*sink*, contravariant) - Think of return values as outputs "sent out" (*source*, covariant) --- ## Intuitive Analogy - **Input arguments**: You want callbacks that can accept *at least* what you have (so a broader parameter is better). - **Output/Return**: You want callbacks to promise to return *at least* what you asked for (so a narrower/specific return type is better). --- ### Expanded Example #### Class hierarchy ```python class Refuse: pass class Biodegradable(Refuse): pass class Compostable(Biodegradable): pass class Beverage: pass class Juice(Beverage): pass class OrangeJuice(Juice): pass ``` #### Callbacks ```python from typing import Callable # Expected callback: takes a Biodegradable, returns a Juice CallbackType = Callable[[Biodegradable], Juice] def cb1(arg: Refuse) -> OrangeJuice: ... 
# Accepts any Refuse, returns a specific Juice def cb2(arg: Compostable) -> Beverage: ... # Accepts only Compostable, returns a general Beverage def cb3(arg: Biodegradable) -> Juice: ... # Accepts Biodegradable, returns Juice f: CallbackType f = cb1 # OK! Wider input, narrower output f = cb2 # mypy: error (input too narrow, output too broad) f = cb3 # OK! exact types ``` --- ## Conclusion So your summary is **correct**: - **Callbacks _are both_ a sink (inputs) and a source (outputs)**. - **Input arguments:** Contravariant (wider is OK) - **Return type:** Covariant (narrower is OK) **Python’s typing system with Callable is directly designed to reflect this.**
Implementing a Generic Static Protocol
There’s some examples here for reference, nothing special though
Chapter Summary
We have covered:
the typing.overload decorator (@overload), including for the max function
typing.TypedDict, which is not a class builder
this is useful for defining the type of a dict (keys and values) when a dict is used as a record, often with the handling of JSON data
can give a false sense of security though, since it has no runtime effect
typing.cast as a way to handle some issues with type checkers. If overdone, it's a code smell.
Runtime type hint access, including approaches for handling forward references
GENERICS!!
back to typed world
Generic Static Protocol
- allows us to be specific in the original protocol form
Further Reading
remember to keep up with Mypy’s docs because the official python docs on typing might lag because of the rate at which new features for typing are introduced.
“Covariance or contravariance is not a property of a type variable, but a property of a generic class defined using this variable.”
\(\implies\) this is why I was finding it so mind-boggling, even though the topic of variance in generics is not new to me.
In python, the typevar is what the notion of co/contra-variance is bound to. This happened because the authors worked under the severe self-imposed constraint that type hints should be supported without making any change to the interpreter.
that’s why the variance is tied to the TypeVar declaration
that’s why the
[]is used instead of<>for defining the type param
Chapter 16. Operator Overloading
There’s a value in allowing infix operators to handle any arbitrary type (not just primitive types):
- readable code that allows the non-primitive types to help with exactness of operations
This is why operator overloading is important.
Objectives:
- how to overload properly
- How an infix operator method should signal it cannot handle an operand
- Using duck typing or goose typing to deal with operands of various types
- The special behaviour of the rich comparison operators (e.g.,
==, >, <, etc.)
- The default handling of augmented assignment operators such as
+=, and how to overload them
What’s New in This Chapter
Operator Overloading 101
objective: interoperability of unary/infix/other operators with user defined objects
other operators include (), . and [] in python
LANG_LIMITATIONS: Python Limitations on operator overloading (to protect us):
can’t change the meaning of the operators for built-in types
can’t create new operators, only can overload existing ones
some operators can’t be overloaded:
is, and, or, not
the bitwise versions can be overloaded though (&, |, ~)
Unary Operators
random notes on these:
- usually
x == +x, but not in some cases
- bitwise NOT is also
~x == -(x + 1); if x is 2, then ~x == -3
- usually
easy to implement the appropriate unary function, just make the function pure and immutable
if the receiver itself is immutable, then we can just return self.
when is
xand+xnot equal?e.g. when precision matters. E.g. when using
Decimalyou can setxbased on a particular arithmetic precision, then change the precision and computex=+xand because the precisions will be different we will get back aFalsee.g. when using
collections.CounterTRICK: Unary
+produces a new Counter without zeroed or negative tallies. So we can use it to copy (and remove the negatives / zeros).
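Quick demo of the Counter trick (made-up tallies):

```python
from collections import Counter

ct = Counter('banana')
ct['x'] = -2
ct['y'] = 0
print(ct)    # Counter({'a': 3, 'n': 2, 'b': 1, 'y': 0, 'x': -2})
print(+ct)   # unary + drops zero and negative tallies: Counter({'a': 3, 'n': 2, 'b': 1})
```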
Overloading + for Vector Addition
typically, sequences should support the
+ operator for concatenation and * for repetition.
when we have operands of different types, we try to look for __add__ or
__radd__ and take a best-effort approach:
To support operations involving objects of different types, Python implements a special dispatching mechanism for the infix operator special methods:
If
ahas__add__, calla.__add__(b)and return result unless it’sNotImplemented.If a doesn’t have
__add__, or calling it returnsNotImplemented, check ifbhas__radd__, then callb.__radd__(a)and return result unless it’sNotImplemented.If
bdoesn’t have__radd__, or calling it returnsNotImplemented, raiseTypeErrorwith an unsupported operand types message.
GOTCHA: NotImplemented is a singleton, not the same as NotImplementedError
Do not confuse NotImplemented with NotImplementedError. The first, NotImplemented, is a special singleton value that an infix operator special method should return to tell the interpreter it cannot handle a given operand. In contrast, NotImplementedError is an exception that stub methods in abstract classes may raise to warn that subclasses must implement them.
note that if an operator method cannot handle its operand, it should return NotImplemented rather than raising an error like TypeError
this is so that the dispatch mechanism (trying the reflected method) is not aborted prematurely
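A minimal sketch of the forward/reverse dispatch and the NotImplemented protocol, using an illustrative Money class (my own example, not from the book):

```python
class Money:
    def __init__(self, cents: int):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented   # let Python try other.__radd__(self)

    def __radd__(self, other):
        return self + other     # addition is commutative here, so delegate to __add__

    def __repr__(self):
        return f'Money({self.cents})'


print(Money(10) + Money(5))   # Money(15)
print(3 + Money(5))           # int.__add__ returns NotImplemented, so Money.__radd__ runs -> Money(8)
try:
    Money(5) + 'x'            # both sides give up -> TypeError
except TypeError as e:
    print(e)
```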
Overloading * for Scalar Multiplication
Using @ as an Infix Operator
- it’s been used for matrix multiplication, has both reflected version and an in-place version
- this is a useful goose typing example as well, both the ABCs implement the
__subclasshook__ methods, so we don't need explicit subclassing / registration
Wrapping-Up Arithmetic Operators
Rich Comparison Operators
- differs from the arithmetic operators in these ways:
- same set of methods is used in forward and reverse operator calls (with the arguments changed as expected)
- for
!= and ==, if NotImplemented then fall back to id() checks.
Augmented Assignment Operators
- for immutable objects, the augmented assignment operators are just syntactic sugar for the expanded version; that's why they return new objects
- for mutable objects, depends on whether we implemented the dunder methods or not
- Very important: augmented assignment special methods of mutable objects must return self. That’s what users expect.
- IDIOM: In general, if a forward infix operator method (e.g., __mul__) is designed to work only with operands of the same type as self, it's useless to implement the corresponding reverse method (e.g., __rmul__) because that, by definition, will only be invoked when dealing with an operand of a different type.
Chapter Summary
when handling mixed operands, we have 2 choices:
use duck typing:
this is useful and flexible but the error messages may be less useful or even misleading
use goose typing:
this is useful as a compromise between flexibility and safety because existing / future user-defined types can be declared as actual or virtual subclasses of an ABC
Also, if the ABC implements __subclasshook__, it's even more convenient because there's no need for explicit subclassing or registration.
the in place operator is usually more flexible than its infix operator in terms of type strictness.
Further Reading
Part IV. Control Flow
Chapter 17. Iterators, Generators, and Classic Coroutines
the iterator design pattern is built into python.
Every standard collection in Python is iterable. An iterable is an object that provides an iterator,
What’s New in This Chapter
A Sequence of Words
Why Sequences Are Iterable: The iter Function
the dispatch flow is like so:
- need to iterate on
x\(\implies\) callsiter(x)builtin - try
__iter__implementation - elif try
__getitem__, if present then fetch items by index, start from 0-index - fail and raise
TypeError
all Python sequences are iterable:
by definition, they all implement
__getitem__(especially for backward compatibility).std sequences also implement
__iter__and custom ones should also have thisthis is an extreme form of duck typing:
an object is considered iterable not only when it implements the special method
__iter__, but also when it implements__getitem__goose typing approach, it’s just checking the existence of
__iter__method. No registration needed becauseabc.Iterableimpelements the__subclasshook__the ducktyped approach to typechecking for iterable is better than the goose-typing approach
Using iter with a Callable
when used with a callable, second arg is a sentinel value for detecting the stop iteration.
sentinel value will never really be yielded because that’s the sentinel.
iterators may get exhausted.
the callable given to
iter() MUST NOT require arguments. If necessary, convert it to a partial function (with the arguments pre-bound) so that it's effectively a nullary callable.
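A small sketch of the callable + sentinel form of iter(), with partial pre-binding the arguments (illustrative dice example):

```python
from functools import partial
from random import randint

roll = partial(randint, 1, 6)   # nullary callable: arguments are pre-bound
for value in iter(roll, 6):     # 6 is the sentinel; it is never yielded
    print(value)
```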
Iterables Versus Iterators
python obtains iterators from iterables
any obj for which the
iter()builtin can get an iterator is an iterable- either gets it from
__iter__or indirectly from__getitem__
- either gets it from
an iterator raises a
StopIteration when there are no further items. there's no way to check for empty other than this, and there's no way to reset an iterator other than to create it again.
__subclasshook__ implementation within Iterator:

```python
@classmethod
def __subclasshook__(cls, C):
    if cls is Iterator:
        return _check_methods(C, '__iter__', '__next__')
    return NotImplemented
```

the _check_methods helper is provided by the abc module
it traverses the MRO of the class and checks whether the methods are implemented or not
MISCONCEPTION: virtual subclassing doesn't happen ONLY through explicit registration. The use of __subclasshook__ relying on _check_methods is an example of implicit virtual subclassing.
easiest way to typecheck for an iterator is the goose-type check: isinstance(x, abc.Iterator)
Sentence Classes with iter
- iterators are supposed to implement both
__next__ and __iter__; the __iter__ dunder method is so that they work well in places that expect an iterable.
Sentence Take #2: A Classic Iterator
- this is just a didactic example: it uses a custom class that keeps a cursor for the index of the next value to return, and raises StopIteration when the index is out of bounds.
Don’t Make the Iterable an Iterator for Itself
iterators are also iterable (because they have the
__iter__method that returnsself) but iterables are NOT iterators (they can create iterators)common source of error is to confuse the two.
common antipattern:
to implement
__next__for an iterable so that an iterable is also an iterator over itself.so a proper implementation of the pattern requires each call to
iter(my_iterable)to create a new, independent, iterator.
Sentence Take #3: A Generator Function
```python
"""
Sentence: iterate over words using a generator function
"""

# tag::SENTENCE_GEN[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words:  # <1>
            yield word  # <2>
        # <3> done! <4>

# end::SENTENCE_GEN[]
```
pythonic way is to use a generator instead of a custom class that acts as the iterator
here,
__iter__ is a generator function
a generator function doesn't raise StopIteration explicitly; it just exits when it gets exhausted, and the generator object raises StopIteration for the consumer
How a Generator Works
a generator function is a generator factory
it is a function, when called, returns a generator object
generator function generates generator objects
generator function and generator objects are not the same
not necessary to have just a single
yield(typically within a loop construct), we can have as manyyields as we like in our generator functionon each
next()applied to the generator object, we’ll just end up continuing the control flow until the nextyieldstatementthe fallthrough at the end of a generator function is for the generator object to raise
StopIterationthe consumer of the generator object may handle things cleanly
When the generator function runs to the end, the generator object raises
StopIteration. The for loop machinery catches that exception, and the loop terminates cleanly.Language:
functions “return” values, generators “yield” values
generator functions return generator objects
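Tiny demo of those points (my own example): calling the generator function returns a generator object, and falling off the end raises StopIteration.

```python
def gen_123():
    yield 1
    yield 2
    yield 3

g = gen_123()          # calling the generator function returns a generator object
print(next(g))         # 1
print(next(g))         # 2
print(next(g))         # 3
try:
    next(g)
except StopIteration:
    print('exhausted') # falling off the end of the body raises StopIteration
```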
Lazy Sentences
Sentence Take #4: Lazy Generator
we know that the
findall method of the regex was eager, so we use the lazy version: re.finditer. This returns a generator yielding re.MatchObject instances on demand \(\implies\) it IS lazy.
finditer builds an iterator over the matches of RE_WORD on self.text, yielding MatchObject instances.
code:
```python
"""
Sentence: iterate over words using a generator function
"""

# tag::SENTENCE_GEN2[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text  # <1>

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):  # <2>
            yield match.group()  # <3>

# end::SENTENCE_GEN2[]
```
Sentence Take #5: Lazy Generator Expression
intent here is to replace generator functions with generator expressions. should be seen as syntactic sugar.
we can write generator expressions around other generator objects without immediately consuming them, thereby preserving the lazy behaviour
code:
```python
"""
Sentence: iterate over words using a generator expression
"""

# tag::SENTENCE_GENEXP[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

# end::SENTENCE_GENEXP[]


def main():
    import sys
    import warnings
    try:
        filename = sys.argv[1]
        word_number = int(sys.argv[2])
    except (IndexError, ValueError):
        print(f'Usage: {sys.argv[0]} <file-name> <word-number>')
        sys.exit(2)  # command line usage error
    with open(filename, 'rt', encoding='utf-8') as text_file:
        s = Sentence(text_file.read())
    for n, word in enumerate(s, 1):
        if n == word_number:
            print(word)
            break
    else:
        warnings.warn(f'last word is #{n}, {word!r}')


if __name__ == '__main__':
    main()
```
the __iter__ method here is no longer a generator function (since it has no yield); it uses a generator expression to build a generator object and returns it
same outcome though: both cases return a generator object
When to Use Generator Expressions
should be seen as a syntactic shortcut to create a generator without defining and calling a function.
syntax stuff:
- if we’re passing in a genexpr as the only argument to a function, we can omit the surrounding
()and it will work. This doesn’t work if there’s more than one argument that we’re supplying though.
compared with generator functions:
generator functions can even be seen as coroutines; they support complex logic with multiple statements
should use generator functions when the genexpr looks too complex.
Contrasting Iterators and Generators
iterators:
- anything implementing
__next__method - produce data for client code consumption:
- consumed via drivers such as
forloops - consumed via the explicit calling of
next(it)
- consumed via drivers such as
- practically, most iterators in python are generators.
- anything implementing
Generators
an iterator that the python compiler builds
ways to create a generator:
implement a generator function, with a
yieldkeyword. this is a factory of generator objectsuse a generator expression to build a generator object
it’s the generator objects that provide
__next__so that they are iterators. The generators (generator functions) don’t need to implement__next__we can have async generators
An Arithmetic Progression Generator
TRICK: we can see the
range function as a built-in that generates a bounded arithmetic progression of integers!
TRICK: python 3 doesn't have an explicit type coercion method, but we can work around this:
```python
def __iter__(self):
    result_type = type(self.begin + self.step)
    # NOTE: by keeping the target result type, we can then coerce it like so:
    result = result_type(self.begin)
    forever = self.end is None
    index = 0
    while forever or result < self.end:
        yield result
        index += 1
        result = self.begin + self.step * index
```
if the whole point of a class is to build a generator by implementing
__iter__, we can replace the class with a generator function. A generator function is, after all, a generator factory.code:
```python
"""
Arithmetic progression generator function::

    >>> ap = aritprog_gen(1, .5, 3)
    >>> list(ap)
    [1.0, 1.5, 2.0, 2.5]
    >>> ap = aritprog_gen(0, 1/3, 1)
    >>> list(ap)
    [0.0, 0.3333333333333333, 0.6666666666666666]
    >>> from fractions import Fraction
    >>> ap = aritprog_gen(0, Fraction(1, 3), 1)
    >>> list(ap)
    [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]
    >>> from decimal import Decimal
    >>> ap = aritprog_gen(0, Decimal('.1'), .3)
    >>> list(ap)
    [Decimal('0'), Decimal('0.1'), Decimal('0.2')]

"""

# tag::ARITPROG_GENFUNC[]
def aritprog_gen(begin, step, end=None):
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index
# end::ARITPROG_GENFUNC[]
```
Arithmetic Progression with itertools
ready to use generators in itertools, which we can combine
some useful ones:
itertools.count is an infinite generator; it accepts a start and a step
itertools.takewhile returns a generator that consumes another generator and stops when a given predicate evaluates to False
example:
gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
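Consuming that example generator gives (assuming the snippet above):

```python
import itertools

gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
print(list(gen))   # [1, 1.5, 2.0, 2.5]
```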
code:
```python
# tag::ARITPROG_ITERTOOLS[]
import itertools


def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is None:
        return ap_gen
    return itertools.takewhile(lambda n: n < end, ap_gen)
# end::ARITPROG_ITERTOOLS[]
```
NOTE:
aritprog_gen is not a generator function because it has no yield in its body; it still returns a generator though, like a generator function does.
when implementing generators, know what is available in the standard library, otherwise there's a good chance you'll reinvent the wheel.
Generator Functions in the Standard Library
This section focuses on general-purpose functions that take arbitrary iterables as arguments and return generators that yield selected, computed, or rearranged items.
Some groups of standard generators:
Filtering generator functions:
they yield a subset of items produced by the input iterable, without changing the items themselves.
Mapping generators: (in the context of map functions, not map objects)
they yield items computed from each individual item in the input iterable—or iterables,
- starmap is cool. it does an unpacking from an iterator that yields tuples e.g. this gives us a running average:
list(itertools.starmap(lambda a, b: b / a, enumerate(itertools.accumulate(sample), 1)))1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75The **`itertools.starmap` function** in Python is a specialized iterator that applies a given function to elements from an iterable, where each element is itself an iterable (typically a tuple) and is unpacked as arguments to the function. ### Detailed Explanation - `starmap(function, iterable)` is similar to the built-in `map()` function, but while `map()` passes each element as a whole argument to the function, **`starmap` unpacks each element (tuple) as positional arguments to the function**. - This means if you have an iterable of tuples like `[(a1, b1), (a2, b2), ...]`, `starmap` calls `function(a1, b1)`, `function(a2, b2)`, etc. - It returns an iterator that yields the results of these function calls lazily, i.e., items are computed only as you iterate. ### Mental Model - Think of `starmap` as a clever iterator transform that "unzips" or unpacks arguments for you when you have multiple parameters. - It is particularly useful when working with functions or operations that expect multiple positional arguments but you have your data already grouped as tuples in your iterable. - This differs from `map`, which treats each element as a single argument. ### Prototype of `starmap` behavior (simplified) ```python def starmap(function, iterable): for args in iterable: yield function(*args) # unpack each tuple as arguments ``` ### Practical Example ```python from itertools import starmap def multiply(x, y): return x * y pairs = [(2, 3), (4, 5), (6, 7)] result = starmap(multiply, pairs) print(list(result)) # Output: [6, 20, 42] ``` In this example, `multiply` requires two arguments; `pairs` contains tuples with two elements each, and `starmap` calls `multiply` with unpacked tuple elements. ### Comparison with `map` - Using `map(pow, [(2, 3), (4, 5)])` would fail because `pow` expects two separate arguments, but `map` passes the entire tuple as one argument. - Using `starmap(pow, [(2, 3), (4, 5)])` works because it unpacks the tuples as separate arguments automatically. ### Use Cases Beyond Simple Functions - It is often leveraged to apply functions like `pow`, arithmetic operators, or user-defined functions that take multiple arguments. - Useful in multiprocessing scenarios (e.g., `multiprocessing.Pool.starmap`) for applying functions with multiple inputs concurrently. ### Summary Table | Aspect | Description | |---------------------------|---------------------------------------------------------------------| | Function signature | `itertools.starmap(function, iterable_of_arg_tuples)` | | Functional behavior | Applies function as `function(*args)` for each tuple in iterable | | Returns | An iterator yielding results lazily | | Difference from `map` | `map` passes each element as-is; `starmap` unpacks tuple arguments | | Use case | Applying multi-argument functions over an iterable of argument tuples| ### References - Python official docs for itertools: `starmap` applies a function to unpacked arguments from tuples in an iterable. - Tutorialspoint, GeeksforGeeks, and Educative.io provide practical examples demonstrating the use and difference from `map`. - Multiprocessing's `Pool.starmap()` uses exactly the same concept to map multi-argument functions in parallel. 
This understanding helps senior engineers grasp how `starmap` elegantly bridges the gap between iterable data structures and multi-argument function applications in Python’s iterator toolkit. [1] https://www.tutorialspoint.com/python/python_itertools_starmap_function.htm [2] https://www.geeksforgeeks.org/python/python-itertools-starmap/ [3] https://www.educative.io/answers/what-is-the-itertoolsstarmap-method-in-python [4] https://www.mybluelinux.com/python-map-and-starmap-functions/ [5] https://superfastpython.com/multiprocessing-pool-starmap/ [6] https://docs.python.org/3/library/multiprocessing.html [7] https://indhumathychelliah.com/2020/09/14/exploring-map-vs-starmap-in-python/ [8] https://stackoverflow.com/questions/56672348/applying-the-pool-starmap-function-with-multiple-arguments-on-a-dict-which-are [9] https://www.youtube.com/watch?v=aUUJRF6Zako
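A runnable version of the running-average starmap trick above, with a made-up sample list:

```python
import itertools

sample = [5, 4, 2, 8, 7]   # illustrative data
running_avg = list(itertools.starmap(
    lambda count, total: total / count,
    enumerate(itertools.accumulate(sample), 1)))
print(running_avg)   # [5.0, 4.5, 3.6666666666666665, 4.75, 5.2]
```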
Merging Generators: yield items from multiple input iterables
chain.from_iterable: It’s almost like flattening.
Generator functions that expand each input into multiple output items:
pairwise is interesting: for each item in the input, pairwise yields a 2-tuple with that item and the next, if there is a next item.
list(itertools.pairwise(range(7)))
TRICK: Combinatorics Generators see the elaboration here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131The **combinatorics generators in the `itertools` module** of Python are a suite of powerful, memory-efficient iterator-based functions designed to generate combinatorial collections such as permutations, combinations, and Cartesian products systematically without building them fully in memory. These functions are foundational for algorithmic tasks involving combinatorial enumeration, sampling, or search. Here is a detailed outline tailored for a senior engineer who values mental models, rigor, and first principles: *** ### 1. Overview of Combinatorics Generators in `itertools` Python’s `itertools` module offers **four primary combinatoric iterators** that generate combinatorial constructs lazily: | Iterator Name | Purpose | Key Characteristics | |-------------------------------|---------------------------------------------|----------------------------------------------------------| | `product()` | Cartesian product of input iterables | Generates tuples combining every element with every other (with optional repetition) | | `permutations()` | All possible orderings (permutations) | Generates all possible ordered arrangements of a specified length | | `combinations()` | Combinations without replacement | Generates all possible selections of a specified length without regard to order | | `combinations_with_replacement()` | Combinations allowing repeated elements | Like combinations but allows repeated elements in output | *** ### 2. Detailed Description with Mental Models #### a. `itertools.product(*iterables, repeat=1)` - **Conceptual model:** The Cartesian product can be thought of as the "all pairs/all tuples" construction, where you combine every element of iterable 1 with every element of iterable 2, and so forth. - **Use case:** Explores all possible selections when repetition and order matter. - **Implementation detail:** Produces tuples where each position corresponds to one iterable element. The `repeat` argument simulates self cartesian products. - **Example:** ```python from itertools import product list(product([1, 2], repeat=2)) # Output: [(1, 1), (1, 2), (2, 1), (2, 2)] ``` - **Scaling note:** The output size grows multiplicatively — caution with large inputs. #### b. `itertools.permutations(iterable, r=None)` - **Conceptual model:** All possible orderings of `r` distinct elements from the iterable, where order matters. - **Use case:** Problems requiring permutations without replacement. - **Implementation:** Yields tuples of length `r` (default `r` equals length of iterable). - **Example:** ```python from itertools import permutations list(permutations('ABC', 2)) # Output: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')] ``` - **Key mental model:** Unlike combinations, `'AB'` and `'BA'` are distinct. #### c. `itertools.combinations(iterable, r)` - **Conceptual model:** Select `r` elements from iterable, ignoring order and disallowing repetitions. - **Use case:** Choosing subsets or unique groupings. - **Implementation:** Yields sorted tuples of length `r`. 
- **Example:** ```python from itertools import combinations list(combinations('ABC', 2)) # Output: [('A', 'B'), ('A', 'C'), ('B', 'C')] ``` - **Mental model:** For `['A', 'B', 'C']` picking 2 is like choosing pairs regardless of arrangement. #### d. `itertools.combinations_with_replacement(iterable, r)` - **Conceptual model:** Like combinations but elements can appear multiple times. - **Use case:** Selecting combinations where repeats are allowed. - **Implementation:** Yields sorted tuples where elements can recur. - **Example:** ```python from itertools import combinations_with_replacement list(combinations_with_replacement('AB', 2)) # Output: [('A', 'A'), ('A', 'B'), ('B', 'B')] ``` *** ### 3. Functional and Performance Notes - All these combinatoric iterators **return generator objects**, supporting lazy evaluation to handle potentially large combinatorial spaces without memory blowup. - Outputs are tuples representing fixed-size sequences or selections. - The generation order and structure adhere to mathematical combinatorics rules (e.g., lexicographic ordering for combinations). - Be mindful of the combinatorial explosion, where output size can grow factorially or exponentially with input size. *** ### 4. Related Tools and Concepts in `functools` and Others - While `functools` does not include combinatorics generators directly, its role complements iterator tools by providing **function composition**, **partial application** (`partial`), and caching mechanisms (`lru_cache`) common in higher-order functional programming. - For complex pipeline building involving combinatorics generators, these tools enhance composability and performance in functional-style codebases. *** ### Mental Model Summary Table | Function | Category | Usage Mental Model | Output | Notes | |----------------------------------|--------------------|------------------------------------------------|-------------------------------|--------------------------------| | `product(*iterables, repeat=1)` | Cartesian product | Cross-combine elements, repeat allows self-product | Tuples representing Cartesian product elements | Size = $$\prod |iterables_i|$$ or $$|iterable|^{repeat}$$ | | `permutations(iterable, r=None)` | Permutations | Order-sensitive arrangements of length r | Tuples of length r | Number of permutations = n!/(n-r)! | | `combinations(iterable, r)` | Combinations | Unordered selection of r items without replacement | Tuples of length r | Number = C(n, r) | | `combinations_with_replacement(iterable, r)` | Combinations with replacement | Unordered selection allowing duplicates | Tuples of length r | Number = C(n+r-1, r) | *** ### References - Python official documentation — `itertools` [combinatoric iterators section](https://docs.python.org/3/library/itertools.html#itertools.combinations) - GeeksforGeeks "Combinatoric Iterators in Python" (2025)[1][2] - Blog posts and overviews such as Juha-Matti Santala’s "Combinatoric iterators from itertools" (2024)[3] - Real Python and Towards Data Science tutorials on `itertools` This combinatorics toolkit in `itertools` forms the backbone for algorithmic exploration in exhaustive, sampled, or random combinatorial problems while preserving efficiency and clarity in Python's iterator model. Let me know if you'd like me to dive into usage patterns, performance characteristics, or idiomatic compositions with these combinatorics generators. 
[1] https://www.geeksforgeeks.org/python-itertools/ [2] https://www.geeksforgeeks.org/combinatoric-iterators-in-python/ [3] https://hamatti.org/posts/combinatoric-iterators-from-itertools/ [4] https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Itertools.html [5] https://towardsdatascience.com/a-guide-to-python-itertools-like-no-other-454da1ddd5b8/ [6] https://realpython.com/python-itertools/ [7] https://mathspp.com/blog/module-itertools-overview [8] https://docs.python.org/3/library/itertools.html [9] https://labex.io/tutorials/python-how-to-use-itertools-combinations-in-python-398083 [10] https://arjancodes.com/blog/python-itertools-module-tutorial-for-efficient-data-handling/Rearranger Generators: yield all items in the input iterables, in some rearranged ways
They all accept at most one input variable.
itertools.groupby,itertool.teereturn multiple generatorsGOTCHA:
itertools.groupbyassumes that the input iterable is sorted by the grouping criterion, or at least that the items are clustered by that criterion — even if not completely sorted.e.g. use case: you can sort the datetime objects chronologically, then groupby weekday to get a group of Monday data, followed by Tuesday data, etc., and then by Monday (of the next week) again, and so on.
itertools.teesimilar to unix tee, gives us multiple generators to consume the yielded values independently.which has a unique behavior: it yields multiple generators from a single input iterable, each yielding every item from the input. Those generators can be consumed independently,
reversedonly works with sequences
Iterable Reducing Functions
given an iterable, they return a single result \(\implies\) “reducing”/ “folding” / “accumulating” functions.
Naturally, they have to work with bounded iterables, won’t work with infinite iterables.
allandanyhave the ability to short-circuit!
Subgenerators with yield from
- objective is to let a generator delegate to a subgenerator
- uses
yield from
Reinventing chain
Here’s the implementation without
yield from:

```python
def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i

s = 'ABC'
r = range(3)
list(chain(s, r))  # ['A', 'B', 'C', 0, 1, 2]
```

here's how we can implement
itertools.chain using yield from:

```python
def chain(*iterables):
    for it in iterables:
        yield from it

s = 'ABC'
r = range(3)
list(chain(s, r))  # ['A', 'B', 'C', 0, 1, 2]
```
Traversing a Tree
Step 2: using a subgenerator for the subtrees
```python
def tree(cls):
    yield cls.__name__, 0
    yield from sub_tree(cls)  # <1> delegation from tree to sub_tree: tree is suspended, and sub_tree takes over yielding values

def sub_tree(cls):
    for sub_cls in cls.__subclasses__():
        yield sub_cls.__name__, 1  # <2>

def display(cls):
    for cls_name, level in tree(cls):  # <3>
        indent = ' ' * 4 * level
        print(f'{indent}{cls_name}')

if __name__ == '__main__':
    display(BaseException)
```

the delegation from generator to sub-generator is interesting
here, the
treegenerator is suspended, andsub_treetakes over yielding valueswe soon observe the following pattern:
We do a for loop to get the subclasses of level
N. Each time around the loop, we yield a subclass of levelN, then start another for loop to visit levelN+1.
Step 5
we use the pattern seen before and call the same generator function again as a subgenerator:
```python
def tree(cls):
    yield cls.__name__, 0
    yield from sub_tree(cls, 1)

def sub_tree(cls, level):
    for sub_cls in cls.__subclasses__():
        yield sub_cls.__name__, level
        yield from sub_tree(sub_cls, level+1)

def display(cls):
    for cls_name, level in tree(cls):
        indent = ' ' * 4 * level
        print(f'{indent}{cls_name}')

if __name__ == '__main__':
    display(BaseException)
```

This is limited only by Python’s recursion limit. The default limit allows 1,000 pending functions.
This also has an implicit base case:
sub_treehas no if, but there is an implicit conditional in the for loop:if cls.__subclasses__()returns an empty list, the body of the loop is not executed, therefore no recursive call happens. The base case is when the cls class has no subclasses. In that case,sub_treeyields nothing. It just returns.
Step 6: merge into a single generator
```python
def tree(cls, level=0):
    yield cls.__name__, level
    for sub_cls in cls.__subclasses__():
        yield from tree(sub_cls, level+1)

def display(cls):
    for cls_name, level in tree(cls):
        indent = ' ' * 4 * level
        print(f'{indent}{cls_name}')

if __name__ == '__main__':
    display(BaseException)
```

yield from connects the subgenerator directly to the client code, bypassing the delegating generator. That connection becomes really important when generators are used as coroutines and not only produce but also consume values from the client code.
Generic Iterable Types
- Mypy, reveals that the Iterator type is really a simplified special case of the Generator type.
Iterator[T]is a shortcut forGenerator[T, None, None]. Both annotations mean “a generator that yields items of typeT, but that does not consume or return values.”- Generators can consume and return values \(\implies\) they are classic coroutines
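A small sketch of the two equivalent annotations, using a made-up fibs generator:

```python
from collections.abc import Generator, Iterator

def fibs() -> Iterator[int]:                 # yields ints, sends/returns nothing
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def fibs2() -> Generator[int, None, None]:   # the same thing, spelled out in full
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
```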
Classic Coroutines via Enhanced Generators
“generators that can consume and return values”
these are not supported by
asynciothe modern, native coroutines are just called “coroutines” now.
2 ways to typehint generators:
Underlying C implementation is the same, they are just USED differently.
as an iterator:
readings: Iterator[float]Bound to an iterator / generator object that yields
floatitemsas a coroutine:
sim_taxi: Generator[Event, float, int]The `sim_taxi` variable can be bound to a coroutine representing a taxi cab in a discrete event simulation. It yields events, receives `float` timestamps, and returns the number of trips made during the simulation
The type is named
Generator, when in fact it describes the API of a generator object intended to be used as a coroutine, while generators are more often used as simple iterators.Generator[YieldType, SendType, ReturnType]Generator type has the same type parameters as
typing.Coroutine:Coroutine[YieldType, SendType, ReturnType](deprecated in favour ofcollections.abc.Coroutine) which is to annotate only native co-routines, not classic coroutines.Some guidelines to avoid confusion:
- Generators produce data for iteration
- Coroutines are consumers of data
- To keep your brain from exploding, don’t mix the two concepts together
- Coroutines are not related to iteration
- Note: There is a use of having `yield` produce a value in a coroutine, but it’s not tied to iteration.
Example: Coroutine to Compute a Running Average
Old example of running average using closures. This is a higher order function.
```python
def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total
        count += 1
        total += new_value
        return total / count

    return averager
```

The yield statement (in the coroutine version below) suspends the coroutine, yields a result to the client, and — later — gets a value sent by the caller to the coroutine, starting another iteration of the infinite loop.

The coroutine can keep internal state without needing any instance attrs or closures. They keep local state between activations \(\implies\) attractive replacement for callbacks in async programming
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43""" A coroutine to compute a running average # tag::CORO_AVERAGER_TEST[] >>> coro_avg = averager() # <1> >>> next(coro_avg) # <2> 0.0 >>> coro_avg.send(10) # <3> 10.0 >>> coro_avg.send(30) 20.0 >>> coro_avg.send(5) 15.0 # end::CORO_AVERAGER_TEST[] # tag::CORO_AVERAGER_TEST_CONT[] >>> coro_avg.send(20) # <1> 16.25 >>> coro_avg.close() # <2> >>> coro_avg.close() # <3> >>> coro_avg.send(5) # <4> Traceback (most recent call last): ... StopIteration # end::CORO_AVERAGER_TEST_CONT[] """ # tag::CORO_AVERAGER[] from collections.abc import Generator def averager() -> Generator[float, float, None]: # <1> yields float, accepts float, nothing useful returned total = 0.0 count = 0 average = 0.0 while True: # <2> will keep accepting as long as there are values sent to this coroutine term = yield average # <3> =yield= statement here suspends the coroutine, yields a result to the client, and — later — gets a value sent by the caller to the coroutine, starting another iteration of the infinite loop. total += term count += 1 average = total/count # end::CORO_AVERAGER[]Priming/Starting the Coroutine
We can do an initial
next(my_coroutine)OR, we can
send(None)to start it off. Only None works here because a coroutine can’t accept a sent value, unless it is suspended at ayieldline.
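A tiny sketch of priming, reusing the averager coroutine from the example above:

```python
coro_avg = averager()   # the coroutine from the running-average example
coro_avg.send(None)     # primes it: same effect as next(coro_avg), runs the body up to the first yield
coro_avg.send(10)       # now it is suspended at a yield, so it can receive values -> 10.0
```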
Multiple activations
After each activation, the coroutine is suspended precisely at the
yieldkeyword, waiting for a value to be sent.coro_avg.send(10): yield expression resolves to the value 10, assigning it to the term variable. The rest of the loop updates the total, count, and average variables. The next iteration in the while loop yields the average, and the coroutine is again suspended at the yield keyword.i notice that there’s 2 states to the co-routine: active and suspended.
Terminating a coroutine
can just stop referring to it and the coroutine can be garbage collected
for explicit termination, we can call
coro_avg.close(). The .close() method raises GeneratorExit at the suspended yield expression. If not handled in the coroutine function, the exception terminates it. GeneratorExit is caught by the generator object that wraps the coroutine—that’s why we don’t see a traceback.
StopIteration
Returning a Value from a Coroutine
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99""" A coroutine to compute a running average. Testing ``averager2`` by itself:: # tag::RETURNING_AVERAGER_DEMO_1[] >>> coro_avg = averager2() >>> next(coro_avg) >>> coro_avg.send(10) # <1> >>> coro_avg.send(30) >>> coro_avg.send(6.5) >>> coro_avg.close() # <2> # end::RETURNING_AVERAGER_DEMO_1[] Catching `StopIteration` to extract the value returned by the coroutine:: # tag::RETURNING_AVERAGER_DEMO_2[] >>> coro_avg = averager2() >>> next(coro_avg) >>> coro_avg.send(10) >>> coro_avg.send(30) >>> coro_avg.send(6.5) >>> try: ... coro_avg.send(STOP) # <1> ... except StopIteration as exc: ... result = exc.value # <2> ... >>> result # <3> Result(count=3, average=15.5) # end::RETURNING_AVERAGER_DEMO_2[] Using `yield from`: # tag::RETURNING_AVERAGER_DEMO_3[] NOTE: this uses a delegating generator >>> def compute(): ... res = yield from averager2(True) # <1> ... print('computed:', res) # <2> ... return res # <3> ... >>> comp = compute() # <4> >>> for v in [None, 10, 20, 30, STOP]: # <5> ... try: ... comp.send(v) # <6> ... except StopIteration as exc: # <7> Have to capture the StopIteration, else the GeneratorExit exception is raised at the yield line in the coroutine, so the return statement is never reached. ... result = exc.value received: 10 received: 20 received: 30 received: <Sentinel> computed: Result(count=3, average=20.0) >>> result # <8> Result(count=3, average=20.0) # end::RETURNING_AVERAGER_DEMO_3[] """ # tag::RETURNING_AVERAGER_TOP[] from collections.abc import Generator from typing import Union, NamedTuple class Result(NamedTuple): # <1> count: int # type: ignore # <2> average: float class Sentinel: # <3> def __repr__(self): return f'<Sentinel>' STOP = Sentinel() # <4> SendType = Union[float, Sentinel] # <5> modern python, write it as SendType: TypeAlias = float | Sentinel, or directly use the =|= union in the generator SendType type param # end::RETURNING_AVERAGER_TOP[] # tag::RETURNING_AVERAGER[] def averager2(verbose: bool = False) -> Generator[None, SendType, Result]: # <1> None data yielded, returns Result type, which is a named tuple (subclass of tuple) total = 0.0 count = 0 average = 0.0 while True: term = yield # <2> this consumes data (when resuming, assigned to variable named "term") if verbose: print('received:', term) if isinstance(term, Sentinel): # <3> if received the Sentinel, break from infinite loop. break total += term # <4> count += 1 average = total / count return Result(count, average) # <5> reachable only if Sentinel is sent to the coroutine # end::RETURNING_AVERAGER[]coroutines can consume data, (based on the SendType type param). So that’s why it makes sense to use yield directly.
Calling
.close()in this coroutine makes it stop but does not return a result, because theGeneratorExitexception is raised at theyieldline in the coroutine, so thereturnstatement is never reached.Instead, we do:
```python
try:
    coro_avg.send(STOP)
except StopIteration as exc:
    result = exc.value
return result
```

The STOP sentinel makes the coroutine break from the loop and return a Result. The generator object that wraps the coroutine then raises StopIteration.
StopIterationhas avalueattribute bound to the value of thereturnstatement that terminated the coroutine.
Generic Type Hints for Classic Coroutines
stick to my existing mental model:
producers can produce more specific types, i.e. can be narrower \(\implies\) they are covariant \(\implies\) they are output types
so this corresponds to the
YieldType and ReturnType:
if float :> int, then Generator[float, Any, float] :> Generator[int, Any, int]
Corresponds to variance rule of thumb 1:
If a formal type parameter defines a type for data that comes out of the object, it can be covariant.
consumers can accept more generic types i.e. they can accept wider \(\implies\) they are contravariant \(\implies\) they are input types
so this corresponds to
SendType:
if float :> int, then Generator[Any, float, Any] <: Generator[Any, int, Any]
Corresponds to variance rule of thumb 2:
If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant.
elaboration on why my mental model is correct:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137Your **mental model about generic type hints for classic coroutines in Python is fundamentally correct** and aligns well with the design principles of variance in type systems and how they apply to the parameters of the `Generator` or `Coroutine` generic types. To unpack this rigorously: *** ### Context: Generic Type Parameters for Classic Coroutines Classic (generator-based) coroutines and generators can be annotated using the generic `Generator` type hint which has three type parameters: $$ Generator[YieldType, SendType, ReturnType] $$ - **YieldType** — the type of values **produced** (yielded) by the generator/coroutine. - **SendType** — the type of values the generator/coroutine **accepts** via `.send()` (like parameters you can pass in). - **ReturnType** — the type of the value returned upon completion (from the `return` statement inside the generator). *** ### Your mental model restated - **Producers produce more specific types (narrower types).** This corresponds to output types, and hence these are **covariant types**. This includes `YieldType` (values yielded) and `ReturnType` (final returned value). - **Consumers accept more generic types (wider types).** These correspond to input types, hence they are **contravariant types**. This corresponds to `SendType` — the values you send **into** the coroutine, which can be more general. *** ### What does this mean in terms of type variance? - **Covariance** means that if $$A$$ is a subtype of $$B$$, then a producer of $$A$$ is also a producer of $$B$$ (safe to substitute). For example, if you expect `YieldType` to be a `Animal`, then yielding `Dog` (subclass of Animal) is safe. - **Contravariance** means that consumers are the opposite — if you can accept a `B`, then you can accept an `A` if $$A$$ is a subtype of $$B$$ (i.e., input types vary contravariantly). The coroutine can accept wider types for sending. **Consequently:** - `YieldType` and `ReturnType` are **covariant** (output positions). - `SendType` is **contravariant** (input position). *** ### Supporting evidence from Python typing docs and references: From the official `typing` module documentation and the typing system used in Python: - The `Generator` and `Coroutine` generic types define variance as follows: ```python from typing import Generator # Generator[YieldType, SendType, ReturnType] # YieldType is covariant # SendType is contravariant # ReturnType is covariant ``` This is also explicitly mentioned in the Python 3.12 typing docs [typing — Support for type hints](https://docs.python.org/3/library/typing.html): > "Note that unlike many other generic classes in the standard library, the `SendType` of `Generator` behaves contravariantly, not covariantly or invariantly." 
***

### Mental model mapping for classic coroutines

| Role | Type Parameter | Type Variance | Reason / Interpretation |
|---|---|---|---|
| Values *output* (yielded) | `YieldType` | Covariant (produced) | Coroutine produces values of this type |
| Value passed *in* to `.send()` | `SendType` | Contravariant (consumed) | Coroutine consumes values of this type from caller |
| Value *returned* upon completion | `ReturnType` | Covariant (produced) | Coroutine returns this type after finishing |

***

### Intuition

- `SendType` is contravariant: a coroutine that accepts a more general (wider) send type can safely stand in for one declared to accept a narrower type, because it can handle everything the caller might send.
- When a coroutine **yields** or **returns** values, it may provide more specific types (subclasses or narrower types) back to the caller.

***

### Example to make it concrete

Suppose you have:

```python
from typing import Generator

def coro() -> Generator[int, str, float]:
    # Yield ints
    val = yield 1
    # Accept strings sent via send
    ...
    return 3.14
```

- `YieldType` = `int`: you produce integers.
- `SendType` = `str`: you accept strings sent to the coroutine.
- `ReturnType` = `float`: the coroutine returns a float on completion.

This fits the mental model that:
- You produce (yield) integers (specific concrete type).
- You consume (accept via `.send()`) strings (input).
- You return a float (specific type).

***

### Summary

Your mental model:
- **Producers = covariant = YieldType, ReturnType**
- **Consumers = contravariant = SendType**

is **correct and aligns with the Python typing system** for classic coroutines and generators.

***

### Supporting references for further deep-dive

- [Python typing module documentation - Annotating generators and coroutines](https://docs.python.org/3/library/typing.html#annotating-generators-and-coroutines)
- Fluent Python by Luciano Ramalho, "Classic Coroutines" chapter, which discusses these typing details around classic coroutines and variance.
- PEP 484 (Type Hints), which discusses variance in generic types.
- Stack Overflow, [The right way to type hint a Coroutine function?](https://stackoverflow.com/questions/73240620/the-right-way-to-type-hint-a-coroutine-function), detailing coroutine generic parameters and variance.
Chapter Summary
- kiv native coroutines will come soon, the
yield fromis now justawaitin the native coroutine syntax.
Further Reading
Chapter 18. with, match, and else Blocks
this chapter is about control flow structures that are especially powerful in python
magic of
withstatements and how the context manager gives safetythe magic of
matchstatements and how that is expressive for languages (including custom DSLs)
What’s New in This Chapter
Context Managers and with Blocks
context managers exist to control a
withstatementanalogous to
forstatements controlled by iteratorsMISCONCEPTIONS:
this is correct: a
finallyblock is always guaranteed to run, even if the try block has areturn,sys.exit()or an exception raised.I just never paid attention to this.
That’s why it’s good for cleanup: resource release / reverting or undoing temporary state changes
withblocks don’t define a new scope like how functions do, that’s why the names are accessible outside of the blocksyntax:
in
with open('mirror.py') as fp:,evaluating the expression after the
withgives the context manager object, i.e.open('mirror.py')the context manager object here is an instance of
TextIOWrapper, this is what theopen()function returns.the
__enter__method ofTextIOWrapperreturnsselfthe target variable is within the
asclause is bound to somethingthe
asclause is optionalthat something is the result returned by the
__enter__method of the context manager object (TextIOWrapper), which we determined wasself(i.e. the context manager instance)
for any reason, when the control flow exits the
withblock, then__exit__is called on the context manager object.This is NOT called on whatever that was returned by
__enter__and stored by the target variable.
example code Custom Context Manager for mirror
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90""" A "mirroring" ``stdout`` context. While active, the context manager reverses text output to ``stdout``:: # tag::MIRROR_DEMO_1[] >>> from mirror import LookingGlass >>> with LookingGlass() as what: # <1> ... print('Alice, Kitty and Snowdrop') # <2> ... print(what) ... pordwonS dna yttiK ,ecilA YKCOWREBBAJ >>> what # <3> 'JABBERWOCKY' >>> print('Back to normal.') # <4> Back to normal. # end::MIRROR_DEMO_1[] This exposes the context manager operation:: # tag::MIRROR_DEMO_2[] >>> from mirror import LookingGlass >>> manager = LookingGlass() # <1> >>> manager # doctest: +ELLIPSIS <mirror.LookingGlass object at 0x...> >>> monster = manager.__enter__() # <2> >>> monster == 'JABBERWOCKY' # <3> eurT >>> monster 'YKCOWREBBAJ' >>> manager # doctest: +ELLIPSIS >... ta tcejbo ssalGgnikooL.rorrim< >>> manager.__exit__(None, None, None) # <4> >>> monster 'JABBERWOCKY' # end::MIRROR_DEMO_2[] The context manager can handle and "swallow" exceptions. # tag::MIRROR_DEMO_3[] >>> from mirror import LookingGlass >>> with LookingGlass(): ... print('Humpty Dumpty') ... x = 1/0 # <1> ... print('END') # <2> ... ytpmuD ytpmuH Please DO NOT divide by zero! >>> with LookingGlass(): ... print('Humpty Dumpty') ... x = no_such_name # <1> ... print('END') # <2> ... Traceback (most recent call last): ... NameError: name 'no_such_name' is not defined # end::MIRROR_DEMO_3[] """ # tag::MIRROR_EX[] import sys class LookingGlass: def __enter__(self): # <1> self.original_write = sys.stdout.write # <2> sys.stdout.write = self.reverse_write # <3> return 'JABBERWOCKY' # <4> def reverse_write(self, text): # <5> self.original_write(text[::-1]) def __exit__(self, exc_type, exc_value, traceback): # <6> sys.stdout.write = self.original_write # <7> if exc_type is ZeroDivisionError: # <8> print('Please DO NOT divide by zero!') return True # <9> # <10> NOTE: if exit returns None or any falsy value, any exception raised in the =with= block will be propagated. # end::MIRROR_EX[]enter and exit:
__enter__is called without any arguments other than the implicitself(which is the context manager instance)__exit__is called with 3 arguments:exc_typeexc_valuethe actual exception instancetraceback
these 3 args received by
selfare the same as what happens if we callsys.exc_info()in thefinallyblock of atry/finally. in the past, calling that was necessary to determine how to do the cleanup.
we can now do parenthesized context managers:
```python
with (
    CtxManager1() as example1,
    CtxManager2() as example2,
    CtxManager3() as example3,
):
    ...
```

thanks to a new parser from Python 3.10 onwards
The contextlib Utilities
- first-reach before writing custom context managers
- things that look useful:
using
@contextmanagerto build a context manager from a generator functionContextDecoratorto define class-based context managersthe async versions of all of them
Using @contextmanager
just implement a generator with a single
yieldstatement that should produce whatever you want the__enter__method to returnthe
yieldsplits the function body into two parts:before
yield: gets executed at the beginning of thewithblock when interpreter calls__enter__after
yield: gets executed when__exit__is called at the end of the block
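A generic sketch of the before/after-yield split (made-up timer example, not the book's):

```python
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(label: str):
    t0 = perf_counter()            # before yield: runs when __enter__ is called
    try:
        yield t0                   # the yielded value is what "as" binds to
    finally:
        print(f'{label}: {perf_counter() - t0:.3f}s')  # after yield: runs when __exit__ is called

with timed('work') as t0:
    sum(range(1_000_000))
```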
correct example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98""" A "mirroring" ``stdout`` context manager. While active, the context manager reverses text output to ``stdout``:: # tag::MIRROR_GEN_DEMO_1[] >>> from mirror_gen import looking_glass >>> with looking_glass() as what: # <1> ... print('Alice, Kitty and Snowdrop') ... print(what) ... pordwonS dna yttiK ,ecilA YKCOWREBBAJ >>> what 'JABBERWOCKY' # end::MIRROR_GEN_DEMO_1[] This exposes the context manager operation:: # tag::MIRROR_GEN_DEMO_2[] >>> from mirror_gen import looking_glass >>> manager = looking_glass() # <1> >>> manager # doctest: +ELLIPSIS <contextlib._GeneratorContextManager object at 0x...> >>> monster = manager.__enter__() # <2> >>> monster == 'JABBERWOCKY' # <3> eurT >>> monster 'YKCOWREBBAJ' >>> manager # doctest: +ELLIPSIS >...x0 ta tcejbo reganaMtxetnoCrotareneG_.biltxetnoc< >>> manager.__exit__(None, None, None) # <4> False >>> monster 'JABBERWOCKY' # end::MIRROR_GEN_DEMO_2[] The context manager can handle and "swallow" exceptions. The following test does not pass under doctest (a ZeroDivisionError is reported by doctest) but passes if executed by hand in the Python 3 console (the exception is handled by the context manager): # tag::MIRROR_GEN_DEMO_3[] >>> from mirror_gen_exc import looking_glass >>> with looking_glass(): ... print('Humpty Dumpty') ... x = 1/0 # <1> ... print('END') # <2> ... ytpmuD ytpmuH Please DO NOT divide by zero! # end::MIRROR_GEN_DEMO_3[] >>> with looking_glass(): ... print('Humpty Dumpty') ... x = no_such_name # <1> ... print('END') # <2> ... Traceback (most recent call last): ... NameError: name 'no_such_name' is not defined """ # tag::MIRROR_GEN_EXC[] import contextlib import sys @contextlib.contextmanager def looking_glass(): original_write = sys.stdout.write def reverse_write(text): original_write(text[::-1]) sys.stdout.write = reverse_write msg = '' # <1> try: yield 'JABBERWOCKY' except ZeroDivisionError: # <2> msg = 'Please DO NOT divide by zero!' finally: sys.stdout.write = original_write # <3> if msg: print(msg) # <4> # end::MIRROR_GEN_EXC[]it’s unavoiadable to use the
try/exceptwhen using theyieldwhen using@contextmanagersince we never know what the users of the context managers will do.GOTCHA:
Generally,
if
__exit__returns truthy even if there’s an exception, then the exception is suppressed. If it’s not truthy, then the exception is propagated outHOWEVER, with
@contextmanager, the default behaviour is inverted. the__exit__method provided by the decorator assumes any exception sent into the generator is handled and should be suppressed.
- (flawed) example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79""" A "mirroring" ``stdout`` context manager. While active, the context manager reverses text output to ``stdout``:: # tag::MIRROR_GEN_DEMO_1[] >>> from mirror_gen import looking_glass >>> with looking_glass() as what: # <1> ... print('Alice, Kitty and Snowdrop') ... print(what) ... pordwonS dna yttiK ,ecilA YKCOWREBBAJ >>> what 'JABBERWOCKY' >>> print('back to normal') back to normal # end::MIRROR_GEN_DEMO_1[] This exposes the context manager operation:: # tag::MIRROR_GEN_DEMO_2[] >>> from mirror_gen import looking_glass >>> manager = looking_glass() # <1> >>> manager # doctest: +ELLIPSIS <contextlib._GeneratorContextManager object at 0x...> >>> monster = manager.__enter__() # <2> >>> monster == 'JABBERWOCKY' # <3> eurT >>> monster 'YKCOWREBBAJ' >>> manager # doctest: +ELLIPSIS >...x0 ta tcejbo reganaMtxetnoCrotareneG_.biltxetnoc< >>> manager.__exit__(None, None, None) # <4> False >>> monster 'JABBERWOCKY' # end::MIRROR_GEN_DEMO_2[] The decorated generator also works as a decorator: # tag::MIRROR_GEN_DECO[] >>> @looking_glass() ... def verse(): ... print('The time has come') ... >>> verse() # <1> emoc sah emit ehT >>> print('back to normal') # <2> back to normal # end::MIRROR_GEN_DECO[] """ # tag::MIRROR_GEN_EX[] import contextlib import sys @contextlib.contextmanager # <1> def looking_glass(): original_write = sys.stdout.write # <2> def reverse_write(text): # <3> original_write(text[::-1]) sys.stdout.write = reverse_write # <4> yield 'JABBERWOCKY' # <5> sys.stdout.write = original_write # <6> # end::MIRROR_GEN_EX[]
this is flawed because if an exception is raised in the body of the
withblock, the Python interpreter will catch it and raise it again in theyieldexpression insidelooking_glass. But there is no error handling there, so the looking_glass generator will terminate without ever restoring the originalsys.stdout.writemethod, leaving the system in an invalid state.Cleanup not done if there’s an exception raised within the
withblock.TRICK: generators decorated with it can also be used as decorators themselves.
happens because
@contextmanager is implemented with the contextlib.ContextDecorator class.

```python
@looking_glass()
def verse():
    print("the time has come")

verse()  # returns in reverse
```

Here,
looking_glassdoes its job before and after the body of verse runs.
Pattern Matching in lis.py: A Case Study
- Scheme Syntax
- Imports and Types
- The Parser
- The Environment
- The REPL
- The Evaluator
- Procedure: A Class Implementing a Closure
- Using OR-patterns
Do This, Then That: else Blocks Beyond if
use cases: avoids the need to setup extra control flags or coding extra
ifstatementsraising pattern:
```python
for item in my_list:
    if item.flavor == 'banana':
        break
else:
    raise ValueError('No banana flavor found!')
```
try/exceptThe body of the
tryblock should only have the statements that generate the expected exceptions.Instead of doing this:
```python
try:
    dangerous_call()
    after_call()
except OSError:
    log('OSError...')
```

We should do:
```python
try:
    dangerous_call()
except OSError:
    log('OSError...')
else:
    after_call()
```
This gives clarity.
tryblock is guarding against possible errors indangerous_call()and not inafter_call(). It’s also explicit thatafter_call()will only execute if no exceptions are raised in thetryblock.IDIOM:
try/exceptis NOT only for error handling, it can be used for control flow as well. E.g. duck typing type checks.Follows EAFP
EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
this contrasts LBYL:
LBYL
Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements. In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping.” For example, the code, if key in mapping: return mapping[key] can fail if another thread removes key from mapping after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.
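A tiny side-by-side to remember the two styles (made-up record dict):

```python
record = {'name': 'Ada'}

# LBYL: test first; racy if another thread could delete the key in between
if 'name' in record:
    name = record['name']
else:
    name = 'unknown'

# EAFP: just try it and handle the failure
try:
    name = record['name']
except KeyError:
    name = 'unknown'
```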
else blocks apply to most control flow constructs; they are closely related to each other but very different from if/else. LANGUAGE_LIMITATION: GOTCHA: the
else keyword is more of a “run this loop, then do that” instead of “run this loop, otherwise do that” (tiny demo after this list)
- for: the else block will run only if and when the for loop runs to completion (i.e., not if the for is aborted with a break).
- while: the else block will run only if and when the while loop exits because the condition became falsy (i.e., not if the while is aborted with a break).
- try: the else block will run only if no exception is raised in the try block.
NOTE: “Exceptions in the else clause are not handled by the preceding except clauses.”
the
elseclause is also skipped if an exception or areturn,break, orcontinuestatement causes control to jump out of the main block of the compound statement.
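A tiny demo of the for/else and while/else behaviour (not from the book):

```python
for n in [1, 3, 5]:
    if n % 2 == 0:
        break
else:
    print('no even number found')   # printed: the loop ran to completion

while True:
    break
else:
    print('never printed')          # skipped: the loop was aborted with break
```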
Chapter Summary
an insight:
subroutines are the most important invention in the history of computer languages. If you have sequences of operations like A;B;C and P;B;Q, you can factor out B in a subroutine. It’s like factoring out the filling in a sandwich: using tuna with different breads. But what if you want to factor out the bread, to make sandwiches with wheat bread, using a different filling each time? That’s what the with statement offers. It’s the complement of the subroutine.
Further Reading
Chapter 19. Concurrency Models in Python
concurrency vs parallelism; informally speaking
concurrency: dealing with multiple things done at once \(\implies\) it’s about structure of a solution
the structure provided by concurrent solutions may help solve a problem (though not necessarily) in a parallelized fashion.
parallelism: doing lots of things at once \(\implies\) execution of the solution
in this informal view, it’s a special case of concurrency, so parallel \(\implies\) concurrent
Python’s three approaches to concurrency: threads, processes, and native coroutines.
python’s fitness for concurrent and parallel computing is not limited to what the std lib provides. Python can scale.
What’s New in This Chapter
The Big Picture
factor of difficulty when writing concurrent programs: starting threads or processes is easy enough, but how do you keep track of them?
in non-concurrent programs, a function call blocks until it returns a result (or raises), so the caller gets results or errors directly
in concurrent programs, the call that starts a thread/process/coroutine does not block, so we need to rely on some form of communication to get back results or errors
starting a thread is not cheap \(\implies\) amortize costs by using “worker” threads/procs \(\implies\) coordinating them is tough e.g. how to terminate?
resolved using messages and queues still
coroutines are useful:
- cheap to start
- returns values
- can be safely cancelled
- specific area to catch exceptions
But they have problems:
they’re handled by the async framework \(\implies\) hard to monitor as threads / procs
not good for CPU-intensive tasks
A Bit of Jargon
Concurrency: ability to handle multiple pending tasks (each eventually succeeding or failing) \(\implies\) can multitask
Parallelism: ability to compute multiple computations at the same time \(\implies\) multicore CPU, multiple CPU, GPU, multiple computers in a cluster
Execution Unit: objects executing concurrent code. Each has independent state and call stack
Python execution units:
processes
definition:
an instance of a computer program while it’s running, using memory and CPU time-slices; each process has its own private memory space
communication:
objects communicated as raw bytes (so must be serialised) to pass from one proc to another. Communicated via pipes, sockets or memory-mapped files
spawning:
can spawn child procs which are all isolated from the parent
scheduling:
can be pre-emptively scheduled by the OS, the idea being that a frozen proc won’t freeze the whole system
threads
definition:
execution unit within a single process
consumes less resources than a process (if they both did the same job)
lifecycle:
@ start of process, there’s a single thread. Procs can create more threads by calling OS APIs
Shared Memory management:
Threads within a process share the same memory space \(\implies\) holds live Python object. Shared memory may be corrupted via read/write race conditions
Supervision:
Also supervised by OS Scheduler, threads can enable pre-emptive multitasking
coroutines
Definition:
A function that can suspend itself and resume later.
Classic Coroutines: built from generator functions
Native Coroutines: defined using
async defSupervising coroutines:
Typically, coroutines run within a single thread, supervised by an event loop that is in the same thread.
Async frameworks provide an event loop and supporting libs that support nonblocking, coroutine-based I/O
Scheduling & Cooperative Multitasking:
each coroutine must explicitly cede control with the
yieldorawaitkeyword, so that another may proceed concurrently (but not in parallel).so if there’s any blocking code in a coroutine block, it would block the execution of the event loop and hence all other coroutines
this contrasts preemptive multitasking supported by procs and threads.
nevertheless, coroutine consumes less resources than a thread or proc doing the same job
Mechanisms useful to us:
Queue:
purpose:
allow separate execution units to exchange application data and control messages, such as error codes and signals to terminate.
implementation:
depends on concurrency model:
python stdlib
queuegives queue classes to support threadsthis also provides non-FIFO queues like
LifoQueueandPriorityQueuemultiprocessing,asynciopackages have their own queue classesasyncioalso provides non-FIFO queues likeLifoQueueandPriorityQueue
Lock:
purpose:
Sync mechanism object for execution units to sync actions and avoid data corruption
While updating a shared data structure, the running code should hold an associated lock.
implementation:
depends on the concurrency model
simplest form of a lock is just a mutex
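A minimal sketch of “hold the lock while updating shared state” (made-up counter):

```python
import threading

counter = 0
lock = threading.Lock()

def worker() -> None:
    global counter
    for _ in range(100_000):
        with lock:              # hold the lock while touching the shared counter
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```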
Contention: dispute over a limited asset
Resource Contention
When multiple execution units try to access a shared resource (e.g. a lock / storage)
CPU Contention
Compute-intensive procs / threads must wait for the OS scheduler to give them a share of CPU time
Processes, Threads, and Python’s Infamous GIL
Here’s 10 points that consolidate info about python’s concurrency support:
Instance of python interpreter \(\implies\) a process
We can create additional Python processes \(\leftarrow\) use
multiprocessing/concurrent.futureslibrariesWe can also start sub-processes that run any other external programs. \(\leftarrow\) using
subprocesslibraryInterpreter runs user program and the GC in a single thread. We can start additional threads using
threading/concurrent.futureslibraries.GIL (Global Interpreter Lock) controls internal interpreter state (process state shared across threads) and access to object ref counts.
Only one python thread can hold the GIL at any time \(\implies\) only one thread can execute Python code at any time, regardless the number of CPU cores.
GIL is NOT part of the python language definition, it’s a CPython Implementation detail. This is critical for portability reasons.
Default release of the GIL @ an interval:
Prevents any particular thread from holding the GIL indefinitely.
It’s the bytecode interpreter that pauses the current thread every 5 ms by default (can be changed; see the snippet after this list) and the OS Scheduler picks which thread gets access to the GIL next (could be the same thread that just released the GIL).
Python source code can’t control the GIL but extension / builtin written in C (or lang that interfaces at the Python/C API level) can release the GIL when it’s running time-consuming tasks.
Every python stdlib that does a syscall (for kernel services) will release the GIL. This avoids contention of resources (mem as well as CPU)
functions that perform I/O operations (disk, network, sleep)
functions that are CPU-intensive (e.g.
NumPy/SciPy), compressing/decompressing functions (e.g.zlib,bz2)
GIL-free threads:
can only be launched by extensions that integrate at the Python/C API level
can’t change python objects generally, but can R/W to memory objects that support buffer protocols (
bytearray,array.array,NumPyarrays)GIL-free python is under experimentation at the moment (but not mainstream)
Network I/O is GIL-insensitive
GIL minimally affects network programming because Network I/O is higher latency than memory I/O.
Each individual thread would have spent a long time waiting anyway, so interleaving their execution doesn’t majorly impact the overall throughput.
Compute-intensive python threads \(\implies\) will be slowed down by GIL contention.
Better to use sequential, single-threaded code here. Faster and simpler.
CPU-intensive Python code to be run on multiple cores requires multiple Python processes.
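A quick way to inspect (and tune) the 5 ms switch interval mentioned above; a sketch, not from the book:

```python
import sys

print(sys.getswitchinterval())   # 0.005 by default: the 5 ms interval
sys.setswitchinterval(0.01)      # ask the interpreter to offer thread switches less often
```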
Extra Notes:
Coroutines are not affected by the GIL
by default they share the same Python thread among themselves and with the supervising event loop provided by an asynchronous framework, therefore the GIL does not affect them.
We technically can use multiple threads in an async program. This is not best practice.
Typically, we have one coordinating thread running the event loops, which delegates to additional threads that carry out specific tasks.
KIV “delegating tasks to executors”
A Concurrent Hello World
- a demo of how python can “walk and chew gum”, using multiple approaches:
multiprocessing,threading,asyncio
Spinner with Threads
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44# spinner_thread.py # credits: Adapted from Michele Simionato's # multiprocessing example in the python-list: # https://mail.python.org/pipermail/python-list/2009-February/675659.html # tag::SPINNER_THREAD_TOP[] import itertools import time from threading import Thread, Event def spin(msg: str, done: Event) -> None: # <1> this fn runs in a separate thread, Event instance is for syncing of threads for char in itertools.cycle(r'\|/-'): # <2> infinite loop because infinite iterator status = f'\r{char} {msg}' # <3> the carriage return TRICK print(status, end='', flush=True) if done.wait(.1): # <4> ??? the timeout value sets the "framerate" of the animation (0.1s => 10FPS) break # <5> break inf loop blanks = ' ' * len(status) print(f'\r{blanks}\r', end='') # <6> clears the status line def slow() -> int: # called by the main thread time.sleep(3) # <7> this is a blocking syscall, so GIL is released, which will allow other threads to be executed return 42 # end::SPINNER_THREAD_TOP[] # tag::SPINNER_THREAD_REST[] def supervisor() -> int: # <1> eventually returns the result of =slow= done = Event() # <2> to coordinate =main= and =spinner= thread spinner = Thread(target=spin, args=('thinking!', done)) # <3> spawn thread print(f'spinner object: {spinner}') # <4> displays as <Thread(Thread-1, initial)> ; initial means the thread not started yet spinner.start() # <5> result = slow() # <6> call slow, blocks the =main= thread, while the secondary =spinner= thread still runs the animation done.set() # <7> signals spin function to exit, terminates the fot loop inside the spin function spinner.join() # <8> wait until spinner finishes (fork-join!) return result def main() -> None: result = supervisor() # <9> just a didatic purpose, to make it similar to the asyncio version print(f'Answer: {result}') if __name__ == '__main__': main() # end::SPINNER_THREAD_REST[]Notes:
within
slow(),time.sleepblocks the calling thread but releases the GIL, so other Python threads (in this case our secondary thread for spinner) can run.spinandslowexecuted concurrently, the supervisor coordinates the threads using an instance ofthreading.Eventcreating threads:
create a new
Thread, provide a function as the target keyword argument, and positional arguments to the target as a tuple passed via argsspinner = Thread(target=spin, args=('thinking!', done)) # <3> spawn threadwe can also pass in kwargs using
kwargsnamed parameter toThreadconstructor
threading.Event:Python’s simplest signalling mechanism to coordinate threads.
Event instance has an internal boolean flag that starts as
False. CallingEvent.set()sets the flag toTrue.- when flag is
False(unset):if a thread calls
Event.wait(), the thread is blocked until another thread callsEvent.set(). When this happens,Event.wait()returnsTrueIf timeout is provided
Event.wait(s), the call returnsFalsewhen timeout elapses.As soon as another thread calls
Event.set()then the wait function will returnTrue.
- when flag is
TRICK: for text-mode animation: move the cursor back to the start of the line with the carriage return ASCII control character (
'\r').
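A minimal sketch of the Event set/wait handshake described above (made-up worker):

```python
import threading
import time

done = threading.Event()

def worker() -> None:
    time.sleep(0.3)
    done.set()              # flips the internal flag to True

threading.Thread(target=worker).start()
print(done.wait(0.1))       # False: the 0.1 s timeout elapsed before set()
print(done.wait())          # True: blocks until the worker calls set()
```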
Spinner with Processes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46# spinner_proc.py # credits: Adapted from Michele Simionato's # multiprocessing example in the python-list: # https://mail.python.org/pipermail/python-list/2009-February/675659.html # tag::SPINNER_PROC_IMPORTS[] import itertools import time from multiprocessing import Process, Event # <1> from multiprocessing import synchronize # <2> this import supports the type hinting for the Event def spin(msg: str, done: synchronize.Event) -> None: # <3> attention to the typehint # end::SPINNER_PROC_IMPORTS[] for char in itertools.cycle(r'\|/-'): status = f'\r{char} {msg}' print(status, end='', flush=True) if done.wait(.1): break blanks = ' ' * len(status) print(f'\r{blanks}\r', end='') def slow() -> int: time.sleep(3) return 42 # tag::SPINNER_PROC_SUPER[] def supervisor() -> int: done = Event() spinner = Process(target=spin, # <4> args=('thinking!', done)) print(f'spinner object: {spinner}') # <5> displays <Process name='Process-1' parent=14868 initial> so it tells you the PID and the initial state. spinner.start() result = slow() done.set() spinner.join() return result # end::SPINNER_PROC_SUPER[] def main() -> None: result = supervisor() print(f'Answer: {result}') if __name__ == '__main__': main()multiprocessingpackage supports running concurrent tasks in separate Python processes instead of threads.each instance has its own python interpreter, procs will be working in the background.
Each proc has its own GIL \(\implies\) we can exploit our multicore CPU well because of this (depends on the OS scheduler though)
multiprocessingAPI emulates thethreadingAPI \(\implies\) can easily convert between them.Comparing
multiprocessing and threading APIs. Similarities:
Event objects are similar in how they function with the bit setting / unsetting
Event objects can wait on timeouts
differences:
Event is of different type between them
multiprocessing.Eventis a function (not a class likethreading.Event)multiprocessinghas a larger API because it’s more complexe.g. python objects that would need to be communicated across process need to be serialized/deserialized because it’s an OS-level isolation (of processes). This adds overhead.
the
Event state is the only cross-process state being shared; it’s implemented via an OS semaphore. Memory sharing can be done via
multiprocessing.shared_memory. Only raw bytes, can use aShareableList(mutable sequence) with a fixed number of items of some primitives up to 10MB per item.
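A small sketch of ShareableList (the values here are made up):

```python
from multiprocessing import shared_memory

sl = shared_memory.ShareableList(['spam', 42, 3.14, None, True])
print(sl[1])          # 42
print(sl.shm.name)    # the name another process would use to attach to this block

sl.shm.close()
sl.shm.unlink()       # free the shared memory block
```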
Spinner with Coroutines
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41# spinner_async.py # credits: Example by Luciano Ramalho inspired by # Michele Simionato's multiprocessing example in the python-list: # https://mail.python.org/pipermail/python-list/2009-February/675659.html # tag::SPINNER_ASYNC_TOP[] import asyncio import itertools async def spin(msg: str) -> None: # <1> no need Event as a coordinating mechanism for char in itertools.cycle(r'\|/-'): status = f'\r{char} {msg}' print(status, flush=True, end='') try: await asyncio.sleep(.1) # <2> this is a non-blocking pause except asyncio.CancelledError: # <3> when the cancelled method is called on the task that is controlling this coroutine break blanks = ' ' * len(status) print(f'\r{blanks}\r', end='') async def slow() -> int: await asyncio.sleep(3) # <4> also uses the non blocking sleep return 42 # end::SPINNER_ASYNC_TOP[] # tag::SPINNER_ASYNC_START[] def main() -> None: # <1> only regular function here, rest are coroutines result = asyncio.run(supervisor()) # <2> supervisor coroutine will block the main function print(f'Answer: {result}') async def supervisor() -> int: # <3> native coroutine definition spinner = asyncio.create_task(spin('thinking!')) # <4> returns a Task, schedules the eventual execution of spin print(f'spinner object: {spinner}') # <5> <Task pending name='Task-2' coro=<spin() running at /path/to/spinner_async.py:11>> result = await slow() # <6> =await= calls slow, blocks =supervisor= until =slow= returns spinner.cancel() # <7> Task.cancel() raises =CancelledError= inside the coro task return result if __name__ == '__main__': main() # end::SPINNER_ASYNC_START[]who manages the event loop?
for threads and processes, it’s the OS Scheduler
for coroutines, it’s app-level event loop
drives coroutines one by one, manages queue of pending coroutines, passes control back to corresponding coroutine when each event happens
all of these execute in a single thread: event loop, library coroutines, user coroutines
that’s why any blocking call inside a coroutine blocks the whole event loop
Concurrency is achieved by control passing from one coroutine to another.
Python code using
asynciohas only one flow of execution, unless you’ve explicitly started additional threads or processes.means only one coroutine executes at any point in time.
Concurrency is achieved by control passing from one coroutine to another. This happens when we use the
awaitkeyword.Remember when using
asynciocoroutines, if we ever need some time for NOOPs, to use non-blocking sleep (asyncio.sleep(DELAY)) instead of blocking sleep (time.sleep())explaining the example
asyncio.runstarts the event loop, drives the coroutine (supervisor) that sets other coroutines in motion.supervisorwill block themainfunction until it’s doneasyncio.runreturns whatsupervisorreturnsawaitcallsslow, blockssupervisoruntilslowreturnsI think it’s easier to see it as a control flow handover to slow. That’s why it’s blocking and that’s why when the control flow returns, we carry on with the assignment operator.
Task.cancel()raisesCancelledErrorinside the coro task
NOTE: if we directly invoke a coro like
coro()it immediately returns (because it’s async) but doesn’t return the body of thecorofunctionthe
coroneeds to be driven by an event loop.We see 3 ways to run a coro (driven by an event loop):
asyncio.run(coro())a regular function will call this
usually the first coro is the entry point, that supervisor
return value of
runis whatever the body ofcororeturns
asyncio.create_task(coro())called from a coroutine, returns a
Taskinstance.Taskwraps the coro and provides methods to control and query its state.schedules another coroutine to be eventually run
does not suspend current coroutine
await coro()- transfers control from current coro to coro returned by
coro() - suspends the current coro until the other coro returns
- value of
awaitexpression is whatever the body of thecororeturns
- transfers control from current coro to coro returned by
Supervisors Side-by-Side
asyncio.Task vs threading.Thread (roughly equivalent): Task drives a coroutine object, Thread invokes a callable. Yielding control: coroutine yields explicitly with
awaitwe don’t instantiate
Taskobjects ourselves , we get them by usingasyncio.create_task()explicit scheduling:
create_taskgives aTaskobject that is already waiting to run,Threadinstance must be explicitly told to run viastart
Termination:
threads can’t be terminated from the outside, we can only pass in a signal (eg. setting
doneinEvent)tasks
Task.cancel()can be cancelled from the outside, raisesCancelledErrorat the await expression where the coro body is currently suspendedthis can happen because coros are always in-sync because only one of them is running at any time, that’s why the outside can come and cancel it vs outside suggesting to terminate via a signal.
Instead of holding locks to synchronize the operations of multiple threads, coroutines are “synchronized” by definition: only one of them is running at any time.
coroutines, code is protected against interruption by default because we’re in charge of driving the event loop
The Real Impact of the GIL
Quick Quiz
the main question here is that are the mechanisms interruptable by the entity that coordinates the control flow.
processes are controlled by OS scheduler so this is interruptable \(\implies\) the
multiprocessingversion will still carry on as usualthreads are controlled by the OS scheduler as well and the GIL lock can be released at a default interval, so this is useful to us \(\implies\) the threading approach will not have a noticeable difference.
this has negligible effect only because the number of threads were minimal (2). If any more, it may be visible.
the asyncio coroutine version will be blocked by this compute-intensive call.
we can try doing this hack though: make the
is_primea coroutine andawait asyncio.sleep(0)to yield control flow.This is slow though
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59# spinner_prime_async_nap.py # credits: Example by Luciano Ramalho inspired by # Michele Simionato's multiprocessing example in the python-list: # https://mail.python.org/pipermail/python-list/2009-February/675659.html import asyncio import itertools import math import functools # tag::PRIME_NAP[] async def is_prime(n): if n < 2: return False if n == 2: return True if n % 2 == 0: return False root = math.isqrt(n) for i in range(3, root + 1, 2): if n % i == 0: return False if i % 100_000 == 1: await asyncio.sleep(0) # <1> return True # end::PRIME_NAP[] async def spin(msg: str) -> None: for char in itertools.cycle(r'\|/-'): status = f'\r{char} {msg}' print(status, flush=True, end='') try: await asyncio.sleep(.1) except asyncio.CancelledError: break blanks = ' ' * len(status) print(f'\r{blanks}\r', end='') async def check(n: int) -> int: return await is_prime(n) async def supervisor(n: int) -> int: spinner = asyncio.create_task(spin('thinking!')) print('spinner object:', spinner) result = await check(n) spinner.cancel() return result def main() -> None: n = 5_000_111_000_222_021 result = asyncio.run(supervisor(n)) msg = 'is' if result else 'is not' print(f'{n:,} {msg} prime') if __name__ == '__main__': main()Using await
asyncio.sleep(0)should be considered a stopgap measure before you refactor your asynchronous code to delegate CPU-intensive computations to another process.
A Homegrown Process Pool
Process-Based Solution
- starts a number of worker processes equal to the number of CPU cores, as determined by
multiprocessing.cpu_count() - some overhead in spinning up processes and in inter-process communication
- starts a number of worker processes equal to the number of CPU cores, as determined by
- Understanding the Elapsed Times
Code for the Multicore Prime Checker
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78#!/usr/bin/env python3 """ procs.py: shows that multiprocessing on a multicore machine can be faster than sequential code for CPU-intensive work. """ # tag::PRIMES_PROC_TOP[] import sys from time import perf_counter from typing import NamedTuple from multiprocessing import Process, SimpleQueue, cpu_count # <1> use the SimpleQueue to build a queue from multiprocessing import queues # <2> use queues.SimpleQueue for typing from primes import is_prime, NUMBERS class PrimeResult(NamedTuple): # <3> n: int prime: bool elapsed: float JobQueue = queues.SimpleQueue[int] # <4> TypeAlias for a simple queue to send numbers to the procs that will do the job ResultQueue = queues.SimpleQueue[PrimeResult] # <5> TypeAlias for simple queue for building the results def check(n: int) -> PrimeResult: # <6> t0 = perf_counter() res = is_prime(n) return PrimeResult(n, res, perf_counter() - t0) def worker(jobs: JobQueue, results: ResultQueue) -> None: # <7> gets a queue to read from (jobs) and a queue to write to (results) while n := jobs.get(): # <8> uses n = 0 as the poison pill results.put(check(n)) # <9> primality check then enqueue the result results.put(PrimeResult(0, False, 0.0)) # <10> signals that the worker is done def start_jobs( procs: int, jobs: JobQueue, results: ResultQueue # <11> procs: number of parallel prime checks ) -> None: for n in NUMBERS: jobs.put(n) # <12> enqueue number to be checked for _ in range(procs): proc = Process(target=worker, args=(jobs, results)) # <13> Fork a child proc for each worker, runs until it fetches a 0 from jobs queue proc.start() # <14> starting the child proc jobs.put(0) # <15> poision pill it after starting, will be read after all the actual jobs get read # end::PRIMES_PROC_TOP[] # tag::PRIMES_PROC_MAIN[] def main() -> None: if len(sys.argv) < 2: # <1> procs = cpu_count() else: procs = int(sys.argv[1]) print(f'Checking {len(NUMBERS)} numbers with {procs} processes:') t0 = perf_counter() jobs: JobQueue = SimpleQueue() # <2> results: ResultQueue = SimpleQueue() start_jobs(procs, jobs, results) # <3> starts the workers checked = report(procs, results) # <4> elapsed = perf_counter() - t0 print(f'{checked} checks in {elapsed:.2f}s') # <5> def report(procs: int, results: ResultQueue) -> int: # <6> checked = 0 procs_done = 0 while procs_done < procs: # <7> n, prime, elapsed = results.get() # <8> if n == 0: # <9> procs_done += 1 else: checked += 1 # <10> label = 'P' if prime else ' ' print(f'{n:16} {label} {elapsed:9.6f}s') return checked if __name__ == '__main__': main() # end::PRIMES_PROC_MAIN[]when delegating computing to threads / procs, code doesn’t call the worker function directly
the worker is driven by the thread or proc library
the worker eventually produces a result that is stored somewhere
worker coordination & result collection are common uses of queues in concurrent programming
IDIOM: loops, sentinels and poison pills:
the worker function is useful for showing a common concurrent programming pattern: we loop indefinitely, taking items from a queue and processing each one with a function that does the actual work (check)
we end the loop when the queue produces a sentinel value
the sentinel value that shuts down a worker is often called a poison pill
TRICK/IDIOM: poison pilling to signal the worker to finish
notice the use of the poison-pill in point 8 of the code above
common sentinels: (here’s a comment thread on sentinels)
None, but it may not work if the data stream may legitimately produce None
object() is a common sentinel, but Python objects must be serialised for IPC, so when we pickle.dump and pickle.load an object, the unpickled instance is distinct from the original and doesn't compare equal. ⭐️
... (the Ellipsis builtin) is a good option; it survives serialisation without losing its identity.
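A quick sketch (mine, not the book's) to check the pickling behaviour of both sentinels:

```python
import pickle

sentinel = object()
clone = pickle.loads(pickle.dumps(sentinel))
print(clone is sentinel, clone == sentinel)      # False False: a plain object() loses identity across the pickle round trip

print(pickle.loads(pickle.dumps(...)) is ...)    # True: Ellipsis is restored as the same singleton
```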
Debugging concurrent code is always hard, and debugging multiprocessing is even harder because of all the complexity behind the thread-like façade.
Experimenting with More or Fewer Processes
- typically, beyond the number of cores available to us, we should expect runtime to increase because of CPU contention
Thread-Based Nonsolution
Due to the GIL and the compute-intensive nature of is_prime, the threaded version is slower than the sequential code
it gets slower as the number of threads increases, because of CPU contention and the cost of context switching.
OS contention: all the stack-frame changes required for context switching are what cause the extra overhead
KIV: managing threads and processes using concurrent.futures (Chapter 20) and doing async programming using asyncio (Chapter 21)
Python in the Multicore World
GIL makes the interpreter faster when running on a single core, and its implementation simpler. It was a no-brainer when CPU performance didn’t hinge on concurrency.
Despite the GIL, Python is thriving in applications that require concurrent or parallel execution, thanks to libraries and software architectures that work around the limitations of CPython.
System Administration
use cases: manage hardware like NAS, use it for SDN (software defined networking), hacking
python scripts help with these tasks, commanding remote machines \(\implies\) aren’t really CPU bound operations \(\implies\) Threads & Coroutines are Good for this
we can use concurrent.futures to perform the same operation on multiple remote machines at the same time without much complexity
Data Science
- compute-intensive applications, supported by an ecosystem of libs that can leverage multicore machines, GPUs, and distributed parallel computing in heterogeneous clusters
- some libs:
- project jupyter
- tensorflow (Google) and pytorch (Facebook)
- dask: parallel computing lib to coordinate work on clusters
Server-Side Web/Mobile Development
- both for app caches and HTTP caches (CDNs)
WSGI Application Servers
WSGI is a standard API for a Python framework or application to receive requests from an HTTP server and send responses to it.
WSGI application servers manage one or more processes running your application, maximising the use of available CPUs
main point: all of these application servers can potentially use all CPU cores on the server by forking multiple Python processes to run traditional web apps written in good old sequential code in Django, Flask, Pyramid, etc. This explains why it’s been possible to earn a living as a Python web developer without ever studying the threading, multiprocessing, or asyncio modules: the application server handles concurrency transparently.
Distributed Task Queues
Distributed Task Queues wrap a message queue and offer a high-level API for delegating tasks to workers, possibly running on different machines.
use cases:
run background jobs
trigger jobs after responding to the web request
async retries to ensure something is done
scheduled jobs
e.g. Django view handler produces job requests which are put in the queue to be consumed by one or more PDF rendering processes
Supports horizontal scalability
producers and consumers are decoupled
I’ve used Celery before!!
Chapter Summary
the demo on the effect of the GIL
demonstrated graphically that CPU-intensive functions must be avoided in asyncio, as they block the event loop.
the prime demo highlighted the difference between multiprocessing and threading, proving that only processes allow Python to benefit from multicore CPUs.
GIL makes threads worse than sequential code for heavy computations.
Further Reading
Concurrency with Threads and Processes
this was the introduction of the multiprocessing library via a PEP, one of the longer PEPs written
divide-and-conquer approaches for splitting jobs on clusters vs. server-side systems, where it's simpler and more efficient to let each process work on one computation from start to finish, reducing the overhead from IPC
this will likely be a useful read for high performance python
- The GIL
Concurrency Beyond the Standard Library
- Concurrency and Scalability Beyond Python
Chapter 20. Concurrent Executors
concurrent.futures.Executor classes encapsulate the pattern of “spawning a bunch of independent threads and collecting the results in a queue,” described by Michele Simionato
can be used with threads as well as processes
introduces futures, similar to JS promises; futures are the low-level objects here
this chapter is more demo, less theory
What’s New in This Chapter
Concurrent Web Downloads
the concurrent scripts are about 5x faster
typically when well done, concurrent scripts can outpace sequential ones by a factor of 20x or more
TRICK: I didn’t know that the
HTTPXlibrary is more modern and the go-to vsrequestslib. HTTPX gives both async and sync functions but requests will only give sync versions.for server-side, servers that may be hit by many clients, there is a difference between what concurrency primitive we use (threading vs coroutines):
coroutines scale better because they use much less memory than threads, and also reduce the cost of context switching
A Sequential Download Script
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58#!/usr/bin/env python3 """Download flags of top 20 countries by population Sequential version Sample runs (first with new domain, so no caching ever):: $ ./flags.py BD BR CD CN DE EG ET FR ID IN IR JP MX NG PH PK RU TR US VN 20 downloads in 26.21s $ ./flags.py BD BR CD CN DE EG ET FR ID IN IR JP MX NG PH PK RU TR US VN 20 downloads in 14.57s """ # tag::FLAGS_PY[] import time from pathlib import Path from typing import Callable import httpx # <1> non stdlib import, conventionally comes after stdlib imports POP20_CC = ('CN IN US ID BR PK NG BD RU JP ' 'MX PH VN ET EG DE IR TR CD FR').split() # <2> BASE_URL = 'https://www.fluentpython.com/data/flags' # <3> DEST_DIR = Path('downloaded') # <4> def save_flag(img: bytes, filename: str) -> None: # <5> saving bytes to file (DEST_DIR / filename).write_bytes(img) def get_flag(cc: str) -> bytes: # <6> downloads the thing, returns byte contents of the response url = f'{BASE_URL}/{cc}/{cc}.gif'.lower() resp = httpx.get(url, timeout=6.1, # <7> good to have timeouts if we are making blocking calls like in this demo follow_redirects=True) # <8> resp.raise_for_status() # <9> prevents silent failures because of non 2XX responses return resp.content def download_many(cc_list: list[str]) -> int: # <10> sequential version, to be compared across the other examples for cc in sorted(cc_list): # <11> to observe that the order will be preserved image = get_flag(cc) save_flag(image, f'{cc}.gif') print(cc, end=' ', flush=True) # <12> the flush is to flush the print buffer return len(cc_list) def main(downloader: Callable[[list[str]], int]) -> None: # <13> allows the downloader to be injectable, to be used for the other examples DEST_DIR.mkdir(exist_ok=True) # <14> create dir if necessary t0 = time.perf_counter() # <15> count = downloader(POP20_CC) elapsed = time.perf_counter() - t0 print(f'\n{count} downloads in {elapsed:.2f}s') if __name__ == '__main__': main(download_many) # <16> # end::FLAGS_PY[]
Downloading with concurrent.futures
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34#!/usr/bin/env python3 """Download flags of top 20 countries by population ThreadPoolExecutor version Sample run:: $ python3 flags_threadpool.py DE FR BD CN EG RU IN TR VN ID JP BR NG MX PK ET PH CD US IR 20 downloads in 0.35s """ # tag::FLAGS_THREADPOOL[] from concurrent import futures from flags import save_flag, get_flag, main # <1> reusing things def download_one(cc: str): # <2> single downloader, this is what each worker will execute image = get_flag(cc) save_flag(image, f'{cc}.gif') print(cc, end=' ', flush=True) return cc def download_many(cc_list: list[str]) -> int: with futures.ThreadPoolExecutor() as executor: # <3> ThreadPoolExecutor is the context manager here, exit method will be blocking until all threads are done res = executor.map(download_one, sorted(cc_list)) # <4> map is similar in style as map builtin, returns a generator that we have to iterate to get the value from each function call return len(list(res)) # <5> if __name__ == '__main__': main(download_many) # <6> # end::FLAGS_THREADPOOL[]The context manager is
ThreadPoolExecutor; the executor.__exit__ method will call executor.shutdown(wait=True), and this blocks until all the threads are done.
executor.map() is similar to the map builtin; the function is called concurrently from multiple threads
it returns a generator that we need to iterate to retrieve the value returned by each function call
any exception raised by a particular call is re-raised when that call's value is retrieved from the generator.
concurrent.futures makes it easy for us to add concurrent execution atop legacy sequential code
Other useful args to ThreadPoolExecutor: max_workers
the default is max_workers = min(32, os.cpu_count() + 4); the extra workers are for I/O-bound tasks
Also it will try to reuse idle workers instead of spawning new ones. (lmao meeting rooms II leetcode question be like)
Where Are the Futures?
purpose: an instance of either Future class represents a deferred computation that may or may not have completed.
like Promise in JS
both async frameworks give us futures:
concurrent.futures.Future and asyncio.Future; both allow us to put them in queues and check if they're done
HOWEVER, it is the job of the concurrency framework to handle futures, WE DON’T create them directly. This is because a future represents something that will eventually run, so it must be scheduled to run and that’s the role of the framework
e.g.
Executor.submit(<callable>) does the scheduling and returns a Future
Who can change the state of a future?
Only the concurrency framework, never the application code.
We are NOT in control of the state of a future.
push/pull method to determine completion:
pull: Future.done(), where the app logic keeps polling
push: Future.add_done_callback() to register a callback that will be invoked when the future is done. NOTE: the callback callable will run in the same worker thread or process that ran the function wrapped in the future.
futures have a result(); when done, it works the same for both libs
when not done, it works differently for the two libs:
concurrent.futures.Future: calling f.result() will block the caller's thread until the result is ready (we can pass a timeout to avoid infinite blocking)
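A minimal sketch (not from the book; slow_job is a made-up function) of the pull style, the push style, and result() with a timeout on a concurrent.futures.Future:

```python
from concurrent import futures
import time

def slow_job(n: int) -> int:
    time.sleep(0.5)
    return n * 2

with futures.ThreadPoolExecutor() as executor:
    fut = executor.submit(slow_job, 21)
    print(fut.done())                                                # pull: poll for completion (False here)
    fut.add_done_callback(lambda f: print('callback:', f.result()))  # push: runs when the future completes
    try:
        print(fut.result(timeout=0.1))                               # blocks up to 0.1s, then raises
    except futures.TimeoutError:
        print('not ready yet')
    print(fut.result())                                              # blocks until the result is ready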
- demo:

```python
#!/usr/bin/env python3
"""Download flags of top 20 countries by population

ThreadPoolExecutor example with ``as_completed``.
"""
from concurrent import futures

from flags import main
from flags_threadpool import download_one

# tag::FLAGS_THREADPOOL_AS_COMPLETED[]
def download_many(cc_list: list[str]) -> int:
    cc_list = cc_list[:5]  # <1> smaller sample
    with futures.ThreadPoolExecutor(max_workers=3) as executor:  # <2> attempt to see pending futures in the output
        to_do: list[futures.Future] = []
        for cc in sorted(cc_list):  # <3>
            future = executor.submit(download_one, cc)  # <4> schedules the callable to be executed, returns a future representing this pending operation
            to_do.append(future)  # <5> just storing it for inspection
            print(f'Scheduled for {cc}: {future}')  # <6> we'll see something like: Scheduled for BR: <Future at 0x100791518 state=running>

        for count, future in enumerate(futures.as_completed(to_do), 1):  # <7> yields futures as they are completed
            res: str = future.result()  # <8> retrieving the result
            print(f'{future} result: {res!r}')  # <9> e.g.: <Future at 0x101807080 state=finished returned str> result: 'IN'

    return count
# end::FLAGS_THREADPOOL_AS_COMPLETED[]

if __name__ == '__main__':
    main(download_many)
```

In this example, because we're getting the futures from as_completed, when we call future.result(), it will never be blocking.
Launching Processes with concurrent.futures
Both ProcessPoolExecutor and ThreadPoolExecutor implement the Executor interface
this allows us to switch from thread-based to process-based concurrency using concurrent.futures
so we can use process-based primitives just like thread-based primitives; we just have to call a different pool executor
main use case for process-based concurrency is CPU-intensive jobs
Put simply, using processes lets us get around the GIL and use multiple CPU cores
Remember processes use more memory and take longer to start than threads
Main use case for thread-based concurrency is I/O-intensive applications.
Multicore Prime Checker Redux
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50#!/usr/bin/env python3 """ proc_pool.py: a version of the proc.py example from chapter 20, but using `concurrent.futures.ProcessPoolExecutor`. """ # tag::PRIMES_POOL[] import sys from concurrent import futures # <1> no need to import the lower level abstractions (multiprocessing, SimpleQueue) from time import perf_counter from typing import NamedTuple from primes import is_prime, NUMBERS class PrimeResult(NamedTuple): # <2> we don't need the queues or the worker function anymore n: int flag: bool elapsed: float def check(n: int) -> PrimeResult: t0 = perf_counter() res = is_prime(n) return PrimeResult(n, res, perf_counter() - t0) def main() -> None: if len(sys.argv) < 2: workers = None # <3> setting to None allows the ProcessPoolExecutor decide for us else: workers = int(sys.argv[1]) executor = futures.ProcessPoolExecutor(workers) # <4> built executor so that we have access the acutal workers selected actual_workers = executor._max_workers # type: ignore # <5> print(f'Checking {len(NUMBERS)} numbers with {actual_workers} processes:') t0 = perf_counter() numbers = sorted(NUMBERS, reverse=True) # <6> with executor: # <7> we use the executor as the context manager for n, prime, elapsed in executor.map(check, numbers): # <8> returns PrimeResult instances that =check= returns in the same order as the numbers argument label = 'P' if prime else ' ' print(f'{n:16} {label} {elapsed:9.6f}s') time = perf_counter() - t0 print(f'Total time: {time:.2f}s') if __name__ == '__main__': main() # end::PRIMES_POOL[]the use of
executor.map() will block until all child processes are done. It yields the results in the same order as the calls were submitted.
blocking overall but not individually; that's why the rest appear to return almost instantly
Experimenting with Executor.map
| |
the display is updated incrementally.
the enumerate call in the for loop will implicitly invoke next(results), which in turn will invoke _f.result() on the (internal) _f future representing the first call, loiter(0)
the _f.result() will block until the future is done
Executor.map() will block until all the jobs are done.
Alternatively, to make it more just-in-time, we can use Executor.submit and futures.as_completed
TRICK: this is more flexible than executor.map because you can submit different callables and arguments, while executor.map is designed to run the same callable on the different arguments.
TRICK: we can pass futures to futures.as_completed such that the futures come from different pool executors (including different types of pool executors) — see the sketch below
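A minimal sketch (mine, with made-up io_task/cpu_task functions) of that trick: futures submitted to different executor types, consumed by a single futures.as_completed loop:

```python
from concurrent import futures

def io_task(n: int) -> str:        # stand-in for an I/O-bound job
    return f'io-{n}'

def cpu_task(n: int) -> int:       # stand-in for a CPU-bound job (module-level, so it pickles)
    return sum(range(n))

def run() -> None:
    with futures.ThreadPoolExecutor() as tpe, futures.ProcessPoolExecutor() as ppe:
        to_do = [tpe.submit(io_task, n) for n in range(3)]
        to_do += [ppe.submit(cpu_task, n) for n in (10_000, 20_000)]
        for fut in futures.as_completed(to_do):   # yields each future as it finishes, whatever its origin
            print(fut.result())

if __name__ == '__main__':
    run()
```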
Downloads with Progress Display and Error Handling
- common functions
just a reference on the support code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155"""Utilities for second set of flag examples. """ import argparse import string import sys import time from collections import Counter from enum import Enum from pathlib import Path DownloadStatus = Enum('DownloadStatus', 'OK NOT_FOUND ERROR') POP20_CC = ('CN IN US ID BR PK NG BD RU JP ' 'MX PH VN ET EG DE IR TR CD FR').split() DEFAULT_CONCUR_REQ = 1 MAX_CONCUR_REQ = 1 SERVERS = { 'REMOTE': 'https://www.fluentpython.com/data/flags', 'LOCAL': 'http://localhost:8000/flags', 'DELAY': 'http://localhost:8001/flags', 'ERROR': 'http://localhost:8002/flags', } DEFAULT_SERVER = 'LOCAL' DEST_DIR = Path('downloaded') COUNTRY_CODES_FILE = Path('country_codes.txt') def save_flag(img: bytes, filename: str) -> None: (DEST_DIR / filename).write_bytes(img) def initial_report(cc_list: list[str], actual_req: int, server_label: str) -> None: if len(cc_list) <= 10: cc_msg = ', '.join(cc_list) else: cc_msg = f'from {cc_list[0]} to {cc_list[-1]}' print(f'{server_label} site: {SERVERS[server_label]}') plural = 's' if len(cc_list) != 1 else '' print(f'Searching for {len(cc_list)} flag{plural}: {cc_msg}') if actual_req == 1: print('1 connection will be used.') else: print(f'{actual_req} concurrent connections will be used.') def final_report(cc_list: list[str], counter: Counter[DownloadStatus], start_time: float) -> None: elapsed = time.perf_counter() - start_time print('-' * 20) plural = 's' if counter[DownloadStatus.OK] != 1 else '' print(f'{counter[DownloadStatus.OK]:3} flag{plural} downloaded.') if counter[DownloadStatus.NOT_FOUND]: print(f'{counter[DownloadStatus.NOT_FOUND]:3} not found.') if counter[DownloadStatus.ERROR]: plural = 's' if counter[DownloadStatus.ERROR] != 1 else '' print(f'{counter[DownloadStatus.ERROR]:3} error{plural}.') print(f'Elapsed time: {elapsed:.2f}s') def expand_cc_args(every_cc: bool, all_cc: bool, cc_args: list[str], limit: int) -> list[str]: codes: set[str] = set() A_Z = string.ascii_uppercase if every_cc: codes.update(a+b for a in A_Z for b in A_Z) elif all_cc: text = COUNTRY_CODES_FILE.read_text() codes.update(text.split()) else: for cc in (c.upper() for c in cc_args): if len(cc) == 1 and cc in A_Z: codes.update(cc + c for c in A_Z) elif len(cc) == 2 and all(c in A_Z for c in cc): codes.add(cc) else: raise ValueError('*** Usage error: each CC argument ' 'must be A to Z or AA to ZZ.') return sorted(codes)[:limit] def process_args(default_concur_req): server_options = ', '.join(sorted(SERVERS)) parser = argparse.ArgumentParser( description='Download flags for country codes. ' 'Default: top 20 countries by population.') parser.add_argument( 'cc', metavar='CC', nargs='*', help='country code or 1st letter (eg. 
B for BA...BZ)') parser.add_argument( '-a', '--all', action='store_true', help='get all available flags (AD to ZW)') parser.add_argument( '-e', '--every', action='store_true', help='get flags for every possible code (AA...ZZ)') parser.add_argument( '-l', '--limit', metavar='N', type=int, help='limit to N first codes', default=sys.maxsize) parser.add_argument( '-m', '--max_req', metavar='CONCURRENT', type=int, default=default_concur_req, help=f'maximum concurrent requests (default={default_concur_req})') parser.add_argument( '-s', '--server', metavar='LABEL', default=DEFAULT_SERVER, help=f'Server to hit; one of {server_options} ' f'(default={DEFAULT_SERVER})') parser.add_argument( '-v', '--verbose', action='store_true', help='output detailed progress info') args = parser.parse_args() if args.max_req < 1: print('*** Usage error: --max_req CONCURRENT must be >= 1') parser.print_usage() # "standard" exit status codes: # https://stackoverflow.com/questions/1101957/are-there-any-standard-exit-status-codes-in-linux/40484670#40484670 sys.exit(2) # command line usage error if args.limit < 1: print('*** Usage error: --limit N must be >= 1') parser.print_usage() sys.exit(2) # command line usage error args.server = args.server.upper() if args.server not in SERVERS: print(f'*** Usage error: --server LABEL ' f'must be one of {server_options}') parser.print_usage() sys.exit(2) # command line usage error try: cc_list = expand_cc_args(args.every, args.all, args.cc, args.limit) except ValueError as exc: print(exc.args[0]) parser.print_usage() sys.exit(2) # command line usage error if not cc_list: cc_list = sorted(POP20_CC)[:args.limit] return args, cc_list def main(download_many, default_concur_req, max_concur_req): args, cc_list = process_args(default_concur_req) actual_req = min(args.max_req, max_concur_req, len(cc_list)) initial_report(cc_list, actual_req, args.server) base_url = SERVERS[args.server] DEST_DIR.mkdir(exist_ok=True) t0 = time.perf_counter() counter = download_many(cc_list, base_url, args.verbose, actual_req) final_report(cc_list, counter, t0)
Error Handling in the flags2 Examples
- sequential version
Uses a sequential HTTPX client
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91#!/usr/bin/env python3 """Download flags of countries (with error handling). Sequential version Sample run:: $ python3 flags2_sequential.py -s DELAY b DELAY site: http://localhost:8002/flags Searching for 26 flags: from BA to BZ 1 concurrent connection will be used. -------------------- 17 flags downloaded. 9 not found. Elapsed time: 13.36s """ # tag::FLAGS2_BASIC_HTTP_FUNCTIONS[] from collections import Counter from http import HTTPStatus import httpx import tqdm # type: ignore # <1> from flags2_common import main, save_flag, DownloadStatus # <2> get the commons DEFAULT_CONCUR_REQ = 1 MAX_CONCUR_REQ = 1 def get_flag(base_url: str, cc: str) -> bytes: url = f'{base_url}/{cc}/{cc}.gif'.lower() resp = httpx.get(url, timeout=3.1, follow_redirects=True) resp.raise_for_status() # <3> raises if HTTP status code not in range(200, 300) return resp.content def download_one(cc: str, base_url: str, verbose: bool = False) -> DownloadStatus: try: image = get_flag(base_url, cc) except httpx.HTTPStatusError as exc: # <4> handles the 404 errors specifically res = exc.response if res.status_code == HTTPStatus.NOT_FOUND: status = DownloadStatus.NOT_FOUND # <5> replaces it with an internal download status msg = f'not found: {res.url}' else: raise # <6> re-propagate any other errors other than 404 else: save_flag(image, f'{cc}.gif') status = DownloadStatus.OK msg = 'OK' if verbose: # <7> verbosity flag print(cc, msg) return status # end::FLAGS2_BASIC_HTTP_FUNCTIONS[] # tag::FLAGS2_DOWNLOAD_MANY_SEQUENTIAL[] def download_many(cc_list: list[str], base_url: str, verbose: bool, _unused_concur_req: int) -> Counter[DownloadStatus]: counter: Counter[DownloadStatus] = Counter() # <1> to tally the download outcomes cc_iter = sorted(cc_list) # <2> if not verbose: cc_iter = tqdm.tqdm(cc_iter) # <3> tqdm returns an iterator yielding the items in cc_iter and also animating the progress bar for cc in cc_iter: try: status = download_one(cc, base_url, verbose) # <4> successive calls to the singular function except httpx.HTTPStatusError as exc: # <5> the non 404 errors handled here error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}' error_msg = error_msg.format(resp=exc.response) except httpx.RequestError as exc: # <6> error_msg = f'{exc} {type(exc)}'.strip() except KeyboardInterrupt: # <7> manging keyboard interrupts break else: # <8> clear the error msg if there's no error that came down error_msg = '' if error_msg: status = DownloadStatus.ERROR # <9> local status check based on the internal enum counter[status] += 1 # <10> if verbose and error_msg: # <11> print(f'{cc} error: {error_msg}') return counter # <12> # end::FLAGS2_DOWNLOAD_MANY_SEQUENTIAL[] if __name__ == '__main__': main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)
Using futures.as_completed with a thread pool
Uses a concurrent HTTP client based on
futures.ThreadPoolExecutorto show error handling1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74#!/usr/bin/env python3 """Download flags of countries (with error handling). ThreadPool version Sample run:: $ python3 flags2_threadpool.py -s ERROR -e ERROR site: http://localhost:8003/flags Searching for 676 flags: from AA to ZZ 30 concurrent connections will be used. -------------------- 150 flags downloaded. 361 not found. 165 errors. Elapsed time: 7.46s """ # tag::FLAGS2_THREADPOOL[] from collections import Counter from concurrent.futures import ThreadPoolExecutor, as_completed import httpx import tqdm # type: ignore from flags2_common import main, DownloadStatus from flags2_sequential import download_one # <1> DEFAULT_CONCUR_REQ = 30 # <2> defaults for max num of concurrent requests, size of threadpool MAX_CONCUR_REQ = 1000 # <3> max num concurrent reqs def download_many(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: counter: Counter[DownloadStatus] = Counter() with ThreadPoolExecutor(max_workers=concur_req) as executor: # <4> to_do_map = {} # <5> maps each Future instance (representing one download) with the cc for error reporting for cc in sorted(cc_list): # <6> response order is more based on timing of the HTTP responses more so than anything future = executor.submit(download_one, cc, base_url, verbose) # <7> each submission does the scheduling and returns a Future to_do_map[future] = cc # <8> Future instances are hashable done_iter = as_completed(to_do_map) # <9> returns an iterator that yields futures as each task is done if not verbose: done_iter = tqdm.tqdm(done_iter, total=len(cc_list)) # <10> wrap the iterator within the progress bar for future in done_iter: # <11> iterates on futures as they are completed try: status = future.result() # <12> this could have been blocking but NOT in this case because it's handled by the as_completed() except httpx.HTTPStatusError as exc: # <13> error handling error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}' error_msg = error_msg.format(resp=exc.response) except httpx.RequestError as exc: error_msg = f'{exc} {type(exc)}'.strip() except KeyboardInterrupt: break else: error_msg = '' if error_msg: status = DownloadStatus.ERROR counter[status] += 1 if verbose and error_msg: cc = to_do_map[future] # <14> print(f'{cc} error: {error_msg}') return counter if __name__ == '__main__': main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ) # end::FLAGS2_THREADPOOL[]NOTE:
Future instances are hashable, that's why we can use them as keys within a dictionary
IDIOM: use a map to store futures to do follow-up processing
building a dict to map each future to other data that may be useful when the future is completed.
asyncio
Concurrent HTTPX client
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106#!/usr/bin/env python3 """Download flags of countries (with error handling). asyncio async/await version """ # tag::FLAGS2_ASYNCIO_TOP[] import asyncio from collections import Counter from http import HTTPStatus from pathlib import Path import httpx import tqdm # type: ignore from flags2_common import main, DownloadStatus, save_flag # low concurrency default to avoid errors from remote site, # such as 503 - Service Temporarily Unavailable DEFAULT_CONCUR_REQ = 5 MAX_CONCUR_REQ = 1000 async def get_flag(client: httpx.AsyncClient, # <1> base_url: str, cc: str) -> bytes: url = f'{base_url}/{cc}/{cc}.gif'.lower() resp = await client.get(url, timeout=3.1, follow_redirects=True) # <2> resp.raise_for_status() return resp.content async def download_one(client: httpx.AsyncClient, cc: str, base_url: str, semaphore: asyncio.Semaphore, verbose: bool) -> DownloadStatus: try: async with semaphore: # <3> image = await get_flag(client, base_url, cc) except httpx.HTTPStatusError as exc: # <4> res = exc.response if res.status_code == HTTPStatus.NOT_FOUND: status = DownloadStatus.NOT_FOUND msg = f'not found: {res.url}' else: raise else: await asyncio.to_thread(save_flag, image, f'{cc}.gif') # <5> status = DownloadStatus.OK msg = 'OK' if verbose and msg: print(cc, msg) return status # end::FLAGS2_ASYNCIO_TOP[] # tag::FLAGS2_ASYNCIO_START[] async def supervisor(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: # <1> counter: Counter[DownloadStatus] = Counter() semaphore = asyncio.Semaphore(concur_req) # <2> async with httpx.AsyncClient() as client: to_do = [download_one(client, cc, base_url, semaphore, verbose) for cc in sorted(cc_list)] # <3> to_do_iter = asyncio.as_completed(to_do) # <4> if not verbose: to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list)) # <5> error: httpx.HTTPError | None = None # <6> for coro in to_do_iter: # <7> try: status = await coro # <8> except httpx.HTTPStatusError as exc: error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}' error_msg = error_msg.format(resp=exc.response) error = exc # <9> except httpx.RequestError as exc: error_msg = f'{exc} {type(exc)}'.strip() error = exc # <10> except KeyboardInterrupt: break else: error = None if error: status = DownloadStatus.ERROR # <11> if verbose: url = str(error.request.url) # <12> cc = Path(url).stem.upper() # <13> print(f'{cc} error: {error_msg}') counter[status] += 1 return counter def download_many(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: coro = supervisor(cc_list, base_url, verbose, concur_req) counts = asyncio.run(coro) # <14> return counts if __name__ == '__main__': main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ) # end::FLAGS2_ASYNCIO_START[]
Chapter Summary
Further Reading
Chapter 21. Asynchronous Programming
async constructs
Objects supporting async constructs
includes other constructs enabled by the async/await keywords: async generator functions, async comprehensions, async genexps
these aren't tied to asyncio!
async libraries like asyncio
What’s New in This Chapter
A Few Definitions
native coroutines
only defined using async def
delegation from coroutine to coroutine is done only with await; it's not required that it MUST delegate
classic coroutines
actually a generator function that consumes data (data that is sent to it via my_coro.send(data) calls)
can delegate to other classic coroutines using yield from. Ref “Meaning of yield from”
no longer supported by asyncio and doesn't support the await keyword
generator-based coroutines (decorated using @types.coroutine)
a decorated generator function (@types.coroutine), which makes the generator compatible with the await keyword
this is NOT supported by asyncio, but used in low-level code in other frameworks like Curio and Trio
async generator (function)
generator function defined with async def that uses yield in its body
returns an async generator object that provides __anext__, which is a coroutine method to retrieve the next item.
An asyncio Example: Probing Domains
- async operations are interleaved \(\implies\) the total time is practically the same as the time for the single slowest DNS response, instead of the sum of the times of all responses.
| |
loop.getaddrinfo() is the async version of socket.getaddrinfo()
this returns a list of 5-part tuples of params to connect to the given address using a socket
asyncio.get_running_loop is designed to be used from within coroutines.
If there is no running event loop, it raises a RuntimeError. The event loop should already have been started before execution reaches that point.
for coro in asyncio.as_completed(coros):
the asyncio.as_completed(coros) generator yields coroutines that return the results of the coros passed to it, in the order they complete (not the order of submission), similar to futures.as_completed
the await coro is non-blocking because it's guarded by the as_completed above
if coro raises an exception, it gets re-raised here
event loop:
started using asyncio.run()
IDIOM: for scripts, the common pattern is to make the main function a coroutine as well. The main coroutine is driven with asyncio.run()
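A minimal sketch of the idiom, assuming nothing beyond the stdlib:

```python
import asyncio

async def main() -> None:
    print('doing async work...')
    await asyncio.sleep(0.1)     # stand-in for real awaitable work
    print('done')

if __name__ == '__main__':
    asyncio.run(main())          # starts the event loop, drives main() to completion, closes the loop
```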
Guido’s Trick to Read Asynchronous Code
- squint and pretend the async and await keywords are not there. If you do that, you’ll realize that coroutines read like plain old sequential functions.
New Concept: Awaitable
await expression:
uses the yield from implementation with an extra step of validating its argument
only accepts an awaitable
for \(\rightarrow\) iterables, await \(\rightarrow\) awaitables
from asyncio, we typically work with these awaitables:
a native coroutine object that we get by calling a native coroutine function, e.g. coro() where coro is the coroutine function
an asyncio.Task that we get when we pass a coroutine object to asyncio.create_task()
remember that coro_obj = coro(), so the overall call is usually asyncio.create_task(one_coro()); note the invocation of the native coroutine function
Whether to keep a handle to the task or not depends on whether we need to use it (e.g. to cancel the task or wait for it)
lower-level awaitables: (something we might encounter if we work with lower-level abstractions)
an obj with an __await__ method that returns an iterator (e.g. asyncio.Future; by the way, asyncio.Task <: asyncio.Future)
objs written in other langs that use the Python/C API with a tp_as_async.am_await function, returning an iterator (similar to the __await__ method)
soon to be deprecated: generator-based coroutine objects
Downloading with asyncio and HTTPX
| |
asyncio directly supports TCP and UDP, without relying on external packages
res = await asyncio.gather(*to_do): here, we pass the awaitables so that they can be gathered after completion, giving us a list of results. gather returns the results in the order the coros were submitted.
AsyncClient is the async context manager used here. It's a context manager with async setup and teardown methods. KIV
In this snippet of the get_flag coroutine:
get_flagscoroutine:1 2 3 4 5async def get_flag(client: AsyncClient, cc: str) -> bytes: # <4> needs the client to make the http request url = f'{BASE_URL}/{cc}/{cc}.gif'.lower() resp = await client.get(url, timeout=6.1, follow_redirects=True) # <5> get method also returns a ClientResponse that is an async context manager, the network I/O is drive async via the =asyncio= event loop return resp.read() # <6> the body is just lazily fetched from the response object. This fully consumes the response body into memory.Implicit delegation of coroutines via async context managers:
the get method of an httpx.AsyncClient instance returns a ClientResponse object that is also an asynchronous context manager.
this is an awaitable that returns a Response
by the way, Response can also be used as a context manager when streaming! If it were, then resp.read() would be an I/O operation that may yield to the event loop again while it drains the response body stream from the socket
the
await yields control flow to the event loop while the network I/O happens (DNS resolution, TCP connect, handshake, waiting for response headers). During that suspension, other tasks can run.
so by the end of point 5, resp is a proper Response object and not a coroutine. The connection is ready.
there’s OS-level support for it (
io_uringon Linux), but nothing that supports this for python’s stdlib/asyncio
The Secret of Native Coroutines: Humble Generators
classic vs native coroutines: the native ones don't rely on a visible .send() call or yield expressions
mechanistic model for async programs and how they drive async libraries:
Here, we see how in an async program:
a user's function starts the event loop, scheduling an initial coroutine with asyncio.run
each user coroutine drives the next with an await expression, which is when control flow is yielded to the next coroutine
this forms a channel that enables communication between a library like HTTPX and the event loop.
the await chain eventually reaches a low-level awaitable, which returns a generator that the event loop can drive in response to events such as timers or network I/O. The low-level awaitables and generators at the end of these await chains are implemented deep inside the libraries, are not part of their APIs, and may be Python/C extensions.
await borrows most of its implementation from yield from (classic coroutines), which also makes .send calls to drive coroutines.
using functions like asyncio.gather and asyncio.create_task, you can start multiple concurrent await channels, enabling concurrent execution of multiple I/O operations driven by a single event loop, in a single thread.
The All-or-Nothing Problem
had to replace I/O functions with their async versions so that they could be activated with await or asyncio.create_task
if there's no async version, we have to delegate to a separate thread/proc
If you can’t rewrite a blocking function as a coroutine, you should run it in a separate thread or process
Asynchronous Context Managers via async with
- asynchronous context managers: objects implementing the __aenter__ and __aexit__ methods as coroutines.
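A minimal sketch (not from the book) of a hand-rolled async context manager with __aenter__/__aexit__ as coroutines:

```python
import asyncio

class Session:
    async def __aenter__(self) -> 'Session':
        await asyncio.sleep(0.1)   # stand-in for async setup (e.g. opening a connection)
        print('session opened')
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await asyncio.sleep(0.1)   # stand-in for async teardown
        print('session closed')

async def main() -> None:
    async with Session():
        print('doing work inside the session')

asyncio.run(main())
```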
Enhancing the asyncio Downloader
- caution:
asyncio vs threading: asyncio can send requests faster, so it's more likely to be suspected of DDoS by the HTTP server.
Using
asyncio.as_completedand a Thread1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106#!/usr/bin/env python3 """Download flags of countries (with error handling). asyncio async/await version """ # tag::FLAGS2_ASYNCIO_TOP[] import asyncio from collections import Counter from http import HTTPStatus from pathlib import Path import httpx import tqdm # type: ignore from flags2_common import main, DownloadStatus, save_flag # low concurrency default to avoid errors from remote site, # such as 503 - Service Temporarily Unavailable DEFAULT_CONCUR_REQ = 5 MAX_CONCUR_REQ = 1000 async def get_flag(client: httpx.AsyncClient, # <1> similar to the sequential version, just that here it requires a client param base_url: str, cc: str) -> bytes: url = f'{base_url}/{cc}/{cc}.gif'.lower() resp = await client.get(url, timeout=3.1, follow_redirects=True) # <2> we await the coroutine from client.get() resp.raise_for_status() return resp.content async def download_one(client: httpx.AsyncClient, cc: str, base_url: str, semaphore: asyncio.Semaphore, verbose: bool) -> DownloadStatus: try: async with semaphore: # <3> semaphore as an asynchronous context manager so that the program as a whole is not blocked; only this coroutine is suspended when the semaphore counter is zero. image = await get_flag(client, base_url, cc) except httpx.HTTPStatusError as exc: # <4> familiar error handling logic res = exc.response if res.status_code == HTTPStatus.NOT_FOUND: status = DownloadStatus.NOT_FOUND msg = f'not found: {res.url}' else: raise else: await asyncio.to_thread(save_flag, image, f'{cc}.gif') # <5> FileSystem I/O, don't let it block us by running it in a thread status = DownloadStatus.OK msg = 'OK' if verbose and msg: print(cc, msg) return status # end::FLAGS2_ASYNCIO_TOP[] # tag::FLAGS2_ASYNCIO_START[] async def supervisor(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: # <1> since it's a coroutine, it can't be invoked directly from main. 
counter: Counter[DownloadStatus] = Counter() semaphore = asyncio.Semaphore(concur_req) # <2> creates the semaphore to be shared across the coros we will have async with httpx.AsyncClient() as client: to_do = [download_one(client, cc, base_url, semaphore, verbose) for cc in sorted(cc_list)] # <3> list of coro objs, one per call to download_one coro fn to_do_iter = asyncio.as_completed(to_do) # <4> get an iter, receives in the order of completion, allows the iter to be wrapped by tqdm if not verbose: to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list)) # <5> wrap iter w tqdm error: httpx.HTTPError | None = None # <6> init error for coro in to_do_iter: # <7> iter over completed coro objs try: status = await coro # <8> this is a nonblocking await because implicitly guarded by the =as_completed= except httpx.HTTPStatusError as exc: error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}' error_msg = error_msg.format(resp=exc.response) error = exc # <9> to preserve the exc except httpx.RequestError as exc: error_msg = f'{exc} {type(exc)}'.strip() error = exc # <10> preserve the exc except KeyboardInterrupt: break else: error = None if error: status = DownloadStatus.ERROR # <11> user internal error enum if verbose: url = str(error.request.url) # <12> cc = Path(url).stem.upper() # <13> print(f'{cc} error: {error_msg}') counter[status] += 1 return counter def download_many(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: coro = supervisor(cc_list, base_url, verbose, concur_req) counts = asyncio.run(coro) # <14> drives the event loop, passes coro to event loop and returns when the event loop ends. return counts if __name__ == '__main__': main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ) # end::FLAGS2_ASYNCIO_START[]- the
asyncio.Semaphore is being used as an asynchronous context manager so that the program as a whole is not blocked; only this coroutine is suspended when the semaphore counter is zero.
- notice how we delegate the file I/O in point 5 to a thread pool provided by asyncio using asyncio.to_thread; we just await it and yield control flow to allow other tasks to carry on
Throttling Requests with a Semaphore
throwback to OS mods in school: a semaphore is a numbered “mutex” \(\implies\) more flexibility than just a binary mutex lock.
we can share the semaphore between multiple coroutines with a configured max number in order to throttle our Network I/O
why? because we should avoid spamming a server with too many concurrent requests \(\implies\) we need to throttle the Network I/O
previously, we did the throttling in a coarse manner by setting max_workers for download_many in the demo code
Python’s Semaphores
all 3 concurrency packages (threading, multiprocessing, asyncio) have their own semaphore classes
the initial value is set at the point of creating the semaphore; the semaphore is passed to every coroutine that needs to rely on it to synchronize
semaphore = asyncio.Semaphore(concur_req)
the semaphore counter decrements when we await the .acquire() coroutine, and increments when we call the .release() method (non-blocking, not a coroutine)
if not ready (count == 0), .acquire() suspends the awaiting coroutine until some other coroutine calls .release() on the same Semaphore, thus incrementing the counter.
asyncio.Semaphore used as an async context manager:
instead of using semaphore.acquire() and semaphore.release() directly, we can rely on the async context manager to acquire (the Semaphore.__aenter__ coroutine method awaits .acquire()) and release the semaphore (Semaphore.__aexit__ calls .release())
this guarantees that no more than concur_req instances of the get_flag coroutine will be active at any time
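A minimal sketch (mine, not the flags code) of the throttling pattern: one shared Semaphore, used as an async context manager by every coroutine:

```python
import asyncio

async def fetch(n: int, semaphore: asyncio.Semaphore) -> int:
    async with semaphore:            # at most concur_req coroutines run this block at once
        await asyncio.sleep(0.1)     # stand-in for network I/O
        return n

async def main(concur_req: int = 3) -> None:
    semaphore = asyncio.Semaphore(concur_req)
    results = await asyncio.gather(*(fetch(n, semaphore) for n in range(10)))
    print(results)

asyncio.run(main())
```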
Making Multiple Requests for Each Download
our objective now is to make 2 requests per country. In a sequential pattern, it would just be one call after the other. The async version isn't directly the same.
We can drive the asynchronous requests one after the other, sharing the local scope of the driving coroutine.
here’s the v3 using asyncio
some changes:
new coroutine: get_country is a new coroutine for the .json fetch
in download_one we now use await to delegate to get_flag and the new get_country
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119#!/usr/bin/env python3 """Download flags of countries (with error handling). asyncio async/await version """ # tag::FLAGS2_ASYNCIO_TOP[] import asyncio from collections import Counter from http import HTTPStatus from pathlib import Path import httpx import tqdm # type: ignore from flags2_common import main, DownloadStatus, save_flag # low concurrency default to avoid errors from remote site, # such as 503 - Service Temporarily Unavailable DEFAULT_CONCUR_REQ = 5 MAX_CONCUR_REQ = 1000 async def get_flag(client: httpx.AsyncClient, # <1> base_url: str, cc: str) -> bytes: url = f'{base_url}/{cc}/{cc}.gif'.lower() resp = await client.get(url, timeout=3.1, follow_redirects=True) # <2> resp.raise_for_status() return resp.content # tag::FLAGS3_ASYNCIO_GET_COUNTRY[] async def get_country(client: httpx.AsyncClient, base_url: str, cc: str) -> str: # <1> returns string with country name url = f'{base_url}/{cc}/metadata.json'.lower() resp = await client.get(url, timeout=3.1, follow_redirects=True) resp.raise_for_status() metadata = resp.json() # <2> is a python dict return metadata['country'] # <3> erturns the country name # end::FLAGS3_ASYNCIO_GET_COUNTRY[] # tag::FLAGS3_ASYNCIO_DOWNLOAD_ONE[] async def download_one(client: httpx.AsyncClient, cc: str, base_url: str, semaphore: asyncio.Semaphore, verbose: bool) -> DownloadStatus: try: async with semaphore: # <1> hold the semaphore to =await= (delegate) to =get_flag= image = await get_flag(client, base_url, cc) async with semaphore: # <2> hold the semaphore again to delegate to the next step country = await get_country(client, base_url, cc) except httpx.HTTPStatusError as exc: res = exc.response if res.status_code == HTTPStatus.NOT_FOUND: status = DownloadStatus.NOT_FOUND msg = f'not found: {res.url}' else: raise else: filename = country.replace(' ', '_') # <3> await asyncio.to_thread(save_flag, image, f'{filename}.gif') status = DownloadStatus.OK msg = 'OK' if verbose and msg: print(cc, msg) return status # end::FLAGS3_ASYNCIO_DOWNLOAD_ONE[] # tag::FLAGS2_ASYNCIO_START[] async def supervisor(cc_list: list[str], base_url: str, verbose: bool, concur_req: int) -> Counter[DownloadStatus]: # <1> counter: Counter[DownloadStatus] = Counter() semaphore = asyncio.Semaphore(concur_req) # <2> async with httpx.AsyncClient() as client: to_do = [download_one(client, cc, base_url, semaphore, verbose) for cc in sorted(cc_list)] # <3> to_do_iter = asyncio.as_completed(to_do) # <4> if not verbose: to_do_iter = tqdm.tqdm(to_do_iter, total=len(cc_list)) # <5> error: httpx.HTTPError | None = None # <6> for coro in to_do_iter: # <7> try: status = await coro # <8> except httpx.HTTPStatusError as exc: error_msg = 'HTTP error {resp.status_code} - {resp.reason_phrase}' error_msg = error_msg.format(resp=exc.response) error = exc # <9> except httpx.RequestError as exc: error_msg = f'{exc} {type(exc)}'.strip() error = exc # <10> except KeyboardInterrupt: break if error: status = DownloadStatus.ERROR # <11> if verbose: url = str(error.request.url) # <12> cc = Path(url).stem.upper() # <13> print(f'{cc} error: {error_msg}') counter[status] += 1 return counter def download_many(cc_list: list[str], base_url: str, verbose: 
bool, concur_req: int) -> Counter[DownloadStatus]: coro = supervisor(cc_list, base_url, verbose, concur_req) counts = asyncio.run(coro) # <14> return counts if __name__ == '__main__': main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ) # end::FLAGS2_ASYNCIO_START[]- NOTE: point 1 & 2 in
download_one: it’s good practice to hold semaphores and locks for the shortest possible time.
One challenge is to know when you have to use
await and when you can't use it.
The answer in principle is easy: you await coroutines and other awaitables, such as asyncio.Task instances.
The reality is that the APIs can be confusingly named, e.g. StreamWriter
Delegating Tasks to Executors
problem: unlike Node.js, where ALL I/O has async APIs, Python doesn't have async APIs for all I/O. Notably, file I/O is NOT async.
This means that in our async code, file I/O can severely bottleneck performance if the main thread is blocked.
delegating to an executor is a good idea then
we can use
asyncio.to_thread, e.g. await asyncio.to_thread(save_flag, image, f'{cc}.gif')
under the hood, it uses
loop.run_in_executor, so the equivalent to the above statement would be:

```python
loop = asyncio.get_running_loop()  # gets a reference to the event loop
loop.run_in_executor(None, save_flag,
                     image, f'{cc}.gif')  # 1st arg: Executor to use. None => default => ThreadPoolExecutor (always available in the asyncio event loop)
```

when using
run_in_executor, the 1st arg is the Executor to use. None \(\implies\) default \(\implies\) ThreadPoolExecutor (always available in the asyncio event loop)
CAUTION: this accepts positional args only; we have to use functools.partial if we wish to use kwargs, or just use the newer asyncio.to_thread, which accepts kwargs.
IDIOM: this is a common pattern in async APIs: wrap blocking calls that are implementation details in coroutines using run_in_executor internally. That way, you provide a consistent interface of coroutines to be driven with await, and hide the threads you need to use for pragmatic reasons.
loop.run_in_executor's explicit Executor argument allows us to use a process-based approach for CPU-intensive tasks, so the work runs in a different Python process and we avoid GIL contention.
TRICK / IDIOM: prime the ProcessPoolExecutor in the supervisor and then pass it to the coroutines that need it, to reduce the effect of the high startup costs (see the sketch below)
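A minimal sketch (mine, with a made-up cpu_bound function) of priming a ProcessPoolExecutor once and passing it to coroutines that offload work via loop.run_in_executor:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:                 # module-level so it can be pickled for the child process
    return sum(i * i for i in range(n))

async def worker(n: int, executor: ProcessPoolExecutor) -> int:
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, cpu_bound, n)   # explicit executor instead of None

async def supervisor() -> None:
    with ProcessPoolExecutor() as executor:   # primed once, shared by all coroutines
        results = await asyncio.gather(*(worker(n, executor) for n in (100_000, 200_000)))
    print(results)

if __name__ == '__main__':
    asyncio.run(supervisor())
```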
WARNING / LANG_LIMITATION: Coroutines that use executors give the pretense of cancellation because the underlying thread/proc has no cancellation mechanism.
Using
run_in_executor can produce hard-to-debug problems since cancellation doesn't work the way one might expect. Coroutines that use executors give merely the pretense of cancellation: the underlying thread (if it's a ThreadPoolExecutor) has no cancellation mechanism.
For example, a long-lived thread that is created inside a run_in_executor call may prevent your asyncio program from shutting down cleanly: asyncio.run will wait for the executor to fully shut down before returning, and it will wait forever if the executor jobs don't stop somehow on their own.
My greybeard inclination is to want that function to be named run_in_executor_uncancellable.
Writing asyncio Servers
A FastAPI Web Service
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36from pathlib import Path from unicodedata import name from fastapi import FastAPI from fastapi.responses import HTMLResponse from pydantic import BaseModel from charindex import InvertedIndex STATIC_PATH = Path(__file__).parent.absolute() / 'static' # <1> beautiful overloading of =/= for pathlib app = FastAPI( # <2> defines the ASGI app, params are for autogen docs title='Mojifinder Web', description='Search for Unicode characters by name.', ) class CharName(BaseModel): # <3> pydantic schema for runtime type checking char: str name: str def init(app): # <4> attach to app state for later use app.state.index = InvertedIndex() app.state.form = (STATIC_PATH / 'form.html').read_text() init(app) # <5> @app.get('/search', response_model=list[CharName]) # <6> search endpoint, response_model uses the CharName pydantic model to describe the response format async def search(q: str): # <7> non-path params within the coro signature chars = sorted(app.state.index.search(q)) return ({'char': c, 'name': name(c)} for c in chars) # <8> an iterable of dicts compatible with response_model schema => FastAPI can build the json response accoding to the response model that we supplied in the @app.get decorator @app.get('/', response_class=HTMLResponse, include_in_schema=False) def form(): # <9> can use regular functions to handle endpoints as well, not just coros return app.state.form # no main funcion # <10>endpoint handlers can be coros or plain functions like we see here.
there’s no
mainfunction, it’s loaded and driven by the ASGI server (uvicorn).we don’t have return type hints here because we allow the pydantic schema to do the job
this is like schema casting when defining changesets in elixir
model is declared in this parameter instead of as a function return type annotation, because the path function may not actually return that response model but rather return a dict, database object or some other model, and then use the response_model to perform the field limiting and serialization.
response_model in FastAPI + Pydantic plays the role of both serialization and field-whitelisting — taking arbitrary Python objects/dicts and producing clean, predictable outputs according to the model definition
by the way the inverted index was implemened like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88#!/usr/bin/env python """ Class ``InvertedIndex`` builds an inverted index mapping each word to the set of Unicode characters which contain that word in their names. Optional arguments to the constructor are ``first`` and ``last+1`` character codes to index, to make testing easier. In the examples below, only the ASCII range was indexed. The `entries` attribute is a `defaultdict` with uppercased single words as keys:: >>> idx = InvertedIndex(32, 128) >>> idx.entries['DOLLAR'] {'$'} >>> sorted(idx.entries['SIGN']) ['#', '$', '%', '+', '<', '=', '>'] >>> idx.entries['A'] & idx.entries['SMALL'] {'a'} >>> idx.entries['BRILLIG'] set() The `.search()` method takes a string, uppercases it, splits it into words, and returns the intersection of the entries for each word:: >>> idx.search('capital a') {'A'} """ import sys import unicodedata from collections import defaultdict from collections.abc import Iterator STOP_CODE: int = sys.maxunicode + 1 Char = str Index = defaultdict[str, set[Char]] def tokenize(text: str) -> Iterator[str]: """return iterator of uppercased words""" for word in text.upper().replace('-', ' ').split(): yield word class InvertedIndex: entries: Index def __init__(self, start: int = 32, stop: int = STOP_CODE): entries: Index = defaultdict(set) for char in (chr(i) for i in range(start, stop)): name = unicodedata.name(char, '') if name: for word in tokenize(name): entries[word].add(char) self.entries = entries def search(self, query: str) -> set[Char]: if words := list(tokenize(query)): found = self.entries[words[0]] return found.intersection(*(self.entries[w] for w in words[1:])) else: return set() def format_results(chars: set[Char]) -> Iterator[str]: for char in sorted(chars): name = unicodedata.name(char) code = ord(char) yield f'U+{code:04X}\t{char}\t{name}' def main(words: list[str]) -> None: if not words: print('Please give one or more words to search.') sys.exit(2) # command line usage error index = InvertedIndex() chars = index.search(' '.join(words)) for line in format_results(chars): print(line) print('─' * 66, f'{len(chars)} found') if __name__ == '__main__': main(sys.argv[1:])
An asyncio TCP Server (no deps, just
asyncio streams)
- this demo is one where we use plain TCP to communicate with a telnet/netcat client, using asyncio directly without any external dependencies!
```python
#!/usr/bin/env python3
# tag::TCP_MOJIFINDER_TOP[]
import asyncio
import functools
import sys
from asyncio.trsock import TransportSocket
from typing import cast

from charindex import InvertedIndex, format_results  # <1> formatting useful for a TUI via a CLI telnet session

CRLF = b'\r\n'
PROMPT = b'?> '

async def finder(index: InvertedIndex,          # <2> the server expects a coro/function that takes only the reader and writer args; that's why we wrap it in a partial
                 reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    client = writer.get_extra_info('peername')  # <3> remote client addr
    while True:  # <4> handles a dialog until we get a control char (see break stmt below)
        writer.write(PROMPT)  # can't await!  # <5> this is not a coro, just a plain function
        await writer.drain()  # must await!   # <6> flushes the writer buffer; it's a coro, so it must be driven with await
        data = await reader.readline()  # <7> coro that returns bytes
        if not data:  # <8> no bytes => client closed the connection => break the loop
            break
        try:
            query = data.decode().strip()  # <9> bytes-to-str decoding
        except UnicodeDecodeError:  # <10> replace with the null char for simplicity (e.g. a keyboard interrupt sends control bytes that can't be decoded to str)
            query = '\x00'
        print(f' From {client}: {query!r}')  # <11> log stmt
        if query:
            if ord(query[:1]) < 32:  # <12> exit the loop on a control or null char
                break
            results = await search(query, index, writer)  # <13> delegate to the search coro
            print(f'   To {client}: {results} results.')  # <14>
    writer.close()  # <15> close the writer stream
    await writer.wait_closed()  # <16> wait for the stream to close
    print(f'Close {client}.')  # <17> log
# end::TCP_MOJIFINDER_TOP[]

# tag::TCP_MOJIFINDER_SEARCH[]
async def search(query: str,  # <1> has to be a coro because it writes to a StreamWriter and uses its .drain() coro method
                 index: InvertedIndex,
                 writer: asyncio.StreamWriter) -> int:
    chars = index.search(query)  # <2> query the inverted index
    lines = (line.encode() + CRLF for line  # <3> genexp yields char, name, and CRLF
             in format_results(chars))
    writer.writelines(lines)  # <4> SURPRISE! this is NOT a coro
    await writer.drain()      # <5> SURPRISE! this IS a coro
    status_line = f'{"─" * 66} {len(chars)} found'  # <6> status line to be written
    writer.write(status_line.encode() + CRLF)
    await writer.drain()
    return len(chars)
# end::TCP_MOJIFINDER_SEARCH[]

# tag::TCP_MOJIFINDER_MAIN[]
async def supervisor(index: InvertedIndex, host: str, port: int) -> None:
    server = await asyncio.start_server(   # <1> gets a server instance, created and started, ready to receive connections
        functools.partial(finder, index),  # <2> client_connected_cb: a fn/coro callback that will be called with a stream reader and stream writer
        host, port)  # <3>

    socket_list = cast(tuple[TransportSocket, ...], server.sockets)  # <4> because the typeshed type is outdated
    addr = socket_list[0].getsockname()
    print(f'Serving on {addr}. Hit CTRL-C to stop.')  # <5>
    await server.serve_forever()  # <6> suspends the supervisor; without this, supervisor returns immediately

def main(host: str = '127.0.0.1', port_arg: str = '2323'):
    port = int(port_arg)
    print('Building index.')
    index = InvertedIndex()  # <7> the index gets built
    try:
        asyncio.run(supervisor(index, host, port))  # <8> starts the event loop that drives the supervisor coro
    except KeyboardInterrupt:  # <9> catch CTRL-C
        print('\nServer shut down.')

if __name__ == '__main__':
    main(*sys.argv[1:])
# end::TCP_MOJIFINDER_MAIN[]
```
IDIOM @ finder, callout 2: use functools.partial to bind the index parameter and obtain a callable that takes only the reader and writer. Adapting user functions to callback APIs is the most common use case for functools.partial (a minimal sketch of this idiom follows the next point).
how multiple clients can be served at once:
While the event loop is alive, a new instance of the finder coroutine will be started for each client that connects to the server.
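To make the partial idiom concrete, here is a minimal, self-contained sketch; callback_api and handler are hypothetical stand-ins for asyncio.start_server and finder:

```python
import functools


def callback_api(cb):
    """Hypothetical API that only accepts a two-argument callback."""
    cb('fake_reader', 'fake_writer')


def handler(index, reader, writer):
    """Our handler needs an extra first argument, like finder() above."""
    print(index, reader, writer)


# partial binds `index` up front; the result is a callable taking only (reader, writer)
callback_api(functools.partial(handler, {'SIGN': {'$'}}))
```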
how the keyboard interrupt works
the interrupt signal raises a KeyboardInterrupt exception from within server.serve_forever inside supervisor; the event loop dies as well
This propagates out into the main function that had been driving the event loop.
GOTCHA: StreamWriter.write is not a coro, StreamWriter.drain is a coro
some of the I/O methods are coroutines and must be driven with await, while others are simple functions. For example, StreamWriter.write is a plain function, because it just writes to a buffer. On the other hand, StreamWriter.drain — which flushes the buffer and performs the network I/O — is a coroutine, as is StreamReader.readline — but not StreamWriter.writelines!
A minimal sketch of the resulting write-then-drain pattern follows below.
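A minimal sketch of the pattern, with a hypothetical send_lines helper and assuming an already-connected StreamWriter:

```python
import asyncio


async def send_lines(writer: asyncio.StreamWriter, lines: list[bytes]) -> None:
    """Hypothetical helper illustrating the write/drain split."""
    for line in lines:
        writer.write(line + b'\r\n')  # plain function: only appends to the transport buffer
    await writer.drain()              # coroutine: flushes the buffer, performing the actual network I/O
```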
Asynchronous Iteration and Asynchronous Iterables and using async for
async with \(\implies\) works with async context managers
async for \(\implies\) works with asynchronous iterables: they implement __aiter__, which returns an async iterator, BUT __aiter__ is NOT a coro method, it's a regular method
an async iterator provides an __anext__ coro method that returns an awaitable, usually a coro object. Just like the sync counterparts, it is expected to implement __aiter__, which trivially returns self
Remember the same point about NOT mixing iterables and iterators
example:
aiopg async Postgres driver:

```python
async def go():
    pool = await aiopg.create_pool(dsn)
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:  # the cursor is the async iterator here
            await cur.execute("SELECT 1")
            ret = []
            async for row in cur:  # important NOT to block the event loop while the cursor may be waiting for additional rows
                ret.append(row)
            assert ret == [(1,)]
```

- By implementing the cursor as an asynchronous iterator, aiopg may yield to the event loop at each __anext__ call, and resume later when more rows arrive from PostgreSQL.
Asynchronous Generator Functions
Implementing and Using an async generator
Implementing an Async Iterator
class implementation for an async iterator: implement a class with __anext__ and __aiter__ (a minimal class-based sketch follows below)
simpler way to implement an async iterator: as a generator function that is async \(\implies\) async generator
write a function declared with async def and use yield in its body. This parallels how generator functions simplify the classic Iterator pattern.
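A minimal class-based sketch (the Ticker class is hypothetical), showing __aiter__ as a regular method and __anext__ as a coro that raises StopAsyncIteration when exhausted:

```python
import asyncio


class Ticker:
    """Minimal class-based async iterator: yields incrementing ints with a delay."""

    def __init__(self, count: int, delay: float = 0.1):
        self.count = count
        self.delay = delay
        self.current = 0

    def __aiter__(self):        # regular method: returns the async iterator (self)
        return self

    async def __anext__(self):  # coro method
        if self.current >= self.count:
            raise StopAsyncIteration
        await asyncio.sleep(self.delay)
        self.current += 1
        return self.current


async def main():
    async for n in Ticker(3):
        print(n)

asyncio.run(main())
```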
Usage of async generators:
Async generators can be used with async for \(\Leftarrow\) driven by async for:
- as a block statement
- as async comprehensions
We can't use typical for loops because async generators implement __aiter__ and NOT __iter__
Demo example
```python
import asyncio
import socket
from collections.abc import Iterable, AsyncIterator
from typing import NamedTuple, Optional


class Result(NamedTuple):  # <1> convenience: easier to read and debug
    domain: str
    found: bool


OptionalLoop = Optional[asyncio.AbstractEventLoop]  # <2> type alias to clean up the hinting below


async def probe(domain: str, loop: OptionalLoop = None) -> Result:  # <3>
    if loop is None:  # no current event loop handle in scope
        loop = asyncio.get_running_loop()
    try:
        await loop.getaddrinfo(domain, None)
    except socket.gaierror:
        return Result(domain, False)
    return Result(domain, True)


async def multi_probe(domains: Iterable[str]) -> AsyncIterator[Result]:  # <4> an async generator function returns an async generator object, that's why it's typed like that
    loop = asyncio.get_running_loop()
    coros = [probe(domain, loop) for domain in domains]  # <5> list of probe coros
    for coro in asyncio.as_completed(coros):  # <6> this is a classic generator, that's why we drive it with for and not async for
        result = await coro  # <7> guarded by as_completed, so no worry that it will actually block
        yield result  # <8> this yield is what makes multi_probe an async generator
```
The result is yielded by multi_probe, which is what makes multi_probe an async generator
Shortcut to the for loop:

```python
for coro in asyncio.as_completed(coros):
    yield await coro
```

TRICK: The .invalid top-level domain is reserved for testing.
see elaboration here:
The .invalid top-level domain (TLD) is reserved specifically for testing and for constructing domain names that are guaranteed to be invalid. It is defined as a special-use domain name by the IETF in [RFC 2606 (1999)](https://www.rfc-editor.org/rfc/rfc2606.html) and reserved by IANA.

Key points on the .invalid TLD reservation:
- It will never appear in the global DNS root zone, so it cannot conflict with existing or future valid TLDs.
- It is intended for tests, documentation, and examples where a domain name must obviously be invalid or guaranteed not to resolve.
- The other reserved "safe" TLDs are .test, .example, and .localhost.
- RFC 2606 states: ".invalid is intended for use in online construction of domain names that are sure to be invalid and which it is obvious at a glance are invalid."

| Domain | Purpose | Delegated in global DNS? | Use case |
|--------------|-----------------------------------|-----|--------------------------------------------------|
| .invalid | Reserved for invalid/test domains | No | Testing, documentation, avoiding domain clashes |
| .test | Reserved for testing | No | Test environments |
| .example | Reserved for examples | No | Documentation and examples |
| .localhost | Reserved for loopback services | No | Localhost network reference |

Using the async generator:
```python
#!/usr/bin/env python3
import asyncio
import sys
from keyword import kwlist

from domainlib import multi_probe


async def main(tld: str) -> None:
    tld = tld.strip('.')
    names = (kw for kw in kwlist if len(kw) <= 4)  # <1>
    domains = (f'{name}.{tld}'.lower() for name in names)  # <2>
    print('FOUND\t\tNOT FOUND')  # <3>
    print('=====\t\t=========')
    async for domain, found in multi_probe(domains):  # <4> async iterate over the async generator
        indent = '' if found else '\t\t'  # <5>
        print(f'{indent}{domain}')


if __name__ == '__main__':
    if len(sys.argv) == 2:
        asyncio.run(main(sys.argv[1]))  # <6>
    else:
        print('Please provide a TLD.', f'Example: {sys.argv[0]} COM.BR')
```
Async generators as context managers
Generators (sync and async versions) have one extra use unrelated to iteration: they can be made into context managers.
We can use the @asynccontextmanager decorator from the contextlib module
Similar to its sync counterpart @contextmanager

```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def web_page(url):  # the function to be decorated has to be an async generator
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(
        None, download_webpage, url)  # run in a separate thread in case this is a blocking function; keeps our event loop unblocked
    yield data  # this makes it an async generator
    await loop.run_in_executor(None, update_stats, url)

async with web_page('google.com') as data:
    process(data)
```

Outcome
similar to the sync version, all lines before the yield become the entry code, the __aenter__ coro method of the async context manager built by the decorator. So, when control flow comes back, the value of data will be bound to the data target variable in the async with statement.
All lines after yield become the __aexit__ coro method. Another possibly blocking call is delegated to the thread executor.
Asynchronous generators versus native coroutines
Similarities
async def for both
Differences
an async generator has a yield in its body, but a native coroutine does not
an async generator can ONLY have empty return statements, BUT a native coro may return a value other than None
async generators are NOT awaitable; they are iterables, so they are driven by async for or async comprehensions
meanwhile, native coros are awaitable. Therefore they:
can be driven by await expressions
can be passed to asyncio functions that consume awaitables (e.g. create_task)
(a small contrasting sketch follows below)
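A minimal sketch contrasting the two (answer and countdown are hypothetical names):

```python
import asyncio


async def answer() -> int:    # native coroutine: no yield, may return a value
    return 42


async def countdown(n: int):  # async generator: has yield, only bare `return` allowed
    while n:
        yield n
        n -= 1
    return                    # `return <value>` here would be a SyntaxError


async def main():
    print(await answer())                     # native coro is awaitable
    print([i async for i in countdown(3)])    # async gen is driven by async for / comprehensions

asyncio.run(main())
```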
Async Comprehensions and Async Generator Expressions
Async generator expressions
Here’s how we can define and use one:
```python
gen_found = (name async for name, found in multi_probe(names) if found)  # the async genexp builds the async generator (async iterator) object
async for name in gen_found:  # driven by the async for
    print(name)
```

- an asynchronous generator expression can be defined anywhere in your program, but it can only be consumed inside a native coroutine or asynchronous generator function.
Async comprehensions
we can have the usual kinds of comprehensions done async! We just need to make sure we are in an async context, i.e. within an async def or an async REPL console.
async listcomps: result = [i async for i in aiter() if i % 2], which is actually similar to asyncio.gather(), just a little less flexible; gather allows us to do better exception handling.
async dictcomps: {name: found async for name, found in multi_probe(names)}
async setcomps: {name for name in names if (await probe(name)).found}
the extra parentheses are needed because the attribute-access operator . (backed by __getattr__) binds tighter than await
(a runnable sketch of these comprehensions follows below)
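A self-contained sketch of async comprehensions, using a hypothetical probe/multi_probe pair in place of domainlib:

```python
import asyncio


async def probe(name: str) -> tuple[str, bool]:
    await asyncio.sleep(0)               # stand-in for real I/O
    return name, len(name) <= 5


async def multi_probe(names):
    for name in names:
        yield await probe(name)


async def main():
    names = ['python', 'go', 'rust', 'haskell']
    pairs = [pair async for pair in multi_probe(names)]        # async listcomp
    found = {name: ok async for name, ok in multi_probe(names)}  # async dictcomp
    short = {name for name in names if (await probe(name))[1]}   # setcomp awaiting per item; note the parentheses before the subscript
    print(pairs, found, short, sep='\n')

asyncio.run(main())
```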
async Beyond asyncio: Curio
async/await constructs are library agnostic
curio blogdom demo example:
```python
#!/usr/bin/env python3
from curio import run, TaskGroup
import curio.socket as socket
from keyword import kwlist

MAX_KEYWORD_LEN = 4


async def probe(domain: str) -> tuple[str, bool]:  # <1> no need to receive an event loop
    try:
        await socket.getaddrinfo(domain, None)  # <2> getaddrinfo is a top-level function of curio.socket; it's not a method of a loop object like it is in asyncio
    except socket.gaierror:
        return (domain, False)
    return (domain, True)


async def main() -> None:
    names = (kw for kw in kwlist if len(kw) <= MAX_KEYWORD_LEN)
    domains = (f'{name}.dev'.lower() for name in names)
    async with TaskGroup() as group:  # <3> core concept in curio: monitors and controls a group of tasks (coros)
        for domain in domains:
            await group.spawn(probe, domain)  # <4> spawn starts a coro, managed by a particular TaskGroup instance; the coro is wrapped by a Task within the TaskGroup
        async for task in group:  # <5> yields tasks as they complete, like as_completed
            domain, found = task.result
            mark = '+' if found else ' '
            print(f'{mark} {domain}')


if __name__ == '__main__':
    run(main())  # <6> sensible syntax
```

TaskGroup
Curio's TaskGroup is an asynchronous context manager that replaces several ad hoc APIs and coding patterns in asyncio.
above we saw how we can just drive the group and get results in order of completion, analogous to asyncio.as_completed
we can also gather them all easily:

```python
async with TaskGroup(wait=all) as g:
    await g.spawn(coro1)
    await g.spawn(coro2)
```

TaskGroup as support for structured concurrency:
it adds a constraint to concurrent programming:
a group of async tasks should have a single entry and single exit point.
as an asynchronous context manager, a TaskGroup ensures that all tasks spawned inside it are completed or cancelled, and any exceptions raised, upon exiting the enclosed block.
just like how structured programming advised against the use of GOTO statements
seems like asyncio has some partial support for structured concurrency since 3.11, e.g. with TaskGroups…
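A minimal sketch of the asyncio counterpart, assuming Python 3.11+ and a hypothetical probe stand-in instead of a real DNS lookup:

```python
import asyncio


async def probe(domain: str) -> tuple[str, bool]:
    await asyncio.sleep(0)               # stand-in for a real DNS lookup
    return domain, domain.endswith('.dev')


async def main() -> None:
    domains = ['if.dev', 'for.dev', 'del.com']
    async with asyncio.TaskGroup() as tg:          # Python 3.11+
        tasks = [tg.create_task(probe(d)) for d in domains]
    # leaving the block guarantees every task finished (or was cancelled on error)
    for task in tasks:
        domain, found = task.result()
        print('+' if found else ' ', domain)

asyncio.run(main())
```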
Curio also provides a UniversalQueue that can be used to coordinate the work among threads, Curio coroutines, and asyncio coroutines.
Type Hinting Asynchronous Objects
the return type of a native coroutine == the type of the result it produces when you await it
different from annotations for classic coroutines, where it's the 3-parameter Generator type
3 points about typing:
all the async objects are covariant on the first type parameter, which is the type of the items yielded by these objects. This aligns with "producer" / output types being covariant.
AsyncGenerator and Coroutine are contravariant on the send type parameter. That's because the send type is an input to the object, and input types are contravariant.
AsyncGenerator has no return-type parameter
typing.Generator, we realised how we could return values by hacking theStopIteration(value)and that’s how generator-enhanced classic coroutines were hacked out, which is why we could make generators operate as classic coroutines and supportyield fromNo such thing for
AsyncGeneratorAsyncGeneratorobjects don’t return values, and are completely separate from native coroutine objects, which are annotated withtyping.Coroutine
How Async Works and How It Doesn’t
Running Circles Around Blocking Calls
- I/O is painfully slow; if we use async in a disciplined manner, our servers can be high-performance
The Myth of I/O-Bound Systems
there are “I/O bound functions” but no “I/O bound systems”
any nontrivial system will have CPU-bound functions; dealing with them is the key to success in async programming
Avoiding CPU-Bound Traps
- should have performance regression tests
- important with async code, but also relevant to threaded Python code because of the GIL
- we should not wait until we OBSERVE a slowdown (by then it's too late): the direct performance hit of bad patterns is unlikely to be humanly observable until it is too late
What to do if we see a CPU-hogging bottleneck (a minimal sketch of the first option follows this list):
- delegate task to a python proc pool
- delegate task to external task queue
- avoid GIL constraints, rewrite code in Cython, C, Rust – anything that interfaces with the Python/C API
- choose to do nothing
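A minimal sketch of delegating CPU-bound work to a Python process pool via run_in_executor; fib is a hypothetical CPU hog:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def fib(n: int) -> int:  # CPU-bound: would hog the event loop if run inline
    return n if n < 2 else fib(n - 1) + fib(n - 2)


async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # run_in_executor offloads the call to a worker process and returns
        # an awaitable, so the event loop stays responsive meanwhile
        result = await loop.run_in_executor(pool, fib, 30)
    print(result)


if __name__ == '__main__':  # guard needed for ProcessPoolExecutor with the spawn start method
    asyncio.run(main())
```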
Chapter Summary
- don’t block the event loop, delegate to different processing unit (thread, proc, task queue)
Further Reading
Part V. Metaprogramming
Chapter 22. Dynamic Attributes and Properties
What’s New in This Chapter
Data Wrangling with Dynamic Attributes
- Exploring JSON-Like Data with Dynamic Attributes
- The Invalid Attribute Name Problem
- Flexible Object Creation with __new__
Computed Properties
- Step 1: Data-Driven Attribute Creation
- Step 2: Property to Retrieve a Linked Record
- Step 3: Property Overriding an Existing Attribute
- Step 4: Bespoke Property Cache
- Step 5: Caching Properties with functools
Using a Property for Attribute Validation
- LineItem Take #1: Class for an Item in an Order
- LineItem Take #2: A Validating Property
A Proper Look at Properties
- Properties Override Instance Attributes
- Property Documentation
Coding a Property Factory
Handling Attribute Deletion
Essential Attributes and Functions for Attribute Handling
- Special Attributes that Affect Attribute Handling
- Built-In Functions for Attribute Handling
- Special Methods for Attribute Handling
Chapter Summary
Further Reading
Chapter 23. Attribute Descriptors
What’s New in This Chapter
Descriptor Example: Attribute Validation
- LineItem Take #3: A Simple Descriptor
- LineItem Take #4: Automatic Naming of Storage Attributes
- LineItem Take #5: A New Descriptor Type
Overriding Versus Nonoverriding Descriptors
- Overriding Descriptors
- Overriding Descriptor Without __get__
- Nonoverriding Descriptor
- Overwriting a Descriptor in the Class
Methods Are Descriptors
Descriptor Usage Tips
Descriptor Docstring and Overriding Deletion
Chapter Summary
Further Reading
Chapter 24. Class Metaprogramming
What’s New in This Chapter
Classes as Objects
type: The Built-In Class Factory
A Class Factory Function
Introducing __init_subclass__
- Why __init_subclass__ Cannot Configure __slots__
Enhancing Classes with a Class Decorator
What Happens When: Import Time Versus Runtime
- Evaluation Time Experiments
Metaclasses 101
- How a Metaclass Customizes a Class
- A Nice Metaclass Example
- Metaclass Evaluation Time Experiment
A Metaclass Solution for Checked
Metaclasses in the Real World
- Modern Features Simplify or Replace Metaclasses
- Metaclasses Are Stable Language Features
- A Class Can Only Have One Metaclass
- Metaclasses Should Be Implementation Details
A Metaclass Hack with __prepare__
Wrapping Up
Chapter Summary
Further Reading
Legend
In these notes, I add some tags once in a while:
TRICK: pythonic tricks
IDIOM: Pythonic Idioms that are great
sometimes, I also use the tag RECIPE for similar points.
TO_HABIT: for things I should add to my own habits when I’m writing python
MISCONCEPTION: some misconception that I had had.
LANG_LIMITATION: is a point that outlines a limitation in python
TODO pending tasks
TODO Skipped Parts of the Book [0/2]
[ ] ControlFlow::Chapter18::lis.py
[ ] Metaprogramming
TODO watch list:
[ ] what makes python awesome by Raymond Hettinger
[ ] “Fear and Awaiting in Async” PyOhio 2016 by David Beazley
[ ] Advanced asyncio: Solving Real-world Production Problems using python (staff eng @ spotify)
TODO add in some recipes that will help for Leetcode [0/1]
[ ] Table slicing
Extras blog post for the book (ref)
this is where a bunch of extra material that couldn't fit in the book has been published