Skip to main content
  1. Readings/
  2. Books/
  3. Fluent Python: Clear, Concise, and Effective Programming – Luciano Ramalho/

Chapter 5. Data Class Builders

··3984 words·19 mins
Table of Contents

I think my stance on using data classes is that it should help mock things easily to come up with scaffolds which are easy to replace.

It’s interesting that the type hinting for class vs instance attributes ended up needing to use pseudoclasses specific for this purpose (ClassVar, InitVar)

What’s New in This Chapter #

Overview of Data Class Builders #

  • Problem posed:
    • __init__ constructor can become too complex if we’re just going to assign attributes from constructor parameters
  • 3 options:
    • collections.namedtuple
    • typing.NamedTuple
      • newer than namedtuple
    • @dataclass decorator from dataclasses module
  • How they work:
    • they don’t rely on inheritence
    • typing hints are there if we use NamedTuple or dataclass
    • some of them are subclasses of tuple
    • All of them use metaprogramming techniques to inject methods and data attributes into the class under construction.
    • Some of them are more updated ways of doing things: typed.NamedTuple is newer than namedtuple
  • Examples:
    • Named tuple:

      • define inline Coordinate = typing.NamedTuple('Coordinate', lat=float, lon=float)

      • defined with a class statement Although here, NamedTuple is not a superclass, it’s actually a metaclass

         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
        
              from typing import NamedTuple
              class Coordinate(NamedTuple):
                      lat: float
                      lon: float
        
                      def __str__(self):
                              ns = 'N' if self.lat >= 0 else 'S'
                              we = 'E' if self.lon >= 0 else 'W'
        
                      return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
        
    • Using dataclass

       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      
          from dataclasses import dataclass
          @dataclass(frozen=True)
          class Coordinate:
                  lat: float
                  lon: float
      
                  def __str__(self):
                      ns = 'N' if self.lat >= 0 else 'S'
                      we = 'E' if self.lon >= 0 else 'W'
      
                  return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
      

Main Features #

Mutability #

Out of the 3, only @dataclass allows us to keep the class mutable (if we need we an mark it as frozen btw).

The rest, since they are subclasses of tuple are immutable.

For the immutable ones, we can replace the object using replace functions.

NamedTuple as a metaclass customization of a class def #

Although NamedTuple appears in the class statement as a super‐ class, it’s actually not. typing.NamedTuple uses the advanced func‐ tionality of a metaclass2 to customize the creation of the user’s class.

Correctly reading type hints @ runtime #

It will be discussed in more detail later in the book

reading from annotations directly is not recom‐ mended. Instead, the recommended best practice to get that information is to call inspect.get_annotations(MyClass) (added in Python 3.10) or typing.get_type_hints(MyClass) (Python 3.5 to 3.9). That’s because those functions provide extra services, like resolving forward references in type hints.

Classic Named Tuples #

collections.namedtuple is a factory function #

So it’s possible to hack things by adding functions to this subclass.

collections.namedtuple function is a factory that builds subclasses of tuple enhanced with field names, a class name, and an informative repr.

Memory Use by collections.namedtuple #

There’s no excess mem usage because it’s the class that will store the attribute names

So it’s same space usage as a tuple.

Each instance of a class built by namedtuple takes exactly the same amount of memory as a tuple because the field names are stored in the class.

Injecting methods into the subclass #

this is a hack, shouldn’t be relied upon.

NOTE: No need to name the first arg as self if you’re hacking things by injecting methods

the first argument doesn’t need to be named self. Anyway, it will get the receiver when called as a method.

normal classes method definition, self attribute is the receiver #

just some extra information about what the receiver is in the context of defining class methods in python

In Python, the **self** parameter in class methods is the conventional name for the **receiver**—the object instance that the method is being called on. While Python documentation and community almost always use the term "self," in some other object-oriented languages (like Ruby or in theoretical OOP discussions), "receiver" is the standard term for the object that receives the method call.

### What is the "receiver"?

- The **receiver** is the specific instance of the class on which the method is invoked.
- In Python, when you define a method like `def foo(self, ...)`, the `self` parameter is automatically bound to the instance when you call `instance.foo(...)`[6][1][7].
- This allows each method to access and modify the attributes and other methods of the particular object instance, not just the class as a whole[6][1].

### How does it work?

- When you call `obj.method(args)`, Python translates this to `Class.method(obj, args)`. The object `obj` is passed as the first argument to the method, and inside the method, it is referred to as `self`[6][1][7].
- This is how methods can operate on instance-specific data: `self.attribute` refers to the attribute named `attribute` on the particular instance, not on the class or on other instances[1][6][7].

#### Example:

```python
class Car:
    def __init__(self, model):
        self.model = model  # self is the receiver

    def show(self):
        print("Model is", self.model)

audi = Car("Audi A4")
audi.show()  # 'self' inside show() refers to the audi instance
```

- Here, `audi.show()` is equivalent to `Car.show(audi)`. The `audi` object is the receiver.

### Why is the receiver important?

- It enables **object-specific state and behavior**. Each instance maintains its own data, and methods can access or mutate that data through `self`[1][6][7].
- It allows methods to call other methods on the same object, e.g., `self.other_method()`[3].
- Without the receiver, methods would not know which instance's data to operate on, and all data would have to be global or class-level.

### How does Python implement this?

- When you retrieve a method from an instance (e.g., `method = obj.method`), Python returns a **bound method**—a function object with the receiver (`obj`) already attached as its first argument[5][8].
- When you call the bound method, you only need to supply the remaining arguments; `self` is supplied automatically[5].

### Summary

- **self** is the Pythonic name for the **receiver** of a method call in a class.
- It refers to the specific instance on which the method was called.
- It gives methods access to instance-specific data and behavior, enabling true object-oriented programming in Python[1][6][7].

> “The 'self' parameter in Python class methods represents the class instance (object). Methods can access and manipulate attributes (variables) and call other methods of the same class using this special variable.”[6]

This is a foundational mechanism for encapsulation and polymorphism in Python’s object model.

[1] https://www.geeksforgeeks.org/python/self-in-python-class/
[2] https://www.reddit.com/r/learnpython/comments/k9f4q7/could_someone_explain_the_use_of_self_when_it/
[3] https://docs.python.org/3/tutorial/classes.html
[4] https://stackoverflow.com/questions/14671218/python-class-methods-changing-self
[5] https://stackoverflow.com/questions/70076256/how-do-python-handles-self-attribute-of-a-class-internally-when-the-method-is
[6] https://www.w3resource.com/python-interview/explain-the-purpose-of-the-self-parameter-in-python-class-methods.php
[7] https://blog.finxter.com/understanding-the-self-in-python-classes/
[8] https://docs.python.org/3/reference/datamodel.html
[9] https://softwareengineering.stackexchange.com/questions/422364/is-it-better-to-pass-self-or-the-specific-attributes-in-python-methods
[10] https://www.pythonmorsels.com/what-is-self/

Typed Named Tuples #

compile-time type annotations: the main feature of named tuples #

Classes built by typing.NamedTuple don’t have any methods beyond those that col lections.namedtuple also generates—and those that are inherited from tuple. The only difference is the presence of the annotations class attribute—which Python completely ignores at runtime.

Typed Named Tuples #

  • the type annotations are ignored by python at runtime

Type Hints 101 #

No Runtime Effect #

Type hints not enforced by compiler & interpreter #

  • main intent is for use by static analysis tools, at rest

The first thing you need to know about type hints is that they are not enforced at all by the Python bytecode compiler and interpreter.

Works at import time! #

that’s why importing libraries may fail.

Variable Annotation Syntax #

  • variable here refers to the fact that variables are being annotated, not that the type hint is variable.
  • the syntax is just var_name: some_type = a_default_value

The Meaning of Variable Annotations #

For classic class definitions, survival of annotations & survival of attributes within annotations #

:NOTER_PAGE: (206 . 0.086168)

This applies to the classic class definitions, without the named tuples and such.

This makes sense because there’s no reason to keep the annotations.

surviving of annotation <== if there’s a type hint given

surviving of the attribute in the class <== if there’s a value assignable

Note that the annotations special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class. The a survives only as an annotation. It doesn’t become a class attribute because no value is bound to it.6 The b and c are stored as class attributes because they are bound to values.

Annotations are type annotations for immutable attributes #

This is because NT is extended from Tuple class.

  • Contents

    If you try to assign values to nt.a, nt.b, nt.c, or even nt.z, you’ll get Attribute Error exceptions with subtly different error messages. Try that and reflect on the messages.

  • Comment

    Because it’s read-only instance attribute and it’s expected to be immutable

using the @dataclass decorator allows the attrs to persist as instance attributes #

:NOTER_PAGE: (208 . 0.488788)

  • Contents

    However, there is no attribute named a in DemoDataClass—in contrast with DemoNTClass from Example 5-11, which has a descriptor to get a from the instances as read-only attributes (that myste‐ rious <_collections._tuplegetter>). That’s because the a attribute will only exist in instances of DemoDataClass. It will be a public attribute that we can get and set, unless the class is frozen. But b and c exist as class attributes, with b holding the default value for the b instance attribute, while c is just a class attribute that will not be bound to the instances.

  • Comment

    when using a decorator, the descriptor for the class that is ONLY type-hinted will only exist in concrete instances of that class.

annotation special attr are for type hints #

annotations special attribute is created by the interpreter to record the type hints that appear in the source code—even in a plain class.

More About @dataclass #

Don’t set a custom attribute outside of its constructor function! #

:NOTER_PAGE: (209 . 0.862182)

Contents #

Setting an attribute after init defeats the dict key-sharing memory optimization mentioned in “Practical Consequences of How dict Works” on page 102.

Comment #

Reminder: all the attrs for a class should really just be defined within the class itself to benefit from the memory optimisation that it comes with by default

immutability is emulated by methods #

Which means it can be bypassed by overriding the implementation of these functions! (the settattr and deattr dunder methods)

emulates immutability by generating setattr and delattr, which raise data class.FrozenInstanceError

Field Options #

WARNING: mutable defaults are NOT allowed. #

similar to the assignment gotchas where if we do my arr = [[] * 3], reusing a mutable reference (the inner list) means that the 3 instances all point to the same memory location

we can how that would be a problematic bug

therefore, it’s illegal to set default values that are mutable when we use dataclasses.

we can use default_factory as a solution to this.

default_factory helps prevent mutability bugs #

  • if a default value is provided that is mutable, then it would mean that many instances can edit the same mutable handle ==> this is a problematic bug. That’s why the default option is only to pass a factory function if you want to assign mutable default values so that each mutable default is a separate reference.

  • but this won’t apply to custom mutable objects, that’s why it’s a common source of mutable data related bugs l

The default_factory parameter lets you provide a function, class, or any other call‐ able, which will be invoked with zero arguments to build a default value each time an instance of the data class is created. This way, each instance of ClubMember will have its own list—instead of all instances sharing the same list from the class, which is rarely what we want and is often a bug.

mental model for sentinel values #

``sentinel value''

### Understanding Sentinel Values

**Sentinel values** are special values used in programming to signify that a variable or parameter is missing, not set, or has a specific condition that needs to be distinguished from valid values. They serve as markers that help identify states in data structures or function arguments without conflicting with legitimate data.

#### Key Characteristics of Sentinel Values

1. **Uniqueness**: A sentinel value is typically unique and not used as a regular value in the context where it is applied. This uniqueness allows it to serve as a clear indicator of absence or a specific condition.

2. **Common Usage**:
   - **Function Arguments**: In functions, sentinel values can indicate that an optional parameter was not provided. For example, in Python, you might use `None` or a custom sentinel object to differentiate between "no value" and "a value of None".
   - **Data Structures**: In data structures like linked lists or trees, sentinel nodes may be used to simplify boundary conditions (e.g., head and tail pointers).

3. **Examples**:
   ```python
   # Using None as a sentinel
   def fetch_data(key, default=None):
       if key not in my_dict:
           return default  # Return the sentinel if key is missing
       return my_dict[key]

   # Using a custom sentinel
   MISSING = object()
   def get_value(key, default=MISSING):
       if key not in my_dict:
           if default is MISSING:
               raise KeyError(f"{key} not found")
           return default
       return my_dict[key]
   ```

### Mental Model for Sentinel Values

To form a mental model for sentinel values, consider the **concept of markers or flags** in various contexts:

1. **Etymological Basis**: The term "sentinel" originates from the Latin word "sentinella," meaning "to watch." Just as sentinels guard and signal the presence or absence of something important, sentinel values act as indicators in programming. They help you "watch" for specific conditions in your code.

2. **Mental Model**: Think of sentinel values as **flags on a map**:
   - Imagine a treasure map where certain locations are marked with flags indicating whether they contain treasure (valid data) or are empty (missing data).
   - Just like those flags help you navigate the map without confusion, sentinel values guide your logic by clearly indicating when something is absent or when a specific condition applies.

3. **Practical Application**: When designing functions or data structures, consider how you can implement sentinel values to handle edge cases gracefully. This approach can prevent errors and make your code more robust by explicitly managing conditions that would otherwise lead to ambiguity.

### Summary

Sentinel values are special markers used to indicate missing or special conditions in programming. They provide clarity and help manage edge cases effectively. By conceptualizing them as flags on a map, you can better understand their role in guiding logic and maintaining code integrity. This mental model emphasizes the importance of clear indicators in navigating complex data scenarios, much like sentinels watch over critical points of interest.

Citations:
[1] https://www.reddit.com/r/slatestarcodex/comments/83phqz/what_mental_models_do_you_find_to_be_most_helpful/
[2] https://fs.blog/mental-models/
[3] https://pmc.ncbi.nlm.nih.gov/articles/PMC8290163/
[4] https://jamesclear.com/feynman-mental-models
[5] https://www.linkedin.com/pulse/10x-your-results-mental-models-mark-green
[6] https://forum.valuepickr.com/t/scalability-thinking-mental-models/104746
[7] https://www.safalniveshak.com/latticework-mental-models-active-reading/
[8] https://learn.microsoft.com/ja-jp/globalization/encoding/unicode-standard

Post-init Processing #

Allows us to insert logic after the contructor, to do things like calculations and such.

Highlight on page 213 #

  • Contents

    method generated by @dataclass only takes the arguments passed and assigns them—or their default values, if missing—to

Highlight on page 213 #

  • Contents

    Common use cases for post_init are validation and computing field values based on other fields.

Typed Class Attributes #

Need to use ClassVar from the typing module.

This is a pseudotype Read more here:

Yes, **this is still the standard and recommended way to declare class variables with type hints in the most recent versions of Python (as of Python 3.13)**.

### Why is `ClassVar` Needed?

- By default, when you annotate a variable inside a class, **type checkers (like mypy, pyright, or the built-in `dataclasses` module) assume it is an instance variable**—that is, each instance of the class will have its own copy.
- If you want to indicate that a variable is a **class variable** (shared across all instances), you must use `typing.ClassVar` in your annotation[1][2].
- This is especially important for tools like `@dataclass`, which will otherwise treat all annotated variables as fields to be included in the instance's `__init__`.

### The Syntax

```python
from typing import ClassVar

class Example:
    all_handles: ClassVar[set[str]] = set()
```
- This tells both the type checker and `@dataclass` (if used) that `all_handles` is a class attribute, not an instance attribute.
- The type inside `ClassVar[...]` can be any valid type, including generics like `set[str]`.

### Why is this necessary/ugly?

- The reason for this "pseudotype" is that Python's variable annotations (PEP 526) are designed to be explicit and unambiguous for static analysis and code generation tools.
- Without `ClassVar`, type checkers and dataclasses would treat the variable as an instance field, which is almost always wrong for shared state.

### Is this still the case in Python 3.13+?

**Yes.**
- The [official Python docs](https://docs.python.org/3/library/typing.html) (as of Python 3.13.5) still require `ClassVar` for class variables in type annotations[2].
- This is also reflected in community best practices and in the implementation of type checkers and the standard library[1][2][3][4].

### Example with `@dataclass`

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Example:
    all_handles: ClassVar[set[str]] = set()  # Not included as a dataclass field
    name: str  # This is an instance variable
```
- `all_handles` is not included in the generated `__init__`, nor in the instance's `__dict__`.

### Related: `Final` and `ClassVar`

- If you want a class variable to be a constant, you can use `Final` from `typing`, but **you cannot combine `Final` and `ClassVar`** in the annotation (see [1]). You must choose one or the other, as combining them is not supported by type checkers.

### References to Search Results

- [PEP 526 and usage of ClassVar](https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python)[1]
- [Python 3.13 typing documentation](https://docs.python.org/3/library/typing.html)[2]
- [Class variables in Python](https://pynative.com/python-class-variables/)[5]

**In summary:**
Declaring class variables with type hints using `ClassVar[...]` is still required and is the correct, modern, and supported approach in all current and foreseeable future versions of Python. This ensures type checkers and dataclasses handle your class variables as intended.

[1] https://stackoverflow.com/questions/66285078/how-to-define-final-classvar-variable-in-python
[2] https://docs.python.org/3/library/typing.html
[3] https://github.com/python/typing/discussions/1424
[4] https://programming-25.mooc.fi/part-9/5-class-attributes/
[5] https://pynative.com/python-class-variables/
[6] https://docs.python.org/3/tutorial/classes.html
[7] https://www.digitalocean.com/community/tutorials/understanding-class-and-instance-variables-in-python-3
[8] https://realpython.com/python-variables/
[9] https://programming-25.mooc.fi/part-8/3-defining-classes/
[10] https://github.com/python/typing/discussions/1636

Initialization Variables That Are Not Fields #

@dataclass Example: Dublin Core Resource Record #

Data Class as a Code Smell #

Data Class as Scaffolding #

value: dataclasses are a good way to do temp wire-ups / stubs #

In this scenario, the data class is an initial, simplistic implementation of a class to jump-start a new project or module. With time, the class should get its own methods, instead of relying on methods of other classes to operate on its instances. Scaffolding is temporary;

Data Class as Intermediate Representation #

Pattern Matching Class Instances #

pattern matching in python is new #

  • It’s very similar to elixir’s pattern matching and is different from a typical case-switch construct

  • it typically uses the __match_args__ dunder declaration, will be discussed in a later part of the book

Here’s some details on it

Python's `match-case` syntax, introduced in Python 3.10, offers a powerful alternative to traditional `switch-case` constructs found in other programming languages like C, C++, and Java. Here’s a detailed comparison of Python's `match-case` with traditional `switch-case` syntax.

### Key Differences Between Python's `match-case` and Traditional `switch-case`

1. **Pattern Matching vs. Value Matching**:
   - **Python's `match-case`**: Supports **pattern matching**, which means it can match complex data structures, such as lists, tuples, and even class instances. It allows for destructuring and extracting values from these structures.
     ```python
     match some_value:
         case (x, y):  # Matches a tuple with two elements
             print(f"Matched a tuple with x={x} and y={y}")
         case _:
             print("No match")
     ```
   - **Traditional `switch-case`**: Typically only matches against scalar values (like integers or strings) and does not support destructuring. It evaluates the expression and compares it against constant cases.
     ```c
     switch (value) {
         case 1:
             printf("One");
             break;
         case 2:
             printf("Two");
             break;
         default:
             printf("Default case");
     }
     ```

2. **Wildcards and Default Cases**:
   - **Python's `match-case`**: Uses the underscore (`_`) as a wildcard to catch all unmatched cases, similar to an `else` statement.
   - **Traditional `switch-case`**: Uses a `default` case for handling unmatched values, but it requires explicit declaration.

3. **Multiple Patterns**:
   - **Python's `match-case`**: Allows combining multiple patterns using the pipe operator (`|`) for cases that should execute the same block of code.
     ```python
     match day:
         case "Saturday" | "Sunday":
             print("It's the weekend!")
         case _:
             print("It's a weekday.")
     ```
   - **Traditional `switch-case`**: Requires separate cases for each value or uses fall-through behavior (if not explicitly handled with `break`).

4. **No Break Statements Needed**:
   - **Python's `match-case`**: Automatically exits after executing the matched case block, eliminating the need for `break` statements to prevent fall-through.
   - **Traditional `switch-case`**: Requires explicit use of `break` to prevent fall-through to subsequent cases.

5. **Guard Conditions**:
   - **Python's `match-case`**: Supports guard conditions using an `if` statement within the case clause to add additional checks.
     ```python
     match details:
         case [amt, duration] if amt < 10000:
             return amt * 0.1 * duration
         case [amt, duration] if amt >= 10000:
             return amt * 0.15 * duration
     ```
   - **Traditional `switch-case`**: Does not natively support guard conditions; you would need to use additional if-else statements.

### Summary

- Python's `match-case` syntax is more flexible and powerful than traditional `switch-case`, allowing for complex pattern matching and destructuring of data structures.
- It simplifies code by removing the need for break statements and supports more expressive patterns through guards and multiple patterns.
- While both constructs serve similar purposes in controlling flow based on variable values, Python's approach aligns more closely with modern programming paradigms that emphasize readability and expressiveness.

In conclusion, while Python's `match-case` serves a similar purpose to traditional switch-case statements in other languages, it introduces significant enhancements that make it more versatile and easier to use in many scenarios.

Citations:
[1] https://www.geeksforgeeks.org/python-match-case-statement/
[2] https://www.tutorialspoint.com/python/python_matchcase_statement.htm
[3] https://www.youtube.com/watch?v=L7tT0NZF-Ag
[4] https://www.datacamp.com/tutorial/python-switch-case
[5] https://discuss.python.org/t/providing-a-shorthand-match-case-statement/21421
[6] https://stackoverflow.com/questions/74655787/match-case-statement-with-multiple-or-conditions-in-each-case
[7] https://www.youtube.com/watch?v=prB2lfuPDAc
[8] https://docs.python.org/pt-br/3.13/whatsnew/3.10.html

Designed to match classes instances by types and by attrs #

Contents #

Class patterns are designed to match class instances by type and—optionally—by attributes. The subject of a class pattern can be any class instance, not only instances of data classes.10

Simple Class Patterns #

Keyword Class Patterns #

Captures also work with this syntax #

  • Contents

    Keyword class patterns are very readable, and work with any class that has public instance attributes, but they are somewhat verbose.

Positional Class Patterns #

The pattern for an attribute can be defined positionally as well.

Named collectors / captures still work with this.

Chapter Summary #

Dataclasses as a code smell #

Contents #

warned against possible abuse of data classes defeating a basic principle of object-oriented programming: data and the functions that touch it should be together in the same class. Classes with no logic may be a sign of misplaced logic.

Further Reading #

Highlight on page 228 #

Contents #

Finally, if you want to annotate that class attribute with a type, you can’t use regular types because then it will become an instance attribute. You must resort to that pseu‐ dotype ClassVar annotation:

Underline on page 228 #

Contents #

Here we are