Chapter 11. A Pythonic Object

Learning Objectives:

builtins that convert objects to other types
alternative constructors
extending the formatting mini language

What’s New in This Chapter #

Object Representations #

Python offers several standard ways to get a representation of an object. All of them except bytes() should return Unicode strings:
- repr() relies on __repr__: the object as the developer wants to see it
- str() relies on __str__: the object as the user wants to see it
- bytes() relies on __bytes__: the object as a byte sequence
- format() relies on __format__: used by f-strings and by the str.format() method
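
The four hooks listed above can be sketched on a toy class. `Celsius` and its attribute `degrees` are hypothetical names invented for this illustration; they are not part of the book's example.

```python
class Celsius:
    """Hypothetical temperature wrapper, used only to show the four hooks."""

    def __init__(self, degrees):
        self.degrees = float(degrees)

    def __repr__(self):
        # Developer's view: looks like the constructor call
        return f'Celsius({self.degrees!r})'

    def __str__(self):
        # User's view
        return f'{self.degrees} °C'

    def __bytes__(self):
        # Byte-sequence view (an arbitrary encoding chosen for this sketch)
        return str(self.degrees).encode('ascii')

    def __format__(self, fmt_spec):
        # Delegate the spec to float formatting, then append the unit
        return format(self.degrees, fmt_spec) + ' °C'


t = Celsius(21.5)
print(repr(t))      # Celsius(21.5)
print(str(t))       # 21.5 °C
print(bytes(t))     # b'21.5'
print(f'{t:.2f}')   # 21.50 °C
```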

Vector Class Redux #

This example is a useful reference for the rest of the chapter.

Some notes from the example:

__iter__ is what makes the object iterable, so that we can do things like:
- unpacking and so on
We can implement it with a generator expression that yields the components one after the other.
KIV (keep in view): the implementation of __eq__ and operator overloading, until they are covered later.
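
A minimal sketch of the `__iter__` idea, assuming a hypothetical two-component class named `Point` (not the book's `Vector2d`):

```python
class Point:
    """Hypothetical two-component class; only __iter__ matters here."""

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        # Generator expression yielding the components one after the other
        return (c for c in (self.x, self.y))


p = Point(3, 4)
x, y = p              # unpacking works because Point is iterable
print(x, y)           # 3.0 4.0
print(tuple(p))       # (3.0, 4.0)
```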

An Alternative Constructor #

@classmethod makes a method receive the class itself as its first argument, so it can be called on the class without an instance.

  @classmethod
  def frombytes(cls, octets):
      typecode = chr(octets[0])  # the first byte holds the array typecode
      memv = memoryview(octets[1:]).cast(typecode)
      return cls(*memv)  # unpack the memoryview into the constructor arguments

Note that this is a class method, not a static method. Here is an outline of the differences; the next section covers them in more detail.

  Class methods and static methods are different in Python: they serve distinct purposes and have different access to class and instance data.

  - **Class methods** are defined using the `@classmethod` decorator. Their first parameter is `cls`, which refers to the class itself (not an instance). Class methods can access and modify class-level attributes and are often used for factory methods or operations that affect the class as a whole. Because they receive no instance, they have no direct access to instance-specific data[1][2][5][6][8].

  - **Static methods** are defined using the `@staticmethod` decorator. They do not take `self` or `cls` as their first argument and have no automatic access to either instance or class state. Static methods are utility functions, included in a class for organizational purposes, but they behave like plain functions that happen to live in a class’s namespace. They are not handed the class or an instance, so they can reach class data only by naming the class explicitly[1][2][4][5][6][8].

  ### Key Differences

  | Aspect                     | Class Method                  | Static Method                      |
  |----------------------------|-------------------------------|------------------------------------|
  | Decorator                  | `@classmethod`                | `@staticmethod`                    |
  | First parameter            | `cls` (the class)             | None required                      |
  | Accesses class data?       | Yes, via `cls`                | No                                 |
  | Accesses instance data?    | No                            | No                                 |
  | Use case                   | Factory methods, class state  | Utility functions                  |
  | Can modify class state?    | Yes                           | No                                 |
  | Can modify instance state? | No                            | No                                 |

  #### Example Usage

  ```python
  class Example:
      count = 0

      @classmethod
      def increment_count(cls):
          cls.count += 1

      @staticmethod
      def add(x, y):
          return x + y
  ```

  - `increment_count()` can change `Example.count`.
  - `add(2, 3)` simply returns the sum and does not access or change any class or instance data[2][5][6][8].

  In summary: **Class methods operate on the class and its state; static methods are independent and do not touch class or instance data.**

  [1] https://realpython.com/instance-class-and-static-methods-demystified/
  [2] https://www.geeksforgeeks.org/python/class-method-vs-static-method-python/
  [3] https://stackoverflow.com/questions/136097/what-is-the-difference-between-staticmethod-and-classmethod-in-python
  [4] https://www.python-engineer.com/posts/difference-classmethod-and-staticmethod/
  [5] https://www.linkedin.com/pulse/exploring-differences-between-class-methods-static-python
  [6] https://accuweb.cloud/resource/articles/class-vs-static-method-in-python
  [7] https://www.boardinfinity.com/blog/class-and-static-method-in-python-differences/
  [8] https://flexiple.com/python/class-method-vs-static-method
  [9] https://www.youtube.com/watch?v=PIKiHq1O9HQ

`@classmethod` Versus `@staticmethod` #

classmethod: defines a method that operates on the class rather than on instances. It changes the way the method is called, so that it receives the class itself as the first argument instead of an instance.
A good use case is an alternative constructor, like frombytes here, or any operation that works at the class level.
NOTE: coming from Java, this looks like it maps directly onto Java's static methods.
HOWEVER, @classmethod in Python is not the same as a Java static method. Python class methods are more powerful: they receive the class as an argument, so they are aware of inheritance and can instantiate the correct subclass. Java static methods cannot directly do this, since they are not polymorphic in the same way and are resolved at compile time.
staticmethod: changes a method so that it receives no special first argument. In essence, a static method is just a plain function that happens to live in a class body instead of being defined at the module level.
It gets no automatic access to the class or to instance data.
There don't seem to be many good use cases for it.
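
The inheritance point above can be sketched as follows. `Shape`, `Square`, `default` and `describe` are hypothetical names made up for this illustration.

```python
class Shape:
    def __init__(self, name):
        self.name = name

    @classmethod
    def default(cls):
        # cls is whichever class the method was called on,
        # so subclasses get instances of their own type
        return cls('unnamed')

    @staticmethod
    def describe(sides):
        # No cls, no self: just a function living in the class namespace
        return f'{sides} sides'


class Square(Shape):
    pass


print(type(Shape.default()).__name__)    # Shape
print(type(Square.default()).__name__)   # Square
print(Square.describe(4))                # 4 sides
```

Calling `Square.default()` builds a `Square` without `Square` overriding anything, which is the behavior that makes class methods a natural fit for alternative constructors.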

Formatted Displays #

The different invokers (format(), f-strings, str.format()) delegate the formatting logic to the dunder method __format__(format_spec).
There are two ways to supply the format specifier:
1. as the second argument in format(my_obj, format_spec)
2. inside a replacement field (delimited by {}) in an f-string or in the fmt string of fmt.format().
  e.g. '1 BRL = {rate:0.2f} USD'.format(rate=brl)
  In this example, rate is not part of the specifier; it is the keyword argument named by the replacement field. The actual specifier is just '0.2f'.
  In an f-string the field can hold an expression directly: f'1 USD = {1 / brl:0.2f} BRL' (here the field is the expression 1 / brl).
Replacement fields:
A format string that looks like '{0.mass:5.3e}' has 2 separate notations:
1. field name:
  0.mass, to the left of the colon, is the field_name of the replacement field.
  In an f-string it can be an arbitrary expression.
2. format specifier:
  5.3e, after the colon, is the format specifier.
  It is written in the Format Specification Mini-Language (ref).
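
A short sketch of the two ways to pass a specifier and of the field-name/specifier split. The value of `brl` and the `planet` object are assumptions chosen for the illustration.

```python
from types import SimpleNamespace

brl = 0.2  # hypothetical BRL -> USD rate, chosen for easy arithmetic

# 1. Specifier as the second argument of format()
print(format(brl, '0.2f'))                          # 0.20

# 2. Specifier inside a replacement field
print('1 BRL = {rate:0.2f} USD'.format(rate=brl))   # 1 BRL = 0.20 USD
print(f'1 USD = {1 / brl:0.2f} BRL')                # 1 USD = 5.00 BRL

# Field name '0.mass' (attribute of positional argument 0) plus specifier '5.3e'
planet = SimpleNamespace(mass=5.972e24)  # hypothetical object with a mass attribute
print('{0.mass:5.3e}'.format(planet))               # 5.972e+24
```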

FormatSpec Structure #

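f-strings vs str.format():
- f-strings evaluate the expressions in place, right where the string is written
- str.format() keeps the template separate from the values, which makes it more of an interpolation. This is useful when the template is defined in one place and filled in elsewhere.

A bot dump on the topic follows.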

The **format specifier** (or `format_spec`) is a mini-language used inside replacement fields of Python format strings (e.g., in `'{0.mass:5.3e}'`) to control how the formatted value is presented.

### Breakdown of the Notation:

- **Field name:**
The part before the colon (`:`), like `0.mass` in the example, specifies what value to format. It can be a positional argument (`0`), an attribute access (`mass`), or an arbitrary expression in f-strings.

- **Format specifier:**
The part after the colon (`5.3e` in the example) is the **format spec mini-language** that instructs Python how to render the value. This includes alignment, width, precision, type (e.g., scientific notation), padding, and more.

### About the Format Spec Mini-Language

The format spec language is defined in Python’s string formatting documentation and includes syntax like:

```
[[fill]align][sign][#][0][width][,][.precision][type]
```

Where:

- **fill**: Any character used to fill the padding space.
- **align**: `<` (left), `>` (right), `=` (pad after sign), `^` (center).
- **sign**: `+`, `-`, or space for numeric sign display.
- **#**: Alternate form (e.g., include `0x` for hex).
- **0**: Enables zero-padding.
- **width**: Minimum field width as a number.
- **,**: Enables thousands separator.
- **.precision**: Number of decimal places (or significant digits).
- **type**: Type of conversion, such as:
  - `e` or `E` for scientific notation,
  - `f` or `F` for fixed-point,
  - `g` or `G` for general format,
  - `d` for decimal integer,
  - `s` for string, etc.

The example `{0.mass:5.3e}` formats the `mass` attribute of the first positional argument in scientific notation (`e`), with a minimum field width of 5 characters and 3 digits after the decimal point.
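
A few of the pieces of the mini-language described above, applied to built-in types. The sample values are arbitrary and only serve the illustration.

```python
value = 1234.5678

print(format(value, '.2f'))     # 1234.57      precision + fixed-point
print(format(value, ',.1f'))    # 1,234.6      thousands separator
print(format(value, '.3e'))     # 1.235e+03    scientific notation
print(format(42, '*^10d'))      # ****42****   fill '*', centered, width 10
print(format(42, '+08d'))       # +0000042     sign, zero-padding, width 8
print(format(255, '#x'))        # 0xff         alternate form for hex
print(repr(format('hi', '>6s')))  # '    hi'   right-aligned string

# Width is a minimum: '5.3e' needs 9 characters here, so nothing is truncated
print(format(1.5, '5.3e'))      # 1.500e+00
```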

### Practical Use

- Format specs can control numeric formatting (precision, notation).
- They can align text inside fixed-width fields.
- Support thousand separators and padding.
- Allow highly readable and localized string outputs.

### References and Deep Dive

- Official Python docs on [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#formatspec)
- Real Python tutorial on [Python’s Format Mini-Language](https://realpython.com/python-format-mini-language/)
- Examples for format spec syntax are available in many Python formatting guides.


The Format Specification Mini-Language is extensible because each class gets to interpret the format_spec argument as it likes.

Enhancing/extending FormatSpec for our custom objects #

Extending the mini-language is as simple as branching (if/else) on the format spec that __format__ receives.

It is good practice to remove the custom 'p' suffix from fmt_spec, so that the subsequent format() calls on the individual components are not confused by it.

Thus, to preserve compatibility with the built-in format machinery, the custom 'p' suffix must be removed before passing fmt_spec down to format().

Elaboration:

    This part analyzes the code snippet and the rationale behind removing the custom `'p'` suffix from `fmt_spec` in the `__format__` method.

    ### Context Recap: What Is Happening Here?

    The `__format__` method is customized in a class that represents a 2D vector with components, a magnitude, and an angle:

    ```python
    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]
            coords = (abs(self), self.angle())
            outer_fmt = '<{}, {}>'
        else:
            coords = self
            outer_fmt = '({}, {})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(*components)
    ```

    - **`fmt_spec`**: The format specification string passed when formatting, e.g., in an f-string.
    - **Custom `'p'` suffix:** If the format spec ends with `'p'`, the method interprets it as a special request to format the vector in *polar coordinates* — providing `(r, θ)` instead of `(x, y)`.
    - If `'p'` is detected, it is **stripped off** (`fmt_spec = fmt_spec[:-1]`), so it doesn't confuse the subsequent formatting calls on the individual float components.

    ### Why Remove the Custom `'p'` Suffix from `fmt_spec`?

    #### 1. **Format specifiers need to be valid for the inner formatting calls**

    - The method eventually calls `format(c, fmt_spec)` on each component, where each `c` is a float (radius and angle) or float-like.
    - The Python built-in formatter (`format()` function) and the numeric format mini-language **do not recognize `'p'`** as a valid format code.
    - If `'p'` were left in the format spec string, calling `format(float_value, fmt_spec)` would raise a `ValueError` because `'p'` is not a defined type specifier.

    Thus, to **preserve compatibility with the built-in format machinery**, the custom `'p'` suffix must be removed **before** passing the `fmt_spec` down to `format()`.

    #### 2. **Maintaining a clean interface for users**

    - The `'p'` suffix acts as a *signal* to the `__format__` method that **polar formatting** should be used.
    - By removing it early, the code simply leverages the underlying float formatting logic for the numeric components while exposing a neat, domain-specific format feature.
    - This lets users write something like `format(vec, '.2fp')`, meaning "format this vector in polar coordinates, with two decimal places," where `'.2f'` is the standard float format and `'p'` is the custom extension.

    #### 3. **Separation of concerns**

    - The `'p'` flag is handled at the level of `__format__` **only** to choose the representation (Cartesian vs polar).
    - The output formatting of numeric components is delegated to the built-in or standard format spec mini-language.
    - Removing `'p'` keeps these concerns cleanly separated and avoids leaking the custom format logic into the lower formatting layers.

    ### Mental Model: Decorator Analogy to Formatter Chains

    We can think of this like a decorator on the `format` function:

    - The outer function (`__format__`) decorates or wraps the usual formatting logic.
    - It handles the special parameter (`'p'`) that changes *what* is formatted (polar vs Cartesian).
    - Then it **cleans** the format spec for the underlying calls that format numeric components, passing on only the valid portion (`.2f`, `.3e`, etc.).
    - This preserves **compatibility, safety, and separation** between the custom feature and standard behavior.

    ### Summary

    | Reason                                   | Explanation                                 |
    |------------------------------------------|---------------------------------------------|
    | Prevent invalid format spec errors       | `'p'` is **not valid** in built-in float format codes; must be removed before numeric formatting calls. |
    | Clean user interface for extended formatting | `'p'` is a **custom flag** for polar representation, distinct from numeric formatting codes. |
    | Separation of concerns                    | Handle polar/cartesian choice separately from numeric formatting of components. |
    | Avoid breaking the built-in `format()` contract | Passing `'p'` through would make the built-in float formatting raise `ValueError`. |

    ### Optional: What Happens if You Don’t Strip `'p'`?

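    If the code passes the spec on with `'p'` still attached, the built-in float formatting rejects it. With a bare `'p'`, e.g. `format(1.23, 'p')`:

    - Python raises a `ValueError` like:

      ```
      ValueError: Unknown format code 'p' for object of type 'float'
      ```

    Because `p` is an unknown format type in the standard format mini-language. A longer spec such as `'5.2fp'` is also rejected with a `ValueError`, although the message is worded differently (it reports an invalid format specifier).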

    **In essence, the removal of the custom 'p' suffix from the format spec string ensures that this domain-specific extension coexists safely with Python’s built-in formatting system, maintaining robustness and good user experience.**

We should avoid confusing the users of the custom object: do not reuse a format code letter that already has a meaning for similar types (check the Format Specification Mini-Language to see which codes are already taken).
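
A trimmed-down, runnable sketch of the 'p' extension discussed above. It keeps only the methods that `__format__` depends on; the full class from the chapter has more. The final `try` block shows why the suffix has to be stripped before delegating to the float formatter.

```python
import math


class Vector2d:
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        return (c for c in (self.x, self.y))

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def angle(self):
        return math.atan2(self.y, self.x)

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]            # strip the custom code first
            coords = (abs(self), self.angle())  # polar: magnitude, angle
            outer_fmt = '<{}, {}>'
        else:
            coords = self                       # Cartesian: x, y
            outer_fmt = '({}, {})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(*components)


v = Vector2d(1, 1)
print(format(v, '.2f'))     # (1.00, 1.00)
print(format(v, '.3ep'))    # <1.414e+00, 7.854e-01>
print(f'{v:0.5fp}')         # <1.41421, 0.78540>

# Floats do not know the custom code, so it must never reach them
try:
    format(1.0, 'p')
except ValueError as exc:
    print('float rejected it:', exc)
```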

A Hashable Vector2d #

What hashability unlocks:
- we can keep sets of our custom objects
- we can use our custom objects as dict keys
How?
- implement __hash__ and __eq__, and make the object immutable.
  Technically, immutability (protecting access through private attributes) is not a strict requirement; what matters is that the hash never changes while the object is in use.
- for immutability, convert the fields to private attributes (two leading underscores) and add @property-decorated getters for them.
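
The recipe above can be sketched with a reduced version of the class, keeping only what hashing needs: read-only properties, `__eq__`, and `__hash__`.

```python
class Vector2d:
    def __init__(self, x, y):
        self.__x = float(x)   # "private" storage, exposed read-only below
        self.__y = float(y)

    @property
    def x(self):
        return self.__x

    @property
    def y(self):
        return self.__y

    def __eq__(self, other):
        if isinstance(other, Vector2d):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally, so hash the same tuple __eq__ compares
        return hash((self.x, self.y))


v1 = Vector2d(3, 4)
v2 = Vector2d(3, 4)
print(len({v1, v2}))      # 1, equal vectors collapse in a set
lookup = {v1: 'first'}
print(lookup[v2])         # first

try:
    v1.x = 7              # no setter defined, so assignment fails
except AttributeError as exc:
    print('read-only:', exc)
```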

Supporting Positional Pattern Matching #

What are positional patterns?

The examples here are keyword patterns, not positional ones, because each attribute is named explicitly:

  def keyword_pattern_demo(v: Vector2d) -> None:
      match v:
          case Vector2d(x=0, y=0):
              print(f'{v!r} is null')
          case Vector2d(x=0):
              print(f'{v!r} is vertical')
          case Vector2d(y=0):
              print(f'{v!r} is horizontal')
          case Vector2d(x=x, y=y) if x == y:
              print(f'{v!r} is diagonal')
          case _:
              print(f'{v!r} is awesome')

A positional pattern would look something like this: case Vector2d(_, 0):

To support it, add a class attribute named __match_args__ that lists the instance attributes in the order they will be used for positional pattern matching.
Typically, __match_args__ lists at least the required constructor arguments.

Complete Listing of Vector2d, Version 3 #

Just to put a pin on v3 of this didactic example:

"""
A two-dimensional vector class

>>> v1 = Vector2d(3, 4)
>>> print(v1.x, v1.y)
3.0 4.0
>>> x, y = v1
>>> x, y
(3.0, 4.0)
>>> v1
Vector2d(3.0, 4.0)
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0)
>>> octets = bytes(v1)
>>> octets
b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector2d(0, 0))
(True, False)

Test of ``.frombytes()`` class method:
>>> v1_clone = Vector2d.frombytes(bytes(v1))
>>> v1_clone
Vector2d(3.0, 4.0)
>>> v1 == v1_clone
True

Tests of ``format()`` with Cartesian coordinates:
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'

Tests of the ``angle`` method:
>>> Vector2d(0, 0).angle()
0.0
>>> Vector2d(1, 0).angle()
0.0
>>> epsilon = 10**-8
>>> abs(Vector2d(0, 1).angle() - math.pi/2) < epsilon
True
>>> abs(Vector2d(1, 1).angle() - math.pi/4) < epsilon
True

Tests of ``format()`` with polar coordinates:
>>> format(Vector2d(1, 1), 'p')  # doctest:+ELLIPSIS
'<1.414213..., 0.785398...>'
>>> format(Vector2d(1, 1), '.3ep')
'<1.414e+00, 7.854e-01>'
>>> format(Vector2d(1, 1), '0.5fp')
'<1.41421, 0.78540>'

Tests of `x` and `y` read-only properties:
>>> v1.x, v1.y
(3.0, 4.0)
>>> v1.x = 123
Traceback (most recent call last):
...
AttributeError: can't set attribute 'x'

Tests of hashing:
>>> v1 = Vector2d(3, 4)
>>> v2 = Vector2d(3.1, 4.2)
>>> len({v1, v2})
2
"""

from array import array
import math


class Vector2d:
    __match_args__ = ('x', 'y')
    typecode = 'd'

    def __init__(self, x, y):
        self.__x = float(x)
        self.__y = float(y)

    @property
    def x(self):
        return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, self)))

    def __eq__(self, other):
        if isinstance(other, Vector2d):
            return tuple(self) == tuple(other)
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def angle(self):
        return math.atan2(self.y, self.x)

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]
            coords = (abs(self), self.angle())
            outer_fmt = '<{}, {}>'
        else:
            coords = self
            outer_fmt = '({}, {})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(*components)

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(*memv)

TO_HABIT: TIL that python has doctests too!
damn.

Private and “Protected” Attributes in Python #

There is no way to make attributes truly private and immutable in Python.
Name mangling: why the double leading underscore is useful.
Without some mechanism for “private” attributes, a subclass could accidentally overwrite an attribute of its parent class.
To avoid that, a name with two leading underscores is mangled with the class name.
Python stores the name in the instance __dict__ prefixed with a leading underscore and the class name, so in the Dog class, __mood (the “private” attr) becomes _Dog__mood, and in Beagle it’s _Beagle__mood.
This language feature goes by the lovely name of name mangling.
It’s a safety feature (to prevent accidental access or modification) rather than a security feature.
“Protected” attributes:
Those who don’t like name mangling can use a single leading underscore by convention and, if name clashes are a concern, spell out a “mangled” name explicitly.
This has no special interpreter support; it is only a convention that other programmers are expected to respect.
- GOTCHA: In modules, a single _ in front of a top-level name does have an effect:
  if you write from mymod import *, the names with a _ prefix are not imported from mymod. However, you can still write from mymod import _privatefunc.
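
Name mangling can be observed directly in the instance `__dict__`. The `Dog` and `Beagle` classes follow the names used in the notes above; their method bodies are assumptions written for this sketch.

```python
class Dog:
    def __init__(self):
        self.__mood = 'calm'          # stored as _Dog__mood

    def dog_mood(self):
        return self.__mood            # compiled as self._Dog__mood


class Beagle(Dog):
    def __init__(self):
        super().__init__()
        self.__mood = 'excited'       # stored as _Beagle__mood, no clash


b = Beagle()
print(sorted(vars(b)))    # ['_Beagle__mood', '_Dog__mood']
print(b.dog_mood())       # calm, the parent's attribute was not overwritten
print(b._Dog__mood)       # calm, still reachable: safety, not security
```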

Saving Memory with `__slots__` #

Storage model for instance attributes:
- they are normally stored in a dict (__dict__), which has significant memory overhead
- if the class defines __slots__ with a sequence of attribute names, an alternative storage model is used:
  the attributes are stored in a hidden array of references that uses less memory than a dict
Using slots:
- the __slots__ attribute holds the attribute names in either a tuple (similar to __match_args__) or a list.
  A tuple is clearer since it signals that the names are not meant to change.
- __slots__ takes effect only when the class is defined; changing it afterwards does not alter the layout of the instances
- COUNTER-INTUITIVE:
  a subclass only partially inherits the effect of the parent’s __slots__.
  Partial because:
  1. attributes named in the parent’s __slots__ are still stored in the reference array
  2. attributes not named in the parent’s __slots__ end up in a __dict__, unless the subclass declares its own __slots__.
  The slots of the superclasses are added to the slots of the current class.
- we can have both slots (fixed attributes) and a dict (dynamic attributes) by including '__dict__' in __slots__
- if we need weak refs AND we’re using slots, then __slots__ should include '__weakref__'
NOTE: __slots__ names the internal storage attributes, while __match_args__ names the public attributes used for positional pattern matching.
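
The partial-inheritance behavior is easier to see in a small experiment. The class names `Pixel`, `OpenPixel` and `ColorPixel` are illustrative choices for this sketch.

```python
class Pixel:
    __slots__ = ('x', 'y')


class OpenPixel(Pixel):
    pass  # no __slots__ declared, so instances get a __dict__ again


class ColorPixel(Pixel):
    __slots__ = ('color',)   # added to the slots inherited from Pixel


p = Pixel()
p.x = 10
print(hasattr(p, '__dict__'))     # False
try:
    p.color = 'red'               # not in __slots__, and there is no __dict__
except AttributeError as exc:
    print('rejected:', exc)

op = OpenPixel()
op.x = 1                          # still stored in the inherited slot
op.flavor = 'banana'              # lands in the instance __dict__
print(op.__dict__)                # {'flavor': 'banana'}

cp = ColorPixel()
cp.x, cp.color = 5, 'blue'
print(hasattr(cp, '__dict__'))    # False
```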

Simple Measure of `__slots__` Savings #

Here is the script used to measure RAM usage (the resource module is only available on Unix-like systems):

import importlib
import sys
import resource

NUM_VECTORS = 10**7

module = None
if len(sys.argv) == 2:
    module_name = sys.argv[1].replace('.py', '')
    module = importlib.import_module(module_name)
else:
    print(f'Usage: {sys.argv[0]} <vector-module-to-test>')

if module is None:
    print('Running test with built-in `complex`')
    cls = complex
else:
    fmt = 'Selected Vector2d type: {.__name__}.{.__name__}'
    print(fmt.format(module, module.Vector2d))
    cls = module.Vector2d

mem_init = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'Creating {NUM_VECTORS:,} {cls.__qualname__!r} instances')

vectors = [cls(3.0, 4.0) for i in range(NUM_VECTORS)]

mem_final = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'Initial RAM usage: {mem_init:14,}')
print(f'  Final RAM usage: {mem_final:14,}')

Essentially, in the book's run the `__slots__` version uses less than half as much RAM and is also faster.

Summarizing the Issues with `__slots__` #

remember to redeclare __slots__ in each subclass to prevent its instances from having a __dict__
the instance slots are bound ONLY at the time the class is defined
classes can’t use the @cached_property decorator unless __slots__ includes '__dict__'
instances can’t be targets of weak references unless __slots__ includes '__weakref__'
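
The weak-reference caveat above can be checked with two tiny classes. `Plain` and `Referable` are hypothetical names used only in this sketch.

```python
import weakref


class Plain:
    __slots__ = ('value',)                  # no '__weakref__' slot


class Referable:
    __slots__ = ('value', '__weakref__')    # opt back in to weak references


try:
    weakref.ref(Plain())
except TypeError as exc:
    print('no weakref:', exc)

obj = Referable()
r = weakref.ref(obj)
print(r() is obj)    # True
```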

Overriding Class Attributes #

A common use case for class attributes is to provide default values for instance attributes.
When an instance has no attribute named typecode, self.typecode resolves to the class attribute Vector2d.typecode.
Assigning through an instance (v1.typecode = 'f') creates a new instance attribute that shadows the class attribute; the class attribute itself is unchanged.
To change the class attribute, set it on the class directly: Vector2d.typecode = 'f' instead of v1.typecode = 'f'.
The more idiomatic way is to subclass just to customise a class data attribute.
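
A reduced sketch of both approaches: shadowing the class attribute on one instance, and subclassing to change the default. The class keeps only the methods that `bytes()` needs, and `ShortVector2d` is the name assumed here for the subclass.

```python
from array import array


class Vector2d:
    typecode = 'd'   # class attribute: 8-byte double by default

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        return (c for c in (self.x, self.y))

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, self)))


class ShortVector2d(Vector2d):
    typecode = 'f'   # subclass only to customise the class attribute


v = Vector2d(1.1, 2.2)
print(len(bytes(v)))                          # 17: 1 typecode byte + 2 * 8
print(len(bytes(ShortVector2d(1.1, 2.2))))    # 9:  1 typecode byte + 2 * 4

v.typecode = 'f'             # creates an instance attribute on v only
print(Vector2d.typecode)     # d, the class attribute is untouched
print(len(bytes(v)))         # 9
```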

Chapter Summary #

We can add more control to our code (getters, read-only attributes) as the need arises, because the code that uses the class does not have to change when a public attribute becomes a property.
So we can start with public attributes.
Implementing setters and getters up-front is a distraction from the goal. In Python, we can simply use public attributes, knowing we can change them to properties later, if the need arises.
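
A small sketch of that migration path. `Account` and `CheckedAccount` are hypothetical classes standing for the "before" and "after" versions of the same class; the validation rule is an assumption made up for the example.

```python
class Account:
    """Hypothetical first version: a plain public attribute."""

    def __init__(self, balance):
        self.balance = balance


class CheckedAccount:
    """Hypothetical later version: same attribute name, now a property."""

    def __init__(self, balance):
        self.balance = balance       # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError('balance cannot be negative')
        self._balance = value


for cls in (Account, CheckedAccount):
    acct = cls(100)
    acct.balance += 50               # caller code is identical for both versions
    print(cls.__name__, acct.balance)
```

The loop body never changes between the two versions, which is why starting with a public attribute costs nothing.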
Extra: Java’s privacy modifiers are only strictly enforced when a Java Security Manager is configured and deployed on top of the Java app.
This is because the reflection API (Java’s introspection tooling) can otherwise be used to read the value of a private field.