- rtshkmr's digital garden/
- Readings/
- Books/
- Fluent Python: Clear, Concise, and Effective Programming – Luciano Ramalho/
- Chapter 12. Special Methods for Sequences/
Chapter 12. Special Methods for Sequences
Table of Contents
Objectives for this chapter:
Make the toy vector implementation behave like a standard Python immutable flat sequence. with float elements
supports the following:
basic sequence protocol
__len__and___getitem__safe representation of instances with many items
slicing supported properly
aggregate hashing that considers every element
custom formatting extensions
Useful TRICKS:
- we can get the class of an instance by doing
cls = type(self)
- we can get the class of an instance by doing
What’s New in This Chapter #
Vector: A User-Defined Sequence Type #
Vector space benefits: use cases of vectors > 3 dims:
- for implementing N-dimensional vectors for info retrieval
- for vector space models, cosine similarity is usually the key metric for relevance.
the takes on the vector implementation behaviour are not mutually exclusive, they build on each other
Vector Take #1: Vector2d Compatible #
the best practice for a sequence constructor is to take the data as an iterable argument in the constructor, like all built-in sequence types do.
remember the goal for a good implementation of
__repr__is that it should give serviceable output such that a user can have a chance of identifying the receiver (self).the
reprlib.repr()can be used to get a limited-lenght representation
Protocols and Duck Typing #
- Protocols:
context of object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code.
it’s ONLY a typing/tooling construct for static analysis, it supports structural subtyping / static duck-typing.
we can partially implement part of a protocol if we wish, depending on the contextual requirements
there’s 2 kinds of protocols:
static protocols
Definition:
Static protocols in Python refer to protocol classes (from typing.Protocol) that exist solely for static type analysis during development—they have no effect at runtime unless specially marked.
Purpose:
To provide interfaces that static type checkers (like mypy or Pyright) can use for verifying whether an object “matches” a required set of methods/attributes, regardless of explicit inheritance.
Behavior:
A class matches a static protocol if it provides ALL required methods/attributes (matching names and type signatures).
There is no runtime enforcement or validation by default—type conformance is only checked when tools like mypy analyze your code.
Classes do not need to inherit from the protocol to be considered as conforming to it for static analysis
Use case:
Ensuring that different objects used in a function provide a required interface (“static duck typing”), enabling type-safe polymorphism and generic programming.
1 2 3 4 5 6from typing import Protocol class SupportsClose(Protocol): def close(self) -> None: ... # Any class with a .close() method matches SupportsClose for type checkingdynamic protocols Definition:
Dynamic protocols are protocol classes designed to support runtime checking of protocol conformance, in addition to static analysis.
Purpose:
To enable both static type checking and runtime assertions that an object supports a given protocol interface.
How:
Achieved by decorating the protocol class with
@typing.runtime_checkableBehavior:
At runtime, you can use
isinstance(obj, ProtocolClass)to check if an object supports the protocol (i.e., implements the required methods/attributes).The protocol still does not require explicit inheritance—conformance is structural.
1 2 3 4 5 6 7 8from typing import Protocol, runtime_checkable @runtime_checkable class SupportsClose(Protocol): def close(self) -> None: ... obj = open("file.txt") isinstance(obj, SupportsClose) # True if .close() exists with correct signature
Vector Take #2: A Sliceable Sequence #
delegation is an easy way to support the protocol.
Have to ensure that the types don’t change for the ones that are supposed to return our custom type, the example being used here is for slice functionality, it’s in these instances that we can’t just use delegation and have to explicitly handle it.
How Slicing Works #
- some observations on how slicing is handled:
the accessor
s[1:5]returns a slice objectwe can have multiple slices in our accessing if we do something like
s[1:5, 8:10]and we’ll get something like this:(slice(1, 5, None), slice(8, 10, None))from which we conclude:
it’s a tuple (of
sliceobjects) that is being returnedthe tuple may return multiple
sliceobjects
sliceis a builtin type, with attrsstart,stop,stepandindiceswe found this by doing
dir(slice)indices exposes the tricky logic that’s implemented in the built-in sequences to gracefully handle missing or negative indices and slices that are longer than the original sequence. This method produces “normalized” tuples of non-negative start, stop, and stride integers tailored to a sequence of the given length.
NOTE: we don’t need to implement this for the vector example here because we’ll be delegating it to the
_componentsarray
A Slice-Aware __getitem__ #
to make
Vectorbehave as a sequence, we need__len__and__getitem__both are essential to handle slicing correctly
There’s 2 cases to handle:
case 1: we’re accessing via a slice
in this case, we have to extract out the class and then build another Vector instance from the slice of the components array.
this is what allows us to properly return Vector classes on sliced accesses.
case 2: we’re accessing via a single index
then we can extract out the index from the key using
operator.index(key)operator.index()function calls the__index__special method. The function and the special method. It’s defined in this PEP 357it’s different from
intin the sense thatoperator.index()will return aTypeErrorfor non-int arguments supplied as an attempt to access an index.
1 2 3 4 5 6 7 8 9 10 11 12def __len__(self): return len(self._components) def __getitem__(self, key): # case 1: we're accessing via a slice if isinstance(key, slice): cls = type(self) return cls(self._components[key]) # case 2: we're accessing via a single index index = operator.index(key) return self._components[index]
Vector Take #3: Dynamic Attribute Access #
the
__getattr__is the fallback function if a name is not found within the various hierarchy graphs (not in instance, not in class, not in inheritance graph)KIV part 4 of the textbook for more info on attribute lookups
1 2 3 4 5 6 7 8 9 10 11 12 13 14__match_args__ = ('x', 'y', 'z', 't') # allows positional pattern matching def __getattr__(self, name): cls = type(self) try: pos = cls.__match_args__.index(name) except ValueError: pos = -1 if 0 <= pos < len(self._components): return self._components[pos] msg = f'{cls.__name__!r} object has no attribute {name!r}' raise AttributeError(msg)GOTCHA: since
__getattr__is a fallback, the following snippet behaves inaccuratelythis is because when we do the
v.x, it gets accessed to a new attribute calledv.xwithin instancev. Therefore, the name resolution never gets done by the fallback (__getattr__)The implementation for
__getattr__also doesn’t account for such names\(\implies\) we implement
__setattr__because the problem here is in the attribute setting, that’s not behaving properly here.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15def __setattr__(self, name, value): cls = type(self) if len(name) == 1: if name in cls.__match_args__: error = 'readonly attribute {attr_name!r}' elif name.islower(): error = "can't set attributes 'a' to 'z' in {cls_name!r}" else: error = '' if error: msg = error.format(cls_name=cls.__name__, attr_name=name) raise AttributeError(msg) # default: use the superclass's __setattr__ super().__setattr__(name, value)For this example, we want the
xandyto be readonly, that’s why we’re throwing attribute errors.NOTE: usually getters and setters come together to ensure some consistency in the use of the objects.
here, we had to implement both
__getattr__and__setattr__NOTE: we shouldn’t use
__slots__as a shortcut to prevent instance attribute creation, they should be used only to save memory, when needed. In this case, we prevent readonly attribute overwrites by implementing the__setattr__properly that handles this.
Vector Take #4: Hashing and a Faster == #
implementing the hash function that is performant
1 2 3 4 5 6 7 8 9 10import functools import operator def __eq__(self, other): return tuple(self) == tuple(other) def __hash__(self): # NOTE: use generator here for lazy operations. hashes = (hash(x) for x in self._components) return functools.reduce(operator.xor, hashes, 0)alternatively, hash could have been implemented as:
1 2 3def __hash__(self): hashes = map(hash, self._components) return functools.reduce(operator.xor, hashes)so the fast hash here can use an XOR:
functools.reduce(lambda a, b: a ^ b, range(n))or using
operator.xorlike sofunctools.reduce(operator.xor, range(n))interesting: we can see the initializer ALSo as a value to return on empty sequence (in addition to the usual “first argument in the reducing loop”).
for
+,|,^the initializer should be0, but for*,&it should be1.TO_HABIT: remember that
operatorprovides the functionality of all Python infix operators in function form, so using it will prevent custom lambda definitionsTO_HABIT: using
functools.reducefor the fast compute of a hash with huge number of components is a good use case for using reduce.
improving the performance of
__eq__doing the tuple conversion will be expensive for large vectors.the better implementation reminds me of Java style:
1 2 3 4 5 6 7def __eq__(self, other): if len(self) != len(other): return False for a, b in zip(self, other): if a != b: return False return Truea one liner:
1 2def __eq__(self, other): return len(self) == len(other) and all(a == b for a, b in zip(self, other))
Vector Take #5: Formatting #
Chapter Summary #
So this is the final code, vector_v5.py:
| |
- uses
itertools.chainfor the__format__function - KIV the generator tricks until chapter 17
Further Reading #
reducehas other names in the CS world!The powerful
reducehigher-order function is also known asfold,accumulate,aggregate,compress, andinject.See the wiki link.
you can often tell when a protocol is being discussed when you see language like “a file-like object.” This is a quick way of saying “something that behaves sufficiently like a file, by implementing the parts of the file interface that are relevant in the context.”
it’s not sloppy to implement a protocol partially (for dynamic protocols)
When implementing a class that emulates any built-in type, it is important that the emulation only be implemented to the degree that it makes sense for the object being modeled. For example, some sequences may work well with retrieval of individual elements, but extracting a slice may not make sense.
this KISS-es it.
for more strictness, we can make it a static protocol wherein everything needs to be implemented