Chapter 12. Special Methods for Sequences

Table of Contents

Objectives for this chapter:
1. Make the toy vector implementation behave like a standard Python immutable flat sequence. with float elements
2. supports the following:
  1. basic sequence protocol __len__ and ___getitem__
  2. safe representation of instances with many items
  3. slicing supported properly
  4. aggregate hashing that considers every element
  5. custom formatting extensions
Useful TRICKS:
- we can get the class of an instance by doing cls = type(self)

What’s New in This Chapter #

Vector: A User-Defined Sequence Type #

Vector space benefits: use cases of vectors > 3 dims:
- for implementing N-dimensional vectors for info retrieval
- for vector space models, cosine similarity is usually the key metric for relevance.
the takes on the vector implementation behaviour are not mutually exclusive, they build on each other

Vector Take #1: Vector2d Compatible #

the best practice for a sequence constructor is to take the data as an iterable argument in the constructor, like all built-in sequence types do.
remember the goal for a good implementation of __repr__ is that it should give serviceable output such that a user can have a chance of identifying the receiver (self).
the reprlib.repr() can be used to get a limited-lenght representation

Protocols and Duck Typing #

Protocols:

context of object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code.
it’s ONLY a typing/tooling construct for static analysis, it supports structural subtyping / static duck-typing.
we can partially implement part of a protocol if we wish, depending on the contextual requirements

there’s 2 kinds of protocols:

static protocols

Definition:

Static protocols in Python refer to protocol classes (from typing.Protocol) that exist solely for static type analysis during development—they have no effect at runtime unless specially marked.

Purpose:

To provide interfaces that static type checkers (like mypy or Pyright) can use for verifying whether an object “matches” a required set of methods/attributes, regardless of explicit inheritance.

Behavior:

A class matches a static protocol if it provides ALL required methods/attributes (matching names and type signatures).

There is no runtime enforcement or validation by default—type conformance is only checked when tools like mypy analyze your code.

Classes do not need to inherit from the protocol to be considered as conforming to it for static analysis

Use case:

Ensuring that different objects used in a function provide a required interface (“static duck typing”), enabling type-safe polymorphism and generic programming.

1
2
3
4
5
6
      from typing import Protocol

      class SupportsClose(Protocol):
          def close(self) -> None: ...

      # Any class with a .close() method matches SupportsClose for type checking

dynamic protocols Definition:

Dynamic protocols are protocol classes designed to support runtime checking of protocol conformance, in addition to static analysis.

Purpose:

To enable both static type checking and runtime assertions that an object supports a given protocol interface.

How:

Achieved by decorating the protocol class with @typing.runtime_checkable

Behavior:

At runtime, you can use isinstance(obj, ProtocolClass) to check if an object supports the protocol (i.e., implements the required methods/attributes).

The protocol still does not require explicit inheritance—conformance is structural.

1
2
3
4
5
6
7
8
      from typing import Protocol, runtime_checkable

      @runtime_checkable
      class SupportsClose(Protocol):
          def close(self) -> None: ...

      obj = open("file.txt")
      isinstance(obj, SupportsClose)  # True if .close() exists with correct signature

Vector Take #2: A Sliceable Sequence #

delegation is an easy way to support the protocol.
Have to ensure that the types don’t change for the ones that are supposed to return our custom type, the example being used here is for slice functionality, it’s in these instances that we can’t just use delegation and have to explicitly handle it.

How Slicing Works #

some observations on how slicing is handled:
1. the accessor s[1:5] returns a slice object
2. we can have multiple slices in our accessing if we do something like s[1:5, 8:10] and we’ll get something like this:
  (slice(1, 5, None), slice(8, 10, None))
  from which we conclude:
  1. it’s a tuple (of slice objects) that is being returned
  2. the tuple may return multiple slice objects
3. slice is a builtin type, with attrs start, stop, step and indices
  we found this by doing dir(slice)
  - indices exposes the tricky logic that’s implemented in the built-in sequences to gracefully handle missing or negative indices and slices that are longer than the original sequence. This method produces “normalized” tuples of non-negative start, stop, and stride integers tailored to a sequence of the given length.
    NOTE: we don’t need to implement this for the vector example here because we’ll be delegating it to the _components array

A Slice-Aware `getitem` #

to make Vector behave as a sequence, we need __len__ and __getitem__

both are essential to handle slicing correctly

There’s 2 cases to handle:

case 1: we’re accessing via a slice
in this case, we have to extract out the class and then build another Vector instance from the slice of the components array.
this is what allows us to properly return Vector classes on sliced accesses.
case 2: we’re accessing via a single index
then we can extract out the index from the key using operator.index(key)
operator.index() function calls the __index__ special method. The function and the special method. It’s defined in this PEP 357
it’s different from int in the sense that operator.index() will return a TypeError for non-int arguments supplied as an attempt to access an index.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
  def __len__(self):
          return len(self._components)

  def __getitem__(self, key):
          # case 1: we're accessing via a slice
          if isinstance(key, slice):
                  cls = type(self)
                  return cls(self._components[key])

          # case 2: we're accessing via a single index
          index = operator.index(key)
          return self._components[index]

Vector Take #3: Dynamic Attribute Access #

the __getattr__ is the fallback function if a name is not found within the various hierarchy graphs (not in instance, not in class, not in inheritance graph)

KIV part 4 of the textbook for more info on attribute lookups

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
  __match_args__ = ('x', 'y', 'z', 't') # allows positional pattern matching

  def __getattr__(self, name):
          cls = type(self)
          try:
                  pos = cls.__match_args__.index(name)
          except ValueError:
                  pos = -1

          if 0 <= pos < len(self._components):
                  return self._components[pos]

          msg = f'{cls.__name__!r} object has no attribute {name!r}'
          raise AttributeError(msg)

GOTCHA: since __getattr__ is a fallback, the following snippet behaves inaccurately

this is because when we do the v.x, it gets accessed to a new attribute called v.x within instance v. Therefore, the name resolution never gets done by the fallback (__getattr__)

The implementation for __getattr__ also doesn’t account for such names

\(\implies\) we implement __setattr__ because the problem here is in the attribute setting, that’s not behaving properly here.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
  def __setattr__(self, name, value):
          cls = type(self)
          if len(name) == 1:
                  if name in cls.__match_args__:
                          error = 'readonly attribute {attr_name!r}'
                  elif name.islower():
                          error = "can't set attributes 'a' to 'z' in {cls_name!r}"
                  else:
                          error = ''
                  if error:
                          msg = error.format(cls_name=cls.__name__, attr_name=name)
                          raise AttributeError(msg)

          # default: use the superclass's __setattr__
          super().__setattr__(name, value)

For this example, we want the x and y to be readonly, that’s why we’re throwing attribute errors.

NOTE: usually getters and setters come together to ensure some consistency in the use of the objects.
here, we had to implement both __getattr__ and __setattr__
NOTE: we shouldn’t use __slots__ as a shortcut to prevent instance attribute creation, they should be used only to save memory, when needed. In this case, we prevent readonly attribute overwrites by implementing the __setattr__ properly that handles this.

Vector Take #4: Hashing and a Faster == #

implementing the hash function that is performant

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
  import functools
  import operator

  def __eq__(self, other):
          return tuple(self) == tuple(other)

  def __hash__(self):
          # NOTE: use generator here for lazy operations.
          hashes = (hash(x) for x in self._components)
          return functools.reduce(operator.xor, hashes, 0)

alternatively, hash could have been implemented as:

1
2
3
          def __hash__(self):
                  hashes = map(hash, self._components)
                  return functools.reduce(operator.xor, hashes)

so the fast hash here can use an XOR:
functools.reduce(lambda a, b: a ^ b, range(n))
or using operator.xor like so functools.reduce(operator.xor, range(n))
interesting: we can see the initializer ALSo as a value to return on empty sequence (in addition to the usual “first argument in the reducing loop”).
for +, |, ^ the initializer should be 0, but for *, & it should be 1.
TO_HABIT: remember that operator provides the functionality of all Python infix operators in function form, so using it will prevent custom lambda definitions
TO_HABIT: using functools.reduce for the fast compute of a hash with huge number of components is a good use case for using reduce.

improving the performance of __eq__ doing the tuple conversion will be expensive for large vectors.

the better implementation reminds me of Java style:

1
2
3
4
5
6
7
  def __eq__(self, other):
          if len(self) != len(other):
                  return False
          for a, b in zip(self, other):
                  if a != b:
                          return False
          return True

a one liner:

1
2
  def __eq__(self, other):
          return len(self) == len(other) and all(a == b for a, b in zip(self, other))

Vector Take #5: Formatting #

Chapter Summary #

So this is the final code, vector_v5.py:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
"""
A multidimensional ``Vector`` class, take 5

A ``Vector`` is built from an iterable of numbers:
>>> Vector([3.1, 4.2])
Vector([3.1, 4.2])
>>> Vector((3, 4, 5))
Vector([3.0, 4.0, 5.0])
>>> Vector(range(10))
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])

Tests with two dimensions (same results as ``vector2d_v1.py``):
>>> v1 = Vector([3, 4])
>>> x, y = v1
>>> x, y
(3.0, 4.0)
>>> v1
Vector([3.0, 4.0])
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0)
>>> octets = bytes(v1)
>>> octets
b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector([0, 0]))
(True, False)

Test of ``.frombytes()`` class method:
>>> v1_clone = Vector.frombytes(bytes(v1))
>>> v1_clone
Vector([3.0, 4.0])
>>> v1 == v1_clone
True

Tests with three dimensions:
>>> v1 = Vector([3, 4, 5])
>>> x, y, z = v1
>>> x, y, z
(3.0, 4.0, 5.0)
>>> v1
Vector([3.0, 4.0, 5.0])
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0, 5.0)
>>> abs(v1) # doctest:+ELLIPSIS
7.071067811...
>>> bool(v1), bool(Vector([0, 0, 0]))
(True, False)

Tests with many dimensions:
>>> v7 = Vector(range(7))
>>> v7
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
>>> abs(v7) # doctest:+ELLIPSIS
9.53939201...

Test of ``.__bytes__`` and ``.frombytes()`` methods:
>>> v1 = Vector([3, 4, 5])
>>> v1_clone = Vector.frombytes(bytes(v1))
>>> v1_clone
Vector([3.0, 4.0, 5.0])
>>> v1 == v1_clone
True

Tests of sequence behavior:
>>> v1 = Vector([3, 4, 5])
>>> len(v1)
3
>>> v1[0], v1[len(v1)-1], v1[-1]
(3.0, 5.0, 5.0)

Test of slicing:
>>> v7 = Vector(range(7))
>>> v7[-1]
6.0
>>> v7[1:4]
Vector([1.0, 2.0, 3.0])
>>> v7[-1:]
Vector([6.0])
>>> v7[1,2]
Traceback (most recent call last):
...
TypeError: 'tuple' object cannot be interpreted as an integer

Tests of dynamic attribute access:
>>> v7 = Vector(range(10))
>>> v7.x
0.0
>>> v7.y, v7.z, v7.t
(1.0, 2.0, 3.0)

Dynamic attribute lookup failures:
>>> v7.k
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 'k'
>>> v3 = Vector(range(3))
>>> v3.t
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 't'
>>> v3.spam
Traceback (most recent call last):
...
AttributeError: 'Vector' object has no attribute 'spam'

Tests of hashing:
>>> v1 = Vector([3, 4])
>>> v2 = Vector([3.1, 4.2])
>>> v3 = Vector([3, 4, 5])
>>> v6 = Vector(range(6))
>>> hash(v1), hash(v3), hash(v6)
(7, 2, 1)

Most hash codes of non-integers vary from a 32-bit to 64-bit CPython build:
>>> import sys
>>> hash(v2) == (384307168202284039 if sys.maxsize > 2**32 else 357915986)
True

Tests of ``format()`` with Cartesian coordinates in 2D:
>>> v1 = Vector([3, 4])
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'

Tests of ``format()`` with Cartesian coordinates in 3D and 7D:
>>> v3 = Vector([3, 4, 5])
>>> format(v3)
'(3.0, 4.0, 5.0)'
>>> format(Vector(range(7)))
'(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)'

Tests of ``format()`` with spherical coordinates in 2D, 3D and 4D:
>>> format(Vector([1, 1]), 'h') # doctest:+ELLIPSIS
'<1.414213..., 0.785398...>'
>>> format(Vector([1, 1]), '.3eh')
'<1.414e+00, 7.854e-01>'
>>> format(Vector([1, 1]), '0.5fh')
'<1.41421, 0.78540>'
>>> format(Vector([1, 1, 1]), 'h') # doctest:+ELLIPSIS
'<1.73205..., 0.95531..., 0.78539...>'
>>> format(Vector([2, 2, 2]), '.3eh')
'<3.464e+00, 9.553e-01, 7.854e-01>'
>>> format(Vector([0, 0, 0]), '0.5fh')
'<0.00000, 0.00000, 0.00000>'
>>> format(Vector([-1, -1, -1, -1]), 'h') # doctest:+ELLIPSIS
'<2.0, 2.09439..., 2.18627..., 3.92699...>'
>>> format(Vector([2, 2, 2, 2]), '.3eh')
'<4.000e+00, 1.047e+00, 9.553e-01, 7.854e-01>'
>>> format(Vector([0, 1, 0, 0]), '0.5fh')
'<1.00000, 1.57080, 0.00000, 0.00000>'
"""

from array import array
import reprlib
import math
import functools
import operator
import itertools

class Vector:
    typecode = 'd'
    __match_args__ = ('x', 'y', 'z', 't')

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find('['):-1]
        return f'Vector({components})'

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __eq__(self, other):
        return (len(self) == len(other) and
                all(a == b for a, b in zip(self, other)))

    def __hash__(self):
        hashes = (hash(x) for x in self)
        return functools.reduce(operator.xor, hashes, 0)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
        if 0 <= pos < len(self._components):
            return self._components[pos]
        msg = f"{cls.__name__!r} object has no attribute {name!r}"
        raise AttributeError(msg)

    def angle(self, n):
        r = math.hypot(*self[n:])
        a = math.atan2(r, self[n-1])
        if (n == len(self) - 1) and (self[-1] < 0):
            return math.pi * 2 - a
        else:
            return a

    def angles(self):
        return (self.angle(n) for n in range(1, len(self)))

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('h'):  # hyperspherical coordinates
            fmt_spec = fmt_spec[:-1]
            coords = itertools.chain([abs(self)], self.angles())
            outer_fmt = '<{}>'
        else:
            coords = self
            outer_fmt = '({})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(', '.join(components))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

uses itertools.chain for the __format__ function
KIV the generator tricks until chapter 17