- rtshkmr's digital garden/
- Readings/
- Books/
- Fluent Python: Clear, Concise, and Effective Programming – Luciano Ramalho/
- Chapter 6. Object References, Mutability, and Recycling/
Chapter 6. Object References, Mutability, and Recycling
Table of Contents
What’s New in This Chapter #
Variables Are Not Boxes, they are labels #
updated mental model and language #
We should see it as a “to bind” instead of “to assign” whereby a name is bound to an object.
A sticky note is a better image rather than a box.
Identity, Equality, and Aliases #
id() checking #
the is operator does id checking, the = operator uses whatever the __eq__ is defined as (typically value-based checking).
programming. Identity checks are most often done with the is operator, which compares the object IDs, so our code doesn’t need to call id() explicitly.
Choosing Between == and is #
for equality check, use ==
for identity check, use is, this avoids the direct use of id() since
is is used when comparing with singletons – typically just None #
So the correct way to do None check is via a singleton
However, if you are comparing a variable to a singleton, then it makes sense to use is. By far, the most common case is checking whether a variable is bound to None. This is the recommended way to do it: x is None And the proper way to write its negation is: x is not None None is the most common singleton we test with is.
Default to == if unsure #
is is faster than == because it can’t be overloaded #
The Relative Immutability of Tuples #
this is why tuples are unhashable, because they are container types and though they are immutable, their containees may not be
Copies Are Shallow by Default #
shallow copying is more of a problem if mutable items within the inner nestings #
This saves memory and causes no problems if all the items are immutable. But if there are mutable items, this may lead to unpleasant surprises.
shallow-copy negative example #
the example below will demonstrate how when the inner element is mutable, then only the reference is copied, so if we modify that, then the original mutable entity gets mutated.
l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = list(l1)
l1.append(100)
l1[1].remove(55) # removal removes from both of the nested arrays since it's the same reference
print('l1:', l1)
print('l2:', l2)
l2[1] += [33, 22]
l2[2] += (10, 11)
print('l1:', l1)
print('l2:', l2)
Deep and Shallow Copies of Arbitrary Objects #
complexity in cyclical references #
if it’s a naive implementation, circular references can give deepcopying an issue, but the usual deepcopy will handle things alright, not to worry
this is because deepcopy does a graph-traversal of the original object and uses a memo table to keep track of references.
Note that making deep copies is not a simple matter in the general case. Objects may have cyclic references that would cause a naïve algorithm to enter an infinite loop. The deepcopy function remembers the objects already copied to handle cyclic refer‐ ences gracefully.
Function Parameters as References #
“Call by sharing”/“pass by reference” is the only mode of parameter passing in python.
This is usually the case for OOP languages in general (JS, Ruby, Java [though in Java, primitive types are call by value])
Mutable Types as Parameter Defaults: Bad Idea #
the default params, if mutable and used, will all point to the same SHARED mutable obj since the params are just aliases to it #
issue with mutable defaults explains why None is commonly used as the default value for parameters that may receive mutable values.
demonstrates, when a HauntedBus is instantiated with passengers, it works as expected. Strange things happen only when a HauntedBus starts empty, because then self.passengers becomes an alias for the default value of the passengers parameter. The problem is that each default value is evaluated when the function is defined—i.e., usually when the module is loaded—and the default values become attributes of the function object. So if a default value is a mutable object, and you change it, the change will affect every future call of the function.
Defensive Programming with Mutable Parameters #
Principle of Least Astonishment == no surprising side-effects #
TwilightBus violates the “Principle of least astonishment,” a best practice of interface design.3 It surely is astonishing that when the bus drops a student, their name is removed from the basketball team roster.
### TL;DR: Principle of Least Astonishment (POLA)
The **Principle of Least Astonishment** (POLA), also known as the **Principle of Least Surprise**, is a design guideline in user interface and software design that emphasizes creating systems that behave in ways users expect. The main idea is to minimize confusion and surprises, ensuring that interactions are intuitive and predictable.
#### Key Points:
- **User Expectations**: Systems should align with users' mental models and past experiences to reduce cognitive load and learning curves.
- **Behavior Consistency**: Components of a system should operate consistently, following common conventions to avoid unexpected behavior.
- **Applications**: POLA applies across various aspects of design, including user interfaces, API design, and error handling.
- **Benefits**: Adhering to POLA leads to improved usability, reduced development time, enhanced maintainability, and increased user satisfaction.
By following the Principle of Least Astonishment, designers can create more intuitive and user-friendly applications that enhance overall user experience.
Citations:
[1] https://pointjupiter.com/ultimate-guide-principle-of-least-astonishment-pola/
[2] https://en.wikipedia.org/wiki/Least_surprise
[3] https://deviq.com/principles/principle-of-least-astonishment/
[4] https://usertimes.io/2018/12/07/the-principle-of-least-astonishment/
[5] https://www.centercode.com/glossary/principle-of-least-surprise
[6] https://www.linkedin.com/pulse/principle-least-surprise-incus-data-pty-ltd
[7] https://dovetail.com/ux/principle-of-least-surprise/
[8] https://barrgroup.com/blog/how-endianness-works-big-endian-vs-little-endian
rule of thumb on when to alias vs make a copy on mutable args #
Just make a copy if you’re not sure (when you’re going to be consuming a mutable argument).
Unless a method is explicitly intended to mutate an object received as an argument, you should think twice before aliasing the argu‐ ment object by simply assigning it to an instance variable in your class. If in doubt, make a copy. Your clients will be happier. Of course, making a copy is not free: there is a cost in CPU and mem‐ ory. However, an API that causes subtle bugs is usually a bigger problem than one that is a little slower or uses more resources.
del and Garbage Collection #
del is a statement and not a function, that’s why ew don’t do del(x), we do del x (though, this will work too)
Weak references are useful to have pointers but not affect refcount for an obj #
- good to do monitoring / caching activities using weak references
- see this for more elaboration: Weak References | Fluent Python, the lizard book
- To inspect whether an object is still alive without holding a strong reference, Python provides the weakref module. A weakref to an object returns None if the object has been garbage collected, effectively giving you a safe way to test “dangling-ness”:
import weakref class MyClass: pass obj = MyClass() obj_id = id(obj) weak_obj = weakref.ref(obj) print(weak_obj()) # <MyClass object at ...> del obj print(weak_obj()) # None, indicating the original object was garbage collected
This works because finalize holds a weak reference to {1, 2, 3}. Weak references to an object do not increase its reference count. Therefore, a weak reference does not prevent the target object from being garbage collected. Weak references are useful in caching applica‐ tions because you don’t want the cached objects to be kept alive just because they are referenced by the cache.
we can actually use ctypes to read memory spaces directly! #
this memory location will have to be casted first though.
import ctypes
x = 42
address = id(x)
# Use ctypes to cast the address back to a Python object and get its value
value = ctypes.cast(address, ctypes.py_object).value
print(value) # Output: 42
the __del__ method is more like a fini teardown #
unlikely that we actually will need to implement it.
if implemented for a class, it gets called by the interpreter before freeing up the memory.
also kind of depends on the implementation of python itself, e.g. some might keep track of more than just refcounts.
Tricks Python Plays with Immutables \(\rightarrow\) Interned Immutables #
Interning as an optimisation technique for the internal python implementation #
Basically some strings and common ints are shared memory, avoids unnecessary duplication.
- won’t work if you use
.copy() - won’t work if you use
[:]
NOTE: What is interened or not can’t be always determined because that implementation detail is undocumented.
NOTE: therefore, for immutables, always check sameness using = instead of =is
The sharing of string literals is an optimization technique called interning. CPython uses a similar technique with small integers to avoid unnecessary duplication of num‐ bers that appear frequently in programs like 0, 1, –1, etc. Note that CPython does not intern all strings or integers, and the criteria it uses to do so is an undocumented implementation detail. Never depend on str or int interning! Always use == instead of is to compare strings or integers for equality. Interning is an optimi‐ zation for internal use of the Python interpreter.
Chapter Summary #
Practical Consequences of using references #
- simple assignment doen’t create copies
- for augmented assignments e.g.
+=,*=, it depends on the LHS variable:- if bound to immutable object, then it creates new objects
- if bound to mutable object, then it modifies that object in place
- re-binding: assigning a new value to an existing variable doesn’t change the object previously bound to it, the var is just boudn to a different object.
- function params are passed as aliases
- mutable objects may get mutated unless the consumer function copies it
- it’s dangerous to use mutable default values \(\implies\) that’s why the convention is to use
Noneinstead.
Further Reading #
Object identity becomes important only when objects are mutable
- if everything was immutable, it makes no difference whether variables hold actual objects or they hold refs to shared objects (intered). Just comparing them by value would hanve been sufficient.
mutable objects end up being the reason why threaded programming is hard
- if multiple threads mutate objects and the synchronization is not handled correctly, then it leads to corrupted data
GC used to be just refcounts, but that can leak memory (e.g. when there are refcycles to unreachable objects, leading to cyclic garbage). Current GC is a generational GC.
Mental Model:
The memory is thought of as having generations: each generation is a collection of objects grouped by how long they’ve existed.
Younger generations (new objects) are collected (checked for unreachable cycles) frequently. Older generations (objects that survived earlier collections) are checked less often.
More elaboration:
### Generational Garbage Collector in CPython: Mental Model and Rationale #### 1. **Reference Counting Only: Its Limits and Memory Leaks** - **Reference Counting** (the core memory management scheme in CPython) works by keeping a count of how many references exist to each object. When the count drops to zero, memory is released immediately. - **Primary shortcoming:** If objects reference each other (e.g., two lists referencing each other), but nothing outside references them, neither’s count drops to zero. They become **"garbage"**—unreachable—but their counts never reach zero. This is a **classic memory leak**: unused memory that cannot be reclaimed. #### 2. **Generational GC: Solving Cycles and Leaks** To address cyclical references—and reduce overhead—CPython complements refcounting with a **generational garbage collector** (`gc` module). **Mental Model:** - The memory is thought of as having *generations*: each generation is a collection of objects grouped by how long they've existed. - **Younger generations** (new objects) are *collected* (checked for unreachable cycles) frequently. **Older generations** (objects that survived earlier collections) are checked *less* often. #### 3. **Why Generational GC Is Effective** - **Empirical observation:** Most objects in Python die young (they become unreachable soon after they're created). Therefore, checking *new* objects often is efficient. - **Cyclic collection:** - During collection, the GC looks for reference cycles—sets of objects referring only to each other but not from elsewhere. - The GC can safely reclaim all objects in such cycles. - By extending beyond simple refcounting, the cycle detector enables memory occupied by unreachable cycles to be safely released. - **Old objects that survive collections are promoted to older generations**; these are checked less frequently, reducing unnecessary overhead. #### 4. **Generational Structure in CPython** CPython typically uses *three generations*: - **Generation 0**: Collected (checked) most frequently; new objects start here. - **Generation 1**: Objects promoted from gen 0 if they survive one collection. - **Generation 2**: The oldest and least frequently collected generation; objects promoted from gen 1 after surviving further collections. Collections trigger: - Automatically based on allocation thresholds. - Explicitly via the `gc.collect()` API. #### 5. **Memory Leak Solution: How It Works** - **Pure reference counting** cannot detect cyclic garbage, leading to leaks. - **Generational GC** *detects* and *collects* cyclically-linked groups of unreachable objects, returning their memory to the system. - Thus, even if the reference count of an object never drops to zero due to a reference cycle, the GC will eventually detect and collect it if it has become unreachable. #### 6. **Practical Takeaways for Tech Leaders** - **Mental Model:** CPython’s memory management is twofold—reference count for immediacy and generational GC for cycle detection. - **Leak prevention:** Programmers need not (and usually cannot) manually break all cycles; the GC rescues memory otherwise lost in cycles. - **Performance:** The generational design reduces overhead by focusing frequent scans on objects most likely to be garbage. #### 7. **Further Reading and References** - The CPython documentation for the `gc` module provides details and empirical thresholds for collection. - Deep dives into Python’s memory management explain the symbiosis of refcounting and generational GC as a pragmatic solution balancing immediacy, overhead, and completeness (detection of cycles). **In summary:** A generational garbage collector in CPython efficiently manages memory by combining reference counting (for immediate reclamation) with cycle detection (generational collection). This hybrid approach solves the memory leak issue inherent in pure reference-counted systems—cycles are detected and collected—making Python both safe and performant for real-world programs.
Rebinding a ref within a fn body doesn’t effect changes outside the fn because it’s a copy of the ref #
because the function gets a copy of the reference in an argument, rebinding it in the function body has no effect outside of the function.