Be Careful Using Python’s dataclass
@dataclass is a powerful tool for creating data containers with minimal boilerplate code. However, it introduces a subtle pitfall when working with mutable defaults like lists or NumPy arrays. This article explores this common issue, its root cause, and how to fix it effectively.
## 1 What’s the Problem with Mutable Defaults?
When using @dataclass, attributes defined with default values can behave unexpectedly if the default value is a mutable object. In Python, a default written in the class body is created once, when the class is defined, and a mutable object (e.g., a list, dictionary, or NumPy array) defined this way is shared by every instance that relies on it. This can lead to unintentional coupling between instances.
1.1 A Stress-Strain Data Class
I ran into many of these issues when I rewrote some old code of mine following clean architecture principles: the FEM solver I wrote during my PhD. Let’s have a look at one of its dataclasses, CStressStrainData. Its goal is to store and manipulate stress-strain-related properties such as stress, strain, and energy, which arise during processing at a single integration point.
Here’s a straightforward implementation using @dataclass:
from dataclasses import dataclass
import numpy as np
@dataclass
class CStressStrainData:
    # Pitfall: each np.zeros(4) below is evaluated once, at class definition
    stress: np.ndarray = np.zeros(4)
    eps_total: np.ndarray = np.zeros(4)
    energy: float = 0
Many integration algorithms need the last converged solution as the starting point of each step. The MaterialModel class holds both the current and the converged data. We do not need to bother with the actual purpose of the class here.
class MaterialModel:
    def __init__(self):
        self.stress_strain_converged = CStressStrainData()
        self.stress_strain = CStressStrainData()
At first glance, this seems fine. Each instance of CStressStrainData appears independent. However, this assumption is incorrect.
2 Why This Happens: The Core of Mutable Defaults
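In Python, every variable and attribute holds a reference to an object, regardless of type. What differs is whether the object can be changed in place:
- Immutable types (e.g., integers, floats, strings) cannot be modified in place, so sharing one object between instances is harmless. An operation like x += 1 binds the name to a new object.
- Mutable types (e.g., lists, dictionaries, NumPy arrays) can be modified in place, so a change made through one reference is visible through every other reference to the same object.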
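When you define a default value like np.zeros(4) in the class body, the expression is evaluated once, when the class is defined. The generated __init__ hands that same array to every instance created without an explicit value, so any in-place modification affects every instance referencing it. Note that since Python 3.11, @dataclass rejects any unhashable default, NumPy arrays included, with a ValueError at class definition. On older versions only list, dict, and set defaults are rejected, so an array default is accepted silently and shared.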
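In our example, both model1.stress_strain.eps_total and model2.stress_strain.eps_total point to the same NumPy array. The same holds for stress_strain and stress_strain_converged within a single model: the current and the converged state share their arrays.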
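The sharing can be reproduced without NumPy or @dataclass. The sketch below is a simplified stand-in, not the solver code: SharedDefault is a hypothetical plain class, and a list takes the place of the array, on the assumption that the mechanism (one object created at class definition, referenced by all instances) is the same.

```python
class SharedDefault:
    # Evaluated once, when the class body runs; every instance sees this one list.
    eps_total = [0.0, 0.0, 0.0, 0.0]


first = SharedDefault()
second = SharedDefault()

# An in-place modification through one instance...
first.eps_total[0] += 1.0

# ...is visible through the other, because both refer to the same object.
print(second.eps_total)                     # [1.0, 0.0, 0.0, 0.0]
print(first.eps_total is second.eps_total)  # True
```

The identity check with is confirms that there is only one list, not two equal ones.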
2.1 Fixing the Mutable Default Issue
The solution is to ensure that each instance gets its own copy of the mutable object. In @dataclass, this can be achieved using field(default_factory=...) for mutable defaults.
Here’s the corrected version:
from dataclasses import dataclass, field
import numpy as np
@dataclass
class CStressStrainData:
    # Each factory is called once per instance, so every instance gets new arrays
    stress: np.ndarray = field(default_factory=lambda: np.zeros(4))
    eps_total: np.ndarray = field(default_factory=lambda: np.zeros(4))
    energy: float = 0
2.2 Why Does This Work?
- The expression field(default_factory=...) ensures that a new object is created for each instance during initialization.
- The lambda function (lambda: np.zeros(4)) is the factory: it is called with no arguments each time an instance is created without an explicit value for that field, creating an independent NumPy array.
Now, the behavior is as expected:
model1 = MaterialModel()
model2 = MaterialModel()
model1.stress_strain.eps_total += np.array([1, 0, 0, 0])
print(model2.stress_strain.eps_total)  # Output: [0. 0. 0. 0.]
Each instance of CStressStrainData now has its own independent eps_total.
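To make the "called once per instance" behavior visible, here is a minimal sketch that uses only the standard library. StressStrainSketch, make_zeros, and factory_calls are hypothetical names introduced for illustration, and plain lists stand in for the NumPy arrays.

```python
from dataclasses import dataclass, field

factory_calls = []  # one entry is appended per factory invocation


def make_zeros():
    """Hypothetical factory: record the call and return a fresh 4-element list."""
    factory_calls.append(1)
    return [0.0] * 4


@dataclass
class StressStrainSketch:
    stress: list = field(default_factory=make_zeros)
    eps_total: list = field(default_factory=make_zeros)
    energy: float = 0.0


first = StressStrainSketch()
second = StressStrainSketch()

first.eps_total[0] += 1.0

print(second.eps_total)                     # [0.0, 0.0, 0.0, 0.0]
print(first.eps_total is second.eps_total)  # False
print(len(factory_calls))                   # 4: two fields times two instances
```

A named function works just as well as a lambda here; default_factory only needs a callable that takes no arguments.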
3 When to Use @dataclass and Mutable Defaults
3.1 Pros of @dataclass:
- Reduces boilerplate code by generating __init__, __repr__, and other methods.
- Works well for simple data containers with default values.
3.2 Cons of @dataclass with Mutable Defaults:
- Requires careful handling of mutable types to avoid shared state issues.
- Can become awkward for complex initialization logic.
3.3 General Rules:
- Use field(default_factory=...) for mutable defaults.
- Avoid defining mutable objects directly as default values.
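For the built-in containers, @dataclass enforces the second rule itself: a list, dict, or set written directly as a default is rejected when the class is defined. The sketch below shows this with a hypothetical BadDefaults class; the exact wording of the error message may differ between Python versions, so the code only prints it.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadDefaults:
        values: list = []  # a list literal as a direct default is rejected
except ValueError as exc:
    message = str(exc)
else:
    message = "no error raised"

# The message points to default_factory as the fix.
print(message)
```

Other mutable types do not necessarily get this protection, which is why the rule is worth following by hand.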
3.4 For Complex Classes, Switch Back to Traditional Classes
If the class has complex initialization or significant behavior, a traditional class definition might be more appropriate:
class CStressStrainData:
    def __init__(self):
        self.stress = np.zeros(4)
        self.eps_total = np.zeros(4)
        self.energy = 0
Because the arrays are created inside __init__, every instance gets its own. This approach avoids the pitfalls of shared mutable defaults and offers greater flexibility.
4 Key Takeaways
Understand mutable defaults:
- Avoid using mutable objects like lists or NumPy arrays as direct default values.
Use field(default_factory=...):
- It’s the correct way to define mutable defaults in @dataclass.
Test for shared references:
- Use id() or inspect behavior to confirm objects are independent.
Know when to skip @dataclass:
- If initialization or behavior is complex, a regular class might be a better choice.
By following these best practices, you can leverage the power of @dataclass without falling into the mutable defaults trap.
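The "test for shared references" takeaway can be turned into a small check. The following is a sketch under stated assumptions: Sample and shared_mutable_fields are hypothetical names, lists stand in for NumPy arrays, and the helper compares object identity with is, which is equivalent to comparing id() values of live objects.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Sample:
    stress: list = field(default_factory=lambda: [0.0] * 4)
    eps_total: list = field(default_factory=lambda: [0.0] * 4)
    energy: float = 0.0


def shared_mutable_fields(a, b, mutable_types=(list, dict, set)):
    """Return the names of fields whose value is the same mutable object in a and b."""
    return [
        f.name
        for f in fields(a)
        if isinstance(getattr(a, f.name), mutable_types)
        and getattr(a, f.name) is getattr(b, f.name)
    ]


first, second = Sample(), Sample()
print(shared_mutable_fields(first, second))  # []

# Deliberately alias one field to show what the helper reports.
second.stress = first.stress
print(shared_mutable_fields(first, second))  # ['stress']
```

Immutable values such as energy are skipped on purpose, since sharing them is harmless. For NumPy arrays, one would add np.ndarray to mutable_types. Two array views can also share memory without being the same object, in which case np.shares_memory is the more thorough check.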
5 Conclusion
Mutable defaults can cause subtle but impactful bugs in Python. Using @dataclass is a great way to simplify your code, but you must handle mutable objects carefully. With field(default_factory=...) and proper design, you can avoid unexpected behavior and keep your code clean and robust.