dataclasses
overview
dataclasses - standard library module for creating classes that primarily store data
- part of python standard library since 3.7
- reduces boilerplate for simple data containers
- automatic `__init__`, `__repr__`, and `__eq__` generation
- no runtime type validation; serialization is limited to `asdict()`/`astuple()` (no built-in json)
- zero dependencies
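A minimal sketch of what the generated methods provide, and of the point that annotations are not enforced at runtime. `Book` is a hypothetical class used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    pages: int


book = Book("dune", 412)          # generated __init__
print(book)                       # generated __repr__: Book(title='dune', pages=412)
print(book == Book("dune", 412))  # generated __eq__ compares field values: True

# type hints are not checked: this succeeds even though pages is not an int
odd = Book("dune", "many")
print(type(odd.pages).__name__)   # str
```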
when to use
use dataclasses when:
- you need simple data containers
- data is already validated elsewhere
- performance matters (minimal overhead)
- you want to avoid external dependencies
don’t use when:
- you need runtime validation
- you need json/yaml serialization
- you’re building apis or handling external data
- complex business logic dominates
basic usage
simple data class
```python
from dataclasses import dataclass, field
from typing import List
from datetime import datetime


@dataclass
class User:
    name: str
    email: str
    age: int
    tags: List[str] = field(default_factory=list)  # mutable defaults need field()
    created_at: datetime = field(default_factory=datetime.now)
```
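To see why `field(default_factory=...)` matters, here is a small sketch with a hypothetical `Team` class: each instance gets its own list, and a bare `[]` default is rejected with `ValueError` when the class is created:

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)  # a new list per instance


a = Team("a")
b = Team("b")
a.members.append("alice")
print(b.members)  # [] because b has its own list

# a bare mutable default is rejected when the decorator processes the class
try:
    @dataclass
    class BadTeam:
        members: list[str] = []
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"rejected: {exc}")
```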
immutable data
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


# point = Point(1.0, 2.0)
# point.x = 3.0  # raises dataclasses.FrozenInstanceError
```
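A short sketch of working with the frozen `Point`: assignment raises `FrozenInstanceError`, `dataclasses.replace()` builds a modified copy instead, and `frozen=True` together with the default `eq=True` makes instances hashable:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)
try:
    p.x = 3.0  # assigning to a field of a frozen instance fails
    modify_blocked = False
except FrozenInstanceError:
    modify_blocked = True

moved = replace(p, x=3.0)  # new instance with one field changed
print(moved)               # Point(x=3.0, y=2.0)
print(hash(p) == hash(Point(1.0, 2.0)))  # True
```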
post-init validation
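```python
from dataclasses import dataclass


@dataclass
class Email:
    address: str

    def __post_init__(self):
        # called by the generated __init__ after the fields are assigned
        if "@" not in self.address:
            raise ValueError(f"invalid email: {self.address}")
```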
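`__post_init__` runs at the end of the generated `__init__`, so it can also normalize values before checking them. The stripping and lowercasing below is an illustrative assumption, not a rule for email addresses:

```python
from dataclasses import dataclass


@dataclass
class Email:
    address: str

    def __post_init__(self) -> None:
        # normalization here is an illustrative choice for this sketch
        self.address = self.address.strip().lower()
        if "@" not in self.address:
            raise ValueError(f"invalid email: {self.address}")


normalized = Email("  Alice@Example.com ")
print(normalized)  # Email(address='alice@example.com')
try:
    Email("not-an-email")
    invalid_rejected = False
except ValueError as exc:
    invalid_rejected = True
    print(exc)  # invalid email: not-an-email
```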
field options
```python
from dataclasses import dataclass, field


@dataclass
class Config:
    # hide from repr
    api_key: str = field(repr=False)
    # exclude from equality checks
    cache_size: int = field(default=100, compare=False)
    # not included in __init__
    _internal: str = field(default="", init=False)
    # keyword-only argument (python 3.10+)
    timeout: int = field(default=30, kw_only=True)
```
performance features
slots (python 3.10+)
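```python
from dataclasses import dataclass


@dataclass(slots=True)
class OptimizedData:
    x: int
    y: int
    # no per-instance __dict__, so each instance uses less memory
    # (the exact saving depends on the class and python version)
    # slightly faster attribute access
    # assigning an undeclared attribute raises AttributeError
```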
ordered comparisons
```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Task:
    priority: int = field(compare=True)  # compare=True is the default
    title: str = field(compare=False)
    created: datetime = field(default_factory=datetime.now, compare=False)


# tasks sort by priority only
tasks = [Task(3, "low"), Task(1, "high"), Task(2, "medium")]
sorted_tasks = sorted(tasks)  # sorts by priority
```
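Because `title` and `created` use `compare=False`, they are ignored by the generated `__eq__` as well as by the ordering methods. A sketch that checks both, using the same `Task` definition:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Task:
    priority: int
    title: str = field(compare=False)
    created: datetime = field(default_factory=datetime.now, compare=False)


tasks = [Task(3, "low"), Task(1, "high"), Task(2, "medium")]
print([t.title for t in sorted(tasks)])  # ['high', 'medium', 'low']
print(Task(1, "a") < Task(2, "b"))       # True
print(Task(1, "a") == Task(1, "b"))      # True: title does not take part in __eq__
```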
pattern matching (python 3.10+)
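```python
from dataclasses import dataclass
from typing import Optional


class NotFound(Exception): ...  # placeholder exceptions for this example
class APIError(Exception): ...


@dataclass
class Response:
    status: int
    data: Optional[dict] = None


def handle(response: Response) -> dict:
    # match must sit inside a function for `return` to be valid
    match response:
        case Response(status=200, data=data) if data:
            return data
        case Response(status=404):
            raise NotFound()
        case Response(status=status):
            raise APIError(f"status {status}")
```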
serialization patterns
basic dict conversion
```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class Person:
    name: str
    age: int


person = Person("alice", 30)
data = asdict(person)  # {"name": "alice", "age": 30}
values = astuple(person)  # ("alice", 30)
```
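`asdict()` and `astuple()` recurse into nested dataclasses, lists, and dicts, and return copies rather than references. A sketch with a hypothetical `Address` nested inside `Person`:

```python
from dataclasses import asdict, astuple, dataclass, field


@dataclass
class Address:
    city: str


@dataclass
class Person:
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)


person = Person("alice", Address("paris"), ["admin"])
data = asdict(person)
print(data)             # {'name': 'alice', 'address': {'city': 'paris'}, 'tags': ['admin']}
print(astuple(person))  # ('alice', ('paris',), ['admin'])

data["tags"].append("x")  # the dict holds a copy of the list,
print(person.tags)        # so the original is unchanged: ['admin']
```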
json serialization
```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Person:
    name: str
    age: int

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, json_str: str) -> "Person":
        # assumes the json keys match the field names exactly
        return cls(**json.loads(json_str))
```
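`json.dumps(asdict(...))` fails on values json cannot encode, such as `datetime`. One possible approach, sketched with a hypothetical `Event` class, is to pass a `default=` function when encoding and convert the value back by hand in `from_json`. It assumes `datetime` is the only non-json type in the class:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class Event:
    name: str
    at: datetime

    def to_json(self) -> str:
        # json calls default= for objects it cannot encode; here only datetime is expected
        return json.dumps(asdict(self), default=lambda o: o.isoformat())

    @classmethod
    def from_json(cls, json_str: str) -> "Event":
        raw = json.loads(json_str)
        raw["at"] = datetime.fromisoformat(raw["at"])  # restore the datetime by hand
        return cls(**raw)


event = Event("launch", datetime(2024, 1, 2, 3, 4, 5))
text = event.to_json()
print(text)                            # {"name": "launch", "at": "2024-01-02T03:04:05"}
print(Event.from_json(text) == event)  # True
```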
common patterns
configuration with defaults
```python
import os
from dataclasses import dataclass


@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "myapp"

    @classmethod
    def from_env(cls):
        # plain field defaults remain available as class attributes (cls.host etc.)
        return cls(
            host=os.getenv("DB_HOST", cls.host),
            port=int(os.getenv("DB_PORT", cls.port)),
            database=os.getenv("DB_NAME", cls.database),
        )
```
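A variant of `from_env` that walks `dataclasses.fields()` instead of listing each field by hand. This is a hypothetical sketch: it assumes variable names are the prefix plus the upper-cased field name (so `DB_DATABASE`, not `DB_NAME`), that each annotation is a plain type such as `str` or `int`, and it takes the mapping as a parameter so it can be exercised without touching `os.environ`:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, fields


@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "myapp"

    @classmethod
    def from_env(
        cls, env: Mapping[str, str] = os.environ, prefix: str = "DB_"
    ) -> "DatabaseConfig":
        overrides = {}
        for f in fields(cls):
            raw = env.get(prefix + f.name.upper())  # DB_HOST, DB_PORT, DB_DATABASE
            if raw is not None:
                # assumes f.type is a plain type like str or int (no string annotations)
                overrides[f.name] = f.type(raw)
        return cls(**overrides)


config = DatabaseConfig.from_env({"DB_PORT": "6543"})
print(config)  # DatabaseConfig(host='localhost', port=6543, database='myapp')
```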
inheritance
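```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BaseModel:
    id: int
    # kw_only (python 3.10+) lets subclasses add fields without defaults;
    # otherwise defining User raises TypeError (non-default argument follows default argument)
    created_at: datetime = field(default_factory=datetime.now, kw_only=True)


@dataclass
class User(BaseModel):
    name: str
    email: str
    # inherits id and created_at: User(1, "alice", "alice@example.com")
```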
custom init
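```python
from dataclasses import dataclass, field


@dataclass
class Temperature:
    celsius: float = field(init=False)

    # a hand-written __init__ is kept; @dataclass does not replace it
    def __init__(self, fahrenheit: float):
        self.celsius = (fahrenheit - 32) * 5 / 9
```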
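An alternative to a hand-written `__init__` is an init-only pseudo-field (`dataclasses.InitVar`), which keeps the generated `__init__` and passes the value on to `__post_init__` without storing it. A sketch of the same conversion:

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class Temperature:
    fahrenheit: InitVar[float]  # accepted by __init__, handed to __post_init__, not stored
    celsius: float = field(init=False)

    def __post_init__(self, fahrenheit: float) -> None:
        self.celsius = (fahrenheit - 32) * 5 / 9


t = Temperature(212)
print(t)  # Temperature(celsius=100.0)
```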
complete example
```python
from dataclasses import asdict, dataclass, field
from typing import List
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True, slots=True)
class Product:
    """immutable product with efficient memory usage"""
    id: int
    name: str
    price: Decimal
    tags: List[str] = field(default_factory=list)  # frozen is shallow: the list itself is mutable

    def __post_init__(self):
        if self.price < 0:
            raise ValueError("price cannot be negative")


@dataclass
class Cart:
    """mutable shopping cart with computed properties"""
    items: List[Product] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)

    def add_item(self, product: Product) -> None:
        self.items.append(product)

    @property
    def total(self) -> Decimal:
        # start from Decimal("0") so an empty cart also returns a Decimal
        return sum((item.price for item in self.items), Decimal("0"))

    def to_dict(self) -> dict:
        return {
            "items": [asdict(item) for item in self.items],
            "total": str(self.total),
            "created_at": self.created_at.isoformat(),
        }
```
comparison with alternatives
feature | dataclasses | pydantic | attrs | sqlalchemy |
---|---|---|---|---|
standard library | ✓ | ✗ | ✗ | ✗ |
runtime validation | ✗ | ✓ | opt-in (validators) | opt-in (`@validates`) |
serialization | basic (`asdict`) | advanced | basic (cattrs for more) | orm |
performance | fast (no validation step) | fast | fast | slower |
learning curve | minimal | moderate | moderate | steep |
best practices
do
- use `frozen=True` for immutable data
- use `slots=True` for better performance
- use `field(default_factory=...)` for mutable defaults
- validate in `__post_init__` when needed
- keep dataclasses focused on data
don’t
- use mutable default values directly (a `list`, `dict`, or `set` default raises `ValueError`)
- mix heavy business logic into dataclasses
- rely on type hints for runtime validation
- use for classes with many methods
- inherit from multiple dataclasses
limitations
- no automatic type validation at runtime
- serialization limited to `asdict()`/`astuple()`; no built-in json support
- no schema or json schema generation
- no async validation support
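If you need light runtime type checks without adding a dependency, a hypothetical helper such as `check_types` below can compare field values against their annotations from `__post_init__`. It is only a sketch: it understands plain classes like `int` and `str` and silently skips generics such as `list[str]` or `Optional[int]`:

```python
from dataclasses import dataclass, fields


def check_types(obj) -> None:
    """Raise TypeError when a field value does not match a plain-class annotation."""
    for f in fields(obj):
        # only plain classes such as int or str; generics and string annotations are skipped
        if type(f.type) is type:
            value = getattr(obj, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


@dataclass
class Item:
    name: str
    quantity: int

    def __post_init__(self) -> None:
        check_types(self)


ok = Item("pen", 3)
try:
    Item("pen", "3")
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)  # quantity must be int, got str
```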
migration path
when dataclasses aren’t enough:
- need validation → pydantic
- need database → sqlalchemy
- need both → see choosing a data model