dataclasses
overview
dataclasses - standard library module for creating classes that primarily store data
- part of python standard library since 3.7
- reduces boilerplate for simple data containers
- generates __init__, __repr__, and __eq__ automatically
- no runtime validation or serialization
- zero dependencies
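a minimal sketch of what the decorator generates, assuming a hypothetical Coordinate class that is not part of this page:

```python
from dataclasses import dataclass

@dataclass
class Coordinate:  # hypothetical example class
    x: int
    y: int

c = Coordinate(1, 2)          # generated __init__
print(c)                      # generated __repr__: Coordinate(x=1, y=2)
print(c == Coordinate(1, 2))  # generated __eq__ compares field by field: True
```

without the decorator, each of these three methods would have to be written by hand.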
when to use
use dataclasses when:
- you need simple data containers
- data is already validated elsewhere
- performance matters (minimal overhead)
- you want to avoid external dependencies
don’t use when:
- you need runtime validation
- you need json/yaml serialization
- you’re building apis or handling external data
- complex business logic dominates
basic usage
simple data class
from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime
@dataclass
class User:
    name: str
    email: str
    age: int
    tags: List[str] = field(default_factory=list)  # mutable defaults need field()
    created_at: datetime = field(default_factory=datetime.now)
immutable data
@dataclass(frozen=True)
class Point:
    x: float
    y: float
# point = Point(1.0, 2.0)
# point.x = 3.0  # raises FrozenInstanceError
post-init validation
@dataclass
class Email:
    address: str
    def __post_init__(self):
        if "@" not in self.address:
            raise ValueError(f"invalid email: {self.address}")
field options
@dataclass
class Config:
    # hide from repr
    api_key: str = field(repr=False)
    # exclude from equality checks
    cache_size: int = field(default=100, compare=False)
    # not included in __init__
    _internal: str = field(default="", init=False)
    # keyword-only argument (python 3.10+)
    timeout: int = field(default=30, kw_only=True)
performance features
slots (python 3.10+)
@dataclass(slots=True)
class OptimizedData:
    x: int
    y: int
    # lower memory use per instance (often cited as 20-30%; varies by class)
    # faster attribute access
    # no dynamic attributes allowed
ordered comparisons
@dataclass(order=True)
class Task:
    priority: int = field(compare=True)
    title: str = field(compare=False)
    created: datetime = field(default_factory=datetime.now, compare=False)
# tasks sort by priority only
tasks = [Task(3, "low"), Task(1, "high"), Task(2, "medium")]
sorted_tasks = sorted(tasks)  # sorts by priority
pattern matching (python 3.10+)
@dataclass
class Response:
    status: int
    data: Optional[dict] = None
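# placeholder exception types for this example
class NotFound(Exception): pass
class APIError(Exception): pass
# return and raise need an enclosing function
def handle(response: Response) -> dict:
    match response:
        case Response(status=200, data=data) if data:
            return data
        case Response(status=404):
            raise NotFound()
        case Response(status=status):
            raise APIError(f"status {status}")
serialization patterns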
basic dict conversion
from dataclasses import asdict, astuple
@dataclass
class Person:
    name: str
    age: int
person = Person("alice", 30)
data = asdict(person)  # {"name": "alice", "age": 30}
values = astuple(person)  # ("alice", 30)
json serialization
import json
from dataclasses import asdict, dataclass
@dataclass
class Person:
    name: str
    age: int
    def to_json(self) -> str:
        return json.dumps(asdict(self))
    @classmethod
    def from_json(cls, json_str: str):
        return cls(**json.loads(json_str))
common patterns
configuration with defaults
@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "myapp"
    @classmethod
    def from_env(cls):
        import os
        return cls(
            host=os.getenv("DB_HOST", cls.host),
            port=int(os.getenv("DB_PORT", cls.port)),
            database=os.getenv("DB_NAME", cls.database)
        )
inheritance
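@dataclass
class BaseModel:
    id: int
    # kw_only (python 3.10+): without it, the subclass below fails with
    # "non-default argument 'name' follows default argument"
    created_at: datetime = field(default_factory=datetime.now, kw_only=True)
@dataclass
class User(BaseModel):
    name: str
    email: str
    # inherits id and created_at
# user = User(1, "alice", "alice@example.com")
custom init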
@dataclass
class Temperature:
    celsius: float = field(init=False)
    def __init__(self, fahrenheit: float):
        self.celsius = (fahrenheit - 32) * 5 / 9
complete example
from dataclasses import asdict, dataclass, field
from typing import List, Optional
from datetime import datetime
from decimal import Decimal
@dataclass(frozen=True, slots=True)
class Product:
    """immutable product with efficient memory usage"""
    id: int
    name: str
    price: Decimal
    tags: List[str] = field(default_factory=list)
    def __post_init__(self):
        if self.price < 0:
            raise ValueError("price cannot be negative")
@dataclass
class Cart:
    """mutable shopping cart with computed properties"""
    items: List[Product] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    def add_item(self, product: Product) -> None:
        self.items.append(product)
    @property
    def total(self) -> Decimal:
        # start from Decimal("0") so an empty cart returns a Decimal, not int 0
        return sum((item.price for item in self.items), Decimal("0"))
    def to_dict(self) -> dict:
        return {
            "items": [asdict(item) for item in self.items],
            "total": str(self.total),
            "created_at": self.created_at.isoformat()
        }
comparison with alternatives
| feature | dataclasses | pydantic | attrs | sqlalchemy | 
|---|---|---|---|---|
| standard library | ✓ | ✗ | ✗ | ✗ | 
| runtime validation | ✗ | ✓ | ✓ | ✓ | 
| serialization | basic | advanced | advanced | orm | 
| performance | fastest | fast | fast | slower | 
| learning curve | minimal | moderate | moderate | steep | 
best practices
do
- use frozen=True for immutable data
- use slots=True for better performance
- use field(default_factory=...) for mutable defaults
- validate in __post_init__ when needed
- keep dataclasses focused on data
don’t
- use mutable default values directly
- mix heavy business logic into dataclasses
- rely on type hints for runtime validation
- use for classes with many methods
- inherit from multiple dataclasses
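the first item can be illustrated with a short sketch, assuming hypothetical BadInventory and Inventory classes. dataclasses rejects a list default when the class is defined, and default_factory gives each instance its own list:

```python
from dataclasses import dataclass, field

# a list default is rejected at class definition time
try:
    @dataclass
    class BadInventory:  # hypothetical example class
        items: list = []
except ValueError as exc:
    print(f"rejected: {exc}")

@dataclass
class Inventory:  # hypothetical example class
    items: list = field(default_factory=list)  # fresh list per instance

a = Inventory()
b = Inventory()
a.items.append("apple")
print(b.items)  # [] because each instance has its own list
```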
limitations
- no automatic type validation at runtime
- limited serialization capabilities
- no schema generation
- no async validation support
- no built-in json schema support
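the first limitation can be partly worked around by hand. the sketch below assumes a hypothetical Account class and a deliberately simple isinstance check in __post_init__. it only handles plain classes such as str or int, not generics like List[str] or Optional[int], and does not replace a validation library:

```python
from dataclasses import dataclass, fields

@dataclass
class Account:  # hypothetical example class
    owner: str
    balance: int

    def __post_init__(self):
        # compare each value against its annotation when the annotation is a class
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )

ok = Account("alice", 100)  # passes the check

try:
    Account("alice", "100")  # a plain dataclass would accept this silently
except TypeError as exc:
    print(exc)  # balance must be int, got str
```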
migration path
when dataclasses aren’t enough:
- need validation → pydantic
- need database → sqlalchemy
- need both → see choosing a data model