# Python Data Models Performance
## Overview
Three approaches to data modeling in Python:

- **dataclasses**: standard-library data containers
- **Pydantic**: data validation and serialization
- **SQLAlchemy**: database ORM

This benchmark measures their relative performance.
## Test Setup
Each library implements the same model:
```python
from dataclasses import dataclass

from pydantic import BaseModel
from sqlalchemy import Boolean, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Dataclass
@dataclass
class PersonDataclass:
    name: str
    age: int
    email: str
    active: bool = True

# Pydantic
class PersonPydantic(BaseModel):
    name: str
    age: int
    email: str
    active: bool = True

# SQLAlchemy (with in-memory SQLite)
class PersonSQLAlchemy(Base):
    __tablename__ = 'persons'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    email = Column(String)
    active = Column(Boolean, default=True)
```
## Operations Tested
- **Creation**: instantiating objects
- **Modification**: updating attributes
- **Serialization**: converting to dictionaries
Each operation was run for 10,000 iterations and measured with `time.perf_counter()`, as sketched below.
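A minimal sketch of such a timing loop, shown here for the creation operation only. This is illustrative; the actual `benchmark.py` may structure its loops differently.

```python
import time
from dataclasses import dataclass

@dataclass
class PersonDataclass:
    name: str
    age: int
    email: str
    active: bool = True

ITERATIONS = 10_000

# Time the creation operation and report the mean cost per call in microseconds.
start = time.perf_counter()
for _ in range(ITERATIONS):
    PersonDataclass(name="Ada", age=36, email="ada@example.com")
elapsed = time.perf_counter() - start
print(f"creation: {elapsed / ITERATIONS * 1e6:.2f} µs/op")
```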
## Results
### Average Performance (Python 3.10-3.14)
Multipliers are relative to the dataclasses baseline (1.0x).

| Library     | Creation     | Modification  | Serialization |
|-------------|--------------|---------------|---------------|
| dataclasses | 1.0x         | 1.0x          | 1.0x          |
| Pydantic    | 1.8x slower  | 1.1x slower   | 2.5x faster   |
| SQLAlchemy  | 50.5x slower | 138.8x slower | 1.0x          |
## Key Findings
- Pydantic serializes ~2.5x faster than dataclasses thanks to its Rust-based core (pydantic-core)
- SQLAlchemy operates at a different scale (50-140x slower) because every operation goes through database machinery
- dataclasses and Pydantic have similar performance for basic creation and modification
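For context, "serialization" here means converting an instance to a dictionary. The sketch below shows the standard conversion call in each library, assuming the benchmark used the usual APIs; SQLAlchemy has no built-in to-dict, so a common column-iteration pattern is shown instead.

```python
from dataclasses import asdict, dataclass

from pydantic import BaseModel
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

@dataclass
class PersonDataclass:
    name: str
    age: int

class PersonPydantic(BaseModel):
    name: str
    age: int

Base = declarative_base()

class PersonSQLAlchemy(Base):
    __tablename__ = "persons"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

# Dataclass: recursive conversion via the standard-library helper.
print(asdict(PersonDataclass(name="Ada", age=36)))

# Pydantic v2: model_dump() executes in pydantic-core (Rust),
# which is where the serialization speedup comes from.
print(PersonPydantic(name="Ada", age=36).model_dump())

# SQLAlchemy: iterate the mapped columns to build a dict.
p = PersonSQLAlchemy(name="Ada", age=36)
print({c.name: getattr(p, c.name) for c in PersonSQLAlchemy.__table__.columns})
```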
### Actual Times (µs per operation)
| Operation     | dataclasses | Pydantic | SQLAlchemy |
|---------------|-------------|----------|------------|
| Creation      | 0.64        | 0.90     | 26.08      |
| Modification  | 1.72        | 1.96     | 239.49     |
| Serialization | 2.78        | 0.76     | 2.03       |
## When to Use Each
### Dataclasses
- Simple data containers
- Validation not required
- Zero dependencies
- Significant performance improvements in Python 3.13+
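One illustrative performance lever not covered by the benchmark above: since Python 3.10, `@dataclass(slots=True)` generates `__slots__`, which typically speeds up attribute access and reduces per-instance memory.

```python
from dataclasses import dataclass

# slots=True (Python 3.10+) replaces the per-instance __dict__ with __slots__,
# trading dynamic attribute assignment for faster access and lower memory use.
@dataclass(slots=True)
class PersonSlots:
    name: str
    age: int
    email: str
    active: bool = True

p = PersonSlots(name="Ada", age=36, email="ada@example.com")
p.age = 37  # normal attribute modification still works
```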
### Pydantic
- Data validation required
- JSON serialization workflows
- API development (e.g. FastAPI)
- Configuration management
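A minimal sketch of the validation behavior that justifies Pydantic's creation overhead (Pydantic v2 API):

```python
from pydantic import BaseModel, ValidationError

class PersonPydantic(BaseModel):
    name: str
    age: int
    email: str
    active: bool = True

# Lax mode coerces compatible input: the string "36" becomes the int 36.
p = PersonPydantic(name="Ada", age="36", email="ada@example.com")
print(p.age)  # 36

# Incompatible input raises ValidationError instead of storing bad data.
try:
    PersonPydantic(name="Ada", age="not a number", email="ada@example.com")
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```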
### SQLAlchemy
- Database persistence
- Complex queries and relationships
- Transaction management
- Database-agnostic applications
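A minimal persistence sketch against in-memory SQLite, mirroring the Test Setup model (SQLAlchemy 1.4+/2.0 session style; details may differ from `benchmark.py`):

```python
from sqlalchemy import Boolean, Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class PersonSQLAlchemy(Base):
    __tablename__ = "persons"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    email = Column(String)
    active = Column(Boolean, default=True)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(PersonSQLAlchemy(name="Ada", age=36, email="ada@example.com"))
    session.commit()  # a real database round trip, hence the 50-140x gap
    person = session.scalars(select(PersonSQLAlchemy)).first()
    print(person.name, person.active)
```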
## Benchmark Files
- `benchmark.py` - benchmark runner
- `analyze.py` - data analysis
- `visualize.py` - visualization
- `summary.json` - aggregated results
- `results_python_*.json` - raw data per Python version
## Running the Benchmarks
```bash
# Run benchmarks with a specific Python version
uv run --python 3.14 --prerelease allow --with pydantic --with sqlalchemy python benchmark.py

# Analyze results
uv run --with matplotlib python analyze.py

# Generate visualizations
uv run --with matplotlib python visualize.py
```
## Conclusion
Performance is one factor among many:

- **dataclasses**: optimal for simple use cases, especially on Python 3.13+
- **Pydantic**: best choice for validation and serialization workflows
- **SQLAlchemy**: necessary for database-backed applications

Choose based on requirements, not benchmarks alone.