choosing a data model
overview
python offers three main approaches for data modeling:
- dataclasses - simple containers (standard library)
- pydantic - validation and serialization
- sqlalchemy - database persistence
quick decision guide
i need to… | use this |
---|---|
store data internally | dataclasses |
validate user input | pydantic |
handle json/api data | pydantic |
save to a database | sqlalchemy |
validate config files | pydantic |
pass data between functions | dataclasses |
build a web app | pydantic + sqlalchemy |
key differences
aspect | dataclasses | pydantic | sqlalchemy |
---|---|---|---|
purpose | data containers | validation | database orm |
validation | manual | automatic | database-level |
json support | basic | full | via serializers |
dependencies | none (stdlib) | pydantic package | sqlalchemy + driver |
complexity | simple | moderate | complex |
when to use each
use dataclasses when
- data is already validated
- working with internal application data
- performance is critical
- you want zero dependencies
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
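dataclasses do not check field values on their own, which is why they suit data that is already validated. if a container does need a guard, one option is a hand-written check in `__post_init__`. the sketch below is illustrative only: the `Percentage` class and its range rule are hypothetical and not part of this guide.

```python
from dataclasses import dataclass


@dataclass
class Percentage:
    """hypothetical container with one hand-written check."""

    value: float

    def __post_init__(self) -> None:
        # dataclasses never validate by themselves; this check is manual
        if not 0.0 <= self.value <= 100.0:
            raise ValueError(f"value must be in [0, 100], got {self.value}")


ok = Percentage(42.5)

try:
    Percentage(150.0)
except ValueError as exc:
    error_message = str(exc)
```

once several fields need checks like this, the manual approach tends to grow quickly, which is usually the point where pydantic becomes the better fit.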
use pydantic when
- handling external data (apis, files, user input)
- need automatic validation
- working with json, or with yaml parsed by a separate yaml library
- building apis with fastapi
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    email: EmailStr  # requires the optional email-validator package
    age: int = Field(ge=0, le=150)
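to show what automatic validation looks like in practice, here is a minimal sketch assuming pydantic v2 (`model_validate` does not exist in v1). the `Signup` model is hypothetical; it uses a plain `str` for the email so the example runs without the optional email-validator package.

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):
    """hypothetical model, used only for this sketch."""

    email: str
    age: int = Field(ge=0, le=150)


# well-formed input: the numeric string "42" is coerced to int in the default (lax) mode
valid = Signup.model_validate({"email": "ada@example.com", "age": "42"})

# out-of-range input: pydantic raises ValidationError describing each failing field
try:
    Signup.model_validate({"email": "ada@example.com", "age": 200})
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]
```

the error object carries structured details for each failing field, which is convenient when the errors have to be reported back to an api client.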
use sqlalchemy when
- need to persist data
- require complex queries
- need transactions
- working with relational data
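from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(unique=True)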
common patterns
api development (fastapi)
from pydantic import BaseModel
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

# pydantic for validation
class UserCreate(BaseModel):
    email: str
    password: str

# sqlalchemy for storage
class UserDB(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str]
    hashed_password: Mapped[str]

# pydantic for response
class UserResponse(BaseModel):
    id: int
    email: str
configuration management
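from pydantic import SecretStr
from pydantic_settings import BaseSettings  # pydantic v2: separate pydantic-settings package

# pydantic for configuration (values are read from environment variables by default)
class Settings(BaseSettings):
    database_url: str
    api_key: SecretStr
    debug: bool = False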
data processing
from dataclasses import dataclass

# dataclasses for internal data
@dataclass(slots=True)  # slots=True requires python 3.10+
class DataPoint:
    timestamp: float
    value: float
    sensor_id: int
combining approaches
you can use multiple tools together:
- pydantic + sqlalchemy: validate input, store in database
- dataclasses + pydantic: process data internally, validate at boundaries
- all three: complex applications with multiple layers
simple decision flow
need database? → sqlalchemy
need validation? → pydantic
else → dataclasses
performance
relative timings from benchmarks run across python 3.10-3.14, with dataclasses as the 1.0x baseline (lower is faster):
operation | dataclasses | pydantic | sqlalchemy |
---|---|---|---|
creation | 1.0x (baseline) | 1.8x slower | 50.5x slower |
modification | 1.0x (baseline) | 1.1x slower | 138.8x slower |
serialization | 1.0x (baseline) | 0.4x (2.5x faster!) | 1.0x (similar) |
note: sqlalchemy times include database operations (in-memory sqlite)
important: performance varies significantly by python version. see version comparison for details.
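relative figures like these come from timing the same operation for each model and dividing by the baseline. the sketch below shows one way to take such a measurement with the standard library; it is not the benchmark suite behind the table, and `time_creation` and `SomeModel` are hypothetical names.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def time_creation(factory, number: int = 10_000) -> float:
    """return the best-of-5 time in seconds for `number` calls to `factory`."""
    return min(timeit.repeat(factory, number=number, repeat=5))


baseline = time_creation(lambda: Point(1.0, 2.0))

# another model would be timed the same way and divided by the baseline, e.g.
# ratio = time_creation(lambda: SomeModel(x=1.0, y=2.0)) / baseline
```

absolute numbers depend on the machine and the python version, so only the ratios are worth comparing.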
learn more
library documentation
performance benchmarks
- full performance comparison - detailed benchmarks with visualizations
- python version analysis - how performance evolved from python 3.10 to 3.14
- raw benchmark data - scripts and results