Transformers v5.0
Transformers v5.0 is the first major release in five years (since v4.0.0-rc-1 in November 2020). It features 800+ commits of significant architectural refactoring focused on performance optimizations, cleaner defaults, and a modern codebase.

Release status: v5.0.0rc0 (release candidate), released December 1, 2025
Quick links

Core documentation

- Migration guide - breaking changes and how to update your code
- Modular architecture - new model contribution system
- Attention interface - unified attention abstraction

Inference and serving

- Serving and batching - `transformers serve` and continuous batching
- Inference engines - vLLM and SGLang integration
- Quantization - 18+ quantization methods

Other changes

- Tokenizer changes - unified backend system

Troubleshooting

- `TokenizersBackend` does not exist - fixing version mismatch errors
Growth metrics

| Metric | v4 (2020) | v5 (2025) |
|---|---|---|
| Daily pip installations | 20,000 | 3,000,000+ |
| Total installations | - | 1.2 billion+ |
| Model architectures | 40 | 400+ |
| Hub checkpoints | ~1,000 | 750,000+ |
| Contributors to rc0 | - | 149 |
Key changes

Framework consolidation

PyTorch becomes the sole backend; TensorFlow and JAX/Flax support has been removed entirely:
```python
# v4.x - multiple backends
from transformers import TFAutoModel   # removed in v5
from transformers import FlaxAutoModel # removed in v5

# v5.0 - PyTorch only
from transformers import AutoModel
```

Major new features
| Feature | Description |
|---|---|
| Modular transformers | model contributions reduced from 3,000-6,000 lines to ~500 lines |
| AttentionInterface | centralized attention with runtime switching (sketched below) |
| Continuous batching | dynamic request grouping for higher GPU utilization |
| `transformers serve` | OpenAI API-compatible server deployment (usage sketched below) |
| Quantization redesign | 18+ methods as first-class citizens |
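Two of these deserve a quick illustration. First, the attention interface: the sketch below follows the `AttentionInterface.register` pattern and the `attn_implementation` loading argument from the attention interface docs; the wrapped kernel's import path and the runtime `set_attn_implementation` call should be verified against your installed version.

```python
from transformers import AutoModelForCausalLM, AttentionInterface
from transformers.integrations.sdpa_attention import sdpa_attention_forward

# wrap the stock SDPA kernel with a logging shim and register it under a custom name
def logged_sdpa(*args, **kwargs):
    print("entering attention")
    return sdpa_attention_forward(*args, **kwargs)

AttentionInterface.register("logged_sdpa", logged_sdpa)

# select an implementation at load time...
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", attn_implementation="logged_sdpa"
)

# ...or switch at runtime
model.set_attn_implementation("sdpa")
```

Second, `transformers serve`: because the server speaks the OpenAI API, any OpenAI-compatible client can talk to it. The base URL, port, and model name below are assumptions; match them to whatever your server reports on startup.

```python
# run `transformers serve` in another terminal first
from openai import OpenAI

# hypothetical defaults - adjust base_url to the address the server prints
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```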
Installation

```bash
# install the pre-release (opt-in)
pip install transformers --pre

# or with uv
uv add transformers --prerelease=allow

# a standard install still gets v4.x
pip install transformers
```

Core dependencies
- Python >= 3.9.0
- PyTorch 2.1+ (tested on 2.2+)
- huggingface_hub >= 1.0.0, < 2.0
- accelerate >= 1.1.0
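A quick way to confirm that the pre-release actually resolved, rather than a standard v4.x install:

```python
import transformers

# a v5 pre-release reports a version like "5.0.0rc0"
print(transformers.__version__)
assert transformers.__version__.startswith("5."), "still on v4.x - reinstall with --pre"
```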
Critical breaking changes

These require immediate attention when upgrading:
Authentication

```python
from transformers import AutoModel

# old (v4.x)
model = AutoModel.from_pretrained("model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("model", token="hf_...")
```

Quantization
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,  # no longer works
)

# new (v5.0)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quantization_config,
)
```

Image processing
```python
# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor

# new (v5.0)
from transformers import AutoImageProcessor
```

See the full migration guide for all breaking changes.
Ecosystem partners

The v5 release includes integrations from:

- Training: Unsloth, Axolotl, LLaMA-Factory, TRL, MaxText
- Inference: vLLM, SGLang, TensorRT-LLM, llama.cpp, MLX
- Quantization: bitsandbytes, torchao
- Distributed: TorchTitan, Megatron, Nanotron
Known issues in rc0

- PEFT + MoE adapters have compatibility issues
- Tensor/expert parallelism with vLLM needs fixes
- Custom pretrained models may auto-initialize weights