Transformers v5.0

Transformers v5.0 is the first major release in five years (since v4.0.0-rc-1 in November 2020). It comprises 800+ commits of significant architectural refactoring, focused on performance optimizations, cleaner defaults, and a modernized codebase.

Release status: v5.0.0rc0 (release candidate), released December 1, 2025

Growth metrics

  Metric                    v4 (2020)   v5 (2025)
  Daily pip installations   20,000      3,000,000+
  Total installations       —           1.2 billion+
  Model architectures       40          400+
  Hub checkpoints           ~1,000      750,000+
  Contributors to rc0       —           149

Key changes

Framework consolidation

PyTorch becomes the sole backend. TensorFlow and JAX/Flax support has been removed entirely:

# v4.x - multiple backends
from transformers import TFAutoModel  # removed
from transformers import FlaxAutoModel  # removed

# v5.0 - pytorch only
from transformers import AutoModel
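
Day-to-day usage is unchanged for PyTorch users. A minimal sketch of the standard loading path as a sanity check (bert-base-uncased is purely an illustrative checkpoint; any Hub model works the same way):

import torch
from transformers import AutoModel, AutoTokenizer

# illustrative checkpoint; the same code applies to any PyTorch model on the Hub
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 768])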

Major new features

  Feature                 Description
  Modular Transformers    model contributions reduced from 3,000-6,000 lines to ~500
  AttentionInterface      centralized attention with runtime switching (see the sketch below)
  Continuous batching     dynamic request grouping for higher GPU utilization
  transformers serve      OpenAI API-compatible server deployment
  Quantization redesign   18+ quantization methods as first-class citizens
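
To make the AttentionInterface row concrete: a custom attention function can be registered under a name and then selected per model at load time, like any built-in implementation. A minimal sketch based on the AttentionInterface registration API (the wrapper simply delegates to the stock SDPA path, and the model id is illustrative):

from transformers import AttentionInterface, AutoModelForCausalLM
from transformers.integrations.sdpa_attention import sdpa_attention_forward

def logged_sdpa(*args, **kwargs):
    # custom hook point: log, then delegate to the built-in SDPA implementation
    print("entering attention")
    return sdpa_attention_forward(*args, **kwargs)

# register under a name, then switch to it at load time
AttentionInterface.register("logged_sdpa", logged_sdpa)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", attn_implementation="logged_sdpa"
)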

Installation

# install pre-release (opt-in)
pip install transformers --pre

# or with uv
uv add transformers --prerelease=allow

# standard install still gets v4.x
pip install transformers

Core dependencies

  • Python >= 3.9.0
  • PyTorch 2.1+ (tested on 2.2+)
  • huggingface_hub >= 1.0.0, < 2.0
  • accelerate >= 1.1.0
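
A quick way to confirm an environment meets these floors, assuming the packages are importable:

import sys
import accelerate, huggingface_hub, torch, transformers

# print installed versions next to the requirements listed above
print("python          ", sys.version.split()[0])       # needs >= 3.9.0
print("torch           ", torch.__version__)            # needs 2.1+
print("transformers    ", transformers.__version__)
print("huggingface_hub ", huggingface_hub.__version__)  # needs >= 1.0.0, < 2.0
print("accelerate      ", accelerate.__version__)       # needs >= 1.1.0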

Critical breaking changes

These require immediate attention when upgrading:

Authentication

# old (v4.x)
model = AutoModel.from_pretrained("model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("model", token="hf_...")
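
Passing a raw token is not the only option: huggingface_hub can cache a token once and from_pretrained picks it up automatically. A sketch (the repo id is hypothetical):

from huggingface_hub import login
from transformers import AutoModel

# interactive prompt that caches the token locally;
# setting the HF_TOKEN environment variable works as well
login()

model = AutoModel.from_pretrained("your-org/private-model")  # hypothetical repo id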

Quantization

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,  # no longer works
)

# new (v5.0)
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quantization_config,
)
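
BitsAndBytesConfig also exposes the knobs the old boolean flag hid. A sketch with commonly used 4-bit options (the values shown are typical recommendations, not v5 defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 instead of the default fp4
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quantization_config,
)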

Image processing

# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor

# new (v5.0)
from transformers import AutoImageProcessor
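
In practice the processor is a drop-in replacement on the image side. A short end-to-end sketch (google/vit-base-patch16-224 and the COCO sample image are illustrative choices):

import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# illustrative checkpoint; any image classification model works the same way
checkpoint = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])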

See the full migration guide for all breaking changes.

Ecosystem partners

The v5 release includes integrations with:

  • Training: Unsloth, Axolotl, LLaMA-Factory, TRL, MaxText
  • Inference: vLLM, SGLang, TensorRT-LLM, llama.cpp, MLX
  • Quantization: bitsandbytes, torchao
  • Distributed: TorchTitan, Megatron, Nanotron

Known issues in rc0

  1. PEFT + MoE adapters have compatibility issues
  2. Tensor/expert parallelism with vLLM needs fixes
  3. Custom pretrained models may auto-initialize weights
