Transformers v5.0

Transformers v5.0 is the first major release in five years (since v4.0.0-rc-1 in November 2020). It comprises 800+ commits of significant architectural refactoring, focused on performance optimizations, cleaner defaults, and a modernized codebase.

Release status: v5.0.0rc0 (release candidate), released December 1, 2025

Growth metrics

  Metric                    v4 (2020)   v5 (2025)
  Daily pip installations   20,000      3,000,000+
  Total installations       —           1.2 billion+
  Model architectures       40          400+
  Hub checkpoints           ~1,000      750,000+
  Contributors to rc0       —           149

Key changes

Framework consolidation

PyTorch becomes the sole backend. TensorFlow and JAX/Flax support has been removed entirely:

# v4.x - multiple backends
from transformers import TFAutoModel  # removed
from transformers import FlaxAutoModel  # removed

# v5.0 - pytorch only
from transformers import AutoModel
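
Day-to-day usage is unchanged for PyTorch users. A minimal sketch of the standard loading path as a sanity check (bert-base-uncased is purely an illustrative checkpoint; any Hub model works the same way):

import torch
from transformers import AutoModel, AutoTokenizer

# illustrative checkpoint; the same code applies to any PyTorch model on the Hub
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 768])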

Major new features

  Feature                 Description
  Modular Transformers    model contributions reduced from 3,000-6,000 lines to ~500
  AttentionInterface      centralized attention with runtime switching (see the sketch below)
  Continuous batching     dynamic request grouping for higher GPU utilization
  transformers serve      OpenAI API-compatible server deployment
  Quantization redesign   18+ quantization methods as first-class citizens
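
To make the AttentionInterface row concrete: a custom attention function can be registered under a name and then selected per model at load time, like any built-in implementation. A minimal sketch based on the AttentionInterface registration API (the wrapper simply delegates to the stock SDPA path, and the model id is illustrative):

from transformers import AttentionInterface, AutoModelForCausalLM
from transformers.integrations.sdpa_attention import sdpa_attention_forward

def logged_sdpa(*args, **kwargs):
    # custom hook point: log, then delegate to the built-in SDPA implementation
    print("entering attention")
    return sdpa_attention_forward(*args, **kwargs)

# register under a name, then switch to it at load time
AttentionInterface.register("logged_sdpa", logged_sdpa)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", attn_implementation="logged_sdpa"
)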

Installation

# install pre-release (opt-in)
pip install transformers --pre

# or with uv
uv add transformers --prerelease=allow

# standard install still gets v4.x
pip install transformers

Core dependencies

  • Python >= 3.9.0
  • PyTorch 2.1+ (tested on 2.2+)
  • huggingface_hub >= 1.0.0, < 2.0
  • accelerate >= 1.1.0
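
A quick way to confirm an environment meets these floors, assuming the packages are importable:

import sys
import accelerate, huggingface_hub, torch, transformers

# print installed versions next to the requirements listed above
print("python          ", sys.version.split()[0])       # needs >= 3.9.0
print("torch           ", torch.__version__)            # needs 2.1+
print("transformers    ", transformers.__version__)
print("huggingface_hub ", huggingface_hub.__version__)  # needs >= 1.0.0, < 2.0
print("accelerate      ", accelerate.__version__)       # needs >= 1.1.0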

Critical breaking changes

These require immediate attention when upgrading:

Authentication

# old (v4.x)
model = AutoModel.from_pretrained("model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("model", token="hf_...")
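
Passing a raw token is not the only option: huggingface_hub can cache a token once and from_pretrained picks it up automatically. A sketch (the repo id is hypothetical):

from huggingface_hub import login
from transformers import AutoModel

# interactive prompt that caches the token locally;
# setting the HF_TOKEN environment variable works as well
login()

model = AutoModel.from_pretrained("your-org/private-model")  # hypothetical repo id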

Quantization

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,  # no longer works
)

# new (v5.0)
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quantization_config,
)
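
BitsAndBytesConfig also exposes the knobs the old boolean flag hid. A sketch with commonly used 4-bit options (the values shown are typical recommendations, not v5 defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 instead of the default fp4
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quantization_config,
)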

Image processing

# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor

# new (v5.0)
from transformers import AutoImageProcessor
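
In practice the processor is a drop-in replacement on the image side. A short end-to-end sketch (google/vit-base-patch16-224 and the COCO sample image are illustrative choices):

import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# illustrative checkpoint; any image classification model works the same way
checkpoint = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])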

See the full migration guide for all breaking changes.

Ecosystem partners

The v5 release includes integrations with:

  • Training: Unsloth, Axolotl, LLaMA-Factory, TRL, MaxText
  • Inference: vLLM, SGLang, TensorRT-LLM, llama.cpp, MLX
  • Quantization: bitsandbytes, torchao
  • Distributed: TorchTitan, Megatron, Nanotron

Known issues in rc0

  1. PEFT + MoE adapters have compatibility issues
  2. Tensor/expert parallelism with vLLM needs fixes
  3. Custom pretrained models may auto-initialize weights
