transformers v5.0 migration guide

this guide covers all breaking changes when upgrading from transformers v4.x to v5.0.

framework requirements

python version

  • required: python >= 3.9.0

framework changes

  • pytorch: 2.1+ (tested on 2.2+) - sole backend
  • tensorflow: removed entirely
  • jax/flax: removed entirely

core dependencies

huggingface_hub >= 1.0.0, < 2.0
accelerate >= 1.1.0
peft >= 0.18.0
bitsandbytes >= 0.46.1
Pillow >= 10.0.1, <= 15.0
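
to sanity-check an existing environment against these pins before upgrading, a stdlib-only sketch:

# check installed versions of the core dependencies (python stdlib only)
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "huggingface_hub", "accelerate", "peft", "bitsandbytes", "Pillow"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")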

critical breaking changes

1. framework support

tensorflow and jax/flax are completely removed:

# v4.x - multiple backends
from transformers import TFAutoModel  # removed
from transformers import FlaxAutoModel  # removed
from transformers import TFBertModel  # removed
from transformers import FlaxBertModel  # removed

# v5.0 - pytorch only
from transformers import AutoModel
from transformers import BertModel

action: migrate all tensorflow/jax code to pytorch equivalents.
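
a minimal pytorch replacement for a typical tf/flax inference snippet might look like this (model name is illustrative):

# v5.0 pytorch-only inference (sketch)
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)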

2. authentication parameter

the use_auth_token argument is removed; pass token instead:

# old (v4.x)
model = AutoModel.from_pretrained("private/model", use_auth_token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("private/model", token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", token="hf_...")

3. quantization api

the load_in_4bit/load_in_8bit shortcuts on from_pretrained are removed; pass a quantization_config instead:

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,
    device_map="auto"
)

# new (v5.0)
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    quantization_config=quantization_config
)
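
the same pattern extends to 8-bit loading and the usual 4-bit options; a sketch using standard BitsAndBytesConfig fields:

# common BitsAndBytesConfig variants (sketch)
import torch
from transformers import BitsAndBytesConfig

int8_config = BitsAndBytesConfig(load_in_8bit=True)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)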

4. image processing

FeatureExtractor classes are deprecated (use ImageProcessor instead):

# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor
feature_extractor = AutoFeatureExtractor.from_pretrained("model-name")

# new (v5.0)
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("model-name")

image processors now exist only in the “fast” variant (requires torchvision).
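
a quick usage sketch of a fast image processor (model name is illustrative; torchvision must be installed):

# fast image processors return torch tensors directly
from PIL import Image
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
image = Image.new("RGB", (224, 224))  # placeholder image for the example
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # e.g. torch.Size([1, 3, 224, 224])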

5. tokenization changes

the fast/slow tokenizer distinction is eliminated; see tokenizer changes for details.

# encode_plus deprecated
# old (v4.x)
tokens = tokenizer.encode_plus(text)
tokens = tokenizer.batch_encode_plus([text1, text2])

# new (v5.0)
tokens = tokenizer(text)
tokens = tokenizer([text1, text2])

apply_chat_template now returns a BatchEncoding:

# v4.x - returned input_ids only
input_ids = tokenizer.apply_chat_template(messages)

# v5.0 - returns BatchEncoding dict
output = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt")
# output is BatchEncoding with 'input_ids', 'attention_mask'
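
in practice this means generation code should unpack the returned mapping rather than treating it as a tensor; a sketch continuing the example above (messages, tokenizer, and model are assumed from context; add_generation_prompt shown for a typical generation flow):

# v5.0: pass the BatchEncoding straight through to generate (sketch)
output = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
generated_ids = model.generate(**output)  # uses input_ids and attention_mask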

6. configuration changes

# from_xxx_config() methods removed
# old (v4.x)
config = SomeConfig.from_other_config(other_config)

# new (v5.0)
config = SomeConfig(**other_config.to_dict())

rope parameters:

# old (v4.x)
rope_theta = config.rope_theta

# new (v5.0)
rope_params = config.rope_parameters  # returns dict for multi-config models

TrainingArguments changes

removed without deprecation

these parameters no longer exist:

  • mp_parameters
  • _n_gpu
  • overwrite_output_dir
  • logging_dir
  • jit_mode_eval
  • tpu_num_cores
  • past_index
  • ray_scope
  • warmup_ratio → use warmup_steps (now accepts a float ratio; see the sketch below)
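
for example, a 10% warmup formerly expressed as warmup_ratio=0.1 becomes a fractional warmup_steps (sketch, relying on the float behavior noted above):

# v4.x: TrainingArguments(output_dir="out", warmup_ratio=0.1)
# v5.0 (sketch):
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", warmup_steps=0.1)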

deprecated and removed

old parameter → new parameter:

  • fsdp_min_num_params → fsdp_config
  • fsdp_transformer_layer_cls_to_wrap → fsdp_config
  • push_to_hub_model_id → hub_model_id
  • push_to_hub_token → hub_token
  • no_cuda → use_cpu
  • fp16_backend → removed (torch.amp only)
  • half_precision_backend → removed (torch.amp only)
  • model_path → resume_from_checkpoint
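
the fsdp wrapping options move into the fsdp_config dict; a sketch (key names assumed to match the v4.x fsdp_config schema):

# v4.x: fsdp_min_num_params=100_000_000 as a top-level argument
# v5.0 (sketch):
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"min_num_params": 100_000_000},  # assumed key name
)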

trainer parameter

# old (v4.x)
trainer = Trainer(model=model, tokenizer=tokenizer, ...)

# new (v5.0)
trainer = Trainer(model=model, processing_class=tokenizer, ...)

cli changes

# old (v4.x)
transformers-cli login
transformers-cli download

# new (v5.0)
transformers login
transformers download

environment variables

# old (v4.x) - deprecated
export TRANSFORMERS_CACHE=/path/to/cache

# new (v5.0)
export HF_HOME=/path/to/cache

removed features

these features are completely removed:

  • head masking
  • relative positional biases in bert-like models
  • head pruning functionality
  • BetterTransformer
  • legacy cache format
  • TorchScript and torch.fx support (use torch.compile and torch.export instead; see the sketch below)
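
for code that used torch.jit.trace on a transformers model, torch.compile is the usual replacement; a sketch (model name is illustrative):

# dynamo-based replacement for torchscript tracing (sketch)
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
compiled_model = torch.compile(model)  # compiles on first forward pass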

generation api changes

  • deprecated output type aliases removed
  • reduced to 4 output classes (decoder-only vs. encoder-decoder, with and without beam search)
  • grouped_entities parameter removed
  • default cache is now model-defined (not always DynamicCache; see the sketch below)
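
if downstream code assumed a DynamicCache, it can still be requested explicitly; a sketch (model and inputs assumed from context; generate accepts a pre-built cache via past_key_values):

# pin the cache type rather than relying on the model default (sketch)
from transformers import DynamicCache

outputs = model.generate(**inputs, past_key_values=DynamicCache())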

default changes

  • use_cache in model configs: True → False
  • report_to in TrainingArguments: various → "none"
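
code that relied on the old defaults should set them explicitly; a sketch (model name and "wandb" are illustrative):

# restore the v4.x behavior explicitly where needed (sketch)
from transformers import AutoModel, TrainingArguments

model = AutoModel.from_pretrained("model-name", use_cache=True)
args = TrainingArguments(output_dir="out", report_to="wandb")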

attention output

# v5.0 requires explicit attention implementation for output_attentions
model = AutoModel.from_pretrained(
    "model-name",
    attn_implementation="eager"  # required for output_attentions=True
)
outputs = model(**inputs, output_attentions=True)

migration checklist

framework migration

  • remove all tensorflow imports and code
  • remove all jax/flax imports and code
  • migrate to pytorch equivalents

authentication

  • replace use_auth_token with token

quantization

  • replace load_in_4bit/load_in_8bit with quantization_config

image processing

  • replace FeatureExtractor with ImageProcessor
  • ensure torchvision is installed

tokenization

  • replace encode_plus() with __call__()
  • update apply_chat_template() handling for BatchEncoding

training

  • remove deprecated TrainingArguments parameters
  • update fsdp configuration to use fsdp_config
  • replace push_to_hub_* with hub_model_id/hub_token
  • change tokenizer to processing_class in Trainer

generation

  • update output type handling
  • remove grouped_entities usage

configuration

  • replace from_xxx_config() with __init__()
  • update rope access to rope_parameters

environment

  • replace TRANSFORMERS_CACHE with HF_HOME

cli

  • replace transformers-cli with transformers

attention

  • add attn_implementation="eager" when using output_attentions=True

known issues in rc0

  1. peft + moe adapters: compatibility issues present
  2. tensor/expert parallelism with vllm: needs fixes
  3. custom pretrained models: may auto-initialize weights (workaround available)
