# Transformers v5.0 migration guide
This guide covers all breaking changes when upgrading from Transformers v4.x to v5.0.
## Framework requirements

### Python version

- Required: Python >= 3.9.0

### Framework changes

- PyTorch: 2.1+ (tested on 2.2+); the sole backend
- TensorFlow: removed entirely
- JAX/Flax: removed entirely
### Core dependencies

- `huggingface_hub >= 1.0.0, < 2.0`
- `accelerate >= 1.1.0`
- `peft >= 0.18.0`
- `bitsandbytes >= 0.46.1`
- `Pillow >= 10.0.1, <= 15.0`
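As a convenience, a pip pin set matching the constraints above might look like the following; the `transformers` pin itself is an assumption for the v5.0 release line:

```bash
pip install "transformers>=5.0,<6.0" "huggingface_hub>=1.0.0,<2.0" \
    "accelerate>=1.1.0" "peft>=0.18.0" "bitsandbytes>=0.46.1" \
    "Pillow>=10.0.1,<=15.0"
```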
## Critical breaking changes

### 1. Framework support

TensorFlow and JAX/Flax are completely removed:
```python
# v4.x - multiple backends
from transformers import TFAutoModel    # removed
from transformers import FlaxAutoModel  # removed
from transformers import TFBertModel    # removed
from transformers import FlaxBertModel  # removed

# v5.0 - PyTorch only
from transformers import AutoModel
from transformers import BertModel
```

Action: migrate all TensorFlow/JAX code to PyTorch equivalents.
### 2. Authentication parameter
```python
from transformers import AutoModel, AutoTokenizer

# old (v4.x)
model = AutoModel.from_pretrained("private/model", use_auth_token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("private/model", token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", token="hf_...")
```

### 3. Quantization API
```python
from transformers import AutoModelForCausalLM

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,
    device_map="auto"
)

# new (v5.0)
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    quantization_config=quantization_config
)
```

### 4. Image processing
`FeatureExtractor` classes are deprecated (use `ImageProcessor` instead):
```python
# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor
feature_extractor = AutoFeatureExtractor.from_pretrained("model-name")

# new (v5.0)
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("model-name")
```

Image processors now exist only in the "fast" variant (requires torchvision).
### 5. Tokenization changes
The fast/slow distinction is eliminated; see the tokenizer changes for details.
```python
# encode_plus is deprecated
# old (v4.x)
tokens = tokenizer.encode_plus(text)
tokens = tokenizer.batch_encode_plus([text1, text2])

# new (v5.0)
tokens = tokenizer(text)
tokens = tokenizer([text1, text2])
```

`apply_chat_template` now returns a `BatchEncoding`:
```python
# v4.x - returned input_ids only
input_ids = tokenizer.apply_chat_template(messages)

# v5.0 - returns a BatchEncoding dict
output = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt")
# output is a BatchEncoding with 'input_ids' and 'attention_mask'
```

### 6. Configuration changes
```python
# from_xxx_config() methods are removed
# old (v4.x)
config = SomeConfig.from_other_config(other_config)

# new (v5.0)
config = SomeConfig(**other_config.to_dict())
```

RoPE parameters:
```python
# old (v4.x)
rope_theta = config.rope_theta

# new (v5.0)
rope_params = config.rope_parameters  # returns a dict for multi-config models
```

## TrainingArguments changes
### Removed without deprecation
These parameters no longer exist:
- `mp_parameters`
- `_n_gpu`
- `overwrite_output_dir`
- `logging_dir`
- `jit_mode_eval`
- `tpu_num_cores`
- `past_index`
- `ray_scope`
- `warmup_ratio` → use `warmup_steps` (accepts a float); see the sketch after this list
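A minimal before/after sketch of the warmup change, assuming `warmup_steps` is the float-accepting replacement noted above (`output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# old (v4.x): warm up over 10% of total training steps
# args = TrainingArguments(output_dir="out", warmup_ratio=0.1)

# new (v5.0): warmup_steps accepts an int step count or a float ratio
args = TrainingArguments(output_dir="out", warmup_steps=0.1)
```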
### Deprecated and removed
| old parameter | new parameter |
|---|---|
| `fsdp_min_num_params` | `fsdp_config` |
| `fsdp_transformer_layer_cls_to_wrap` | `fsdp_config` |
| `push_to_hub_model_id` | `hub_model_id` |
| `push_to_hub_token` | `hub_token` |
| `no_cuda` | `use_cpu` |
| `fp16_backend` | removed (torch.amp only) |
| `half_precision_backend` | removed (torch.amp only) |
| `model_path` | `resume_from_checkpoint` |
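For the FSDP rows above, a sketch of folding the standalone flags into `fsdp_config`; the dict key shown follows the v4.x `fsdp_config` schema and is an assumption here:

```python
from transformers import TrainingArguments

# old (v4.x) - standalone flag, now removed
# args = TrainingArguments(
#     output_dir="out",
#     fsdp="full_shard",
#     fsdp_min_num_params=100_000_000,
# )

# new (v5.0) - the same setting expressed via fsdp_config
args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard",
    fsdp_config={"min_num_params": 100_000_000},
)
```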
### Trainer parameter
```python
# old (v4.x)
trainer = Trainer(model=model, tokenizer=tokenizer, ...)

# new (v5.0)
trainer = Trainer(model=model, processor=processor, ...)
```

## CLI changes
```bash
# old (v4.x)
transformers-cli login
transformers-cli download

# new (v5.0)
transformers login
transformers download
```

## Environment variables
```bash
# old (v4.x) - deprecated
export TRANSFORMERS_CACHE=/path/to/cache

# new (v5.0)
export HF_HOME=/path/to/cache
```

## Removed features
These features are completely removed:
- Head masking
- Relative positional biases in BERT-like models
- Head pruning functionality
- BetterTransformer
- Legacy cache format
- TorchScript and torch.fx support (use `dynamo` and `export`); a sketch follows this list
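For code that relied on TorchScript or torch.fx tracing, a minimal sketch of the `torch.compile` (dynamo) replacement path; the checkpoint name is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("model-name")

# dynamo compiles the forward pass, replacing torch.jit.trace/script;
# for ahead-of-time graph capture, see torch.export
compiled_model = torch.compile(model)
```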
## Generation API changes
- Deprecated output type aliases removed
- Output classes reduced to 4 (by decoder type and beam usage)
- `grouped_entities` parameter removed
- Default cache is now model-defined (not always `DynamicCache`); see the sketch after this list
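If downstream code assumed the old `DynamicCache` default, you can opt back in explicitly; a sketch with placeholder model and prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("model-name")
model = AutoModelForCausalLM.from_pretrained("model-name")

inputs = tokenizer("Hello", return_tensors="pt")
# pass an explicit cache instead of relying on the model-defined default
outputs = model.generate(**inputs, past_key_values=DynamicCache())
```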
## Default changes
| setting | old default | new default |
|---|---|---|
| `use_cache` in model configs | `True` | `False` |
| `report_to` in `TrainingArguments` | various | `"none"` |
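To pin the old defaults explicitly rather than inherit the new ones, a sketch (`"wandb"` stands in for whichever logging integration you actually use):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# re-enable KV caching, no longer on by default in model configs
model = AutoModelForCausalLM.from_pretrained("model-name", use_cache=True)

# opt back in to a logging integration now that report_to defaults to "none"
args = TrainingArguments(output_dir="out", report_to="wandb")
```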
## Attention output
```python
# v5.0 requires an explicit attention implementation for output_attentions
model = AutoModel.from_pretrained(
    "model-name",
    attn_implementation="eager"  # required for output_attentions=True
)
outputs = model(**inputs, output_attentions=True)
```

## Migration checklist
### Framework migration

- Remove all TensorFlow imports and code
- Remove all JAX/Flax imports and code
- Migrate to PyTorch equivalents

### Authentication

- Replace `use_auth_token` with `token`

### Quantization

- Replace `load_in_4bit`/`load_in_8bit` with `quantization_config`

### Image processing

- Replace `FeatureExtractor` with `ImageProcessor`
- Ensure torchvision is installed

### Tokenization

- Replace `encode_plus()` with `__call__()`
- Update `apply_chat_template()` handling for `BatchEncoding`

### Training

- Remove deprecated `TrainingArguments` parameters
- Update FSDP configuration to use `fsdp_config`
- Replace `push_to_hub_*` with `hub_model_id`/`hub_token`
- Change `tokenizer` to `processor` in `Trainer`

### Generation

- Update output type handling
- Remove `grouped_entities` usage

### Configuration

- Replace `from_xxx_config()` with `__init__()`
- Update RoPE access to `rope_parameters`

### Environment

- Replace `TRANSFORMERS_CACHE` with `HF_HOME`

### CLI

- Replace `transformers-cli` with `transformers`

### Attention

- Add `attn_implementation="eager"` when using `output_attentions=True`
## Known issues in rc0
- PEFT + MoE adapters: compatibility issues present
- Tensor/expert parallelism with vLLM: needs fixes
- Custom pretrained models: may auto-initialize weights (workaround available)