transformers v5.0 migration guide

this guide covers all breaking changes when upgrading from transformers v4.x to v5.0.

framework requirements

python version

  • required: python >= 3.9.0

framework changes

  • pytorch: 2.1+ (tested on 2.2+) - sole backend
  • tensorflow: removed entirely
  • jax/flax: removed entirely

core dependencies

huggingface_hub >= 1.0.0, < 2.0
accelerate >= 1.1.0
peft >= 0.18.0
bitsandbytes >= 0.46.1
Pillow >= 10.0.1, <= 15.0
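
to sanity-check an existing environment against these pins before upgrading, a stdlib-only sketch:

# check installed versions of the core dependencies (python stdlib only)
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "huggingface_hub", "accelerate", "peft", "bitsandbytes", "Pillow"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")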

critical breaking changes

1. framework support

tensorflow and jax/flax are completely removed:

# v4.x - multiple backends
from transformers import TFAutoModel  # removed
from transformers import FlaxAutoModel  # removed
from transformers import TFBertModel  # removed
from transformers import FlaxBertModel  # removed

# v5.0 - pytorch only
from transformers import AutoModel
from transformers import BertModel

action: migrate all tensorflow/jax code to pytorch equivalents.
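
a minimal pytorch replacement for a typical tf/flax inference snippet might look like this (model name is illustrative):

# v5.0 pytorch-only inference (sketch)
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)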

2. authentication parameter

the use_auth_token argument is removed; pass token instead:

# old (v4.x)
model = AutoModel.from_pretrained("private/model", use_auth_token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", use_auth_token="hf_...")

# new (v5.0)
model = AutoModel.from_pretrained("private/model", token="hf_...")
tokenizer = AutoTokenizer.from_pretrained("private/model", token="hf_...")

3. quantization api

the load_in_4bit/load_in_8bit shortcuts on from_pretrained are removed; pass a quantization_config instead:

# old (v4.x) - removed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    load_in_4bit=True,
    device_map="auto"
)

# new (v5.0)
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    quantization_config=quantization_config
)
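
the same pattern extends to 8-bit loading and the usual 4-bit options; a sketch using standard BitsAndBytesConfig fields:

# common BitsAndBytesConfig variants (sketch)
import torch
from transformers import BitsAndBytesConfig

int8_config = BitsAndBytesConfig(load_in_8bit=True)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)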

4. image processing

FeatureExtractor classes are deprecated (use ImageProcessor instead):

# old (v4.x) - deprecated
from transformers import AutoFeatureExtractor
feature_extractor = AutoFeatureExtractor.from_pretrained("model-name")

# new (v5.0)
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("model-name")

image processors now exist only in the “fast” variant (requires torchvision).
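
a quick usage sketch of a fast image processor (model name is illustrative; torchvision must be installed):

# fast image processors return torch tensors directly
from PIL import Image
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
image = Image.new("RGB", (224, 224))  # placeholder image for the example
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # e.g. torch.Size([1, 3, 224, 224])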

5. tokenization changes

the fast/slow tokenizer distinction is eliminated; see tokenizer changes for details.

# encode_plus deprecated
# old (v4.x)
tokens = tokenizer.encode_plus(text)
tokens = tokenizer.batch_encode_plus([text1, text2])

# new (v5.0)
tokens = tokenizer(text)
tokens = tokenizer([text1, text2])

apply_chat_template now returns a BatchEncoding:

# v4.x - returned input_ids only
input_ids = tokenizer.apply_chat_template(messages)

# v5.0 - returns BatchEncoding dict
output = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt")
# output is BatchEncoding with 'input_ids', 'attention_mask'
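
in practice this means generation code should unpack the returned mapping rather than treating it as a tensor; a sketch continuing the example above (messages, tokenizer, and model are assumed from context; add_generation_prompt shown for a typical generation flow):

# v5.0: pass the BatchEncoding straight through to generate (sketch)
output = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
generated_ids = model.generate(**output)  # uses input_ids and attention_mask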

6. configuration changes

# from_xxx_config() methods removed
# old (v4.x)
config = SomeConfig.from_other_config(other_config)

# new (v5.0)
config = SomeConfig(**other_config.to_dict())

rope parameters:

# old (v4.x)
rope_theta = config.rope_theta

# new (v5.0)
rope_params = config.rope_parameters  # returns dict for multi-config models

TrainingArguments changes

removed without deprecation

these parameters no longer exist:

  • mp_parameters
  • _n_gpu
  • overwrite_output_dir
  • logging_dir
  • jit_mode_eval
  • tpu_num_cores
  • past_index
  • ray_scope
  • warmup_ratio → use warmup_steps (now accepts a float ratio; see the sketch below)
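
for example, a 10% warmup formerly expressed as warmup_ratio=0.1 becomes a fractional warmup_steps (sketch, relying on the float behavior noted above):

# v4.x: TrainingArguments(output_dir="out", warmup_ratio=0.1)
# v5.0 (sketch):
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", warmup_steps=0.1)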

deprecated and removed

old parameter → new parameter:

  • fsdp_min_num_params → fsdp_config
  • fsdp_transformer_layer_cls_to_wrap → fsdp_config
  • push_to_hub_model_id → hub_model_id
  • push_to_hub_token → hub_token
  • no_cuda → use_cpu
  • fp16_backend → removed (torch.amp only)
  • half_precision_backend → removed (torch.amp only)
  • model_path → resume_from_checkpoint
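
the fsdp wrapping options move into the fsdp_config dict; a sketch (key names assumed to match the v4.x fsdp_config schema):

# v4.x: fsdp_min_num_params=100_000_000 as a top-level argument
# v5.0 (sketch):
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"min_num_params": 100_000_000},  # assumed key name
)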

trainer parameter

# old (v4.x)
trainer = Trainer(model=model, tokenizer=tokenizer, ...)

# new (v5.0)
trainer = Trainer(model=model, processing_class=tokenizer, ...)

cli changes

# old (v4.x)
transformers-cli login
transformers-cli download

# new (v5.0)
transformers login
transformers download

environment variables

# old (v4.x) - deprecated
export TRANSFORMERS_CACHE=/path/to/cache

# new (v5.0)
export HF_HOME=/path/to/cache

removed features

these features are completely removed:

  • head masking
  • relative positional biases in bert-like models
  • head pruning functionality
  • BetterTransformer
  • legacy cache format
  • TorchScript and torch.fx support (use torch.compile and torch.export instead; see the sketch below)
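
for code that used torch.jit.trace on a transformers model, torch.compile is the usual replacement; a sketch (model name is illustrative):

# dynamo-based replacement for torchscript tracing (sketch)
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
compiled_model = torch.compile(model)  # compiles on first forward pass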

generation api changes

  • deprecated output type aliases removed
  • reduced to 4 output classes (decoder-only vs. encoder-decoder, with and without beam search)
  • grouped_entities parameter removed
  • default cache is now model-defined (not always DynamicCache; see the sketch below)
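
if downstream code assumed a DynamicCache, it can still be requested explicitly; a sketch (model and inputs assumed from context; generate accepts a pre-built cache via past_key_values):

# pin the cache type rather than relying on the model default (sketch)
from transformers import DynamicCache

outputs = model.generate(**inputs, past_key_values=DynamicCache())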

default changes

  • use_cache in model configs: True → False
  • report_to in TrainingArguments: various → "none"
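
code that relied on the old defaults should set them explicitly; a sketch (model name and "wandb" are illustrative):

# restore the v4.x behavior explicitly where needed (sketch)
from transformers import AutoModel, TrainingArguments

model = AutoModel.from_pretrained("model-name", use_cache=True)
args = TrainingArguments(output_dir="out", report_to="wandb")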

attention output

# v5.0 requires explicit attention implementation for output_attentions
model = AutoModel.from_pretrained(
    "model-name",
    attn_implementation="eager"  # required for output_attentions=True
)
outputs = model(**inputs, output_attentions=True)

migration checklist

framework migration

  • remove all tensorflow imports and code
  • remove all jax/flax imports and code
  • migrate to pytorch equivalents

authentication

  • replace use_auth_token with token

quantization

  • replace load_in_4bit/load_in_8bit with quantization_config

image processing

  • replace FeatureExtractor with ImageProcessor
  • ensure torchvision is installed

tokenization

  • replace encode_plus() with __call__()
  • update apply_chat_template() handling for BatchEncoding

training

  • remove deprecated TrainingArguments parameters
  • update fsdp configuration to use fsdp_config
  • replace push_to_hub_* with hub_model_id/hub_token
  • change tokenizer to processing_class in Trainer

generation

  • update output type handling
  • remove grouped_entities usage

configuration

  • replace from_xxx_config() with __init__()
  • update rope access to rope_parameters

environment

  • replace TRANSFORMERS_CACHE with HF_HOME

cli

  • replace transformers-cli with transformers

attention

  • add attn_implementation="eager" when using output_attentions=True

known issues in rc0

  1. peft + moe adapters: compatibility issues present
  2. tensor/expert parallelism with vllm: needs fixes
  3. custom pretrained models: may auto-initialize weights (workaround available)
