cuda 13.0 release overview

published: August 5, 2025

cuda 13.0 was released in august 2025 with significant architectural changes and unified arm platform support.

overview

cuda 13.0 introduces:

  • unified arm platform installation
  • blackwell gpu support
  • architectural deprecations (maxwell, pascal, volta)
  • fatbin compression switch from lz4 to zstd
  • shared memory register spilling

platform changes

unified arm support

cuda 13.0 consolidates arm support across platforms:

  • single installer for all arm architectures
  • arm64-sbsa unified support
  • grace hopper (gh200) optimizations
  • jetson orin excluded from initial release

dropped architectures

removed support for:

  • maxwell (gtx 750, gtx 900 series) - compute 5.x
  • pascal (gtx 1000 series) - compute 6.x
  • volta (titan v, quadro gv100) - compute 7.0

nvidia states these architectures are “feature-complete with no further enhancements planned.”

new features

compiler improvements

  • llvm clang 20 support
  • gcc 15 support
  • compile time advisor (ctadvisor) tool
  • 32-byte aligned vector type variants (e.g. double4_32a) for blackwell

performance enhancements

  • register spilling to shared memory
  • zstd fatbin compression (smaller binaries)
  • improved cuda graph performance
  • enhanced error reporting for cuda apis

api additions

// new host memory support: cuMemCreate() and cudaMallocAsync()
// can now place allocations in host memory, sketched as:
CUmemAllocationProp prop = {};
prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
prop.location.type = CU_MEM_LOCATION_TYPE_HOST;  // host placement, new in 13.0
cuMemCreate(&handle, size, &prop, 0);            // previously device-only

driver requirements

minimum driver: r580 series (580.65.06+)

# verify driver version
nvidia-smi | grep "Driver Version"
# must show 580.xx or higher
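for scripting the check, comparing only the major component is enough, since any 580.xx driver satisfies the r580 floor. a minimal sketch; `driver_ok` is a hypothetical helper name, and its argument is assumed to be the version field nvidia-smi prints:

```shell
# return success if the driver's major version meets the r580 floor
driver_ok() {
  [ "${1%%.*}" -ge 580 ]
}

driver_ok "580.65.06" && echo "driver ok"                    # prints "driver ok"
driver_ok "550.54.14" || echo "driver too old for cuda 13.0"
```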

distribution support

newly supported

  • red hat enterprise linux 10/9.6
  • debian 12.10
  • fedora 42
  • rocky linux 9.6/10.0
  • ubuntu 24.04 lts
  • ubuntu 22.04 lts (continued)

not supported

  • ubuntu 25.04 (non-lts)
  • ubuntu 25.10 (non-lts)
  • ubuntu 23.10 (non-lts, eol)

note: nvidia typically only supports ubuntu lts releases. debian 12.10 is supported despite being a point release.

dropped

  • ubuntu 20.04 lts
  • older rhel/centos versions

migration guide

checking gpu compatibility

# list gpu compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv

# supported architectures (compute 7.5+):
# - turing (rtx 20xx)
# - ampere (rtx 30xx)
# - ada lovelace (rtx 40xx)
# - hopper (h100)
# - blackwell (b100/b200)
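the 7.5 floor can also be tested in a script. a minimal sketch; `supports_cuda13` is a hypothetical helper name, and its argument is assumed to be the compute_cap field from the nvidia-smi query above:

```shell
# return success if a compute capability string is >= 7.5
supports_cuda13() {
  major=${1%%.*}
  minor=${1#*.}
  [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 5 ]; }
}

supports_cuda13 "8.6" && echo "8.6: supported"   # ampere
supports_cuda13 "6.1" || echo "6.1: dropped"     # pascal
```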

code migration

  1. vector type alignment

    // old: the builtin double4 is 16-byte aligned
    double4 v;        // alignof(double4) == 16
    
    // cuda 13.0: opt-in 32-byte aligned variant for blackwell
    double4_32a v32;  // alignof(double4_32a) == 32
  2. deprecated apis

    • multi-device launch apis removed
    • legacy vector types deprecated
    • nvprof and nvidia visual profiler removed

pytorch compatibility

pytorch cuda 13.0 support status (august 2025):

  • tracking issue: pytorch#159779
  • release engineering evaluating build complexity
  • potential removal of some cuda 12.x builds to make room for cuda 13.0 binaries

installation

docker

# cuda 13.0 base image (when available)
docker pull nvidia/cuda:13.0.0-base-ubuntu24.04

# runtime test
docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

native installation

# download cuda 13.0: pick the runfile installer (linux / x86_64) at
# https://developer.nvidia.com/cuda-downloads

# install driver first (if needed)
sudo apt install nvidia-driver-580

# install cuda toolkit
sudo sh cuda_13.0_linux.run --toolkit --silent

jax with cuda 13.0

warning: as of august 2025, the jax-cuda13-plugin package may have version conflicts:

# this currently fails with dependency resolution issues
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"
# Error: no version of jax-cuda13-plugin[with-cuda]==0.7.0

the cuda13 extras are defined in jax’s setup.py but the corresponding plugin packages may not be published yet. monitor jax releases for updates.

# when available, install with:
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"

# or for local cuda installation
uv pip install --prerelease=allow "jax[cuda13-local]>=0.7.0"

# verify installation
uv run python -c "import jax; print(jax.devices())"

performance considerations

fatbin compression

cuda 13.0 switches from lz4 to zstd:

  • ~20% smaller fatbin files
  • slightly slower initial load
  • better for distribution/containers

shared memory spilling

new feature allows register spillage to shared memory:

  • reduces local memory pressure
  • improves kernel occupancy
  • automatic optimization

framework support

update: jax has since added cuda 13.0 support in newer releases; the table below reflects the status as of august 2025.

framework    cuda 13.0 status      verified sources
pytorch      no official support   github #159779 - discussing build complexity
tensorflow   no official support   latest nvidia containers use cuda 12.8 (per nvidia docs)
jax          planned               cuda13 extras in setup.py (pypi, source) - plugin not yet published

current supported cuda versions (august 2025):

  • pytorch: cuda 11.8, 12.1, 12.4 (planning 12.6 for v2.6)
  • tensorflow: up to cuda 12.8 in nvidia optimized containers
  • jax: cuda 12.x (cuda 13.x defined but plugin not yet available)

troubleshooting

common issues

  1. unsupported gpu error

    • check compute capability >= 7.5
    • maxwell/pascal/volta no longer supported
  2. driver version mismatch

    # requires r580+ driver
    nvidia-smi  # should show 580.xx+
  3. framework compatibility

    • continue using cuda 12.x or 11.8 builds
    • monitor framework release notes for cuda 13.0 support
    • pytorch tracking: github #159779
  4. vllm dependency chain

    • vllm depends on pytorch and cupy
    • neither pytorch nor cupy support cuda 13.0 yet
    • vllm cuda 13.0 support blocked until dependencies update

future roadmap

risc-v support

nvidia announced cuda coming to risc-v:

  • no timeline in cuda 13.0
  • part of broader architecture expansion
  • following arm unification pattern

potential cuda 14.0

based on deprecation patterns:

  • turing (compute 7.5) likely next removal target
  • further arm platform integration
  • potential risc-v preview
