cuda 13.0 release overview

published: August 5, 2025

cuda 13.0 was released in august 2025 with significant architectural changes and unified arm platform support.

overview

cuda 13.0 introduces:

  • unified arm platform installation
  • blackwell gpu support
  • architectural deprecations (maxwell, pascal, volta)
  • fatbin compression switch from lz4 to zstd
  • shared memory register spilling

platform changes

unified arm support

cuda 13.0 consolidates arm support across platforms:

  • single installer for all arm architectures
  • arm64-sbsa unified support
  • grace hopper (gh200) optimizations
  • jetson orin excluded from initial release

dropped architectures

removed support for:

  • maxwell (gtx 750, gtx 900 series) - compute 5.x
  • pascal (gtx 1000 series) - compute 6.x
  • volta (titan v, quadro gv100) - compute 7.0

nvidia states these architectures are “feature-complete with no further enhancements planned.”

new features

compiler improvements

  • llvm clang 20 support
  • gcc 15 support
  • compile time advisor (ctadvisor) tool
  • 32-byte aligned vector type variants (e.g. double4_32a) for blackwell

performance enhancements

  • register spilling to shared memory
  • zstd fatbin compression (smaller binaries)
  • improved cuda graph performance
  • enhanced error reporting for cuda apis

api additions

// new host memory support: cuMemCreate() and cudaMallocAsync()
// can now place allocations in host memory, sketched as:
CUmemAllocationProp prop = {};
prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
prop.location.type = CU_MEM_LOCATION_TYPE_HOST;  // host placement, new in 13.0
cuMemCreate(&handle, size, &prop, 0);            // previously device-only

driver requirements

minimum driver: r580 series (580.65.06+)

# verify driver version
nvidia-smi | grep "Driver Version"
# must show 580.xx or higher
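for scripting the check, comparing only the major component is enough, since any 580.xx driver satisfies the r580 floor. a minimal sketch; `driver_ok` is a hypothetical helper name, and its argument is assumed to be the version field nvidia-smi prints:

```shell
# return success if the driver's major version meets the r580 floor
driver_ok() {
  [ "${1%%.*}" -ge 580 ]
}

driver_ok "580.65.06" && echo "driver ok"                    # prints "driver ok"
driver_ok "550.54.14" || echo "driver too old for cuda 13.0"
```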

distribution support

newly supported

  • red hat enterprise linux 10/9.6
  • debian 12.10
  • fedora 42
  • rocky linux 9.6/10.0
  • ubuntu 24.04 lts
  • ubuntu 22.04 lts (continued)

not supported

  • ubuntu 25.04 (non-lts)
  • ubuntu 25.10 (non-lts)
  • ubuntu 23.10 (non-lts, eol)

note: nvidia typically only supports ubuntu lts releases. debian 12.10 is supported despite being a point release.

dropped

  • ubuntu 20.04 lts
  • older rhel/centos versions

migration guide

checking gpu compatibility

# list gpu compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv

# supported architectures (compute 7.5+):
# - turing (rtx 20xx)
# - ampere (rtx 30xx)
# - ada lovelace (rtx 40xx)
# - hopper (h100)
# - blackwell (b100/b200)
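the 7.5 floor can also be tested in a script. a minimal sketch; `supports_cuda13` is a hypothetical helper name, and its argument is assumed to be the compute_cap field from the nvidia-smi query above:

```shell
# return success if a compute capability string is >= 7.5
supports_cuda13() {
  major=${1%%.*}
  minor=${1#*.}
  [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 5 ]; }
}

supports_cuda13 "8.6" && echo "8.6: supported"   # ampere
supports_cuda13 "6.1" || echo "6.1: dropped"     # pascal
```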

code migration

  1. vector type alignment

    // old: the builtin double4 is 16-byte aligned
    double4 v;        // alignof(double4) == 16
    
    // cuda 13.0: opt-in 32-byte aligned variant for blackwell
    double4_32a v32;  // alignof(double4_32a) == 32
  2. deprecated apis

    • multi-device launch apis removed
    • legacy vector types deprecated
    • nvprof and nvidia visual profiler removed

pytorch compatibility

pytorch cuda 13.0 support status (august 2025):

  • tracking issue: pytorch#159779
  • release engineering evaluating build complexity
  • potential removal of some cuda 12.x builds to make room for cuda 13.0 binaries

installation

docker

# cuda 13.0 base image (when available)
docker pull nvidia/cuda:13.0.0-base-ubuntu24.04

# runtime test
docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

native installation

# download cuda 13.0: pick the runfile installer (linux / x86_64) at
# https://developer.nvidia.com/cuda-downloads

# install driver first (if needed)
sudo apt install nvidia-driver-580

# install cuda toolkit
sudo sh cuda_13.0_linux.run --toolkit --silent

jax with cuda 13.0

warning: as of august 2025, the jax-cuda13-plugin package may have version conflicts:

# this currently fails with dependency resolution issues
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"
# Error: no version of jax-cuda13-plugin[with-cuda]==0.7.0

the cuda13 extras are defined in jax’s setup.py but the corresponding plugin packages may not be published yet. monitor jax releases for updates.

# when available, install with:
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"

# or for local cuda installation
uv pip install --prerelease=allow "jax[cuda13-local]>=0.7.0"

# verify installation
uv run python -c "import jax; print(jax.devices())"

performance considerations

fatbin compression

cuda 13.0 switches from lz4 to zstd:

  • ~20% smaller fatbin files
  • slightly slower initial load
  • better for distribution/containers

shared memory spilling

new feature allows register spillage to shared memory:

  • reduces local memory pressure
  • improves kernel occupancy
  • automatic optimization

framework support

update: jax has since added cuda 13.0 support in newer releases; the table below reflects the status as of august 2025.

framework    cuda 13.0 status      verified sources
pytorch      no official support   github #159779 - discussing build complexity
tensorflow   no official support   latest nvidia containers use cuda 12.8 (per nvidia docs)
jax          planned               cuda13 extras in setup.py (pypi, source) - plugin not yet published

current supported cuda versions (august 2025):

  • pytorch: cuda 11.8, 12.1, 12.4 (planning 12.6 for v2.6)
  • tensorflow: up to cuda 12.8 in nvidia optimized containers
  • jax: cuda 12.x (cuda 13.x defined but plugin not yet available)

troubleshooting

common issues

  1. unsupported gpu error

    • check compute capability >= 7.5
    • maxwell/pascal/volta no longer supported
  2. driver version mismatch

    # requires r580+ driver
    nvidia-smi  # should show 580.xx+
  3. framework compatibility

    • continue using cuda 12.x or 11.8 builds
    • monitor framework release notes for cuda 13.0 support
    • pytorch tracking: github #159779
  4. vllm dependency chain

    • vllm depends on pytorch and cupy
    • neither pytorch nor cupy support cuda 13.0 yet
    • vllm cuda 13.0 support blocked until dependencies update

future roadmap

risc-v support

nvidia announced cuda coming to risc-v:

  • no timeline in cuda 13.0
  • part of broader architecture expansion
  • following arm unification pattern

potential cuda 14.0

based on deprecation patterns:

  • turing (compute 7.5) likely next removal target
  • further arm platform integration
  • potential risc-v preview
