cuda 13.0 release overview
cuda 13.0 was released in august 2025 with significant architectural changes and unified arm platform support.
overview
cuda 13.0 introduces:
- unified arm platform installation
- blackwell gpu support
- architectural deprecations (maxwell, pascal, volta)
- fatbin compression switch from lz4 to zstd
- shared memory register spilling
platform changes
unified arm support
cuda 13.0 consolidates arm support across platforms:
- single installer for all arm architectures
- arm64-sbsa unified support
- grace hopper (gh200) optimizations
- jetson orin excluded from initial release
dropped architectures
removed support for:
- maxwell (gtx 750, gtx 900 series) - compute 5.x
- pascal (gtx 1000 series) - compute 6.x
- volta (titan v, quadro gv100) - compute 7.0
nvidia states these architectures are “feature-complete with no further enhancements planned.”
new features
compiler improvements
- llvm clang 20 support
- gcc 15 support
- compile time advisor (ctadvisor) tool
- 32-byte vector type alignment for blackwell
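alignment is easiest to see host-side. a small python/ctypes sketch (illustrative only, not cuda code) shows the default alignment of a plain 4-float struct, which is the property the wider alignment guarantee changes for vector types on blackwell:

```python
import ctypes

# host-side illustration of struct alignment (not cuda code):
# a plain 4-float struct is only 4-byte aligned by default; cuda 13.0
# widens the alignment of vector types so loads can use wider instructions.
class Float4(ctypes.Structure):
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float),
                ("z", ctypes.c_float), ("w", ctypes.c_float)]

print(ctypes.sizeof(Float4))     # 16 (bytes of payload)
print(ctypes.alignment(Float4))  # 4  (default alignment of c_float)
```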
performance enhancements
- register spilling to shared memory
- zstd fatbin compression (smaller binaries)
- improved cuda graph performance
- enhanced error reporting for cuda apis
api additions
// new host memory support
cuMemCreate() // with host support
cudaMallocAsync() // host allocation support
driver requirements
minimum driver: r580 series (580.65.06+)
# verify driver version
nvidia-smi | grep "Driver Version"
# must show 580.xx or higher
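the same check can be scripted. a minimal python sketch (the helper name and sample versions are illustrative) parses the dotted version string reported by nvidia-smi and compares it against the r580 minimum:

```python
def driver_ok(version: str, minimum: str = "580.65.06") -> bool:
    """compare dotted driver versions numerically, e.g. '580.82.07' >= '580.65.06'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(version) >= parse(minimum)

# sample versions as they appear in nvidia-smi's "Driver Version" field
print(driver_ok("580.82.07"))  # True  (meets the r580 minimum)
print(driver_ok("575.51.03"))  # False (pre-r580, too old for cuda 13.0)
```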
distribution support
newly supported
- red hat enterprise linux 10/9.6
- debian 12.10
- fedora 42
- rocky linux 9.6/10.0
- ubuntu 24.04 lts
- ubuntu 22.04 lts (continued)
not supported
- ubuntu 25.04 (non-lts)
- ubuntu 25.10 (non-lts)
- ubuntu 23.10 (non-lts, eol)
note: nvidia typically supports only ubuntu lts releases. debian 12.10 is supported despite being a point release.
dropped
- ubuntu 20.04 lts
- older rhel/centos versions
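to automate the check, a hedged python sketch can compare /etc/os-release against the distributions listed above (the SUPPORTED set below is a hand-built subset; note debian reports VERSION_ID as "12", not "12.10"):

```python
def parse_os_release(text: str) -> dict:
    """parse /etc/os-release key=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key] = value.strip().strip('"')
    return info

# (ID, VERSION_ID) pairs for the distributions listed above; illustrative subset
SUPPORTED = {("ubuntu", "24.04"), ("ubuntu", "22.04"), ("debian", "12"),
             ("rhel", "9.6"), ("rhel", "10"), ("rocky", "9.6"),
             ("rocky", "10.0"), ("fedora", "42")}

def cuda13_supported(os_release_text: str) -> bool:
    info = parse_os_release(os_release_text)
    return (info.get("ID", ""), info.get("VERSION_ID", "")) in SUPPORTED

sample = 'ID=ubuntu\nVERSION_ID="24.04"\n'
print(cuda13_supported(sample))  # True
```

on a real system, read the text from /etc/os-release instead of the sample string.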
migration guide
checking gpu compatibility
# list gpu compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
# supported architectures (compute 7.5+):
# - turing (rtx 20xx)
# - ampere (rtx 30xx)
# - ada lovelace (rtx 40xx)
# - hopper (h100)
# - blackwell (b100/b200)
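the compatibility gate can also be scripted. a small python sketch (the helper name and sample output are illustrative) parses the csv output of the nvidia-smi query above and flags gpus below compute 7.5:

```python
def unsupported_gpus(csv_text: str, minimum: float = 7.5):
    """parse `nvidia-smi --query-gpu=name,compute_cap --format=csv` output
    and return gpus below the cuda 13.0 minimum compute capability."""
    lines = csv_text.strip().splitlines()[1:]  # skip the csv header row
    dropped = []
    for line in lines:
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        if float(cap) < minimum:
            dropped.append((name, cap))
    return dropped

# illustrative sample of nvidia-smi csv output
sample = ("name, compute_cap\n"
          "NVIDIA GeForce GTX 1080, 6.1\n"
          "NVIDIA H100 PCIe, 9.0\n")
print(unsupported_gpus(sample))  # [('NVIDIA GeForce GTX 1080', '6.1')]
```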
code migration
vector type alignment
// old (may cause issues on blackwell)
struct float4 { float x, y, z, w; };

// cuda 13.0 (32-byte aligned)
struct __align__(32) float4 { float x, y, z, w; };
deprecated apis
- multi-device launch apis removed
- legacy vector types deprecated
- nvprof and nvidia visual profiler removed
pytorch compatibility
pytorch cuda 13.0 support status (august 2025):
- tracking issue: pytorch#159779
- release engineering evaluating build complexity
- cuda 12.x builds may be dropped to make room for cuda 13.0
installation
docker (recommended)
# cuda 13.0 base image (when available)
docker pull nvidia/cuda:13.0-base-ubuntu24.04
# runtime test
docker run --rm --gpus all nvidia/cuda:13.0-base-ubuntu24.04 nvidia-smi
native installation
# download cuda 13.0
# download cuda 13.0 (quote the url: unquoted, the shell treats & as a
# background operator; select the exact runfile on the downloads page)
wget "https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64"
# install driver first (if needed)
sudo apt install nvidia-driver-580
# install cuda toolkit
sudo sh cuda_13.0_linux.run --toolkit --silent
jax with cuda 13.0
warning: as of august 2025, the jax-cuda13-plugin package may have version conflicts:
# this currently fails with dependency resolution issues
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"
# Error: no version of jax-cuda13-plugin[with-cuda]==0.7.0
the cuda13 extras are defined in jax’s setup.py but the corresponding plugin packages may not be published yet. monitor jax releases for updates.
# when available, install with:
uv pip install --prerelease=allow "jax[cuda13]>=0.7.0"
# or for local cuda installation
uv pip install --prerelease=allow "jax[cuda13-local]>=0.7.0"
# verify installation
uv run python -c "import jax; print(jax.devices())"
performance considerations
fatbin compression
cuda 13.0 switches from lz4 to zstd:
- ~20% smaller fatbin files
- slightly slower initial load
- better for distribution/containers
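zstd and lz4 are not in the python standard library, so as a rough stand-in the sketch below uses zlib's compression levels to illustrate the same trade-off: a stronger setting yields smaller output at higher compression cost.

```python
import zlib

# stand-in demonstration: zlib levels play the roles of lz4 (fast, larger)
# and zstd (slower, smaller) to show the ratio-vs-speed trade-off that
# motivates cuda 13.0's fatbin compression switch.
payload = b"fatbin-section " * 4096  # repetitive data, like embedded sass/ptx

fast = zlib.compress(payload, level=1)   # lz4-like: fast, larger output
small = zlib.compress(payload, level=9)  # zstd-like: slower, smaller output

print(len(payload), len(fast), len(small))
assert len(small) <= len(fast) < len(payload)
```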
shared memory spilling
new feature allows register spillage to shared memory:
- reduces local memory pressure
- improves kernel occupancy
- automatic optimization
framework support
update: jax defines cuda 13.0 extras in its packaging metadata, but the corresponding plugin packages were not yet published as of august 2025.
framework | cuda 13.0 status | verified sources
---|---|---
pytorch | no official support | github #159779 - discussing build complexity
tensorflow | no official support | latest nvidia containers use cuda 12.8 (per nvidia docs)
jax | planned | cuda13 extras in setup.py (pypi, source) - plugin not yet published
current supported cuda versions (august 2025):
- pytorch: cuda 11.8, 12.1, 12.4 (planning 12.6 for v2.6)
- tensorflow: up to cuda 12.8 in nvidia optimized containers
- jax: cuda 12.x (cuda 13.x defined but plugin not yet available)
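a small python helper (illustrative, not an official api) can report which of these frameworks are importable and which cuda build they expose; only pytorch publishes torch.version.cuda, so the others fall back to a generic label:

```python
import importlib

def cuda_build_report(frameworks=("torch", "tensorflow", "jax")) -> dict:
    """report which frameworks import cleanly and, where exposed, their cuda build."""
    report = {}
    for name in frameworks:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        # pytorch exposes torch.version.cuda; tensorflow and jax do not
        # expose a cuda version attribute in the same place
        cuda = getattr(getattr(module, "version", None), "cuda", None)
        report[name] = cuda or "installed (cuda version not exposed)"
    return report

print(cuda_build_report())
```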
troubleshooting
common issues
unsupported gpu error
- check compute capability >= 7.5
- maxwell/pascal/volta no longer supported
driver version mismatch
# requires r580+ driver
nvidia-smi  # should show 580.xx+
framework compatibility
- continue using cuda 12.x or 11.8 builds
- monitor framework release notes for cuda 13.0 support
- pytorch tracking: github #159779
vllm dependency chain
- vllm depends on pytorch and cupy
- neither pytorch nor cupy supports cuda 13.0 yet
- vllm cuda 13.0 support blocked until dependencies update
future roadmap
risc-v support
nvidia announced cuda coming to risc-v:
- no timeline in cuda 13.0
- part of broader architecture expansion
- following arm unification pattern
potential cuda 14.0
based on deprecation patterns:
- turing (compute 7.5) likely next removal target
- further arm platform integration
- potential risc-v preview