cuda gpu computing setup
overview
cuda enables gpu acceleration for compute workloads. this guide helps you choose the right setup path: docker containers or a native toolkit install.
current ecosystem:
- cuda 13.0 - latest stable (august 2025)
- cuda 12.x - previous stable
- cuda 11.8 - legacy support
- driver version != cuda toolkit version - they are versioned independently, and the driver only sets an upper bound on the cuda versions it can run
- runtime vs development requirements: running apps needs only the driver; compiling cuda code needs the toolkit (see the check below)
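the distinction matters in practice: `nvidia-smi` reports the driver, `nvcc` reports the toolkit, and the two can legitimately differ. a quick check, assuming both are on the path:

```bash
# driver version (always present once the driver is installed)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# toolkit version (only present on development installs)
nvcc --version
```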
quick decision tree
```
need cuda?
├── just running apps (inference/containers)
│   └── docker → cuda-docker
└── developing cuda code (compiling)
    └── native → cuda-native
```
setup comparison
aspect | docker | native |
---|---|---|
isolation | complete | none |
cuda versions | multiple | one |
disk space | ~2gb/image | ~4gb |
complexity | simple | moderate |
performance | ~same | baseline |
root access | not required* | required |
*with rootless docker
compatibility matrix
gpu generations
gpu | compute capability | min cuda | recommended |
---|---|---|---|
blackwell (b100) | 10.0 | 12.8 | 13.0 |
hopper (h100) | 9.0 | 11.8 | 13.0 |
ada (40xx) | 8.9 | 11.8 | 13.0 |
ampere (30xx) | 8.6 | 11.0 | 12.x+ |
turing (20xx) | 7.5 | 10.0 | 12.x+ |
volta (v100) | 7.0 | 9.0 | dropped |
pascal (10xx) | 6.x | 8.0 | dropped |
maxwell (9xx) | 5.x | 6.0 | dropped |
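to confirm which row applies to your hardware, recent drivers can report compute capability directly (older drivers may not support the `compute_cap` query field):

```bash
# name and compute capability of each gpu
nvidia-smi --query-gpu=name,compute_cap --format=csv
# example output: NVIDIA GeForce RTX 4090, 8.9
```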
driver requirements
```bash
# check current driver
nvidia-smi | grep "Driver Version"
# Driver Version: 575.57.08
```
minimum driver versions:
- cuda 13.0 → 580.65.06+ (r580 series)
- cuda 12.9 → 550.54.14+
- cuda 12.0 → 525.60.13+
- cuda 11.8 → 450.80.02+
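note that the `CUDA Version` printed in the `nvidia-smi` header is the newest cuda the installed driver can run, not what is installed - useful for checking against the minimums above:

```bash
# maximum cuda version supported by the current driver
nvidia-smi | grep "CUDA Version"
# example output line includes: CUDA Version: 13.0
```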
quick test
verify gpu access:
```bash
# docker test
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi

# cuda 13.0 when available:
# docker run --rm --gpus all nvidia/cuda:13.0-base-ubuntu24.04 nvidia-smi

# python test
uv run https://michaelbommarito.com/wiki/python/scripts/check-pytorch.py
```
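if the hosted script is unreachable, a minimal local equivalent (assuming pytorch is already installed) is:

```bash
# prints True plus the cuda version pytorch was built against
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```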
common use cases
machine learning
most ml frameworks work with docker:
- pytorch → setup guide
- tensorflow → docker with `tensorflow/tensorflow:latest-gpu`
- jax → docker with cuda base image
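as a concrete example of the tensorflow route, the official gpu image can verify device visibility in one command (assumes the host driver and container toolkit are set up):

```bash
# list gpus visible to tensorflow inside the official gpu image
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```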
cuda development
compiling cuda code requires the toolkit:
- kernel development → native install
- research code → either option
- production → docker for reproducibility
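a minimal sketch of the native path - writing and compiling a trivial kernel with `nvcc` (assumes the toolkit is installed and on the path):

```bash
# hello.cu - smallest useful cuda program
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("hello from gpu thread %d\n", (int)threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();          // launch 4 threads in one block
    cudaDeviceSynchronize();    // wait for the kernel to finish
    return 0;
}
EOF

nvcc hello.cu -o hello && ./hello
```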
deployment scenarios
scenario | recommendation | reason |
---|---|---|
ml inference | docker | version isolation |
training clusters | docker | multi-user safety |
edge devices | native | resource constraints |
development | both | flexibility |
framework support
all major frameworks support containerized cuda:
- pytorch: custom index urls
- tensorflow: official gpu images
- jax: cuda + cudnn required
- rapids: cuda 11.2+ required
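for pytorch's "custom index urls", the install points at a cuda-specific wheel index; the `cuXXX` suffix tracks the cuda version (cu121 shown here as an example - adjust to match your driver):

```bash
# install pytorch built against cuda 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
```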
troubleshooting checklist
gpu not detected
```bash
# verify driver loaded
lsmod | grep nvidia

# check pci device
lspci | grep -i nvidia
```
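if the module is not loaded, loading it manually and checking the kernel log are the usual next steps (a sketch; reinstalling the driver may still be required):

```bash
# try loading the kernel module manually (requires root)
sudo modprobe nvidia

# check the kernel log for driver errors
sudo dmesg | grep -i nvidia
```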
version mismatch
- the driver is backward compatible: a newer driver runs older cuda toolkits
- the toolkit can't be newer than what the driver supports
- containers handle this automatically - only the host driver matters
permission denied
- docker: need the `--gpus` flag
- native: check `/dev/nvidia*` permissions
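checking device-node ownership often explains native-side failures; on many distros the nodes belong to a `video` or `render` group your user must join (group names vary by distro):

```bash
# inspect device nodes and their group ownership
ls -l /dev/nvidia*

# confirm your user's group membership
groups
```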
architecture notes
arm64
arm64 support (cuda 13.0+):
- unified arm platform - single installer
- grace hopper (gh200) - full support
- jetson orin - excluded from cuda 13.0
- see cuda 13.0 guide for details
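to confirm which platform you are on before picking an installer:

```bash
# prints aarch64 on arm64 systems, x86_64 on intel/amd
uname -m
```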
wsl2
windows subsystem for linux:
- requires windows 11 or windows 10 21h2+
- nvidia driver installs on the windows side only - do not install a linux gpu driver inside wsl2
- cuda toolkit (if developing) installs inside wsl2
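detecting wsl2 from inside the guest, and verifying the windows driver is visible there (the windows driver exposes `nvidia-smi` inside wsl2):

```bash
# detect wsl: the kernel version string mentions microsoft
grep -qi microsoft /proc/version && echo "running under wsl"

# should show the gpu via the windows-side driver
nvidia-smi
```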
next steps
containers (recommended) → cuda docker setup
bare metal → cuda native setup
framework specific → pytorch setup