cuda gpu computing setup

published: August 3, 2025 · updated: August 5, 2025

overview

cuda enables gpu acceleration for compute workloads. this guide helps you choose the right setup path.

current ecosystem:

  • cuda 13.0 - latest stable (august 2025)
  • cuda 12.x - previous stable
  • cuda 11.8 - legacy support
  • driver version != cuda toolkit version (the check below shows both)
  • runtime vs development requirements
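
to see both numbers side by side, a minimal check (nvcc is only present on development installs):

# driver version reported by the kernel module
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# toolkit version, only present if the development toolkit is installed
nvcc --version | grep release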

quick decision tree

need cuda?
├── just running apps (inference/containers)
│   └── docker → cuda-docker
└── developing cuda code (compiling)
    └── native → cuda-native

setup comparison

aspect          docker          native
isolation       complete        none
cuda versions   multiple        one
disk space      ~2gb/image      ~4gb
complexity      simple          moderate
performance     ~same           baseline
root access     not required*   required

*with rootless docker
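
a minimal sketch of enabling gpus under rootless docker with the nvidia container toolkit (the nvidia-ctk steps follow nvidia's documented defaults and may vary by distro):

# point the user-level docker daemon at the nvidia runtime
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
# rootless docker cannot manage cgroups through the container cli; disable that path
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
systemctl --user restart docker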

compatibility matrix

gpu generations

gpu                compute capability   min cuda   recommended
blackwell (b100)   10.0                 13.0       13.0
hopper (h100)      9.0                  11.8       13.0
ada (40xx)         8.9                  11.8       13.0
ampere (30xx)      8.6                  11.0       12.x+
turing (20xx)      7.5                  10.0       12.x+
volta (v100)       7.0                  9.0        dropped
pascal (10xx)      6.x                  8.0        dropped
maxwell (9xx)      5.x                  6.0        dropped
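
to find your card's row, newer drivers can report compute capability directly (the compute_cap query field assumes a reasonably recent nvidia-smi):

# name and compute capability, e.g. "NVIDIA GeForce RTX 4090, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader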

driver requirements

# check current driver
nvidia-smi | grep "Driver Version"
# Driver Version: 575.57.08

minimum driver versions (a version check is sketched after the list):

  • cuda 13.0 → 580.65.06+ (r580 series)
  • cuda 12.9 → 550.54.14+
  • cuda 12.0 → 525.60.13+
  • cuda 11.8 → 450.80.02+
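
a minimal sketch of automating that check with sort -V (the threshold is cuda 13.0's minimum from the list above):

# warn if the installed driver predates cuda 13.0's minimum
required=580.65.06
current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" != "$required" ]; then
    echo "driver $current is older than $required; upgrade before installing cuda 13.0"
fi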

quick test

verify gpu access:

# docker test
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
# cuda 13.0 when available:
# docker run --rm --gpus all nvidia/cuda:13.0-base-ubuntu24.04 nvidia-smi

# python test
uv run https://michaelbommarito.com/wiki/python/scripts/check-pytorch.py

common use cases

machine learning

most ml frameworks work with docker (a quick smoke test follows the list):

  • pytorch → setup guide
  • tensorflow → docker with tensorflow/tensorflow:latest-gpu
  • jax → docker with cuda base image
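
for example, a one-line gpu smoke test against the official tensorflow image:

# should print a non-empty list of physical gpu devices
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"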

cuda development

compiling cuda code requires the toolkit (a compile smoke test is sketched after the list):

  • kernel development → native install
  • research code → either option
  • production → docker for reproducibility
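
a minimal sketch of that compile path (assumes nvcc is on PATH; the file name is arbitrary):

# build and run a trivial kernel to confirm the toolchain works end to end
cat > hello.cu <<'EOF'
#include <cstdio>
__global__ void hello() { printf("hello from the gpu\n"); }
int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF
nvcc hello.cu -o hello && ./hello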

deployment scenarios

scenario            recommendation   reason
ml inference        docker           version isolation
training clusters   docker           multi-user safety
edge devices        native           resource constraints
development         both             flexibility

framework support

all major frameworks support containerized cuda:

  • pytorch: custom index urls (example below)
  • tensorflow: official gpu images
  • jax: cuda + cudnn required
  • rapids: cuda 11.2+ required
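
the pytorch index-url pattern looks like this (cu126 is an illustrative tag; match it to your installed cuda version):

# install a cuda 12.6 build of pytorch from the dedicated wheel index
pip install torch --index-url https://download.pytorch.org/whl/cu126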

troubleshooting checklist

  1. gpu not detected

    # verify driver loaded
    lsmod | grep nvidia
    # check pci device
    lspci | grep -i nvidia
  2. version mismatch

    • driver supports newer cuda
    • toolkit can’t be newer than driver
    • containers handle this automatically
  3. permission denied (quick checks below)

    • docker: need --gpus flag
    • native: check /dev/nvidia* permissions
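
for the permission-denied case, two quick checks (the image tag matches the quick test above):

# docker: gpus are opt-in per container run
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
# native: the device nodes must be readable by your user
ls -l /dev/nvidia*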

architecture notes

arm64

arm64 support (cuda 13.0+):

  • unified arm platform - single installer
  • grace hopper (gh200) - full support
  • jetson orin - excluded from cuda 13.0
  • see cuda 13.0 guide for details
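
to confirm which platform installer you need:

# aarch64 means arm64; x86_64 means the standard installer
uname -m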

wsl2

windows subsystem for linux:

  • requires windows 11 or windows 10 21h2+
  • nvidia driver on windows side only
  • cuda toolkit in wsl2
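
a minimal sketch inside the wsl2 guest (the toolkit package name assumes nvidia's ubuntu repository is configured):

# the windows driver surfaces the gpu in wsl2; this should already work
nvidia-smi
# install only the toolkit inside the guest, never a linux driver
sudo apt-get install -y cuda-toolkit-12-8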

next steps

  1. containers (recommended) → cuda docker setup

  2. bare metal → cuda native setup

  3. framework specific → pytorch setup
