cuda gpu computing setup

published: August 3, 2025 · updated: August 5, 2025

overview

cuda enables gpu acceleration for compute workloads. this guide helps you choose the right setup path.

current ecosystem:

  • cuda 13.0 - latest stable (august 2025)
  • cuda 12.x - previous stable
  • cuda 11.8 - legacy support
  • driver version != cuda toolkit version (the check below shows both)
  • runtime vs development requirements
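
to see both numbers side by side, a minimal check (nvcc is only present on development installs):

# driver version reported by the kernel module
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# toolkit version, only present if the development toolkit is installed
nvcc --version | grep release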

quick decision tree

need cuda?
├── just running apps (inference/containers)
│   └── docker → cuda-docker
└── developing cuda code (compiling)
    └── native → cuda-native

setup comparison

aspect          docker          native
isolation       complete        none
cuda versions   multiple        one
disk space      ~2gb/image      ~4gb
complexity      simple          moderate
performance     ~same           baseline
root access     not required*   required

*with rootless docker
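
a minimal sketch of enabling gpus under rootless docker with the nvidia container toolkit (the nvidia-ctk steps follow nvidia's documented defaults and may vary by distro):

# point the user-level docker daemon at the nvidia runtime
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
# rootless docker cannot manage cgroups through the container cli; disable that path
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
systemctl --user restart docker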

compatibility matrix

gpu generations

gpu                compute capability   min cuda   recommended
blackwell (b100)   10.0                 13.0       13.0
hopper (h100)      9.0                  11.8       13.0
ada (40xx)         8.9                  11.8       13.0
ampere (30xx)      8.6                  11.0       12.x+
turing (20xx)      7.5                  10.0       12.x+
volta (v100)       7.0                  9.0        dropped
pascal (10xx)      6.x                  8.0        dropped
maxwell (9xx)      5.x                  6.0        dropped
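
to find your card's row, newer drivers can report compute capability directly (the compute_cap query field assumes a reasonably recent nvidia-smi):

# name and compute capability, e.g. "NVIDIA GeForce RTX 4090, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader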

driver requirements

# check current driver
nvidia-smi | grep "Driver Version"
# Driver Version: 575.57.08

minimum driver versions (a version check is sketched after the list):

  • cuda 13.0 → 580.65.06+ (r580 series)
  • cuda 12.9 → 550.54.14+
  • cuda 12.0 → 525.60.13+
  • cuda 11.8 → 450.80.02+
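
a minimal sketch of automating that check with sort -V (the threshold is cuda 13.0's minimum from the list above):

# warn if the installed driver predates cuda 13.0's minimum
required=580.65.06
current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" != "$required" ]; then
    echo "driver $current is older than $required; upgrade before installing cuda 13.0"
fi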

quick test

verify gpu access:

# docker test
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
# cuda 13.0 when available:
# docker run --rm --gpus all nvidia/cuda:13.0-base-ubuntu24.04 nvidia-smi

# python test
uv run https://michaelbommarito.com/wiki/python/scripts/check-pytorch.py

common use cases

machine learning

most ml frameworks work with docker (a quick smoke test follows the list):

  • pytorch → setup guide
  • tensorflow → docker with tensorflow/tensorflow:latest-gpu
  • jax → docker with cuda base image
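
for example, a one-line gpu smoke test against the official tensorflow image:

# should print a non-empty list of physical gpu devices
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"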

cuda development

compiling cuda code requires the toolkit (a compile smoke test is sketched after the list):

  • kernel development → native install
  • research code → either option
  • production → docker for reproducibility
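
a minimal sketch of that compile path (assumes nvcc is on PATH; the file name is arbitrary):

# build and run a trivial kernel to confirm the toolchain works end to end
cat > hello.cu <<'EOF'
#include <cstdio>
__global__ void hello() { printf("hello from the gpu\n"); }
int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF
nvcc hello.cu -o hello && ./hello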

deployment scenarios

scenario            recommendation   reason
ml inference        docker           version isolation
training clusters   docker           multi-user safety
edge devices        native           resource constraints
development         both             flexibility

framework support

all major frameworks support containerized cuda:

  • pytorch: custom index urls (example below)
  • tensorflow: official gpu images
  • jax: cuda + cudnn required
  • rapids: cuda 11.2+ required
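
the pytorch index-url pattern looks like this (cu126 is an illustrative tag; match it to your installed cuda version):

# install a cuda 12.6 build of pytorch from the dedicated wheel index
pip install torch --index-url https://download.pytorch.org/whl/cu126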

troubleshooting checklist

  1. gpu not detected

    # verify driver loaded
    lsmod | grep nvidia
    # check pci device
    lspci | grep -i nvidia
  2. version mismatch

    • driver supports newer cuda
    • toolkit can’t be newer than driver
    • containers handle this automatically
  3. permission denied (quick checks below)

    • docker: need --gpus flag
    • native: check /dev/nvidia* permissions
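
for the permission-denied case, two quick checks (the image tag matches the quick test above):

# docker: gpus are opt-in per container run
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
# native: the device nodes must be readable by your user
ls -l /dev/nvidia*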

architecture notes

arm64

arm64 support (cuda 13.0+):

  • unified arm platform - single installer
  • grace hopper (gh200) - full support
  • jetson orin - excluded from cuda 13.0
  • see cuda 13.0 guide for details
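
to confirm which platform installer you need:

# aarch64 means arm64; x86_64 means the standard installer
uname -m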

wsl2

windows subsystem for linux:

  • requires windows 11 or windows 10 21h2+
  • nvidia driver on windows side only
  • cuda toolkit in wsl2
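
a minimal sketch inside the wsl2 guest (the toolkit package name assumes nvidia's ubuntu repository is configured):

# the windows driver surfaces the gpu in wsl2; this should already work
nvidia-smi
# install only the toolkit inside the guest, never a linux driver
sudo apt-get install -y cuda-toolkit-12-8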

next steps

  1. containers (recommended) → cuda docker setup

  2. bare metal → cuda native setup

  3. framework specific → pytorch setup
