nvidia gpu compute capability reference

comprehensive reference of nvidia gpu compute capabilities. compute capability determines which cuda features and optimizations are available for each gpu.

source: nvidia cuda gpus official documentation

what is compute capability?

compute capability represents a gpu’s feature set and computational abilities:

  • major version: indicates core architecture (e.g., 9.x = hopper)
  • minor version: indicates incremental improvements within architecture
  • feature support: determines available cuda features, tensor cores, fp formats
  • optimization targets: compiler uses this for optimal code generation
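
as a rough illustration of how the major and minor parts combine, here is a minimal python sketch. the `ARCHITECTURES` table and both helper names are hypothetical, filled in from the capability headings in this reference rather than taken from any nvidia api.

```python
# Hypothetical lookup table, copied from the headings in this reference.
ARCHITECTURES = {
    (12, 0): "blackwell rtx",
    (10, 0): "blackwell datacenter",
    (9, 0): "hopper",
    (8, 9): "ada lovelace",
    (8, 6): "ampere consumer",
    (8, 0): "ampere datacenter",
    (7, 5): "turing",
    (7, 0): "volta",
}


def parse_capability(text: str) -> tuple[int, int]:
    """Split a string such as "8.6" into (major, minor) integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def architecture_name(text: str) -> str:
    """Return the architecture label for a capability string, or "unknown"."""
    return ARCHITECTURES.get(parse_capability(text), "unknown")


print(architecture_name("9.0"))  # hopper
print(architecture_name("8.9"))  # ada lovelace
```

the lookup is keyed on both numbers because the major version alone is not always enough: 8.0 and 8.6 are ampere, while 8.9 is ada lovelace.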

quick reference

checking your gpu

# nvidia-smi query
nvidia-smi --query-gpu=name,compute_cap --format=csv

# python (pytorch)
uv run --with=torch --index https://download.pytorch.org/whl/cu128 python -c "import torch; print(torch.cuda.get_device_capability())"

# cuda devicequery (from the cuda samples, if it is built and on your path)
# deviceQuery | grep "CUDA Capability"
# on ubuntu 25.04 with cuda installed via apt, the closest equivalent i found is:
/usr/local/cuda/bin/__nvcc_device_query
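
if the capability is needed inside a script, the nvidia-smi query above can be wrapped in python. this is a sketch, assuming a driver recent enough to expose the `compute_cap` field; the function names and the sample text are made up for illustration.

```python
import subprocess


def parse_compute_caps(csv_text: str) -> list[tuple[str, str]]:
    """Parse headerless "name, compute_cap" rows into (name, capability) pairs."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The capability is the last comma-separated field on each row.
        name, cap = line.rsplit(",", 1)
        rows.append((name.strip(), cap.strip()))
    return rows


def query_gpus() -> list[tuple[str, str]]:
    """Run nvidia-smi and return one (name, capability) pair per gpu."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


# Made-up sample text so the parser can be tried without a gpu present.
sample = "NVIDIA GeForce RTX 3090, 8.6\nNVIDIA A100-SXM4-80GB, 8.0\n"
print(parse_compute_caps(sample))
```

on a machine with a working driver, `query_gpus()` returns the same structure for the installed gpus.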

key capability thresholds

capability   milestone features
>= 12.0      blackwell architecture, next-gen tensor cores
>= 10.0      blackwell datacenter, enhanced fp8
>= 9.0       mxfp4 support, fp8 tensor engine, hopper
>= 8.9       ada lovelace, av1 encode, 3rd gen rt cores
>= 8.6       ampere consumer, 2x fp32 throughput
>= 8.0       ampere datacenter, tf32, structured sparsity
>= 7.5       turing, rt cores, tensor cores int8/int4
>= 7.0       volta, tensor cores fp16, independent thread scheduling
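
when testing a gpu against these thresholds in code, it is safer to compare (major, minor) integer pairs than raw strings. the sketch below shows why; `as_tuple` and `meets` are hypothetical helper names.

```python
def as_tuple(capability: str) -> tuple[int, int]:
    """Convert "8.6" into (8, 6) so comparisons are numeric."""
    major, minor = capability.split(".")
    return int(major), int(minor)


def meets(capability: str, threshold: str) -> bool:
    """True when the capability is at or above the threshold."""
    return as_tuple(capability) >= as_tuple(threshold)


# String comparison is lexicographic, so it ranks "10.0" below "9.0".
print("10.0" >= "9.0")       # False
print(meets("10.0", "9.0"))  # True
print(meets("8.6", "8.9"))   # False
```

`torch.cuda.get_device_capability()` already returns such a pair, so its result can be compared against a tuple like `(8, 0)` directly.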

complete gpu list by compute capability

compute capability 12.0 (blackwell rtx)

professional workstation

  • nvidia rtx pro 6000 blackwell server edition
  • nvidia rtx pro 6000 blackwell workstation edition
  • nvidia rtx pro 6000 blackwell max-q workstation edition
  • nvidia rtx pro 5000 blackwell
  • nvidia rtx pro 4500 blackwell
  • nvidia rtx pro 4000 blackwell

consumer (rtx 50 series)

  • geforce rtx 5090
  • geforce rtx 5080
  • geforce rtx 5070 ti
  • geforce rtx 5070
  • geforce rtx 5060 ti
  • geforce rtx 5060

compute capability 10.0 (blackwell datacenter)

  • nvidia gb200 (grace blackwell superchip)
  • nvidia b200
  • nvidia b100

compute capability 9.0 (hopper)

  • nvidia gh200 (grace hopper superchip)
  • nvidia h200 (141gb hbm3e)
  • nvidia h100 (80gb/94gb hbm3)

compute capability 8.9 (ada lovelace)

datacenter

  • l40 (48gb)
  • l40s (48gb)
  • l4 (24gb)

professional

  • rtx 6000 ada generation (48gb)
  • rtx 5000 ada generation (32gb)
  • rtx 4500 ada generation (24gb)
  • rtx 4000 ada generation (20gb)
  • rtx 4000 sff ada generation (20gb)

consumer (rtx 40 series)

  • geforce rtx 4090 (24gb)
  • geforce rtx 4080 super (16gb)
  • geforce rtx 4080 (16gb)
  • geforce rtx 4070 ti super (16gb)
  • geforce rtx 4070 ti (12gb)
  • geforce rtx 4070 super (12gb)
  • geforce rtx 4070 (12gb)
  • geforce rtx 4060 ti (8gb/16gb)
  • geforce rtx 4060 (8gb)

mobile

  • geforce rtx 4090 laptop
  • geforce rtx 4080 laptop
  • geforce rtx 4070 laptop
  • geforce rtx 4060 laptop
  • geforce rtx 4050 laptop

compute capability 8.6 (ampere consumer)

datacenter

  • a40 (48gb)
  • a10 (24gb)
  • a10g (24gb)
  • a16 (4x16gb)
  • a2 (16gb)

professional

  • rtx a6000 (48gb)
  • rtx a5500 (24gb)
  • rtx a5000 (24gb)
  • rtx a4500 (20gb)
  • rtx a4000 (16gb)
  • rtx a2000 (6gb/12gb)

consumer (rtx 30 series)

  • geforce rtx 3090 ti (24gb)
  • geforce rtx 3090 (24gb)
  • geforce rtx 3080 ti (12gb)
  • geforce rtx 3080 (10gb/12gb)
  • geforce rtx 3070 ti (8gb)
  • geforce rtx 3070 (8gb)
  • geforce rtx 3060 ti (8gb)
  • geforce rtx 3060 (8gb/12gb)
  • geforce rtx 3050 (6gb/8gb)

compute capability 8.0 (ampere datacenter)

  • nvidia a100 (40gb/80gb hbm2e)
  • nvidia a30 (24gb hbm2)

compute capability 7.5 (turing)

datacenter

  • t4 (16gb)

professional

  • quadro rtx 8000 (48gb)
  • quadro rtx 6000 (24gb)
  • quadro rtx 5000 (16gb)
  • quadro rtx 4000 (8gb)

consumer (rtx 20 series)

  • titan rtx (24gb)
  • geforce rtx 2080 ti (11gb)
  • geforce rtx 2080 super (8gb)
  • geforce rtx 2080 (8gb)
  • geforce rtx 2070 super (8gb)
  • geforce rtx 2070 (8gb)
  • geforce rtx 2060 super (8gb)
  • geforce rtx 2060 (6gb/12gb)

consumer (gtx 16 series)

  • geforce gtx 1660 ti (6gb)
  • geforce gtx 1660 super (6gb)
  • geforce gtx 1660 (6gb)
  • geforce gtx 1650 super (4gb)
  • geforce gtx 1650 (4gb)

compute capability 7.0 (volta)

  • nvidia gv100 (32gb)
  • nvidia v100 (16gb/32gb hbm2)
  • titan v (12gb)
  • quadro gv100 (32gb)

compute capability 6.x (pascal)

6.1

  • titan xp, titan x (pascal)
  • geforce gtx 1080 ti, 1080, 1070 ti, 1070, 1060, 1050 ti, 1050
  • quadro p6000, p5000, p4000, p2000, p1000, p600, p400

6.0

  • tesla p100 (12gb/16gb)
  • quadro gp100

compute capability 5.x (maxwell)

5.3

  • jetson tx1, nano, tegra x1

5.2

  • geforce gtx titan x, 980 ti, 980, 970, 960, 950
  • quadro m6000, m5000, m4000

5.0

  • geforce gtx 750 ti, 750
  • quadro k2200, k1200, k620

feature support by capability

feature                         min capability   notes
mxfp4 quantization              9.0              hopper and newer only
fp8 tensor cores                8.9              ada lovelace and newer; hopper (9.0) adds the transformer engine
structured sparsity             8.0              2:4 sparsity patterns
tf32 tensor cores               8.0              automatic mixed precision
bfloat16                        8.0              brain floating point
multi-instance gpu (mig)        8.0              select datacenter gpus only (e.g., a100, a30, h100)
3rd gen tensor cores            8.0              int8, int4 support
rt cores (2nd gen)              8.6              improved ray tracing
av1 encode                      8.9              ada lovelace
independent thread scheduling   7.0              volta cooperative groups
tensor cores (1st gen)          7.0              fp16 mixed precision
rt cores (1st gen)              7.5              real-time ray tracing
unified memory                  6.0              pascal page migration engine
nvlink                          6.0              high-speed interconnect
dynamic parallelism             3.5              kernel launch from device
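
a table like this can be turned into a simple lookup. the sketch below copies only a few of the compute-related rows above; `MIN_CAPABILITY` and `supported_features` are hypothetical names, and hardware blocks such as rt cores or video encoders are left out because they also depend on the specific gpu model.

```python
# Minimum (major, minor) per feature, copied from a subset of the table rows.
MIN_CAPABILITY = {
    "tf32 tensor cores": (8, 0),
    "bfloat16": (8, 0),
    "structured sparsity": (8, 0),
    "tensor cores (1st gen)": (7, 0),
    "independent thread scheduling": (7, 0),
    "dynamic parallelism": (3, 5),
}


def supported_features(capability: tuple[int, int]) -> list[str]:
    """Return the listed features whose minimum capability is met, sorted by name."""
    return sorted(
        name for name, minimum in MIN_CAPABILITY.items() if capability >= minimum
    )


# Example: a turing gpu (7.5) meets the 7.0 and 3.5 rows but not the 8.0 rows.
print(supported_features((7, 5)))
```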

cuda version compatibility

compute capability   minimum cuda   recommended cuda   dropped in cuda
12.0                 12.8           13.0               -
10.0                 12.8           13.0               -
9.0                  11.8           13.0               -
8.9                  11.8           12.x/13.0          -
8.6                  11.1           12.x               -
8.0                  11.0           12.x               -
7.5                  10.0           12.x               -
7.0                  9.0            11.x               13.0
6.x                  8.0            11.x               13.0
5.x                  6.5            11.x               13.0
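
the minimum and dropped columns together define a window of toolkit versions for each capability. here is a minimal sketch of that check, using only the 7.0 through 9.0 rows; `CUDA_SUPPORT` and `toolkit_supports` are hypothetical names, and versions are written as (major, minor) pairs.

```python
# capability -> (minimum cuda, first cuda release without support, or None)
CUDA_SUPPORT = {
    (9, 0): ((11, 8), None),
    (8, 9): ((11, 8), None),
    (8, 6): ((11, 1), None),
    (8, 0): ((11, 0), None),
    (7, 5): ((10, 0), None),
    (7, 0): ((9, 0), (13, 0)),
}


def toolkit_supports(capability: tuple[int, int], cuda: tuple[int, int]) -> bool:
    """True when the cuda version falls inside the support window for the capability."""
    minimum, dropped = CUDA_SUPPORT[capability]
    if cuda < minimum:
        return False
    return dropped is None or cuda < dropped


print(toolkit_supports((8, 6), (11, 0)))  # False: 8.6 needs cuda 11.1 or newer
print(toolkit_supports((7, 0), (12, 4)))  # True: volta is inside its window
print(toolkit_supports((7, 0), (13, 0)))  # False: volta is dropped in cuda 13.0
```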

common use cases

deep learning frameworks

pytorch requirements

  • minimum: compute 3.7 (kepler k80)
  • recommended: compute 7.0+ (tensor cores)
  • optimal: compute 8.0+ (ampere features)

tensorflow requirements

  • minimum: compute 3.5
  • recommended: compute 7.0+ (mixed precision)
  • optimal: compute 8.0+ (tf32 automatic)

specific model requirements

llm inference

  • flash attention: compute >= 7.5
  • flash attention 2: compute >= 8.0
  • mxfp4 models (gpt-oss): compute >= 9.0
  • fp8 models: compute >= 9.0
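
these thresholds can drive a simple fallback choice at load time. the sketch below is one way to express that, assuming the capability is available as a (major, minor) pair; the function names and the "standard attention" fallback label are hypothetical, not part of any library.

```python
def pick_attention(capability: tuple[int, int]) -> str:
    """Pick the newest attention variant from the list above that the gpu meets."""
    if capability >= (8, 0):
        return "flash attention 2"
    if capability >= (7, 5):
        return "flash attention"
    return "standard attention"  # hypothetical label for the non-flash path


def can_load_mxfp4(capability: tuple[int, int]) -> bool:
    """Apply the mxfp4 threshold listed above (compute >= 9.0)."""
    return capability >= (9, 0)


for cap in [(7, 0), (7, 5), (8, 6), (9, 0)]:
    print(cap, pick_attention(cap), can_load_mxfp4(cap))
```

with pytorch installed, `torch.cuda.get_device_capability()` supplies the pair for the current device.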

vision models

  • stable diffusion: compute >= 6.1 recommended
  • sam (segment anything): compute >= 7.0
  • flux: compute >= 7.5

troubleshooting

common errors

“gpu not supported”

# check if gpu is too old for framework
nvidia-smi --query-gpu=name,compute_cap --format=csv
# verify against framework requirements

“mxfp4 requires compute >= 9.0”

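# use alternative quantization (bitsandbytes 4-bit instead of mxfp4)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",  # placeholder model id
    quantization_config=quant_config,
)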

“cuda arch not found”

# specify arch during compilation
export TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;8.9;9.0"
pip install --no-cache-dir package-name
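
the arch list is a semicolon-separated string of capabilities, so it can be generated instead of typed by hand. a small sketch, assuming the target capabilities are known as (major, minor) pairs; `arch_list` is a hypothetical helper name.

```python
def arch_list(capabilities: list[tuple[int, int]]) -> str:
    """Build a TORCH_CUDA_ARCH_LIST value from (major, minor) pairs."""
    unique = sorted(set(capabilities))  # drop duplicates, order oldest first
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Example: a mixed fleet with two ampere consumer cards, one turing, one hopper.
print(arch_list([(8, 6), (7, 5), (8, 6), (9, 0)]))  # 7.5;8.6;9.0
```

listing only the capabilities you actually deploy to keeps build time and binary size down compared with compiling for every architecture.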
