cuda native installation

published: August 3, 2025

overview

native cuda toolkit installation directly on the host, for compiling cuda code without a container layer. required for kernel development and for building custom cuda libraries.

current version: cuda 12.9 update 1 (august 2025)

when to use:

  • cuda kernel development
  • custom cuda libraries
  • system-wide cuda tools
  • setups where docker overhead is not acceptable

prerequisites

# verify nvidia driver
nvidia-smi
# need driver 550.54.14+ for cuda 12.9

# check gcc
gcc --version
# need a supported host compiler (gcc 11-13 for cuda 12.9)

# verify kernel headers
uname -r
ls /usr/src/linux-headers-$(uname -r)
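
if the headers directory is missing, install the package matching the running kernel (assuming ubuntu):

# install kernel headers for the running kernel
sudo apt install linux-headers-$(uname -r)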

installation methods

method 1: deb packages (network repository)

recommended; apt handles dependencies and later upgrades.

for ubuntu 22.04:

# download cuda keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# update and install
sudo apt update
sudo apt install cuda-toolkit-12-9

for ubuntu 24.04:

# download cuda keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# update and install
sudo apt update
sudo apt install cuda-toolkit-12-9
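
if the machine has no nvidia driver yet, the same repository also provides a driver metapackage (assuming the repo-managed driver is preferred over ubuntu's own packages):

# optional: install the repo-managed driver
sudo apt install cuda-drivers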

method 2: runfile installer

offers more control, but updates and removal are manual:

# download runfile
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_550.54.14_linux.run

# make executable
chmod +x cuda_12.9.0_550.54.14_linux.run

# install (skip driver if already installed)
sudo sh cuda_12.9.0_550.54.14_linux.run --toolkit --silent --override

interactive options:

# interactive mode for component selection
sudo sh cuda_12.9.0_550.54.14_linux.run
# select cuda-gdb-src in "CUDA Tools 12.9" for debugging

post-installation setup

environment variables

add to ~/.bashrc:

# cuda paths
export PATH=/usr/local/cuda-12.9/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# optional: default cuda version
export CUDA_HOME=/usr/local/cuda-12.9

apply changes:

source ~/.bashrc
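
for a system-wide setup, the same exports can instead go into a profile.d snippet (a sketch; the file name is arbitrary):

# /etc/profile.d/cuda.sh -- applies to all login shells
export PATH=/usr/local/cuda-12.9/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-12.9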

verification

# check nvcc
nvcc --version
# nvcc: NVIDIA (R) Cuda compiler driver
# Cuda compilation tools, release 12.9

# compile test
cat > test.cu << 'EOF'
#include <stdio.h>
__global__ void hello() {
    printf("Hello from GPU!\n");
}
int main() {
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF

nvcc test.cu -o test
./test
# Hello from GPU!
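
by default nvcc embeds ptx for a generic architecture and jit-compiles it at first launch; targeting the installed gpu avoids that (assuming nvcc 11.5 or newer for -arch=native):

# compile for the gpu present in this machine
nvcc -arch=native test.cu -o test
# or pin a specific compute capability, e.g. sm_86 for ampere consumer gpus
nvcc -arch=sm_86 test.cu -o test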

component overview

default installation includes:

component        location                           purpose
nvcc             /usr/local/cuda/bin/nvcc           cuda compiler
cuda libraries   /usr/local/cuda/lib64/             runtime libraries
headers          /usr/local/cuda/include/           development headers
cuda samples     github.com/NVIDIA/cuda-samples     example code (separate download)
nsight systems   /usr/local/cuda/bin/nsys           timeline profiler
nsight compute   /usr/local/cuda/bin/ncu            kernel profiler
cuda-gdb         /usr/local/cuda/bin/cuda-gdb       cuda debugger

multiple cuda versions

install multiple versions side-by-side:

# install cuda 11.8
sudo apt install cuda-toolkit-11-8

# install cuda 12.9
sudo apt install cuda-toolkit-12-9

# switch versions
sudo update-alternatives --config cuda

# or manually
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH

version management script

#!/bin/bash
# save as ~/bin/cuda-switch and source it (see usage below)

CUDA_VERSION=$1
if [ -z "$CUDA_VERSION" ]; then
    echo "Usage: source cuda-switch <version>"
    echo "Available versions:"
    ls -1 /usr/local/ | grep cuda- | sed 's/cuda-//'
    return 1 2>/dev/null || exit 1
fi

if [ ! -d "/usr/local/cuda-$CUDA_VERSION" ]; then
    echo "CUDA $CUDA_VERSION not found in /usr/local"
    return 1 2>/dev/null || exit 1
fi

export PATH=/usr/local/cuda-$CUDA_VERSION/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA_VERSION/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-$CUDA_VERSION

echo "Switched to CUDA $CUDA_VERSION"
nvcc --version
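
because the script only exports variables, it has to be sourced; executed as a normal command, the exports would vanish with the subshell:

# usage: source it so the exports land in the current shell
source ~/bin/cuda-switch 11.8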

development tools

cuda samples

since cuda 11.6 the samples ship separately on github instead of under /usr/local/cuda:

# get the samples (check out the tag matching your toolkit, e.g. v12.9)
git clone https://github.com/NVIDIA/cuda-samples.git ~/cuda-samples
cd ~/cuda-samples

# build all samples (recent releases use cmake)
mkdir build && cd build
cmake .. && make -j$(nproc)

# run deviceQuery (binaries mirror the source layout inside the build tree)
./Samples/1_Utilities/deviceQuery/deviceQuery

profiling tools

# nsight systems
nsys profile ./myapp

# nsight compute
ncu ./myapp

# legacy profiler
nvprof ./myapp  # deprecated; not supported on ampere (compute capability 8.0) and newer gpus
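
a typical profiling pass (assuming a binary called ./myapp, as above) writes a report file and summarizes it from the terminal:

# record a timeline, then print summary statistics from the report
nsys profile -o myapp_timeline ./myapp
nsys stats myapp_timeline.nsys-rep

# collect detailed kernel metrics into a report for the nsight compute gui
ncu --set full -o myapp_kernels ./myapp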

debugging

# compile with debug info
nvcc -g -G test.cu -o test

# debug with cuda-gdb
cuda-gdb ./test
(cuda-gdb) break main
(cuda-gdb) run
(cuda-gdb) info cuda kernels
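
for memory errors, compute-sanitizer (the toolkit's replacement for cuda-memcheck) works on the same debug build:

# memory checking
compute-sanitizer --tool memcheck ./test
# shared-memory race detection
compute-sanitizer --tool racecheck ./test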

cudnn installation

most deep learning frameworks require cudnn:

# download from nvidia (requires account)
# https://developer.nvidia.com/cudnn

# install deb package
sudo dpkg -i cudnn-linux-x86_64-9.3.0.xxx_cuda12.deb

# or manual installation from the tar archive
tar -xf cudnn-linux-x86_64-9.3.0.xxx_cuda12.tgz
sudo cp cudnn-*/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cudnn-*/lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
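
after copying the files, refresh the linker cache and confirm the version header is visible:

# rebuild linker cache and read the installed cudnn version
sudo ldconfig
grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h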

uninstallation

deb method

# remove cuda toolkit
sudo apt remove --purge cuda-toolkit-12-9
sudo apt autoremove

# remove the repository list (cuda-ubuntu2404-x86_64.list on ubuntu 24.04)
sudo rm /etc/apt/sources.list.d/cuda-ubuntu2204-x86_64.list
sudo apt update
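
the keyring package added during setup can be purged as well:

# remove the repository signing keyring
sudo apt remove --purge cuda-keyring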

runfile method

# use uninstaller
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller

# or manual removal
sudo rm -rf /usr/local/cuda-12.9
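
if /usr/local/cuda is a symlink still pointing at the removed version, drop it too:

# remove the stale version symlink, if present
[ -L /usr/local/cuda ] && sudo rm /usr/local/cuda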

troubleshooting

gcc version mismatch

# cuda 12.9 supports gcc 11-13
sudo apt install gcc-12 g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100
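
alternatively, leave the system default alone and pick the host compiler per build with nvcc's -ccbin flag:

# select a specific host compiler for a single build
nvcc -ccbin g++-12 test.cu -o test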

library not found

# regenerate cache
sudo ldconfig

# check library path
ldconfig -p | grep cuda

# manually add path
echo '/usr/local/cuda/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

kernel module issues

# rebuild kernel modules
sudo dkms status
sudo dkms install nvidia/xxx.xx.xx

# check loaded modules
lsmod | grep nvidia
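
if nvidia-smi reports a driver/library version mismatch after a rebuild, compare the loaded kernel module against the installed userspace driver:

# loaded kernel module version
cat /proc/driver/nvidia/version
# userspace driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader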

performance tuning

persistence mode

# enable for lower latency
sudo nvidia-smi -pm 1

# set clock speeds
sudo nvidia-smi -ac 1215,1410  # memory,graphics clocks
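
valid clock pairs differ per gpu, so query the supported values first and reset when done:

# list supported memory,graphics application clock pairs
nvidia-smi -q -d SUPPORTED_CLOCKS
# reset application clocks to defaults
sudo nvidia-smi -rac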

memory overclocking

# check current clocks
nvidia-smi -q -d CLOCK

# set memory transfer rate offset
sudo nvidia-settings -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=500'

integration testing

pytorch test

# test pytorch cuda
python -c "import torch; print(torch.cuda.is_available())"

# detailed info
python -c "import torch; print(torch.version.cuda)"

tensorflow test

# test tensorflow cuda
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

comparison with docker

aspect          native      docker
performance     baseline    ~same
isolation       none        complete
disk usage      ~4gb        ~2gb per image
multi-version   complex     simple
system impact   high        minimal

best practices

  1. backup before installing

    • driver conflicts possible
    • kernel module issues
  2. use cuda-toolkit-x-y packages

    • easier updates
    • dependency management
  3. avoid mixing methods

    • deb or runfile, not both
    • conflicts with paths
  4. test after updates

    • kernel updates can break modules
    • driver updates affect cuda

tips

  • install cuda samples for testing
  • use nvidia-smi dmon for monitoring
  • compute-sanitizer for memory debugging (replaces cuda-memcheck in cuda 12.x)
  • prefer docker for production
  • native for development only
