cuda native installation
overview
native cuda toolkit installation, for compiling cuda code with maximum performance. required for kernel development and building custom cuda libraries.
current version: cuda 12.9 update 1 (august 2025)
when to use:
- cuda kernel development
- custom cuda libraries
- system-wide cuda tools
- when docker overhead is not acceptable
prerequisites
# verify nvidia driver
nvidia-smi
# need driver 550.54.14+ for cuda 12.9
# check gcc
gcc --version
# need a supported gcc (11-13 for cuda 12.9)
# verify kernel headers
uname -r
ls /usr/src/linux-headers-$(uname -r)
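if the headers directory is missing, install the package matching the running kernel (package name as on ubuntu):
# install kernel headers for the running kernel
sudo apt install linux-headers-$(uname -r)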
installation methods
method 1: deb packages (recommended)
for ubuntu 22.04:
# download cuda keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
# update and install
sudo apt update
sudo apt install cuda-toolkit-12-9
for ubuntu 24.04:
# download cuda keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
# update and install
sudo apt update
sudo apt install cuda-toolkit-12-9
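to confirm what the meta-package pulled in, a quick check:
# list installed toolkit packages and the install prefix
dpkg -l | grep cuda-toolkit
ls -d /usr/local/cuda*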
method 2: runfile installer
offers more control, but components must be managed manually:
# download runfile
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_550.54.14_linux.run
# make executable
chmod +x cuda_12.9.0_550.54.14_linux.run
# install (skip driver if already installed)
sudo sh cuda_12.9.0_550.54.14_linux.run --toolkit --silent --override
interactive options:
# interactive mode for component selection
sudo sh cuda_12.9.0_550.54.14_linux.run
# select cuda-gdb-src in "CUDA Tools 12.9" for debugging
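if the installer fails, it accepts --help for the full option list and writes an install log (commonly /var/log/cuda-installer.log; path assumed from the installer's defaults):
# list all installer options
sh cuda_12.9.0_550.54.14_linux.run --help
# review the install log after a failure
sudo cat /var/log/cuda-installer.log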
post-installation setup
environment variables
add to ~/.bashrc:
# cuda paths
export PATH=/usr/local/cuda-12.9/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# optional: many build systems use CUDA_HOME to locate the toolkit
export CUDA_HOME=/usr/local/cuda-12.9
apply changes:
source ~/.bashrc
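confirm the variables took effect in the current shell:
# both should resolve to the 12.9 install
which nvcc
echo $CUDA_HOME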
verification
# check nvcc
nvcc --version
# nvcc: NVIDIA (R) Cuda compiler driver
# Cuda compilation tools, release 12.9
# compile test
cat > test.cu << 'EOF'
#include <stdio.h>
__global__ void hello() {
printf("Hello from GPU!\n");
}
int main() {
hello<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
EOF
nvcc test.cu -o test
./test
# Hello from GPU!
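by default nvcc compiles for a generic architecture; you can build for the installed gpu with -arch=native (available in recent toolkits) or an explicit sm value. the compute_cap query field needs a reasonably recent driver:
# query the gpu's compute capability
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# build for the gpu in this machine
nvcc -arch=native test.cu -o test
# or target a specific architecture, e.g. compute capability 8.6
nvcc -arch=sm_86 test.cu -o test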
component overview
default installation includes:
component | location | purpose |
---|---|---|
nvcc | /usr/local/cuda/bin/nvcc | cuda compiler |
cuda libraries | /usr/local/cuda/lib64/ | runtime libraries |
headers | /usr/local/cuda/include/ | development headers |
samples | github.com/NVIDIA/cuda-samples | example code (separate repo) |
nsight systems / compute | /usr/local/cuda/bin/nsys, ncu | profilers |
cuda-gdb | /usr/local/cuda/bin/cuda-gdb | cuda debugger |
multiple cuda versions
install multiple versions side-by-side:
# install cuda 11.8
sudo apt install cuda-toolkit-11-8
# install cuda 12.9
sudo apt install cuda-toolkit-12-9
# switch versions
sudo update-alternatives --config cuda
# or manually
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
version management script
#!/bin/bash
# save as ~/bin/cuda-switch; must be sourced so the exports affect the current shell
CUDA_VERSION=$1
if [ -z "$CUDA_VERSION" ] || [ ! -d "/usr/local/cuda-$CUDA_VERSION" ]; then
  echo "Usage: source cuda-switch <version>"
  echo "Available versions:"
  ls -1 /usr/local/ | grep cuda- | sed 's/cuda-//'
  return 1 2>/dev/null || exit 1
fi
export PATH=/usr/local/cuda-$CUDA_VERSION/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA_VERSION/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-$CUDA_VERSION
echo "Switched to CUDA $CUDA_VERSION"
nvcc --version
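example usage (the script must be sourced so the exports persist in the current shell):
# point the current shell at cuda 11.8
source ~/bin/cuda-switch 11.8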
development tools
cuda samples
recent toolkits no longer bundle samples under /usr/local/cuda; they are maintained in a separate github repo:
# clone the samples
git clone https://github.com/NVIDIA/cuda-samples.git ~/cuda-samples
cd ~/cuda-samples
# build (newer releases of the repo use cmake)
mkdir build && cd build
cmake .. && make -j$(nproc)
# run deviceQuery (output path may differ between releases)
./Samples/1_Utilities/deviceQuery/deviceQuery
profiling tools
# nsight systems
nsys profile ./myapp
# nsight compute
ncu ./myapp
# legacy profiler
nvprof ./myapp # deprecated; not supported on newer gpu architectures, prefer nsys/ncu
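a few commonly used options (flag names as documented for nsight systems and nsight compute):
# nsight systems: write a named report and print summary stats
nsys profile -o myapp_report --stats=true ./myapp
# nsight compute: collect the full metric set into a report file
ncu --set full -o myapp_profile ./myapp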
debugging
# compile with debug info
nvcc -g -G test.cu -o test
# debug with cuda-gdb
cuda-gdb ./test
(cuda-gdb) break main
(cuda-gdb) run
(cuda-gdb) info cuda kernels
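breakpoints can also be set on the kernel itself; with the test.cu example from the verification step:
cuda-gdb ./test
(cuda-gdb) break hello
(cuda-gdb) run
(cuda-gdb) info cuda threads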
cudnn installation
most deep learning frameworks (pytorch, tensorflow) require cudnn:
# download from nvidia (requires account)
# https://developer.nvidia.com/cudnn
# option 1: install from the cuda network repo (package name may vary; check apt-cache search cudnn)
sudo apt install cudnn9-cuda-12
# option 2: manual installation from the tar archive
tar -xf cudnn-linux-x86_64-9.3.0.xxx_cuda12-archive.tar.xz
sudo cp cudnn-linux-x86_64-9.3.0.xxx_cuda12-archive/include/* /usr/local/cuda/include/
sudo cp -P cudnn-linux-x86_64-9.3.0.xxx_cuda12-archive/lib/* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
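for the tar-based install above, a quick verification (header name per cudnn 9's layout):
# cudnn records its version in cudnn_version.h
grep -A2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h
# confirm the libraries are visible to the dynamic linker
ldconfig -p | grep libcudnn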
uninstallation
deb method
# remove cuda toolkit
sudo apt remove --purge cuda-toolkit-12-9
sudo apt autoremove
# remove repository
sudo rm /etc/apt/sources.list.d/cuda-ubuntu2204-x86_64.list
sudo apt update
runfile method
# use uninstaller
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
# or manual removal
sudo rm -rf /usr/local/cuda-12.9
troubleshooting
gcc version mismatch
# cuda 12.9 supports gcc 11-13
sudo apt install gcc-12 g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100
library not found
# regenerate cache
sudo ldconfig
# check library path
ldconfig -p | grep cuda
# manually add path
echo '/usr/local/cuda/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
kernel module issues
# rebuild kernel modules
sudo dkms status
sudo dkms install nvidia/xxx.xx.xx
# check loaded modules
lsmod | grep nvidia
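further diagnostics if the module still fails to load:
# load the module manually and inspect kernel messages
sudo modprobe nvidia
dmesg | grep -i nvidia | tail -20
# driver version as seen by the kernel
cat /proc/driver/nvidia/version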
performance tuning
persistence mode
# enable for lower latency
sudo nvidia-smi -pm 1
# set clock speeds
sudo nvidia-smi -ac 1215,1410 # memory,graphics clocks
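application clock pairs must match values the gpu actually supports:
# list supported memory,graphics clock combinations
nvidia-smi -q -d SUPPORTED_CLOCKS
# reset application clocks to defaults
sudo nvidia-smi -rac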
memory overclocking
# check current clocks
nvidia-smi -q -d CLOCK
# set memory transfer rate offset (requires a running x server with coolbits enabled)
sudo nvidia-settings -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=500'
integration testing
pytorch test
# test pytorch cuda
python -c "import torch; print(torch.cuda.is_available())"
# detailed info
python -c "import torch; print(torch.version.cuda)"
tensorflow test
# test tensorflow cuda
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
comparison with docker
aspect | native | docker |
---|---|---|
performance | baseline | ~same |
isolation | none | complete |
disk usage | ~4gb | ~2gb/image |
multi-version | complex | simple |
system impact | high | minimal |
best practices
backup before installing
- driver conflicts possible
- kernel module issues
use cuda-toolkit-x-y packages
- easier updates
- dependency management
avoid mixing methods
- deb or runfile, not both
- conflicts with paths
test after updates
- kernel updates can break modules
- driver updates affect cuda
tips
- install cuda samples for testing
- use nvidia-smi dmon for monitoring
- use compute-sanitizer (the cuda-memcheck replacement in cuda 12) for memory debugging
- prefer docker for production
- native for development only
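example memory-debug run against the test binary built earlier (compute-sanitizer ships with the toolkit):
# memcheck is the default tool; racecheck, initcheck and synccheck are also available
compute-sanitizer --tool memcheck ./test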
resources