Installing llama-cpp-python with CUDA on Windows (2026 Troubleshooting Guide)
The Problem
Installing llama-cpp-python with CUDA acceleration on Windows can be unexpectedly difficult, especially on newer systems using:
- CUDA 12.9+
- Visual Studio 2026
- Python 3.13
- MSYS2 / MinGW environments
- Ninja builds
The errors are often vague and misleading, even though all required software appears to be installed correctly.
This guide documents the exact problems encountered during installation and how they were resolved.
System Configuration
This setup was tested on:
| Component | Version |
| ------------------------- | ------------------------- |
| OS | Windows 11 |
| GPU | NVIDIA RTX 3070 Ti |
| CUDA | 12.9 |
| Python | 3.13 |
| Visual Studio Build Tools | VS 2026 |
| Build System | Ninja |
| Package | llama-cpp-python 0.3.23 |
Symptoms / Errors Encountered
1. torch / setuptools Conflict
After upgrading build tools:
torch 2.11.0+cu128 requires setuptools<82
Fix
Downgrade setuptools:
pip install "setuptools<82"
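To confirm the pin took effect, the installed version can be checked from Python. This is a minimal sketch using only the standard library; the helper name `satisfies_cap` is illustrative, and the cap of 82 is taken from the error message above.

```python
from importlib import metadata


def satisfies_cap(version: str, cap_major: int = 82) -> bool:
    """Return True if the major component of `version` is below `cap_major`."""
    major = int(version.split(".")[0])
    return major < cap_major


if __name__ == "__main__":
    try:
        installed = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        status = "OK" if satisfies_cap(installed) else "too new for this torch build"
        print(f"setuptools {installed}: {status}")
```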
2. CUDA Toolkit Found, But "No CUDA Toolset Found"
Initial error:
Found CUDAToolkit: ...
CUDA Toolkit found
CMake Error:
No CUDA toolset found.
Cause
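CUDA was installed correctly, but:
- CMake was using the Visual Studio 2026 generator
- The CUDA Toolkit's Visual Studio integration lagged behind the newest VS toolchain, so that generator had no CUDA toolset to use
Building with Ninja instead of the Visual Studio generator (see Step 6) sidesteps the toolset lookup.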
3. Ninja Picked the Wrong Compiler
After switching to Ninja:
The C compiler identification is unknown
Check for working C compiler:
C:/msys64/mingw64/bin/cc.exe
Cause
CMake automatically selected MinGW/MSYS2 gcc instead of MSVC.
CUDA on Windows generally expects:
- MSVC (cl.exe)
- NOT MinGW gcc
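A quick way to see which compilers CMake is likely to find first is to ask Python what is on PATH. The sketch below is illustrative only: `classify_compiler` is a hypothetical helper, and its path heuristics (looking for "msys" or "mingw" in the path) are an assumption based on the install location shown in the error above.

```python
import shutil
from typing import Optional


def classify_compiler(path: Optional[str]) -> str:
    """Roughly classify a compiler path as 'msvc', 'mingw', 'other', or 'missing'."""
    if not path:
        return "missing"
    lowered = path.replace("\\", "/").lower()
    if "msys" in lowered or "mingw" in lowered:
        return "mingw"
    if lowered == "cl.exe" or lowered.endswith("/cl.exe"):
        return "msvc"
    return "other"


if __name__ == "__main__":
    # shutil.which resolves each name the same way the shell would.
    for name in ("cc", "gcc", "cl"):
        found = shutil.which(name)
        print(f"{name}: {found} -> {classify_compiler(found)}")
```

If `cc` or `gcc` resolves to an MSYS2 path and `cl` is missing, the build is likely to hit the error shown in this section.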
4. cl.exe Not Found
After forcing MSVC:
Could not find the compiler specified in the environment variable CC:
cl.exe
Cause
The regular PowerShell terminal did not have the Visual Studio compiler environment loaded, so cl.exe was not on PATH.
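Whether the current shell carries the MSVC environment can be checked before starting a build. The following is a rough sketch: the variable names in `EXPECTED_VARS` are ones the Visual Studio developer prompts typically define, but treating exactly these three as the signal is an assumption, and `missing_msvc_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Variables a Visual Studio developer prompt normally defines (assumed set).
EXPECTED_VARS = ("VCINSTALLDIR", "INCLUDE", "LIB")


def missing_msvc_vars(env: Mapping[str, str]) -> List[str]:
    """Return the expected MSVC variables that are absent or empty in `env`."""
    # Windows environment variable names are case-insensitive.
    present = {key.upper() for key, value in env.items() if value}
    return [name for name in EXPECTED_VARS if name not in present]


if __name__ == "__main__":
    missing = missing_msvc_vars(os.environ)
    if missing:
        print("MSVC environment looks incomplete, missing:", ", ".join(missing))
    else:
        print("MSVC environment variables are present")
```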
The Working Solution
Step 1 — Install Required Components
Install:
- Python
- CUDA Toolkit
- Visual Studio Build Tools
- Ninja
- CMake
Required Visual Studio Components
In the Visual Studio Build Tools installer, select:
- Desktop development with C++
- MSVC toolchain
- Windows SDK
Step 2 — Verify CUDA Installation
Run:
nvcc --version
Expected:
Cuda compilation tools, release 12.9
Verify path:
where.exe nvcc
Expected:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc.exe
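The same check can be scripted, which helps when several CUDA versions are installed. This is a minimal sketch; `parse_nvcc_release` is a hypothetical helper, and it assumes the `release X.Y` wording shown in the expected output above.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the 'X.Y' release number from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        print("nvcc not found on PATH")
    else:
        print("CUDA release:", parse_nvcc_release(result.stdout))
```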
Step 3 — Open the Correct Terminal
This part is critical.
Do NOT start from a regular PowerShell window; it lacks the MSVC environment.
Open:
x64 Native Tools Command Prompt
Prefer the prompt installed with Visual Studio 2022. If it is unavailable, use the one from the newest installed version.
Step 4 — Activate Python Environment
Example:
E:
cd E:\Home\Documents\Programming\tz_llm
.venv\Scripts\activate
Step 5 — Install Build Dependencies
python -m pip install --upgrade pip wheel cmake ninja
pip install "setuptools<82"
Step 6 — Configure Build Environment
Set environment variables:
set FORCE_CMAKE=1
set CMAKE_GENERATOR=Ninja
set CC=cl
set CXX=cl
set CUDAToolkit_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9
set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=86 -DCMAKE_CUDA_COMPILER="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc.exe"
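The value 86 corresponds to compute capability 8.6, which is what the RTX 3070 Ti in this setup uses; other GPUs need a different value. For scripted installs, the same settings can be assembled as a Python dictionary and passed to a subprocess. This is a sketch under the assumptions of this guide: `cuda_build_env` is a hypothetical helper, and the variable names and flags are copied from the commands above.

```python
import os
from typing import Dict


def cuda_build_env(cuda_root: str, arch: str = "86") -> Dict[str, str]:
    """Build the environment overrides used in Step 6 for a given CUDA root."""
    nvcc = cuda_root.rstrip("\\/") + "\\bin\\nvcc.exe"
    cmake_args = " ".join([
        "-DGGML_CUDA=on",
        f"-DCMAKE_CUDA_ARCHITECTURES={arch}",
        f'-DCMAKE_CUDA_COMPILER="{nvcc}"',
    ])
    return {
        "FORCE_CMAKE": "1",
        "CMAKE_GENERATOR": "Ninja",
        "CC": "cl",
        "CXX": "cl",
        "CUDAToolkit_ROOT": cuda_root,
        "CMAKE_ARGS": cmake_args,
    }


if __name__ == "__main__":
    root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9"
    # Merge with the current environment so the MSVC variables are kept.
    env = {**os.environ, **cuda_build_env(root)}
    for key in ("CMAKE_GENERATOR", "CC", "CXX", "CMAKE_ARGS"):
        print(f"{key}={env[key]}")
```

The merged `env` could then be handed to `subprocess.run(..., env=env)` for the pip command in Step 7, as long as the script itself is launched from the Native Tools prompt.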
Step 7 — Install llama-cpp-python
pip install --no-cache-dir --force-reinstall llama-cpp-python
Verify CUDA Support
Run Python:
from llama_cpp import Llama
print("llama-cpp-python loaded successfully")
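This only confirms that the package imports. To confirm CUDA support, load a model with GPU offloading enabled (for example `n_gpu_layers=-1`); the load log should report layers being offloaded to the GPU.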
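A lighter check that does not need a model file is to ask the bindings whether the build supports GPU offload. The sketch below assumes the low-level function `llama_supports_gpu_offload` is exposed by the installed `llama_cpp` version; it is looked up defensively in case it is not, and `gpu_offload_supported` is a hypothetical wrapper.

```python
from typing import Optional


def gpu_offload_supported() -> Optional[bool]:
    """Return True/False as reported by llama_cpp, or None if undetermined."""
    try:
        import llama_cpp
    except Exception:  # ImportError, or a native library load failure
        return None
    # Assumed to exist in recent releases; fall back to None if absent.
    probe = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    return bool(probe()) if callable(probe) else None


if __name__ == "__main__":
    print("GPU offload supported:", gpu_offload_supported())
```

A result of `False` after a seemingly successful install usually means pip built or reused a CPU-only build.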
Why This Is So Difficult
Several independent systems interact during this installation:
| Component | Role |
| ------------- | -------------------------- |
| Python | Package management |
| pip | Build orchestration |
| CMake | Native build generation |
| Ninja | Build execution |
| MSVC | Windows compiler |
| CUDA | GPU compilation |
| Visual Studio | CUDA toolchain integration |
| llama.cpp | Native C/C++ backend |
Unfortunately:
- CUDA strongly prefers MSVC
- Windows supports multiple compiler ecosystems
- CMake's compiler auto-detection can pick an unintended compiler when several are on PATH
- CUDA often lags behind the newest Visual Studio versions
- Python packages hide low-level native build complexity
This creates confusing failure chains where the actual issue is hidden several layers below the visible error.
Recommendations
For the least painful setup:
| Recommended | Avoid |
| -------------------- | ----------------------- |
| Python 3.11 or 3.12 | Bleeding-edge Python |
| VS 2022 Build Tools  | Brand-new VS releases   |
| Stable CUDA versions | Brand-new CUDA releases |
| MSVC (cl.exe) | MinGW for CUDA builds |
Notes About MSYS2 / MinGW
If you use MSYS2 or MinGW for development, be aware:
CMake may silently select:
C:/msys64/mingw64/bin/cc.exe
This often breaks CUDA builds on Windows.
To force MSVC:
set CC=cl
set CXX=cl
Final Thoughts
Once successfully built, llama-cpp-python is generally stable and performs very well with CUDA acceleration.
The installation process is far more difficult than it should be, especially given how common the Python + CUDA + NVIDIA stack is becoming in local AI development.
Hopefully this guide saves someone else several hours of frustration.