Taggedz.me

Just me here.


Installing llama-cpp-python with CUDA on Windows (2026 Troubleshooting Guide)

The Problem

Installing llama-cpp-python with CUDA acceleration on Windows can be unexpectedly difficult, especially on newer systems using:

  • CUDA 12.9+
  • Visual Studio 2026
  • Python 3.13
  • MSYS2 / MinGW environments
  • Ninja builds

The errors are often vague and misleading, even though all required software appears to be installed correctly.

This guide documents the exact problems encountered during installation and how they were resolved.


System Configuration

This setup was tested on:

| Component                 | Version                 |
| ------------------------- | ----------------------- |
| OS                        | Windows 11              |
| GPU                       | NVIDIA RTX 3070 Ti      |
| CUDA                      | 12.9                    |
| Python                    | 3.13                    |
| Visual Studio Build Tools | VS 2026                 |
| Build System              | Ninja                   |
| Package                   | llama-cpp-python 0.3.23 |


Symptoms / Errors Encountered

1. torch / setuptools Conflict

After upgrading the Python build tools, pip reported a dependency conflict:

torch 2.11.0+cu128 requires setuptools<82

Fix

Downgrade setuptools:

pip install "setuptools<82"
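To confirm the downgrade took effect, a short check with the standard library's importlib.metadata can read the installed setuptools version. This is a minimal sketch: it only compares the major version number against the <82 bound from the pip message, rather than implementing pip's full version-specifier rules.

```python
from importlib.metadata import PackageNotFoundError, version


def setuptools_major(version_str: str) -> int:
    """Return the major component of a version string such as '81.0.0'."""
    return int(version_str.split(".")[0])


def satisfies_torch_pin(version_str: str, upper_bound: int = 82) -> bool:
    """True if the version is below the 'setuptools<82' bound reported by pip.

    Simplification: only the major number is compared, which is enough for this bound.
    """
    return setuptools_major(version_str) < upper_bound


if __name__ == "__main__":
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        if satisfies_torch_pin(installed):
            print(f"setuptools {installed}: OK for torch")
        else:
            print(f'setuptools {installed}: too new, run pip install "setuptools<82"')
```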

2. CUDA Toolkit Found, But "No CUDA Toolset Found"

Initial error:

Found CUDAToolkit: ...
CUDA Toolkit found

CMake Error:
No CUDA toolset found.

Cause

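CUDA was installed correctly, but:

  • CMake was using the Visual Studio 2026 generator, which compiles CUDA through the CUDA Visual Studio (MSBuild) integration
  • That integration was not available for VS 2026, because CUDA support lagged behind the newest VS toolchain

Switching to Ninja sidesteps the MSBuild integration entirely, since the Ninja build calls nvcc directly.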
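To see whether a given Visual Studio installation has the CUDA MSBuild integration that CMake's Visual Studio generator looks for, you can list the CUDA *.props files in its BuildCustomizations folder. A minimal sketch follows; the folder path and the "CUDA 12.9.props" / "CUDA 12.9.targets" file names are assumptions based on how the CUDA installer typically lays these files out for VS 2022 (the VC\v170 folder), so check them against your own installation.

```python
from pathlib import Path
from typing import List

# ASSUMPTION: default VS 2022 Build Tools location; VS 2022 uses the VC\v170 folder.
# Adjust for your edition and Visual Studio version.
BUILD_CUSTOMIZATIONS = Path(
    r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools"
    r"\MSBuild\Microsoft\VC\v170\BuildCustomizations"
)


def cuda_props_files(folder: Path) -> List[str]:
    """Names of CUDA MSBuild integration files such as 'CUDA 12.9.props'."""
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("CUDA *.props"))


def has_cuda_integration(folder: Path, cuda_version: str) -> bool:
    """True if both the .props and .targets files for this CUDA version exist."""
    stem = f"CUDA {cuda_version}"
    return all((folder / f"{stem}{ext}").is_file() for ext in (".props", ".targets"))


if __name__ == "__main__":
    print("CUDA integration files:", cuda_props_files(BUILD_CUSTOMIZATIONS) or "none found")
    print("CUDA 12.9 integration present:", has_cuda_integration(BUILD_CUSTOMIZATIONS, "12.9"))
```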

3. Ninja Picked the Wrong Compiler

After switching to Ninja:

The C compiler identification is unknown

Check for working C compiler:
C:/msys64/mingw64/bin/cc.exe

Cause

CMake automatically selected MinGW/MSYS2 gcc instead of MSVC.

On Windows, nvcc only supports MSVC as its host compiler, so the build needs:

  • MSVC (cl.exe)
  • NOT MinGW gcc
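When a configure log is long, it can help to pull out the compiler CMake actually tested and flag MinGW/MSYS2 paths automatically. The sketch below is a hypothetical helper: it matches CMake's "Check for working C compiler:" and "CXX compiler:" lines (tolerating the path on the following line, as in the log excerpt) and uses a simple path-based heuristic, not anything CMake itself provides.

```python
import re
from typing import List

# CMake prints "Check for working C compiler: <path>" (and a CXX variant); the
# path is allowed to continue on the next line, as in the excerpt in this guide.
_CHECK = re.compile(r"Check for working C(?:XX)? compiler:\s*(\S[^\r\n]*)")
# Trailing status such as " - skipped" or " -- works" that CMake may append.
_STATUS = re.compile(r"\s+-{1,2}\s+(?:skipped|works|broken)\s*$")


def compilers_in_log(log: str) -> List[str]:
    """Return every compiler path that CMake reported testing."""
    return [_STATUS.sub("", m.group(1)).strip() for m in _CHECK.finditer(log)]


def classify_compiler(path: str) -> str:
    """Heuristic: 'msvc' for cl.exe, 'mingw/gcc' for MSYS2/MinGW or gcc/cc, else 'unknown'."""
    p = path.replace("\\", "/").lower()
    name = p.rsplit("/", 1)[-1]
    if name in ("cl", "cl.exe"):
        return "msvc"
    if "msys64" in p or "mingw" in p or name in ("cc", "cc.exe", "gcc", "gcc.exe"):
        return "mingw/gcc"
    return "unknown"


if __name__ == "__main__":
    excerpt = "Check for working C compiler:\nC:/msys64/mingw64/bin/cc.exe\n"
    for compiler in compilers_in_log(excerpt):
        print(compiler, "->", classify_compiler(compiler))
```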

4. cl.exe Not Found

After forcing MSVC (set CC=cl and set CXX=cl):

Could not find the compiler specified in the environment variable CC:
cl.exe

Cause

A regular PowerShell terminal does not load the Visual Studio compiler environment (the PATH entry for cl.exe, plus INCLUDE and LIB), so CMake could not find cl.exe.


The Working Solution


Step 1 — Install Required Components

Install:

  • Python
  • CUDA Toolkit
  • Visual Studio Build Tools
  • Ninja
  • CMake

Required Visual Studio Components

In the Visual Studio Build Tools installer, select:

  • Desktop development with C++
  • MSVC toolchain
  • Windows SDK
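Before building, a quick preflight check can confirm that the command-line tools this guide relies on are reachable. The sketch below uses shutil.which from the standard library; the tool list (nvcc, cmake, ninja, cl) is taken from the steps in this guide, and cl is only expected to resolve inside a Visual Studio developer prompt.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Executables the build steps in this guide invoke.
REQUIRED_TOOLS = ("nvcc", "cmake", "ninja", "cl")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in REQUIRED_TOOLS:
        print(f"{tool:6} -> {shutil.which(tool) or 'NOT FOUND on PATH'}")
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
        if "cl" in missing:
            print("cl.exe is normally only on PATH inside an x64 Native Tools Command Prompt.")
```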

Step 2 — Verify CUDA Installation

Run:

nvcc --version

Expected:

Cuda compilation tools, release 12.9

Verify path:

where.exe nvcc

Expected:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc.exe
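The same check can be scripted. This sketch runs nvcc --version through subprocess and extracts the "release X.Y" number with a regular expression; the parsing assumes the output format shown above, and the final path check simply compares against this guide's CUDA 12.9 install.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'Cuda compilation tools, release 12.9'."""
    match = _RELEASE.search(output)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc is not on PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True)
        print("nvcc path:", nvcc)
        print("CUDA release:", parse_nvcc_release(result.stdout))
        if "v12.9" not in nvcc:
            print("Warning: the nvcc on PATH is not the CUDA 12.9 install used in this guide.")
```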

Step 3 — Open the Correct Terminal

This part is critical.

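DO NOT use a regular PowerShell or Command Prompt window. It does not have the MSVC compiler environment (the cause of error 4), and the set commands in Step 6 use Command Prompt syntax (in PowerShell, set does not create environment variables).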

Open:

x64 Native Tools Command Prompt

Prefer:

Visual Studio 2022

If unavailable, use the newest installed version.
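To confirm you are in the right kind of prompt, you can check for environment variables that the Visual Studio developer scripts set. This is a sketch under an assumption: it treats VCToolsInstallDir and VSCMD_ARG_TGT_ARCH as the markers of an x64 developer prompt, which matches how recent VsDevCmd/vcvars scripts behave; run set in your prompt to confirm what your version actually defines.

```python
import os
from typing import List, Mapping


def msvc_env_problems(env: Mapping[str, str]) -> List[str]:
    """List reasons the environment does not look like an x64 MSVC developer prompt.

    ASSUMPTION: the developer prompt sets VCToolsInstallDir and VSCMD_ARG_TGT_ARCH.
    """
    problems = []
    if "VCToolsInstallDir" not in env:
        problems.append("VCToolsInstallDir is not set (not a Visual Studio developer prompt?)")
    arch = env.get("VSCMD_ARG_TGT_ARCH")
    if arch is not None and arch.lower() != "x64":
        problems.append(f"target architecture is {arch}, expected x64")
    return problems


if __name__ == "__main__":
    issues = msvc_env_problems(os.environ)
    print("\n".join(issues) if issues else "Looks like an x64 MSVC developer prompt.")
```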


Step 4 — Activate Python Environment

Example:

E:
cd E:\Home\Documents\Programming\tz_llm
.venv\Scripts\activate
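Because pip builds and installs into whichever interpreter runs it, it is worth confirming the virtual environment is active before building. A minimal check compares sys.prefix with sys.base_prefix, which differ inside a venv (standard library behavior since Python 3.3).

```python
import sys


def in_virtualenv(prefix: str, base_prefix: str) -> bool:
    """A venv changes sys.prefix but leaves sys.base_prefix pointing at the base install."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv(sys.prefix, sys.base_prefix))
```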

Step 5 — Install Build Dependencies

python -m pip install --upgrade pip wheel cmake ninja
pip install "setuptools<82"

Step 6 — Configure Build Environment

Set environment variables (Command Prompt syntax; they only apply to the current window):

set FORCE_CMAKE=1
set CMAKE_GENERATOR=Ninja

set CC=cl
set CXX=cl

set CUDAToolkit_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9

set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=86 -DCMAKE_CUDA_COMPILER="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc.exe"
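The value 86 corresponds to compute capability 8.6, which is what the RTX 3070 Ti reports. On a different GPU you can derive the value yourself. This sketch assumes your driver's nvidia-smi supports the compute_cap query field (older drivers may not) and converts each reported capability into the form CMAKE_CUDA_ARCHITECTURES expects; multiple values are joined with semicolons, which CMake treats as a list.

```python
import shutil
import subprocess
from typing import List


def cuda_arch_from_compute_cap(compute_cap: str) -> str:
    """Convert a compute capability like '8.6' to the '86' form CMake expects."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def query_compute_caps() -> List[str]:
    """Ask nvidia-smi for each GPU's compute capability (ASSUMPTION: driver supports compute_cap)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; look up your GPU's compute capability instead.")
    else:
        archs = sorted({cuda_arch_from_compute_cap(c) for c in query_compute_caps()}, key=int)
        print("-DCMAKE_CUDA_ARCHITECTURES=" + ";".join(archs))
```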

Step 7 — Install llama-cpp-python

pip install --no-cache-dir --force-reinstall llama-cpp-python
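If you prefer not to retype the set commands, Steps 6 and 7 can be driven from a small Python script run inside the x64 Native Tools Command Prompt, so that cl.exe, INCLUDE and LIB are inherited. This is a sketch, not an official llama-cpp-python tool: the CUDA path and architecture are this guide's values and must be adjusted, and it writes the nvcc path with forward slashes, which CMake accepts on Windows. Calling sys.executable -m pip ensures the package is built for the interpreter running the script.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping

# Values from this guide's system; adjust for your CUDA install and GPU.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9"
CUDA_ARCH = "86"

PIP_CMD = [sys.executable, "-m", "pip", "install",
           "--no-cache-dir", "--force-reinstall", "llama-cpp-python"]


def build_env(base: Mapping[str, str], cuda_root: str = CUDA_ROOT, arch: str = CUDA_ARCH) -> Dict[str, str]:
    """Copy the current environment and add the Step 6 build variables."""
    nvcc = cuda_root.replace("\\", "/") + "/bin/nvcc.exe"
    env = dict(base)
    env.update({
        "FORCE_CMAKE": "1",
        "CMAKE_GENERATOR": "Ninja",
        "CC": "cl",
        "CXX": "cl",
        "CUDAToolkit_ROOT": cuda_root,
        "CMAKE_ARGS": f"-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES={arch} "
                      f'-DCMAKE_CUDA_COMPILER="{nvcc}"',
    })
    return env


if __name__ == "__main__":
    subprocess.run(PIP_CMD, env=build_env(os.environ), check=True)
```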

Verify CUDA Support

Run Python:

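import llama_cpp
print("llama-cpp-python", llama_cpp.__version__, "loaded successfully")
# True when the build includes a GPU backend such as CUDA
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())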

If CUDA support is working correctly, loading a model with GPU offload enabled (n_gpu_layers=-1) should report layers being offloaded to the GPU in its load log.
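To actually see the offload, load a GGUF model with GPU layers enabled. A minimal sketch: n_gpu_layers=-1 asks llama-cpp-python to offload all layers, and verbose=True keeps llama.cpp's load log visible; the model path is a command-line argument you supply. The import is deferred into the function so the helper can be inspected without the package installed.

```python
import sys
from typing import Any, Dict


def gpu_load_kwargs(n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Keyword arguments for llama_cpp.Llama; -1 offloads every layer to the GPU."""
    return {"n_gpu_layers": n_gpu_layers, "verbose": True}


def load_on_gpu(model_path: str):
    """Load a model with GPU offload; the load log should mention layers offloaded to the GPU."""
    from llama_cpp import Llama  # deferred import: only needed when actually loading

    return Llama(model_path=model_path, **gpu_load_kwargs())


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: python {sys.argv[0]} <model.gguf>")
    llm = load_on_gpu(sys.argv[1])
    print("Model loaded:", sys.argv[1])
```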


Why This Is So Difficult

Several independent systems interact during this installation:

| Component     | Role                       |
| ------------- | -------------------------- |
| Python        | Package management         |
| pip           | Build orchestration        |
| CMake         | Native build generation    |
| Ninja         | Build execution            |
| MSVC          | Windows compiler           |
| CUDA          | GPU compilation            |
| Visual Studio | CUDA toolchain integration |
| llama.cpp     | Native C/C++ backend       |

Unfortunately:

  • nvcc on Windows requires MSVC as its host compiler
  • Windows supports multiple compiler ecosystems
  • CMake can auto-detect the wrong compiler (e.g. MinGW's cc.exe) when several are on PATH
  • CUDA often lags behind the newest Visual Studio versions
  • Python packages hide low-level native build complexity

This creates confusing failure chains where the actual issue is hidden several layers below the visible error.
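Because the real cause is often buried, it can help to search the full build log for the signatures described earlier in this guide. This is a hypothetical helper, assuming you saved the output of a verbose install (for example pip install -v ... > build.log 2>&1 in Command Prompt); the message fragments and fixes are copied from the sections above, not from any official error catalog.

```python
import sys
from pathlib import Path
from typing import List, Tuple

# (message fragment seen in this guide, fix that resolved it)
KNOWN_ISSUES: List[Tuple[str, str]] = [
    ("requires setuptools<82", 'pip install "setuptools<82"'),
    ("No CUDA toolset found", "build with Ninja (set CMAKE_GENERATOR=Ninja)"),
    ("msys64", "set CC=cl and CXX=cl so CMake uses MSVC instead of MinGW"),
    ("Could not find the compiler specified in the environment variable CC",
     "rebuild from the x64 Native Tools Command Prompt"),
]


def diagnose(log: str) -> List[str]:
    """Return the fix for every known signature found in the log (case-insensitive)."""
    lowered = log.lower()
    return [fix for fragment, fix in KNOWN_ISSUES if fragment.lower() in lowered]


if __name__ == "__main__":
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    for fix in diagnose(text) or ["no known signature found"]:
        print("-", fix)
```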


Recommendations

For the least painful setup:

| Recommended          | Avoid                   |
| -------------------- | ----------------------- |
| Python 3.11 or 3.12  | Bleeding-edge Python    |
| VS 2022 Build Tools  | Brand-new VS releases   |
| Stable CUDA versions | Brand-new CUDA releases |
| MSVC (cl.exe)        | MinGW for CUDA builds   |


Notes About MSYS2 / MinGW

If you use MSYS2 or MinGW for development, be aware:

CMake may silently select:

C:/msys64/mingw64/bin/cc.exe

This often breaks CUDA builds on Windows.

To force MSVC (from the x64 Native Tools Command Prompt, so cl.exe is on PATH):

set CC=cl
set CXX=cl

Final Thoughts

Once successfully built, llama-cpp-python is generally stable and performs very well with CUDA acceleration.

The installation process is far more difficult than it should be, especially given how common the Python + CUDA + NVIDIA stack is becoming in local AI development.

Hopefully this guide saves someone else several hours of frustration.