How to install LLAMA CPP with CUDA (on Windows) | by Kaizin | Medium


Installing LLAMA CPP with CUDA on Windows

This article details the process of installing llama.cpp, a C/C++ implementation of LLaMA-family model inference, on Windows with CUDA for GPU acceleration. The focus is on leveraging the performance benefits of GPUs for faster text generation.

Zephyr 7B Model

The guide utilizes the Zephyr 7B model, a fine-tuned version of Mistral 7B known for its strong performance across a range of tasks, distributed in the GGUF format that llama.cpp reads. The Hugging Face repository is referenced: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF
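Once the package is installed (see the steps below), the model can be loaded from a local GGUF file. The following is a minimal sketch; the file name matches the quantized files published in the TheBloke/zephyr-7B-beta-GGUF repository but is an assumption for illustration, and the helper function is hypothetical:

```python
# Sketch: loading a GGUF model with llama-cpp-python.
# MODEL_PATH is an assumed local file name, not prescribed by the article.
MODEL_PATH = "zephyr-7b-beta.Q4_K_M.gguf"

def llama_kwargs(model_path: str, gpu_layers: int = -1) -> dict:
    # n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU;
    # pass a smaller number if the model does not fit in VRAM.
    return {"model_path": model_path, "n_gpu_layers": gpu_layers, "n_ctx": 2048}

# Usage (requires the model file on disk, so commented out here):
# from llama_cpp import Llama
# llm = Llama(**llama_kwargs(MODEL_PATH))
# out = llm("Write a haiku about GPUs.", max_tokens=64)
```

With a CUDA-enabled build, offloading layers via `n_gpu_layers` is what produces the speedup over CPU-only inference.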

Installation Steps

  • Install the llama-cpp-python package from an Anaconda prompt, first setting the environment variable CMAKE_ARGS=-DLLAMA_CUBLAS=on (on Windows: set CMAKE_ARGS=-DLLAMA_CUBLAS=on). This makes the package build against cuBLAS for CUDA support.
  • Run pip install llama-cpp-python, or, if a CPU-only build is already installed, force a rebuild with pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose.
  • Verify the installation by checking the logs printed when a model is loaded: the system-info line should show BLAS = 1, indicating the build can use the GPU.
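The verification step above can be automated by scanning the captured startup output for the BLAS flag. A minimal sketch, assuming you have llama.cpp's verbose load-time log as a string (the sample line below is illustrative, not an exact log transcript):

```python
def gpu_offload_enabled(log_text: str) -> bool:
    # llama.cpp prints a system-info line when a model is loaded;
    # "BLAS = 1" there indicates a build with GPU (cuBLAS) support,
    # while a CPU-only build prints "BLAS = 0".
    return "BLAS = 1" in log_text

# Hypothetical sample of a system-info line:
sample_log = "AVX = 1 | AVX2 = 1 | BLAS = 1 | ..."
print(gpu_offload_enabled(sample_log))  # prints True
```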