As LLMs such as OpenAI's GPT have become popular, many attempts have been made to run LLMs in a local environment. The most famous locally installable LLMs are the LLaMA models. However, running an LLM requires a lot of computing power even just to generate text, so we need a GPU to speed up generation.
Recently, llama.cpp, a C/C++ port of the LLaMA model, was developed. Since it is written in C/C++, a high-performance language, on a high-performance computing platform it could even run faster than ChatGPT.
Although I don't have such a high-performance computing platform, I tried to install some llama.cpp models with GPU support enabled.
The model I chose is Zephyr-7B-beta, a fine-tuned version of Mistral-7B that shows great performance on Extraction, Coding, STEM, and Writing compared to other open models of similar size. The llama.cpp team introduced a new format called GGUF for llama.cpp models. The repository below contains the model in GGUF format, and this is the model I installed:
https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF
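If you prefer to fetch the file from Python rather than downloading it manually, a minimal sketch using huggingface_hub is below. The exact quantization file name is an assumption on my part, so check the repository's file list for the variant you want.

```python
# Minimal sketch: download one GGUF quantization from the repo above.
# The filename is an assumption; pick the quantization you want from
# the repository's file list.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/zephyr-7B-beta-GGUF",
    filename="zephyr-7b-beta.Q4_K_M.gguf",  # assumed quantization file
)
print(model_path)  # local cache path of the downloaded model
```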
To use llama.cpp from Python, the llama-cpp-python package should be installed. But to use the GPU, we must set an environment variable first. Make sure there are no extra spaces or quotation marks when setting the environment variable.
Since I use Anaconda, I ran the commands below to install llama-cpp-python.
```
# on anaconda prompt!
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install llama-cpp-python

# if you somehow fail and need to re-install, run the command below.
# it ignores previously downloaded files and re-installs with new files.
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose
```
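Before loading a model, you can do a quick sanity check that the package imported correctly; a minimal sketch, assuming the install finished:

```python
# Minimal check that llama-cpp-python is importable after installation.
import llama_cpp

print(llama_cpp.__version__)  # prints the installed package version
```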
Running the commands above showed no errors in my case, but you still have to check whether the package is installed properly. When you actually run the model (with the verbose=True option), you can observe the startup logs, and BLAS must be set to 1 there. Otherwise, the model will not use the GPU.
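As a minimal sketch of that check, the snippet below loads the downloaded GGUF file with llama-cpp-python and offloads layers to the GPU. The model path and n_gpu_layers value are assumptions; adjust them for your machine.

```python
# Minimal sketch: load the GGUF model and offload layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./zephyr-7b-beta.Q4_K_M.gguf",  # assumed local file name
    n_gpu_layers=-1,  # offload all layers to the GPU (-1 = all)
    verbose=True,     # prints startup logs; look for "BLAS = 1"
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```

If BLAS = 1 does not appear in the log, the build was likely compiled without CUDA support, which is when the force-reinstall command above is useful.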