Error loading GGUF model with latest llama_cpp_python 0.2.11
conda list cuda-toolkit
cuda-toolkit 11.8.0 0 nvidia/label/cuda-11.8.0
llama_cpp_python 0.2.11, installed with LLAMA_CUBLAS=1
Python
from llama_cpp import Llama
llm = Llama(model_path="/opt/AI/LLM/Llama-2-7B-GGUF", n_ctx=2048)
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA RTXA6000-48Q, compute capability 8.6
gguf_init_from_file: invalid magic number 00000000
error loading model: llama_model_loader: failed to load model from /opt/AI/LLM/Llama-2-7B-GGUF
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "", line 1, in
File "/opt/AI/anaconda3/envs/test/lib/python3.11/site-packages/llama_cpp/llama.py", line 365, in init
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Loading the same model without n_ctx fails in the same way:
llm = Llama(model_path="/opt/AI/LLM/Llama-2-7B-GGUF")
gguf_init_from_file: invalid magic number 00000000
error loading model: llama_model_loader: failed to load model from /opt/AI/LLM/Llama-2-7B-GGUF
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "", line 1, in
File "/opt/AI/anaconda3/envs/test/lib/python3.11/site-packages/llama_cpp/llama.py", line 365, in init
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
I am also facing the same issue. Has anyone found a resolution for this?