Ranjanunicode committed • Commit 7bfe306 • Parent(s): 62efeb6
Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ base_model:
 - meta-llama/Llama-2-7b-chat-hf
 ---
 
-# Q-int4 unicode-llama-2-chat-Hf-q4-
+# Q-int4 unicode-llama-2-chat-Hf-q4-gguf
 - A condensed edition of Llama 2 chat hugging face, designed for deployment with minimal hardware specifications.
 
 

@@ -51,7 +51,7 @@ Output Models generate text only.
 from ctransformers import AutoModelForCausalLM
 
 #Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
-llm = AutoModelForCausalLM.from_pretrained("Ranjanunicode/unicode-llama-2-chat-Hf-q4-
+llm = AutoModelForCausalLM.from_pretrained("Ranjanunicode/unicode-llama-2-chat-Hf-q4-gguf", model_file="unicode-llama-2-chat-Hf-q4-2.gguf", model_type="llama", gpu_layers=40)
 
 print(llm("AI is going to"))
 