chenghenry committed
Commit b10f354 • 1 Parent(s): 8ff40e8
Update README.md

README.md CHANGED
@@ -2,4 +2,39 @@
license: gemma
library_name: transformers
base_model: google/gemma-2-9b-it
---

## Usage (llama-cli with GPU):
```
llama-cli -m ./gemma-2-9b-it-Q6_K.gguf -ngl 100 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```

## Usage (llama-cli with CPU):
```
llama-cli -m ./gemma-2-9b-it-Q6_K.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```
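Both llama-cli commands above assume `gemma-2-9b-it-Q6_K.gguf` is already in the working directory. As a minimal sketch (assuming `huggingface_hub` is installed; the repo and file names are the ones used elsewhere in this README), the file can be fetched like this:

```
from huggingface_hub import hf_hub_download

# Download the quantized model file from this repo into the current directory
path = hf_hub_download(
    repo_id="chenghenry/gemma-2-9b-it-GGUF",
    filename="gemma-2-9b-it-Q6_K.gguf",
    local_dir=".",
)
print(path)
```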

## Usage (llama-cpp-python via Hugging Face Hub):
```
from llama_cpp import Llama

# Download the GGUF from the Hub and load it; n_gpu_layers=100 offloads all
# layers to the GPU (set it to 0 for CPU-only inference)
llm = Llama.from_pretrained(
    repo_id="chenghenry/gemma-2-9b-it-GGUF",
    filename="gemma-2-9b-it-Q6_K.gguf",
    n_ctx=8192,
    n_batch=2048,
    n_gpu_layers=100,
    verbose=False,
    chat_format="gemma",
)

prompt = "Why is the sky blue?"

messages = [{"role": "user", "content": prompt}]
response = llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
)

print(response["choices"][0]["message"]["content"])
```
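The call above returns the full completion at once. As a small sketch of a streaming variant (reusing the same `llm` and `messages` objects; llama-cpp-python accepts `stream=True` and then yields OpenAI-style chunks), the reply can be printed token by token:

```
# Stream the response instead of waiting for the full completion
for chunk in llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```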