update the usage
README.md
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [yuj-v1.Q4_K_M.gguf](https://huggingface.co/shuvom/yuj-v1-GGUF/blob/main/yuj-v1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.17 GB | 6.87 GB | medium, balanced quality - recommended |

## Usage

1. Install the llama.cpp Python bindings (`llama-cpp-python`) and `huggingface-hub`:
```python
!pip install llama-cpp-python huggingface-hub
```
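The prebuilt wheel runs on the CPU; for the GPU offload used in step 3, `llama-cpp-python` generally has to be built with GPU support. One commonly used invocation is sketched below; the exact CMake flag varies across versions (older releases used `-DLLAMA_CUBLAS=on`), so check the llama-cpp-python build docs for your platform:
```python
# Assumption: a CUDA toolchain is available; recent releases use the GGML_CUDA flag
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```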
2. Download the GGUF-formatted model:
```python
!huggingface-cli download shuvom/yuj-v1-GGUF yuj-v1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
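The same file can also be fetched from Python via `huggingface_hub`; a minimal sketch using `hf_hub_download`, with the repo and filename taken from the table above:
```python
from huggingface_hub import hf_hub_download

# Downloads the quantized weights and returns the local file path
model_path = hf_hub_download(
    repo_id="shuvom/yuj-v1-GGUF",
    filename="yuj-v1.Q4_K_M.gguf",
    local_dir="."
)
print(model_path)
```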
3. Load the model. Set `n_gpu_layers` to the number of layers to offload to the GPU; set it to 0 if no GPU acceleration is available on your system.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./yuj-v1.Q4_K_M.gguf",  # Download the model file first
    n_ctx=2048,      # Max sequence length; longer sequences require much more resources
    n_threads=8,     # Number of CPU threads; tune to your system and workload
    n_gpu_layers=35  # Number of layers to offload to GPU, if acceleration is available
)
```
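Once loaded, the `Llama` object is directly callable for plain text completion; a small sketch (the prompt and generation settings here are illustrative, not from the original card):
```python
# Plain completion: the Llama object itself is callable
output = llm(
    "Once upon a time",  # Illustrative prompt
    max_tokens=128,      # Cap the number of generated tokens
    echo=False           # Do not repeat the prompt in the returned text
)
# The generated text sits in the first choice
print(output["choices"][0]["text"])
```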
4. Chat completion API:
```python
llm = Llama(model_path="/content/yuj-v1.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "युज शीर्ष द्विभाषी मॉडल में से एक है"  # Hindi: "yuj is one of the top bilingual models"
        }
    ]
)
```
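`create_chat_completion` returns an OpenAI-style response dictionary, so the reply can be read out of the first choice; a short sketch assuming the call above is assigned to a variable:
```python
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "युज शीर्ष द्विभाषी मॉडल में से एक है"}]
)
# The assistant's reply is nested under choices -> message -> content
print(response["choices"][0]["message"]["content"])
```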