update the usage
README.md
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [yuj-v1.Q4_K_M.gguf](https://huggingface.co/shuvom/yuj-v1-GGUF/blob/main/yuj-v1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.17 GB | 6.87 GB | medium, balanced quality - recommended |

## Usage

1. Install the llama.cpp Python bindings (`llama-cpp-python`) and `huggingface-hub`:
```python
!pip install llama-cpp-python huggingface-hub
```
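The prebuilt wheel runs on the CPU; for the GPU offload used in step 3, `llama-cpp-python` generally has to be built with GPU support. One commonly used invocation is sketched below; the exact CMake flag varies across versions (older releases used `-DLLAMA_CUBLAS=on`), so check the llama-cpp-python build docs for your platform:
```python
# Assumption: a CUDA toolchain is available; recent releases use the GGML_CUDA flag
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```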
2. Download the GGUF-formatted model:
```python
!huggingface-cli download shuvom/yuj-v1-GGUF yuj-v1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
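The same file can also be fetched from Python via `huggingface_hub`; a minimal sketch using `hf_hub_download`, with the repo and filename taken from the table above:
```python
from huggingface_hub import hf_hub_download

# Downloads the quantized weights and returns the local file path
model_path = hf_hub_download(
    repo_id="shuvom/yuj-v1-GGUF",
    filename="yuj-v1.Q4_K_M.gguf",
    local_dir="."
)
print(model_path)
```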
3. Load the model. Set `n_gpu_layers` to the number of layers to offload to the GPU; set it to 0 if no GPU acceleration is available on your system.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./yuj-v1.Q4_K_M.gguf",  # Download the model file first
    n_ctx=2048,      # Max sequence length; longer sequences require much more resources
    n_threads=8,     # Number of CPU threads; tune to your system and workload
    n_gpu_layers=35  # Number of layers to offload to GPU, if acceleration is available
)
```
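Once loaded, the `Llama` object is directly callable for plain text completion; a small sketch (the prompt and generation settings here are illustrative, not from the original card):
```python
# Plain completion: the Llama object itself is callable
output = llm(
    "Once upon a time",  # Illustrative prompt
    max_tokens=128,      # Cap the number of generated tokens
    echo=False           # Do not repeat the prompt in the returned text
)
# The generated text sits in the first choice
print(output["choices"][0]["text"])
```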
4. Chat completion API:
```python
llm = Llama(model_path="/content/yuj-v1.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "युज शीर्ष द्विभाषी मॉडल में से एक है"  # Hindi: "yuj is one of the top bilingual models"
        }
    ]
)
```
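`create_chat_completion` returns an OpenAI-style response dictionary, so the reply can be read out of the first choice; a short sketch assuming the call above is assigned to a variable:
```python
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "युज शीर्ष द्विभाषी मॉडल में से एक है"}]
)
# The assistant's reply is nested under choices -> message -> content
print(response["choices"][0]["message"]["content"])
```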