shuvom committed on
Commit ac8e3eb
1 Parent(s): 79084cf

update the usage

Files changed (1)

README.md CHANGED
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [yuj-v1.Q4_K_M.gguf](https://huggingface.co/shuvom/yuj-v1-GGUF/blob/main/yuj-v1.Q4_K_M.gguf) | Q4_K_M | 4 | 4.17 GB | 6.87 GB | medium, balanced quality - recommended |

## Usage

1. Install the llama.cpp Python bindings and huggingface-hub:
```shell
pip install llama-cpp-python huggingface-hub
```
2. Download the GGUF-formatted model:
```shell
huggingface-cli download shuvom/yuj-v1-GGUF yuj-v1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
3. Load the model. Set `n_gpu_layers` to the number of layers to offload to the GPU; set it to 0 if no GPU acceleration is available on your system.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./yuj-v1.Q4_K_M.gguf",  # path to the model file downloaded above
    n_ctx=2048,        # max sequence length; longer contexts require much more memory
    n_threads=8,       # number of CPU threads; tune to your system
    n_gpu_layers=35    # layers to offload to GPU; set to 0 for CPU-only inference
)
```
4. Use the chat completion API:
```python
llm = Llama(model_path="/content/yuj-v1.Q4_K_M.gguf", chat_format="llama-2")  # set chat_format to match the model you are using
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            # Hindi: "yuj is one of the top bilingual models"
            "content": "युज शीर्ष द्विभाषी मॉडल में से एक है"
        }
    ]
)
```
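
`create_chat_completion` returns an OpenAI-style response dict, with the generated text under `choices[0]["message"]["content"]`. A minimal sketch of pulling the reply out; the sample dict below only illustrates the response shape and is not real model output:

```python
def extract_reply(response: dict) -> str:
    """Return the assistant's text from a chat-completion response dict."""
    return response["choices"][0]["message"]["content"]

# Illustrative response shape (not real model output):
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "yuj is a bilingual Hindi-English model."}}
    ]
}

print(extract_reply(sample))  # yuj is a bilingual Hindi-English model.
```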
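
`create_chat_completion` also accepts `stream=True`, in which case it yields incremental chunks with text deltas under `choices[0]["delta"]`. A sketch of accumulating a streamed reply, exercised here with fake chunks rather than a live model:

```python
def collect_stream(chunks) -> str:
    """Concatenate the text deltas from a streamed chat-completion response."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # some chunks carry only the role, no text
    return "".join(parts)

# Fake chunks mimicking the streamed response shape:
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
]

print(collect_stream(fake_chunks))  # Hello, world
```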
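
Under the hood, `chat_format="llama-2"` renders the message list into a single prompt string before inference. A rough sketch of llama-2-style wrapping for one system + user turn; the exact template llama-cpp-python applies may differ in whitespace and special tokens:

```python
def llama2_prompt(system: str, user: str) -> str:
    """Rough llama-2-style wrapping of one system + user turn (illustrative only)."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = llama2_prompt(
    "You are a story writing assistant.",
    "युज शीर्ष द्विभाषी मॉडल में से एक है",
)
print(prompt)
```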