Update README.md (#9)
- Update README.md (2204e44dac5929245fe2aedcaecdddffd892846d)
Co-authored-by: Zoey Shu <[email protected]>
README.md CHANGED

**Acknowledgement**:
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.

## Get Started

To run the models, download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):

```bash
git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
```
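
If you only need a single quantization, the Hugging Face Hub route mentioned above can also be used from the command line. This is a minimal sketch, assuming a recent `huggingface_hub` install (which provides `huggingface-cli download`) and a filename following the naming pattern in the table below:

```bash
# The GGUF weights are large files, so you may need Git LFS before the git clone above
git lfs install

# Alternative: fetch just one quantization from the repo (filename assumed from the table's naming)
huggingface-cli download NexaAIDev/octopus-v4-gguf Octopus-v4-Q4_K_M.gguf --local-dir .
```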

## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended)

1. **Clone and compile:**

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```

2. **Execute the Model:**

Run the following command in the terminal:

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
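
Because the router prompt template (`<|system|>…<|end|><|user|>…<|end|><|assistant|>`) is long, it can be easier to assemble it in a shell variable and substitute only the user query. A small sketch of the same call, with the model path kept as a placeholder:

```bash
# Build the Octopus-v4 router prompt around a user query, then run it with llama.cpp's main binary
QUERY="Tell me the result of derivative of x^3 when x is 2?"
PROMPT="<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>${QUERY}<|end|><|assistant|>"
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "$PROMPT"
```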

## Run with [Ollama](https://github.com/ollama/ollama)

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps (a combined script follows step 7):

1. Install Ollama on your local machine. You can also follow the guide from the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md):
```bash
git clone https://github.com/ollama/ollama.git ollama
```

2. Locate the local Ollama directory:
```bash
cd ollama
```

3. Create a `Modelfile` in your directory:
```bash
touch Modelfile
```

4. In the `Modelfile`, include a `FROM` statement with the path to your local model, and the default parameters:

```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```

5. Use the following command to add the model to Ollama:

```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```

6. Verify that the model has been successfully imported:
```bash
ollama ls
```

7. Run the model:
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
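
For convenience, steps 3-7 above can be collapsed into one small script. This is only a sketch: it assumes `ollama` is already installed and its server is running, and the GGUF path is a placeholder for wherever you downloaded the model:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: point this at the quantization you downloaded in "Get Started"
MODEL_GGUF=./path/to/octopus-v4-Q4_K_M.gguf

# Write the Modelfile described in steps 3-4
cat > Modelfile <<EOF
FROM ${MODEL_GGUF}
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
EOF

# Import the model (step 5), confirm it is listed (step 6), and run a query (step 7)
ollama create octopus-v4-Q4_K_M -f Modelfile
ollama ls
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```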

| Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
| Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |

_Quantized with llama.cpp_
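
As a point of reference, GGUF quantizations like the ones listed above are normally produced from the f16 file with llama.cpp's quantization tool. A rough sketch (the binary name and location vary across llama.cpp versions, so treat this as illustrative rather than the exact command used here):

```bash
# Illustrative: derive a Q4_K_M file from the f16 GGUF inside a built llama.cpp checkout
./quantize ./Octopus-v4-f16.gguf ./Octopus-v4-Q4_K_M.gguf Q4_K_M
```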