Update README.md (#9)
- Update README.md (2204e44dac5929245fe2aedcaecdddffd892846d)
Co-authored-by: Zoey Shu <[email protected]>
README.md CHANGED

**Acknowledgement**:
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.

## Get Started

To run the models, download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):

```bash
git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
```
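
If you only need a single quantization, the Hugging Face Hub route mentioned above can also be used from the command line. This is a minimal sketch, assuming a recent `huggingface_hub` install (which provides `huggingface-cli download`) and a filename following the naming pattern in the table below:

```bash
# The GGUF weights are large files, so you may need Git LFS before the git clone above
git lfs install

# Alternative: fetch just one quantization from the repo (filename assumed from the table's naming)
huggingface-cli download NexaAIDev/octopus-v4-gguf Octopus-v4-Q4_K_M.gguf --local-dir .
```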

## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended)

1. **Clone and compile:**

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```

2. **Execute the Model:**

Run the following command in the terminal:

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
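
Because the router prompt template (`<|system|>…<|end|><|user|>…<|end|><|assistant|>`) is long, it can be easier to assemble it in a shell variable and substitute only the user query. A small sketch of the same call, with the model path kept as a placeholder:

```bash
# Build the Octopus-v4 router prompt around a user query, then run it with llama.cpp's main binary
QUERY="Tell me the result of derivative of x^3 when x is 2?"
PROMPT="<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>${QUERY}<|end|><|assistant|>"
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "$PROMPT"
```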

## Run with [Ollama](https://github.com/ollama/ollama)

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps (a combined script follows step 7):

1. Install Ollama on your local machine. You can also follow the guide from the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md):
```bash
git clone https://github.com/ollama/ollama.git ollama
```

2. Locate the local Ollama directory:
```bash
cd ollama
```

3. Create a `Modelfile` in your directory:
```bash
touch Modelfile
```

4. In the `Modelfile`, include a `FROM` statement with the path to your local model, and the default parameters:

```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```

5. Use the following command to add the model to Ollama:

```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```

6. Verify that the model has been successfully imported:
```bash
ollama ls
```

7. Run the model:
```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
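
For convenience, steps 3-7 above can be collapsed into one small script. This is only a sketch: it assumes `ollama` is already installed and its server is running, and the GGUF path is a placeholder for wherever you downloaded the model:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: point this at the quantization you downloaded in "Get Started"
MODEL_GGUF=./path/to/octopus-v4-Q4_K_M.gguf

# Write the Modelfile described in steps 3-4
cat > Modelfile <<EOF
FROM ${MODEL_GGUF}
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
EOF

# Import the model (step 5), confirm it is listed (step 6), and run a query (step 7)
ollama create octopus-v4-Q4_K_M -f Modelfile
ollama ls
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```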

| Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
| Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |

_Quantized with llama.cpp_
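
As a point of reference, GGUF quantizations like the ones listed above are normally produced from the f16 file with llama.cpp's quantization tool. A rough sketch (the binary name and location vary across llama.cpp versions, so treat this as illustrative rather than the exact command used here):

```bash
# Illustrative: derive a Q4_K_M file from the f16 GGUF inside a built llama.cpp checkout
./quantize ./Octopus-v4-f16.gguf ./Octopus-v4-Q4_K_M.gguf Q4_K_M
```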