metadata

language:
  - en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
  - function calling
  - on-device language model
  - gguf
  - llama cpp

Octopus V4-GGUF: Graph of language models

- Original Model - Nexa AI Website - Octopus-v4 GitHub - ArXiv - Domain LLM Leaderboard

Acknowledgement:
We sincerely thank our community members, Mingyuan and Zoey, for their extraordinary contributions to this quantization effort. See Octopus-v4 for our original Hugging Face model.

Get Started

To run the models, download them to your local machine using either git clone or the Hugging Face Hub:

git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
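The Hugging Face Hub route mentioned above can be sketched as follows. This is a minimal example, not an official workflow: it assumes the huggingface-cli tool is installed, and the helper name fetch_gguf and the ./models target directory are hypothetical choices made for illustration.

```shell
# Hypothetical helper: download a single GGUF file from the repository into ./models.
# Assumes huggingface-cli is installed, e.g. via: pip install -U "huggingface_hub[cli]"
fetch_gguf() {
  huggingface-cli download NexaAIDev/octopus-v4-gguf "$1" --local-dir ./models
}

# Example call, commented out so that pasting the snippet does not start a download.
# Replace the argument with the exact file name as it appears in the repository:
# fetch_gguf <file-name>.gguf
```

Downloading a single file this way avoids cloning every quantization in the repository when only one is needed.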

Run with llama.cpp (Recommended)

Clone and compile:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make

Execute the Model:

Run the following command in the terminal (newer llama.cpp builds name this binary llama-cli instead of main):

./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
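The prompt in the command above follows a fixed template: a system message, the user query, and an opening assistant tag. To try other queries without retyping the whole string, a small shell helper can assemble it. The function name build_prompt is hypothetical and not part of llama.cpp or the model release. The system message is copied from the command above.

```shell
# Hypothetical helper: wrap a user query in the prompt template used above.
build_prompt() {
  system_msg="You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function."
  # printf fills the two %s slots with the system message and the user query.
  printf '<|system|>%s<|end|><|user|>%s<|end|><|assistant|>' "$system_msg" "$1"
}

prompt=$(build_prompt "Tell me the result of derivative of x^3 when x is 2?")
printf '%s\n' "$prompt"

# Once the binary and model file are in place, pass the prompt to llama.cpp, e.g.:
# ./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "$prompt"
```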

Run with Ollama

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:

Install Ollama on your local machine. You can also follow the guide in the Ollama GitHub repository:

git clone https://github.com/ollama/ollama.git ollama

Change into the local Ollama directory:

cd ollama

Create a Modelfile in your directory:

touch Modelfile

In the Modelfile, include a FROM statement with the path to your local model, and the default parameters:

FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
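As an alternative to creating an empty file and editing it by hand, the same Modelfile can be written in one step with a here-document. This is only a convenience sketch. The model path is the placeholder from the listing above and must be replaced with the real location of your GGUF file.

```shell
# Write the Modelfile in one step. Quoting 'EOF' stops the shell from
# expanding anything inside the here-document.
cat > Modelfile <<'EOF'
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
EOF

# Print the file for a quick visual check.
cat Modelfile
```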

Use the following command to add the model to Ollama:

ollama create octopus-v4-Q4_K_M -f Modelfile

Verify that the model has been successfully imported:

ollama ls

Run the model:

ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"

Dataset and Benchmark

We used questions from MMLU to evaluate performance.
The evaluation used the Ollama llm-benchmark method.

Quantized GGUF Models

Name	Quant method	Bits	Size	Response (tokens/second)	Use Cases
Octopus-v4.gguf			7.64 GB	27.64	extremely large
Octopus-v4-Q2_K.gguf	Q2_K	2	1.42 GB	54.20	strongly not recommended, high loss
Octopus-v4-Q3_K.gguf	Q3_K	3	1.96 GB	51.22	not recommended
Octopus-v4-Q3_K_S.gguf	Q3_K_S	3	1.68 GB	51.78	not very recommended
Octopus-v4-Q3_K_M.gguf	Q3_K_M	3	1.96 GB	50.86	not very recommended
Octopus-v4-Q3_K_L.gguf	Q3_K_L	3	2.09 GB	50.05	not very recommended
Octopus-v4-Q4_0.gguf	Q4_0	4	2.18 GB	65.76	good quality, recommended
Octopus-v4-Q4_1.gguf	Q4_1	4	2.41 GB	69.01	slow, good quality, recommended
Octopus-v4-Q4_K.gguf	Q4_K	4	2.39 GB	55.76	slow, good quality, recommended
Octopus-v4-Q4_K_S.gguf	Q4_K_S	4	2.19 GB	53.98	high quality, recommended
Octopus-v4-Q4_K_M.gguf	Q4_K_M	4	2.39 GB	58.39	some functions loss, not very recommended
Octopus-v4-Q5_0.gguf	Q5_0	5	2.64 GB	61.98	slow, good quality
Octopus-v4-Q5_1.gguf	Q5_1	5	2.87 GB	63.44	slow, good quality
Octopus-v4-Q5_K.gguf	Q5_K	5	2.82 GB	58.28	moderate speed, recommended
Octopus-v4-Q5_K_S.gguf	Q5_K_S	5	2.64 GB	59.95	moderate speed, recommended
Octopus-v4-Q5_K_M.gguf	Q5_K_M	5	2.82 GB	53.31	fast, good quality, recommended
Octopus-v4-Q6_K.gguf	Q6_K	6	3.14 GB	52.15	large, not very recommended
Octopus-v4-Q8_0.gguf	Q8_0	8	4.06 GB	50.10	very large, good quality
Octopus-v4-f16.gguf	f16	16	7.64 GB	30.61	extremely large

Quantized with llama.cpp