|
--- |
|
base_model: google/gemma-2-2b-it |
|
datasets: |
|
- DiTy/function-calling |
|
language: |
|
- en |
|
library_name: transformers |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
tags: |
|
- conversational |
|
- gemma2 |
|
- function-calling |
|
- trl |
|
--- |
|
|
|
# DiTy/gemma-2-2b-it-function-calling-GGUF |
|
|
|
> [!NOTE] |
|
> NB: If you want to use the model to call functions in long, complex dialogues, it is better to use a larger model: [DiTy/gemma-2-9b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF) or [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF).
|
|
|
This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) for the **Function Calling** task, trained on non-synthetic data that was fully annotated by humans, using the English version of the <ins>*DiTy/function-calling*</ins> dataset.
|
|
|
|
In addition to **safetensors**, the model is available in **GGUF** format; in this case, you only need to download a single file (*[how to run inference with a GGUF model](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#high-level-api)*):
|
> [!NOTE] |
|
> Keep in mind that this small model had difficulty mastering "Function Calling" during training, so heavily quantized versions are not recommended.
|
|
|
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [gemma-2-2B-it-function-calling-F16.gguf](https://huggingface.co/DiTy/gemma-2-2b-it-function-calling-GGUF/blob/main/gemma-2-2B-it-function-calling-F16.gguf) | F16 | 5.24GB | Full model weights in float16. *Recommended.* |
| [gemma-2-2B-it-function-calling-Q8_0.gguf](https://huggingface.co/DiTy/gemma-2-2b-it-function-calling-GGUF/blob/main/gemma-2-2B-it-function-calling-Q8_0.gguf) | Q8_0 | 2.78GB | Extremely high quality; generally unneeded, but the maximum available quant. |
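
If you prefer to run one of the GGUF files directly, below is a minimal sketch using the high-level API of `llama-cpp-python` (linked above). The file path, context size, and prompt formatting are assumptions for illustration; the chat examples later in this card use Transformers instead:

```python
from llama_cpp import Llama

# Load a locally downloaded GGUF file (path is a placeholder).
llm = Llama(
    model_path="./gemma-2-2B-it-function-calling-F16.gguf",
    n_ctx=8192,  # context window; adjust to your memory budget
)

# A simple prompt in the Gemma turn format used by this model.
# Note: llama.cpp may prepend <bos> itself, so it is omitted here.
prompt = (
    "<start_of_turn>user\n"
    "Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(prompt, max_tokens=256, stop=["<end_of_turn>"])
print(output["choices"][0]["text"])
```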
|
|
|
## Model card tree |
|
|
|
* [How to prepare your functions (tools) for *Function Calling*](#prepare_func_call)
|
* [Just use chat template for generation](#just_chat_template) |
|
* [Prompt structure and expected content](#roles) |
|
* [Evaluation of function calling models](#eval) |
|
|
|
## Usage (HuggingFace Transformers) |
|
|
|
Below we share some code snippets on how to quickly get started with running the model. First, install the Transformers library with:
|
```bash |
|
pip install -U transformers |
|
``` |
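
The snippets below load the model with `device_map="auto"`, which relies on the Accelerate library, so install it as well if you don't have it yet:

```bash
pip install -U accelerate
```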
|
|
|
### <a name="prepare_func_call"></a>Prepare your functions for *Function Calling* |
|
|
|
You should write the functions (tools) used by the model in *Python code* and make sure to add *Python docstrings* as in the example below: |
|
```python |
|
def get_weather(city: str): |
|
""" |
|
A function that returns the weather in a given city. |
|
|
|
Args: |
|
city: The city to get the weather for. |
|
""" |
|
import random |
|
|
|
return "sunny" if random.random() > 0.5 else "rainy" |
|
|
|
|
|
def get_sunrise_sunset_times(city: str): |
|
""" |
|
A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time]. |
|
|
|
Args: |
|
city: The city to get the sunrise and sunset times for. |
|
""" |
|
|
|
return ["6:00 AM", "6:00 PM"] |
|
``` |
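
If you want to preview the JSON schema that Transformers derives from a function's signature and docstring (this is the information the chat template later serializes into the prompt), recent Transformers versions provide a helper for this. A small sketch, assuming a version of the library that exposes `get_json_schema`:

```python
from transformers.utils import get_json_schema

# Inspect the tool schema built from the signature and docstring of `get_weather`.
# The exact wrapping of the returned dict may differ slightly between versions.
print(get_json_schema(get_weather))
```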
|
|
|
### <a name="just_chat_template"></a>Just use chat template |
|
|
|
Next, you need to download the model and tokenizer: |
|
```python |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
"DiTy/gemma-2-2b-it-function-calling-GGUF", |
|
device_map="auto", |
|
torch_dtype=torch.bfloat16, # use float16 or float32 if bfloat16 is not available to you. |
|
cache_dir=PATH_TO_MODEL_DIR, # optional |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained( |
|
"DiTy/gemma-2-2b-it-function-calling-GGUF", |
|
cache_dir=PATH_TO_MODEL_DIR, # optional |
|
) |
|
``` |
|
|
|
To get the generation result, just use `apply_chat_template`. In order to take our written functions (tools) into account, we need to pass them as a list through the `tools` attribute and also set `add_generation_prompt=True`.
|
```python |
|
history_messages = [ |
|
{"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "}, |
|
{"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"}, |
|
] |
|
|
|
inputs = tokenizer.apply_chat_template( |
|
history_messages, |
|
tokenize=False, |
|
add_generation_prompt=True, # adding prompt for generation |
|
tools=[get_weather, get_sunrise_sunset_times], # our functions (tools) |
|
) |
|
|
|
print(inputs) |
|
``` |
|
|
|
Then our `inputs` will look like this: |
|
``` |
|
<bos><start_of_turn>user |
|
You are a helpful assistant with access to the following functions. Use them if required - { |
|
"name": "get_weather", |
|
"description": "A function that returns the weather in a given city.", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the weather for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
}, |
|
{ |
|
"name": "get_sunrise_sunset_times", |
|
"description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the sunrise and sunset times for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
} |
|
|
|
Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn> |
|
<start_of_turn>model |
|
|
|
``` |
|
|
|
Now we can generate the model's response.
Be careful: after `apply_chat_template`, the prompt already contains the required special tokens, so do not add them again during tokenization. Use `add_special_tokens=False`:
|
```python |
|
terminator_ids = [ |
|
tokenizer.eos_token_id, |
|
tokenizer.convert_tokens_to_ids("<end_of_turn>"), |
|
] |
|
|
|
prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device) |
|
generated_ids = model.generate( |
|
prompt_ids, |
|
max_new_tokens=512, |
|
eos_token_id=terminator_ids, |
|
bos_token_id=tokenizer.bos_token_id, |
|
) |
|
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False) # `skip_special_tokens=False` for debug |
|
|
|
print(generated_response) |
|
``` |
|
|
|
We get the generation as a function call: |
|
``` |
|
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn> |
|
``` |
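
At this point you would normally parse the generated call and execute the corresponding Python function yourself. A minimal sketch of one way to do this (the prefix/suffix handling is an assumption based on the output format shown above):

```python
import json

# Map tool names to the Python functions defined earlier.
available_tools = {
    "get_weather": get_weather,
    "get_sunrise_sunset_times": get_sunrise_sunset_times,
}

# Strip the "Function call: " prefix and the trailing <end_of_turn> token,
# then parse the remaining JSON body.
call_body = generated_response.split("Function call:", 1)[1].split("<end_of_turn>", 1)[0].strip()
call = json.loads(call_body)

# Look up and execute the requested function with the generated arguments.
function_result = available_tools[call["name"]](**call["arguments"])
print(function_result)  # e.g. ["6:00 AM", "6:00 PM"]
```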
|
|
|
Great, now that we have the result of our *called function*, we can provide the model with the *function's response* by extending the chat history:
|
```python |
|
history_messages = [ |
|
{"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "}, |
|
{"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"}, |
|
{"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'}, |
|
{"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'}, # a hypothetical response from our function |
|
] |
|
|
|
inputs = tokenizer.apply_chat_template( |
|
history_messages, |
|
tokenize=False, |
|
add_generation_prompt=True, # adding prompt for generation |
|
tools=[get_weather, get_sunrise_sunset_times], # our functions (tools) |
|
) |
|
|
|
print(inputs) |
|
``` |
|
|
|
Let's make sure the `inputs` are correct: |
|
``` |
|
<bos><start_of_turn>user |
|
You are a helpful assistant with access to the following functions. Use them if required - { |
|
"name": "get_weather", |
|
"description": "A function that returns the weather in a given city.", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the weather for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
}, |
|
{ |
|
"name": "get_sunrise_sunset_times", |
|
"description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the sunrise and sunset times for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
} |
|
|
|
Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn> |
|
<start_of_turn>model |
|
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn> |
|
<start_of_turn>user |
|
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn> |
|
<start_of_turn>model |
|
|
|
``` |
|
|
|
Similarly, we generate a response from the model: |
|
```python |
|
prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device) |
|
generated_ids = model.generate( |
|
prompt_ids, |
|
max_new_tokens=512, |
|
eos_token_id=terminator_ids, |
|
bos_token_id=tokenizer.bos_token_id, |
|
) |
|
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False) # `skip_special_tokens=False` for debug |
|
|
|
print(generated_response) |
|
``` |
|
|
|
As a result, we get the model's response: |
|
``` |
|
The sunrise time in Los Angeles is 6:00 AM.<end_of_turn> |
|
``` |
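
To keep the conversation going, you can append this answer to the history under the `model` role (see the role descriptions below) and continue exactly as before. A brief sketch with a hypothetical follow-up question:

```python
history_messages.append(
    {"role": "model", "content": "The sunrise time in Los Angeles is 6:00 AM."}
)
history_messages.append(
    {"role": "user", "content": "Thanks! And what is the weather like there?"}  # hypothetical follow-up
)
# Re-apply the chat template with the same tools and generate again, as shown above.
```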
|
|
|
## Usage via transformers `pipeline` |
|
|
|
<details> |
|
<summary> |
|
Generation via pipeline |
|
</summary> |
|
|
|
```python |
|
import torch
from transformers import pipeline
|
|
|
|
|
generation_pipeline = pipeline( |
|
"text-generation", |
|
model="DiTy/gemma-2-2b-it-function-calling-GGUF", |
|
model_kwargs={ |
|
"torch_dtype": torch.bfloat16, # use float16 or float32 if bfloat16 is not supported for you. |
|
"cache_dir": PATH_TO_MODEL_DIR, # OPTIONAL |
|
}, |
|
device_map="auto", |
|
) |
|
|
|
history_messages = [ |
|
{"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "}, |
|
{"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"}, |
|
{"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'}, |
|
{"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'}, |
|
] |
|
|
|
inputs = generation_pipeline.tokenizer.apply_chat_template( |
|
history_messages, |
|
tokenize=False, |
|
add_generation_prompt=True, |
|
tools=[get_weather, get_sunrise_sunset_times], |
|
) |
|
|
|
terminator_ids = [ |
|
generation_pipeline.tokenizer.eos_token_id, |
|
generation_pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>") |
|
] |
|
|
|
outputs = generation_pipeline( |
|
inputs, |
|
max_new_tokens=512, |
|
eos_token_id=terminator_ids, |
|
) |
|
|
|
print(outputs[0]["generated_text"][len(inputs):]) |
|
``` |
|
|
|
</details> |
|
|
|
## <a name="roles"></a>Prompt structure and expected content |
|
|
|
For the model to work correctly, it is assumed that `apply_chat_template` will be used.
The message history must be passed in the following format:
|
```python |
|
history_messages = [ |
|
{"role": "...", "content": "..."}, |
|
... |
|
] |
|
``` |
|
|
|
The following roles are available for use: |
|
|
|
* `system` - an optional role; its content is always placed at the very beginning, before the list of functions (tools) available to the model. You can use the standard option that was used during training: ***"You are a helpful assistant with access to the following functions. Use them if required - "***
* `user` - the user's request is passed through this role.
* `function-call` - the body of the function call is passed through this role. Although the model is trained to generate a function call in the form of ***"Function call: {...}\<end_of_turn\>"***, you should pass only the body ***"{...}"*** in the *"content"* field, since the ***"Function call: "*** prefix is added automatically by `apply_chat_template`.
* `function-response` - in this role, pass your function's response in the *"content"* field as a dictionary ***'{"name_returnable_value": value}'***.
* `model` - content under this role is treated as text generated by the model.
|
|
|
### Chat history with *Function Calling* |
|
|
|
``` |
|
[ |
|
{"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "}, |
|
{"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"}, |
|
{"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'}, |
|
{"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'}, |
|
] |
|
``` |
|
|
|
It looks like: |
|
``` |
|
<bos><start_of_turn>user |
|
You are a helpful assistant with access to the following functions. Use them if required - { |
|
"name": "get_weather", |
|
"description": "A function that returns the weather in a given city.", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the weather for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
}, |
|
{ |
|
"name": "get_sunrise_sunset_times", |
|
"description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"city": { |
|
"type": "string", |
|
"description": "The city to get the sunrise and sunset times for." |
|
} |
|
}, |
|
"required": [ |
|
"city" |
|
] |
|
} |
|
} |
|
|
|
Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn> |
|
<start_of_turn>model |
|
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn> |
|
<start_of_turn>user |
|
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn> |
|
``` |
|
|
|
|
|
### Chat history with a standard user-model template |
|
|
|
``` |
|
[ |
|
{"role": "system", "content": "You are a helpful assistant"}, |
|
{"role": "user", "content": "Tell me about California"}, |
|
] |
|
``` |
|
|
|
It looks like: |
|
``` |
|
<bos><start_of_turn>user |
|
You are a helpful assistant |
|
|
|
Tell me about California<end_of_turn> |
|
``` |
|
|
|
## <a name="eval"></a>Evaluation |
|
|
|
During training, the validation loss converged to approximately the following values:
|
|
|
| **Model** | **Generation Language** | **Approximate Validation Loss** |
| :-----: | :-----: | :-----: |
| [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF) | EN | 0.47 |
| [DiTy/gemma-2-9b-it-russian-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-russian-function-calling-GGUF) | RU | 0.57 |
| [DiTy/gemma-2-9b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF) | EN | 0.5 |
| [**DiTy/gemma-2-2b-it-function-calling**](https://huggingface.co/DiTy/gemma-2-2b-it-function-calling) | **EN** | **0.66** |
|
|
|
## Citation |
|
|
|
```none |
|
@article{gemma_2024, |
|
title={Gemma}, |
|
url={https://www.kaggle.com/m/3301}, |
|
DOI={10.34740/KAGGLE/M/3301}, |
|
publisher={Kaggle}, |
|
author={Gemma Team}, |
|
year={2024} |
|
} |
|
``` |