---
license: apache-2.0
---

# Model Card for MediaTek Research Breeze-7B-FC-v1_0



## 🏆 Performance

| Models                                                                                     | #Parameters | Organization | License    | 🧰 Function Calling? | 💬 Instruction Following? |
|--------------------------------------------------------------------------------------------|-------------|------------|------------|-------------------|----------|
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)| 7B          | MediaTek Research | Apache 2.0 | ❌  | ✅       |
| [**Breeze-7B-FC-v1_0**](https://huggingface.co/MediaTek-Research/Breeze-7B-FC-v1_0)        | 7B          | MediaTek Research | Apache 2.0 | ✅ | ✅      |
| [Gorilla-OpenFunctions-v2](https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2)    | 7B          | Gorilla LLM       | Apache 2.0 | ✅ | ❌       |
| [GPT-3.5-Turbo-0125](https://openai.com)                                                   |             | OpenAI            | Proprietary| ✅ | ✅      |

**Function-calling evaluation on the EN benchmark**

Benchmark: Berkeley Function-Calling Leaderboard

| Models                            | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple  | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple  | 
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)**        | 86.01 |  74.58 | 90.00 | 93.00 | 82.00 | 83.00 | 98.00 | 92.00 | 88.00 | 75.00 |
| Gorilla-OpenFunctions-v2 (FC)     | 85.95 |  60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC)           | 72.77 |  4.58  | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |

![Radar chart of EN function-calling results](misc/radar_chart_en.png)

**Function-calling evaluation on the ZHTW benchmark**

Benchmark: function-calling-leaderboard-for-zhtw

| Models                            | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple  | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple  | 
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)**        | 77.70 |  71.67 | 82.00 | 86.50 | 76.00 | 65.50 | 87.00 | 88.00 | 80.00 | 57.50 |
| Gorilla-OpenFunctions-v2 (FC)     | 75.68 |  53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
| GPT-3.5-Turbo-0125 (FC)           | 66.15 |  7.50  | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |

![Radar chart of ZHTW function-calling results](misc/radar_chart_zhtw.png)


**Instruction-following evaluation on the EN benchmark**

Benchmark: MT-Bench

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *vs.* Breeze-7B-Instruct-v1_0 | 25 (15.6%) | 72 (45.0%) | 63 (39.4%) |


**Instruction-following evaluation on the ZHTW benchmark**

Benchmark: MT-Bench-TC

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *vs.* Breeze-7B-Instruct-v1_0 | 36 (22.5%) | 81 (50.6%) | 43 (26.9%) |


## 👩‍💻 How to use

**Dependency**

Install the `mtkresearch` package:

```bash
git clone https://github.com/mtkresearch/mtkresearch.git
cd mtkresearch
pip install -e .
```

**Hosting with vLLM**

```python
from vllm import LLM, SamplingParams

num_gpu = 1  # number of GPUs to shard the model across

llm = LLM(
    model='MediaTek-Research/Breeze-7B-FC-v1_0',
    tensor_parallel_size=num_gpu,
    gpu_memory_utilization=0.7
)

# stop generation at the model's end-of-turn token
instance_end_token_id = llm.get_tokenizer().convert_tokens_to_ids('<|im_end|>')
params = SamplingParams(
    temperature=0.01,   # near-greedy decoding keeps function calls stable and reproducible
    top_p=0.01,
    max_tokens=4096,
    repetition_penalty=1.1,
    stop_token_ids=[instance_end_token_id]
)

def _inference(prompt, llm, params):
    return llm.generate(prompt, params)[0].outputs[0].text

```

**Instruction Following**

```python
from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

prompt_engine = MRPromptV2()

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},
]

prompt = prompt_engine.get_prompt(conversations)


output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result) # 
```

**Function Calling**
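The example covers the full tool-use loop in three stages: (1) prompt the model with the user query plus the function schemas, (2) execute the function call the model produces, and (3) feed the execution result back so the model can compose a final answer. Stages 2 and 3 below are a hedged sketch: `fake_get_current_weather` is a hypothetical stand-in for a real API, and the message fields it reads (`tool_calls`, the `tool` role) are assumed to follow the OpenAI convention rather than taken from the `mtkresearch` documentation.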

```python
from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

functions = [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
]

prompt_engine = MRPromptV2()

# stage 1: query
conversations = [
    {"role": "user", "content": "台北目前溫度是攝氏幾度?"},
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result) #

# stage 2: execute called functions
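# Sketch with labeled assumptions: `result` is assumed to be an OpenAI-style
# assistant message whose result['tool_calls'][0]['function'] holds the
# function name and a JSON-encoded arguments string; adjust the field names
# if the actual mtkresearch output differs.
import json

def fake_get_current_weather(location, unit=None):
    # hypothetical stand-in for a real weather API
    return {'temperature': 30, 'unit': unit or 'celsius'}

function_map = {'get_current_weather': fake_get_current_weather}

tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
arguments = json.loads(tool_call['function']['arguments'])
called_result = function_map[func_name](**arguments)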

# stage 3: put executed results
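# Append the assistant message that requested the call, then a 'tool' message
# carrying the executed result (OpenAI-style fields, assumed here), and
# generate again so the model can answer in natural language.
conversations.append(result)
conversations.append({
    'role': 'tool',
    'tool_call_id': tool_call['id'],
    'name': func_name,
    'content': json.dumps(called_result),
})

prompt = prompt_engine.get_prompt(conversations, functions=functions)
final_str = _inference(prompt, llm, params)
final_result = prompt_engine.parse_generated_str(final_str)
print(final_result)  # expected: an assistant message reporting the temperature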

```