YC-Chen commited on
Commit
215ecb7
•
1 Parent(s): bc5afb6

Update README.md

Files changed (1)
  1. README.md +12 -5
README.md CHANGED
@@ -8,7 +8,7 @@ license: apache-2.0
 
 ## Performance
 
-| Models | #Parameters | Organization | License | Function Calling? | Chatbot? |
+| Models | #Parameters | Organization | License | Function Calling? | Instruction Following? |
 |--------------------------------------------------------------------------------------------|-------------|------------|------------|-------------------|----------|
 | [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)| 7B | MediaTek Research | Apache 2.0 | No | Yes |
 | [**Breeze-7B-FC-v1_0**](https://huggingface.co/MediaTek-Research/Breeze-7B-FC-v1_0) | 7B | MediaTek Research | Apache 2.0 | Yes | Yes |
@@ -19,8 +19,8 @@ license: apache-2.0
 
 Berkeley function-calling leaderboard
 
-| Models | ↑ Overall | Relevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
-|-------------------------------------------------------------------------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
+| Models | ↑ Overall | Relevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
+|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
 | **Breeze-7B-FC-v1_0 (FC)** | 86.01 | 74.58 | 90.00 | 93.00 | 82.00 | 83.00 | 98.00 | 92.00 | 88.00 | 75.00 |
 | Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
 | GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
@@ -31,11 +31,18 @@ Berkeley function-calling leaderboard
 
 function-calling-leaderboard-for-zhtw
 
-| Models | ↑ Overall | Relevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
-|-------------------------------------------------------------------------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
+| Models | ↑ Overall | Relevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
+|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
 | **Breeze-7B-FC-v1_0 (FC)** | 77.70 | 71.67 | 82.00 | 86.50 | 76.00 | 65.50 | 87.00 | 88.00 | 80.00 | 57.50 |
 | Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
 | GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
 
 ![](misc/radar_chart_zhtw.png)
 
+📌 **Evaluate instruction following on ZHTW benchmark**
+
+MT-Bench-TC
+
+| | Win | Tie | Lose |
+|---|---|---|---|
+| **Breeze-7B-FC-v1_0** vs. Breeze-7B-Instruct-v1_0 | 42 (26.3%) | 71 (44.4%) | 47 (29.4%) |
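The win/tie/lose percentages in the MT-Bench-TC row appear to be each count divided by the 160 pairwise comparisons (42 + 71 + 47), rounded half-up to one decimal place. A minimal sketch of that arithmetic, under that assumption (the `pct` helper is illustrative, not part of this repo):

```python
# Sanity-check the MT-Bench-TC shares: count / total, rounded half-up.
# Assumption: the README rounds half-up (26.25 -> 26.3), so we use Decimal
# rather than Python's round(), which rounds half-to-even.
from decimal import Decimal, ROUND_HALF_UP

counts = {"Win": 42, "Tie": 71, "Lose": 47}
total = sum(counts.values())  # 160 pairwise comparisons

def pct(n: int) -> Decimal:
    """Percentage of `total`, rounded half-up to one decimal place."""
    return (Decimal(n * 100) / Decimal(total)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )

for outcome, n in counts.items():
    print(f"{outcome}: {n} ({pct(n)}%)")  # e.g. Win: 42 (26.3%)
```

Running this reproduces the three percentages shown in the table above.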