---
license: apache-2.0
---
# Model Card for MediaTek Research Breeze-7B-FC-v1_0
## 🏆 Performance
| Models | #Parameters | Organization | License | 🧰 Function Calling? | 💬 Instruction Following? |
|--------------------------------------------------------------------------------------------|-------------|------------|------------|-------------------|----------|
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)| 7B | MediaTek Research | Apache 2.0 | ❌ | ✅ |
| [**Breeze-7B-FC-v1_0**](https://huggingface.co/MediaTek-Research/Breeze-7B-FC-v1_0) | 7B | MediaTek Research | Apache 2.0 | ✅ | ✅ |
| [Gorilla-OpenFunctions-v2](https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2) | 7B | Gorilla LLM | Apache 2.0 | ✅ | ❌ |
| [GPT-3.5-Turbo-0125](https://openai.com) | | OpenAI | Proprietary| ✅ | ✅ |

**Function-calling evaluation on the EN benchmark** (Berkeley function-calling leaderboard)

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)** | 86.89 | 76.25 | 90.00 | 93.00 | 84.00 | 84.00 | 100.00 | 92.00 | 88.00 | 77.50 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
![](misc/radar_chart_en.png)

**Function-calling evaluation on the ZHTW benchmark** (function-calling-leaderboard-for-zhtw)

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)**        | 78.18    | 72.50               | 82.00      | 86.00        | 76.50        | 67.00                  | 88.00        | 88.00               | 80.00               | 60.00                         |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
![](misc/radar_chart_zhtw.png)

**Instruction-following evaluation on the EN benchmark** (MT-Bench)

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *vs.* Breeze-7B-Instruct-v1_0 | 27 (16.9%) | 63 (39.4%) | 70 (43.8%) |

**Instruction-following evaluation on the ZHTW benchmark** (MT-Bench-TC)

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *vs.* Breeze-7B-Instruct-v1_0 | 40 (25.0%) | 69 (43.1%) | 51 (31.9%) |

## 👩‍💻 How to use
**Dependency**

Install the `mtkresearch` package:

```bash
git clone https://github.com/mtkresearch/mtkresearch.git
cd mtkresearch
pip install -e .
```
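
If the installation succeeded, the prompt helper used in all of the examples below should import cleanly; a quick optional check:

```python
# Optional sanity check after installing mtkresearch.
from mtkresearch.llm.prompt import MRPromptV2

print(type(MRPromptV2()).__name__)  # expected output: MRPromptV2
```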
**Hosting with vLLM**

```python
from vllm import LLM, SamplingParams

num_gpu = 1  # number of GPUs to shard the model across

llm = LLM(
    model='MediaTek-Research/Breeze-7B-FC-v1_0',
    tensor_parallel_size=num_gpu,
    gpu_memory_utilization=0.7
)

# The model ends each turn with '<|im_end|>', so stop generation on that token.
instance_end_token_id = llm.get_tokenizer().convert_tokens_to_ids('<|im_end|>')

params = SamplingParams(
    temperature=0.01,
    top_p=0.01,
    max_tokens=4096,
    repetition_penalty=1.1,
    stop_token_ids=[instance_end_token_id]
)

def _inference(prompt, llm, params):
    return llm.generate(prompt, params)[0].outputs[0].text
```
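
Note that the sampling settings above are near-greedy (`temperature` and `top_p` close to 0, with a mild `repetition_penalty`), which should keep generated function calls deterministic and easy to parse; for open-ended chat you can raise them, as long as generation still stops at `<|im_end|>`.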
**Instruction following**
```python
from mtkresearch.llm.prompt import MRPromptV2
sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'
prompt_engine = MRPromptV2()
conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},  # "What is deep learning?"
]
prompt = prompt_engine.get_prompt(conversations)
output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant',
# 'content': '深度學習(Deep Learning)是一種機器學習方法,它模仿人類大腦的神經網路結構來處理複雜的數據和任務。在深度學習中,模型由多層人工神經元組成,每個神經元之間有權重連接,並通過非線性轉換進行計算。這些層與層之間的相互作用使模型能夠學習複雜的函數關係或模式,從而解決各種問題,如圖像識別、自然語言理解、語音辨識等。深度學習通常需要大量的數據和強大的計算能力,因此經常使用圖形處理器(GPU)或特殊的加速器來執行。'}
#   (English gloss: deep learning is a machine-learning method that mimics the brain's neural-network structure; models stack layers of artificial neurons with weighted connections and non-linear transformations to learn complex patterns for tasks such as image recognition, natural-language understanding, and speech recognition, typically requiring large amounts of data and GPU acceleration.)
```
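
To continue the dialogue, append the parsed assistant turn and the next user message to `conversations`, then rebuild the prompt. A minimal sketch following the same pattern (the follow-up question is only an illustration):

```python
# Continue the chat: append the assistant reply and the next user turn,
# then rebuild the prompt and generate again.
conversations.append(result)
conversations.append({"role": "user", "content": "請舉一個深度學習的實際應用。"})  # "Give a real-world application of deep learning."

prompt = prompt_engine.get_prompt(conversations)
followup = prompt_engine.parse_generated_str(_inference(prompt, llm, params))
print(followup['content'])
```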
**Function Calling**
```python
import json
from mtkresearch.llm.prompt import MRPromptV2
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]

def faked_get_current_weather(location, unit=None):
    # Stand-in for a real weather API call.
    return {'temperature': 30}

mapping = {
    'get_current_weather': faked_get_current_weather
}
prompt_engine = MRPromptV2()
# stage 1: query
conversations = [
    {"role": "user", "content": "台北目前溫度是攝氏幾度?"},  # "What is the current temperature in Taipei, in Celsius?"
]
prompt = prompt_engine.get_prompt(conversations, functions=functions)
output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant',
# 'tool_calls': [
# {'id': 'call_U9bYCBRAbF639uUqfwehwSbw', 'type': 'function',
# 'function': {'name': 'get_current_weather', 'arguments': '{"location": "台北, 台灣", "unit": "攝氏"}'}}]}
# stage 2: execute called functions
conversations.append(result)
tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
func = mapping[func_name]
arguments = json.loads(tool_call['function']['arguments'])
called_result = func(**arguments)
# stage 3: append the execution result to the conversation
conversations.append(
    {
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': func_name,
        'content': json.dumps(called_result)
    }
)
prompt = prompt_engine.get_prompt(conversations, functions=functions)
output_str2 = _inference(prompt, llm, params)
result2 = prompt_engine.parse_generated_str(output_str2)
print(result2)
# {'role': 'assistant', 'content': '台北目前的溫度是攝氏30度。'}
#   (English gloss: "The current temperature in Taipei is 30 degrees Celsius.")
```
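
When a query triggers several tool calls at once (the Parallel settings in the benchmarks above), stages 2 and 3 can simply be looped over every entry in `result['tool_calls']`. A minimal sketch, reusing `mapping`, `prompt_engine`, and `_inference` from above and assuming `result` is a freshly parsed stage-1 output:

```python
# Sketch: execute every tool call in the assistant turn, not just the first one.
conversations.append(result)  # the assistant turn carrying its tool_calls
for tool_call in result['tool_calls']:
    func = mapping[tool_call['function']['name']]
    arguments = json.loads(tool_call['function']['arguments'])
    conversations.append({
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': tool_call['function']['name'],
        'content': json.dumps(func(**arguments)),
    })

prompt = prompt_engine.get_prompt(conversations, functions=functions)
print(prompt_engine.parse_generated_str(_inference(prompt, llm, params)))
```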