chienweichang
commited on
Commit
•
e699473
1
Parent(s):
0972cbb
Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,345 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
language:
|
3 |
+
- zh
|
4 |
+
- en
|
5 |
+
pipeline_tag: text-generation
|
6 |
+
base_model: yentinglin/Llama-3-Taiwan-70B-Instruct
|
7 |
+
tags:
|
8 |
+
- zhtw
|
9 |
+
license: llama3
|
10 |
+
---
|
11 |
+
|
12 |
+
## Description
|
13 |
+
|
14 |
+
This repo contains GGUF format model files for [yentinglin/Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct).
|
15 |
+
|
16 |
+
|
17 |
+
## Provided files
|
18 |
+
| Name | Quant method | Bits | Size |
|
19 |
+
| ---- | ---- | ---- | ---- |
|
20 |
+
| [llama-3-taiwan-70b-instruct-q4_k_s.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q4_k_s.gguf) | Q4_K_S | 4 | 40.3 GB|
|
21 |
+
| [llama-3-taiwan-70b-instruct-q4_k_m.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q4_k_m.gguf) | Q4_K_M | 4 | 42.5 GB|
|
22 |
+
| [llama-3-taiwan-70b-instruct-q5_k_s.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q5_k_s.gguf) | Q5_K_S | 5 | 48.7 GB|
|
23 |
+
| [llama-3-taiwan-70b-instruct-q5_k_m.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q5_k_m.gguf) | Q5_K_M | 5 | 49.9 GB|
|
24 |
+
| [llama-3-taiwan-70b-instruct-q6_k.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q6_k.gguf) | Q6_K | 6 | 59.89 GB|
|
25 |
+
| [llama-3-taiwan-70b-instruct-q8_0.gguf](https://huggingface.co/chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF/blob/main/llama-3-taiwan-70b-instruct-q8_0.gguf) | Q8_0 | 8 | 75 GB|
|
26 |
+
|
27 |
+
## Original model card
|
28 |
+
|
29 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/vlfv5sHbt4hBxb3YwULlU.png" alt="Taiwan LLM Logo" width="600" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
|
30 |
+
|
31 |
+
# 🚀 [Demo Site](https://twllm.com/)
|
32 |
+
|
33 |
+
Try out Llama-3-Taiwan interactively at [twllm.com](https://twllm.com/)
|
34 |
+
|
35 |
+
# ⚔️ [Chatbot Arena](https://arena.twllm.com/)
|
36 |
+
|
37 |
+
Participate in the exciting [Chatbot Arena](https://arena.twllm.com/) and compete against other chatbots!
|
38 |
+
|
39 |
+
🚀 We're excited to introduce Llama-3-Taiwan-70B! Llama-3-Taiwan-70B is a 70B parameter model finetuned on a large corpus of Traditional Mandarin and English data using the Llama-3 architecture. It demonstrates state-of-the-art performance on various Traditional Mandarin NLP benchmarks.
|
40 |
+
|
41 |
+
The model was trained with [NVIDIA NeMo™ Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) using the NVIDIA Taipei-1 built with [NVIDIA DGX H100](https://www.nvidia.com/en-us/data-center/dgx-h100/) systems.
|
42 |
+
|
43 |
+
The compute and data for training Llama-3-Taiwan-70B was generously sponsored by [Chang Gung Memorial Hospital](https://www.cgmh.org.tw/eng), [Chang Chun Group](https://www.ccp.com.tw/ccpweb.nsf/homepage?openagent), [Legalsign.ai](https://legalsign.ai/), [NVIDIA](https://www.nvidia.com/zh-tw/), [Pegatron](https://www.pegatroncorp.com/), [TechOrange](https://buzzorange.com/techorange/), and [Unimicron](https://www.unimicron.com/) (in alphabetical order).
|
44 |
+
|
45 |
+
We would like to acknowledge the [contributions](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1#contributions) of our data provider, team members and advisors in the development of this model, including [shasha77](https://www.youtube.com/@shasha77) for high-quality YouTube scripts and study materials, [Taiwan AI Labs](https://ailabs.tw/) for providing local media content, [Ubitus K.K.](https://ubitus.net/zh/) for offering gaming content, Professor Yun-Nung (Vivian) Chen for her guidance and advisement, Wei-Lin Chen for leading our pretraining data pipeline, Tzu-Han Lin for synthetic data generation, Chang-Sheng Kao for enhancing our synthetic data quality, and Kang-Chieh Chen for cleaning instruction-following data.
|
46 |
+
|
47 |
+
|
48 |
+
# Model Summary
|
49 |
+
|
50 |
+
Llama-3-Taiwan-70B is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue. Key features include:
|
51 |
+
|
52 |
+
- 70B parameters
|
53 |
+
- Languages: Traditional Mandarin (zh-tw), English (en)
|
54 |
+
- Finetuned on High-quality Traditional Mandarin and English corpus covering general knowledge as well as industrial knowledge in legal, manufacturing, medical, and electronics domains
|
55 |
+
- 8K context length
|
56 |
+
- Open model released under the Llama-3 license
|
57 |
+
|
58 |
+
# Training Details
|
59 |
+
|
60 |
+
- Training Framework: [NVIDIA NeMo](https://www.nvidia.com/zh-tw/ai-data-science/products/nemo/), [NVIDIA NeMo Megatron](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/megatron.html)
|
61 |
+
- Inference Framework: [NVIDIA TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)
|
62 |
+
- Base model: [Llama-3 70B](https://llama.meta.com/llama3/)
|
63 |
+
- Hardware: [NVIDIA DGX H100](https://www.nvidia.com/zh-tw/data-center/dgx-h100/) on Taipei-1
|
64 |
+
- Context length: 8K tokens ([128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-128k))
|
65 |
+
- Batch size: 2M tokens per step
|
66 |
+
|
67 |
+
# Evaluation
|
68 |
+
|
69 |
+
Checkout [Open TW LLM Leaderboard](https://huggingface.co/spaces/yentinglin/open-tw-llm-leaderboard) for full and updated list.
|
70 |
+
|
71 |
+
| Model | [TMLU](https://arxiv.org/pdf/2403.20180) | Taiwan Truthful QA | [Legal Eval](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1) | [TW MT-Bench](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2) | Long context | Function Calling | [TMMLU+](https://github.com/iKala/ievals) |
|
72 |
+
|---------------------------------------------------------------------------------|--------------|---------------|--------------------|--------------|--------------|-----------------|-----------|
|
73 |
+
| | 學科知識 | 台灣在地化測試 | 台灣法律考題 | 中文多輪對答 | 長文本支援 | 函數呼叫 | |
|
74 |
+
| [**yentinglin/Llama-3-Taiwan-70B-Instruct**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct) | **74.76%** | 80.95% | 68.42% | 7.54 | [128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-128k) | ✅ | 67.53% |
|
75 |
+
| [**yentinglin/Llama-3-Taiwan-70B-Instruct-DPO**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-DPO) | 74.60% | **81.75%** | **70.33%** | - | - | ✅ | - |
|
76 |
+
| [**yentinglin/Llama-3-Taiwan-70B-Instruct-128k**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct) | 73.01% | 80.16% | 63.64% | - | - | ✅ | - |
|
77 |
+
| [**yentinglin/Llama-3-Taiwan-8B-Instruct**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct) | 59.50% | 61.11% | 53.11% | 7.21 | [128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-128k) | ✅ | 52.28% |
|
78 |
+
| [**yentinglin/Llama-3-Taiwan-8B-Instruct-DPO**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-DPO) | 59.88% | 59.52% | 52.63% | - | - | ✅ | - |
|
79 |
+
| [**yentinglin/Llama-3-Taiwan-8B-Instruct-128k**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-128k) | - | - | - | - | - | ✅ | - |
|
80 |
+
| [Claude-3-Opus](https://www.anthropic.com/api) | [73.59% (5-shot)](https://arxiv.org/pdf/2403.20180) | [69.84%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/opus-Taiwan-Truthful-QA) | [60.29%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/opus) | - | 200k | ✅ | - |
|
81 |
+
| [GPT4-o](https://platform.openai.com/docs/api-reference/chat/create) | [65.56% (0-shot), 69.88% (5-shot)](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o-tmlu) | [76.98%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o-Taiwan-Truthful-QA) | [53.59%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o) | - | 128k | ✅ | - |
|
82 |
+
| [GPT4-turbo](https://platform.openai.com/docs/api-reference/chat/create) | [70.42% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | - | 128k | ✅ | 60.34%^ |
|
83 |
+
| [Gemini-Pro](https://ai.google.dev/gemini-api/docs) | [61.40% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | - | 1000k | ✅ | 49.92%^ |
|
84 |
+
| [GPT-3.5-turbo-1106](https://platform.openai.com/docs/api-reference/chat/create) | [49.37% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | 7.1 | 128k | ✅ | 41.76%^ |
|
85 |
+
| [Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat) | **75.69%** | 66.67% | 49.28% | - | 32k | ✅ | 65.81% |
|
86 |
+
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 73.59% | 71.43% | 55.02% | 6.9 | 200k | ✅ | 64.10% |
|
87 |
+
| [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 70.95% | 65.08% | 52.63% | - | 8k | ✅ | 62.75% |
|
88 |
+
| [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 55.57% | 52.38% | 44.98% | - | 64k | ✅ | 52.16% |
|
89 |
+
| [Breexe-8x7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breexe-8x7B-Instruct-v0_1) | - | - | - | 7.2 | 8k | ❓ | 48.92% |
|
90 |
+
| [c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | 62.87% | 64.29% | 34.45% | - | 128k | ✅ | 49.75% |
|
91 |
+
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 55.81% | 46.83% | 35.89% | - | 8k | ✅ | 43.38% |
|
92 |
+
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) | 55.57% | 52.38% | 39.23% | 6.0 | 32k | ❓ | 41.77% |
|
93 |
+
| [Llama3-TAIDE-LX-8B-Chat-Alpha1](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1) | 47.30% | 50.79% | 37.80% | - | 8k | ❓ | 39.03% |
|
94 |
+
| [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | 40.97% | 37.30% | 27.27% | - | 4k | ❓ | 33.02% |
|
95 |
+
|
96 |
+
Numbers are 0-shot by default.
|
97 |
+
|
98 |
+
[Eval implementation](https://github.com/adamlin120/lm-evaluation-harness)
|
99 |
+
|
100 |
+
^ taken the closet matching numbers from original dataset.
|
101 |
+
|
102 |
+
## Needle in a Haystack Evaluation
|
103 |
+
|
104 |
+
The "Needle in a 出師表" evaluation tests the model's ability to locate and recall important information embedded within a large body of text, using the classic Chinese text 《出師表》 by 諸葛亮.
|
105 |
+
|
106 |
+
To run the evaluation, use the [script](https://github.com/adamlin120/needle-haystack/tree/main).
|
107 |
+
|
108 |
+
|
109 |
+
# TW MT-Bench Score
|
110 |
+
|
111 |
+
- Average Score: 7.5375
|
112 |
+
- Maximum Score: 10
|
113 |
+
- Minimum Score: 1
|
114 |
+
- Median Score: 9.0
|
115 |
+
- Standard Deviation: 3.0349783771882133
|
116 |
+
- Total Number of Scores: 160
|
117 |
+
- [Model resopnse](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1/blob/main/Llama-3-Taiwan.jsonl)
|
118 |
+
- [GPT-4 Eval](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1/blob/main/gpt-4_single.jsonl)
|
119 |
+
- [Code fork from `mtkresearch/TCEval` with bug fixing](https://github.com/adamlin120/TCEval)
|
120 |
+
|
121 |
+
|
122 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/hfstnJElgB39Um4Bco11D.png)
|
123 |
+
|
124 |
+
|
125 |
+
# Use Cases
|
126 |
+
|
127 |
+
Llama-3-Taiwan-70B can be applied to a wide variety of NLP tasks in Traditional Chinese and English, including:
|
128 |
+
|
129 |
+
<details>
|
130 |
+
<summary>1. 多輪對話</summary>
|
131 |
+
|
132 |
+
<pre>
|
133 |
+
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.
|
134 |
+
|
135 |
+
User: 嗨,你好!
|
136 |
+
|
137 |
+
Assistant: 你好!我今天可以如何協助你?
|
138 |
+
|
139 |
+
User: 就想聊天而已....
|
140 |
+
|
141 |
+
Assistant: 當然,我很樂意和你聊天!有什麼特別想談論的嗎?
|
142 |
+
</pre>
|
143 |
+
|
144 |
+
</details>
|
145 |
+
|
146 |
+
<details>
|
147 |
+
<summary>2. RAG(檢索增強生成)</summary>
|
148 |
+
|
149 |
+
Demo: 可以打開 `Search Web` on [twllm.com](https://twllm.com/)
|
150 |
+
|
151 |
+
|
152 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/Zcl20NsPmY2vObIaMtT8G.png)
|
153 |
+
|
154 |
+
|
155 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/xuXEHIkMgDEB7xxDd3JoL.png)
|
156 |
+
|
157 |
+
<pre>
|
158 |
+
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.
|
159 |
+
|
160 |
+
User: 明天台大天氣如何?
|
161 |
+
|
162 |
+
Assistant: 根據天氣預報,明天台大天氣為陣雨。低溫 24°C。10 到 15 公里/小時 的 東 風 。降雨機率為 40%。
|
163 |
+
</pre>
|
164 |
+
|
165 |
+
</details>
|
166 |
+
|
167 |
+
<details>
|
168 |
+
<summary>3. 格式化輸出, 語言理解, 實體識別, 函數調用</summary>
|
169 |
+
|
170 |
+
If you are interested in function-calling, I strongly recommend [using constrained decoding](https://huggingface.co/docs/text-generation-inference/basic_tutorials/using_guidance) to turn on `json mode`.
|
171 |
+
|
172 |
+
Example from HW7 in `INTRODUCTION TO GENERATIVE AI 2024 SPRING` from `HUNG-YI LEE (李宏毅)`
|
173 |
+
|
174 |
+
|
175 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/dCGo92llL3_ammH_Nu2-c.png)
|
176 |
+
|
177 |
+
<pre>
|
178 |
+
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.
|
179 |
+
|
180 |
+
User: 您是電影評論分析師,負責情緒分析。
|
181 |
+
分析每個被評論的句子,評估句子中每個字和標點符號的重要性,
|
182 |
+
並且將結果使用 tuple 的結構列出。
|
183 |
+
tuple 值的範圍介於 -1 (負向) 到 1 (正向) 之間,
|
184 |
+
然後,將評論分類為正向 (1) 或負向 (-1)。
|
185 |
+
這些需要被評論的句子,包含在 <review> 標籤中間。
|
186 |
+
|
187 |
+
輸出範例:
|
188 |
+
[(<單字或標點符號>, <float importance>), ...]
|
189 |
+
<int classification>
|
190 |
+
<review> 根據外媒報導,宏都拉斯在去年3月與台灣斷交,並轉往與中國建交之後,卻遲遲等不到中國的金援,談好要進口宏都拉斯的白蝦也拖延��久,甚至還將白蝦價格殺價至過往與台灣交易的半價,再加上墨西哥同時也中斷了與宏國的白蝦貿易,對當地蝦農可說是雪上加霜。 <review>
|
191 |
+
|
192 |
+
|
193 |
+
Assistant: [('根據', 0.2), ('外媒', 0.1), ('報導', -0.1), (',', 0.05), ('宏都拉斯', -0.3), ('在去年3月與台灣斷交, ', -0.5), ('並轉往與中國建交之後, ', -0.4), ('卻遲遲等不到中國的金援, ', -1),
|
194 |
+
('談好要進口宏都拉斯的白蝦也拖延許久, ', -1),
|
195 |
+
(',甚至還將白蝦價格殺價至過往與台灣交易的半價,再加上墨西哥同時也中斷了與宏國的白蝦貿易,對當地蝦農可說是雪上加霜。 ',-1)]
|
196 |
+
-1
|
197 |
+
|
198 |
+
</pre>
|
199 |
+
</details>
|
200 |
+
|
201 |
+
|
202 |
+
|
203 |
+
|
204 |
+
|
205 |
+
# Get Started
|
206 |
+
|
207 |
+
*Caveat: System message should always be set.*
|
208 |
+
|
209 |
+
## Hugging Face Transformers library
|
210 |
+
You can use Llama-3-Taiwan-70B with the Hugging Face Transformers library:
|
211 |
+
|
212 |
+
|
213 |
+
```python
|
214 |
+
import torch
|
215 |
+
from transformers import pipeline, StoppingCriteria
|
216 |
+
|
217 |
+
# Define a custom stopping criteria class
|
218 |
+
class EosListStoppingCriteria(StoppingCriteria):
|
219 |
+
def __init__(self, eos_sequence=[128256]):
|
220 |
+
self.eos_sequence = eos_sequence
|
221 |
+
|
222 |
+
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
|
223 |
+
last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
|
224 |
+
return self.eos_sequence in last_ids
|
225 |
+
|
226 |
+
# Initialize the model with automatic device mapping
|
227 |
+
llm = pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1", device_map="auto")
|
228 |
+
tokenizer = llm.tokenizer
|
229 |
+
|
230 |
+
# Define a conversation example
|
231 |
+
chat = [
|
232 |
+
{"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
|
233 |
+
{"role": "user", "content": "你好,請問你可以完成什麼任務?"},
|
234 |
+
{"role": "assistant", "content": "你好,我可以幫助您解決各種問題、提供資訊並協助完成多種任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。"},
|
235 |
+
{"role": "user", "content": "太棒了!"}
|
236 |
+
]
|
237 |
+
flatten_chat_for_generation = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
|
238 |
+
"""
|
239 |
+
<|im_start|>user
|
240 |
+
You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.<|im_end|>
|
241 |
+
<|im_start|>user
|
242 |
+
你好,請問你可以完成什麼任務?<|im_end|>
|
243 |
+
<|im_start|>assistant
|
244 |
+
你好,我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。<|im_end|>
|
245 |
+
<|im_start|>user
|
246 |
+
太棒了!<|im_end|>
|
247 |
+
<|im_start|>assistant
|
248 |
+
|
249 |
+
"""
|
250 |
+
|
251 |
+
# Generate a response using the custom stopping criteria
|
252 |
+
output = llm(flatten_chat_for_generation, return_full_text=False, max_new_tokens=128, top_p=0.9, temperature=0.7, stopping_criteria=[EosListStoppingCriteria([tokenizer.eos_token_id])])
|
253 |
+
print(output[0]['generated_text'])
|
254 |
+
"謝謝!很高興能夠為您服務。如果有任何其他需要協助的地方,請隨時與我聯繫。我會盡最大努力為您提供所需的支援。"
|
255 |
+
```
|
256 |
+
|
257 |
+
## vLLM
|
258 |
+
|
259 |
+
Start the server
|
260 |
+
```bash
|
261 |
+
export NUM_GPUS=4
|
262 |
+
export PORT=8000
|
263 |
+
|
264 |
+
docker run \
|
265 |
+
-e HF_TOKEN=$HF_TOKEN \
|
266 |
+
--gpus '"device=0,1,2,3"' \
|
267 |
+
-v ~/.cache/huggingface:/root/.cache/huggingface \
|
268 |
+
-p "${PORT}:8000" \
|
269 |
+
--ipc=host \
|
270 |
+
vllm/vllm-openai:v0.4.0.post1 \
|
271 |
+
--model "yentinglin/Llama-3-Taiwan-70B-Instruct-rc1" \
|
272 |
+
-tp "${NUM_GPUS}"
|
273 |
+
```
|
274 |
+
|
275 |
+
Sample client code, or you can use anything OpenAI-API compatible clients
|
276 |
+
|
277 |
+
```python
|
278 |
+
# pip install "openai>=1.0.0"
|
279 |
+
from openai import OpenAI
|
280 |
+
# Set OpenAI's API key and API base to use vLLM's API server.
|
281 |
+
openai_api_key = "EMPTY"
|
282 |
+
openai_api_base = "http://localhost:8000/v1"
|
283 |
+
|
284 |
+
client = OpenAI(
|
285 |
+
api_key=openai_api_key,
|
286 |
+
base_url=openai_api_base,
|
287 |
+
)
|
288 |
+
|
289 |
+
chat_response = client.chat.completions.create(
|
290 |
+
model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
|
291 |
+
messages=[
|
292 |
+
{"role": "system", "content": "You are a helpful assistant."},
|
293 |
+
{"role": "user", "content": "Tell me a joke."},
|
294 |
+
]
|
295 |
+
)
|
296 |
+
print("Chat response:", chat_response)
|
297 |
+
```
|
298 |
+
|
299 |
+
|
300 |
+
Enjoy exploring the capabilities of Llama-3-Taiwan-70B! We look forward to seeing what you create with this powerful open-source model. If you have any questions or feedback, please let us know.
|
301 |
+
|
302 |
+
# Contributions
|
303 |
+
- [**Professor Yun-Nung (Vivian) Chen**](https://www.csie.ntu.edu.tw/~yvchen/), for her guidance and advisement throughout the project.
|
304 |
+
- [**Wei-Lin Chen**](mailto:[email protected]), for leading our pretraining data pipeline.
|
305 |
+
- [**Tzu-Han Lin**](mailto:[email protected]), for synthetic data generation.
|
306 |
+
- [**Chang-Sheng Kao**](mailto:[email protected]), for enhancing our synthetic data quality.
|
307 |
+
- [**Kang-Chieh Chen**](mailto:[email protected]), for cleaning instruction-following data.
|
308 |
+
- [**Min-Yi Chen**](mailto:[email protected]) and [**Shao-Heng Hsu**](mailto:[email protected]), for collecting chemical engineering data and benchmarks.
|
309 |
+
- Chung-Yao Ma, Jonathan Guo and Kai-Chun Chang, for collecting manufacturing and electrical engineering data and benchmarks, and project progress management
|
310 |
+
|
311 |
+
# Citation
|
312 |
+
```
|
313 |
+
@article{DBLP:journals/corr/abs-2311-17487,
|
314 |
+
author = {Yen{-}Ting Lin and
|
315 |
+
Yun{-}Nung Chen},
|
316 |
+
title = {Taiwan {LLM:} Bridging the Linguistic Divide with a Culturally Aligned
|
317 |
+
Language Model},
|
318 |
+
journal = {CoRR},
|
319 |
+
volume = {abs/2311.17487},
|
320 |
+
year = {2023},
|
321 |
+
url = {https://doi.org/10.48550/arXiv.2311.17487},
|
322 |
+
doi = {10.48550/ARXIV.2311.17487},
|
323 |
+
eprinttype = {arXiv},
|
324 |
+
eprint = {2311.17487},
|
325 |
+
timestamp = {Tue, 05 Dec 2023 14:40:42 +0100},
|
326 |
+
biburl = {https://dblp.org/rec/journals/corr/abs-2311-17487.bib},
|
327 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|
328 |
+
}
|
329 |
+
@article{DBLP:journals/corr/abs-2403-20180,
|
330 |
+
author = {Po{-}Heng Chen and
|
331 |
+
Sijia Cheng and
|
332 |
+
Wei{-}Lin Chen and
|
333 |
+
Yen{-}Ting Lin and
|
334 |
+
Yun{-}Nung Chen},
|
335 |
+
title = {Measuring Taiwanese Mandarin Language Understanding},
|
336 |
+
journal = {CoRR},
|
337 |
+
volume = {abs/2403.20180},
|
338 |
+
year = {2024},
|
339 |
+
url = {https://doi.org/10.48550/arXiv.2403.20180},
|
340 |
+
doi = {10.48550/ARXIV.2403.20180},
|
341 |
+
eprinttype = {arXiv},
|
342 |
+
eprint = {2403.20180},
|
343 |
+
timestamp = {Wed, 10 Apr 2024 17:37:45 +0200},
|
344 |
+
biburl = {https://dblp.org/rec/journals/corr/abs-2403-20180.bib},
|
345 |
+
bibsource = {dblp computer science bibliography, https://dblp.org}
|