mrtw committed
Commit e90110f
1 parent: ca5d50b

Update README.md

Files changed (1)
  1. README.md +12 -105
README.md CHANGED
@@ -7,12 +7,12 @@ pipeline_tag: text-generation
7
  ---
8
 
9
  <h1 style='text-align: center '>BLOOM-zh</h1>
10
- <h2 style='text-align: center '><em>Open-access Multilingual Language Model based on BLOOM</em> </h2>
11
  <h3 style='text-align: center '>Model Card</h3>
12
 
13
  Version 1.0 / 20.Feb.2023
14
 
15
- This model is a joint collaboration between CKIP lab at Academia Sinica ([website](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([website](https://www.mtkresearch.com/)), and National Academy for Educational Research ([website](https://www.naer.edu.tw/)).
16
 
17
  ## Table of Contents
18
  1. [Model Details](#model-details)
@@ -26,8 +26,8 @@ This model is a joint collaboration between CKIP lab at Academia Sinica ([websit
26
  9. [Model Card Authors](#model-card-authors)
27
 
28
  ## Model Details
29
- BLOOM-zh is a modification from [BLOOMZ](https://huggingface.co/bigscience/bloomz).
30
- BLOOM-zh is trained extendedly on larger amounts of Traditional Chinese text data while it still maintains its pretrained English ability.
31
 
32
 
33
  ### Basics
@@ -50,7 +50,7 @@ BLOOM-zh is trained extendedly on larger amounts of Traditional Chinese text dat
50
 
51
  **Send Questions to:** [email protected]
52
 
53
- **Cite as:** MediaTek Research, MediaTek Research Open-access Multilingual Language Model based on BLOOM. International, February 2023.
54
 
55
  **Organizations of contributors:**
56
 
@@ -63,117 +63,33 @@ BLOOM-zh is trained extendedly on larger amounts of Traditional Chinese text dat
63
  ### Technical Specifications
64
  *This section provides information for people who work on model development.*
65
 
66
- <details>
67
- <summary>Click to expand</summary><br/>
68
-
69
- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
70
-
71
- * Decoder-only architecture
72
-
73
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
74
-
75
- * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
76
-
77
- * 1,065,314,304 parameters:
78
-
79
- * 385,351,680 embedding parameters
80
-
81
- * 24 layers, 16 attention heads
82
-
83
- * Hidden layers are 1536-dimensional
84
-
85
- * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
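The figures in this removed list are mutually consistent. As a minimal sketch, assuming the usual BLOOM block layout (fused query/key/value projection, a 4x MLP expansion, biases on every linear layer, two layer norms per block, one layer norm after the embeddings, one final layer norm, and an output projection tied to the embeddings), the stated total can be reproduced from the hidden size, layer count, and embedding size. The function names are illustrative, and the layout is an assumption that the card itself does not spell out.

```python
# Values quoted in the specification list above.
HIDDEN = 1536
LAYERS = 24
EMBEDDING_PARAMS = 385_351_680
TOTAL_PARAMS = 1_065_314_304


def layer_params(h: int) -> int:
    """Parameters in one decoder block (assumed BLOOM-style layout)."""
    layer_norms = 2 * (2 * h)                        # two norms, weight + bias each
    attention = (3 * h * h + 3 * h) + (h * h + h)    # fused QKV + output projection
    mlp = (4 * h * h + 4 * h) + (4 * h * h + h)      # h -> 4h and 4h -> h, with biases
    return layer_norms + attention + mlp


def total_params(h: int, layers: int, embedding_params: int) -> int:
    """Embeddings + embedding norm + all blocks + final norm (tied output head)."""
    embedding_norm = 2 * h
    final_norm = 2 * h
    return embedding_params + embedding_norm + layers * layer_params(h) + final_norm


# Number of rows in the embedding matrix implied by the quoted figures.
embedding_rows = EMBEDDING_PARAMS // HIDDEN

print(total_params(HIDDEN, LAYERS, EMBEDDING_PARAMS))
print(embedding_rows)
```

Under these assumptions the computed total equals the quoted 1,065,314,304. Dividing the embedding parameters by the hidden size gives 250,880 rows, slightly more than the 250,680-entry tokenizer vocabulary quoted under Tokenization, which suggests the embedding matrix is padded.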
86
-
87
- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
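For readers unfamiliar with the term, "cross entropy with mean reduction" averages the per-token negative log-likelihood over all target positions. The following is a small plain-Python sketch of that definition, not the training code; the function name `cross_entropy_mean` and the toy inputs are hypothetical.

```python
import math


def cross_entropy_mean(logits, targets):
    """Mean negative log-likelihood of the target class for each row of logits."""
    losses = []
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp: subtract the row maximum first.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        # -log softmax(row)[target]
        losses.append(log_sum_exp - row[target])
    # "Mean reduction": average over all positions.
    return sum(losses) / len(losses)


# Uniform logits over 4 classes: every class has probability 1/4.
loss = cross_entropy_mean([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]], [0, 3])
print(loss)
```

With uniform logits over four classes, each position contributes log 4, so the mean is also log 4.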
88
-
89
- **Compute infrastructure:**
90
-
91
- * Hardware: 2 A6000 48GB GPUs (1 node):
92
-
93
-
94
- * Software:
95
-
96
- * Bigscience Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
97
-
98
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
99
-
100
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
101
-
102
- * PyTorch (pytorch-1.12 w/ CUDA-11.3; see [Github link](https://github.com/pytorch/pytorch))
103
-
104
- * apex ([Github link](https://github.com/NVIDIA/apex))
105
-
106
-
107
- #### **Training**
108
-
109
- Details are provided in the [paper](https://arxiv.org/).
110
-
111
- - Dates: Feb. 2023
112
-
113
- #### **Tokenization**
114
-
115
- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
116
-
117
- - A byte-level Byte Pair Encoding (BPE) algorithm
118
-
119
- - A simple pre-tokenization rule, no normalization
120
-
121
- - A vocabulary size of 250,680
122
-
123
- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
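A byte-level BPE tokenizer starts from the 256 possible byte values and learns merges on top of them, so any string can be encoded without an unknown-token fallback. The sketch below illustrates only that base alphabet for Traditional Chinese text; it does not reproduce the learned merges of the BLOOM tokenizer, and the helper name `to_byte_symbols` is hypothetical.

```python
def to_byte_symbols(text: str) -> list[int]:
    """Return the UTF-8 byte values that a byte-level BPE would start from."""
    return list(text.encode("utf-8"))


sample = "繁體中文"  # "Traditional Chinese"
symbols = to_byte_symbols(sample)

# Each of these characters occupies 3 bytes in UTF-8.
print(len(sample), len(symbols))

# The byte sequence round-trips back to the original string.
print(bytes(symbols).decode("utf-8") == sample)
```

Before any merges are applied, each of these characters costs three base symbols, which is one reason a vocabulary with learned merges for Chinese text matters for sequence length.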
124
-
125
- </details>
126
 
127
 
128
  ### Environmental Impact
129
 
130
- <details>
131
- <summary>Click to expand</summary><br/>
132
-
133
- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#model-details).
134
 
135
 
136
- </details>
137
- <p>&nbsp;</p>
138
-
139
  ## Uses
140
 
141
  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
142
  It provides information for anyone considering using the model or who is affected by the model.*
143
 
144
-
145
- <details>
146
- <summary>Click to expand</summary><br/>
147
-
148
- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#uses).
149
 
150
  </details>
151
  <p>&nbsp;</p>
152
 
153
  ## Training Data
154
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
155
-
156
-
157
- <details>
158
- <summary>Click to expand</summary><br/>
159
 
160
- We trained the 1B1 parameter model on a total of 6 Billion tokens mainly crawled from the internet and provided from National Academy for Educational Research. 75% of the training data is Traditional Chinese, 25% is English.
161
- Details are provided in the [paper](https://arxiv.org/).
162
-
163
- </details>
164
- </details>
165
- <p>&nbsp;</p>
166
 
167
  ## Risks and Limitations
168
  *This section identifies foreseeable harms and misunderstandings.*
169
-
170
- <details>
171
- <summary>Click to expand</summary><br/>
172
 
173
- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#risks-and-limitations).
174
-
175
- </details>
176
- <p>&nbsp;</p>
177
 
178
  ### Factors
179
  *This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
@@ -182,25 +98,16 @@ Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#risks-a
182
 
183
  - The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions
184
 
185
- </details>
186
- <p>&nbsp;</p>
187
 
188
  ## Recommendations
189
 
190
  *This section provides information on warnings and potential mitigations.*
191
 
192
-
193
- <details>
194
- <summary>Click to expand</summary><br/>
195
-
196
- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#recommendations).
197
-
198
- </details>
199
- <p>&nbsp;</p>
200
 
201
 
202
  ## Model Card Authors
203
  *Ordered roughly chronologically and by amount of time spent.*
204
 
205
- Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yin-Hsiang Liao, Chin-Tung Lin, Jezabel Rodriguez Garcia, Federica Freddi, Da-Shan Shiu, Wei-Yun Ma
206
  <!-- # Bloom_eval -->
 
7
  ---
8
 
9
  <h1 style='text-align: center '>BLOOM-zh</h1>
10
+ <h2 style='text-align: center '><em>Traditional Chinese-enhanced BLOOM language model</em> </h2>
11
  <h3 style='text-align: center '>Model Card</h3>
12
 
13
  Version 1.0 / 20.Feb.2023
14
 
15
+ This model is a joint collaboration between CKIP lab at Academia Sinica ([link](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([連結](https://www.mtkresearch.com/), [连结](https://www.mtkresearch.com/zh-hans/), [link](https://www.mtkresearch.com/en/)), and National Academy for Educational Research ([link](https://www.naer.edu.tw/)).
16
 
17
  ## Table of Contents
18
  1. [Model Details](#model-details)
 
26
  9. [Model Card Authors](#model-card-authors)
27
 
28
  ## Model Details
29
+ BLOOM-zh is a language model with enhanced Traditional Chinese capability. It is derived from [BLOOMZ](https://huggingface.co/bigscience/bloomz).
30
+ BLOOM-zh is trained extendedly on large amount of Traditional Chinese text data.
31
 
32
 
33
  ### Basics
 
50
 
51
  **Send Questions to:** [email protected]
52
 
53
+ **Cite as:** MediaTek Research: Traditional Chinese-enhanced BLOOM language model. International, February 2023.
54
 
55
  **Organizations of contributors:**
56
 
 
63
  ### Technical Specifications
64
  *This section provides information for people who work on model development.*
65
 
66
+ For technical specifications, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#model-details).
67
 
68
 
69
  ### Environmental Impact
70
 
71
+ For environmental impact, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#model-details).
72
 
73
 
74
  ## Uses
75
 
76
  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
77
  It provides information for anyone considering using the model or who is affected by the model.*
78
 
79
+ For the uses of the model, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#uses).
80
 
81
  </details>
82
  <p>&nbsp;</p>
83
 
84
  ## Training Data
85
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
86
 
87
+ We trained the 1B1 parameter model on a total of 6 Billion tokens of mostly high quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/).
88
 
89
  ## Risks and Limitations
90
  *This section identifies foreseeable harms and misunderstandings.*
91
 
92
+ For risks and limitations, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#risks-and-limitations).
93
 
94
  ### Factors
95
  *This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
 
98
 
99
  - The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions
100
 
101
 
102
  ## Recommendations
103
 
104
  *This section provides information on warnings and potential mitigations.*
105
 
106
+ For recommendations, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#recommendations).
107
 
108
 
109
  ## Model Card Authors
110
  *Ordered roughly chronologically and by amount of time spent.*
111
 
112
+ Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
113
  <!-- # Bloom_eval -->