Training languages in the model card

#9
by fyvo - opened
Files changed (6)
  1. .gitattributes +0 -1
  2. README.md +218 -185
  3. config.json +3 -5
  4. flax_model.msgpack +0 -3
  5. model.safetensors +0 -3
  6. tokenizer_config.json +1 -1
.gitattributes CHANGED
@@ -26,4 +26,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zstandard filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
- model.safetensors filter=lfs diff=lfs merge=lfs -text
 
README.md CHANGED
@@ -52,60 +52,176 @@ language:
  pipeline_tag: text-generation
  ---

- <h1 style='text-align: center '>BLOOM LM</h1>
- <h2 style='text-align: center '><em>BigScience Large Open-science Open-access Multilingual Language Model</em> </h2>
- <h3 style='text-align: center '>Model Card</h3>
- <img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


  Version 1.0 / 26.May.2022

-
- # Model Card for Bloom-1b7
-
- <!-- Provide a quick summary of what the model is/does. -->
-
  ## Table of Contents
  1. [Model Details](#model-details)
  2. [Uses](#uses)
- 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- 4. [Recommendations](#recommendations)
- 5. [Training Data](#training-data)
- 6. [Evaluation](#evaluation)
- 7. [Environmental Impact](#environmental-impact)
- 8. [Technical Specifications](#techincal-specifications)
- 9. [Citation](#citation)
- 10. [Glossary and Calculations](#glossary-and-calculations)
- 11. [More Information](#more-information)
- 12. [Model Card Authors](#model-card-authors)
- 13. [Model Card Contact](#model-card-contact)
-
- ## Model Details
-
- ### Model Description
  *This section provides information for anyone who wants to know about the model.*

- - **Developed by:** BigScience ([website](https://bigscience.huggingface.co))

- * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*

- - **Model Type:** Transformer-based Language Model
- - **Version:** 1.0.0
- - **Languages:** Multiple; see [training data](#training-data)
- - **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
- - **Release Date Estimate:** Monday, 11.July.2022
- - **Funded by:**

- * The French government.

- * Hugging Face ([website](https://huggingface.co)).

- * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*

  ## Uses

  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
  It provides information for anyone considering using the model or who is affected by the model.*

  ### Intended Use
 
@@ -191,54 +307,16 @@ Intentionally using the model for harm, violating [human rights](#human-rights),
  - People and groups exposed to outputs of, or decisions based on, the LLM

  - People and groups whose original work is included in the LLM
-
-
-
- ## Bias, Risks, and Limitations
- *This section identifies foreseeable harms and misunderstandings.*

- Model may:
-
- - Overrepresent some viewpoints and underrepresent others
-
- - Contain stereotypes
-
- - Contain [personal information](#personal-data-and-information)
-
- - Generate:
-
- - Hateful, abusive, or violent language
-
- - Discriminatory or prejudicial language
-
- - Content that may not be appropriate for all settings, including sexual content
-
- - Make errors, including producing incorrect information as if it were factual
-
- - Generate irrelevant or repetitive outputs
-
-
- ### Recommendations
-
-
- *This section provides information on warnings and potential mitigations.*
-
- - Indirect users should be made aware when the content they're working with is created by the LLM.
-
- - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
-
- - Models pretrained with the LLM should include an updated Model Card.
-
- - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
-
-
-

  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*


-

  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
@@ -258,8 +336,9 @@ The pie chart shows the distribution of languages in training data.
  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)


- **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.**
-

  | Niger Congo | Percentage | | Indic | Percentage |
  |----------------|------------ |------ |-----------|------------|
@@ -285,8 +364,9 @@ The pie chart shows the distribution of languages in training data.
  | Swahili | 0.02 |
  </details>

- **The following table shows the distribution of programming languages.**
-

  | Extension | Language | Number of files |
  |----------------|------------|-----------------|
@@ -316,10 +396,43 @@ The pie chart shows the distribution of languages in training data.
  | php5 | PHP | 166 |
  | php4 | PHP | 29 |


  ## Evaluation
  *This section describes the evaluation protocols and provides the results.*


  ### Metrics
  *This section describes the different ways performance is calculated and why.*
@@ -357,119 +470,36 @@ As of 25.May.2022, 15:00 PST:

  (More evaluation scores forthcoming at the end of model training.)

- - [BLOOM Book](https://huggingface.co/spaces/bigscience/bloom-book): Read generations from BLOOM based on prompts provided by the community
-
-
-
- ## Environmental Impact
-
- The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
-
- **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
-
- **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
-
-
-
- ## Technical Specifications
- *This section provides information for people who work on model development.*
-
-
- Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
-
- * Decoder-only architecture
-
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
-
- * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
-
- * 1,722,408,960 parameters:
-
- * 513,802,240 embedding parameters
-
- * 24 layers, 16 attention heads
-
- * Hidden layers are 2048-dimensional
-
- * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
-
- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
-
- **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
-
- * Hardware: 64 V100 16/32GB GPUs (16 nodes):
-
- * 4 GPUs per node
-
- * 40 CPUs per task
-
- * 1 task per node
-
- * CPU: AMD
-
- * CPU memory: 160GB per node
-
- * GPU memory: 64GB or 128GB (depending on node availability during training) per node
-
- * Inter-node connect: Omni-Path Architecture (OPA)
-
- * NCCL-communications network: a fully dedicated subnet
-
- * Disc IO network: shared network with other types of nodes
-
- * Software:
-
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
-
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
-
- * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
-
- * apex ([Github link](https://github.com/NVIDIA/apex))
-
- ### **Training**
-
- - Checkpoint size:
-
- - Fp16 weights: 2.6GB (# params * 2)
-
- - Full checkpoint with optimizer states: --

- - Training throughput: --

- - Number of epochs: 1

- - Dates:
-
- - Start: 11th March, 2022 11:42am PST

- - End: 20 May, 2022

- - Server training location: Île-de-France, France

- ### **Tokenization**
-
- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
-
- - A byte-level Byte Pair Encoding (BPE) algorithm

- - A simple pre-tokenization rule, no normalization

- - A vocabulary size of 250,680

- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
-

- ## Citation

- **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022

- ## Glossary and Calculations

- *This section defines common terms and how metrics are calculated.*

  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.
@@ -487,9 +517,13 @@ It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.

  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.


  ## More Information


  ### Dataset Creation
@@ -514,12 +548,11 @@ Details on the obstacles overcome during the preparation on the engineering side
  ### Initial Results

  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book

  ## Model Card Authors
  *Ordered roughly chronologically and by amount of time spent.*

- Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
-
- ## Model Card Contact
-
- **Send Questions to:** [email protected]
 
  pipeline_tag: text-generation
  ---

+ # <p>BLOOM LM<br/> _BigScience Large Open-science Open-access Multilingual Language Model_ <br/>Model Card</p>
+ <img src="https://assets.website-files.com/6139f3cdcbbff3a68486761d/613cd8997b270da063e230c5_Tekengebied%201-p-500.png" alt="BigScience Logo" width="200"/>


  Version 1.0 / 26.May.2022

  ## Table of Contents
  1. [Model Details](#model-details)
  2. [Uses](#uses)
+ 3. [Training Data](#training-data)
+ 4. [Risks and Limitations](#risks-and-limitations)
+ 5. [Evaluation](#evaluation)
+ 6. [Recommendations](#recommendations)
+ 7. [Glossary and Calculations](#glossary-and-calculations)
+ 8. [More Information](#more-information)
+ 9. [Model Card Authors](#model-card-authors)
+
+ ## Model Details
+
+ ### Basics
  *This section provides information for anyone who wants to know about the model.*
+
+ <details>
+ <summary>Click to expand</summary> <br/>
+
+ **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
+
+ * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
+
+ **Model Type:** Transformer-based Language Model
+
+ **Version:** 1.0.0
+
+ **Languages:** Multiple; see [training data](#training-data)
+
+ **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
+
+ **Release Date Estimate:** Monday, 11.July.2022
+
+ **Send Questions to:** [email protected]
+
+ **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
+
+ **Funded by:**
+
+ * The French government.
+
+ * Hugging Face ([website](https://huggingface.co)).
+
+ * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
+
+ </details>
+
+ ### Technical Specifications
+ *This section provides information for people who work on model development.*
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
+
+ **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
+
+ * Decoder-only architecture
+
+ * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
+
+ * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
+
+ * 1.3 billion parameters:
+
+ * 24 layers, 16 attention heads
+
+ * Hidden layers are 2048-dimensional
+
+ * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
+
+ **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
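
A concrete reading of the objective named just above; this is an illustrative sketch with toy tensors, not the BLOOM training loop:

```python
# Illustrative sketch only: next-token cross entropy with mean reduction for a causal LM.
# Shapes are toy values; the padded vocabulary size 250,880 comes from config.json below.
import torch
import torch.nn.functional as F

batch, seq_len, vocab_size = 2, 8, 250_880
logits = torch.randn(batch, seq_len, vocab_size)          # stand-in for model outputs
labels = torch.randint(0, vocab_size, (batch, seq_len))   # stand-in for input ids

# Position t predicts token t+1, so logits and labels are shifted before the loss,
# and the per-token losses are averaged ("mean reduction").
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    reduction="mean",
)
print(loss.item())
```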
+
+ **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
+
+ * Hardware: 64 V100 16/32GB GPUs (16 nodes):
+
+ * 4 GPUs per node
+
+ * 40 CPUs per task
+
+ * 1 task per node
+
+ * CPU: AMD
+
+ * CPU memory: 160GB per node
+
+ * GPU memory: 64GB or 128GB (depending on node availability during training) per node
+
+ * Inter-node connect: Omni-Path Architecture (OPA)
+
+ * NCCL-communications network: a fully dedicated subnet
+
+ * Disc IO network: shared network with other types of nodes
+
+ * Software:
+
+ * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
+
+ * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
+
+ * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
+
+ * apex ([Github link](https://github.com/NVIDIA/apex))
+
+
+ #### **Training**
+
+ - Checkpoint size:
+
+ - Fp16 weights: 2.6GB (# params * 2)
+
+ - Full checkpoint with optimizer states: --
+
+ - Training throughput: --
+
+ - Number of epochs: 1
+
+ - Dates:
+
+ - Start: 11th March, 2022 11:42am PST
+
+ - End: 20 May, 2022
+
+ - Server training location: Île-de-France, France
+
+ #### **Tokenization**
+
+ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
+
+ - A byte-level Byte Pair Encoding (BPE) algorithm
+
+ - A simple pre-tokenization rule, no normalization
+
+ - A vocabulary size of 250,680
+
+ It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.

+ </details>
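
The tokenizer described above can be inspected directly; a minimal sketch, assuming the `bigscience/tokenizer` Hub repo linked in the card is reachable:

```python
# Reviewer sketch: load the byte-level BPE tokenizer referenced above and inspect it.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/tokenizer")
print(type(tok).__name__)   # the card's tokenizer_config.json names BloomTokenizerFast
print(tok.vocab_size)       # expected to match the 250,680 entries listed above
enc = tok("BigScience Large Open-science Open-access Multilingual Language Model")
print(enc["input_ids"])
```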
+

+ ### Environmental Impact

+ <details>
+ <summary>Click to expand</summary><br/>
+
+ The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
+
+ **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
+
+ **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
+
+
+ </details>
+ <p>&nbsp;</p>

  ## Uses

  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
  It provides information for anyone considering using the model or who is affected by the model.*
+
+
+ <details>
+ <summary>Click to expand</summary><br/>

  ### Intended Use

  [...]

  - People and groups exposed to outputs of, or decisions based on, the LLM

  - People and groups whose original work is included in the LLM

+ </details>
+ <p>&nbsp;</p>

  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

+ <details>
+ <summary>Click to expand</summary><br/>

  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).

  [...]

  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)

+ The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
+ <details>
+ <summary>Click to expand</summary><br/>

  | Niger Congo | Percentage | | Indic | Percentage |
  |----------------|------------ |------ |-----------|------------|
  [...]
  | Swahili | 0.02 |
  </details>

+ The following table shows the distribution of programming languages.
+ <details>
+ <summary>Click to expand</summary><br/>

  | Extension | Language | Number of files |
  |----------------|------------|-----------------|
  [...]
  | php5 | PHP | 166 |
  | php4 | PHP | 29 |

+ </details>
+ </details>
+ <p>&nbsp;</p>
+
+ ## Risks and Limitations
+ *This section identifies foreseeable harms and misunderstandings.*
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ Model may:
+
+ - Overrepresent some viewpoints and underrepresent others
+
+ - Contain stereotypes
+
+ - Contain [personal information](#personal-data-and-information)
+
+ - Generate:
+
+ - Hateful, abusive, or violent language
+
+ - Discriminatory or prejudicial language
+
+ - Content that may not be appropriate for all settings, including sexual content
+
+ - Make errors, including producing incorrect information as if it were factual
+
+ - Generate irrelevant or repetitive outputs
+ </details>
+ <p>&nbsp;</p>

  ## Evaluation
  *This section describes the evaluation protocols and provides the results.*

+ <details>
+ <summary>Click to expand</summary><br/>

  ### Metrics
  *This section describes the different ways performance is calculated and why.*

  [...]

  (More evaluation scores forthcoming at the end of model training.)

+ </details>
+ <p>&nbsp;</p>

+ ## Recommendations

+ *This section provides information on warnings and potential mitigations.*

+ <details>
+ <summary>Click to expand</summary><br/>

+ - Indirect users should be made aware when the content they're working with is created by the LLM.

+ - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.

+ - Models pretrained with the LLM should include an updated Model Card.

+ - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

+ </details>
+ <p>&nbsp;</p>

+ ## Glossary and Calculations

+ *This section defines common terms and how metrics are calculated.*

+ <details>
+ <summary>Click to expand</summary><br/>

  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.

  [...]

  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.

+ </details>
+ <p>&nbsp;</p>

  ## More Information

+ <details>
+ <summary>Click to expand</summary><br/>

  ### Dataset Creation

  [...]

  ### Initial Results

  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
+
+ </details>
+ <p>&nbsp;</p>

  ## Model Card Authors
  *Ordered roughly chronologically and by amount of time spent.*

+ Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay
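
The card metadata keeps `pipeline_tag: text-generation`; a minimal usage sketch, assuming the checkpoint lives at the `bigscience/bloom-1b7` repo this card describes:

```python
# Illustrative only; the repo id is assumed from the card title and may need adjusting.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-1b7")
out = generator("BigScience is a collaborative effort to", max_new_tokens=30)
print(out[0]["generated_text"])
```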
 
 
 
 
config.json CHANGED
@@ -2,11 +2,9 @@
  "apply_residual_connection_post_layernorm": false,
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
- "architectures": [
-   "BloomForCausalLM"
- ],
  "bias_dropout_fusion": true,
  "bos_token_id": 1,
+ "dtype": "float16",
  "eos_token_id": 2,
  "pad_token_id": 3,
  "unk_token_id": 0,
@@ -24,7 +22,7 @@
  "seq_length": 4096,
  "skip_bias_add": true,
  "skip_bias_add_qkv": false,
- "transformers_version": "4.20.0",
+ "transformers_version": "4.20.0.dev0",
  "use_cache": true,
  "vocab_size": 250880
- }
+ }
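
This change drops the `architectures` list and records `"dtype": "float16"`; a hedged sketch of how a caller can still pin both explicitly at load time (repo id assumed, as above):

```python
# Illustrative only: pick the model class explicitly (since "architectures" is no longer
# in config.json) and request half precision to mirror the new "dtype" entry.
import torch
from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b7", torch_dtype=torch.float16)
print(model.dtype)  # torch.float16
```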
flax_model.msgpack DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f9839c42fcccd957fe9e26e661eaa8ac2280be015a42036874675c9e774e2656
- size 3444829160
 
model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:145ae4b66381746c9438c63b6deb22a34d97000bba633bb6672fff2d1dcaf924
- size 3444848602
 
tokenizer_config.json CHANGED
@@ -1 +1 @@
- {"unk_token": "<unk>", "eos_token": "</s>", "bos_token": "<s>", "pad_token": "<pad>", "name_or_path": "bigscience/tokenizer", "special_tokens_map_file": null, "tokenizer_class": "BloomTokenizerFast", "padding_side":"left"}
+ {"unk_token": "<unk>", "eos_token": "</s>", "bos_token": "<s>", "pad_token": "<pad>", "name_or_path": "bigscience/tokenizer", "special_tokens_map_file": null, "tokenizer_class": "BloomTokenizerFast"}