Model Card Restructure & edits

#56
by Ezi
Files changed (1)
  1. README.md +167 -209
README.md CHANGED
@@ -59,168 +59,53 @@ pipeline_tag: text-generation
 
 Version 1.0 / 26.May.2022
 
 ## Table of Contents
 1. [Model Details](#model-details)
 2. [Uses](#uses)
- 3. [Training Data](#training-data)
- 4. [Risks and Limitations](#risks-and-limitations)
- 5. [Evaluation](#evaluation)
- 6. [Recommendations](#recommendations)
- 7. [Glossary and Calculations](#glossary-and-calculations)
- 8. [More Information](#more-information)
- 9. [Model Card Authors](#model-card-authors)
 
 ## Model Details
 
- ### Basics
 *This section provides information for anyone who wants to know about the model.*
-
- <details>
- <summary>Click to expand</summary> <br/>
-
- **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
-
- * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
 
- **Model Type:** Transformer-based Language Model
-
- **Version:** 1.0.0
-
- **Languages:** Multiple; see [training data](#training-data)
-
- **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
-
- **Release Date Estimate:** Monday, 11.July.2022
-
- **Send Questions to:** [email protected]
-
- **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
-
- **Funded by:**
-
- * The French government.
-
- * Hugging Face ([website](https://huggingface.co)).
-
- * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
-
- </details>
-
- ### Technical Specifications
- *This section provides information for people who work on model development.*
-
- <details>
- <summary>Click to expand</summary><br/>
-
- Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
-
- * Decoder-only architecture
-
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
-
- * ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
-
- * 559,214,592 parameters:
-
- * 256,901,120 embedding parameters
-
- * 24 layers, 16 attention heads
-
- * Hidden layers are 1024-dimensional
 
- * Sequence length of 2048 tokens (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
-
- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
 
- **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
-
- * Hardware: 384 A100 80GB GPUs (48 nodes):
 
- * Additional 32 A100 80GB GPUs (4 nodes) in reserve
-
- * 8 GPUs per node, using NVLink 4 inter-GPU connects, 4 OmniPath links
-
- * CPU: AMD
-
- * CPU memory: 512GB per node
 
- * GPU memory: 640GB per node
 
- * Inter-node connect: Omni-Path Architecture (OPA)
-
- * NCCL-communications network: a fully dedicated subnet
-
- * Disc IO network: shared network with other types of nodes
-
- * Software:
-
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
-
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
-
- * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
-
- * apex ([Github link](https://github.com/NVIDIA/apex))
 
- #### **Training**
-
- Training logs: [Tensorboard link](https://huggingface.co/bigscience/tr11e-350M-logs)
-
- - Training throughput: About 150 TFLOPs per GPU
-
- - Number of epochs: 1 (*current target*)
-
- - Dates:
-
- - Started 11th March, 2022 11:42am PST
-
- - Ended 5th July, 2022
-
- - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes)
-
- - Server training location: Île-de-France, France
-
- #### **Tokenization**
-
- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
-
- - A byte-level Byte Pair Encoding (BPE) algorithm
-
- - A simple pre-tokenization rule, no normalization
-
- - A vocabulary size of 250,680
-
- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
-
- </details>
-
-
- ### Environmental Impact
-
- <details>
- <summary>Click to expand</summary><br/>
-
- The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
-
- **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
-
- **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
-
-
- </details>
- <p>&nbsp;</p>
-
  ## Uses
 
 *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
 It provides information for anyone considering using the model or who is affected by the model.*
 
-
- <details>
- <summary>Click to expand</summary><br/>
 
 ### Intended Use
 
@@ -307,15 +192,49 @@ Intentionally using the model for harm, violating [human rights](#human-rights),
 
 - People and groups whose original work is included in the LLM
 
- </details>
- <p>&nbsp;</p>
 
 ## Training Data
 *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
-
- <details>
- <summary>Click to expand</summary><br/>
 
 Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
 
@@ -335,9 +254,7 @@ The pie chart shows the distribution of languages in training data.
 ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)
 
- The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
- <details>
- <summary>Click to expand</summary><br/>
 
 | Niger Congo | Percentage | | Indic | Percentage |
 |----------------|------------ |------ |-----------|------------|
@@ -361,11 +278,10 @@ The following table shows the further distribution of Niger-Congo and Indic lang
 | Kinyarwanda | 0.003 |
 | Yoruba | 0.006 |
 | Swahili | 0.02 |
- </details>
 
- The following table shows the distribution of programming languages.
- <details>
- <summary>Click to expand</summary><br/>
 
 | Extension | Language | Number of files |
 |----------------|------------|-----------------|
@@ -395,44 +311,11 @@ The following table shows the distribution of programming languages.
 | php5 | PHP | 166 |
 | php4 | PHP | 29 |
 
- </details>
- </details>
- <p>&nbsp;</p>
-
- ## Risks and Limitations
- *This section identifies foreseeable harms and misunderstandings.*
-
- <details>
- <summary>Click to expand</summary><br/>
-
- Model may:
 
- - Overrepresent some viewpoints and underrepresent others
-
- - Contain stereotypes
-
- - Contain [personal information](#personal-data-and-information)
-
- - Generate:
-
- - Hateful, abusive, or violent language
-
- - Discriminatory or prejudicial language
-
- - Content that may not be appropriate for all settings, including sexual content
-
- - Make errors, including producing incorrect information as if it were factual
-
- - Generate irrelevant or repetitive outputs
- </details>
- <p>&nbsp;</p>
 
 ## Evaluation
 *This section describes the evaluation protocols and provides the results.*
 
- <details>
- <summary>Click to expand</summary><br/>
-
 ### Metrics
 *This section describes the different ways performance is calculated and why.*
 
@@ -469,36 +352,113 @@ As of 25.May.2022, 15:00 PST:
 
 (More evaluation scores forthcoming at the end of model training.)
 
- </details>
- <p>&nbsp;</p>
 
- ## Recommendations
 
- *This section provides information on warnings and potential mitigations.*
 
- <details>
- <summary>Click to expand</summary><br/>
 
- - Indirect users should be made aware when the content they're working with is created by the LLM.
 
- - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
 
- - Models pretrained with the LLM should include an updated Model Card.
 
- - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
 
- </details>
- <p>&nbsp;</p>
 
- ## Glossary and Calculations
 
- *This section defines common terms and how metrics are calculated.*
 
- <details>
- <summary>Click to expand</summary><br/>
 
  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.
 
@@ -516,13 +476,9 @@ As of 25.May.2022, 15:00 PST:
 
  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.
 
- </details>
- <p>&nbsp;</p>
 
 ## More Information
 
- <details>
- <summary>Click to expand</summary><br/>
 
 ### Dataset Creation
 
@@ -548,11 +504,13 @@ Details on the obstacles overcome during the preparation on the engineering side
 
 Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
 
- </details>
- <p>&nbsp;</p>
 
 ## Model Card Authors
 *Ordered roughly chronologically and by amount of time spent.*
 
 Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
 
 
 Version 1.0 / 26.May.2022
 
+ # Model Card for Bloom-560m
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
 ## Table of Contents
 1. [Model Details](#model-details)
 2. [Uses](#uses)
+ 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
+ 4. [Recommendations](#recommendations)
+ 5. [Training Data](#training-data)
+ 6. [Evaluation](#evaluation)
+ 7. [Environmental Impact](#environmental-impact)
+ 8. [Technical Specifications](#technical-specifications)
+ 9. [Citation](#citation)
+ 10. [Glossary and Calculations](#glossary-and-calculations)
+ 11. [More Information](#more-information)
+ 12. [Model Card Authors](#model-card-authors)
+ 13. [Model Card Contact](#model-card-contact)
 
 ## Model Details
 
+ ### Model Description
 *This section provides information for anyone who wants to know about the model.*
 
+ - **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
 
+ * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
 
+ - **Model Type:** Transformer-based Language Model
+ - **Version:** 1.0.0
+ - **Languages:** Multiple; see [training data](#training-data)
+ - **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
+ - **Release Date Estimate:** Monday, 11.July.2022
+ - **Funded by:**
 
+ * The French government.
 
+ * Hugging Face ([website](https://huggingface.co)).
 
+ * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
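
For quick reference, a minimal usage sketch is shown below. It is not part of the card text: it assumes the checkpoint id `bigscience/bloom-560m` (matching the card title above) and the `transformers` text-generation pipeline; adjust the identifier if the repository is named differently.

```python
# Minimal text-generation sketch (assumes the checkpoint id below and a recent
# version of the `transformers` library; illustrative, not from the card itself).
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

output = generator(
    "BLOOM is a multilingual language model that",
    max_new_tokens=40,   # length of the continuation
    do_sample=True,      # sample instead of greedy decoding
    top_p=0.9,
)
print(output[0]["generated_text"])
```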
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  ## Uses
 
 *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
 It provides information for anyone considering using the model or who is affected by the model.*
 
 ### Intended Use
 
  - People and groups whose original work is included in the LLM
 
+
+
+ ## Bias, Risks, and Limitations
+ *This section identifies foreseeable harms and misunderstandings.*
+
+ Model may:
+
+ - Overrepresent some viewpoints and underrepresent others
+
+ - Contain stereotypes
+
+ - Contain [personal information](#personal-data-and-information)
+
+ - Generate:
+
+ - Hateful, abusive, or violent language
+
+ - Discriminatory or prejudicial language
+
+ - Content that may not be appropriate for all settings, including sexual content
+
+ - Make errors, including producing incorrect information as if it were factual
+
+ - Generate irrelevant or repetitive outputs
+
+ ### Recommendations
+
+ *This section provides information on warnings and potential mitigations.*
+
+ - Indirect users should be made aware when the content they're working with is created by the LLM.
+
+ - Users should be aware of [Risks and Limitations](#bias-risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
+
+ - Models pretrained with the LLM should include an updated Model Card.
+
+ - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
+
 
  ## Training Data
 *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
 Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
 
 ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)
 
+ **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.**
 
 | Niger Congo | Percentage | | Indic | Percentage |
 |----------------|------------ |------ |-----------|------------|
 
 | Kinyarwanda | 0.003 |
 | Yoruba | 0.006 |
 | Swahili | 0.02 |
 
+
+ **The following table shows the distribution of programming languages.**
+
 
 | Extension | Language | Number of files |
 |----------------|------------|-----------------|
 
 | php5 | PHP | 166 |
 | php4 | PHP | 29 |
 
 ## Evaluation
 *This section describes the evaluation protocols and provides the results.*
 
 ### Metrics
 *This section describes the different ways performance is calculated and why.*
 
 (More evaluation scores forthcoming at the end of model training.)
 
+ ## Environmental Impact
+
+ The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
+
+ **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
+
+ **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
 
+ ## Technical Specifications
+ *This section provides information for people who work on model development.*
+
+ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
+
+ **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
+
+ * Decoder-only architecture
+
+ * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
+
+ * ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
+
+ * 559,214,592 parameters:
+
+ * 256,901,120 embedding parameters
+
+ * 24 layers, 16 attention heads
+
+ * Hidden layers are 1024-dimensional
+
+ * Sequence length of 2048 tokens (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
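
The figures above can be cross-checked with a rough parameter accounting. The sketch below reproduces the stated total under a set of assumptions that are not taken from the card: an embedding matrix of 250,880 rows (the 250,680-entry tokenizer vocabulary padded upward), tied input/output embeddings, GPT-style blocks with biased linear layers and two LayerNorms each, plus LayerNorms on the embedding output and after the final block.

```python
# Rough parameter accounting for the configuration listed above.
# Assumptions (not stated in the card): padded vocab of 250,880, tied embeddings,
# biases on all linear layers, LayerNorm after the embedding and after the last block.
hidden, layers, vocab_padded = 1024, 24, 250_880

embedding = vocab_padded * hidden                 # 256,901,120 (matches the card)
per_layer = (
    4 * hidden * hidden + 4 * hidden              # Q, K, V, output projections + biases
    + 8 * hidden * hidden + 5 * hidden            # 4x-wide MLP: up/down projections + biases
    + 2 * 2 * hidden                              # two LayerNorms (weight + bias)
)
extra_layernorms = 2 * 2 * hidden                 # embedding LayerNorm + final LayerNorm

total = embedding + layers * per_layer + extra_layernorms
print(f"{total:,}")                               # 559,214,592, as listed above
```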
+
+ **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
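
In other words, training minimizes standard next-token cross entropy averaged over all predicted tokens. A minimal sketch with `torch.nn.CrossEntropyLoss` and its default mean reduction (the toy shapes and shift-by-one framing are illustrative, not lifted from the training code):

```python
import torch

# Toy shapes: batch of 2 sequences, 16 tokens each, toy vocabulary of 100.
vocab_size, batch, seq_len = 100, 2, 16
logits = torch.randn(batch, seq_len, vocab_size)       # model outputs
tokens = torch.randint(vocab_size, (batch, seq_len))   # input token ids

# Next-token prediction: the logits at position t are scored against token t+1.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = tokens[:, 1:].reshape(-1)

loss_fn = torch.nn.CrossEntropyLoss(reduction="mean")  # mean over predicted tokens
loss = loss_fn(shift_logits, shift_labels)
print(loss.item())
```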
+
+ **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
+
+ * Hardware: 384 A100 80GB GPUs (48 nodes):
+
+ * Additional 32 A100 80GB GPUs (4 nodes) in reserve
+
+ * 8 GPUs per node, using NVLink 4 inter-GPU connects, 4 OmniPath links
+
+ * CPU: AMD
+
+ * CPU memory: 512GB per node
+
+ * GPU memory: 640GB per node
+
+ * Inter-node connect: Omni-Path Architecture (OPA)
+
+ * NCCL-communications network: a fully dedicated subnet
+
+ * Disc IO network: shared network with other types of nodes
+
+ * Software:
+
+ * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
+
+ * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
+
+ * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
+
+ * apex ([Github link](https://github.com/NVIDIA/apex))
+
+
+ ### **Training**
+
+ Training logs: [Tensorboard link](https://huggingface.co/bigscience/tr11e-350M-logs)
+
+ - Training throughput: About 150 TFLOPs per GPU
+
+ - Number of epochs: 1 (*current target*)
+
+ - Dates:
+
+ - Started 11th March, 2022 11:42am PST
+
+ - Ended 5th July, 2022
+
+ - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes)
+
+ - Server training location: Île-de-France, France
+
+ ### **Tokenization**
+
+ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
+
+ - A byte-level Byte Pair Encoding (BPE) algorithm
+
+ - A simple pre-tokenization rule, no normalization
+
+ - A vocabulary size of 250,680
+
+ It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
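
A short sketch of loading and inspecting the tokenizer; it assumes the `transformers` `AutoTokenizer` API and the `bigscience/tokenizer` repository linked above:

```python
# Load the published BLOOM tokenizer and inspect the properties described above.
# Assumes the `transformers` library; the repository id is the one linked in this section.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")

print(tokenizer.vocab_size)                  # expected: 250,680, per the card
ids = tokenizer("BLOOM est un modèle de langue multilingue.")["input_ids"]
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))  # byte-level BPE subword pieces
```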
+
+ ## Citation
+
+ **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
+
+
+ ## Glossary and Calculations
+
+ *This section defines common terms and how metrics are calculated.*
 
  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.
 
  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.
 
 ## More Information
 
  ### Dataset Creation
 
  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
 
+
 
  ## Model Card Authors
 *Ordered roughly chronologically and by amount of time spent.*
 
  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
 
+ ## Model Card Contact
+
+ **Send Questions to:** [email protected]