Muennighoff committed
Commit a7c87ea
1 Parent(s): 65566d0

Update model family table

Files changed (1)
README.md +70 -76
README.md CHANGED
@@ -671,95 +671,89 @@ model-index:
 - **Point of Contact:** [Niklas Muennighoff](mailto:[email protected])
 - **Languages:** Refer to [BLOOM](https://huggingface.co/bigscience/bloom) for pretraining & [xP3](https://huggingface.co/bigscience/xP3) for finetuning language proportions. It understands both pretraining & finetuning languages.
 - **BLOOMZ & mT0 Model Family:**
-
 <table>
 <tr>
-<th colspan="12">Multitask finetuned on [xP3](https://huggingface.co/bigscience/xP3) - Recommended for prompting in English.
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3>xP3</a>. Recommended for prompting in English.</th>
 </tr>
 <tr>
-<th>Parameters</th>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
+<td>Parameters</td>
+<td>300M</td>
+<td>580M</td>
+<td>1.2B</td>
+<td>3.7B</td>
+<td>13B</td>
+<td>560M</td>
+<td>1.1B</td>
+<td>1.7B</td>
+<td>3B</td>
+<td>7.1B</td>
+<td>176B</td>
 </tr>
 <tr>
-<th>Finetuned Model</th>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
+<td>Finetuned Model</td>
+<td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td>
 </tr>
 <tr>
-<th>Original pretrained checkpoint</th>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-<td>560M</td>
-</tr>
-<tr>
-<th colspan="12">Multitask finetuned on xP3mt - Recommended for prompting in non-English.
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English.</th>
 </tr>
+<tr>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td>
+</tr>
+<tr>
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/P3>P3</a>. Released for research purposes only. Strictly inferior to above models!</th>
+</tr>
+<tr>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td>
+</tr>
+<tr>
+<th colspan="12">Original pretrained checkpoints. Not recommended.</th>
+</tr>
+<tr>
+<td>Pretrained Model</td>
+<td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td>
+<td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td>
+<td><a href=https://huggingface.co/google/mt5-large>mt5-large</a></td>
+<td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td>
+<td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td>
+</tr>
 </table>
 
-<table>
-<tr>
-<td>One</td>
-<td>Two</td>
-</tr>
-<tr>
-<td colspan="2">Three</td>
-</tr>
-</table>
-
-
-|Name|Explanation|
-|----|-----------|
-|[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)| 560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)| 1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)| 1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)| 3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz](https://huggingface.co/bigscience/bloomz)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English**|
-|[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English**|
-|||
-|[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)**|
-|[bloomz-p3](https://huggingface.co/bigscience/bloomz)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)**|
-|||
-|||
-|[mt0-small](https://huggingface.co/bigscience/mt0-xxl)|300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-base](https://huggingface.co/bigscience/mt0-xxl)|580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-large](https://huggingface.co/bigscience/mt0-xxl)|1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xl](https://huggingface.co/bigscience/mt0-xxl)|3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English**|
-|||
-|[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)**|
 
 # Use
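
As a minimal sketch of what prompting these checkpoints looks like (assuming the Hugging Face `transformers` library; the exact snippet in the Use section may differ), the BLOOMZ models are decoder-only causal LMs, while the mT0 models are encoder-decoder and would load via `AutoModelForSeq2SeqLM` instead:

```python
# Minimal sketch: prompt a BLOOMZ checkpoint from the table above.
# Any bloomz-* size works; mt0-* models need AutoModelForSeq2SeqLM instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# These models are instruction-finetuned, so phrase the input as a task.
inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

Per the table, the `-mt` variants are the better pick when the prompt itself is in a non-English language.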