prose tweaks

- "MosaicPretrainedTransformer (MPT)" -> "Mosaic Pretrained Transformer (MPT)"
- "that differentiate them" -> "that differentiates it"
- "efficient inference + training performance" -> "efficient inference + training"
- rm "(TODO: talk about MPT-30B-instruct finetuned on 8k)"; can fall back to just deleting this parenthetical
README.md CHANGED

@@ -19,9 +19,9 @@ inference: false
 MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code.
 This model was trained by [MosaicML](https://www.mosaicml.com).
 
-MPT-30B is part of the family of
+MPT-30B is part of the family of Mosaic Pretrained Transformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
 
-MPT-30B comes with special features that differentiate
+MPT-30B comes with special features that differentiate it from other LLMs, including an 8k token context window (which can be further extended via finetuning; see [MPT-7B-StoryWriter](https://huggingface.co/mosaicml/mpt-7b-storywriter)), support for context-length extrapolation via [ALiBi](https://arxiv.org/abs/2108.12409), and efficient inference + training via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer).
 The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision.
 
 This model uses the MosaicML LLM codebase, which can be found in the [llm-foundry repository](https://github.com/mosaicml/llm-foundry). It was trained by MosaicML’s NLP team on the [MosaicML platform](https://www.mosaicml.com/training) for LLM pretraining, finetuning, and inference.

@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr
 MPT-30B is:
 * **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
 * **Trained on a large amount of data** (1T tokens like [LLaMA](https://arxiv.org/abs/2302.13971) vs. 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
-* **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409)
+* **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409).
 * **Capable of fast training and inference** (via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer))
 * **Equipped with highly efficient open-source training code** via the [llm-foundry repository](https://github.com/mosaicml/llm-foundry)

@@ -90,7 +90,6 @@ name = 'mosaicml/mpt-30b'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'torch' # change this to use triton
-config.init_device = 'cpu' # For fast initialization directly on GPU if you have enough memory
 
 model = transformers.AutoModelForCausalLM.from_pretrained(
   name,
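For context on the edited paragraph: the README credits [ALiBi](https://arxiv.org/abs/2108.12409) for context-length extrapolation because ALiBi replaces positional embeddings with a per-head linear distance penalty on attention logits, so longer sequences just extend the penalty ramp. A minimal sketch of that bias, assuming the geometric slope sequence from the ALiBi paper (function names are illustrative, not from llm-foundry):

```python
def alibi_slopes(n_heads: int) -> list[float]:
    # Geometric slope sequence from the ALiBi paper: head h gets 2**(-8/n_heads * h).
    # Assumes n_heads is a power of two (the paper interpolates otherwise).
    return [2.0 ** (-8.0 / n_heads * h) for h in range(1, n_heads + 1)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Additive bias on attention logits for one head: query i penalizes
    # key j by slope * (i - j); future keys are causally masked.
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

Because the bias is a function of relative distance only, it is defined for any sequence length, including lengths never seen in training.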
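The single-GPU deployment claim in the unchanged context line (1xA100-80GB in 16-bit, 1xA100-40GB in 8-bit) follows from back-of-the-envelope weight-memory arithmetic. A sketch of that arithmetic (my illustration, not from the model card; runtime activations and KV cache add overhead on top):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    # Weight-only footprint in decimal GB: params * bytes-per-param.
    return n_params * bits_per_param / 8 / 1e9


# 30B parameters:
print(weight_memory_gb(30e9, 16))  # 60.0 GB of weights -> fits 1xA100-80GB
print(weight_memory_gb(30e9, 8))   # 30.0 GB of weights -> fits 1xA100-40GB
```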