Update README.md

Committed by JingweiZuo · commit 4ac2ff9 · 1 parent: 631a9d5
README.md CHANGED
@@ -179,7 +179,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Data
 
-Falcon-Mamba has been trained with ~
+Falcon-Mamba has been trained with ~5,500 GT, mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large-volume web-only dataset that has been filtered and deduplicated.
 Similar to the other [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite models, Falcon-Mamba has been trained leveraging a multi-stage training strategy to increase the context length from 2,048 to 8,192.
 Moreover, inspired by the concept of Curriculum Learning, we carefully selected data mixtures throughout the training stages, considering both data diversity and complexity.
 Note that at inference the context length is not relevant, as the Mamba architecture has no limit on long-range dependencies.
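
For readers who want a quick look at the RefinedWeb corpus referenced by the updated line, the sketch below streams a few records from the Hugging Face Hub instead of downloading the full dataset. It is only a minimal example; the `content` column name is taken from the dataset card and is an assumption here.

```python
from datasets import load_dataset

# Stream a handful of RefinedWeb records without downloading the full dataset.
# Assumes the raw web text lives in a "content" column, per the dataset card.
ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, record in enumerate(ds):
    print(record["content"][:200])  # preview the first 200 characters of each document
    if i == 2:
        break
```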