TinyLlama/TinyLlama_v1.1_chinese

chaoscodes committed on Jun 3

Commit 1bc49b7 • 1 Parent(s): 73915cd

Update README.md

Files changed (1)

README.md +7 -4

@@ -5,7 +5,10 @@ datasets:
 language:
 - en
 ---
-# TinyLlama-1.1B-v1.1 Chinese
+<div align="center">
+# TinyLlama-1.1B-v1.1
+</div>
 https://github.com/jzhang38/TinyLlama
@@ -32,7 +35,7 @@ In this initial phase, we managed to train our model with only slimpajama to dev
 #### Continual pretraining with specific domain
-We incorporated 3 different kinds of corpus during this pretraining, slimpajama (which is the same as the first phase), Code&Math (starcoder and proof pile), and Chinese (Skypile). This approach allowed us to develop three variant models with specialized capabilities.
+We incorporated 3 different kinds of corpus during this pretraining, slimpajama (which is the same as the first phase), Math&Code (starcoder and proof pile), and Chinese (Skypile). This approach allowed us to develop three variant models with specialized capabilities.
 At the begining ~6B tokens in this stage, we linearly increased the sampling proportion for the domain-specific corpus (excluding Slimpajama, as it remained unchanged compared with stage 1). This warmup sampling increasing strategy was designed to gradually adjust the distribution of the pretraining data, ensuring a more stable training process. After this sampling increasing stage, we continued pretraining the model with stable sampling strategy until reaching ~1.85T tokens.
@@ -45,8 +48,8 @@ Implementing a cooldown phase has become a crucial technique to achieve better m
 Following an extensive and detailed pretraining process. We are now releasing three specialized versions of our model:
 1. **TinyLlama_v1.1**: The standard version, used for general purposes.
-2. **TinyLlama_v1.1_math_code**: Equipped with better ability for math and code.
-3. **TinyLlama_v1.1_chinese**: Good understanding capacity for Chinese.
+2. **TinyLlama_v1.1_Math&Code**: Equipped with better ability for math and code.
+3. **TinyLlama_v1.1_Chinese**: Good understanding capacity for Chinese.
 ## Data
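
The unchanged README paragraph in this diff describes a sampling warmup: over the first ~6B tokens of the continual-pretraining stage, the sampling proportion of the domain-specific corpus is increased linearly and then held fixed. A minimal sketch of such a schedule might look like the following. It is an illustration only, not the authors' code: the function name, the starting proportion of 0.0, and the final proportion of 0.3 are hypothetical, since the README states neither value.

```python
def domain_sampling_proportion(
    tokens_seen: float,
    warmup_tokens: float = 6e9,   # ~6B-token warmup, as stated in the README
    start: float = 0.0,           # hypothetical starting proportion
    target: float = 0.3,          # hypothetical final proportion
) -> float:
    """Return the domain-specific sampling proportion after `tokens_seen` tokens.

    The proportion rises linearly from `start` to `target` during the warmup
    window and stays at `target` afterwards.
    """
    if tokens_seen >= warmup_tokens:
        return target
    # Clamp negative inputs to zero so the ramp never goes below `start`.
    fraction = max(tokens_seen, 0.0) / warmup_tokens
    return start + fraction * (target - start)


if __name__ == "__main__":
    for tokens in (0, 1.5e9, 3e9, 6e9, 1e12):
        print(f"{tokens:.2e} tokens -> {domain_sampling_proportion(tokens):.3f}")
```

A data loader could call this once per batch with the running token count and use the result as the probability of drawing the next sample from the domain-specific corpus.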