Update README.md
README.md
CHANGED
@@ -144,7 +144,7 @@ We adapted the original Falcon-7B model to Spanish and Catalan by swapping the t
 
 The training corpus consists 26B tokens of several corpora gathered from web crawlings and public corpora.
 
-| Dataset | Language | Tokens (
+| Dataset | Language | Tokens (per-epoch) | Epochs |
 |---------------------|----------|--------------------|--------------|
 | Wikipedia | en | 2169.97M | 1.428144485 |
 | C4_es | es | 53709.80M | 0.1049686196 |