Update README.md
README.md
CHANGED
@@ -35,7 +35,7 @@ Our training dataset of 127,460 query-page pairs is comprised of train sets of o
 Our training set is fully English by design, enabling us to study zero-shot generalization to non-English languages. We explicitly verify no multi-page PDF document is used both [*ViDoRe*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) and in the train set to prevent evaluation contamination.
 A validation set is created with 2% of the samples to tune hyperparameters.
 
-*Note: Multilingual data is present in the pretraining corpus of the language model
+*Note: Multilingual data is present in the pretraining corpus of the language model and most probably in the multimodal training.*
 
 ### Parameters
 