laurievb committed on
Commit e5953db
1 Parent(s): 2076a7d

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -50,9 +50,7 @@ Our work aims to broaden NLP coverage by allowing practitioners to identify rele
 
 ## Training data
 
-The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset) or on HuggingFace.
-
-The final dataset contains 121 million lines of data in 201 language classes. Before sampling, the mean number of lines per language is 602,812. The smallest class contains 532 lines of data (South Azerbaijani) and the largest contains 7.5 million lines of data (English). More details at paper
+The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset).
 
 ## Training procedure
 