Language filtering error

#1
by alielfilali01 - opened

Hey @lhoestq, interesting Space, if it is what I think it is! I know it's still early, but I would like to know whether there is a specific format in which languages should be passed to the Language filtering step. I want to filter on Arabic, and the fastText code for Arabic is "ar", but it gives back an error.
Looking forward to the final version of this, with the whole datatrove script as well 🤗

Owner • edited Oct 10

Hi! Passing "ar" works for me, though I might improve the UI to show the possible language codes.
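For illustration only (this is not the Space's or datatrove's actual code), here is a minimal sketch of why a bare ISO code like "ar" is the expected input: fastText's language-ID models return labels such as `__label__ar`, so a filter can strip that prefix and compare against the requested codes. The helper names and the 0.65 threshold are hypothetical, and the predictions are hard-coded instead of coming from a real model.

```python
FASTTEXT_PREFIX = "__label__"


def to_code(label: str) -> str:
    """Strip fastText's '__label__' prefix, e.g. '__label__ar' -> 'ar'."""
    return label[len(FASTTEXT_PREFIX):] if label.startswith(FASTTEXT_PREFIX) else label


def filter_by_language(docs, predictions, languages, threshold=0.65):
    """Keep docs whose predicted language is requested and scores >= threshold.

    Hypothetical helper: `predictions` holds one (label, score) pair per doc,
    in a fastText-style format such as ('__label__ar', 0.98).
    """
    # Normalize user input so "AR" or " ar " also match "ar".
    wanted = {lang.strip().lower() for lang in languages}
    kept = []
    for doc, (label, score) in zip(docs, predictions):
        if to_code(label) in wanted and score >= threshold:
            kept.append(doc)
    return kept


docs = ["مرحبا بالعالم", "Hello world", "Bonjour le monde"]
preds = [("__label__ar", 0.98), ("__label__en", 0.95), ("__label__fr", 0.91)]
print(filter_by_language(docs, preds, ["ar"]))  # keeps only the Arabic sample
```

If the preview sample contains no Arabic text, a filter like this returns an empty list even though "ar" is a valid code, which matches the situation described next.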

Also, this app shows the pipeline results on a preview of the data that doesn't seem to contain any Arabic text (it's only 2k samples), but maybe I can improve that as well.

Owner

It works! You can filter on Arabic and see the results now :)

Let me know if you'd like to see other improvements, I'm always happy to get feedback.

Thanks a lot @lhoestq 🤗

alielfilali01 changed discussion status to closed
