BramVanroy 
posted an update Mar 18
🖴 The HPLT monolingual dataset has a new home!

After getting in touch with the HPLT folks, I've transferred the data to their org, which only makes sense. You can find it below.

HPLT/hplt_monolingual_v1_2

Hey @BramVanroy ,

Am I right that no timestamp is included in their released dataset?

The CulturaX dataset, for example, includes this information, which I think is very useful.


Indeed, there is not a lot of metadata. There's also a discrepancy between the number of scores/languages and the number of paragraphs in the text. I've notified the authors about that. CulturaX is an attractive dataset, too!
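
For anyone who wants to look for this kind of mismatch themselves, here is a minimal sketch. It assumes each record is a dict with a `text` string whose paragraphs are separated by newlines, plus per-paragraph `scores` and `langs` lists. Those field names and the newline convention are assumptions made for illustration and may not match the released files; `find_mismatched_records` is a hypothetical helper, not part of any library.

```python
from typing import Dict, Iterable, Iterator, Tuple


def find_mismatched_records(
    records: Iterable[Dict],
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (index, n_paragraphs, n_scores, n_langs) for inconsistent records."""
    for index, record in enumerate(records):
        # Assumption: paragraphs are newline-separated inside "text".
        n_paragraphs = len(record["text"].split("\n"))
        # Assumption: "scores" and "langs" hold one entry per paragraph.
        n_scores = len(record["scores"])
        n_langs = len(record["langs"])
        if not (n_paragraphs == n_scores == n_langs):
            yield index, n_paragraphs, n_scores, n_langs


# Toy records (made up for illustration, not taken from the dataset).
sample = [
    {"text": "First paragraph.\nSecond paragraph.",
     "scores": [0.9, 0.8], "langs": ["nl", "nl"]},
    {"text": "Only one paragraph.",
     "scores": [0.7, 0.6], "langs": ["nl", "nl"]},
]
mismatches = list(find_mismatched_records(sample))
print(mismatches)
```

Because the helper accepts any iterable of dicts, it could in principle be fed records streamed from the Hub, provided the field names are adjusted to whatever the actual schema uses.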

@BramVanroy attempts to access the datasets in HPLT currently result in a "503 Service Temporarily Unavailable" error. Would it be possible for you to upload it to HF or somewhere else?