I've made public a dataset of scraped Teletype articles.
Here's the overview:
- 3.3 million articles, predominantly in Russian and English
- Includes original HTML, extracted text and metadata
- All articles were run through language identification
- Includes all public articles up until April 2024
its5Q/teletype
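
Since every article was run through language identification, a natural first step is to stream the dataset and keep a single language. Below is a minimal sketch, not a confirmed recipe: it assumes the Hugging Face `datasets` library, a split named `train`, and a column called `language`. None of these are stated in the post, so check the dataset card for the real split and column names. `filter_by_language` and `stream_teletype` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, Iterator


def filter_by_language(
    records: Iterable[Dict[str, Any]],
    language: str,
    language_field: str = "language",  # ASSUMPTION: hypothetical column name
) -> Iterator[Dict[str, Any]]:
    """Yield only the records whose language tag equals `language`."""
    for record in records:
        # Records without the field are skipped rather than raising.
        if record.get(language_field) == language:
            yield record


def stream_teletype(split: str = "train") -> Iterable[Dict[str, Any]]:
    """Stream its5Q/teletype from the Hub instead of downloading everything.

    ASSUMPTION: the `datasets` library is installed and a split named
    `split` exists; the post does not confirm either.
    """
    from datasets import load_dataset  # lazy import; requires network access

    return load_dataset("its5Q/teletype", split=split, streaming=True)


# Example usage (not run here because it needs network access):
#   for article in filter_by_language(stream_teletype(), "ru"):
#       print(article.keys())
#       break
```

Streaming avoids pulling all 3.3 million articles to disk before looking at the first one. The filter is written against plain dictionaries, so it can be tried on a few hand-made records first.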