From Babble to Words Collection
The models, tokenizers, and datasets used for our submission to BabyLM 2024, which investigates the viability of training LLMs on phoneme streams.
17 items • Updated 25 days ago