phonemetransformers's Collections

From Babble to Words

The models, tokenizers, and datasets used in our submission to BabyLM 2024, which investigates the viability of training LLMs on phoneme streams.