Models, tokenizers and datasets from our submission to BabyLM 2024, which investigates the viability of training LLMs on phoneme streams.
Language Modelling with Phonemes
AI & ML interests
Child language acquisition, CHILDES, word segmentation, phonemes, BabyLM
Models (77)
phonemetransformers/BABYLM-TOKENIZER-CHAR-PHON-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-CHAR-PHON
phonemetransformers/BABYLM-TOKENIZER-BPE-PHON-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-BPE-PHON
phonemetransformers/BABYLM-TOKENIZER-CHAR-TXT-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-CHAR-TXT
phonemetransformers/BABYLM-TOKENIZER-BPE-TXT-SPACELESS
phonemetransformers/BABYLM-TOKENIZER-BPE-TXT
phonemetransformers/GPT2-85M-BPE-PHON
phonemetransformers/GPT2-85M-BPE-PHON-SPACELESS
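
The eight tokenizer repositories listed above follow a regular naming pattern: CHAR or BPE, then PHON or TXT, then an optional SPACELESS suffix. The sketch below enumerates those IDs and shows one way a tokenizer might be loaded. The helper names (`tokenizer_repo_id`, `load_tokenizer`, `ALL_TOKENIZER_REPOS`) are hypothetical, and the loader assumes the repositories work with the standard `AutoTokenizer.from_pretrained` call from the `transformers` library, which this page does not confirm.

```python
from itertools import product

ORG = "phonemetransformers"
SCHEMES = ("CHAR", "BPE")      # tokenization schemes seen in the repo names
MODALITIES = ("PHON", "TXT")   # input types seen in the repo names


def tokenizer_repo_id(scheme: str, modality: str, spaceless: bool = False) -> str:
    """Build a repo ID that follows the naming pattern in the list above."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    suffix = "-SPACELESS" if spaceless else ""
    return f"{ORG}/BABYLM-TOKENIZER-{scheme}-{modality}{suffix}"


def load_tokenizer(repo_id: str):
    """Load a tokenizer from the Hub.

    Assumption: the repo is loadable via AutoTokenizer. Requires the
    `transformers` package and network access, so the import is deferred.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


# All eight combinations of scheme, modality and the SPACELESS flag.
ALL_TOKENIZER_REPOS = [
    tokenizer_repo_id(scheme, modality, spaceless)
    for scheme, modality, spaceless in product(SCHEMES, MODALITIES, (False, True))
]
```

For example, `load_tokenizer(tokenizer_repo_id("BPE", "PHON"))` would request `phonemetransformers/BABYLM-TOKENIZER-BPE-PHON`, provided the assumption about `AutoTokenizer` holds.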