The models, tokenizers and datasets used for our BabyLM 2024 submission. We provide eight prediction files (predictions.json.gz); the best results come from BPE-TXT.
Language Modelling with Phonemes
AI & ML interests
Child language acquisition, CHILDES, word segmentation
Collections: 1
Models: 77
phonemetransformers/GPT2-85M-BPE-PHON
phonemetransformers/GPT2-85M-BPE-PHON-SPACELESS
phonemetransformers/GPT2-85M-CHAR-PHON-SPACELESS
phonemetransformers/GPT2-85M-CHAR-PHON
phonemetransformers/GPT2-85M-CHAR-TXT
phonemetransformers/GPT2-85M-CHAR-TXT-SPACELESS
phonemetransformers/GPT2-85M-BPE-TXT-SPACELESS
phonemetransformers/GPT2-85M-BPE-TXT
phonemetransformers/babylm-better-phoneme-gpt2_lm-model
phonemetransformers/babylm-small-gpt2_lm-model
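The eight GPT2-85M repositories listed above appear to follow one naming pattern: tokenizer (BPE or CHAR), input representation (PHON or TXT), and an optional SPACELESS suffix. The sketch below enumerates those eight IDs from that pattern. The pattern is inferred from the names in this list, not documented by the authors, and the constant and function names are hypothetical. The two babylm-* repositories do not follow the pattern and are left out.

```python
from itertools import product

# Organization name as it appears in the repository IDs listed above.
ORG = "phonemetransformers"

# Assumed naming axes, inferred from the listed repository names only.
TOKENIZERS = ("BPE", "CHAR")        # subword vs. character-level tokenizer
REPRESENTATIONS = ("PHON", "TXT")   # phonemic vs. orthographic input
SPACING = ("", "-SPACELESS")        # optional suffix seen on four of the names


def gpt2_85m_model_ids():
    """Return the eight GPT2-85M repository IDs implied by the naming pattern."""
    return [
        f"{ORG}/GPT2-85M-{tokenizer}-{representation}{spacing}"
        for tokenizer, representation, spacing in product(
            TOKENIZERS, REPRESENTATIONS, SPACING
        )
    ]


MODEL_IDS = gpt2_85m_model_ids()

if __name__ == "__main__":
    for model_id in MODEL_IDS:
        print(model_id)
```

Each ID could then be passed to a Hub loader such as `AutoModelForCausalLM.from_pretrained` from the `transformers` library. Whether these repositories load without extra arguments has not been checked here.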