Mission: Impossible Language Models

💥 Mission: Impossible Language Models 💥

This page hosts the models trained and used in the paper "Mission: Impossible Language Models" (Kallini et al., 2024). If you use our code or models, please cite our ACL paper:

@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}

Impossible Languages

Our paper includes 15 impossible languages, grouped into three language classes:

  1. *Shuffle languages involve different shuffles of tokenized English sentences.
  2. *Reverse languages involve reversals of all or part of input sentences.
  3. *Hop languages perturb verb inflection with counting rules.
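To make the *Reverse class concrete, here is a toy sketch of reversing all or part of a token sequence. It is illustrative only and is not the perturbation code used in the paper: the function names, the whitespace-split example sentence, and the `start` parameter are all assumptions made for this example.

```python
from typing import List


def full_reverse(tokens: List[str]) -> List[str]:
    """Reverse the whole token sequence (toy illustration)."""
    return tokens[::-1]


def partial_reverse(tokens: List[str], start: int) -> List[str]:
    """Keep tokens before `start` in order and reverse the rest.

    `start` is a hypothetical parameter chosen for this sketch.
    """
    return tokens[:start] + tokens[start:][::-1]


# A whitespace split stands in for a real tokenizer here.
sentence = "the cat sat on the mat".split()
print(full_reverse(sentence))
print(partial_reverse(sentence, 2))
```

Both functions return new lists and leave the input untouched, so the same sentence can be passed through several perturbations for comparison.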

Figure (languages.png): table of the impossible languages.

Models

For each language, we provide two models:

  1. A standard GPT-2 Small model.
  2. A GPT-2 Small model trained without positional encodings.

Each model is trained from scratch exclusively on data from one impossible language. This makes a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without positional encodings. We group these models into two collections below for easier navigation.

Model names follow this pattern:

mission-impossible-lms/{language_name}-{model_architecture}

where language_name is the name of an impossible language from the table above, converted from PascalCase to kebab-case (e.g. NoShuffle -> no-shuffle), and model_architecture is either gpt2 (for the standard GPT-2 architecture) or gpt2-no-pos (for the GPT-2 architecture without positional encodings).
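As a rough illustration of this naming pattern, the sketch below builds a repository ID from a language name. The helpers `to_kebab_case` and `model_repo_id` are hypothetical names introduced for this example, and the regex assumes a plain PascalCase name such as NoShuffle. Check the resulting ID against the model collections before relying on it.

```python
import re

ORG = "mission-impossible-lms"
ARCHITECTURES = ("gpt2", "gpt2-no-pos")


def to_kebab_case(name: str) -> str:
    """Convert a plain PascalCase name to kebab-case (NoShuffle -> no-shuffle)."""
    # Insert a hyphen before every uppercase letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def model_repo_id(language_name: str, model_architecture: str = "gpt2") -> str:
    """Build '{org}/{language_name}-{model_architecture}' (hypothetical helper)."""
    if model_architecture not in ARCHITECTURES:
        raise ValueError(f"model_architecture must be one of {ARCHITECTURES}")
    return f"{ORG}/{to_kebab_case(language_name)}-{model_architecture}"


print(model_repo_id("NoShuffle"))                 # mission-impossible-lms/no-shuffle-gpt2
print(model_repo_id("NoShuffle", "gpt2-no-pos"))  # mission-impossible-lms/no-shuffle-gpt2-no-pos
```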

Model Checkpoints

On the main revision of each model, we provide the final model artefact we trained (checkpoint 3000). We also provide 29 intermediate checkpoints saved every 100 steps over the course of training, so that together with the final checkpoint they run from checkpoint 100 to 3000. These checkpoints are provided in each model repo as separate revisions and can help you replicate the experiments we show in the paper.
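A minimal sketch of working with these checkpoints follows, assuming the `transformers` library is installed and that the standard GPT-2 models load through `AutoModelForCausalLM`. The helper names are hypothetical, and the sketch deliberately does not guess the revision names: look them up in the model repo and pass one as `revision`. The models without positional encodings may need custom loading code and are not covered here.

```python
from typing import List


def checkpoint_steps(first: int = 100, last: int = 3000, every: int = 100) -> List[int]:
    """Training steps with a saved checkpoint: 100, 200, ..., 3000."""
    return list(range(first, last + 1, every))


def load_model(repo_id: str, revision: str = "main"):
    """Load a model at a given revision (hypothetical helper).

    "main" holds the final checkpoint (3000). For an intermediate
    checkpoint, pass the revision name exactly as listed in the repo.
    """
    # Imported lazily so checkpoint_steps() works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)


steps = checkpoint_steps()
print(len(steps), steps[0], steps[-1])  # 30 100 3000
```

Calling `load_model("mission-impossible-lms/no-shuffle-gpt2")` downloads the final model from the main revision, so it needs network access.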
