metadata

title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false

Stealth edits for provably fixing or attacking large language models

Implementation and source code of the algorithms from the paper "Stealth edits for provably fixing or attacking large language models".

Before attempting stealth edits, please first create and activate the conda environment:

conda env create --name=llm-sa -f environment.yml
conda activate llm-sa

The llama-3-8b model is gated, so you must apply for access; please follow the instructions here. You will also need to install huggingface-cli and log in with a user access token (for example, by running huggingface-cli login).

To start experimenting with stealth edits and attacks, please refer to the Colab Demo and the Huggingface Demo.

To reproduce experiments in the paper, please first run the extraction script:

bash scripts/extract.sh

and then run the edits and/or attacks, followed by the evaluation, with the following scripts:

bash scripts/edit.sh
bash scripts/eval.sh
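If you run everything on a single machine, the three stages can be chained so that a failed stage stops the run instead of letting the later stages start without the earlier stage's output (the README asks for extraction to run before edits and evaluation). The sketch below is a minimal illustration under stated assumptions: run_stages is a hypothetical helper, not part of this repository; it assumes it is called from the repository root (so the scripts live in scripts/), and it does not pass any arguments to the scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run the reproduction
# stages in order and stop at the first one that fails.
# scripts_dir defaults to "scripts", i.e. it assumes the repository root
# is the working directory.
run_stages() {
  local scripts_dir="${1:-scripts}"
  local stage
  for stage in extract edit eval; do
    echo "==> ${scripts_dir}/${stage}.sh"
    # Check the exit status explicitly rather than relying on `set -e`,
    # which bash ignores inside functions called from an `if` condition.
    if ! bash "${scripts_dir}/${stage}.sh"; then
      echo "stage '${stage}' failed; skipping the remaining stages" >&2
      return 1
    fi
  done
}
```

For example, source the file and call run_stages (or run_stages path/to/scripts). Checking each exit status explicitly keeps the fail-fast behaviour predictable no matter how the function is invoked.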

We recommend distributing the experiments across multiple nodes.