---
title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
---
# Stealth edits for provably fixing or attacking large language models
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/qinghua-zhou/stealth-edits)
Implementation and source code of the algorithms from the paper ***"Stealth edits for provably fixing or attacking large language models"***.
### Getting Started
1. Before attempting stealth edits, first create and activate the conda environment:
```bash
conda env create --name=llm-sa -f environment.yml
conda activate llm-sa
```
2. Access to the model `llama-3-8b` is gated, so you must apply for it first; please follow the instructions [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B). You will also need to install `huggingface-cli` and log in with a [user access token](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
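   For example (assuming `huggingface_hub` is not yet installed; the token is pasted at the interactive prompt):

   ```bash
   # Install the Hugging Face Hub CLI
   pip install -U "huggingface_hub[cli]"

   # Authenticate; paste your user access token when prompted
   huggingface-cli login
   ```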
3. To start playing with stealth edits and attacks, please refer to the [Colab Demo](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) and the [Hugging Face Demo](https://huggingface.co/spaces/qinghua-zhou/stealth-edits).
### Experiments
To reproduce experiments in the paper, please first run the extraction script:
```bash
bash scripts/extract.sh
```
and then run the edits and/or attacks and their evaluation with the following scripts:
```bash
bash scripts/edit.sh
bash scripts/eval.sh
```
We recommend distributing the experiments across multiple nodes.