---
title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
---

# Stealth edits for provably fixing or attacking large language models

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/qinghua-zhou/stealth-edits)

Implementation and source code of the algorithms from the paper ***"Stealth edits for provably fixing or attacking large language models"***.

### Getting Started

1. Before attempting stealth edits, first install the environment:

   ```bash
   conda env create --name=llm-sa -f environment.yml
   conda activate llm-sa
   ```

2. The model `llama-3-8b` requires you to apply for access. Please follow the instructions [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B). You will also need to install `huggingface-cli` and provide a [user access token](https://huggingface.co/docs/huggingface_hub/en/guides/cli).

3. To start experimenting with stealth edits and attacks, refer to the [Colab Demo](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) and the [Hugging Face Demo](https://huggingface.co/spaces/qinghua-zhou/stealth-edits).

### Experiments

To reproduce the experiments in the paper, first run the extraction script:

```bash
bash scripts/extract.sh
```

Then run the edits and/or attacks, together with their evaluation, using:

```bash
bash scripts/edit.sh
bash scripts/eval.sh
```

It is recommended to distribute these experiments across multiple nodes.
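The gated-model step in Getting Started can be sketched as follows — a non-interactive variant, assuming a pip-based environment and that `HF_TOKEN` holds your user access token (both are assumptions, not part of the repository's scripts):

```shell
# Install the Hugging Face CLI.
pip install -U "huggingface_hub[cli]"

# Authenticate with your user access token; running `huggingface-cli login`
# without --token also works if you prefer to paste the token at a prompt.
huggingface-cli login --token "$HF_TOKEN"
```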
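As a rough intuition for the kind of modification these edits make — not the repository's implementation; the dimension, threshold, and all vectors below are illustrative — a stealth edit can be thought of as inserting a highly selective "detector" neuron that activates on one trigger feature and stays silent on essentially everything else:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (illustrative)

# Hypothetical unit-norm feature vectors: one trigger prompt, many unrelated prompts.
trigger = rng.normal(size=d)
trigger /= np.linalg.norm(trigger)
others = rng.normal(size=(100, d))
others /= np.linalg.norm(others, axis=1, keepdims=True)

# Detector neuron: input weights aligned with the trigger feature, with a bias
# chosen so that a ReLU fires only when an input is very close to the trigger.
theta = 0.9        # similarity threshold (made-up value for illustration)
w_in = trigger
bias = -theta

def detector(x):
    return max(0.0, float(w_in @ x + bias))

# Selectivity: the neuron responds to the trigger but to none of the other inputs.
assert detector(trigger) > 0.0
assert all(detector(x) == 0.0 for x in others)
```

In the actual method, such a detector's output weights then write the desired (corrected or attacker-chosen) response direction into the network, which is why the edit is hard to notice on ordinary inputs.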