---
title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
---

<p align="center">
<img src="figures/icon.png" width="150"/>
</p>


<h1 align="center">Stealth edits for provably fixing or attacking large language models</h1>

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb)  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/qinghua-zhou/stealth-edits)

Implementation and source code of the algorithms from the paper ***"Stealth edits for provably fixing or attacking large language models"***.


### Getting Started

1. Before attempting stealth edits, please first create and activate the conda environment:

    ```bash
    conda env create --name=llm-sa -f environment.yml
    conda activate llm-sa
    ```

2. The model `llama-3-8b` is gated and requires you to apply for access; please follow the instructions [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B). You will also need to install `huggingface-cli` and log in with a [user access token](https://huggingface.co/docs/huggingface_hub/en/guides/cli).


3. To start experimenting with stealth edits and attacks, please refer to the [Colab Demo](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) and the [Hugging Face Demo](https://huggingface.co/spaces/qinghua-zhou/stealth-edits).
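The login in step 2 can also be scripted, which is convenient on machines without an interactive terminal. The sketch below is not part of this repository: the function name `hf_login` is hypothetical, and it assumes the CLI was installed with `pip install -U "huggingface_hub[cli]"` and, for the non-interactive path, that your user access token is exported as `HF_TOKEN`.

```bash
# Hypothetical helper (not part of this repository): authenticate the
# Hugging Face CLI so the gated Meta-Llama-3-8B weights can be downloaded.
hf_login() {
  if [ -n "${HF_TOKEN:-}" ]; then
    # Non-interactive path. Assumption: the token is exported as HF_TOKEN.
    huggingface-cli login --token "$HF_TOKEN"
  else
    # Interactive path: the CLI prompts you to paste the token.
    huggingface-cli login
  fi
}
```

After sourcing the snippet, run `hf_login`. Recent versions of `huggingface_hub` also read `HF_TOKEN` directly from the environment, in which case the explicit login step is optional.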

### Experiments

To reproduce the experiments in the paper, please first run the extraction script:

  ```bash
  bash scripts/extract.sh
  ```

and then run the edits and/or attacks, followed by the evaluation, with the following scripts:

  ```bash
  bash scripts/edit.sh
  bash scripts/eval.sh
  ```

It is recommended to distribute the experiments across multiple nodes.
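The required ordering (extraction first, then edits and/or attacks, then evaluation) can be wrapped in a small fail-fast driver so that a failed stage does not leave later stages running on missing or partial outputs. This is a single-machine sketch, not part of the repository: the function name `run_experiments` is hypothetical, and it assumes the three scripts need no extra arguments and are invoked from the repository root.

```bash
# Hypothetical driver (not part of this repository): run the three stages
# in the order given above and stop at the first one that fails.
run_experiments() {
  # Assumption: the stage scripts live in ./scripts unless a directory is given.
  local script_dir="${1:-scripts}"
  local stage
  for stage in extract edit eval; do
    echo "==> running ${script_dir}/${stage}.sh"
    if ! bash "${script_dir}/${stage}.sh"; then
      echo "stage '${stage}' failed; skipping the remaining stages" >&2
      return 1
    fi
  done
}
```

Calling `run_experiments` executes every stage sequentially on one machine; splitting the work across nodes depends on how the individual scripts are parameterized.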

<!-- ### How to Cite

```bibtex
@article{sutton2024stealth,
  title={Stealth edits for provably fixing or attacking large language models},
  author={Sutton, Oliver and Zhou, Qinghua and Wang, Wei and Higham, Desmond and Gorban, Alexander and Tyukin, Ivan},
  journal={arXiv preprint arXiv:XXXX:XXXXX},
  year={2024}
}
``` -->