title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false

Stealth edits for provably fixing or attacking large language models


Implementation and source code of the algorithms from the paper "Stealth edits for provably fixing or attacking large language models".

Getting Started

  1. Before attempting stealth edits, first create and activate the conda environment:

    conda env create --name=llm-sa -f environment.yml
    conda activate llm-sa
    
  2. The llama-3-8b model requires you to apply for access. Please follow the instructions here. You will also need to install huggingface-cli and enter a user access token.

  3. To start experimenting with stealth edits and attacks, please refer to the Colab Demo and the Hugging Face Demo.

Experiments

To reproduce experiments in the paper, please first run the extraction script:

    bash scripts/extract.sh

and then run edits and/or attacks and evaluation with the following scripts:

    bash scripts/edit.sh
    bash scripts/eval.sh
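
The stages above depend on one another, so it can help to run them from a single wrapper that stops at the first failure. The following is a minimal sketch, not part of the repository: `run_stages` is a hypothetical helper, and it assumes only that the stage scripts live in one directory and are named `<stage>.sh`, as the three scripts above are.

```shell
#!/usr/bin/env bash
# Sketch only: run_stages is a hypothetical helper, not shipped with the repo.
# It runs <dir>/<stage>.sh for each stage in order and stops at the first
# missing or failing script.
run_stages() {
    local dir="$1"
    shift
    local stage script
    for stage in "$@"; do
        script="${dir}/${stage}.sh"
        if [ ! -f "$script" ]; then
            echo "missing script: ${script}" >&2
            return 1
        fi
        echo "running ${script}"
        if ! bash "$script"; then
            echo "stage failed: ${stage}" >&2
            return 1
        fi
    done
}
```

With the function defined, `run_stages scripts extract edit eval` would run the three scripts from the repository root in the order described above, and return a non-zero status if any stage fails.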

We recommend distributing the experiments across multiple nodes.