metadata
title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
Stealth edits for provably fixing or attacking large language models
Implementation and source code of the algorithms from the paper "Stealth edits for provably fixing or attacking large language models".
Getting Started
Before attempting stealth edits, please first install the environment:
conda env create --name=llm-sa -f environment.yml
conda activate llm-sa
The model llama-3-8b requires you to apply for access. Please follow the instructions here. You will also need to install huggingface-cli and enter a user access token.
To start playing with stealth edits and attacks, please refer to the Colab Demo and the Huggingface Demo.
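For the access-token step mentioned above, a minimal sketch of a pre-flight check might look like the following. The helper name `hf_cli_ready` is hypothetical and not part of this repository. The suggested install command assumes the CLI is provided by the `huggingface_hub` package, and newer releases of that package may expose the same functionality under a different command name.

```shell
# Sketch only: check that the Hugging Face CLI is on PATH before trying to log in.
# hf_cli_ready is an illustrative helper, not something shipped with this repository.
hf_cli_ready() {
    command -v huggingface-cli >/dev/null 2>&1
}

if hf_cli_ready; then
    # The login itself is interactive and prompts for the user access token.
    echo "huggingface-cli found; run 'huggingface-cli login' and paste your access token"
else
    echo "huggingface-cli not found; try: pip install -U 'huggingface_hub[cli]'"
fi
```

Once logged in, `huggingface-cli whoami` can be used to confirm which account the token belongs to.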
Experiments
To reproduce the experiments in the paper, please first run the extraction script:
bash scripts/extract.sh
and then run edits and/or attacks and evaluation with the following scripts:
bash scripts/edit.sh
bash scripts/eval.sh
It is recommended to distribute the experiments across multiple nodes.
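The order described above (extraction first, then edits and/or attacks, then evaluation) can be sketched as a small wrapper that stops at the first failing stage. This is only an illustration. The function name `run_pipeline` and its optional directory argument are assumptions. It also assumes the three scripts are run from the repository root without arguments, as in the commands above.

```shell
# Sketch only: run the three stages in the documented order, stopping on the first failure.
# run_pipeline and its optional directory argument are illustrative assumptions.
run_pipeline() {
    script_dir="${1:-scripts}"   # defaults to the scripts/ directory used above
    for stage in extract edit eval; do
        echo "==> ${stage}"
        if ! bash "${script_dir}/${stage}.sh"; then
            echo "stage '${stage}' failed; stopping" >&2
            return 1
        fi
    done
}
```

After defining the function in a shell session, calling `run_pipeline` from the repository root would run `scripts/extract.sh`, `scripts/edit.sh` and `scripts/eval.sh` in sequence. Stopping early avoids running edits or evaluation when an earlier stage has failed.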