monsoon-nlp committed
Commit d65a7dd
1 Parent(s): 9cef505
Update README.md
README.md
CHANGED
@@ -13,7 +13,8 @@ CodeLlama-7b-Instruct-hf adapted using the abliteration notebook from [Maxime La
 
 Based on the paper ["Refusal in Language Models Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717)
 
-**This version 2x-d the intervention vector**;
+**This version 2x-d the intervention vector**; in practice this repeats phrases or writes text instead of answering difficult questions.
+See the model with less intervention: https://huggingface.co/monsoon-nlp/codellama-abliterated
 
 **Based on CodeLlama/Llama2 and subject to the restrictions of that model and license - not for unapproved uses**:
 