---
license: llama2
base_model: codellama/CodeLlama-7b-Instruct-hf
language:
- en
tags:
- arxiv:2406.11717
---
# codellama-abliterated-2xd
CodeLlama-7b-Instruct-hf adapted using the abliteration notebook from [Maxime Labonne's LLM Course](https://github.com/mlabonne/llm-course), following the method from the paper ["Refusal in Language Models Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717).
**This version doubles (2x) the intervention vector.** For the code model with the standard (1x) intervention, see: https://huggingface.co/monsoon-nlp/codellama-abliterated

**Based on CodeLlama/Llama 2 and subject to the restrictions of that model and its license; not for unapproved uses.**
## Concept
There are hundreds of "abliterated" models on Hugging Face, which use safety-prompt datasets to edit a model and remove its safety tuning.
None of these abliterated models has explored code LLMs, code generation, or CyberSecEval. I don't yet know how well this works, but it is
a first step.
Blog: https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms
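For reference, the intervention itself is a directional ablation: an estimated "refusal direction" is projected out of the weight matrices that write into the residual stream. A minimal sketch of the idea (function and variable names here are illustrative, not taken from the notebook), where `scale=2.0` corresponds to the doubled intervention in this model:

```python
import torch

def ablate_direction(weight: torch.Tensor, refusal_dir: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Subtract the component of each column of `weight` along `refusal_dir`.

    weight:      (d_model, d_in) matrix writing into the residual stream
    refusal_dir: (d_model,) refusal direction, estimated in the paper as the
                 difference of mean activations on harmful vs. harmless prompts
    scale:       1.0 is standard abliteration; 2.0 doubles the intervention
    """
    r = refusal_dir / refusal_dir.norm()  # unit-norm refusal direction
    proj = torch.outer(r, r) @ weight     # component of weight along r
    return weight - scale * proj
```

With `scale=1.0` this reduces to the usual weight orthogonalization; this model applies the `scale=2.0` version.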
## Usage
```python
# install dependencies first: pip install transformers accelerate --quiet
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("monsoon-nlp/codellama-abliterated-2xd", device_map="auto")

code_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False)

# CodeLlama-Instruct expects prompts wrapped in [INST] ... [/INST]
input_string = "[INST] Write a python function to calculate the factorial of a number [/INST]"
generated_code = code_generator(input_string, max_length=100)[0]["generated_text"]
print(generated_code)
```
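With `do_sample=False`, decoding is greedy, so the output is deterministic for a given prompt; raise `max_length` for longer completions.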