codellama-abliterated

CodeLlama-7b-Instruct-hf adapted using the abliteration notebook from Maxime Labonne's LLM Course

Based on the paper "Refusal in Language Models Is Mediated by a Single Direction"

Based on CodeLlama/Llama2 and subject to the restrictions of that model and its license; not for unapproved uses.

Concept

There are hundreds of "abliterated" models on HuggingFace. Each one uses safety prompt datasets to edit a model and remove the effects of its safety tuning.

None of these abliterated models explores code LLMs, code generation, or CyberSecEval. I don't know how well abliteration will work in this setting, but this model is a first step.
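
The idea in the paper cited above is that refusal can be traced to a single direction in the model's activations: estimate it as the difference between mean activations on harmful and harmless prompts, then remove the component along it. The following is a minimal sketch of that arithmetic on synthetic vectors, not the notebook's actual code. The names `refusal_direction`, `ablate`, and `fake_activation` are hypothetical, and the "activations" are random numbers with a shift planted in one coordinate.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along (mean harmful activation - mean harmless activation)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(vector, direction):
    """Remove the component of `vector` along the unit-length `direction`."""
    coeff = sum(v * d for v, d in zip(vector, direction))
    return [v - coeff * d for v, d in zip(vector, direction)]


random.seed(0)
DIM = 8  # toy size; real hidden states are much wider


def fake_activation(shift):
    """Synthetic stand-in for a hidden state; `shift` is planted in coordinate 3."""
    act = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    act[3] += shift
    return act


harmless_acts = [fake_activation(0.0) for _ in range(200)]
harmful_acts = [fake_activation(4.0) for _ in range(200)]

direction = refusal_direction(harmful_acts, harmless_acts)
cleaned = ablate(harmful_acts[0], direction)

# After ablation, the vector has (numerically) no component along the direction.
residual = sum(c * d for c, d in zip(cleaned, direction))
print(f"component along direction after ablation: {residual:.2e}")
```

In a real model the same projection would be applied to hidden states or folded into the weights that write to them. Which layers and weights to edit is a choice this sketch does not cover.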

Blog: https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms

Model with 2x intervention: https://huggingface.co/monsoon-nlp/codellama-abliterated-2xd

Usage

# Notebook syntax; from a shell, drop the leading "!"
! pip install transformers accelerate --quiet
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# The tokenizer is loaded from the base CodeLlama model
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("monsoon-nlp/codellama-abliterated", device_map="auto")

# do_sample=False selects greedy decoding
code_generator = pipeline('text-generation', model=model, tokenizer=tokenizer, do_sample=False)

# The instruction is wrapped in [INST] ... [/INST] tags
input_string = "[INST] Write a python function to calculate the factorial of a number [/INST]"
# max_length counts the prompt tokens as well as the generated tokens
generated_code = code_generator(input_string, max_length=100)[0]['generated_text']
print(generated_code)
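
By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`, so the printed string above begins with the `[INST] ... [/INST]` wrapper. A small sketch of two helpers for building the prompt and trimming it from the output is shown here. `build_prompt` and `strip_prompt` are hypothetical names, not part of transformers, and `fake_output` is a hand-written stand-in for real model output so the snippet runs without downloading the model.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the [INST] ... [/INST] tags used in the usage example."""
    return f"[INST] {instruction.strip()} [/INST]"


def strip_prompt(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt from the front of the pipeline output, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


prompt = build_prompt("Write a python function to calculate the factorial of a number")

# Hand-written stand-in for code_generator(prompt, ...)[0]['generated_text']
fake_output = prompt + " def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)"

print(strip_prompt(fake_output, prompt))
```

With the real pipeline, pass `prompt` in place of `input_string` and call `strip_prompt` on the returned `generated_text`.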

Model size: 6.74B params (Safetensors, tensor type BF16)