language:
- en
tags:
- llama
- llama-3
- lora
- content-moderation
- uncensored
- text-generation
license: mit
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
Llama 3.1 Censorship LoRAs
This repository contains LoRA adapters for Meta's Llama 3.1 8B Instruct model, designed for censoring and uncensoring text content.
What are these LoRA adapters?
These LoRA adapters serve as fine-tuning tools for the Llama 3.1 model. They capture the differences between the original, more cautious Llama 3.1 and a version that has been adjusted to be less restrictive, agentlans/Llama3.1-vodka. These adapters adjust how the model handles potentially sensitive content.
The Basics
- Base Model: Llama 3.1 Instruct 8B
- Comparison Model: agentlans/Llama3.1-vodka
- Extraction Method: LoRA (Low-Rank Adaptation)
Adapter Options
Different "strengths" of adaptation are available: 2, 4, 8, 16, 32, and 64. These can be thought of as dials for determining the extent of changes to the model's behaviour.
Applications
- Customizing Llama 3.1 for specific content needs
- Adjusting the model's behaviour to align more closely with the censored or uncensored variant
- Experimenting with various settings to identify the most effective configuration
Tips for Use
- Starting with lower ranks (2, 4, 8) allows for more subtle changes
- Higher ranks (32, 64) enable larger adjustments but require more computational resources to apply to the model
- Use the lowest rank that achieves the desired effect
- For best results, use system prompts in conjunction with the LoRAs
- Always use these adapters responsibly and ethically
Uses and Limitations
The Censor-LoRA
Designed for:
- Maintaining family-friendly content
- Removing explicit language
- General content moderation
The Uncensor-LoRA
Intended for:
- Restoring text that may have been excessively censored
- Creative writing in more mature contexts
- Generating realistic dialogue for adult-oriented content
Limitations
- These adapters may occasionally over-censor or under-censor content
- They should not be the sole method for content moderation; human oversight remains crucial
- The uncensoring adapter has the potential to generate inappropriate content, necessitating careful use
Ethical Considerations
The use of these adapters raises several ethical concerns:
- The censoring adapter may inadvertently suppress legitimate speech or artistic expression
- The uncensoring adapter could be misused to produce harmful or offensive content
- Both adapters may reflect and potentially amplify societal biases present in the training data
Careful consideration of the implications of deploying these models is necessary, along with the implementation of appropriate safeguards to ensure responsible usage.