metadata

license: cc
datasets:
  - vector-institute/newsmediabias-plus
language:
  - en
tags:
  - bias
  - classification
  - llm
  - multimodal
base_model:
  - meta-llama/Llama-3.2-1B

Llama3.2 NLP Bias Classifier

This model merges the Llama-3.2 base model with a custom adapter to classify text by its likelihood of being disinformation. It looks for rhetorical techniques commonly used in disinformation and labels each text 'Likely' or 'Unlikely' according to a structured checklist of indicators.

Model Details

Base Model: meta-llama/Llama-3.2-1B-Instruct
Deployment Environment: Configured for GPU (CUDA) support.
Training Data: https://huggingface.co/datasets/vector-institute/newsmediabias-plus
Sampled data for inference: https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv
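
The sample file link above is a repository page (`/blob/`) rather than a direct download. A minimal sketch, assuming the Hugging Face Hub convention that swapping `/blob/` for `/resolve/` yields the raw file; `to_download_url` is a hypothetical helper and not part of this model's code:

```python
def to_download_url(blob_url: str) -> str:
    """Turn a Hub file-page URL (/blob/) into a direct-download URL (/resolve/)."""
    # Replace only the first occurrence so a "/blob/" later in the path is left alone.
    return blob_url.replace("/blob/", "/resolve/", 1)


SAMPLE_PAGE_URL = (
    "https://huggingface.co/vector-institute/"
    "Llama3.2-Multimodal-Newsmedia-Bias-Detector/"
    "blob/main/sampled-data/sample_dataset.csv"
)
SAMPLE_DOWNLOAD_URL = to_download_url(SAMPLE_PAGE_URL)

# Assumption: with network access, pandas can read the file straight from the URL:
#   df = pd.read_csv(SAMPLE_DOWNLOAD_URL)
```

Downloading the file by hand and saving it as sample_dataset.csv next to the script works equally well.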

Model Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

LLAMA_MODEL_HF_ID = "vector-institute/Llama3.2-NLP-Newsmedia-Bias-Detector"

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(LLAMA_MODEL_HF_ID)
tokenizer.pad_token = tokenizer.eos_token

# Load the merged model (base model + adapter) in half precision
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    LLAMA_MODEL_HF_ID,
    torch_dtype=torch.float16,  # float16 halves memory use; switch to torch.float32 if running on CPU
    device_map="auto"
)

model.eval()

# Generate a completion for one prompt and return only the newly generated text
def generate_response(model, prompt):
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_new_tokens=50,
            temperature=0.7,
            do_sample=True,
            top_p=0.95
        )
    generated_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return generated_text.strip()

# Load your test dataset
print("Loading test dataset...")
df = pd.read_csv('sample_dataset.csv')  # https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv

# Ensure the 'final_label' is in ['Likely', 'Unlikely']
df = df[df['final_label'].isin(['Likely', 'Unlikely'])]

# Balance the dataset
likely_samples = df[df['final_label'] == 'Likely']
unlikely_samples = df[df['final_label'] == 'Unlikely']

num_samples_per_category = min(10, len(likely_samples), len(unlikely_samples))

likely_selected = likely_samples.sample(n=num_samples_per_category, random_state=42)
unlikely_selected = unlikely_samples.sample(n=num_samples_per_category, random_state=42)

balanced_samples = pd.concat([likely_selected, unlikely_selected]).reset_index(drop=True)

# Prepare test samples directly
def format_data(sample):
    prompt = (
        "Assess the text below for potential disinformation by identifying the presence of rhetorical techniques listed.\n"
        "If you find some of the listed rhetorical techniques below, then the article is likely disinformation; if not, it is likely not disinformation.\n\n"
        "Rhetorical Techniques Checklist:\n"
        "- Emotional Appeal: Uses language or imagery that intentionally invokes extreme emotions like fear or anger, aiming to distract from lack of factual backing.\n"
        "- Exaggeration and Hyperbole: Makes claims that are unsupported by evidence, or presents normal situations as extraordinary to manipulate perceptions.\n"
        "- Bias and Subjectivity: Presents information in a way that unreasonably favors one perspective, omitting key facts that might provide balance.\n"
        "- Repetition: Uses repeated messaging of specific points or misleading statements to embed a biased viewpoint in the reader's mind.\n"
        "- Specific Word Choices: Employs emotionally charged or misleading terms to sway opinions subtly, often in a manipulative manner.\n"
        "- Appeals to Authority: References authorities who lack relevant expertise or cites sources that do not have the credentials to be considered authoritative in the context.\n"
        "- Lack of Verifiable Sources: Relies on sources that either cannot be verified or do not exist, suggesting a fabrication of information.\n"
        "- Logical Fallacies: Engages in flawed reasoning such as circular reasoning, strawman arguments, or ad hominem attacks that undermine logical debate.\n"
        "- Conspiracy Theories: Propagates theories that lack proof and often contain elements of paranoia or implausible scenarios as facts.\n"
        "- Inconsistencies and Factual Errors: Contains multiple contradictions or factual inaccuracies that are easily disprovable, indicating a lack of concern for truth.\n"
        "- Selective Omission: Deliberately leaves out crucial information that is essential for a fair understanding of the topic, skewing perception.\n"
        "- Manipulative Framing: Frames issues in a way that leaves out alternative perspectives or possible explanations, focusing only on aspects that support a biased narrative.\n\n"
        f"{sample['first_paragraph']}\n\n"
        "Respond ONLY with the classification 'Likely (1)' or 'Unlikely (0)' without any additional explanation."
    )
    response = f"This text should be classified as: {'Likely (1)' if sample['final_label'] == 'Likely' else 'Unlikely (0)'}"
    return {"prompt": prompt, "response": response, "text": sample['first_paragraph'], "actual_label": sample['final_label']}

test_samples = [format_data(sample) for _, sample in balanced_samples.iterrows()]

# Generate predictions and collect results
print("Generating predictions...")
results = []

for idx, sample in enumerate(tqdm(test_samples, desc="Processing samples")):
    prompt = sample["prompt"]
    true_label = 1 if "Likely (1)" in sample["response"] else 0

    # Generate response using the merged model
    merged_response = generate_response(model, prompt)
    merged_predicted_label = 1 if "Likely (1)" in merged_response else 0

    # Save results
    results.append({
        "text": sample["text"],
        "actual_label": true_label,
        "merged_response": merged_response,
        "merged_predicted_label": merged_predicted_label
    })

# Convert results to DataFrame
results_df = pd.DataFrame(results)
results_df.to_csv('nlp-results.csv')

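# Display metrics (per-class precision, recall, F1, and overall accuracy)
labels = ['Unlikely (0)', 'Likely (1)']
y_true = results_df['actual_label']
y_pred = results_df['merged_predicted_label']
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)
for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2%}, recall={r:.2%}, f1={f:.2%}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")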


# Optional: Print some example predictions
for i in range(min(5, len(results_df))):  # Adjust the count as needed; never exceeds the number of results
    sample = results_df.iloc[i]
    print(f"\nExample {i+1}:")
    print(f"Text: {sample['text']}")
    print(f"Actual Label: {'Likely (1)' if sample['actual_label'] == 1 else 'Unlikely (0)'}")
    print(f"Merged Model Prediction: {sample['merged_response']}")
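
The loop above treats any response that lacks the exact string 'Likely (1)' as 'Unlikely (0)', so empty or off-format generations are silently counted as negatives. A stricter parser is sketched below, assuming the model answers in the 'Likely (1)' / 'Unlikely (0)' format requested by the prompt; `parse_label` is a hypothetical helper and not part of the original evaluation script:

```python
import re
from typing import Optional

# Match "Unlikely" or "Likely" as a whole word. The optional "un" prefix is
# captured so that "Unlikely" is never mistaken for "Likely".
_LABEL_RE = re.compile(r"\b(un)?likely\b", re.IGNORECASE)


def parse_label(response: str) -> Optional[int]:
    """Return 1 for 'Likely', 0 for 'Unlikely', or None if no label is found."""
    match = _LABEL_RE.search(response)
    if match is None:
        return None  # off-format output: let the caller decide how to count it
    return 0 if match.group(1) else 1
```

Responses that yield None can be logged and excluded, or counted as errors, instead of defaulting to 'Unlikely'.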

Dataset and Evaluation

Input Dataset: Sample data from sample_dataset.csv containing balanced examples of 'Likely' and 'Unlikely' disinformation.
Labeling Criteria: Text classified as "Likely" or "Unlikely" disinformation based on the presence of rhetorical techniques (e.g., exaggeration, emotional appeal).
Metrics: Precision, recall, F1 score, and accuracy, computed with sklearn.metrics.
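
To make the metric definitions concrete, here is a dependency-free sketch of per-class precision, recall, and F1 plus overall accuracy. It is meant to mirror what sklearn.metrics reports, not to replace it; `binary_metrics` and the toy labels are illustrative assumptions, not results from this model:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels for illustration only (1 = Likely, 0 = Unlikely).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
likely = binary_metrics(y_true, y_pred, positive=1)
unlikely = binary_metrics(y_true, y_pred, positive=0)
```

Calling the function once per class gives the two rows of a per-label report; accuracy is the same in both calls because it does not depend on which class is treated as positive.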

Model Performance

Label	Precision	Recall	F1 Score
Unlikely (0)	78%	82%	79.95%
Likely (1)	81%	85%	82.95%
Accuracy			87%

Example Classification

Example 1:
Text: "This new vaccine causes severe side effects in a majority of patients, which is something the authorities don’t want you to know."
Actual Label: Likely (1)
Model Prediction: Likely (1)

Limitations and Future Work

False Positives: May misclassify subjective statements lacking explicit disinformation techniques.
Inference Speed: Optimization for deployment on different devices could improve real-time applicability.

Citation

If you use this model, please cite our work as follows:

@inproceedings{Raza2024LlamaBiasClassifier,
  title={Llama3.2 NLP Bias Classifier for Disinformation Detection},
  author={Shaina Raza},
  year={2024}
}

For more information, contact Shaina Raza, PhD.