---
license: cc
datasets:
- vector-institute/newsmediabias-plus
language:
- en
tags:
- bias
- classification
- llm
- multimodal
base_model:
- meta-llama/Llama-3.2-1B
---

# Llama3.2 NLP Bias Classifier

This model combines the Llama-3.2 base weights with a fine-tuned adapter, merged into a single checkpoint, to classify text by disinformation likelihood. It detects rhetorical techniques commonly used in manipulative content and assigns a 'Likely' or 'Unlikely' label based on these structured indicators.

## Model Details

- **Base Model**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Deployment Environment**: Runs on GPU (CUDA) when available, with CPU fallback.
- **Training Data**: [vector-institute/newsmediabias-plus](https://huggingface.co/datasets/vector-institute/newsmediabias-plus)
- **Sampled Data for Inference**: [sample_dataset.csv](https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv)
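
For a quick smoke test, a minimal sketch is shown below. It assumes the merged model ID from this card and uses a shortened prompt for illustration; the full checklist prompt in the evaluation script under Model Usage matches the training format more closely.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vector-institute/Llama3.2-NLP-Newsmedia-Bias-Detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative input; the evaluation script below uses the full checklist prompt
text = "Officials are hiding the truth, and the media refuses to report it."
prompt = (
    "Assess the text below for potential disinformation.\n\n"
    f"{text}\n\n"
    "Respond ONLY with the classification 'Likely (1)' or 'Unlikely (0)'."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip())
```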

## Model Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

LLAMA_MODEL_HF_ID = "vector-institute/Llama3.2-NLP-Newsmedia-Bias-Detector"

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(LLAMA_MODEL_HF_ID)
tokenizer.pad_token = tokenizer.eos_token

# Load the model (the adapter is already merged into the published weights)
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    LLAMA_MODEL_HF_ID,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto"
)

model.eval()

# Inference helper: generate a short classification response for a prompt
def generate_response(model, prompt):
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_new_tokens=50,
            temperature=0.7,
            do_sample=True,
            top_p=0.95
        )
    generated_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return generated_text.strip()

# Load your test dataset
print("Loading test dataset...")
df = pd.read_csv('sample_dataset.csv')  # https://huggingface.co/vector-institute/Llama3.2-Multimodal-Newsmedia-Bias-Detector/blob/main/sampled-data/sample_dataset.csv

# Ensure the 'final_label' is in ['Likely', 'Unlikely']
df = df[df['final_label'].isin(['Likely', 'Unlikely'])]

# Balance the dataset
likely_samples = df[df['final_label'] == 'Likely']
unlikely_samples = df[df['final_label'] == 'Unlikely']

num_samples_per_category = min(10, len(likely_samples), len(unlikely_samples))

likely_selected = likely_samples.sample(n=num_samples_per_category, random_state=42)
unlikely_selected = unlikely_samples.sample(n=num_samples_per_category, random_state=42)

balanced_samples = pd.concat([likely_selected, unlikely_selected]).reset_index(drop=True)

# Prepare test samples directly
def format_data(sample):
    prompt = (
        "Assess the text below for potential disinformation by identifying the presence of rhetorical techniques listed.\n"
        "If you find some of the listed rhetorical techniques below, then the article is likely disinformation; if not, it is likely not disinformation.\n\n"
        "Rhetorical Techniques Checklist:\n"
        "- Emotional Appeal: Uses language or imagery that intentionally invokes extreme emotions like fear or anger, aiming to distract from lack of factual backing.\n"
        "- Exaggeration and Hyperbole: Makes claims that are unsupported by evidence, or presents normal situations as extraordinary to manipulate perceptions.\n"
        "- Bias and Subjectivity: Presents information in a way that unreasonably favors one perspective, omitting key facts that might provide balance.\n"
        "- Repetition: Uses repeated messaging of specific points or misleading statements to embed a biased viewpoint in the reader's mind.\n"
        "- Specific Word Choices: Employs emotionally charged or misleading terms to sway opinions subtly, often in a manipulative manner.\n"
        "- Appeals to Authority: References authorities who lack relevant expertise or cites sources that do not have the credentials to be considered authoritative in the context.\n"
        "- Lack of Verifiable Sources: Relies on sources that either cannot be verified or do not exist, suggesting a fabrication of information.\n"
        "- Logical Fallacies: Engages in flawed reasoning such as circular reasoning, strawman arguments, or ad hominem attacks that undermine logical debate.\n"
        "- Conspiracy Theories: Propagates theories that lack proof and often contain elements of paranoia or implausible scenarios as facts.\n"
        "- Inconsistencies and Factual Errors: Contains multiple contradictions or factual inaccuracies that are easily disprovable, indicating a lack of concern for truth.\n"
        "- Selective Omission: Deliberately leaves out crucial information that is essential for a fair understanding of the topic, skewing perception.\n"
        "- Manipulative Framing: Frames issues in a way that leaves out alternative perspectives or possible explanations, focusing only on aspects that support a biased narrative.\n\n"
        f"{sample['first_paragraph']}\n\n"
        "Respond ONLY with the classification 'Likely (1)' or 'Unlikely (0)' without any additional explanation."
    )
    response = f"This text should be classified as: {'Likely (1)' if sample['final_label'] == 'Likely' else 'Unlikely (0)'}"
    return {"prompt": prompt, "response": response, "text": sample['first_paragraph'], "actual_label": sample['final_label']}

test_samples = [format_data(sample) for _, sample in balanced_samples.iterrows()]

# Generate predictions and collect results
print("Generating predictions...")
results = []

for idx, sample in enumerate(tqdm(test_samples, desc="Processing samples")):
    prompt = sample["prompt"]
    true_label = 1 if "Likely (1)" in sample["response"] else 0

    # Generate a response and map it to a binary label; the substring
    # "Likely (1)" cannot occur inside "Unlikely (0)", so the check is unambiguous
    merged_response = generate_response(model, prompt)
    merged_predicted_label = 1 if "Likely (1)" in merged_response else 0

    # Save results
    results.append({
        "text": sample["text"],
        "actual_label": true_label,
        "merged_response": merged_response,
        "merged_predicted_label": merged_predicted_label
    })

# Convert results to a DataFrame and save them
results_df = pd.DataFrame(results)
results_df.to_csv('nlp-results.csv', index=False)

# Compute and display metrics
labels = ['Unlikely (0)', 'Likely (1)']
y_true = results_df['actual_label']
y_pred = results_df['merged_predicted_label']

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
accuracy = accuracy_score(y_true, y_pred)

for i, label in enumerate(labels):
    print(f"{label}: precision={precision[i]:.2%}, recall={recall[i]:.2%}, f1={f1[i]:.2%}")
print(f"Accuracy: {accuracy:.2%}")


# Optional: Print some example predictions
for i in range(5):  # Adjust the range as needed
    sample = results_df.iloc[i]
    print(f"\nExample {i+1}:")
    print(f"Text: {sample['text']}")
    print(f"Actual Label: {'Likely (1)' if sample['actual_label'] == 1 else 'Unlikely (0)'}")
    print(f"Merged Model Prediction: {sample['merged_response']}")

```
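
Note that `generate_response` samples with `temperature=0.7` and `top_p=0.95`, so repeated runs can produce different classifications for the same input. If deterministic labels are preferred, a greedy variant of the same `generate` call (an alternative sketch, not the script's original setting) is:

```python
# Drop-in replacement for the generate call inside generate_response
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_new_tokens=50,
        do_sample=False  # greedy decoding: same input always yields the same output
    )
```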

## Dataset and Evaluation

- **Input Dataset**: Sample data from `sample_dataset.csv`, balanced between 'Likely' and 'Unlikely' examples.
- **Labeling Criteria**: Text is classified as 'Likely' or 'Unlikely' disinformation based on the presence of rhetorical techniques (e.g., exaggeration, emotional appeal).
- **Metrics**: Precision, recall, F1 score, and accuracy, computed with `sklearn.metrics` (see the sketch below).
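
The reported metrics can be recomputed from the saved results file; the snippet below is a small sketch assuming the `nlp-results.csv` written by the script above.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Recompute per-class precision/recall/F1 and overall accuracy from saved predictions
results_df = pd.read_csv('nlp-results.csv')
print(classification_report(
    results_df['actual_label'],
    results_df['merged_predicted_label'],
    target_names=['Unlikely (0)', 'Likely (1)'],  # class 0 listed first
))
```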

### Model Performance

| Label          | Precision | Recall | F1 Score |
|----------------|-----------|--------|----------|
| Unlikely (0)   | 78%       | 82%    | 79.95%   |
| Likely (1)     | 81%       | 85%    | 82.95%   |
| **Accuracy**   |           |        | 87%      |
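
The per-class F1 scores are the harmonic means of the corresponding precision and recall; for example, 2 × 0.78 × 0.82 / (0.78 + 0.82) ≈ 79.95%.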

### Example Classification

```plaintext
Example 1:
Text: "This new vaccine causes severe side effects in a majority of patients, which is something the authorities don’t want you to know."
Actual Label: Likely (1)
Model Prediction: Likely (1)
```

## Limitations and Future Work

- **False Positives**: Strongly subjective text can be misclassified as 'Likely' even when it lacks explicit disinformation techniques.
- **Inference Speed**: Further optimization for different deployment targets could improve real-time applicability.

## Citation

If you use this model, please cite our work as follows:
```
@inproceedings{Raza2024LlamaBiasClassifier,
  title={Llama3.2 NLP Bias Classifier for Disinformation Detection},
  author={Shaina Raza},
  year={2024}
}
```

For more information, contact Shaina Raza, PhD, at [email protected].