---
library_name: transformers
tags:
- dpo
license: gpl-3.0
base_model: philippelaban/keep_it_simple
datasets:
- Yelp/yelp_review_full
language:
- en
---

# TAROT-DPO

Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

A text rewriting model fine-tuned with **direct preference optimization (DPO)** for authorship obfuscation.

arXiv paper: https://arxiv.org/abs/2407.21630v1
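
For orientation, the snippet below is a minimal, hypothetical sketch of how DPO fine-tuning of the base rewriter can be set up with Hugging Face's `trl` library. The preference pairs are stand-ins, and `DPOTrainer` keyword names such as `processing_class` vary across `trl` versions; the actual training code lives in the repository linked under "Model description".

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "philippelaban/keep_it_simple"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token

# Stand-in preference pairs (hypothetical): for the same source paragraph,
# "chosen" rewrites score better under the reward models than "rejected" ones.
pairs = Dataset.from_dict({
    "prompt":   ["I had dinner at Bella's Bistro last night.<|endoftext|>"],
    "chosen":   ["We tried Bella's Bistro yesterday evening."],
    "rejected": ["I had dinner at Bella's Bistro last night."],
})

config = DPOConfig(output_dir="tarot-dpo", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(model=model, args=config, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```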

## Model description
- **Model type:** Authorship obfuscation model using GPT-2-based text rewriting
- **Reward models:** [rrivera1849/LUAR-MUD](https://huggingface.co/rrivera1849/LUAR-MUD) & [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) (their combined use is sketched after this list)
- **Finetuned from model:** [philippelaban/keep_it_simple](https://huggingface.co/philippelaban/keep_it_simple)
- **Dataset:** [Yelp/yelp_review_full](https://huggingface.co/datasets/Yelp/yelp_review_full)
- **Repository:** https://github.com/hornetsecurity/tarot
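
To make the reward setup concrete, the illustrative sketch below scores candidate rewrites with the two reward models above, following their model-card usage (both require `trust_remote_code=True`). The exact reward formulation used to build TAROT's preference pairs is defined in the paper and repository; in particular, LUAR's episode-shaped input and direct tensor output are assumptions taken from its model card.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

luar_tok = AutoTokenizer.from_pretrained("rrivera1849/LUAR-MUD")
luar = AutoModel.from_pretrained("rrivera1849/LUAR-MUD", trust_remote_code=True)
gte_tok = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-large-en-v1.5")
gte = AutoModel.from_pretrained("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

@torch.no_grad()
def author_embed(texts):
    # LUAR expects (batch, documents_per_author, tokens); one document per entry here.
    toks = luar_tok(texts, max_length=512, padding="max_length",
                    truncation=True, return_tensors="pt")
    toks = {k: v.unsqueeze(1) for k, v in toks.items()}
    return F.normalize(luar(**toks), dim=-1)

@torch.no_grad()
def meaning_embed(texts):
    toks = gte_tok(texts, padding=True, truncation=True, return_tensors="pt")
    return F.normalize(gte(**toks).last_hidden_state[:, 0], dim=-1)  # CLS pooling

source = "I had dinner at Bella's Bistro last night, and it was a delightful experience."
rewrites = [
    "We tried Bella's Bistro yesterday evening and had a lovely meal.",
    "I had dinner at Bella's Bistro last night, and it was delightful.",
]

protection = 1 - author_embed(rewrites) @ author_embed([source]).T  # further from the source style = better
utility = meaning_embed(rewrites) @ meaning_embed([source]).T       # closer in meaning = better
print((protection + utility).squeeze(-1))  # higher-scoring rewrite = "chosen" in a DPO pair
```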

## Example use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gabrielloiseau/TAROT-DPO")
model = AutoModelForCausalLM.from_pretrained("gabrielloiseau/TAROT-DPO")

paragraph = """I had dinner at Bella's Bistro last night, and it was a delightful experience. 
As soon as I walked in, I was greeted warmly by the hostess, and the cozy, rustic decor made me feel right at home. 
I started with the bruschetta, which was so fresh and flavorful—I could have eaten a whole meal of just that!"""

# "<|endoftext|>" separates the source paragraph from the rewrite the model generates.
inputs = tokenizer([paragraph + "<|endoftext|>"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=128)

# Strip the prompt tokens so only the generated rewrite is decoded.
outputs = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
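
Because `do_sample=True`, repeated calls yield different rewrites of the same paragraph; pass `num_return_sequences` to `generate` to obtain several candidates in one call.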