---
library_name: transformers
tags:
- dpo
license: gpl-3.0
base_model: philippelaban/keep_it_simple
datasets:
- Yelp/yelp_review_full
language:
- en
---

# TAROT-DPO

Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

A text-rewriting model fine-tuned with direct preference optimization (DPO) for authorship obfuscation.

arXiv paper: https://arxiv.org/abs/2407.21630v1
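For readers new to DPO, the generic per-pair objective can be sketched in plain Python. This is a minimal sketch of the standard DPO loss, not code from the TAROT repository. It assumes each input is the summed log-probability of a full rewrite, where "chosen" is the preferred rewrite and "rejected" the dispreferred one, scored by the policy being trained and by a frozen reference model (typically the model the fine-tuning starts from). The value `beta=0.1` is purely illustrative and is not the setting used to train this checkpoint.

```python
import math


def softplus(x):
    # log(1 + exp(x)), computed without overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    where pi_* / ref_* are sequence log-probabilities under the
    trainable policy and the frozen reference model.
    beta=0.1 is an illustrative default, not TAROT's setting.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2)
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy favours the chosen rewrite more than the reference does: lower loss
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss equals log 2 when the policy's preference margin matches the reference's, and it falls as the policy favours the chosen rewrite more strongly than the reference does.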

## Model description
- Model type: Authorship obfuscation model using GPT-2-based text rewriting
- Reward models: [rrivera1849/LUAR-MUD](https://huggingface.co/rrivera1849/LUAR-MUD) & [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5)
- Fine-tuned from model: [philippelaban/keep_it_simple](https://huggingface.co/philippelaban/keep_it_simple)
- Dataset: [Yelp/yelp_review_full](https://huggingface.co/datasets/Yelp/yelp_review_full)
- Repository: https://github.com/hornetsecurity/tarot
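The two reward models play different roles: LUAR-MUD produces authorship (style) embeddings, while gte-large-en-v1.5 produces general-purpose semantic embeddings. The sketch below is entirely hypothetical and only illustrates one way such embeddings could score rewrites and order them into a (chosen, rejected) pair for DPO. The helper names `hypothetical_reward` and `hypothetical_preference_pair` and the "semantic similarity minus style similarity" formula are assumptions made for illustration, not TAROT's reward; see the paper and repository for the actual definition. Toy vectors stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hypothetical_reward(style_orig, style_rewrite, sem_orig, sem_rewrite):
    """HYPOTHETICAL combined score, NOT the reward used by TAROT.

    style_*: authorship embeddings (e.g. from an authorship model such as
             LUAR-MUD); lower similarity to the original = harder to attribute.
    sem_*:   semantic embeddings (e.g. from gte-large-en-v1.5); higher
             similarity to the original = meaning better preserved.
    """
    return cosine(sem_orig, sem_rewrite) - cosine(style_orig, style_rewrite)


def hypothetical_preference_pair(candidates):
    """Return (chosen, rejected) texts from a list of (text, reward) tuples:
    the highest-scoring candidate is chosen, the lowest is rejected."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


if __name__ == "__main__":
    original_style, original_sem = [1.0, 0.0], [1.0, 0.0]
    # Rewrite A: different style, same meaning -> high score
    a = hypothetical_reward(original_style, [0.0, 1.0], original_sem, [1.0, 0.0])
    # Rewrite B: same style, same meaning -> lower score
    b = hypothetical_reward(original_style, [1.0, 0.0], original_sem, [1.0, 0.0])
    print(hypothetical_preference_pair([("rewrite A", a), ("rewrite B", b)]))
```

In practice the vectors would come from encoding the original review and each candidate rewrite with the two reward models; how TAROT weights or combines those signals is described in the paper, not here.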

	## Example use
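```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gabrielloiseau/TAROT-DPO")
model = AutoModelForCausalLM.from_pretrained("gabrielloiseau/TAROT-DPO")

# GPT-2 tokenizers may ship without a pad token; reuse EOS so padding=True works,
# and pad on the left so batched prompts all end at the same position
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

paragraph = """I had dinner at Bella's Bistro last night, and it was a delightful experience.
As soon as I walked in, I was greeted warmly by the hostess, and the cozy, rustic decor made me feel right at home.
I started with the bruschetta, which was so fresh and flavorful—I could have eaten a whole meal of just that!"""

# Append the <|endoftext|> token after the input paragraph; the rewrite is sampled after it
inputs = tokenizer([paragraph + "<|endoftext|>"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=128,
                         pad_token_id=tokenizer.pad_token_id)

# Keep only the newly generated tokens (drop the prompt) and decode them
outputs = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```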
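The example slices `outputs[:, inputs["input_ids"].shape[1]:]` because `generate` on a decoder-only model returns the prompt followed by the continuation. When rewriting several reviews in one batch, the prompts should be padded on the left (`tokenizer.padding_side = "left"` in transformers): each row's last real token, here `<|endoftext|>`, then sits in the final column, generation continues directly from it, and one slice at the padded width removes every prompt. The pure-Python sketch below uses hypothetical helpers `left_pad` and `strip_prompt` with made-up token ids to show that layout without downloading the model.

```python
def left_pad(batch, pad_id):
    """Left-pad lists of token ids to a common width.

    Returns (ids, attention_mask); mask is 1 for real tokens, 0 for padding.
    Every row's last real token ends up in the final column.
    """
    width = max(len(seq) for seq in batch)
    ids = [[pad_id] * (width - len(seq)) + seq for seq in batch]
    mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return ids, mask


def strip_prompt(sequences, prompt_width):
    """Drop the padded prompt from generate()-style outputs (prompt + new tokens)."""
    return [seq[prompt_width:] for seq in sequences]


if __name__ == "__main__":
    ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
    print(ids)   # [[5, 6, 7], [0, 0, 8]]
    print(mask)  # [[1, 1, 1], [0, 0, 1]]
    # Simulated generate() output: each row = padded prompt + two new tokens
    generated = [ids[0] + [42, 43], ids[1] + [44, 45]]
    print(strip_prompt(generated, prompt_width=len(ids[0])))  # [[42, 43], [44, 45]]
```

With right padding the same slice would still cut at the padded width, but shorter prompts would be continued from pad tokens rather than from `<|endoftext|>`, which is why left padding is the usual choice for batched generation.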