Update README.md

91345c7 11 months ago

10.3 kB

	---
	license: openrail
	datasets:
	- Locutusque/ColumnedChatCombined
	language:
	- en
	metrics:
	- bleu
	- perplexity
	- loss
	- reward
	- penalty
	widget:
	- text: '<\|USER\|> Hello! <\|ASSISTANT\|> '
	pipeline_tag: conversational
	inference:
	parameters:
	temperature: 0.5
	do_sample: True
	top_p: 0.5
	top_k: 30
	max_new_tokens: 250
	repetition_penalty: 1.15
	---
	# Model Card
	* this model is deprecated please see https://huggingface.co/Locutusque/gpt2-conversational-retrain for a better performing model. *
	## Model Details
	- Model Name: gpt2-conversational-or-qa (prototype)
	- Model Type: Language Modeling
	- Task: Generating Conversational Responses
	- Hardware: 1x RTX 3060
	- Description: This model is trained on a dataset of conversations between a user and an AI assistant, with the goal of generating a coherent and relevant response to the user's input. It uses the GPT-2 architecture, a state-of-the-art transformer-based language model that is capable of generating high-quality text with a wide range of styles and tones. The model is fine-tuned on the conversational data using maximum likelihood estimation, and is evaluated based on its ability to generate responses that are both grammatically correct and semantically relevant to the user's input. I've also trained larger models such as https://huggingface.co/Locutusque/gpt2-medium-conversational and https://huggingface.co/Locutusque/gpt2-large-conversational
	## Intended Use
	This model is intended to be used for generating conversational responses in a variety of contexts, such as chatbots, virtual assistants, and customer service applications. It is designed to provide natural and engaging responses to user input, with a focus on maintaining a consistent tone and style throughout the conversation. The model is suitable for use in both text-based and voice-based interfaces, and can be easily integrated into existing applications using the PyTorch and Transformers frameworks.

	## Training Data
	The model is trained on a large dataset of conversational data, consisting of interactions between users and an AI assistant. The data is preprocessed to remove any sensitive information and is formatted in a way that is suitable for training a language model. The training data is split into a training set and a validation set, with the training set used to update the model parameters and the validation set used to evaluate the model performance. The model was trained on 245,000 examples over 1,225,000 steps, it achieved decent metrics.
	This model outperformed the base GPT-2 model significantly on a new conversational dataset during a fine-tuning session. Here is a side-by-side comparison of the two models during the first steps of training
	```python
	# Base GPT-2
	"""
	Epoch 1/5, Batch 1/10000: Loss - 64.9255, Reward - 260.0000, Penalty - 624.0000, BLEU - 0.0000
	Epoch 1/5, Batch 2/10000: Loss - 57.4635, Reward - 303.0000, Penalty - 870.0000, BLEU - 0.0000
	Epoch 1/5, Batch 3/10000: Loss - 67.8061, Reward - 295.0000, Penalty - 908.0000, BLEU - 0.0000
	Epoch 1/5, Batch 4/10000: Loss - 59.6118, Reward - 800.0000, Penalty - 740.0000, BLEU - 0.0000
	Epoch 1/5, Batch 5/10000: Loss - 67.4855, Reward - 402.0000, Penalty - 806.0000, BLEU - 0.0000
	Epoch 1/5, Batch 6/10000: Loss - 29.3718, Reward - 937.0000, Penalty - 760.0000, BLEU - 0.0000
	Epoch 1/5, Batch 7/10000: Loss - 79.0709, Reward - 390.0000, Penalty - 1114.0000, BLEU - 0.0000
	Epoch 1/5, Batch 8/10000: Loss - 61.4583, Reward - 385.0000, Penalty - 760.0000, BLEU - 0.0000
	Epoch 1/5, Batch 9/10000: Loss - 56.3084, Reward - 741.0000, Penalty - 560.0000, BLEU - 3.5500
	Epoch 1/5, Batch 10/10000: Loss - 80.0192, Reward - 838.0000, Penalty - 1424.0000, BLEU - 0.0000
	Epoch 1/5, Batch 11/10000: Loss - 51.8236, Reward - 228.0000, Penalty - 812.0000, BLEU - 0.0001
	Epoch 1/5, Batch 12/10000: Loss - 71.4071, Reward - 541.0000, Penalty - 982.0000, BLEU - 0.0000
	Epoch 1/5, Batch 13/10000: Loss - 33.3624, Reward - 910.0000, Penalty - 1002.0000, BLEU - 0.0027
	Epoch 1/5, Batch 14/10000: Loss - 55.9721, Reward - 808.0000, Penalty - 798.0000, BLEU - 0.0005
	Epoch 1/5, Batch 15/10000: Loss - 67.0336, Reward - 517.0000, Penalty - 764.0000, BLEU - 0.0000
	"""
	# Conversational GPT-2
	"""
	Epoch 1/5, Batch 1/10000: Loss - 6.1980, Reward - 887.0000, Penalty - 1500.0000, BLEU - 0.0648
	Epoch 1/5, Batch 2/10000: Loss - 4.5750, Reward - 245.0000, Penalty - 1618.0000, BLEU - 0.0008
	Epoch 1/5, Batch 3/10000: Loss - 5.1264, Reward - 600.0000, Penalty - 642.0000, BLEU - 5.7981
	Epoch 1/5, Batch 4/10000: Loss - 0.2995, Reward - 1020.0000, Penalty - 74.0000, BLEU - 13.8469
	Epoch 1/5, Batch 5/10000: Loss - 7.9377, Reward - 203.0000, Penalty - 1700.0000, BLEU - 0.3218
	Epoch 1/5, Batch 6/10000: Loss - 5.0522, Reward - 1020.0000, Penalty - 2034.0000, BLEU - 0.1946
	Epoch 1/5, Batch 7/10000: Loss - 2.0585, Reward - 925.0000, Penalty - 526.0000, BLEU - 16.1298
	Epoch 1/5, Batch 8/10000: Loss - 5.9736, Reward - 1009.0000, Penalty - 1844.0000, BLEU - 0.0085
	Epoch 1/5, Batch 9/10000: Loss - 6.0867, Reward - 245.0000, Penalty - 1690.0000, BLEU - 1.9342
	Epoch 1/5, Batch 10/10000: Loss - 7.8497, Reward - 155.0000, Penalty - 1780.0000, BLEU - 0.0115
	Epoch 1/5, Batch 11/10000: Loss - 3.8887, Reward - 1012.0000, Penalty - 2010.0000, BLEU - 0.6957
	Epoch 1/5, Batch 12/10000: Loss - 6.6133, Reward - 216.0000, Penalty - 1638.0000, BLEU - 1.7853
	Epoch 1/5, Batch 13/10000: Loss - 1.3319, Reward - 945.0000, Penalty - 374.0000, BLEU - 0.0075
	Epoch 1/5, Batch 14/10000: Loss - 2.6296, Reward - 956.0000, Penalty - 414.0000, BLEU - 3.2207
	Epoch 1/5, Batch 15/10000: Loss - 6.8827, Reward - 1013.0000, Penalty - 1970.0000, BLEU - 3.7418
	"""
	```
	## Model Architecture
	The model architecture used in this model is GPT-2, a transformer-based language model that is capable of generating high-quality text with a wide range of styles and tones. The GPT-2 architecture consists of a multi-layered decoder-only transformer, with self-attention mechanisms that allow the model to capture long-term dependencies and generate coherent text.

	## Evaluation Metrics
	The model is evaluated based on several metrics, including loss, reward, penalty, BLEU score, and perplexity. The loss metric is calculated during training and reflects the difference between the predicted output and the actual output. The reward metric is based on the number of correct words generated by the model, while the penalty metric penalizes the model for repeating words consecutively. The BLEU score measures the similarity between the generated text and the ground truth text, while the perplexity metric measures how well the model is able to predict the next word in a sequence. During validation, the model achieved the following metrics:

	- BLEU Score: 9
	- Perplexity: 19
	- Loss: 1.7


	## Limitations and Bias
	This model is not suitable for all use cases due to its limited training time on a weak computer. As a result, it may produce irrelevant or nonsensical responses. Additionally, it has not been fine-tuned to remember the chat history, is unable to provide follow-up responses, and it does not know the answer to many questions (it was only fine-tuned to respond in a conversational way). For optimal performance, we recommend using a GPU with at least 4GB of VRAM and downloading the model manually instead of using the Transformers library or deploying it on the Interface API. Here's how you should deploy the model:

	```python
	import torch
	from transformers import GPT2Tokenizer, GPT2LMHeadModel

	tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
	model = GPT2LMHeadModel.from_pretrained('gpt2')
	tokenizer.add_special_tokens({'pad_token': '[PAD]'})
	tokenizer.add_special_tokens({'eos_token': '<\|End\|>'})
	special_tokens = {
	"additional_special_tokens": ["<\|USER\|>", "<\|SYSTEM\|>", "<\|ASSISTANT\|>"]
	}
	tokenizer.add_special_tokens(special_tokens)
	model.resize_token_embeddings(len(tokenizer))
	model.load_state_dict(torch.load("path/to/model"))
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)
	def generate_text(model, tokenizer, prompt, max_length=1024):
	prompt = f'<\|USER\|> {prompt} <\|ASSISTANT\|> '
	input_ids = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt").to(device)
	attention_mask = torch.ones_like(input_ids).to(device)
	output = model.generate(input_ids,
	max_length=max_length,
	do_sample=True,
	top_k=35,
	top_p=0.80,
	pad_token_id=tokenizer.pad_token_id,
	eos_token_id=tokenizer.eos_token_id,
	attention_mask=attention_mask)
	output_ids = tokenizer.decode(output[0], skip_special_tokens=False)
	assistant_token_index = output_ids.index('<\|ASSISTANT\|>') + len('<\|ASSISTANT\|>')
	next_token_index = output_ids.find('<\|', assistant_token_index)
	output_ids = output_ids[assistant_token_index:next_token_index]
	return output_ids
	# Loop to interact with the model
	while True:
	prompt = input("Enter a prompt (or 'q' to quit): ")
	if prompt == "q":
	break
	output_text = generate_text(model, tokenizer, prompt)
	print(output_text)
	```
	## Deploying and training the model
	The model has been fine-tuned on a specific input format that goes like this ```"<\|USER\|> {user prompt} <\|ASSISTANT\|> {model prediction} <\|End\|>".``` For the best performance from the model the input text should be as follows ```<\|USER\|> {user prompt} <\|ASSISTANT\|> ``` and the target/label should be as follows ```<\|USER\|> {user prompt} <\|ASSISTANT\|> {dataset output} <\|End\|>```
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__gpt2-conversational-or-qa)

	\| Metric \| Value \|
	\|-----------------------\|---------------------------\|
	\| Avg. \| 25.09 \|
	\| ARC (25-shot) \| 21.42 \|
	\| HellaSwag (10-shot) \| 27.61 \|
	\| MMLU (5-shot) \| 26.51 \|
	\| TruthfulQA (0-shot) \| 47.31 \|
	\| Winogrande (5-shot) \| 51.14 \|
	\| GSM8K (5-shot) \| 0.08 \|
	\| DROP (3-shot) \| 1.55 \|