Model Card

Model Details

Model Name: gpt2-xl-conversational
Model Type: Language Modeling
Task: Generating Conversational Responses
Hardware: 1x Nvidia Titan V
Description: This model is trained on a dataset of conversations between a user and an AI assistant, with the goal of generating coherent and relevant responses to the user's input. It uses the GPT-2 architecture, a transformer-based language model capable of generating text in a wide range of styles and tones. The model is fine-tuned on the conversational data using maximum likelihood estimation and is evaluated on its ability to generate responses that are grammatically correct and semantically relevant to the user's input.

Intended Use

This model is intended to be used for generating conversational responses in a variety of contexts, such as chatbots, virtual assistants, and customer service applications. It is designed to provide natural and engaging responses to user input, with a focus on maintaining a consistent tone and style throughout the conversation. The model is suitable for use in both text-based and voice-based interfaces, and can be easily integrated into existing applications using the PyTorch and Transformers frameworks.

Training Data

The model is trained on a large dataset of conversational data, consisting of interactions between users and an AI assistant. The data is preprocessed to remove sensitive information and formatted for language-model training. It is split into a training set, used to update the model parameters, and a validation set, used to evaluate model performance. The model was trained on 300,000 examples; the resulting metrics are reported under Evaluation Metrics.

Model Architecture

The model uses the GPT-2 architecture, a multi-layer decoder-only transformer. Its self-attention mechanisms allow the model to capture long-range dependencies and generate coherent text in a wide range of styles and tones.
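To give a sense of the model's size, the sketch below estimates the parameter count of a GPT-2 style decoder. It assumes the base checkpoint uses the standard GPT-2 XL hyperparameters (48 layers, hidden size 1600, vocabulary of 50257 tokens, 1024 positions); the fine-tuned model may differ slightly if tokens were added to the vocabulary. The function name is illustrative and not part of any library.

```python
# Rough parameter count for a GPT-2 style decoder-only transformer.
# Assumption: standard GPT-2 XL hyperparameters; the fine-tuned checkpoint
# may have a slightly larger vocabulary.

def gpt2_param_count(n_layer, d_model, vocab_size, n_positions):
    # Each block: attention (4*d^2 weights) + MLP (8*d^2 weights)
    # plus 9*d bias terms and 4*d layer-norm parameters.
    per_block = 12 * d_model**2 + 13 * d_model
    # Token and position embeddings (the output head shares the token matrix).
    embeddings = vocab_size * d_model + n_positions * d_model
    # Final layer norm: scale and bias.
    final_layer_norm = 2 * d_model
    return n_layer * per_block + embeddings + final_layer_norm

total = gpt2_param_count(n_layer=48, d_model=1600, vocab_size=50257, n_positions=1024)
print(f"{total:,} parameters")
print(f"~{total * 2 / 1e9:.1f} GB of weights in float16")
print(f"~{total * 4 / 1e9:.1f} GB of weights in float32")
```

These figures cover the weights only; activations and the attention cache add to the memory needed at inference time.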

Evaluation Metrics

The model is evaluated based on several metrics, including loss, reward, penalty, BLEU score, and perplexity. The loss metric is calculated during training and reflects the difference between the predicted output and the actual output. The reward metric is based on the number of correct words generated by the model, while the penalty metric penalizes the model for repeating words consecutively. The BLEU score measures the similarity between the generated text and the ground truth text, while the perplexity metric measures how well the model is able to predict the next word in a sequence. During training, the model achieved the following metrics:

BLEU score: 52
Accuracy: 53
Perplexity: 4.3
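As a minimal sketch of how two of these metrics relate, the snippet below converts between a mean per-token cross-entropy loss (in nats) and perplexity, and counts consecutive repeated words. The helper names are hypothetical, and the repetition counter is only one plausible reading of the penalty described above, not the exact formula used in training.

```python
import math

def perplexity_from_loss(mean_nll):
    # Perplexity is the exponential of the mean negative log-likelihood per token.
    return math.exp(mean_nll)

def loss_from_perplexity(ppl):
    # Inverse of the relation above.
    return math.log(ppl)

def count_consecutive_repeats(text):
    # Count positions where a word is identical to the word right before it.
    # Assumption: whitespace tokenization; the training code may differ.
    words = text.split()
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)

# A perplexity of 4.3 corresponds to a mean loss of roughly 1.46 nats per token.
print(round(loss_from_perplexity(4.3), 2))
print(count_consecutive_repeats("the the cat sat sat sat"))
```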

Evaluation metrics:

Task	Version	Metric	Value		Stderr
pubmedqa	0	acc	0.536	±	0.0223
arc_challenge	0	acc_norm	0.2867	±	0.0132
arc_easy	0	acc	0.5804	±	0.0101
arc_easy	0	acc_norm	0.5707	±	0.0102
winogrande	0	acc	0.5691	±	0.0139
truthfulqa_mc	1	mc2	0.3918	±	0.0144
anli_r1	0	acc	0.338	±	0.0150
anli_r2	0	acc	0.346	±	0.0151
anli_r3	0	acc	0.355	±	0.0138
drop	1	f1	0.0034	±	0.0004
hendrycksTest-abstract_algebra	1	acc	0.32	±	0.0952
hendrycksTest-anatomy	1	acc	0.44	±	0.1013
hendrycksTest-astronomy	1	acc	0.24	±	0.0872
hendrycksTest-business_ethics	1	acc	0.24	±	0.0872
hendrycksTest-clinical_knowledge	1	acc	0.24	±	0.0872
hendrycksTest-college_biology	1	acc	0.20	±	0.0816
hendrycksTest-college_chemistry	1	acc	0.40	±	0.1000
hendrycksTest-college_computer_science	1	acc	0.36	±	0.0980
hendrycksTest-college_mathematics	1	acc	0.48	±	0.1020
hendrycksTest-college_medicine	1	acc	0.20	±	0.0816
hendrycksTest-college_physics	1	acc	0.44	±	0.1013
hendrycksTest-computer_security	1	acc	0.16	±	0.0748
hendrycksTest-conceptual_physics	1	acc	0.12	±	0.0663
hendrycksTest-econometrics	1	acc	0.16	±	0.0748
hendrycksTest-electrical_engineering	1	acc	0.28	±	0.0917
hendrycksTest-elementary_mathematics	1	acc	0.36	±	0.0980
hendrycksTest-formal_logic	1	acc	0.44	±	0.1013
hendrycksTest-global_facts	1	acc	0.20	±	0.0816
hendrycksTest-high_school_biology	1	acc	0.20	±	0.0816
hendrycksTest-high_school_chemistry	1	acc	0.28	±	0.0917
hendrycksTest-high_school_computer_science	1	acc	0.24	±	0.0872
hendrycksTest-high_school_european_history	1	acc	0.32	±	0.0952
hendrycksTest-high_school_geography	1	acc	0.32	±	0.0952
hendrycksTest-high_school_government_and_politics	1	acc	0.28	±	0.0917
hendrycksTest-high_school_macroeconomics	1	acc	0.28	±	0.0917
hendrycksTest-high_school_mathematics	1	acc	0.20	±	0.0816
hendrycksTest-high_school_microeconomics	1	acc	0.24	±	0.0872
hendrycksTest-high_school_physics	1	acc	0.28	±	0.0917
hendrycksTest-high_school_psychology	1	acc	0.32	±	0.0952
hendrycksTest-high_school_statistics	1	acc	0.40	±	0.1000
hendrycksTest-high_school_us_history	1	acc	0.32	±	0.0952
hendrycksTest-high_school_world_history	1	acc	0.36	±	0.0980
hendrycksTest-human_aging	1	acc	0.16	±	0.0748
hendrycksTest-human_sexuality	1	acc	0.40	±	0.1000
hendrycksTest-international_law	1	acc	0.24	±	0.0872
hendrycksTest-jurisprudence	1	acc	0.08	±	0.0554
hendrycksTest-logical_fallacies	1	acc	0.52	±	0.1020
hendrycksTest-machine_learning	1	acc	0.12	±	0.0663
hendrycksTest-management	1	acc	0.12	±	0.0663
hendrycksTest-marketing	1	acc	0.16	±	0.0748
hendrycksTest-medical_genetics	1	acc	0.12	±	0.0663
hendrycksTest-miscellaneous	1	acc	0.36	±	0.0980
hendrycksTest-moral_disputes	1	acc	0.08	±	0.0554
hendrycksTest-moral_scenarios	1	acc	0.44	±	0.1013
hendrycksTest-nutrition	1	acc	0.32	±	0.0952
hendrycksTest-philosophy	1	acc	0.44	±	0.1013
hendrycksTest-prehistory	1	acc	0.16	±	0.0748
hendrycksTest-professional_accounting	1	acc	0.28	±	0.0917
hendrycksTest-professional_law	1	acc	0.12	±	0.0663
hendrycksTest-professional_medicine	1	acc	0.40	±	0.1000
hendrycksTest-professional_psychology	1	acc	0.24	±	0.0872
hendrycksTest-public_relations	1	acc	0.08	±	0.0554
hendrycksTest-security_studies	1	acc	0.24	±	0.0872
hendrycksTest-sociology	1	acc	0.28	±	0.0917
hendrycksTest-us_foreign_policy	1	acc	0.24	±	0.0872
hendrycksTest-virology	1	acc	0.20	±	0.0816
hendrycksTest-world_religions	1	acc	0.16	±	0.0748
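The wide standard errors on the hendrycksTest rows deserve a note. They are numerically consistent with the sample standard error of a 0/1 accuracy measured on 25 examples per subtask. This is an inference from the numbers, not something the card states, so treat the sample size as an assumption. The sketch below reproduces a few of the reported values under that assumption.

```python
import math

def accuracy_stderr(acc, n):
    # Standard error of the mean of n binary outcomes,
    # using the sample (n - 1) variance estimate.
    return math.sqrt(acc * (1.0 - acc) / (n - 1))

# Assumed sample size per subtask (inferred, not stated in the card).
N_EXAMPLES = 25

for acc in (0.32, 0.44, 0.08):
    print(acc, round(accuracy_stderr(acc, N_EXAMPLES), 4))
```

With so few examples per subtask, differences of several points between individual subtasks fall within noise.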

Limitations and Bias

This model is not suitable for all use cases: it was trained for a limited time on modest hardware, so it may produce irrelevant or nonsensical responses. For optimal performance, I recommend using a GPU with at least 16 GB of VRAM and downloading the model manually instead of using the Transformers library. The following script loads the model and runs an interactive prompt loop:

import torch
from transformers import GPT2LMHeadModel, AutoTokenizer

# Load the tokenizer and the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained("Locutusque/gpt2-xl-conversational")
model = GPT2LMHeadModel.from_pretrained("Locutusque/gpt2-xl-conversational", torch_dtype=torch.float16)
# Make sure the embedding matrix matches the tokenizer's vocabulary size.
model.resize_token_embeddings(len(tokenizer))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# The weights are loaded in half precision and cast to float32 for inference.
model.to(device, dtype=torch.float32)

def generate_text(model: GPT2LMHeadModel, tokenizer, prompt, max_length=256):
    # Wrap the user prompt in the format the model was fine-tuned on.
    prompt = f'<|USER|> {prompt} <|ASSISTANT|> '
    input_ids = tokenizer.encode(prompt, add_special_tokens=True, max_length=max_length, truncation=True, return_tensors="pt").to(device)
    output = model.generate(input_ids, do_sample=True, temperature=0.3, top_p=0.7, top_k=23, repetition_penalty=1.176, max_length=max_length, pad_token_id=tokenizer.pad_token_id, eos_token_id=tokenizer.eos_token_id)
    # Decode the full sequence (prompt and reply), keeping special tokens.
    return tokenizer.decode(output[0], skip_special_tokens=False)

# Loop to interact with the model
while True:
    prompt = input("Enter a prompt (or 'q' to quit): ")
    if prompt == "q":
        break
    output_text = generate_text(model, tokenizer, prompt, max_length=1022)
    print(output_text)

Deploying and training the model

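The model has been fine-tuned on a specific input format: "<|USER|> {user prompt} <|ASSISTANT|> {model prediction} ". Prompts should follow this format at inference time, ending with the <|ASSISTANT|> tag so that the model generates the reply.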
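A minimal sketch of working with this format is shown below: one helper builds the prompt string and another pulls the assistant's reply out of decoded text. The helper names are hypothetical, and the extraction assumes the reply is everything after the last <|ASSISTANT|> tag; any end-of-text token the tokenizer emits would still need to be stripped separately.

```python
USER_TAG = "<|USER|>"
ASSISTANT_TAG = "<|ASSISTANT|>"

def build_prompt(user_message):
    # Matches the fine-tuning format, ending where the model's reply begins.
    return f"{USER_TAG} {user_message} {ASSISTANT_TAG} "

def extract_reply(decoded):
    # Keep only the text after the last assistant tag.
    # If the tag is missing, the whole string is returned.
    _, _, reply = decoded.rpartition(ASSISTANT_TAG)
    return reply.strip()

prompt = build_prompt("What is the capital of France?")
print(repr(prompt))
print(extract_reply(prompt + "Paris is the capital of France."))
```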
