---
base_model:
- MBZUAI/LaMini-GPT-774M
library_name: transformers
license: apache-2.0
model_name: ChatGPT-2.V2
tags:
- conversational-ai
- fine-tuning
- gpt2
- causal-lm
- chatbots
---

# ChatGPT-2.V2 Model Card

## Model Description

**ChatGPT-2.V2** is a fine-tuned version of the **LaMini-GPT-774M** instruction model, optimized for conversational AI tasks. The model is trained to generate coherent, context-aware responses for interactive chatbot applications, and was fine-tuned on a combination of public conversational datasets and curated, domain-specific datasets to improve its chat performance over the base model.

This model supports a context length of up to **1024 tokens**, enabling it to handle multi-turn conversations effectively.

---

## Fine-Tuning Process

The model was fine-tuned using **public conversational datasets** and **curated datasets** specifically tailored for interactive chat scenarios. The fine-tuning process aimed to:

- Enhance the model's ability to understand and respond to diverse conversational prompts.
- Improve context retention and relevance in multi-turn interactions.
- Achieve a balance between creativity and accuracy for engaging chatbot responses.

The training process converged to a **final loss of 1.2**.

---

## Key Features

- **Conversational Proficiency:** Designed for real-time chat applications with context-aware responses.
- **Fine-Tuned Context Handling:** Supports up to 1024 tokens, enabling robust multi-turn conversations.
- **Instruction-Based Foundation:** Built on the LaMini-GPT-774M instruction model, retaining its strengths in task-oriented dialogues.

---

## Training Details

- **Base Model:** MBZUAI/LaMini-GPT-774M
- **Fine-Tuning Framework:** Hugging Face Transformers
- **Datasets Used:**
  - Public conversational datasets (open-domain)
  - Custom curated datasets for domain-specific conversations
- **Context Length:** 1024 tokens
- **Final Loss:** 1.2
- **Learning Rate:** 1e-5
- **Training Epochs:** 3
- **fp16:** True

---

## Usage

The model is intended for conversational AI applications, such as:

- Chatbots for customer support
- Interactive virtual assistants
- Personalized conversational agents
  
### Inference Example

```python
# Load model directly
from transformers import AutoModelForCausalLM, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("suriya7/ChatGPT-2.V2")
model = AutoModelForCausalLM.from_pretrained("suriya7/ChatGPT-2.V2")

# Move the model to the GPU (if available) once, outside the chat loop
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

prompt = """
<|im_start|>system\nYou are a helpful AI assistant named Securitron, trained by Aquilax.<|im_end|>
"""

# Rolling history of the most recent conversation turns
conversation_history = []

while True:
    user_prompt = input("User Question: ")
    if user_prompt.lower() == 'break':
        break

    # Format the user's input
    user = f"""<|im_start|>user
{user_prompt}<|im_end|>"""

    # Add the user's question to the conversation history
    conversation_history.append(user)

    # Keep only the last five messages (the current question plus the two
    # previous user/assistant exchanges) so the prompt stays within context
    conversation_history = conversation_history[-5:]

    # Build the full prompt
    current_prompt = prompt + "\n".join(conversation_history)

    # Tokenize the prompt
    encodeds = tokenizer(current_prompt, return_tensors="pt", truncation=True).input_ids

    # Move the input ids to the same device as the model
    inputs = encodeds.to(device)

    # Start from the prompt ids; generated tokens are appended to this tensor
    generated_ids = inputs

    # Start generating tokens one by one
    assistant_response = ""
    for _ in range(512):  # Specify a max token limit for streaming
        next_token = model.generate(
            generated_ids,
            max_new_tokens=1,
            pad_token_id=50259,
            eos_token_id=50259,
            num_return_sequences=1,
            do_sample=True,
            top_k=50,
            temperature=0.2,
            top_p=0.90
        )
        
        generated_ids = torch.cat([generated_ids, next_token[:, -1:]], dim=1)
        token_id = next_token[0, -1].item()
        token = tokenizer.decode([token_id], skip_special_tokens=True)
        
        assistant_response += token
        print(token, end="", flush=True)

        if token_id == 50259:  # EOS token
            break

    print()
    # Re-wrap the generated reply in chat markers before storing it in history
    conversation_history.append(f"<|im_start|>{assistant_response.strip()}<|im_end|>")
```
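
Note that the loop above calls `model.generate` once per token and re-encodes the full sequence on every step, which is simple but slow. If you only need streamed console output, `transformers.TextStreamer` produces the same effect in a single `generate` call. A sketch reusing `tokenizer`, `model`, and `inputs` from the example above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

output_ids = model.generate(
    inputs,
    max_new_tokens=512,
    pad_token_id=50259,
    eos_token_id=50259,
    do_sample=True,
    top_k=50,
    temperature=0.2,
    top_p=0.90,
    streamer=streamer,
)

# Decode only the newly generated portion for the conversation history
assistant_response = tokenizer.decode(
    output_ids[0, inputs.shape[1]:], skip_special_tokens=True
)
```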

## Limitations
While the model performs well in general chat scenarios, it may encounter challenges in:

- Highly domain-specific contexts not covered during fine-tuning.
- Very long conversations that exceed the 1024-token context limit (see the sketch below for one mitigation).
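
For long sessions, one simple mitigation is to measure the prompt's token count before each generation and drop the oldest messages until it fits. A minimal sketch, reusing `tokenizer` from the inference example; the 256-token reply budget is an assumption:

```python
MAX_CONTEXT = 1024   # model context window
REPLY_BUDGET = 256   # assumption: tokens reserved for the generated reply

def fit_history(system_prompt, history, tokenizer):
    """Drop the oldest messages until the full prompt fits the context window."""
    history = list(history)
    while history:
        candidate = system_prompt + "\n".join(history)
        if len(tokenizer(candidate).input_ids) <= MAX_CONTEXT - REPLY_BUDGET:
            return candidate
        history.pop(0)  # discard the oldest message first
    return system_prompt
```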

## Additional Disclaimer

Please note that this model has not been specifically aligned using techniques such as Direct Preference Optimization (DPO) or similar methodologies. While the model has been fine-tuned to perform well in chat-based tasks, its responses are not guaranteed to reflect human-aligned preferences or ethical guidelines. Use with caution in sensitive or critical applications.