Mistral-7B-WikiFineTuned

This project fine-tunes the Mistral-7B-Instruct model on Wikitext-103, a dataset built from Wikipedia articles. The goal is a model that generates accurate, informative text in coherent, well-structured language.

Model Description

  • Base Model: Mistral-7B-Instruct
  • Fine-Tuned on: Wikitext-103-raw-v1
  • Purpose: To deliver as much useful information as possible from a short training run, while keeping the generated text accurate, coherent, and well structured.
  • License: MIT

How to Use

You can load this model with the Hugging Face transformers library. Below is a basic example of using it for text generation:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

model_id = "Mesutby/mistral-7B-wikitext-finetuned"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in 4-bit (requires a CUDA GPU and the bitsandbytes package)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Create the text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
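
Because the base checkpoint is Mistral-7B-Instruct, prompts wrapped in the instruct chat template may produce better answers than raw completion prompts. The sketch below reuses the model and tokenizer loaded above and assumes the fine-tuned tokenizer still ships the original chat template:

# Optional: format the prompt with the instruct chat template
# (assumes the fine-tuned tokenizer keeps Mistral's chat template)
messages = [{"role": "user", "content": "Give a short overview of the Wikitext-103 dataset."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(generated[0][input_ids.shape[-1]:], skip_special_tokens=True))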

Inference API

You can also use the model directly via the Hugging Face Inference API:

import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace YOUR_HF_TOKEN with your Hugging Face access token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "The future of AI is"})
print(output)
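
The Inference API also accepts generation parameters and options in the same payload; the field names below follow the standard text-generation task format (adjust as needed):

# Pass generation parameters and wait for the model to load if it is cold
output = query({
    "inputs": "The future of AI is",
    "parameters": {"max_new_tokens": 50, "temperature": 0.7},
    "options": {"wait_for_model": True},
})
print(output)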

Training Details

  • Framework Used: PyTorch
  • Optimization Techniques:
    • 4-bit quantization with bitsandbytes to reduce GPU memory usage.
    • Parameter-efficient fine-tuning (LoRA) with peft, and training run with accelerate (a sketch of this setup follows below).
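
The full training script is not included in this card. The sketch below shows one common way to wire up the 4-bit + LoRA setup described above with bitsandbytes and peft; the rank and dropout match the PEFT configuration listed under Training Configuration, while the base checkpoint name and target modules are assumptions:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit to cut GPU memory usage
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",   # assumed base checkpoint
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Prepare the quantized model for training and attach LoRA adapters
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    r=8,                                   # rank 8, as listed under Training Configuration
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],   # assumption: typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()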

Dataset

The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.
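
For reference, the dataset can be loaded with the datasets library; Wikitext-103 ships with train, validation, and test splits:

from datasets import load_dataset

# Load Wikitext-103 (raw) and pick the splits used for training and evaluation
dataset = load_dataset("wikitext", "wikitext-103-raw-v1")
train_data = dataset["train"]
eval_data = dataset["validation"]
print(dataset)  # DatasetDict with train / validation / test splits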

Training Configuration

  • Learning Rate: 2e-4
  • Batch Size: 4 (with gradient accumulation)
  • Max Steps: 125 (for demonstration; should ideally be higher, e.g., 1000)
  • Optimizer: Paged AdamW (32-bit)
  • Evaluation Strategy: Evaluation every 25 steps
  • PEFT Configuration: LoRA with rank 8 and a dropout of 0.1 (the training sketch below combines these settings)
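
A minimal sketch of how these settings map onto the Hugging Face Trainer, continuing from the LoRA model and dataset splits sketched above; the values mirror the list, while the output directory, accumulation factor, and tokenization step are assumptions:

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="mistral-7b-wikitext-finetuned",  # assumed output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,               # assumption: accumulation factor not stated in the card
    max_steps=125,                               # demo value; increase (e.g. 1000) for a real run
    optim="paged_adamw_32bit",                   # paged AdamW, 32-bit
    evaluation_strategy="steps",
    eval_steps=25,
)

trainer = Trainer(
    model=model,                  # LoRA-wrapped model from the sketch above
    args=training_args,
    train_dataset=train_data,     # assumes the splits have been tokenized
    eval_dataset=eval_data,
)
trainer.train()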

Evaluation

The model was evaluated on a held-out subset of Wikitext-103; evaluation metrics were logged every 25 steps during training (see Training Configuration).
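
If you want to run a simple quality check yourself, perplexity on the Wikitext validation split is a common choice. The sketch below is one straightforward, approximate way to compute it, reusing the model and tokenizer from the usage example above:

import math
import torch
from datasets import load_dataset

# Approximate perplexity over a small slice of the Wikitext-103 validation split
eval_text = "\n\n".join(load_dataset("wikitext", "wikitext-103-raw-v1", split="validation[:200]")["text"])
encodings = tokenizer(eval_text, return_tensors="pt")

window = 1024
losses = []
for start in range(0, encodings.input_ids.size(1), window):
    input_ids = encodings.input_ids[:, start:start + window].to(model.device)
    if input_ids.size(1) < 2:
        continue  # need at least two tokens for a shifted causal-LM loss
    with torch.no_grad():
        # The model shifts labels internally, so labels=input_ids gives the mean cross-entropy
        losses.append(model(input_ids, labels=input_ids).loss)

print(f"Approximate perplexity: {math.exp(torch.stack(losses).mean().item()):.2f}")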

Limitations and Biases

While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.

License

This model is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or issues, please contact [email protected].
