
Model Card for Meta-Llama-3.1-8B-openhermes-2.5

This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset.

Model Details

Model Description

This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks.

  • Developed by: artificialguybr
  • Model type: Causal Language Model
  • Language(s): English
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3.1-8B

Model Sources

  • Repository: https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5

Uses

This model can be used for various natural language processing tasks, particularly those involving instruction following and general language understanding.

Direct Use

The model can be used for tasks such as text generation, question answering, and other language-related applications.
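
For example, a minimal sketch of loading the model for text generation with πŸ€— Transformers; the prompt and generation settings below are illustrative assumptions, not values from this card:

```python
# Minimal sketch: text generation with Transformers.
# The prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```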

Out-of-Scope Use

The model should not be used for generating harmful or biased content. Users should be aware of potential biases in the training data.

Training Details

Training Data

The model was fine-tuned on the teknium/OpenHermes-2.5 dataset.
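
A quick way to inspect the dataset with the πŸ€— Datasets library (a sketch; the split name assumes the dataset's default layout):

```python
# Sketch: load and inspect the fine-tuning dataset.
from datasets import load_dataset

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds)     # column names and row count
print(ds[0])  # one raw example record
```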

Training Procedure

Training Hyperparameters

  • Training regime: BF16 mixed precision
  • Optimizer: AdamW
  • Learning rate: 2.4932e-6 at the start, with a decaying schedule
  • Batch size: not specified (gradient accumulation steps: 8; see the configuration sketch after this list)
  • Training steps: 13,368
  • Evaluation strategy: steps, evaluated every 1/6 of the total training steps (eval_steps β‰ˆ 0.1667)
  • Gradient checkpointing: enabled
  • Weight decay: 0
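
For reference, a sketch of how these hyperparameters map onto πŸ€— Transformers TrainingArguments. Training was actually run via Axolotl, so this is only an approximation; the per-device batch size is not reported on this card, and the placeholder below is an assumption:

```python
# Sketch mapping the reported hyperparameters onto TrainingArguments.
# per_device_train_batch_size is NOT reported on this card; 1 is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                      # BF16 mixed precision
    optim="adamw_torch",            # AdamW
    learning_rate=2.49e-6,          # approximate initial LR from this card
    weight_decay=0.0,
    per_device_train_batch_size=1,  # assumption, not from the card
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    max_steps=13_368,
    eval_strategy="steps",
    eval_steps=1 / 6,               # fraction of total steps, per the card
)
```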

Hardware and Software

  • Hardware: NVIDIA A100-SXM4-80GB (1 GPU)
  • Software Framework: πŸ€— Transformers, Axolotl

Evaluation

Metrics

  • Loss: 0.6727 (evaluation)
  • Perplexity: not reported, but it follows directly from the evaluation loss (see the sketch below)
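
Since the evaluation loss is a mean cross-entropy in nats, perplexity is simply its exponential:

```python
# Perplexity = exp(cross-entropy loss).
import math

eval_loss = 0.6727
print(math.exp(eval_loss))  # β‰ˆ 1.96
```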

Results

  • Evaluation runtime: 2,676.42 seconds (~44.6 minutes)
  • Samples per second: 18.711
  • Steps per second: 18.711

Model Architecture

  • Model Type: LlamaForCausalLM
  • Hidden size: 4,096
  • Intermediate size: 14,336
  • Number of attention heads: 32 (8 key/value heads, grouped-query attention)
  • Number of layers: 32
  • Activation function: SiLU
  • Vocabulary size: 128,256
  • Parameters: ~8.03B (stored as BF16 safetensors)
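
These fields can be read directly from the published model config; a minimal sketch:

```python
# Sketch: inspect architecture fields from the published config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5")
print(cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size)
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.num_key_value_heads)
print(cfg.hidden_act)  # "silu"
```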

Limitations and Biases

More information is needed about specific limitations and biases of this model.

