
Model Card for Meta-Llama-3.1-8B-openhermes-2.5

This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset.

Model Details

Model Description

This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks.

  • Developed by: artificialguybr
  • Model type: Causal Language Model
  • Language(s): English
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3.1-8B

Model Sources

  • Repository: https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5

Uses

This model can be used for various natural language processing tasks, particularly those involving instruction following and general language understanding.

Direct Use

The model can be used for tasks such as text generation, question answering, and other language-related applications.
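
For example, a minimal sketch of loading the model for text generation with πŸ€— Transformers; the prompt and generation settings below are illustrative assumptions, not values from this card:

```python
# Minimal sketch: text generation with Transformers.
# The prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```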

Out-of-Scope Use

The model should not be used for generating harmful or biased content. Users should be aware of potential biases in the training data.

Training Details

Training Data

The model was fine-tuned on the teknium/OpenHermes-2.5 dataset.
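
A quick way to inspect the dataset with the πŸ€— Datasets library (a sketch; the split name assumes the dataset's default layout):

```python
# Sketch: load and inspect the fine-tuning dataset.
from datasets import load_dataset

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds)     # column names and row count
print(ds[0])  # one raw example record
```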

Training Procedure

Training Hyperparameters

  • Training regime: BF16 mixed precision
  • Optimizer: AdamW
  • Learning rate: 2.4932e-6 at the start, with a decaying schedule
  • Batch size: not specified (gradient accumulation steps: 8; see the configuration sketch after this list)
  • Training steps: 13,368
  • Evaluation strategy: steps, evaluated every 1/6 of the total training steps (eval_steps β‰ˆ 0.1667)
  • Gradient checkpointing: enabled
  • Weight decay: 0
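
For reference, a sketch of how these hyperparameters map onto πŸ€— Transformers TrainingArguments. Training was actually run via Axolotl, so this is only an approximation; the per-device batch size is not reported on this card, and the placeholder below is an assumption:

```python
# Sketch mapping the reported hyperparameters onto TrainingArguments.
# per_device_train_batch_size is NOT reported on this card; 1 is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                      # BF16 mixed precision
    optim="adamw_torch",            # AdamW
    learning_rate=2.49e-6,          # approximate initial LR from this card
    weight_decay=0.0,
    per_device_train_batch_size=1,  # assumption, not from the card
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    max_steps=13_368,
    eval_strategy="steps",
    eval_steps=1 / 6,               # fraction of total steps, per the card
)
```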

Hardware and Software

  • Hardware: NVIDIA A100-SXM4-80GB (1 GPU)
  • Software Framework: πŸ€— Transformers, Axolotl

Evaluation

Metrics

  • Loss: 0.6727 (evaluation)
  • Perplexity: not reported, but it follows directly from the evaluation loss (see the sketch below)
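
Since the evaluation loss is a mean cross-entropy in nats, perplexity is simply its exponential:

```python
# Perplexity = exp(cross-entropy loss).
import math

eval_loss = 0.6727
print(math.exp(eval_loss))  # β‰ˆ 1.96
```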

Results

  • Evaluation runtime: 2,676.42 seconds (~44.6 minutes)
  • Samples per second: 18.711
  • Steps per second: 18.711

Model Architecture

  • Model Type: LlamaForCausalLM
  • Hidden size: 4,096
  • Intermediate size: 14,336
  • Number of attention heads: 32 (8 key/value heads, grouped-query attention)
  • Number of layers: 32
  • Activation function: SiLU
  • Vocabulary size: 128,256
  • Parameters: ~8.03B (stored as BF16 safetensors)
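
These fields can be read directly from the published model config; a minimal sketch:

```python
# Sketch: inspect architecture fields from the published config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5")
print(cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size)
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.num_key_value_heads)
print(cfg.hidden_act)  # "silu"
```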

Limitations and Biases

More information is needed about specific limitations and biases of this model.

