
Qwen2.5-0.5B-OpenHermes2.5

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the OpenHermes 2.5 dataset. Thanks to Redmond.ai for the GPU support!

Model description

This model is based on Qwen2.5-0.5B, which is part of the latest series of Qwen large language models. Qwen2.5 brings significant improvements over its predecessor, including:

  • Enhanced knowledge and capabilities in coding and mathematics
  • Improved instruction following and long text generation (over 8K tokens)
  • Better understanding of structured data and generation of structured outputs (especially JSON)
  • Increased resilience to diverse system prompts
  • Long-context support up to 128K tokens in the larger models of the series, with the ability to generate up to 8K tokens (this 0.5B model supports a 32K context, as listed below)
  • Multilingual support for over 29 languages

The base Qwen2.5-0.5B model has the following specification (verifiable with the config snippet after this list):

  • Type: Causal Language Model
  • Training Stage: Pretraining
  • Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
  • Number of Parameters: 0.49B (0.36B non-embedding)
  • Number of Layers: 24
  • Number of Attention Heads (GQA): 14 for Q and 2 for KV
  • Context Length: Full 32,768 tokens
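As a sanity check, these figures can be read straight from the checkpoint's configuration. The following is a minimal sketch; the field names are the standard Qwen2 config keys, which this fine-tune inherits unchanged from the base model:

```python
from transformers import AutoConfig

# Read the architecture details directly from this model's config.
cfg = AutoConfig.from_pretrained("artificialguybr/Qwen2.5-0.5B-OpenHermes2.5")
print(cfg.num_hidden_layers)        # 24 layers
print(cfg.num_attention_heads)      # 14 query heads
print(cfg.num_key_value_heads)      # 2 KV heads (GQA)
print(cfg.max_position_embeddings)  # 32768-token context
print(cfg.tie_word_embeddings)      # True (tied word embeddings)
```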

This fine-tuned version has been trained on the OpenHermes 2.5 dataset, which is a high-quality compilation of primarily synthetically generated instruction and chat samples, reaching 1M samples in total.

Intended uses & limitations

This model is intended for research and application in natural language processing tasks such as text generation, language understanding, and conversational AI. Because it has already been instruction-tuned on OpenHermes 2.5, it can follow chat-style prompts directly, though task-specific fine-tuning may further improve results.
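A minimal text-generation sketch (the prompt is illustrative; any short instruction works):

```python
from transformers import pipeline

# Minimal sketch: plain text generation with this checkpoint.
generator = pipeline(
    "text-generation",
    model="artificialguybr/Qwen2.5-0.5B-OpenHermes2.5",
)
result = generator(
    "Explain what a causal language model is in one sentence.",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```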

Limitations:

  • The fine-tuning applied here is supervised fine-tuning (SFT) on OpenHermes 2.5 only; no RLHF or other preference-alignment step is described, so conversational outputs may be less polished than those of fully aligned chat models.
  • The model's performance may vary across different languages and domains.
  • Users should be aware of potential biases present in the training data.

Training and evaluation data

This model was fine-tuned on the OpenHermes 2.5 dataset, which is a continuation and significant expansion of the OpenHermes 1 dataset. It includes:

  • A diverse range of open-source datasets
  • Custom-created synthetic datasets
  • 1 million primarily synthetically generated instruction and chat samples
  • High-quality, curated content that has contributed to advances in state-of-the-art (SOTA) LLMs

The dataset is notable for its role in the development of the Open Hermes 2/2.5 and Nous Hermes 2 series of models.
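For reference, the dataset can be inspected with the datasets library. This sketch assumes the public Hub copy at teknium/OpenHermes-2.5; the "conversations" field name is taken from that dataset card:

```python
from datasets import load_dataset

# Sketch, assuming the public Hub copy of OpenHermes 2.5.
ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(len(ds))                 # roughly 1M samples
print(ds[0]["conversations"])  # a list of chat turns for one sample
```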

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Hugging Face TrainingArguments follows the lists below):

  • learning_rate: 1e-05
  • train_batch_size: 5
  • eval_batch_size: 5
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 40
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
  • weight_decay: 0.01

Additional training details:

  • Gradient Checkpointing: Enabled
  • Mixed Precision: BF16 (auto)
  • Sequence Length: 4096
  • Sample Packing: Enabled
  • Pad to Sequence Length: Enabled
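For orientation, here is how these settings map onto Hugging Face TrainingArguments. This is an illustrative sketch only: the actual run was driven by an Axolotl config, and sample packing and pad-to-sequence-length are Axolotl-side options with no direct TrainingArguments equivalent.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; the actual run
# used Axolotl, not TrainingArguments directly.
args = TrainingArguments(
    output_dir="outputs/qwen2.5-0.5b-ft",
    learning_rate=1e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    gradient_accumulation_steps=8,  # 5 x 8 = total train batch size of 40
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    seed=42,
    bf16=True,                      # mixed precision
    gradient_checkpointing=True,
)
```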

Framework versions

  • Transformers 4.45.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Additional Information

This model was trained using the Axolotl framework (version 0.4.1). For more details on the base model, please refer to the Qwen2.5 blog, GitHub repository, and documentation.

To use this model, make sure you have Hugging Face transformers version 4.37.0 or later; earlier versions lack Qwen2 support.
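A chat-style usage sketch (Qwen2.5 tokenizers ship a chat template; the message content here is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Qwen2.5-0.5B-OpenHermes2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Build a prompt with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Summarize what RoPE does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```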

For support and further development of open-source language models, consider supporting the creator of the OpenHermes dataset on GitHub Sponsors.


