outputs/qwen2.5-0.5b-ft
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the OpenHermes 2.5 dataset. Thanks to Redmond.ai for the GPU support!
Model description
This model is based on Qwen2.5-0.5B, which is part of the latest series of Qwen large language models. Qwen2.5 brings significant improvements over its predecessor, including:
- Enhanced knowledge and capabilities in coding and mathematics
- Improved instruction following and long text generation (over 8K tokens)
- Better understanding of structured data and generation of structured outputs (especially JSON)
- Increased resilience to diverse system prompts
- Long-context support up to 128K tokens with the ability to generate up to 8K tokens
- Multilingual support for over 29 languages
The base Qwen2.5-0.5B model features:
- Type: Causal Language Model
- Training Stage: Pretraining
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- Number of Parameters: 0.49B (0.36B non-embedding)
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: Full 32,768 tokens
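These architecture details can be double-checked against the base model's published configuration. A minimal sketch, assuming network access to the Hugging Face Hub and the standard transformers Qwen2 config attribute names:

```python
# Sketch: read the architecture details listed above from the base model's
# configuration on the Hugging Face Hub (requires network access).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")
print(cfg.num_hidden_layers)        # expected: 24 layers
print(cfg.num_attention_heads)      # expected: 14 query heads
print(cfg.num_key_value_heads)      # expected: 2 KV heads (GQA)
print(cfg.tie_word_embeddings)      # expected: True (tied word embeddings)
print(cfg.max_position_embeddings)  # expected: 32768-token context
```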
This fine-tuned version has been trained on the OpenHermes 2.5 dataset, which is a high-quality compilation of primarily synthetically generated instruction and chat samples, reaching 1M samples in total.
Intended uses & limitations
This model is intended for research and application in natural language processing tasks. It can be used for various downstream tasks such as text generation, language understanding, and potentially conversational AI after appropriate fine-tuning.
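As an illustration, a minimal text-generation sketch is shown below. It assumes the fine-tuned weights live under the local path outputs/qwen2.5-0.5b-ft named above (a Hub repo id would work the same way) and that the saved tokenizer carries Qwen's ChatML-style chat template; adjust the path and prompt format to your setup.

```python
# Minimal inference sketch: load the fine-tuned checkpoint and generate a reply.
# "outputs/qwen2.5-0.5b-ft" is the local output path named above; substitute a
# Hub repo id if the weights have been pushed to the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "outputs/qwen2.5-0.5b-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain RMSNorm in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```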
Limitations:
- The base Qwen2.5-0.5B model is not recommended for direct conversational use without further fine-tuning or post-training (e.g., SFT or RLHF); this checkpoint applies SFT on OpenHermes 2.5 only, and its small size still limits conversational robustness.
- The model's performance may vary across different languages and domains.
- Users should be aware of potential biases present in the training data.
Training and evaluation data
This model was fine-tuned on the OpenHermes 2.5 dataset, which is a continuation and significant expansion of the OpenHermes 1 dataset. It includes:
- A diverse range of open-source datasets
- Custom-created synthetic datasets
- 1 million primarily synthetically generated instruction and chat samples
- High-quality, curated content that has contributed to advances in state-of-the-art (SOTA) LLMs
The dataset is notable for its role in the development of the Open Hermes 2/2.5 and Nous Hermes 2 series of models.
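For reference, the dataset can be inspected directly from the Hub. A small sketch, assuming the public teknium/OpenHermes-2.5 repository and its ShareGPT-style "conversations" column (field names may vary across revisions):

```python
# Sketch: peek at the OpenHermes 2.5 dataset used for this fine-tune.
# Assumes the public teknium/OpenHermes-2.5 Hub repository, where samples are
# stored as ShareGPT-style lists of {"from", "value"} turns.
from datasets import load_dataset

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds)                      # ~1M rows
print(ds[0]["conversations"])  # one multi-turn instruction/chat sample
```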
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
- weight_decay: 0.01
Additional training details:
- Gradient Checkpointing: Enabled
- Mixed Precision: BF16 (auto)
- Sequence Length: 4096
- Sample Packing: Enabled
- Pad to Sequence Length: Enabled
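The run itself was launched with Axolotl, so the exact configuration lives in the Axolotl YAML rather than in Python. As a rough, non-authoritative illustration only, the values above map onto Hugging Face TrainingArguments approximately as follows:

```python
# Rough Trainer-equivalent sketch of the hyperparameters listed above.
# The actual run used Axolotl (its config is not reproduced here); this only
# mirrors the reported values for readers familiar with transformers.Trainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/qwen2.5-0.5b-ft",
    learning_rate=1e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    gradient_accumulation_steps=8,   # effective batch size 5 * 8 = 40
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    optim="adamw_torch",             # betas=(0.9, 0.999), eps=1e-8 are the defaults
    seed=42,
    bf16=True,                       # mixed precision
    gradient_checkpointing=True,
    # Sample packing and the 4096-token sequence length are handled by
    # Axolotl's data pipeline, not by TrainingArguments.
)
```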
Framework versions
- Transformers 4.45.0.dev0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
Additional Information
This model was trained using the Axolotl framework. For more details on the base model, please refer to the Qwen2.5 blog, GitHub repository, and documentation.
To use this model, make sure you have a recent version of the Hugging Face transformers library (4.37.0 or later) to avoid compatibility issues.
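A quick optional check of the installed version (a sketch; the packaging module is already a dependency of transformers):

```python
# Optional sanity check that the installed transformers release is new enough
# (>= 4.37.0) for the Qwen2 architecture this checkpoint uses.
from packaging import version
import transformers

if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError("Please upgrade: pip install -U 'transformers>=4.37.0'")
```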
For support and further development of open-source language models, consider supporting the creator of the OpenHermes dataset on GitHub Sponsors.
Axolotl version: 0.4.1