Abstract
We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens following the core principles of Llama 2 and Mistral. We leverage and refine various techniques for pre-training large language models. Although our model is trained on significantly fewer total tokens compared to reference models of similar size, it exhibits highly competitive metrics across a multitude of benchmarks. We additionally release a chat model trained with supervised fine-tuning followed by direct preference optimization. We make H2O-Danube-1.8B openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.
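Since the model is released openly, a quick way to try it is through the Hugging Face transformers library. The sketch below is not from the paper: the repository id h2oai/h2o-danube-1.8b-chat, the use of the tokenizer's chat template, and the generation settings are assumptions for illustration only.

```python
# Minimal sketch for loading and prompting the released chat model with
# Hugging Face transformers. The repository id below is an assumption,
# not taken from the paper itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube-1.8b-chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; float32 also works on CPU
    device_map="auto",
)

# Build a chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain direct preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```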
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- TinyLlama: An Open-Source Small Language Model (2024)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (2024)
- PersianMind: A Cross-Lingual Persian-English Large Language Model (2024)
- Orion-14B: Open-source Multilingual Large Language Models (2024)
- Airavata: Introducing Hindi Instruction-tuned LLM (2024)
Models citing this paper: 29
Datasets citing this paper: 0