Abstract
We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens following the core principles of Llama 2 and Mistral. We leverage and refine various techniques for pre-training large language models. Although our model is trained on significantly fewer total tokens compared to reference models of similar size, it exhibits highly competitive metrics across a multitude of benchmarks. We additionally release a chat model trained with supervised fine-tuning followed by direct preference optimization. We make H2O-Danube-1.8B openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.
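Since the model is released openly, a quick way to try it is through the Hugging Face transformers library. The sketch below is not from the paper: the repository id h2oai/h2o-danube-1.8b-chat, the use of the tokenizer's chat template, and the generation settings are assumptions for illustration only.

```python
# Minimal sketch for loading and prompting the released chat model with
# Hugging Face transformers. The repository id below is an assumption,
# not taken from the paper itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube-1.8b-chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; float32 also works on CPU
    device_map="auto",
)

# Build a chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain direct preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```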
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- TinyLlama: An Open-Source Small Language Model (2024)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (2024)
- PersianMind: A Cross-Lingual Persian-English Large Language Model (2024)
- Orion-14B: Open-source Multilingual Large Language Models (2024)
- Airavata: Introducing Hindi Instruction-tuned LLM (2024)
Models citing this paper: 29
Datasets citing this paper: 0