arxiv:2309.00071

YaRN: Efficient Context Window Extension of Large Language Models

Published on Aug 31, 2023
Submitted by akhaliq on Sep 3, 2023
#1 Paper of the day

Abstract

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state of the art in context window extension. In addition, we demonstrate that YaRN can extrapolate beyond the limited context of a fine-tuning dataset. We publish checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn
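The abstract does not spell out the mechanism, so here is a minimal sketch of how the RoPE frequencies could be rescaled. It assumes the formulation in the body of the paper, which is not reproduced on this page. Each RoPE frequency θ_d = b^(-2d/|D|) is blended between its original value and the position-interpolated value θ_d/s. The blend depends on r = L/λ_d, the number of full rotations its wavelength λ_d = 2π/θ_d completes within the original context length L. It uses a linear ramp with α = 1 and β = 32, the values suggested for LLaMA. Attention logits are also scaled by a temperature with √(1/t) = 0.1·ln(s) + 1. All function names and defaults are illustrative and are not the API of the released code.

```python
import math


def rope_inv_freqs(dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE frequencies theta_d = base^(-2d/dim) for d = 0 .. dim/2 - 1."""
    return [base ** (-2 * d / dim) for d in range(dim // 2)]


def ramp(r: float, alpha: float, beta: float) -> float:
    """Piecewise-linear ramp gamma(r): 0 below alpha, 1 above beta, linear in between."""
    if r < alpha:
        return 0.0
    if r > beta:
        return 1.0
    return (r - alpha) / (beta - alpha)


def yarn_inv_freqs(dim: int, orig_ctx: int, scale: float,
                   base: float = 10000.0, alpha: float = 1.0,
                   beta: float = 32.0) -> list[float]:
    """Hypothetical helper: blend interpolated (theta/scale) and original theta per dimension.

    Dimensions whose wavelength completes many rotations inside the original
    context (r > beta) are left untouched. Dimensions completing less than
    alpha rotations are fully interpolated (divided by scale). Dimensions in
    between get a linear mix of the two.
    """
    out = []
    for theta in rope_inv_freqs(dim, base):
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength  # full rotations within the original context
        g = ramp(r, alpha, beta)
        out.append((1 - g) * theta / scale + g * theta)
    return out


def yarn_attention_factor(scale: float) -> float:
    """sqrt(1/t) = 0.1 * ln(s) + 1; multiply both q and k by this value."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    # Example: a 4k-context model with head dimension 128 extended 16x (to 64k).
    freqs = yarn_inv_freqs(dim=128, orig_ctx=4096, scale=16)
    print(freqs[0], freqs[-1], yarn_attention_factor(16))
```

In this 16x example, the highest-frequency dimension keeps θ_0 = 1 and the lowest-frequency dimension is divided by 16. Both q and k are multiplied by roughly 1.28, which scales the attention logits by about 1.63.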

Community


Have any further models been trained with YaRN?
The last update was only 2 months ago, but methods change fast. Has YaRN been overtaken?
https://github.com/jquesnelle/yarn

Why do the techies want to go after mere package managers and VCSs? lmao


Models citing this paper 371


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 259

Collections including this paper 23