bhenrym14 committed
Commit 9469d13
1 Parent(s): b6de33f

Update README.md

Files changed (1): README.md (+3, -3)
README.md CHANGED

@@ -3,7 +3,7 @@ datasets:
 - jondurbin/airoboros-2.1
 ---
 
-# Extended Context Airoboros-2.1, Llama-2-13b via YaRN (fp16)
+# Extended Context (via YaRN) Finetune of Llama-2-13b with airoboros-2.1 (fp16)
 
 
 ## Overview
@@ -12,7 +12,7 @@ This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co
 
 **This is a (merged) QLoRA fine-tune (rank 64)**.
 
-The finetune was performed with 1x RTX 6000 Ada.
+The finetune was performed with 1x RTX 6000 Ada (~18 hours).
 
 
 ## How to Use
@@ -27,7 +27,7 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 ## Motivation
 
-[Yet another RoPE extensioN method (YARN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additonal training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real world applications.
+[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additonal training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real world applications.
 
 ## Relative Performance (wikitext perplexity)
 
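For background on the Motivation section of the README being edited: Position Interpolation (PI), one of the methods YaRN aims to improve on, simply rescales token positions so that a longer context maps back into the position range seen during pretraining. The sketch below illustrates that idea only — it is not the YaRN method or this repo's code, and the function name, dimension, and base are illustrative defaults.

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    """Rotary-embedding angles for a single token position.

    With scale > 1, positions are compressed (Position Interpolation):
    position `pos` is treated as `pos / scale`, so e.g. a 4x-longer
    context reuses the angle range the model saw during pretraining.
    Illustrative sketch only; dim/base values are assumptions.
    """
    return [
        (pos / scale) / (base ** (2 * i / dim))
        for i in range(dim // 2)  # one angle per rotary frequency pair
    ]

# With scale=4, position 4096 yields the same angles as position 1024 unscaled.
assert rope_angles(4096, scale=4.0) == rope_angles(1024, scale=1.0)
```

NTK-aware scaling and YaRN instead adjust the frequency base (and, for YaRN, interpolate per-frequency), which is why they tend to need less additional training than plain PI; see the linked YaRN repository for the actual method.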