bhenrym14 committed
Commit 9469d13
1 Parent(s): b6de33f

Update README.md

Files changed (1): README.md (+3, -3)
README.md CHANGED

@@ -3,7 +3,7 @@ datasets:
 - jondurbin/airoboros-2.1
 ---
 
-# Extended Context Airoboros-2.1, Llama-2-13b via YaRN (fp16)
+# Extended Context (via YaRN) Finetune of Llama-2-13b with airoboros-2.1 (fp16)
 
 
 ## Overview
@@ -12,7 +12,7 @@ This is a finetune of [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co
 
 **This is a (merged) QLoRA fine-tune (rank 64)**.
 
-The finetune was performed with 1x RTX 6000 Ada.
+The finetune was performed with 1x RTX 6000 Ada (~18 hours).
 
 
 ## How to Use
@@ -27,7 +27,7 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 ## Motivation
 
-[Yet another RoPE extensioN method (YARN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additonal training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real world applications.
+[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additonal training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real world applications.
 
 ## Relative Performance (wikitext perplexity)
 
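For background on the Motivation section of the README being edited: Position Interpolation (PI), one of the methods YaRN aims to improve on, simply rescales token positions so that a longer context maps back into the position range seen during pretraining. The sketch below illustrates that idea only — it is not the YaRN method or this repo's code, and the function name, dimension, and base are illustrative defaults.

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    """Rotary-embedding angles for a single token position.

    With scale > 1, positions are compressed (Position Interpolation):
    position `pos` is treated as `pos / scale`, so e.g. a 4x-longer
    context reuses the angle range the model saw during pretraining.
    Illustrative sketch only; dim/base values are assumptions.
    """
    return [
        (pos / scale) / (base ** (2 * i / dim))
        for i in range(dim // 2)  # one angle per rotary frequency pair
    ]

# With scale=4, position 4096 yields the same angles as position 1024 unscaled.
assert rope_angles(4096, scale=4.0) == rope_angles(1024, scale=1.0)
```

NTK-aware scaling and YaRN instead adjust the frequency base (and, for YaRN, interpolate per-frequency), which is why they tend to need less additional training than plain PI; see the linked YaRN repository for the actual method.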