update readme
Browse files
README.md
CHANGED
@@ -6,7 +6,7 @@ license: cc-by-nc-4.0
 
| [Paper](https://arxiv.org/abs/2305.14314) | [Code](https://github.com/artidoro/qlora) |
 
-**The `LLaMA-2 QLoRA OpenOrca` models are open-source models obtained through 4-bit QLoRA tuning of LLaMA-2 base models on 240k examples of OpenOrca.**
+**The `LLaMA-2 QLoRA OpenOrca` models are open-source models obtained through 4-bit QLoRA tuning of LLaMA-2 base models on 240k examples of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca).**
 
⚠️ These models are purely intended for research purposes and could produce problematic outputs.
 
@@ -17,7 +17,7 @@ license: cc-by-nc-4.0
- **Lightweight** checkpoints which only contain adapter weights.
 
## License and Intended Use
-Note that the use of these adapter weights requires access to the LLaMA-2 model weights, and they should therefore be used according to the LLaMA-2 license.
+Note that the use of these adapter weights requires access to the LLaMA-2 model weights, and they should therefore be used according to the LLaMA-2 license. The adapter weights are trained on data obtained from the OpenAI GPT-3.5 and GPT-4 models (see the Finetuning Data section for more details). As such, any use of these adapters should also follow the license terms that apply to that data.
 
## Usage
Here is an example of how you would load the model in 4 bits:
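The 4-bit loading code referenced above sits outside the hunks shown in this diff (only its closing fence appears in the next hunk). As a minimal sketch of how an adapter like this is typically loaded in 4 bits, assuming the `transformers`, `peft`, and `bitsandbytes` libraries, with an assumed base-model id and a hypothetical adapter repo id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Placeholder identifiers: the real values are defined earlier in the README,
# outside the hunks shown in this diff.
model_name = "meta-llama/Llama-2-7b-hf"         # assumed base model id
adapter_name = "<qlora-openorca-adapter-repo>"  # hypothetical adapter repo id

# NF4 4-bit quantization in the spirit of the QLoRA recipe; the exact flags
# here are assumptions, not values taken from this model card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter weights on top.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Once loaded, `model` behaves like a regular `transformers` causal LM and can be used with the inference snippet in the next hunk.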
@@ -47,28 +47,21 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Inference can then be performed as usual with HF models as follows:
```python
-
+question = "Explain Einstein's theory of special relativity."
formatted_prompt = (
-    f"A chat between a curious human and an artificial intelligence assistant. "
-    f"The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
-    f"### Human: {prompt} ### Assistant:"
+    f"### Instruction: {question}\n\n### Response:"
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs=inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
-Expected output similar to the following:
-```
-A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
-### Human: Introduce yourself ### Assistant: I am an artificial intelligence assistant. I am here to help you with any questions you may have.
-```
 
## Model Card
**Architecture**: The models released here are LoRA adapters to be used on top of LLaMA-2 models. They are added to all layers. For all model sizes, we use $r=64$ (a configuration sketch follows after this hunk).
 
**Base Model**: These models use LLaMA-2 as the base model. LLaMA-2 is a causal language model pretrained on a large corpus of text. See the [LLaMA-2 paper](https://arxiv.org/abs/2307.09288) for more details. Note that these models can inherit the biases and limitations of the base model.
 
-**Finetuning Data**: These models are finetuned on 240k examples of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset.
+**Finetuning Data**: These models are finetuned on 240k examples of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset. The OpenOrca dataset is a replication of the [Orca](https://arxiv.org/abs/2306.02707) dataset, which uses FLAN v2 prompts and GPT-3.5/GPT-4 completions.
 
 
**Languages**: The different datasets cover different languages. We refer readers to the various papers and resources describing the datasets for more details.
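The **Architecture** note in the hunk above says the adapters are applied to all layers with $r=64$. A minimal sketch of what such an adapter configuration could look like with `peft` follows; only `r=64` comes from the model card, while the target-module list, `lora_alpha`, and dropout are assumptions in the spirit of the QLoRA setup:

```python
from peft import LoraConfig

# Hypothetical adapter configuration: rank-64 LoRA applied to every linear
# projection in the LLaMA-2 blocks. Only r=64 comes from the model card;
# the remaining values are assumptions.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```

Targeting every linear projection, rather than only the attention projections, is how the QLoRA paper describes adding adapters to all layers.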
@@ -84,13 +77,20 @@ For the finetuning process, we use constant learning rate schedule and paged Ada
### Training hyperparameters
| Parameters | Dataset | Batch size | LR | Steps | Source Length | Target Length |
|------------|----------|------------|------|-------|---------------|---------------|
-| 7B |
-| 13B |
-| 70B |
+| 7B | OpenOrca | 16 | 2e-4 | 15000 | 384 | 128 |
+| 13B | OpenOrca | 16 | 2e-4 | 15000 | 384 | 128 |
+| 70B | OpenOrca | 64 | 1e-4 | 3750 | 384 | 128 |
 
### Evaluation
We use the MMLU benchmark to measure performance on a range of language understanding tasks. This is a multiple-choice benchmark covering 57 tasks including elementary mathematics, US history, computer science, law, and more. We report 5-shot test accuracy.
 
+Dataset | 7B | 13B | 34B | 70B
+---|---|---|---|---
+LLaMA-2 no tuning | 45.3 | 54.8 | 62.6 | 68.9
+OpenOrca | 45.0 | | | 69.0
+
+For reference, here are the MMLU results of QLoRA finetuning on other datasets:
+
Dataset | 7B | 13B | 33B | 65B
---|---|---|---|---
LLaMA-1 no tuning | 35.1 | 46.9 | 57.8 | 63.4
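The rows added above specify the finetuning hyperparameters, and the hunk header mentions a constant learning-rate schedule with a paged AdamW-style optimizer. Purely as an illustration, here is how the 7B row could be expressed as Hugging Face `TrainingArguments`; the output directory, precision, and logging settings are placeholders not taken from the model card, and the source/target lengths from the table would be applied during data preprocessing rather than here:

```python
from transformers import TrainingArguments

# Illustrative mapping of the 7B row of the hyperparameter table onto
# TrainingArguments. The batch size is treated here as the per-device batch
# size; values not in the table are placeholders.
training_args = TrainingArguments(
    output_dir="./llama2-7b-qlora-openorca",  # hypothetical path
    per_device_train_batch_size=16,           # "Batch size" column
    learning_rate=2e-4,                       # "LR" column
    max_steps=15000,                          # "Steps" column
    lr_scheduler_type="constant",             # constant learning-rate schedule
    optim="paged_adamw_32bit",                # paged AdamW optimizer
    bf16=True,                                # assumption: bfloat16 training
    logging_steps=10,
)
```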
@@ -103,11 +103,6 @@ We use the MMLU benchmark to measure performance on a range of language understa
Alpaca | 38.8 | 47.8 | 57.3 | 62.5
FLAN v2 | 44.5 | 51.4 | 59.2 | 63.9
 
-Dataset | 7B | 13B | 34B | 70B
----|---|---|---|---
-LLaMA-2 no tuning | 45.3 | 54.8 | 62.6 | 68.9
-OpenOrca | 45.0 | | | 69.0
-
 
## Citation