Pankaj Mathur committed
Commit 4c98e7e • 1 Parent(s): 55dc209
Update README.md

README.md CHANGED
````diff
@@ -1,12 +1,16 @@
-# alpaca_orca_open_llama:
+# alpaca_orca_open_llama: An Open_LLaMA-3B model trained on Alpaca dataset using Orca Research paper approaches
 
 
 # Dataset and Training
 
 We train OpenLLaMa-3B model on the custom Alpaca dataset created using Orca Research Paper approaches.
+
 Please pay attention how System prompt is added and used for each instruction.
+
 The training configurations are provided in the table below.
+
 The training takes on 4 x A600(50G) GPUs and lasts for around 20 Hours for cost of $66.
+
 We used DeepSpeed with Zero-3 approaches for parallel gpu training.
 
 |||
@@ -62,4 +66,5 @@ with torch.no_grad():
 
     output = rest[0][length:]
     string = tokenizer.decode(output, skip_special_tokens=True)
-    print(f'[!] Generation results: {string}')
+    print(f'[!] Generation results: {string}')
+```
````
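The README states the run used DeepSpeed with ZeRO Stage 3 across the 4 GPUs. The actual training script and DeepSpeed config are not part of this diff, so the following is only a minimal sketch of what a ZeRO-3 setup looks like with the Hugging Face Trainer integration; the config keys are standard DeepSpeed options, but every concrete value (batch size, epochs, output path) is a placeholder, not the repository's real configuration.

```python
# A minimal ZeRO Stage 3 sketch using the Hugging Face Trainer's DeepSpeed
# integration. This is NOT the repository's real config (which is not in this
# diff); all concrete values are placeholders.
from transformers import TrainingArguments

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO-3: partition optimizer states, gradients, and parameters across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,  # reassemble full weights at save time
    },
    # "auto" lets the Trainer fill these in from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="out",                # placeholder
    per_device_train_batch_size=4,   # placeholder; the real values are in the README's table
    gradient_accumulation_steps=4,   # placeholder
    num_train_epochs=3,              # placeholder
    bf16=True,
    deepspeed=ds_config,             # hand the ZeRO-3 config to the Trainer
)

# A run like this is typically launched on all 4 GPUs with, e.g.:
#   torchrun --nproc_per_node=4 train.py
```

Stage 3 shards optimizer states, gradients, and the model parameters themselves across the data-parallel ranks, which is why it is the usual choice for multi-GPU full fine-tuning.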
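The diff's second hunk shows only the tail of the README's usage example (slicing off the prompt tokens and decoding the continuation). For context, a self-contained sketch of that flow is below, including how a system prompt might be prepended to each instruction as the README emphasizes. The model id, the "### System:"-style prompt markers, and the generation settings are assumptions for illustration; the committed README defines the actual template.

```python
# Illustrative sketch only: the model id, prompt template, and generation
# settings below are assumptions, not the committed README's exact code.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "psmathur/alpaca_orca_open_llama_3b"  # assumed Hub id for this repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# The Orca-style recipe prepends a system prompt to every instruction;
# this wording and template are illustrative guesses.
system = "You are an AI assistant that follows instructions very well. Help as much as you can."
instruction = "Explain in one paragraph what ZeRO Stage 3 partitions across GPUs."
prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
length = inputs["input_ids"].shape[1]  # number of prompt tokens

with torch.no_grad():
    rest = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=0.7)
    # Same post-processing as in the diff: drop the prompt tokens, decode only the new text.
    output = rest[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    print(f'[!] Generation results: {string}')
```

The last four lines mirror the hunk shown above: `length` is the prompt length in tokens, so `rest[0][length:]` keeps only the newly generated ids before decoding.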