Update README.md
README.md CHANGED
@@ -1,6 +1,7 @@
---
license: other
license_name: quasar-license
license_link: https://huggingface.co/AstraMindAI/AstraQuasar-4B/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
@@ -25,31 +26,38 @@ AstraQuasar-4B-v.0.1 at the moment is an under trained model. Serving as a demon…

One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

The use of the duplication trick has been shown to immediately decrease the loss by ~21%, with no added instability.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/V0QJe2S1y7pJfukFArsQ_.png"/>
</p>
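
The card does not spell out how the duplication trick is implemented, but the `layer_ranges` option shown in the example further down suggests that the same decoder layers are rerun over overlapping index ranges, adding effective depth without adding parameters. The snippet below is a minimal, illustrative sketch of that idea only; `DuplicatedStack` and its arguments are hypothetical names, not AstraQuasar's actual classes or API.

```python
import torch
import torch.nn as nn

class DuplicatedStack(nn.Module):
    """Illustrative sketch: rerun shared layers over overlapping ranges."""

    def __init__(self, layers: nn.ModuleList, layer_ranges, duplicate_trick=True):
        super().__init__()
        self.layers = layers                  # shared (not copied) decoder layers
        self.layer_ranges = layer_ranges      # e.g. [(0, 16), (8, 24), ...]
        self.duplicate_trick = duplicate_trick

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if not self.duplicate_trick:
            # plain pass: every layer is visited exactly once
            for layer in self.layers:
                hidden_states = layer(hidden_states)
            return hidden_states
        # duplicated pass: overlapping ranges revisit layers, so gradients
        # from every range flow back into the shared weights
        for start, end in self.layer_ranges:
            for layer in self.layers[start:end]:
                hidden_states = layer(hidden_states)
        return hidden_states

# toy demo with simple feed-forward "layers"
layers = nn.ModuleList(nn.Linear(16, 16) for _ in range(8))
stack = DuplicatedStack(layers, layer_ranges=[(0, 4), (2, 6), (4, 8)])
out = stack(torch.randn(2, 16))
```

Under this reading, "backpropagation on the duplication trick" simply means the duplicated passes stay in the autograd graph, so every range contributes gradients to the shared weights.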

Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

## Example:
AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")

# You can optionally disable the duplication trick
# model.model.duplicate_trick = False

# You can also disable the duplicate gradient calculation during training
# model.model.duplicate_grad = False

# You can specify the layer ranges for the duplication trick
# model.model.layer_ranges = [(0, 16), (8, 24), (17, 32), (25, 40), (33, 49), (40, 56)]

prompt = "This is an example script."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
generate_ids = model.generate(inputs.input_ids, max_length=30)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
```

Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
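
As a rough illustration of the **accelerate** path, here is a minimal fine-tuning sketch. The toy dataset, hyperparameters, and training loop are assumptions for demonstration, not the authors' recipe; a real run would typically be started with `accelerate launch`.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import Accelerator

accelerator = Accelerator()
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)

# toy corpus: replace with a real dataset for actual fine-tuning
texts = ["This is an example script.", "Another short training sample."]
encodings = [tokenizer(t, return_tensors="pt").input_ids.squeeze(0) for t in texts]
loader = DataLoader(encodings, batch_size=1, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for input_ids in loader:
    # causal LM objective: labels are the inputs themselves
    outputs = model(input_ids=input_ids, labels=input_ids)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()
```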