---
license: other
license_name: quasar-license
license_link: https://huggingface.co/AstraMindAI/AstraQuasar-4B/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- pretrained
- phi-2
inference:
  parameters:
    temperature: 0.7
---

<p align="center">
  <img align="center" width="300" src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/6qF8zpulToSKJaIYwWTWd.png" />
</p>
<p align="center">
<span style="font-size: 48px;">AstraQuasar-4B 32K</span>
</p>

<div style="clear: both;"></div>


**AstraQuasar-4B** is our first pre-trained Large Language Model (LLM) for text generation.
It has **4B parameters**, not counting embeddings.
AstraQuasar-4B-v0.1 is built on the Phi-2 architecture, with **significant enhancements, including an increased number of layers and the introduction of a novel technique we call the duplicate trick.**

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/y3RYjDPjp-jTm9dGg9jA8.png" width="800"/>
  </p>


AstraQuasar-4B-v0.1 is currently an under-trained model, serving as a demonstration of the potential of the duplication trick and its implications for future advancements in language modeling. Despite its early stage, the model already outperforms both the base Phi-2 model and earlier iterations of AstraQuasar-4B that do not use the duplication trick.

One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

In our experiments, the duplicate trick immediately decreased the loss by ~21% with no added instability.
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/V0QJe2S1y7pJfukFArsQ_.png"/>
  </p>
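
The actual implementation ships with the model's remote code; purely as an illustration of the idea, the sketch below re-runs overlapping layer ranges over the hidden states, so layers in the overlap are called more than once. The function name `run_layer_ranges` and its arguments are hypothetical and not part of the released code.

```python
import torch
from torch import nn

def run_layer_ranges(hidden_states: torch.Tensor,
                     layers: nn.ModuleList,
                     layer_ranges: list[tuple[int, int]]) -> torch.Tensor:
    # Hypothetical illustration, not the model's actual remote code:
    # each (start, end) range is applied in order, and because ranges
    # overlap, some layers are called twice ("duplicated"), deepening
    # the effective network without adding new weights.
    for start, end in layer_ranges:
        for layer in layers[start:end]:
            hidden_states = layer(hidden_states)
    return hidden_states
```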
  
Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

## Example:
AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")

# you can optionally disable the duplicate trick
# model.model.duplicate_trick = False

# you can also disable the duplicate gradient calculation during training
# model.model.duplicate_grad = False

# you can specify the layer ranges for the duplicate trick
# model.model.layer_ranges = [(0, 16), (8, 24), (17, 32), (25, 40), (33, 49), (40, 56)]

prompt = "I love my dog because "
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
generate_ids = model.generate(inputs.input_ids, max_length=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
```

Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
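
As one possible setup (not an official training recipe), the sketch below uses the standard Hugging Face `Trainer`, which can be launched with `accelerate launch` or handed a DeepSpeed config; the script name, output directory, and the tiny in-line dataset are placeholders.

```python
# finetune.py -- a minimal, hypothetical fine-tuning sketch.
# Launch with e.g. `accelerate launch finetune.py`.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for dynamic padding

# Placeholder training texts; replace with your own corpus.
texts = ["I love my dog because ...", "The quick brown fox ..."]
train_dataset = Dataset.from_dict({"text": texts}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="astraquasar-4b-finetune",  # placeholder output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,
    # deepspeed="ds_config.json",          # optionally hand off to DeepSpeed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```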

## Notice

It's important to note that AstraQuasar-4B is a pre-trained base model and does not incorporate any moderation mechanisms.

## NEWS

Stay tuned for exciting developments! A new architecture, **AstraPulsar**, is on the horizon, promising further advancements in language modeling.

## Credits: 
- [Undi95](https://huggingface.co/Undi95) for helping us figure out the process of self-calling layers.