Update README.md
---
library_name: transformers
tags:
- code
- trl
- qwen2
- aether code
license: other
datasets:
- thesven/AetherCode-v1
language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324ce4d5d0cf5c62c6e3c5a/NlTeemUNYet9p5963Sfhr.png)

# Model Card for Aether-Qwen2-0.5B-SFT-v0.0.2-GPTQ

This repo contains a 4-bit GPTQ quantization of the Aether-Qwen2-0.5B-SFT-v0.0.2 model.
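
To give a rough sense of why the 4-bit quantization matters for deployment, the back-of-envelope arithmetic below compares weight storage at fp16 versus 4-bit for a 0.5B-parameter model. This is a simplification: it ignores GPTQ overhead such as scales and zero-points, and any layers left unquantized.

```python
# Rough weight-memory estimate for a 0.5B-parameter model.
# Simplified: ignores GPTQ scales/zero-points and unquantized layers.
params = 0.5e9

fp16_bytes = params * 2      # 16 bits = 2 bytes per weight
gptq4_bytes = params * 0.5   # 4 bits = 0.5 bytes per weight

print(f"fp16:  {fp16_bytes / 2**30:.2f} GiB")   # ~0.93 GiB
print(f"4-bit: {gptq4_bytes / 2**30:.2f} GiB")  # ~0.23 GiB
```

In practice the quantized checkpoint is roughly a quarter the size of the fp16 weights, which is what makes the model practical on small GPUs.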

This model is an iteration of the Qwen2 model, fine-tuned using Supervised Fine-Tuning (SFT) on the AetherCode-v1 dataset specifically for code-related tasks. It combines the capabilities of the base Qwen2 model with specialized training to improve its performance in software development contexts.

## Model Details

### Model Description

Aether-Qwen2-0.5B-SFT-v0.0.2 is a transformer model built with the Hugging Face 🤗 Transformers library, designed to facilitate and improve automated coding tasks. It has been enhanced via Supervised Fine-Tuning (SFT) to better understand and generate code, making it well suited to software development, code review, and automated programming assistance.

- **Developed by:** Michael Svendsen
- **Finetuned from model:** Qwen2 0.5B

## Uses

### Direct Use

This model can be used directly wherever coding assistance is needed, providing capabilities such as code completion, error detection, and suggestions for code optimization.

### Downstream Use [optional]

Further fine-tuning on specific programming languages or frameworks can extend its utility to more specialized software development tasks.

### Out-of-Scope Use

The model should not be used for general natural language processing tasks outside the scope of programming and code analysis.

## Bias, Risks, and Limitations

Users should be cautious about relying solely on the model for critical software development tasks without human oversight, due to potential biases in the training data and limitations in understanding complex code contexts.

### Recommendations

Ongoing validation and testing on diverse coding datasets are recommended to ensure the model remains effective and unbiased.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# AutoModelForCausalLM keeps the language-modeling head needed for generation.
model = AutoModelForCausalLM.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
tokenizer = AutoTokenizer.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
```

or with a pipeline:

```python
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]
pipe = pipeline("text-generation", model="thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
print(pipe(messages))
```

### Prompt Template

```text
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}
```
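
When using Transformers, `tokenizer.apply_chat_template(messages, tokenize=False)` renders this template for you. The minimal formatter below is a hypothetical helper, not part of the model's code; it just makes explicit how the special tokens wrap each turn:

```python
# Minimal ChatML-style formatter matching the template above.
# Hypothetical helper for illustration; with transformers you would
# normally call tokenizer.apply_chat_template(messages, tokenize=False).
def format_chatml(messages):
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model completes it.
    prompt += "<|im_start|>assistant\n"
    return prompt

example = format_chatml([
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
])
print(example)
```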

## Training Details

### Training Data

The model was trained on the 5star split of the AetherCode-v1 dataset, which is designed to enhance coding-related AI capabilities.
### Training Procedure

**Training regime:** The model was trained for 3 epochs on an RTX 4500 using Supervised Fine-Tuning (SFT).

#### Preprocessing [optional]

Standard preprocessing techniques were applied to prepare the code data for training.
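
For readers who want to reproduce a similar run, the fragment below sketches a training configuration. Only the 3-epoch regime, the dataset id, and the 5star split are stated in this card; every other value is an assumed placeholder, not the actual recipe.

```python
# Sketch of an SFT configuration. Only num_train_epochs, dataset_id, and
# dataset_split come from this card; all other values are assumptions.
sft_config = {
    "dataset_id": "thesven/AetherCode-v1",
    "dataset_split": "5star",
    "num_train_epochs": 3,             # stated: 3 epochs on an RTX 4500
    "per_device_train_batch_size": 4,  # assumption
    "learning_rate": 2e-5,             # assumption
    "max_seq_length": 2048,            # assumption
}
# With TRL (one of this model's tags), values like these would typically be
# passed to SFTConfig / SFTTrainer from the trl library.
```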
|