mav23 committed
Commit 7deafd3
1 Parent(s): 066c675

Upload folder using huggingface_hub

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +226 -0
  3. cybertron-v4-qw7b-mgs.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ cybertron-v4-qw7b-mgs.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,226 @@
---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
base_model:
- Qwen/Qwen2.5-7B-Instruct
library_name: transformers
tags:
- generated_from_trainer
language:
- en
model-index:
- name: cybertron-v4-qw7B-MGS
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 62.64
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 37.04
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 27.72
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.05
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 13.2
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 38.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
---

# cybertron-v4-qw7B-MGS

**WE ARE BACK!** Cybertron v4, the #1 LLM in its class, based on the amazing Qwen2.5 7B.

**Scoring as the #1 LLM among 7B and 8B models, as of 30.10.2024.**

![cybertron-v4-MGS](https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS/resolve/main/cybertron_v4MGS.png)

Here we use our novel approach called `MGS`. It's up to you to figure out what it means.

Cybertron v4 went through SFT over `Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1`.

## Quants
Available at https://huggingface.co/bartowski/cybertron-v4-qw7B-MGS-GGUF
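
To try the Q4_0 quant added in this commit locally, here is a minimal sketch using `llama-cpp-python` (the runtime choice, context size, and generation settings are assumptions; any GGUF-compatible loader works):

```python
# Minimal sketch: run the Q4_0 GGUF from this commit with llama-cpp-python.
# Assumes the file has already been downloaded into the working directory.
from llama_cpp import Llama

llm = Llama(
    model_path="cybertron-v4-qw7b-mgs.Q4_0.gguf",  # file added in this commit
    n_ctx=4096,        # context window; lower it if memory is tight (assumption)
    n_gpu_layers=-1,   # offload all layers when built with GPU support; 0 for CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize MGS in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```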

## MGS
To be fair:

https://arxiv.org/pdf/2410.21228

MGS is, among other things, a strategy for tackling forgetting across corpora.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__cybertron-v4-qw7B-MGS)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 31.21 |
| IFEval (0-Shot)     | 62.64 |
| BBH (3-Shot)        | 37.04 |
| MATH Lvl 5 (4-Shot) | 27.72 |
| GPQA (0-shot)       |  8.05 |
| MuSR (0-shot)       | 13.20 |
| MMLU-PRO (5-shot)   | 38.59 |

## Try Cybertron v4!

Thanks to @rombodawg for contributing a free-to-use inference Space hosted at:

https://huggingface.co/spaces/rombodawg/Try_fblgit_cybertron-v4-qw7B-MGS
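
For local use with `transformers`, a minimal sketch (the model id comes from the leaderboard links above; the dtype and generation settings are placeholder assumptions):

```python
# Minimal sketch: load the full-precision model with transformers and run one chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/cybertron-v4-qw7B-MGS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; pick whatever your hardware supports
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```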

## Training procedure
One epoch, as usual.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

### Training hyperparameters

The following hyperparameters were used during training (a short sketch of how they combine follows the list):
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
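
As a rough illustration of how these settings fit together, a small sketch (the per-device batch size and gradient accumulation split are assumptions; only the totals are reported above):

```python
# Minimal sketch: the effective batch-size arithmetic and the listed Adam settings.
import torch

num_devices = 8
per_device_train_batch_size = 4       # assumed split
gradient_accumulation_steps = 4       # assumed split
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported total

model = torch.nn.Linear(8, 8)         # placeholder module standing in for the LLM
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.999), eps=1e-08)
```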

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7405        | 0.0007 | 1    | 0.5760          |
| 0.6146        | 0.0502 | 71   | 0.5045          |
| 0.5908        | 0.1003 | 142  | 0.4930          |
| 0.5669        | 0.1505 | 213  | 0.4854          |
| 0.5575        | 0.2007 | 284  | 0.4811          |
| 0.535         | 0.2508 | 355  | 0.4765          |
| 0.5161        | 0.3010 | 426  | 0.4736          |
| 0.5268        | 0.3511 | 497  | 0.4726          |
| 0.5119        | 0.4013 | 568  | 0.4701          |
| 0.5329        | 0.4515 | 639  | 0.4687          |
| 0.5167        | 0.5016 | 710  | 0.4673          |
| 0.5105        | 0.5518 | 781  | 0.4660          |
| 0.5203        | 0.6020 | 852  | 0.4653          |
| 0.5035        | 0.6521 | 923  | 0.4646          |
| 0.4903        | 0.7023 | 994  | 0.4641          |
| 0.5031        | 0.7525 | 1065 | 0.4628          |
| 0.5147        | 0.8026 | 1136 | 0.4629          |
| 0.5037        | 0.8528 | 1207 | 0.4620          |
| 0.5029        | 0.9029 | 1278 | 0.4620          |
| 0.492         | 0.9531 | 1349 | 0.4621          |

### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
## Citations
```
@misc{thebeagle-v2,
  title = {TheBeagle v2: MGS},
  author = {Xavier Murias},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS}},
}

@misc{qwen2.5,
  title = {Qwen2.5: A Party of Foundation Models},
  url = {https://qwenlm.github.io/blog/qwen2.5/},
  author = {Qwen Team},
  month = {September},
  year = {2024}
}

@article{qwen2,
  title = {Qwen2 Technical Report},
  author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
  journal = {arXiv preprint arXiv:2407.10671},
  year = {2024}
}
```
cybertron-v4-qw7b-mgs.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a547d9f7478dec0db228471d2ed09ea3e6d4480872fdc81225499271bb1b88ba
+ size 4431391168