Crystalcareai committed on
Commit
81c3cba
1 Parent(s): 1f10a29

Update README.md

Files changed (1)
  1. README.md +52 -64
README.md CHANGED
@@ -1,15 +1,60 @@
- ---
  license: other
- base_model: Qwen/Qwen2-72B
  tags:
  - generated_from_trainer
- model-index:
- - name: qwen2-72b
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
  <details><summary>See axolotl config</summary>
@@ -395,60 +440,3 @@ special_tokens:

  ```

- </details><br>
-
- # qwen2-72b
-
- This model is a fine-tuned version of [Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4737
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch  | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.5458        | 0.0007 | 1    | 0.5673          |
- | 0.4634        | 0.5003 | 741  | 0.4655          |
- | 0.4466        | 1.0007 | 1482 | 0.4550          |
- | 0.3817        | 1.4835 | 2223 | 0.4587          |
- | 0.386         | 1.9838 | 2964 | 0.4540          |
- | 0.3258        | 2.4664 | 3705 | 0.4737          |
-
-
- ### Framework versions
-
- - Transformers 4.40.2
- - Pytorch 2.2.2+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
 
  license: other
+ license_name: tongyi-qianwen
+ license_link: >-
+   https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE
+ base_model: Qwen/Qwen2-72B
  tags:
  - generated_from_trainer
+ - axolotl
+ datasets:
+ - cognitivecomputations/Dolphin-2.9
+ - teknium/OpenHermes-2.5
+ - m-a-p/CodeFeedback-Filtered-Instruction
+ - cognitivecomputations/dolphin-coder
+ - cognitivecomputations/samantha-data
+ - microsoft/orca-math-word-problems-200k
+ - Locutusque/function-calling-chatml
+ - internlm/Agent-FLAN
  ---

+ # Dolphin 2.9.2 Qwen2 72B 🐬
+
+ Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.
+
+ [![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
+ Discord: https://discord.gg/cognitivecomputations
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />
+
+ Our appreciation goes to the sponsors of Dolphin 2.9.2:
+ - [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node
+
+ This model is based on Qwen2-72B and is governed by the [tongyi-qianwen license](LICENSE).
+
+ The base model has a 32k context window, and this full-weight fine-tuning run used a 16k sequence length.
+
+ The model was trained with full-weight fine-tuning (FFT) on parameters selected by [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py), using the ChatML prompt template format.
+
+ Example:
+
+ ```
+ <|im_start|>system
+ You are Dolphin, a helpful AI assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+
+ ```
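
For reference, a minimal sketch of building this prompt programmatically via the `transformers` chat-template API. It assumes the released tokenizer ships a ChatML chat template; the repository id below is illustrative, not taken from this card.

```python
# Minimal sketch: render a ChatML prompt with the tokenizer's chat template.
# Assumes the tokenizer for this model bundles a ChatML template; the repo id
# below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.9.2-qwen2-72b")

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Explain gradient accumulation in one paragraph."},
]

# tokenize=False returns the formatted string rather than token ids;
# add_generation_prompt=True appends the opening <|im_start|>assistant tag
# so the model continues with its reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```

With `add_generation_prompt=True`, the rendered string ends in the opening `<|im_start|>assistant` tag, matching the template above.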
+
+ Dolphin-2.9.2 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.
+
+ Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.
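
One minimal way to sketch such a layer, assuming a ChatML-style message interface: pin a system prompt that states your policy, and screen completions before returning them. The `violates_policy` function below is a hypothetical placeholder for whatever moderation step you choose.

```python
# Hypothetical sketch of a minimal alignment layer: a pinned system prompt plus
# an output screen. violates_policy() is a placeholder; swap in keyword rules,
# a moderation API, or a classifier.
GUARDRAIL_SYSTEM_PROMPT = (
    "You are Dolphin, a helpful AI assistant. "
    "Decline requests for illegal or harmful content."
)

def violates_policy(text: str) -> bool:
    # Placeholder check; replace with a real moderation model or rule set.
    banned_phrases = ("synthesize a nerve agent",)
    return any(phrase in text.lower() for phrase in banned_phrases)

def guarded_chat(generate, user_prompt: str) -> str:
    # `generate` is any callable mapping a ChatML-style message list to a reply string.
    messages = [
        {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
    reply = generate(messages)
    return "I can't help with that." if violates_policy(reply) else reply
```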
+
+ Dolphin is licensed under Qwen's tongyi-qianwen license. We grant permission for any use, including commercial, that complies with that license. Dolphin was trained on data generated by GPT-4, among other models.
+
+ ## Evals
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/U86Zu-MzLq4rECJRAAvgq.png)

  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
  <details><summary>See axolotl config</summary>
 

  ```