TeeZee committed on
Commit
dba771d
1 Parent(s): dfda984

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +91 -0
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ base_model: []
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+
+ ---
+ # 2x_bagel-34b-v0.2
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the passthrough merge method.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * ./jondurbin_bagel-34b-v0.2
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ dtype: float32
+ merge_method: passthrough
+ slices:
+ - sources:
+   - layer_range: [0, 20]
+     model: "jondurbin_bagel-34b-v0.2"
+     parameters:
+       scale:
+       - filter: q_proj
+         value: 0.7071067812
+       - filter: k_proj
+         value: 0.7071067812
+       - value: 1
+ - sources:
+   - layer_range: [10, 30]
+     model: "jondurbin_bagel-34b-v0.2"
+     parameters:
+       scale:
+       - filter: q_proj
+         value: 0.7071067812
+       - filter: k_proj
+         value: 0.7071067812
+       - value: 1
+ - sources:
+   - layer_range: [20, 40]
+     model: "jondurbin_bagel-34b-v0.2"
+     parameters:
+       scale:
+       - filter: q_proj
+         value: 0.7071067812
+       - filter: k_proj
+         value: 0.7071067812
+       - value: 1
+ - sources:
+   - layer_range: [30, 50]
+     model: "jondurbin_bagel-34b-v0.2"
+     parameters:
+       scale:
+       - filter: q_proj
+         value: 0.7071067812
+       - filter: k_proj
+         value: 0.7071067812
+       - value: 1
+ - sources:
+   - layer_range: [40, 60]
+     model: "jondurbin_bagel-34b-v0.2"
+     parameters:
+       scale:
+       - filter: q_proj
+         value: 0.7071067812
+       - filter: k_proj
+         value: 0.7071067812
+       - value: 1
+ name: 2xbagel_fp32
+ ---
+ dtype: bfloat16
+ merge_method: passthrough
+ slices:
+ - sources:
+   - layer_range: [0, 100]
+     model: 2xbagel_fp32
+ name: bagel_new
+
+ ```
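
The arithmetic behind the configuration can be checked with a short Python sketch. It is not part of the commit and does not call mergekit; it only replays the slice boundaries and the scale value from the first YAML document. Two assumptions are made here: that `layer_range` is end-exclusive (consistent with the second document addressing layers `[0, 100]` of the intermediate model), and that `scale` multiplies the matching weight tensors. The names `SLICES`, `QK_SCALE`, and `merged_layer_sources` are hypothetical and used for illustration only.

```python
import math
from collections import Counter

# Slice boundaries copied from the first config document.
# Assumption: layer_range is end-exclusive, so [0, 20] covers layers 0..19.
SLICES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60)]

# Scale applied to q_proj and k_proj in every slice; all other tensors use 1.
QK_SCALE = 0.7071067812


def merged_layer_sources(slices):
    """Return, for each layer of the stacked model, the source layer it copies."""
    sources = []
    for start, end in slices:
        sources.extend(range(start, end))
    return sources


layer_map = merged_layer_sources(SLICES)

# Five slices of 20 layers give the 100 layers referenced by the second document.
print(len(layer_map))    # 100

# The boundary between the first and second slice: layer 19 is followed by layer 10.
print(layer_map[18:22])  # [18, 19, 10, 11]

# Source layers 10..49 appear twice, layers 0..9 and 50..59 appear once.
usage = Counter(layer_map)
print(sum(1 for n in usage.values() if n == 2))  # 40

# The scale value is 1/sqrt(2) to the printed precision. If both q_proj and
# k_proj are multiplied by it, the q.k attention logits are scaled by about 0.5.
print(math.isclose(QK_SCALE, 1 / math.sqrt(2), rel_tol=1e-9))  # True
print(round(QK_SCALE ** 2, 6))                                  # 0.5
```

The layer map shows why the result is roughly twice the depth of the source model: the overlapping 20-layer windows advance by 10 layers each, so most of the 60 source layers are copied twice.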