Josephgflowers committed
Commit bdb6c63 · Parent(s): 28e0db9

Update README.md

README.md CHANGED
Continued training for healing consisted of around 58,860 steps of full training on …

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/DoIsuqN_p_9fx0f1v5XUb.png)

1. Overall Flow in the Model
Each of these modules is integrated into the model’s modified decoder layer (ModifiedLlamaDecoderLayer). Here’s a high-level outline of the sequence in which they operate within the decoder:
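The decoder-layer code itself isn't included in this excerpt, so here is a minimal sketch of how that sequence could be wired, assuming a standard pre-norm Llama layout plus the two named additions. Only the class names come from this README; the placement of each module is an assumption, and AdaptiveRMSNorm / SEBlock are sketched in the component sections below.

```python
import torch.nn as nn

class ModifiedLlamaDecoderLayer(nn.Module):
    """Sketch only: adaptive norm -> attention -> adaptive norm -> MLP -> SE recalibration."""

    def __init__(self, self_attn, mlp, hidden_size):
        super().__init__()
        self.self_attn = self_attn                          # stock Llama self-attention block
        self.mlp = mlp                                      # stock Llama feed-forward block
        self.input_norm = AdaptiveRMSNorm(hidden_size)      # sketched below (assumption)
        self.post_attn_norm = AdaptiveRMSNorm(hidden_size)
        self.se_block = SEBlock(hidden_size)                # sketched below (assumption)

    def forward(self, hidden_states, attention_mask=None):
        # Pre-norm residual block around self-attention.
        residual = hidden_states
        hidden_states = residual + self.self_attn(self.input_norm(hidden_states), attention_mask)
        # Pre-norm residual block around the feed-forward network.
        residual = hidden_states
        hidden_states = residual + self.mlp(self.post_attn_norm(hidden_states))
        # Channel-wise recalibration of the layer output.
        return self.se_block(hidden_states)
```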
Let's break down how these components contribute to the model's overall performance:

2. Component-Level Contributions
Adaptive RMSNorm
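This excerpt doesn't show the Adaptive RMSNorm code, so the following is a minimal sketch of one common way to make RMSNorm input-adaptive: keep the usual RMS normalization and static gain, and add a learned, input-conditioned gain on top. The gain_proj layer and sigmoid gating are assumptions, not confirmed details of this model.

```python
import torch
import torch.nn as nn

class AdaptiveRMSNorm(nn.Module):
    """Sketch: RMSNorm whose per-feature scale is also modulated by the input itself."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(hidden_size))   # static gain, as in plain RMSNorm
        self.gain_proj = nn.Linear(hidden_size, hidden_size)  # hypothetical adaptive-gain head

    def forward(self, x):
        # Standard RMS normalization over the feature dimension.
        normed = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        # Input-conditioned, per-token gain in (0, 1) modulates the static scale.
        adaptive_gain = torch.sigmoid(self.gain_proj(x))
        return normed * self.weight * adaptive_gain
```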
Effect on Model: SEBlock helps the model emphasize or suppress specific features …
Performance Impact: Boosts the model’s expressiveness by allowing it to dynamically adjust which features are most relevant for each input. This helps improve generalization, especially when handling varied inputs with different feature relevances, such as conversations with shifting topics.
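For illustration, here is a squeeze-and-excitation block adapted to transformer hidden states of shape (batch, seq_len, hidden). The reduction ratio and the choice to squeeze by averaging over the sequence dimension are assumptions; the repository's actual SEBlock may differ.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch: squeeze-and-excitation gating over the channel (hidden) dimension."""

    def __init__(self, hidden_size, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // reduction),  # squeeze to a bottleneck
            nn.ReLU(),
            nn.Linear(hidden_size // reduction, hidden_size),  # expand back to channels
            nn.Sigmoid(),                                      # per-channel gates in (0, 1)
        )

    def forward(self, x):
        # Squeeze: summarize each channel by averaging across the sequence.
        squeezed = x.mean(dim=1)                  # (batch, hidden)
        # Excite: gates that emphasize or suppress each feature channel.
        gates = self.gate(squeezed).unsqueeze(1)  # (batch, 1, hidden)
        return x * gates
```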
3. Combined Effects and Benefits on the Model
When these components work together, they create a model that is both flexible and context-aware. Here’s how they synergize and improve model performance:
Improved Stability and Efficiency: Adaptive RMSNorm stabilizes the model's normalization …
Feature Recalibration and Channel Adaptation: SEBlock and Adaptive RMSNorm adapt the feature importance dynamically, giving the model a refined ability to select relevant information across channels and tokens. This can enhance interpretability and generalization across different types of inputs.
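To make the interplay concrete, here is a quick smoke test wiring the three sketches above into one layer. The Passthrough class is a stand-in for the real attention and feed-forward blocks, used only to exercise the data flow.

```python
import torch
import torch.nn as nn

class Passthrough(nn.Module):
    """Stand-in for the real attention / feed-forward blocks."""

    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x, attention_mask=None):
        return self.proj(x)

hidden = 64
layer = ModifiedLlamaDecoderLayer(Passthrough(hidden), Passthrough(hidden), hidden)
x = torch.randn(2, 10, hidden)  # (batch, seq_len, hidden)
print(layer(x).shape)           # torch.Size([2, 10, 64])
```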
4. Expected Performance Improvements
Accuracy and Generalization: The adaptive and context-sensitive adjustments should help the model generalize better to unseen data, as it dynamically adapts to different contexts and feature relevances.