Merge branch 'main' of https://huggingface.co/THUDM/glm-4-9b

Files changed:
- LICENSE (+1, -1)
- README.md (+7, -8)
- modeling_chatglm.py (+4, -7)
LICENSE CHANGED

```diff
@@ -45,7 +45,7 @@ The glm-4-9b License
 
 2. License
 
-
+Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
 This license allows you to use all open source models in this repository for free for academic research. For users who wish to use the models for commercial purposes, please do so [here](https://open.bigmodel.cn/mla/form)
 Complete registration. Registered users are free to use this model for commercial activities, but must comply with all terms and conditions of this license.
 The copyright notice and this license notice shall be included in all copies or substantial portions of the Software.
```
README.md CHANGED

```diff
@@ -2,15 +2,15 @@
 license: other
 license_name: glm-4
 license_link: https://huggingface.co/THUDM/glm-4-9b/LICENSE
-
 language:
-
-
+- zh
+- en
 tags:
-
-
-
+- glm
+- chatglm
+- thudm
 inference: false
+pipeline_tag: text-generation
 ---
 
 # GLM-4-9B
@@ -62,5 +62,4 @@ Use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
   pages={320--335},
   year={2022}
 }
-```
-
+```
```
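The YAML front-matter edits are what the Hub indexes: `language` and `tags` feed the search filters, and `pipeline_tag: text-generation` tells the Hub which task to associate with the repo. A minimal sketch of reading the patched metadata back, assuming the `huggingface_hub` package and network access (illustrative only, not part of this commit):

```python
# Sketch: read the model card metadata that this commit's YAML edits
# introduce; assumes huggingface_hub is installed and the repo is reachable.
from huggingface_hub import ModelCard

card = ModelCard.load("THUDM/glm-4-9b")

# Fields touched by this commit's front-matter changes.
print(card.data.language)      # expected after the patch: ['zh', 'en']
print(card.data.tags)          # expected after the patch: ['glm', 'chatglm', 'thudm']
print(card.data.pipeline_tag)  # expected after the patch: 'text-generation'
```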
modeling_chatglm.py CHANGED

```diff
@@ -253,15 +253,12 @@ class CoreAttention(torch.nn.Module):
         # This is actually dropping out entire tokens to attend to, which might
         # seem a bit unusual, but is taken from the original Transformer paper.
         attention_probs = self.attention_dropout(attention_probs)
-        # =========================
-        # Context layer. [sq, b, hp]
-        # =========================
-
-        # value_layer -> context layer.
-        # [sk, b, np, hn] --> [b, np, sq, hn]
 
+        # query layer shape: [b * np, sq, hn]
+        # value layer shape: [b, np, sk, hn]
+        # attention shape: [b, np, sq, sk]
         # context layer shape: [b, np, sq, hn]
-        output_size = (value_layer.size(1), value_layer.size(2), query_layer.size(0), value_layer.size(3))
+        output_size = (value_layer.size(0), value_layer.size(1), query_layer.size(1), value_layer.size(3))
         # change view [b * np, sk, hn]
         value_layer = value_layer.view(output_size[0] * output_size[1], value_layer.size(2), -1)
         # change view [b * np, sq, sk]
```