This model is compatible with tensor parallelism: the RHT runs per GPU rather than across GPUs. The `q`, `k`, `v`, `up`, and `gate` projections are split along the output channel, and the `o` and `down` projections are split along the input channel.
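To make the sharding concrete, here is a minimal NumPy sketch of why a per-GPU RHT still gives the same result as the unsharded layer. It rests on several assumptions that are not stated above. RHT is taken to mean a randomized Hadamard transform (a normalized Hadamard matrix times random ±1 signs). "TP8" is assumed to mean 8-way tensor parallelism. Shard sizes are assumed to be powers of two. The functions `hadamard`, `random_hadamard`, `column_parallel`, and `row_parallel` are hypothetical illustrations, not this model's kernels, and quantization of the rotated weights is not modeled. Each shard gets its own rotation, so the overall transform is block-diagonal across GPUs instead of spanning the full dimension.

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix, scaled to be orthogonal (n must be a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def random_hadamard(n, rng):
    """Hypothetical RHT: H @ diag(s) with random +/-1 signs s; still orthogonal."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return hadamard(n) * signs  # broadcasting scales column j by signs[j]

def column_parallel(w, x, tp, rng):
    """q/k/v/up/gate: shard W (out x in) along the output channel (rows).
    Each GPU uses its own RHT on its output slice; no cross-GPU transform."""
    outs = []
    for w_shard in np.split(w, tp, axis=0):
        u = random_hadamard(w_shard.shape[0], rng)  # local to this shard
        w_rot = u @ w_shard                          # rotated weights this GPU would store
        outs.append(u.T @ (w_rot @ x))               # undo the rotation locally
    return np.concatenate(outs)  # stands in for gathering the output shards

def row_parallel(w, x, tp, rng):
    """o/down: shard W along the input channel (columns) and x to match.
    Each GPU rotates only its own input slice; partial outputs are summed."""
    total = np.zeros(w.shape[0])
    for w_shard, x_shard in zip(np.split(w, tp, axis=1), np.split(x, tp)):
        v = random_hadamard(w_shard.shape[1], rng)   # local to this shard
        w_rot = w_shard @ v.T                        # rotated weights this GPU would store
        total += w_rot @ (v @ x_shard)               # W_i V^T V x_i == W_i x_i
    return total  # the sum stands in for the all-reduce

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
print(np.allclose(column_parallel(w, x, tp=8, rng=rng), w @ x))  # True
print(np.allclose(row_parallel(w, x, tp=8, rng=rng), w @ x))     # True
```

Because every rotation is orthogonal and confined to one shard, it cancels locally. Neither split needs communication beyond the usual gather (column-parallel) or all-reduce (row-parallel).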
This model has slightly lower quality than the corresponding non-TP8 model.