at676 committed d975892 (1 parent: e9b0a07)

Create README.md

Files changed (1)
  1. README.md +2 -0
README.md ADDED
@@ -0,0 +1,2 @@
+ This model is compatible with tensor parallelism. The randomized Hadamard transform (RHT) runs independently on each GPU rather than across GPUs. The q, k, v, up, and gate projections are split along the output channel, and the o and down projections are split along the input channel.
+ This model has slightly lower quality than the corresponding non-TP8 model.
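The tensor-parallel layout in this README can be illustrated with a minimal pure-Python sketch. It assumes a linear layer stored as a list of rows (out_features x in_features), assumes "RHT" means a random sign flip followed by an orthonormal Sylvester Hadamard matrix, and uses hypothetical helper names (`split_output_channels`, `split_input_channels`, `row_parallel_with_per_gpu_rht`, etc.) that are not part of this model's code or any library. Quantization is omitted: the sketch only checks that splitting along output channels (q, k, v, up, gate) and along input channels with a separate RHT per GPU (o, down) both reproduce `W x`.

```python
import math
import random


def matvec(w, x):
    """Return W x, where W is a list of rows (out_features x in_features)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def split_output_channels(w, n):
    """Hypothetical helper: give each of n GPUs a contiguous block of output rows
    (the split used here for q, k, v, up, and gate)."""
    rows = len(w) // n
    return [w[i * rows:(i + 1) * rows] for i in range(n)]


def split_input_channels(w, n):
    """Hypothetical helper: give each of n GPUs a contiguous block of input columns
    (the split used here for o and down)."""
    cols = len(w[0]) // n
    return [[row[i * cols:(i + 1) * cols] for row in w] for i in range(n)]


def hadamard(n):
    """Orthonormal n x n Hadamard matrix via Sylvester's construction (n a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def rht(v, signs):
    """Assumed RHT: U v = H diag(signs) v, with U orthogonal."""
    return matvec(hadamard(len(v)), [s * vi for s, vi in zip(signs, v)])


def column_parallel(w, x, n):
    """Every GPU sees the full input x and computes its slice of the output;
    concatenating the slices (an all-gather) reproduces W x."""
    out = []
    for shard in split_output_channels(w, n):
        out.extend(matvec(shard, x))
    return out


def row_parallel_with_per_gpu_rht(w, x, n, seed=0):
    """Every GPU holds one slice of x and the matching columns of W, and applies
    its own RHT of size d/n to both, so no transform spans GPUs. Summing the
    partial outputs (standing in for an all-reduce) reproduces W x because each
    per-GPU transform U_i is orthogonal."""
    rng = random.Random(seed)
    k = len(x) // n
    total = [0.0] * len(w)
    for i, w_i in enumerate(split_input_channels(w, n)):
        x_i = x[i * k:(i + 1) * k]
        signs = [rng.choice((-1.0, 1.0)) for _ in range(k)]
        # Row r of W_i U_i^T equals U_i applied to row r of W_i.
        # A quantized model would store these rotated weights quantized; skipped here.
        w_rot = [rht(row, signs) for row in w_i]
        partial = matvec(w_rot, rht(x_i, signs))
        total = [t + p for t, p in zip(total, partial)]
    return total


rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
x = [rng.uniform(-1, 1) for _ in range(8)]
ref = matvec(W, x)
print(column_parallel(W, x, 2) == ref)
print(max(abs(a - b) for a, b in zip(row_parallel_with_per_gpu_rht(W, x, 2), ref)))
```

The column-parallel check prints `True`, and the row-parallel check prints a rounding-level difference. Because each GPU's transform is orthogonal, the per-GPU RHT loses nothing in exact arithmetic; this sketch does not model the quantization step, so it says nothing about the quality difference noted above.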