relaxml/Llama-3.1-405B-Instruct-QTIP-4Bit-TP8

This model is compatible with tensor parallelism. The RHT (randomized Hadamard transform) is applied per GPU rather than across GPUs. The q, k, v, up, and gate projections are split along the output channel, and the o and down projections are split along the input channel.
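The sharding scheme described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the QTIP implementation. The names `hadamard`, `random_hadamard`, `column_parallel`, and `row_parallel` are hypothetical. The GPUs are simulated with a loop over shards, and the all-reduce is replaced by a plain sum. We assume the RHT is applied on both sides of each weight shard (transformed weight `U @ W_i @ V.T`) and that every transformed dimension is a power of two. The quantization of the transformed weight is omitted. The sketch shows that when every RHT acts only on data a single GPU holds, the sharded computation still reproduces the unsharded product exactly.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def random_hadamard(n: int, rng: np.random.Generator) -> np.ndarray:
    """Randomized Hadamard transform H @ diag(s) with random signs s (orthogonal)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return hadamard(n) * signs  # scales column j by signs[j]


def column_parallel(W: np.ndarray, x: np.ndarray, tp: int, rng: np.random.Generator) -> np.ndarray:
    """q/k/v/up/gate style: split W (out, in) along the output channel."""
    outs = []
    for Wi in np.split(W, tp, axis=0):        # each simulated GPU owns out/tp rows
        U = random_hadamard(Wi.shape[0], rng)  # RHT over the local output slice only
        V = random_hadamard(Wi.shape[1], rng)  # RHT over the (replicated) input, generated locally
        Wt = U @ Wi @ V.T                      # transformed weight (what would be quantized)
        outs.append(U.T @ (Wt @ (V @ x)))      # undo both RHTs on the same GPU
    return np.concatenate(outs)                # gather the disjoint output slices


def row_parallel(W: np.ndarray, x: np.ndarray, tp: int, rng: np.random.Generator) -> np.ndarray:
    """o/down style: split W (out, in) and x along the input channel."""
    partials = []
    for Wi, xi in zip(np.split(W, tp, axis=1), np.split(x, tp)):
        U = random_hadamard(Wi.shape[0], rng)  # RHT over the full (local) partial output
        V = random_hadamard(Wi.shape[1], rng)  # RHT over the local input shard only
        Wt = U @ Wi @ V.T
        partials.append(U.T @ (Wt @ (V @ xi)))
    return np.sum(partials, axis=0)            # stands in for the cross-GPU all-reduce


# Usage: an up -> down pair with hidden size 16, intermediate size 32, 4 simulated GPUs.
rng = np.random.default_rng(0)
hidden, inter, tp = 16, 32, 4
W_up = rng.standard_normal((inter, hidden))
W_down = rng.standard_normal((hidden, inter))
x = rng.standard_normal(hidden)

h = column_parallel(W_up, x, tp, rng)
y = row_parallel(W_down, h, tp, rng)
assert np.allclose(h, W_up @ x)
assert np.allclose(y, W_down @ h)
```

Column-parallel layers produce disjoint output slices, while row-parallel layers produce partial sums that must be reduced across GPUs. Pairing them lets the activation between the two layers stay sharded. In the sketch, `h` is concatenated and re-split only for clarity; in a real tensor-parallel setup each GPU would keep its own slice.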
This model has slightly worse quality than the non-TP8 model.