You can try to convert DeepSeek-V2.5 or Llama-3.1-Nemotron-70B-Instruct-HF?
My friend and I want to do this with Llama-3.1-70B-Instruct (we have the resources to finetune), but I don't know how to quantize a 70B model to 1.58 bit.
@medmekk do you have a script or anything to convert a model from the llama arch to the llama-bitnet arch? I mean efficient quantization of the linear layers.
To convert a model from the llama arch to llama-bitnet you can just check this PR: https://github.com/huggingface/nanotron/pull/180/files. Look at how the linears are defined with fake quantization, i.e. TensorParallelColumnLinearBitNet and TensorParallelRowLinearBitNet (don't worry about there being two linear types, row and column; that's just Tensor Parallelism). Also check this handbook from Microsoft, it helped me a lot: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
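To make the fake-quantization idea concrete, here's a minimal sketch of a BitLinear layer following the recipe described in the Microsoft handbook (absmean ternary weights, per-token absmax 8-bit activations, straight-through estimator). The class and function names here are illustrative, not the exact nanotron classes from the PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Ternarize weights to {-1, 0, +1} * scale, using the mean absolute
    # value of the weight matrix as the scaling factor (absmean).
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to 8 bits.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

class BitLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: quantized values in the forward
        # pass, full-precision gradients in the backward pass.
        x_q = x + (activation_quant(x) - x).detach()
        w_q = self.weight + (weight_quant(self.weight) - self.weight).detach()
        return F.linear(x_q, w_q, self.bias)

def convert_to_bitnet(model: nn.Module) -> nn.Module:
    # Recursively swap every nn.Linear for BitLinear, reusing the
    # original full-precision weights as the latent weights to finetune.
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and not isinstance(module, BitLinear):
            bit = BitLinear(module.in_features, module.out_features,
                            bias=module.bias is not None)
            bit.weight = module.weight
            if module.bias is not None:
                bit.bias = module.bias
            setattr(model, name, bit)
        else:
            convert_to_bitnet(module)
    return model
```

Note this is training-time fake quantization: the latent weights stay in full precision and only get quantized on the fly in the forward pass, which is why you still need to finetune rather than just round an existing 70B checkpoint.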