You can try to convert DeepSeek-V2.5 or Llama-3.1-Nemotron-70B-Instruct-HF?
My friend and I want to do this with Llama-3.1-70B-Instruct (we have the resources to finetune), but I don't know how to quantize a 70B model to 1.58 bit.
@medmekk do you have a script or anything to convert a model from the llama arch to the llama-bitnet arch? I mean efficient quantization of the linear layers.
To convert a model from the llama arch to llama-bitnet you can just check this PR: https://github.com/huggingface/nanotron/pull/180/files. Look at how the linears are defined with fake quantization, i.e. TensorParallelColumnLinearBitNet and TensorParallelRowLinearBitNet (don't worry about there being two linear types, row and column; that's just Tensor Parallelism). Also check this handbook from Microsoft, it helped me a lot: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
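To make the fake-quantization idea concrete, here's a minimal sketch of a BitLinear layer following the recipe described in the Microsoft handbook (absmean ternary weights, per-token absmax 8-bit activations, straight-through estimator). The class and function names here are illustrative, not the exact nanotron classes from the PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Ternarize weights to {-1, 0, +1} * scale, using the mean absolute
    # value of the weight matrix as the scaling factor (absmean).
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to 8 bits.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

class BitLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: quantized values in the forward
        # pass, full-precision gradients in the backward pass.
        x_q = x + (activation_quant(x) - x).detach()
        w_q = self.weight + (weight_quant(self.weight) - self.weight).detach()
        return F.linear(x_q, w_q, self.bias)

def convert_to_bitnet(model: nn.Module) -> nn.Module:
    # Recursively swap every nn.Linear for BitLinear, reusing the
    # original full-precision weights as the latent weights to finetune.
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and not isinstance(module, BitLinear):
            bit = BitLinear(module.in_features, module.out_features,
                            bias=module.bias is not None)
            bit.weight = module.weight
            if module.bias is not None:
                bit.bias = module.bias
            setattr(model, name, bit)
        else:
            convert_to_bitnet(module)
    return model
```

Note this is training-time fake quantization: the latent weights stay in full precision and only get quantized on the fly in the forward pass, which is why you still need to finetune rather than just round an existing 70B checkpoint.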