Navaneeth Malingan (nivu)
nivu's activity

upvoted an article 2 months ago

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

liked a Space 3 months ago
Reacted to macadeliccc's post with 👍 10 months ago
Benefits of imatrix quantization in place of QuIP#

QuIP# is a quantization method proposed by [Cornell-RelaxML](https://github.com/Cornell-RelaxML) that claims tremendous performance gains at only 2-bit precision.

RelaxML reports that by quantizing a model from 16-bit to 2-bit precision, Llama-2-70B can be run on a single 24 GB GPU.

QuIP# aims to revolutionize model quantization through a blend of incoherence processing and advanced lattice codebooks. By switching to a Hadamard transform-based incoherence approach, QuIP# enhances GPU efficiency, making weight matrices more Gaussian-like and ideal for quantization with its improved lattice codebooks.
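The incoherence-processing idea can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy, not QuIP#'s actual code: the function name and the outlier-dominated example matrix are made up for the demo, but the core trick is real — two-sided multiplication by randomized orthogonal Hadamard matrices preserves the weight matrix exactly up to an invertible rotation while spreading outlier energy across all entries.

```python
import numpy as np
from scipy.linalg import hadamard

def random_hadamard_rotation(W, seed=0):
    """Rotate W on both sides with randomized Hadamard matrices.

    Returns the rotated matrix plus the orthogonal factors, so the
    rotation can be undone after quantization. Both dimensions of W
    must be powers of two (a requirement of scipy.linalg.hadamard).
    """
    rng = np.random.default_rng(seed)
    n, m = W.shape
    Hl = hadamard(n) / np.sqrt(n)                  # orthogonal Hadamard matrix
    Hr = hadamard(m) / np.sqrt(m)
    Sl = np.diag(rng.choice([-1.0, 1.0], size=n))  # random sign flips
    Sr = np.diag(rng.choice([-1.0, 1.0], size=m))
    U, V = Hl @ Sl, Hr @ Sr                        # both orthogonal
    return U @ W @ V.T, U, V

# A weight matrix dominated by a single outlier entry:
W = np.zeros((64, 64))
W[0, 0] = 10.0
W_rot, U, V = random_hadamard_rotation(W)
# The rotation preserves the Frobenius norm but spreads the outlier's
# energy evenly, so the entries become far more uniform/Gaussian-like:
print(np.abs(W).max(), np.abs(W_rot).max())  # 10.0 vs ~0.16
```

Because `U` and `V` are orthogonal, `U.T @ W_rot @ V` recovers `W` exactly, which is what makes it safe to quantize the rotated weights and rotate back at inference time.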

This new method has already seen some adoption by projects like llama.cpp, where the QuIP# methodology has been implemented in the form of imatrix calculations. The importance matrix is computed from a calibration dataset such as wiki.train.raw, and the tool also reports perplexity on that dataset.

This interim step can improve the results of the quantized model. If you would like to explore this process for yourself:

llama.cpp - https://github.com/ggerganov/llama.cpp/
QuIP# paper - https://cornell-relaxml.github.io/quip-sharp/
AutoQuIP# colab - https://colab.research.google.com/drive/1rPDvcticCekw8VPNjDbh_UcivVBzgwEW?usp=sharing
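To make the importance-matrix idea concrete, here is a toy NumPy sketch. It assumes a simplified setting — per-channel importance estimated as the mean squared calibration activation, and a grid search over quantization scales that minimizes importance-weighted error — which mirrors the intuition behind llama.cpp's imatrix, not its actual implementation. All function names here are invented for the illustration.

```python
import numpy as np

def channel_importance(acts):
    """Per-input-channel importance: mean squared activation over calibration data."""
    return (acts ** 2).mean(axis=0)

def quantize_rtn(w, bits=4):
    """Plain symmetric round-to-nearest quantization of one weight row."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def quantize_with_imatrix(w, imp, bits=4):
    """Grid-search the scale that minimizes importance-weighted squared error."""
    qmax = 2 ** (bits - 1) - 1
    base = np.abs(w).max() / qmax
    best_q, best_err = None, np.inf
    for f in np.linspace(0.5, 1.0, 21):  # candidate scales; f=1.0 recovers RTN
        scale = base * f
        q = np.clip(np.round(w / scale), -qmax, qmax) * scale
        err = float((imp * (w - q) ** 2).sum())
        if err < best_err:
            best_q, best_err = q, err
    return best_q

rng = np.random.default_rng(0)
# Fake calibration activations with very unequal channel magnitudes:
acts = rng.normal(size=(256, 16)) * rng.uniform(0.1, 4.0, size=16)
w = rng.normal(size=16)  # one weight row of the layer being quantized
imp = channel_importance(acts)
err_rtn = float((imp * (w - quantize_rtn(w)) ** 2).sum())
err_im = float((imp * (w - quantize_with_imatrix(w, imp)) ** 2).sum())
print(err_im <= err_rtn)  # weighted search can only match or beat plain RTN
```

The design point is that error on channels that see large activations matters more for the layer's output, so the quantizer is allowed to sacrifice precision on unimportant channels to protect important ones.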

Other impressive quantization projects to watch:
+ AQLM
https://github.com/Vahe1994/AQLM
https://arxiv.org/abs/2401.06118
liked a Space about 3 years ago