Benefits of imatrix quantization in place of QuIP#
QuIP# is a quantization method proposed by [Cornell-RelaxML](https://github.com/Cornell-RelaxML) that claims strong model quality using only 2-bit precision.
RelaxML reports that by quantizing a model from 16-bit to 2-bit precision, Llama-2-70B can run on a single 24GB GPU.
QuIP# approaches model quantization through a blend of incoherence processing and lattice codebooks. By using a Hadamard transform for the incoherence step, QuIP# keeps the processing efficient on GPUs while making weight matrices more Gaussian-like, and therefore better suited to quantization with its lattice codebooks.
Related ideas have already seen some adoption in projects like llama.cpp, where they take the form of importance matrix (imatrix) calculations. The importance matrix is computed from a dataset such as wiki.train.raw, and the calculation also reports perplexity on that dataset as it runs.
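As a rough sketch (binary names and paths vary across llama.cpp versions; older builds name the tool simply `imatrix`), the importance matrix can be produced like this:

```bash
# Compute an importance matrix from a calibration text file.
# The model path is a placeholder; point it at your own GGUF file.
./llama-imatrix \
    -m models/llama-2-70b-f16.gguf \
    -f wiki.train.raw \
    -o imatrix.dat
```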
This intermediate step can improve the results of the quantized model. If you would like to explore the process for yourself:
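Below is a minimal sketch of the remaining steps, assuming the `imatrix.dat` file produced above and placeholder model paths (quantization type names and binary names depend on your llama.cpp version):

```bash
# Quantize the model, passing the importance matrix to guide the process.
# IQ2_XS is one of the 2-bit types that benefits from an imatrix; higher-bit
# types such as Q4_K_M accept one as well.
./llama-quantize --imatrix imatrix.dat \
    models/llama-2-70b-f16.gguf \
    models/llama-2-70b-iq2_xs.gguf \
    IQ2_XS

# Measure perplexity of the quantized model on a held-out set to check quality.
./llama-perplexity \
    -m models/llama-2-70b-iq2_xs.gguf \
    -f wiki.test.raw
```

Comparing the resulting perplexity against that of the full-precision model gives a quick sense of how much quality the 2-bit quantization retains.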