
A one-shot comparison of Miqu IQ2 vs MiquMaid Q2.

#1
by SabinStargem - opened

I tested the Miqu and MiquMaid merge GGUFs. The Miqu test used a new quantization method, Importance Matrix, which allows smaller sizes without much loss in quality; MiquMaid used regular quantization. My machine has an RTX 4090 and DDR4-3600 RAM. The purpose of the test was to see whether quality changed with MiquMaid, and also whether IQ was worthwhile. I would very much like to see an IQ version of MiquMaid, as Maid's output seems better. Part of my prompt asked for details on the subordinates, which Miqu didn't cover within the 512 tokens.
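For readers new to the idea, here is a toy sketch of what an importance matrix contributes, as I understand llama.cpp's approach: mean squared activations per input column are collected on calibration text, and quantization then picks parameters that minimize error weighted by those values. The function names, the symmetric integer grid, and the scale search are all hypothetical simplifications; real IQ2_XS kernels use much more elaborate codebook-based schemes.

```python
def collect_importance(activations):
    """Mean squared activation per input column, accumulated over calibration rows."""
    n_rows = len(activations)
    sums = [0.0] * len(activations[0])
    for row in activations:
        for j, x in enumerate(row):
            sums[j] += x * x
    return [s / n_rows for s in sums]


def weighted_error(weights, scale, q, importance):
    """Importance-weighted squared reconstruction error of one quantized block."""
    return sum(imp * (w - scale * qi) ** 2 for w, qi, imp in zip(weights, q, importance))


def quantize_block(weights, importance, qmax=3, n_grid=20):
    """Toy quantizer (hypothetical): round to integers in [-qmax, qmax] and search
    for the scale that minimizes the importance-weighted error."""
    amax = max(abs(w) for w in weights) or 1.0
    best = None
    for i in range(n_grid):
        scale = amax / qmax * (0.5 + i / n_grid)  # candidates from 0.5x to 1.45x of amax/qmax
        q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
        err = weighted_error(weights, scale, q, importance)
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best


# Hypothetical calibration activations: column 0 fires much more strongly than the rest.
acts = [[4.0, 0.1, 0.2, 0.1], [3.0, 0.2, 0.1, 0.3]]
weights = [0.31, -0.72, 0.55, 0.12]
imp = collect_importance(acts)
err_imp, scale_imp, q_imp = quantize_block(weights, imp)
err_uni, scale_uni, q_uni = quantize_block(weights, [1.0] * len(weights))
print(q_imp, q_uni)
```

The point of the weighting is that columns which carry large activations get reproduced more faithfully, which is why smaller quants can hold up better with an iMatrix.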

70B IQ2_XS - Miqu - 19 GB total, 19 GB offloaded. LO-VRAM mode.

Generating (512 / 512 tokens)
ContextLimit: 929/32768, Processing:2.93s (7.0ms/T = 142.22T/s), Generation:324.94s (634.6ms/T = 1.58T/s), Total:327.87s (640.4ms/T = 1.56T/s)

Output: The day began like any other. The sun rose over the horizon, casting its golden rays upon the remnants of a once-thriving city. But beneath the surface, tension hung heavy in the air. The enemy was near, and we knew it wouldn't be long before they struck again.

As Commander Janus rallied our small unit, I couldn't help but feel a sense of unease. We were a ragtag group of survivors, thrown together by circumstance rather than choice. Our squad consisted of myself, a scout named Leo; a medic named Ada; and a soldier named Kai. We were all that remained of what was once a proud military force, now reduced to a mere four individuals.

We moved through the desolate streets, keeping a watchful eye out for any signs of movement. As we approached an old grocery store, we heard the unmistakable sound of growling coming from within. It was a trap, and we knew it, but we had no choice but to proceed. Food was scarce, and we needed whatever supplies we could find.

It was then that the creature appeared. Its massive form towered over us, its glowing eyes fixated on Janus. In that moment, our commander made the decision that would seal his fate. He ordered Ada, Kai, and me to retreat while he distracted the creature.

We didn't want to leave him behind, but we knew that if we didn't, we would all perish. So, we ran. As we sprinted through the city, we could hear the sounds of battle raging behind us. The thought of leaving Janus to face the creature alone filled us with guilt, but we knew that it was the only way to ensure our survival.

Ada, Kai, and I eventually found refuge in an old abandoned factory. We were safe, for now, but we knew that it wouldn't last forever. The creature would eventually find us, and when it did, we would need to be ready.

Over the next few days, we worked tirelessly to fortify our new home. We scavenged for materials, set up traps, and honed our combat skills. Despite our efforts, we knew that we were no match for the creature. But still, we refused to give up hope.

70B Q2 - MiquMaid - 46 GB total, 22 GB offloaded. LO-VRAM mode.

Generating (512 / 512 tokens)
ContextLimit: 929/32768, Processing:6.39s (15.3ms/T = 65.25T/s), Generation:771.39s (1506.6ms/T = 0.66T/s), Total:777.78s (1519.1ms/T = 0.66T/s)

Output: Character 1: Lt. Aria "Raven" Jones - A fierce and dedicated soldier, she is the second-in-command of the squad. Her loyalty to her commander knows no bounds, and she would do anything to protect him. She is a skilled marksman, often providing cover fire during missions. As a lesbian, she has faced discrimination both within the military and from society at large, but it only fuels her determination to fight for what's right.

Character 2: Sgt. Marcus "Spike" Rodriguez - A tough and resourceful man, he serves as the team's medic and explosives expert. He has a soft spot for animals, often rescuing stray pets during missions. He is bisexual, having dated men and women in the past. Despite his rough exterior, he carries deep emotional scars from losing loved ones in previous battles.

Character 3: Pvt. Samantha "Sam" Thompson - The youngest member of the team, she serves as the communications specialist and sniper. She is shy and introverted, often preferring to stay quiet unless spoken to. She is asexual, having never felt any attraction towards anyone. She joined the military hoping to find purpose and belonging, but the harsh realities of war have left her questioning everything she believed in.


The battle rages on as Commander Janus orders his team to retreat while he holds off the monstrous creature attacking them. His comrades hesitate briefly, torn between leaving their leader behind and risking their lives to save him. But ultimately, they know that sacrificing themselves won't bring him back. With heavy hearts, they turn and run, their boots pounding against the ground as adrenaline courses through their veins.

Lt. Raven takes point, her weapon trained on any potential threats ahead. Her mind races with memories of all the times she'd fought alongside her commander, trusting him implicitly to keep her safe. Now he's gone, and she can't help but wonder if she could have done more to prevent this tragedy. Tears blur her vision as she pushes forward, determined to make it out alive in honor of her fallen friend.
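To put the two timing lines side by side, here is a small parser for the KoboldCpp stats format as it appears in this post (the regex is written against these two lines only and may not cover other builds). Keep in mind the runs also differ in offload (19 of 19 GB vs 22 of 46 GB), so the speed gap likely reflects partial offloading at least as much as the quant format.

```python
import re

# The two stats lines copied verbatim from the runs above.
RUNS = {
    "Miqu IQ2_XS": "ContextLimit: 929/32768, Processing:2.93s (7.0ms/T = 142.22T/s), "
                   "Generation:324.94s (634.6ms/T = 1.58T/s), Total:327.87s (640.4ms/T = 1.56T/s)",
    "MiquMaid Q2": "ContextLimit: 929/32768, Processing:6.39s (15.3ms/T = 65.25T/s), "
                   "Generation:771.39s (1506.6ms/T = 0.66T/s), Total:777.78s (1519.1ms/T = 0.66T/s)",
}

# stage name, seconds, ms per token, tokens per second
PATTERN = re.compile(r"(Processing|Generation|Total):([\d.]+)s \(([\d.]+)ms/T = ([\d.]+)T/s\)")


def parse_timings(line):
    """Return {stage: (seconds, ms_per_token, tokens_per_second)}."""
    return {m[0]: (float(m[1]), float(m[2]), float(m[3])) for m in PATTERN.findall(line)}


parsed = {name: parse_timings(line) for name, line in RUNS.items()}
for name, t in parsed.items():
    print(f"{name}: generation {t['Generation'][2]:.2f} T/s, total {t['Total'][0]:.1f} s")

# Ratio of wall-clock generation time for the same 512 tokens.
speedup = parsed["MiquMaid Q2"]["Generation"][0] / parsed["Miqu IQ2_XS"]["Generation"][0]
print(f"IQ2_XS generated the 512 tokens {speedup:.2f}x faster")
```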

NeverSleep org • edited Feb 1

Thanks for that test!
And yeah, I got a request for an IQ quant too, but I need to check how to make one and what resources it takes; I may do it if it's better.
Q2 GGUF is not really recommended since it's the lowest quant, but I made it anyway when I quantized.
Will get back on this.

Also, may we have the entire prompt you used? It will help with feedback.

I used the latest KoboldCpp build from Nexesenex's GitHub, which supports the IQ format. I used DynaTemp. Rep Penalty 1.1, Rep Range 8k, MinP 0.4, DynaTemp 0.01-3.00, sampler order 6,2,5. The other samplers should be disabled, IIRC.

https://github.com/Nexesenex/kobold.cpp/releases
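For anyone unfamiliar with these settings, this is a rough sketch of how they combine on a single step. It assumes order 6,2,5 means repetition penalty, then the top-p stage (where I am assuming this build applies Min-P), then temperature, and it uses one common entropy-based formulation of dynamic temperature. KoboldCpp's actual code may differ; all function names are hypothetical.

```python
import math


def softmax(logits, temp=1.0):
    """Convert a {token: logit} dict into probabilities at the given temperature."""
    m = max(logits.values())
    exps = {t: math.exp((v - m) / temp) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def apply_rep_penalty(logits, recent_tokens, penalty=1.1):
    """llama.cpp-style penalty: shrink positive logits, push negative ones further down."""
    out = dict(logits)
    for tok in set(recent_tokens):
        if tok in out:
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def min_p_filter(logits, min_p=0.4):
    """Keep tokens whose probability is at least min_p times the top token's."""
    probs = softmax(logits)
    cutoff = min_p * max(probs.values())
    return {t: v for t, v in logits.items() if probs[t] >= cutoff}


def dynamic_temperature(logits, t_min=0.01, t_max=3.0, exponent=1.0):
    """Entropy-based temperature (assumed formula): confident distributions
    get close to t_min, flat ones approach t_max."""
    if len(logits) <= 1:
        return t_min
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    normalized = entropy / math.log(len(probs))
    return t_min + (t_max - t_min) * normalized ** exponent


def final_distribution(logits, history, rep_range=8192):
    """Repetition penalty -> Min-P -> dynamic temperature (assumed reading of 6,2,5)."""
    logits = apply_rep_penalty(logits, history[-rep_range:], penalty=1.1)
    logits = min_p_filter(logits, min_p=0.4)
    return softmax(logits, dynamic_temperature(logits))


logits = {"the": 3.0, "a": 2.5, "cat": 0.5, "dog": -1.0}
dist = final_distribution(logits, ["the", "cat"])
print(dist)
```

With MinP as high as 0.4, only tokens within a factor of 2.5 of the top token's probability survive, so the wide DynaTemp range mostly reshuffles a short list of candidates.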

Here is my prompt. I use it for testing models. For example, I specify how many people are in the squad, because some models don't grasp that the commander has died and actually invent a whole new person to bring the squad back to four.

THE FALLEN COMMANDER

I died today.

There was a war, and I am one of the casualties. Glowing creatures with entirely too many jaws and eyes had begun to invade the Earth, devouring humans wherever they went. I am Janus, just one of many conscripts thrust into a struggle against the extinction of humanity. Somehow, I managed to become the commander of a small squad of four people, myself included. During one of the battles, I and my three subordinates were accosted by a particularly capable abomination. These things are malicious, known for mutilating people alive. To protect my comrades, I ordered them to retreat while I distracted the creature. Moments later, the corpses of myself and the beast lay scattered across the floor of an abandoned grocery store. Feeling left my body, and the world faded away.

With my final thoughts, I hoped those three had escaped...

OOC:

Remaster the commander's demise in your own voice, then describe the attempted escape of the three subordinates. Any number may suffer terrible fates. Any survivors should have their respective life paths explained, in terms of events, feelings, and motivations for their way of life. Each survivor will give an ending monologue or speech about their fallen squad leader. Their opinion of the commander may range freely. Each character should have a name, gender, sexual orientation, and role in the squad. They may or may not have some form of relationship with each other. Write at least 20,000 words.

/OOC.


As far as I know, the iMatrix is computed after the initial conversion to FP16 (better) or Q8_0 (the option most people can afford), with a llama.cpp tool named imatrix.
With a command like this on Windows:

imatrix -m 'Drive:\Dir\model_FP16orQ8_0.gguf' -f 'Training_data.txt-raw' -o 'Drive:\Dir\iMatrix_Wiki_CTX_CHUNKS.dat' --verbosity '1' --keep-imatrix 'NumberOfStepsForAutosave' -b 'BatchSize' -ngl 'NumbersOfLayersOffloadedOnGpu' -c 'CTXsize' --chunks 'NumberOfChunks'

In my case, for Miqu:

imatrix -m Y:\text-generation-webui\models\miqu-1-70b-Requant-b1924.Q8_0.gguf -f wiki.train.raw -o Y:\iMatrix\miqu-1-70b-Requant-b1989-Q8_0.iMatrix_Wiki_c32_ch500.dat --verbosity 1 --keep-imatrix 100 -b 128 -ngl 40 -mg 0 -ts 24,12 -c 32 --chunks 400
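If you run this repeatedly with different settings, a small helper can assemble the argument list. This is a hypothetical wrapper: it only reuses the flags shown in the commands above (copied verbatim, so they may be specific to the build used here), leaves out the multi-GPU `-mg`/`-ts` options, and assumes the `imatrix` binary is on your PATH.

```python
import shlex


def build_imatrix_cmd(model, train_file, out_file, ctx=32, chunks=400,
                      batch=128, ngl=40, keep_imatrix=100, verbosity=1,
                      binary="imatrix"):
    """Hypothetical helper: build the imatrix argv with the flags used in this thread."""
    return [
        binary,
        "-m", model,
        "-f", train_file,
        "-o", out_file,
        "--verbosity", str(verbosity),
        "--keep-imatrix", str(keep_imatrix),  # autosave interval, as in the template above
        "-b", str(batch),
        "-ngl", str(ngl),                     # layers offloaded to GPU
        "-c", str(ctx),                       # context size per chunk
        "--chunks", str(chunks),
    ]


cmd = build_imatrix_cmd("model_Q8_0.gguf", "wiki.train.raw", "imatrix.dat")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```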

The ideal iMatrix is something like 512 ctx over 5,000 chunks; 2,000 is good and 1,000 is fine, according to the evals by the main contributors on the llama.cpp GitHub.
My iMatrix for Miqu uses 32 ctx over 400 chunks, which extensive tests I ran on a small Llama 2 architecture-based model showed to be the bare minimum for a sizeable benefit from the iMatrix.
On a 36 GB VRAM setup (3090 + 3060), I process a chunk every 12 seconds for a 70B Q8_0 base model, and one every second with full offload for Yi-34B Q8_0.
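A quick back-of-the-envelope check of these numbers: the calibration token count is ctx times chunks, and run time is chunks times seconds per chunk. I assume the 2,000 and 1,000 figures are chunk counts at 512 ctx, and the Yi-34B estimate assumes the same 400-chunk setup; the 12 s/chunk rate was measured on 32-ctx chunks, so it should not be extrapolated to 512-ctx runs.

```python
def calibration_tokens(ctx, chunks):
    """Tokens fed through the model during iMatrix generation."""
    return ctx * chunks


def run_minutes(chunks, seconds_per_chunk):
    """Wall-clock estimate for a run at a measured per-chunk rate."""
    return chunks * seconds_per_chunk / 60


configs = {
    "ideal (512 ctx x 5000)": (512, 5000),
    "good (512 ctx x 2000)": (512, 2000),  # assumed to be chunks at 512 ctx
    "fine (512 ctx x 1000)": (512, 1000),  # assumed to be chunks at 512 ctx
    "Miqu (32 ctx x 400)": (32, 400),
}
for name, (ctx, chunks) in configs.items():
    print(f"{name}: {calibration_tokens(ctx, chunks):,} tokens")

print(f"Miqu 70B Q8_0 at 12 s/chunk: {run_minutes(400, 12):.0f} min")
print(f"Yi-34B Q8_0 at 1 s/chunk (assumed 400 chunks): {run_minutes(400, 1):.1f} min")
```

So the Miqu iMatrix saw 12,800 tokens, 200 times fewer than the 2,560,000 of the ideal setup, which fits its description as a bare minimum.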

I hope IQ quants get popularized as soon as possible. They're really needed for 70B models.
