gobean/WizardLM-2-7B.llamafile

This is a llamafile for WizardLM-2-7B, converted and tested on 4/15/2024. The safetensors came from Microsoft's Hugging Face repo, were quantized with llama.cpp, and zipaligned with llamafile.

The q3-k-l quant is under 4 GB, so it's the one to share with your Windows-only users. Quality is higher than the average high school student.

Instructions to run q3-k-l on Windows: Just download it, add '.exe' to the filename, and open it. Bypass all friendly Microsoft warnings about using your own computer. It doesn't need network access; everything runs locally.

Put it on a keychain! Share with friends! Perfect gift for significant other!

Other usage notes: Anything larger than the q3-k-l is over 4 GB and won't run as an .exe on Windows. For those you'll need WSL or another operating system.
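If you're not sure whether a given quant makes the cut, a quick size check helps. This is only a minimal sketch: fits_windows_exe is a made-up helper name, and the default limit assumes the 4 GB figure above means 4 GiB (4294967296 bytes). The optional second argument overrides that.

```shell
# Hypothetical helper: succeed if FILE is small enough to run as a Windows .exe.
# Assumption: the limit is 4 GiB; pass a different byte count as the second argument.
fits_windows_exe() {
    file="$1"
    limit="${2:-4294967296}"
    # Refuse to answer for a file that is not there
    [ -f "$file" ] || return 2
    # wc -c reports the size in bytes; $(( )) strips padding some platforms add
    size=$(( $(wc -c < "$file") ))
    [ "$size" -lt "$limit" ]
}
```

For example, `fits_windows_exe some-quant.llamafile && echo "fine as an .exe"`, where the filename is a placeholder for whatever you downloaded.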

WSL: If you get the error about APE, and the recommended command

sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop'

doesn't work, the WSLInterop file might be named something else. I had success with

sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop-late'

If that fails too, navigate to /proc/sys/fs/binfmt_misc, see which files look like WSLInterop, and echo a -1 to each of them by changing the filename at the end of the recommended command.
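The "whatever they're called" step can be scripted so you don't have to guess names. A rough sketch, assuming every relevant entry starts with WSLInterop; disable_wsl_interop is a hypothetical helper name, and the directory argument exists only so you can try it on a scratch folder first.

```shell
# Hypothetical helper: echo -1 into every entry whose name starts with WSLInterop.
# Assumption: all the entries you care about share that prefix.
# Defaults to the real binfmt_misc directory, which needs a root shell.
disable_wsl_interop() {
    dir="${1:-/proc/sys/fs/binfmt_misc}"
    found=0
    for entry in "$dir"/WSLInterop*; do
        # When nothing matches, the pattern is left as-is; skip it
        [ -e "$entry" ] || continue
        echo "disabling $entry"
        echo -1 > "$entry"
        found=1
    done
    # Report failure if there was nothing to disable
    [ "$found" -eq 1 ] || { echo "no WSLInterop entries in $dir" >&2; return 1; }
}
```

To use it for real, open a root shell with `sudo sh`, paste the function in, and run `disable_wsl_interop` with no arguments.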

Size note: Use q8_0, it's good.

-= Llamafile =-

A llamafile is a standalone executable that runs an LLM server locally on a variety of operating systems, including FreeBSD, Windows, Windows via WSL, Linux, and Mac. The same file works everywhere; I've tested several of these on FreeBSD, Windows, Windows via WSL, and Linux. You just download the .llamafile, chmod +x it (or rename it to .exe, as needed), run it, open the chat interface in a browser, and interact. Options can be passed in to expose the API, etc. See their docs for details.
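On Linux, Mac, FreeBSD, or WSL the download, chmod, run routine looks roughly like this. It's a sketch, not the official way to launch anything: start_llamafile is a name I made up, and any options you add are handed to the llamafile untouched.

```shell
# Hypothetical helper: make a downloaded llamafile executable and run it,
# passing any extra arguments straight through to it.
start_llamafile() {
    [ "$#" -ge 1 ] || { echo "usage: start_llamafile FILE [options...]" >&2; return 1; }
    file="$1"
    shift
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    # A bare filename would be looked up in PATH, so anchor it to the current directory
    case "$file" in
        */*) ;;
        *) file="./$file" ;;
    esac
    chmod +x "$file" || return 1
    "$file" "$@"
}
```

Something like `start_llamafile your-download.llamafile --host 0.0.0.0 --port 8080` is how I'd expect exposing the API to look. The filename is a placeholder, and those flags are what I believe the bundled server accepts, so check the llamafile's --help output before relying on them.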

Mozilla Blog Announcement for Llamafile

Windows: I tried the q3-k-l, it works.
FreeBSD note: Yes, it actually works on a fresh install of FreeBSD.