tsunemoto/TinyLlama-1.1B-Chat-v0.6-x8-MoE-GGUF

Tsunemoto GGUFs of TinyLlama-1.1B-Chat-v0.6-x8-MoE

This is a GGUF quantization of TinyLlama-1.1B-Chat-v0.6-x8-MoE.

Original Repo Link:

Original Repository

Original Model Card:

x8 MoE of https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6

Downloads last month: 187

GGUF

Model size: 6.43B params

Architecture: llama

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
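
The parameter count and bit widths listed above give a rough sense of how large each quantized file might be. The sketch below is only a back-of-the-envelope estimate: it assumes every weight is stored at exactly the nominal bit width, which is not how GGUF quantization schemes generally work (they mix precisions and store per-block scales), so the actual files in this repository will differ. The function name `approx_size_gb` is hypothetical and used for illustration only.

```python
# Back-of-the-envelope size estimate: params * bits / 8 bytes.
# Assumption: uniform nominal bit width with no metadata or per-block scales,
# so real GGUF files will not match these figures exactly.

PARAMS = 6.43e9  # parameter count reported on this page


def approx_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Return the nominal storage size in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bit widths listed on this page.
    for bits in (2, 3, 4, 5, 6, 8):
        print(f"{bits}-bit: ~{approx_size_gb(bits):.2f} GB")
```

Treat the results as an approximate guide for choosing a quantization that fits in available memory, and check the real file sizes in the repository before downloading.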