Model Depot - ONNX Collection
Leading models packaged in ONNX format, optimized for use with AI PCs (20 items)
tiny-llama-chat-onnx is an ONNX int4-quantized version of TinyLlama-Chat, providing a very fast, very small inference implementation optimized for AI PCs using Intel GPUs, CPUs, and NPUs.
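The description above notes the weights are int4-quantized. As a rough illustration of how block-wise int4 quantization trades precision for size, here is a minimal pure-Python sketch (the helper names are hypothetical; the actual ONNX quantizer packs two 4-bit values per byte and uses its own block format):

```python
def quantize_int4(values, block_size=32):
    """Quantize a list of floats to signed int4 [-8, 7], one shared scale per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block) or 1.0  # avoid divide-by-zero on all-zero blocks
        scale = max_abs / 7.0                        # map the largest magnitude onto +/-7
        q = [max(-8, min(7, round(v / scale))) for v in block]
        blocks.append((scale, q))
    return blocks

def dequantize_int4(blocks):
    """Reconstruct approximate floats from (scale, int4-values) blocks."""
    return [qi * scale for scale, qs in blocks for qi in qs]
```

Each block stores one float scale plus 4 bits per weight, so a 1.1B-parameter model shrinks to roughly a quarter of its fp16 size; the reconstruction error per weight is bounded by half the block's scale.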
tiny-llama-chat is the official chat-finetuned version of tiny-llama.
Base model
TinyLlama/TinyLlama-1.1B-Chat-v1.0