---
license: apache-2.0
datasets:
  - arcee-ai/EvolKit-20k
base_model:
  - Qwen/Qwen2.5-1.5B
---

QuantFactory Banner

QuantFactory/EVA-D-Qwen2.5-1.5B-v0.0-GGUF

This is a quantized version of EVA-UNIT-01/EVA-D-Qwen2.5-1.5B-v0.0, created using llama.cpp.

Original Model Card

EVA-D Qwen2.5-1.5B v0.0

An experimental online logit distillation of EVA-Qwen2.5-14B-v0.1 into Qwen2.5-1.5B. It should work as an RP/storywriting specialist, but don't expect superb performance from it due to its small size. All in all, it was a fun experiment.
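Online logit distillation trains the student to match the teacher's token distribution at each step, typically by minimizing a KL divergence over temperature-softened logits. A minimal pure-Python sketch of that objective (the actual DistillKit training loop is more involved; function names here are illustrative, not from DistillKit):

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def distill_kl_loss(teacher_logits, student_logits, temperature=1.0):
    # KL(teacher || student): zero when the student exactly matches
    # the teacher's distribution, positive otherwise.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In a real training loop this loss is computed per token position over the vocabulary and backpropagated through the student only.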

Note: using a quantized KV cache with Qwen2.5 is not recommended and can lead to degraded output quality. Qwen's KV cache is already light enough that using f16 for it shouldn't be problematic.
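With llama.cpp this means leaving the cache types at f16. A hypothetical invocation (the quant filename is an assumption, not an official artifact name):

```shell
# --cache-type-k / --cache-type-v default to f16; shown explicitly here.
llama-cli -m EVA-D-Qwen2.5-1.5B-v0.0.Q8_0.gguf \
  --cache-type-k f16 --cache-type-v f16 \
  --temp 1.0 --min-p 0.02 \
  -p "Hello"
```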

Prompt format is ChatML.
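For reference, a ChatML-formatted turn looks like:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```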


Recommended sampler values:

  • Temperature: 1
  • Min-P: 0.02
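Min-P filtering discards tokens whose probability falls below a fraction of the most likely token's probability, then samples from the survivors. A minimal pure-Python sketch of the idea (not the model's or any backend's actual sampler):

```python
import math
import random

def min_p_sample(logits, temperature=1.0, min_p=0.02, rng=random):
    # Softmax over temperature-scaled logits (numerically stable).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-P: keep only tokens with prob >= min_p * p_max.
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    # Renormalize over survivors and draw one token index.
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With Temperature 1 and Min-P 0.02 as recommended above, only tokens at least 2% as likely as the top token remain candidates.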

Recommended SillyTavern presets (via CalamitousFelicitousness):


Distillation data:

  • Arcee.AI's EvolKit-20k dataset, which is specifically made for knowledge distillation purposes.

Training time and hardware:

  • 1.8 hours on 8xA100 SXM, provided by Garg

Model was trained by Kearm and Auri.

Special thanks:

  • to Garg for generously providing 8xA100 SXM node for this experiment!
  • to Arcee.AI for creating DistillKit and the EvolKit-20k dataset, which were used to create this model.
  • and to Allura-org for support and feedback on EVA models.