
EVA-D Qwen2.5-1.5B v0.0

An experimental online logit distillation of EVA-Qwen2.5-14B-v0.1 into Qwen2.5-1.5B. It should work as an RP/storywriting specialist, but don't expect superb performance from it due to its small size. All in all, it was a fun experiment to do.

Note: using a quantized KV cache with Qwen2.5 is not recommended and can lead to degraded output quality. That said, Qwen's KV cache is already light enough that using f16 for it shouldn't be problematic.
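For example, if you run a GGUF quant of this model through llama.cpp, you can keep the KV cache at f16 explicitly via the cache-type flags (a sketch; the GGUF filename below is hypothetical):

```shell
# Keep the KV cache at f16 (the llama.cpp default) instead of quantizing it;
# -ctk / -ctv set the K and V cache tensor types.
./llama-server -m eva-d-qwen2.5-1.5b.Q8_0.gguf -ctk f16 -ctv f16
```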

Prompt format is ChatML.
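For reference, ChatML wraps each turn in `<|im_start|>` / `<|im_end|>` markers and leaves the assistant turn open for generation. A minimal sketch (the helper function and example messages are illustrative, not part of the model's tooling):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a creative storytelling assistant."},
    {"role": "user", "content": "Write the opening line of a mystery."},
])
print(prompt)
```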


Recommended sampler values:

  • Temperature: 1
  • Min-P: 0.02
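With temperature at 1 the logits are left unscaled, so Min-P does the filtering: tokens whose probability falls below `min_p` times the top token's probability are discarded before sampling. A pure-Python illustration of that rule (not the model's actual sampler implementation):

```python
def min_p_filter(probs, min_p=0.02):
    """Drop tokens below min_p * max(probs), then renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    total = sum(p for _, p in kept)
    return {i: p / total for i, p in kept}

# With min_p=0.02, tokens under 2% of the top token's probability drop out:
# threshold = 0.02 * 0.70 = 0.014, so the last two tokens are filtered away.
filtered = min_p_filter([0.70, 0.25, 0.04, 0.009, 0.001], min_p=0.02)
```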

Recommended SillyTavern presets (via CalamitousFelicitousness):


Distillation data:

  • Arcee.AI's EvolKit-20k dataset, built specifically for knowledge distillation.

Training time and hardware:

  • 1.8 hours on 8xA100 SXM, provided by Garg

Model was trained by Kearm and Auri.

Special thanks:

  • to Garg for generously providing an 8xA100 SXM node for this experiment!
  • to Arcee.AI for creating DistillKit and the EvolKit-20k dataset, which were used to create this model.
  • and to Allura-org for support and feedback on EVA models.
Model size: 1.78B params · Tensor type: BF16 (Safetensors)
