KnutJaegersberg
/

Deita-34b


datasets?

#1

by ehartford - opened May 17

May 17

Which datasets was this trained on?

KnutJaegersberg

Owner May 22

The same as always: the 6k version of the dataset from the Deita research paper, but I tried to filter out Chinese records.
I've linked the dataset now.
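
For readers wondering what filtering out Chinese records could look like, here is a minimal sketch. It is not the method actually used for this model, which the thread does not describe. It assumes a ShareGPT-style record layout (a hypothetical `conversations` list whose turns carry a `value` string) and treats any CJK ideograph as a sign of Chinese text, which would also drop Japanese text containing kanji.

```python
import re

# Assumption: "Chinese" is approximated by the presence of any character in the
# CJK Unified Ideographs block or its Extension A. This also matches Japanese kanji.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def contains_chinese(text: str) -> bool:
    """Return True if the text contains at least one CJK ideograph."""
    return CJK_PATTERN.search(text) is not None


def filter_out_chinese(records):
    """Keep only records in which no conversation turn contains a CJK ideograph.

    The record schema ("conversations" -> list of {"value": str}) is a
    hypothetical layout chosen for illustration.
    """
    return [
        rec
        for rec in records
        if not any(contains_chinese(turn["value"]) for turn in rec["conversations"])
    ]
```

A stricter filter could use a language-identification model instead of a character range, at the cost of an extra dependency.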

KnutJaegersberg changed discussion status to closed May 22

May 22

I don't see a link to the Deita research paper

KnutJaegersberg

Owner May 22

I've linked to the GitHub repository in the dataset.

KnutJaegersberg

Owner May 22

https://github.com/hkust-nlp/deita

KnutJaegersberg

Owner May 22

I picked Deita because it performs well for its size and is based mostly on multi-turn conversations, which are very long. It's very flexible: when I can, I try to fine-tune over the maximum context length my system can bear. It's practical.

KnutJaegersberg

Owner May 22

It's an AI-filtered subset of UltraChat, I think.
