datasets?

#1
by ehartford - opened

Which datasets was this trained on?

The same as always: the 6k version of the dataset from the Deita research paper, but I tried to filter out the Chinese records.
I've linked the dataset now.
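
For anyone wanting to do something similar: the exact filtering method isn't stated here, so the following is only a rough sketch of one way to drop records containing Chinese text. It assumes a ShareGPT-style layout in which each record has a `conversations` list of turns with a `value` string; those field names, and the helper names `contains_chinese` and `filter_records`, are hypothetical and not taken from the linked dataset.

```python
import re

# Assumption: "Chinese" is approximated by any CJK Unified Ideograph
# (basic block plus Extension A). This also matches Japanese kanji.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def contains_chinese(text: str) -> bool:
    """Return True if the text contains at least one CJK ideograph."""
    return CJK_PATTERN.search(text) is not None


def filter_records(records):
    """Keep only records in which no turn contains a CJK ideograph.

    The "conversations" and "value" keys are hypothetical field names.
    """
    kept = []
    for record in records:
        turns = record.get("conversations", [])
        if not any(contains_chinese(turn.get("value", "")) for turn in turns):
            kept.append(record)
    return kept
```

A character-range check like this is crude; a language-identification model would likely be more precise, at the cost of an extra dependency.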

KnutJaegersberg changed discussion status to closed

I don't see a link to the Deita research paper.

I've linked to the GitHub repository in the dataset.

I picked Deita because it performs well for its size and is based mostly on multi-turn conversations, which are very long. It's also very flexible: when I can, I try to fine-tune over the maximum context length my system can bear. It's practical.

It's an AI-filtered subset of UltraChat, I think.
