datasets?
#1
by
ehartford
- opened
In which datasets is this trained?
the same as allways, the 6k version of Deita research paper, but I tried to filter out Chinese records.
I've linked the dataset now.
KnutJaegersberg
changed discussion status to
closed
I don't see a link to the Deita research paper
I've linked to the github in the dataset
I've picked Deita because it performs well for its seize, is based on mostly multiturn conversations and those are very long. It's very flexible, when I can I try to fine tune over the maximum context length my system can bear. It's practical.
it's an AI filtered subset of ultrachat, I think.