Difference from MK2
#3 opened by mrfakename
Hi, how is this different from the mk2 version?
This version was trained on longer sequences (16384 tokens vs. 4192 tokens). In addition, I processed the individual stories in the datasets into 16k-token sequences, whereas for mk1 they were left unprocessed, which resulted in them being truncated.
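One reading of the difference between truncating a story and processing it into fixed-length sequences can be sketched in Python. This is only an illustration under assumptions: each story is already tokenized into a list of integer IDs, and `truncate` and `chunk` are hypothetical helper names, not the actual preprocessing code used for either model. Truncation keeps only the first 16,384 tokens of a long story. Chunking splits it into consecutive 16,384-token pieces, so nothing is discarded.

```python
def truncate(tokens: list[int], max_len: int) -> list[list[int]]:
    """Hypothetical helper: keep only the first max_len tokens and discard the rest."""
    return [tokens[:max_len]]


def chunk(tokens: list[int], max_len: int) -> list[list[int]]:
    """Hypothetical helper: split the whole story into consecutive pieces of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


MAX_LEN = 16_384
story = list(range(40_000))  # stand-in for a tokenized 40,000-token story

print([len(s) for s in truncate(story, MAX_LEN)])  # [16384]
print([len(s) for s in chunk(story, MAX_LEN)])     # [16384, 16384, 7232]
```

With a 40,000-token story, truncation keeps 16,384 tokens and drops the remaining 23,616, while chunking yields three sequences that together cover every token.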
PocketDoc changed discussion status to closed
PocketDoc changed discussion status to open