@Sentdex on Hugging Face: "Working through the Reddit dataset, one thing that occurs to me is we pretty…"

Sentdex

posted an update Feb 20

Working through the Reddit dataset, one thing that occurs to me is that we pretty much always train LLMs on conversations between 2 parties, like Bot/Human or Instruction/Response.

Internet data seems far more likely to contain multi-speaker/group discussions with a dynamic number of speakers. That also seems closer to how conversations work in the real world, and it takes a bit more understanding to model.

Is there some research into this? I have some ideas of how I'd like to implement it, but I wonder if some work has already been done here?

Dewa

Feb 20

Aya-101 can help

pranav-deshpande

Feb 27

I was thinking exactly the same thing when ChatGPT first came out! I have run some minor experiments with causal language modeling by having a fixed number of users/speakers and then instruct fine-tuning the base/foundational model. "Dynamic number of speakers" sounds interesting, though! Maybe there is a clever way to inject new tokens into the vocabulary to achieve this.

Would love to contribute to this initiative.
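
One way to read the "inject new tokens" idea is sketched below: keep a small fixed pool of speaker-slot tokens and assign them per conversation, so the vocabulary stays fixed while the number of speakers varies up to the pool size. The token names (`<spk_0>`, ...), the pool size `MAX_SPEAKERS`, and the helper `format_thread` are all hypothetical, not something from the experiments described above. With Hugging Face transformers, registering such tokens would typically go through `tokenizer.add_special_tokens` and `model.resize_token_embeddings`; that step is left out here.

```python
from typing import Dict, List, Tuple

# Assumption: a fixed pool of slot tokens, reused across conversations.
MAX_SPEAKERS = 8
SPEAKER_TOKENS = [f"<spk_{i}>" for i in range(MAX_SPEAKERS)]


def format_thread(turns: List[Tuple[str, str]]) -> str:
    """Render (author, text) turns with per-conversation speaker slots.

    Each new author gets the next free slot token in order of first
    appearance, so usernames never enter the training text.
    """
    slots: Dict[str, str] = {}
    lines = []
    for author, text in turns:
        if author not in slots:
            if len(slots) >= MAX_SPEAKERS:
                raise ValueError("more speakers than slot tokens")
            slots[author] = SPEAKER_TOKENS[len(slots)]
        lines.append(f"{slots[author]} {text}")
    return "\n".join(lines)


thread = [("alice", "hi"), ("bob", "hey"), ("alice", "ok")]
print(format_thread(thread))
```

The trade-off in this sketch is that the pool size caps how many speakers one conversation can have; longer threads would need truncation or a larger pool.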

eliotz

Feb 28

I would imagine a method similar to Mistral's router could work (RL policy rewarding equal distribution between models)

Also, there's this paper from before the LLM craze that might be helpful. It would be interesting to see it implemented with more powerful language models:

https://arxiv.org/pdf/1907.05507.pdf

D4ve-R

Mar 1

The paper "Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models" did some research into this.

Tonic

Apr 25

I've had the "best" results mushing everything into a single context window with a single "final"/"next" answer. I think I remember @teknium saying they often do that, and they may have published that research, but I can't speak for them. I just remember them saying that and feeling validated :-)
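
A minimal sketch of the "single context window, single next answer" idea, assuming a thread is a list of (author, text) pairs: every earlier turn is flattened into one prompt, and the last turn becomes the only target. The `name: text` layout and the helper `build_example` are illustrative assumptions, not a description of what anyone mentioned here actually does.

```python
from typing import List, Tuple


def build_example(turns: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Split a multi-speaker thread into (prompt, target).

    The prompt holds all turns but the last, one per line, and ends with
    the final speaker's name so the model knows whose turn comes next.
    The target is that speaker's text.
    """
    if len(turns) < 2:
        raise ValueError("need at least one context turn and one target turn")
    *context, (last_author, last_text) = turns
    prompt = "\n".join(f"{author}: {text}" for author, text in context)
    prompt += f"\n{last_author}:"
    return prompt, " " + last_text


thread = [("alice", "hi"), ("bob", "hey"), ("carol", "yo")]
prompt, target = build_example(thread)
print(prompt)
print(repr(target))
```

When fine-tuning on pairs like this, the loss would usually be computed only on the target tokens, so the model learns to produce the next turn without being trained to reproduce the context.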
