It seems far more common in internet data to have multi-speaker/group discussions with a dynamic number of speakers. This also seems closer to how conversations work in the real world, and it takes a bit more understanding to model.
Is there any research into this? I have some ideas for how I'd like to implement it, but I wonder whether some work has already been done here.