desc_act?

#5
by jackboot - opened

I notice that this model uses 128g (group size 128) without desc_act. Tests in oobabooga showed that group size alone caused higher perplexity than no grouping at all. I realize that using groups + desc_act makes the model less compatible with AutoGPTQ and with some hardware, but the current setup is the worst of both worlds: higher memory usage and higher perplexity.

I think uploading one model with only desc_act and no groups, and another with desc_act + 128g, would be ideal: it would cover both cases and still give some benefit in each.
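To make the two variants concrete, here is a rough sketch of the settings I have in mind. The field names (`bits`, `group_size`, `desc_act`) mirror the ones AutoGPTQ uses in its quantize config, where a group size of -1 means no grouping. The variant names, the `QuantVariant` class, and `bits=4` are my own assumptions for illustration, not taken from this repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantVariant:
    """Hypothetical description of one quantized upload (not a real library class)."""
    name: str
    bits: int          # assumed 4-bit; not stated in this thread
    group_size: int    # -1 means no grouping
    desc_act: bool     # quantize columns in order of decreasing activation size


# The two uploads proposed above.
VARIANTS = [
    QuantVariant("act-order-only", bits=4, group_size=-1, desc_act=True),
    QuantVariant("act-order-128g", bits=4, group_size=128, desc_act=True),
]


def describe(v: QuantVariant) -> str:
    """Return a one-line, human-readable summary of a variant."""
    grouping = "no groups" if v.group_size == -1 else f"{v.group_size}g"
    order = "desc_act" if v.desc_act else "no desc_act"
    return f"{v.name}: {v.bits}-bit, {grouping}, {order}"


for v in VARIANTS:
    print(describe(v))
```

The first variant should keep memory use lowest, and the second is for anyone who wants grouping as well.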

What do you think? I know it's not a big difference and I'm happy this exists at all.
