Model breaks at full context
This model holds so much potential, but sadly, it breaks for me. I'm using the 6.5 quant at full 32k context and it spews nonsense (repeating a single letter, for example). This happens with basically all MoE models aside from the basic Mixtral Instruct. Has anyone else faced the same issue? I'm using Oobabooga for loading, SillyTavern as the frontend, and I only use Temperature and Min P to control the output. Thank you in advance for any help!
Saw what I assume was your post on Reddit. It looks like even though they set the config.json value to 32k, that might have been a stretch; the base models all have much lower context, so it's odd they'd push it so far with the merge. Shame it's not working out for huge context! Wish I could provide further help, but I think the model just isn't meant for it.
Yes, that was my post! The authors of the model reached out to me on Reddit and let me know that the model's context is actually 8k; they will add this to their model card. Thank you for your reply regardless, super sweet of you!
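For anyone else who runs into this, here's a minimal sketch of how to check what context length a repo's config.json actually advertises before loading it. It assumes the `transformers` library and a Hugging Face-hosted repo; `author/model-name` is a placeholder, not this model's actual id:

```python
# Minimal sketch: read the advertised context window from a repo's config.json
# via the transformers library. "author/model-name" is a placeholder id.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("author/model-name")

# Most decoder-only configs expose the context length as max_position_embeddings;
# a few architectures use a different field (e.g. n_positions), so fall back to that.
ctx = getattr(config, "max_position_embeddings", None) or getattr(config, "n_positions", None)
print(f"Advertised context length: {ctx}")
```

If that number is much higher than what the base models were trained on (like 32k here vs. an actual 8k), capping the sequence-length setting in the loader to the lower value is probably safer than trusting the config.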