Update modeling_moe_mistral.py
#1 by bjoernp - opened
No description provided.
Hi, I originally implemented the router to follow https://github.com/stanford-futuredata/megablocks/blob/main/megablocks/layers/router.py#L57, which applies softmax first and then top-k. I'm not sure which order is correct. Do you get better results with this change?
So far the scores look better:
winogrande: 0.8019 -> 0.824
truthfulqa_mc2: 0.4406 -> 0.4855
arc_challenge: 0.6314 -> 0.6638
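For readers following along, here is a minimal sketch of the two router orderings discussed in this thread, written in plain Python rather than the PyTorch code in modeling_moe_mistral.py. The function names and example logits are hypothetical, and treating the alternative as "top-k first, then softmax over the selected logits" is an assumption, since the thread does not spell out the exact change.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_then_topk(logits, k):
    # Normalize over ALL experts, then keep the k largest probabilities.
    # The kept weights generally sum to less than 1.
    probs = softmax(logits)
    top = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:k]
    return {i: probs[i] for i in top}

def topk_then_softmax(logits, k):
    # Assumed alternative: pick the k largest logits first, then
    # normalize over only those, so the kept weights sum to 1.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))

# Hypothetical router logits for one token and four experts.
logits = [2.0, 1.0, 0.5, -1.0]
a = softmax_then_topk(logits, 2)
b = topk_then_softmax(logits, 2)
print(a)
print(b)
```

Both orderings select the same experts, because softmax preserves the ranking of the logits. They differ only in how the selected weights are scaled: the first leaves them summing to less than 1 unless they are renormalized afterwards, while the second makes them sum to 1.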
bjoernp changed pull request status to merged