Loss function?
#10
by
narvind2003
- opened
My understanding is that the MoE model uses the same LM loss as previous transformers. Are there any other auxiliary losses used?
Please clarify or point me to the right file in the megablocks src. Thank you!
If you have a look at the official implementation (https://github.com/huggingface/transformers/blob/39acfe84ba330fda3ae72c083284a04cac8ac9e0/src/transformers/models/mixtral/modeling_mixtral.py#L76) you'll find some info :)
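For anyone landing here: the linked file contains a router load-balancing auxiliary loss in the Switch-Transformer style, added on top of the usual LM loss. Below is a minimal numpy sketch of that idea, not the actual HF code — the function name, simplified shapes, and the omission of attention masking and multi-layer batching are my own simplifications; see `load_balancing_loss_func` in `modeling_mixtral.py` for the real implementation.

```python
import numpy as np

def load_balancing_loss(router_logits, num_experts, top_k=2):
    """Sketch of a Switch-style MoE load-balancing loss.

    router_logits: [num_tokens, num_experts] raw gate scores.
    Returns num_experts * sum_e(fraction_of_tokens_routed_to_e
                                * mean_router_prob_for_e).
    For perfectly balanced routing this bottoms out at top_k.
    """
    logits = np.asarray(router_logits, dtype=np.float64)
    # softmax over the expert dimension
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    # one-hot mask of the top_k experts selected for each token
    topk = np.argsort(-logits, axis=-1)[:, :top_k]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk, 1.0, axis=-1)

    tokens_per_expert = mask.mean(axis=0)   # fraction of tokens sent to each expert
    prob_per_expert = probs.mean(axis=0)    # mean gate probability per expert
    return num_experts * np.sum(tokens_per_expert * prob_per_expert)
```

Intuition: if the router collapses onto one expert, both its token fraction and its mean probability grow, so the product (and the loss) grows; spreading tokens evenly minimizes it.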
bjoernp
changed discussion status to
closed