"DolphinForCausalLM" OR "Qwen2ForCausalLM" ?

#3
by alielfilali01 - opened

Hey guys, this work is much appreciated, and great job, I guess, on the finetuning data.

What I don't understand is why you introduce an architecture named "DolphinForCausalLM" when it is not a novel arch, nor pretrained, but rather a finetuned version of Qwen2. I'm sure you have a good reason, so basically my question is: WHAT IS THE DIFFERENCE BETWEEN "DolphinForCausalLM" AND "Qwen2ForCausalLM"?

Side note: a Dolphin series of models already exists, and I believe Cognitive Computations (https://huggingface.co/cognitivecomputations) is the team behind it. The current naming causes a little bit of confusion.

Best πŸ€—

Sorry, but it seems that it is a new arch. You can read the code here: https://huggingface.co/NexaAIDev/Dolphin/blob/main/modeling_dolphin.py
It has two decoder models in one LM, with one acting as an encoder.
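For intuition, the "two decoders in one LM" idea can be sketched roughly like this: a smaller decoder-style stack runs over the context tokens and its hidden states are injected into the main decoder as extra memory positions. This is only an illustrative toy (class name, layer choices, and dimensions are all made up here, not taken from the actual modeling_dolphin.py); see the linked file for the real implementation.

```python
import torch
import torch.nn as nn

class DualDecoderLM(nn.Module):
    """Toy sketch (hypothetical, not NexaAIDev's code) of one LM that
    contains two transformer stacks: a small one reused as a context
    encoder, and a main decoder that consumes its outputs."""

    def __init__(self, vocab_size=100, dim=32, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # "encoder": a decoder-style layer run over the context tokens
        self.context_encoder = nn.TransformerEncoderLayer(
            dim, nhead, batch_first=True
        )
        # main decoder layer (causal masking omitted for brevity)
        self.decoder = nn.TransformerEncoderLayer(
            dim, nhead, batch_first=True
        )
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, context_ids, input_ids):
        # Encode the (long) context with the first stack.
        ctx = self.context_encoder(self.embed(context_ids))
        # Prepend the context states as memory tokens for the main stack.
        x = torch.cat([ctx, self.embed(input_ids)], dim=1)
        h = self.decoder(x)
        # Return logits only for the actual input positions.
        return self.lm_head(h[:, ctx.size(1):])
```

Because the forward pass wires two stacks together like this, the architecture can't be expressed by the stock Qwen2ForCausalLM class.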

Nexa AI org
β€’
edited Sep 8

Hi, we are editing the name of the model to avoid the confusion haha. But the model is indeed a new arch, so the modeling file can't be Qwen2ForCausalLM.
