Why does the GTE model perform better than other models?

#2
by stevewyl - opened

I'm curious how the GTE model achieves state-of-the-art (SOTA) performance despite its small size. I couldn't find any related research papers. Could you please provide a brief introduction?