I'm curious how the GTE model achieves SOTA (state-of-the-art) performance with such a small model. I couldn't find any related research papers. Could you please provide a brief introduction?