Do you plan to open-source the training code?
#1 opened by adol01
This model is really good; it would be great if the training code could be open-sourced.
In the short term, we do not plan to open-source the training code. Our main focus remains on building better and more efficient models, which we will then open-source to the community.
The MLM pre-training code is adapted from the Hugging Face example script (run_mlm.py) to handle the large dataset, without many additional optimizations.
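For anyone who wants to reproduce this kind of setup, here is a minimal sketch of MLM pre-training in the style of run_mlm.py. The model name, data file, masking rate, and hyperparameters are placeholder assumptions, not our actual configuration; streaming is one common way to handle a corpus too large for memory.

```python
# Minimal MLM pre-training sketch in the spirit of run_mlm.py.
# Model, data file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Stream the corpus instead of loading it into memory,
# so the script scales to a large dataset.
dataset = load_dataset(
    "text", data_files={"train": "corpus.txt"}, streaming=True
)["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: 15% of tokens are masked in each batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mlm-checkpoints",
    per_device_train_batch_size=32,
    max_steps=100_000,  # required when training on a streaming dataset
    learning_rate=1e-4,
)

Trainer(
    model=model, args=args, train_dataset=tokenized, data_collator=collator
).train()
```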
The contrastive learning code is similar to texttron/tevatron, nomic-ai/contrastors, or FlagOpen/FlagEmbedding.
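The common objective across those repositories is InfoNCE with in-batch negatives. Below is a minimal sketch of that loss; the encoder, mean pooling, and temperature value are assumptions for illustration, not our exact recipe.

```python
# In-batch-negative contrastive loss (InfoNCE) sketch, the objective
# family used by tevatron, contrastors, and FlagEmbedding.
# Encoder, pooling, and temperature below are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state
    # Mean-pool over non-padding tokens, then L2-normalize.
    mask = inputs["attention_mask"].unsqueeze(-1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)
    return F.normalize(pooled, dim=-1)

queries = ["what is masked language modeling", "capital of france"]
passages = [
    "Masked language modeling replaces tokens with [MASK] and predicts them.",
    "Paris is the capital of France.",
]

q, p = embed(queries), embed(passages)
temperature = 0.05  # assumed value; tune per setup

# Each query's positive is the passage at the same index;
# every other passage in the batch serves as a negative.
scores = q @ p.T / temperature
labels = torch.arange(len(queries))
loss = F.cross_entropy(scores, labels)
loss.backward()
```

The main differences between those codebases are around scaling this loss: gradient caching, hard-negative mining, and cross-device negative sharing, rather than the objective itself.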