
This model is scibert-base further pretrained with a masked language modeling (MLM) loss. The training corpus consists of abstracts from roughly 270,000 Earth science publications.
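MLM pretraining hides a fraction of the input tokens and trains the model to recover them. The card does not state the masking settings, so this minimal sketch assumes the standard BERT recipe, which is also the default of DataCollatorForLanguageModeling in transformers: 15% of non-special tokens are selected, and of those, 80% become [MASK], 10% become a random token and 10% stay unchanged. The function name, the mask_id parameter and the token ids in the test are illustrative assumptions; real ids come from the model's tokenizer. Positions labeled -100 are skipped by the loss, following the PyTorch/Hugging Face ignore_index convention.

```python
import random

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch/Hugging Face convention)


def mask_tokens(token_ids, mask_id, vocab_size, special_ids,
                mlm_probability=0.15, seed=None):
    """Hypothetical helper: BERT-style 80/10/10 masking for one sequence.

    Returns (inputs, labels). labels holds the original token only at
    selected positions; everywhere else it is IGNORE_INDEX, so the MLM
    loss is computed on the selected positions alone.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS] / [SEP].
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Keeping 10% of the selected tokens unchanged (and randomizing another 10%) means the model cannot assume that only [MASK] positions need a prediction, which reduces the mismatch between pretraining and fine-tuning inputs.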

The tokenizer, loaded through AutoTokenizer, was trained on the same corpus.
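Training the tokenizer on the same corpus lets domain terms enter the vocabulary as whole tokens instead of being split into generic fragments. Assuming the tokenizer keeps SciBERT's WordPiece algorithm (the card does not say), this minimal sketch of WordPiece's greedy longest-match-first splitting illustrates the effect. Both vocabularies are hypothetical toy examples, not the model's real vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of a single word."""
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # nothing matches: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical vocabularies: a generic one, and one that also learned the domain term.
general_vocab = {"geo", "##mor", "##ph", "##ology"}
domain_vocab = general_vocab | {"geomorphology"}

print(wordpiece_tokenize("geomorphology", general_vocab))
print(wordpiece_tokenize("geomorphology", domain_vocab))
```

With the generic vocabulary the word splits into four pieces (geo, ##mor, ##ph, ##ology); with the domain vocabulary it stays a single token.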

Stay tuned for further downstream task tests and updates to the model.

In the works:

  • Training with a combined MLM + next sentence prediction (NSP) loss
  • Adding more data sources for training
  • Testing on downstream tasks
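The planned NSP objective needs sentence pairs labeled by whether the second sentence actually follows the first. This minimal sketch assumes the BERT-style 50/50 split and, as a common variant, draws negative sentences from a different document (for example, another abstract). The function name and the input layout (a list of documents, each a list of sentences) are hypothetical; the card does not describe how pairs will be built.

```python
import random


def make_nsp_pairs(documents, seed=None):
    """Hypothetical helper: build (sentence_a, sentence_b, is_next) triples.

    documents: list of documents, each a list of sentences.
    is_next is 1 when sentence_b truly follows sentence_a, else 0.
    """
    rng = random.Random(seed)
    pairs = []
    for d, doc in enumerate(documents):
        # Candidate documents for negatives: any other non-empty document.
        others = [j for j, o in enumerate(documents) if j != d and o]
        for i in range(len(doc) - 1):
            if others and rng.random() >= 0.5:
                # NotNext: a random sentence from a different document
                other = documents[rng.choice(others)]
                pairs.append((doc[i], rng.choice(other), 0))
            else:
                # IsNext: the true following sentence
                pairs.append((doc[i], doc[i + 1], 1))
    return pairs
```

In BERT, the NSP loss on these labels is added to the MLM loss, so a combined objective would train both heads at once.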
Model size: 110M parameters, stored as Safetensors (tensor types I64 and F32).