mnli
This model is a fine-tuned version of allenai/scibert_scivocab_uncased on the GLUE MNLI dataset. It achieves the following results on the evaluation set:
- Loss: 0.4917
- Accuracy: 0.8345
Model description
This model starts from the pretrained checkpoint presented in SciBERT: A Pretrained Language Model for Scientific Text, a BERT model trained on scientific text, and is fine-tuned on GLUE MNLI so that it can be used for zero-shot classification.
The SciBERT pretraining corpus consists of papers taken from Semantic Scholar: 1.14M papers, 3.1B tokens. Pretraining uses the full text of the papers, not just the abstracts.
SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus.
Intended uses & limitations
Zero-shot classification of scientific texts. Note that several other models outperform this one; it was uploaded for research purposes. For actually classifying scientific text, I recommend DeBERTa v3 Large fine-tuned on MNLI, which performed best in my benchmark on abstracts as of 7/10/22.
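A minimal sketch of how zero-shot classification with this checkpoint might look, assuming it loads through the Transformers `zero-shot-classification` pipeline and that its config names one class "entailment" (the pipeline looks for that label). The hypothesis template and the `build_hypotheses` and `classify` helpers are illustrative and not part of the original card.

```python
MODEL_ID = "kauffinger/scibert_scivocab_uncased-mnli"

# Each candidate label is turned into an NLI hypothesis with this template;
# the MNLI head then scores whether the input text entails that hypothesis.
# The wording of the template is an assumption, not taken from the card.
HYPOTHESIS_TEMPLATE = "This text is about {}."


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Return the hypothesis sentence generated for each candidate label."""
    return [template.format(label) for label in labels]


def classify(text, labels):
    """Run zero-shot classification (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    classifier = pipeline("zero-shot-classification", model=MODEL_ID)
    return classifier(
        text,
        candidate_labels=labels,
        hypothesis_template=HYPOTHESIS_TEMPLATE,
    )


# Show the hypotheses the pipeline would score for two example labels.
print(build_hypotheses(["physics", "biology"]))
```

Calling `classify("We measure the band gap of doped graphene.", ["physics", "biology"])` should return a dict whose `labels` and `scores` lists are sorted from most to least likely; the example sentence and labels are made up for illustration.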
Training and evaluation data
GLUE MNLI
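As a rough sketch, GLUE MNLI can be fetched with the `datasets` library as shown here. The card does not say which of the two MNLI validation splits produced the reported accuracy, so both are listed; the `load_mnli` and `describe` helpers are hypothetical names used only for this example.

```python
# Label ids used by the GLUE MNLI dataset on the Hugging Face Hub.
MNLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

# MNLI ships two validation splits; which one was used for the reported
# evaluation numbers is not stated in this card.
MNLI_EVAL_SPLITS = ("validation_matched", "validation_mismatched")


def load_mnli():
    """Download GLUE MNLI with the `datasets` library (network required)."""
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset("glue", "mnli")


def describe(example):
    """Format one premise/hypothesis pair together with its label name."""
    # Test-split examples carry label -1, which is reported as "unlabeled".
    label = MNLI_LABELS.get(example["label"], "unlabeled")
    return f'{example["premise"]} => {example["hypothesis"]} [{label}]'
```

For instance, `describe(load_mnli()["train"][0])` would print the first training pair with its label name.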
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
- mixed_precision_training: Native AMP
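The hyperparameters above could be expressed as `TrainingArguments` keyword arguments roughly as follows. This is a sketch, not the author's training script: it assumes a single device (so the reported batch sizes are per-device values), no warmup steps (none are listed), and a hypothetical output directory name.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
HPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
    "fp16": True,  # "Native AMP"; needs a CUDA device
}


def make_training_args(output_dir="mnli-out"):
    """Build TrainingArguments; output_dir is a placeholder name."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


def linear_lr(step, total_steps, base_lr=HPARAMS["learning_rate"]):
    """Linear decay from base_lr to 0, assuming zero warmup steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)
```

Under these assumptions the learning rate starts at 2e-05, reaches half that value midway through training, and hits zero at the final step.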
Framework versions
- Transformers 4.22.2
- Pytorch 1.11.0+cu113
- Datasets 2.5.1
- Tokenizers 0.12.1
If using these models, please cite the following paper:
@inproceedings{beltagy-etal-2019-scibert,
title = "SciBERT: A Pretrained Language Model for Scientific Text",
author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
booktitle = "EMNLP",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1371"
}
Model tree for kauffinger/scibert_scivocab_uncased-mnli: base model allenai/scibert_scivocab_uncased