raynardj
/

roberta-pubmed

Inference Endpoints

Model card Files Files and versions Community

roberta-pubmed / README.md

raynardj's picture

Update README.md

58d6399 about 3 years ago

|

history blame contribute delete

1.32 kB

	---
	language:
	- en
	tags:
	- pubmed
	- cancer
	- gene
	- clinical trial
	- bioinformatic
	license: apache-2.0
	datasets:
	- pubmed
	widget:
	- text: "The <mask> effects of hyperatomarin"
	---

	# Roberta-Base fine-tuned on [PubMed](https://pubmed.ncbi.nlm.nih.gov/) Abstract
	> We limit the training textual data to the following [MeSH](https://www.ncbi.nlm.nih.gov/mesh/)
	* All the child MeSH of ```Biomarkers, Tumor(D014408)```, including things like ```Carcinoembryonic Antigen(D002272)```
	* All the child MeSH of ```Carcinoma(D002277)```, including things like all kinds of carcinoma: like ```Carcinoma, Lewis Lung(D018827)``` etc. around 80 kinds of carcinoma
	* All the child MeSH of ```Clinical Trial(D016439)```
	* The training text file amounts to 531Mb
	## Training
	* Trained on language modeling task, with ```mlm_probability=0.15```, on 2 Tesla V100 32G
	```python
	training_args = TrainingArguments(
	output_dir=config.save, #select model path for checkpoint
	overwrite_output_dir=True,
	num_train_epochs=3,
	per_device_train_batch_size=30,
	per_device_eval_batch_size=60,
	evaluation_strategy= 'steps',
	save_total_limit=2,
	eval_steps=250,
	metric_for_best_model='eval_loss',
	greater_is_better=False,
	load_best_model_at_end =True,
	prediction_loss_only=True,
	report_to = "none")
	```