---
license: unknown
datasets:
- ncbi/pubmed
language:
- en
metrics:
- f1
base_model:
- microsoft/deberta-v3-base
pipeline_tag: token-classification
tags:
- NER
- phenotypes
- diseases
- bio
- classification
---

**Model Summary and Training Details**

### Model Architecture

- **Base Model**: `microsoft/deberta-v3-base`
- **Task**: Token Classification for Named Entity Recognition (NER) with a focus on disease entities.
- **Number of Labels**: 3 (O, B-Disease, I-Disease)

### Dataset

- **Dataset**: NCBI Disease Corpus
- **Description**: The NCBI Disease corpus is a specialized medical dataset that includes 793 PubMed abstracts. It is structured to help in identifying disease mentions within scientific literature, and each mention is annotated with disease concepts from the MeSH (Medical Subject Headings) or OMIM (Online Mendelian Inheritance in Man) databases.
- **Split**:
  - Training Set: 593 abstracts
  - Development (Validation) Set: 100 abstracts
  - Test Set: 100 abstracts

### Training Details

- **Training Steps**: The model was trained using a cross-entropy loss function for token classification tasks. To optimize performance, we used gradient accumulation to achieve a stable loss and improve resource efficiency.
- **Gradient Accumulation**: 2 steps
- **Batch Size**: 8
- **Device**: Trained on a GPU if available, using mixed-precision training for better performance.

### Optimizer and Learning Rate Scheduler

- **Optimizer**: AdamW
  - **Learning Rate**: 1e-5
  - **Betas**: (0.9, 0.999)
  - **Epsilon**: 1e-8
- **Learning Rate Scheduler**: Cosine Scheduler with Warmup
  - **Warmup Steps**: 10% of total training steps
  - **Total Training Steps**: Calculated as `len(train_loader) * num_epochs`

### Epochs and Validation

- **Epochs**: 5
- **Training and Validation Loss**: The model achieved a stable loss over 5 epochs, with the best validation loss recorded. The best model based on validation loss was saved for evaluation.
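
The schedule settings above can be illustrated with a small, dependency-free sketch. It assumes the common formulation of a cosine schedule with linear warmup (linear ramp from 0 to the base rate, then half-cosine decay to 0); the card does not state which implementation was used. `batches_per_epoch` is a hypothetical stand-in for `len(train_loader)`, which the card does not report, and `cosine_warmup_factor` is an illustrative helper name.

```python
import math

def cosine_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to 1.
        return step / max(1, warmup_steps)
    # Half-cosine decay from 1 down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Values taken from the card.
base_lr = 1e-5
batch_size = 8
grad_accum_steps = 2
num_epochs = 5

# Assumption: hypothetical loader length, not stated in the card.
batches_per_epoch = 100

# Gradients from 2 batches of 8 are accumulated before each update.
effective_batch_size = batch_size * grad_accum_steps

# Total steps as described in the card: len(train_loader) * num_epochs.
total_steps = batches_per_epoch * num_epochs
warmup_steps = total_steps // 10  # 10% of total training steps

for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    lr = base_lr * cosine_warmup_factor(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

With these settings the effective batch size is 16. If the scheduler were advanced once per optimizer update rather than once per batch, the step count would shrink by the accumulation factor; the card does not say which convention was used.
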
### Evaluation and Performance

- **Test Dataset F1 Score**: 0.9772
- **Evaluation Metric**: F1 score, which indicates the balance between precision and recall, was used as the primary metric to assess the model’s performance.
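
To make the metric concrete, here is a minimal sketch of how an F1 score can be computed for the three-label scheme (O, B-Disease, I-Disease). It assumes entity-level scoring, where a predicted span counts as correct only if its boundaries match a gold span exactly. The card does not state whether the reported score is entity-level or token-level, so this may differ from the evaluation actually used; `extract_spans` and `entity_f1` are illustrative names.

```python
def extract_spans(tags):
    """Return the set of (start, end) disease spans encoded by a BIO tag list."""
    spans, start = set(), None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and tag != "I-Disease":
            spans.add((start, i))  # end index is exclusive
            start = None
        if tag == "B-Disease":
            start = i
    return spans

def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: the prediction finds one of the two gold mentions.
gold = [["O", "B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["O", "B-Disease", "I-Disease", "O", "O"]]
print(round(entity_f1(gold, pred), 4))
```

In the toy example precision is 1.0 and recall is 0.5, so F1 is 2/3.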