ai-soco-c++-roberta-small-clas
Model description
This is the ai-soco-c++-roberta-small model fine-tuned on the AI-SOCO (Authorship Identification of SOurce COde) task.
How to use
You can use the model directly: tokenize the source code with the tokenizer provided alongside the model files, then pass the result to the model.
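A minimal sketch of that flow, assuming the checkpoint loads through the `transformers` auto classes with a sequence-classification head. The function name `predict_class_id` and the `model_dir` argument are hypothetical; point `model_dir` at wherever the model files live. The 512-token truncation mirrors the limit stated under Training procedure.

```python
def predict_class_id(source_code: str, model_dir: str) -> int:
    """Return the index of the highest-scoring class for one C++ source file.

    `model_dir` is a hypothetical path (or hub identifier) for the model files.
    """
    # Imported lazily so the heavy dependencies load only when a prediction runs.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumption: tokenizer and classification weights sit in the same location.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    # Truncate to 512 tokens, the maximum sequence length used in training.
    inputs = tokenizer(
        source_code,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Index of the largest logit; how it maps to an author ID depends on the
    # label mapping stored in the checkpoint's configuration.
    return int(logits.argmax(dim=-1).item())
```

How the returned index maps to an AI-SOCO author identifier is not stated here, so check the checkpoint's label configuration before interpreting it.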
Limitations and bias
The model is limited to the C++ programming language.
Training data
The model was initialized from the ai-soco-c++-roberta-small model and trained on the AI-SOCO dataset for text classification.
Training procedure
The model was trained on the Google Colab platform using a V100 GPU for 10 epochs, with a batch size of 32 and a maximum sequence length of 512 (longer sequences were truncated). Every run of 4 consecutive spaces was converted to a single tab character (\t) before tokenization.
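The space-to-tab step can be sketched as below. This is one plausible reading of the description, not the authors' exact script: it replaces non-overlapping groups of four spaces from left to right, anywhere in the text, and the helper name `spaces_to_tabs` is hypothetical.

```python
def spaces_to_tabs(source_code: str) -> str:
    """Replace each group of four consecutive spaces with one tab character."""
    # str.replace scans left to right and replaces non-overlapping matches,
    # so eight spaces become two tabs and five spaces become a tab plus a space.
    return source_code.replace(" " * 4, "\t")


if __name__ == "__main__":
    snippet = "int main() {\n    return 0;\n}\n"
    # repr() makes the inserted tab visible as \t in the printed output.
    print(repr(spaces_to_tabs(snippet)))
```

Applying the same conversion to inputs at inference time keeps them consistent with what the model saw during training.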
Eval results
The model achieved 93.19%/92.88% accuracy on the AI-SOCO task and ranked 4th.
BibTeX entry and citation info
@inproceedings{ai-soco-2020-fire,
title = "Overview of the {PAN@FIRE} 2020 Task on {Authorship Identification of SOurce COde (AI-SOCO)}",
author = "Fadel, Ali and Musleh, Husam and Tuffaha, Ibraheem and Al-Ayyoub, Mahmoud and Jararweh, Yaser and Benkhelifa, Elhadj and Rosso, Paolo",
booktitle = "Proceedings of The 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020)",
year = "2020"
}