
GitHub issues classifier (using zero-shot classification)

Predicts whether a statement is a feature request, an issue/bug or a question.

This model was trained using the zero-shot classifier distillation method, with the BART-large-mnli model as the teacher model, to train a classifier on GitHub issues from the GitHub Issues Prediction dataset.

Labels

Following the dataset's Kaggle competition, the classifier predicts whether an issue is a bug, a feature or a question. After experimenting with different labels before training, I used a different label mapping that yielded better predictions (see the notebook here for details). The labels are:

  • issue
  • feature request
  • question

Training data

  • 15k GitHub issue titles ("unlabeled_titles_simple.txt")
  • Hypothesis used: "This request is a {}"
  • Teacher model used: valhalla/distilbart-mnli-12-1
  • Student model used: distilbert-base-uncased
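
To illustrate how the hypothesis template and labels listed above fit together, here is a minimal sketch. It assumes the teacher is queried through the `zero-shot-classification` pipeline from the `transformers` library; the helper names `build_hypotheses` and `teacher_predict` are hypothetical and are not taken from the original training notebook.

```python
# Labels and hypothesis template as listed in this model card.
LABELS = ["issue", "feature request", "question"]
HYPOTHESIS_TEMPLATE = "This request is a {}"


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Expand the template into one NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


def teacher_predict(title, model_name="valhalla/distilbart-mnli-12-1"):
    """Score one issue title with the teacher model (hypothetical helper).

    Assumption: the teacher is run via the transformers zero-shot pipeline.
    The import is local so the rest of this sketch runs without transformers.
    """
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    result = classifier(
        title,
        candidate_labels=LABELS,
        hypothesis_template=HYPOTHESIS_TEMPLATE,
    )
    # The pipeline returns labels sorted by score; pair them back up.
    return dict(zip(result["labels"], result["scores"]))


if __name__ == "__main__":
    # Show the hypotheses the teacher would test each title against.
    for hypothesis in build_hypotheses(LABELS):
        print(hypothesis)
```

Calling `teacher_predict("Add dark mode to the settings page")` would download the teacher model and return a score per label; the distilled student is trained to reproduce such scores.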

Results

Agreement of student and teacher predictions: 94.82%
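
As a rough sketch of what an agreement figure like this measures, the function below computes the fraction of examples on which the student's predicted label matches the teacher's. The function name and the sample labels are illustrative assumptions, not the evaluation code used for this model.

```python
def agreement(teacher_labels, student_labels):
    """Return the fraction of examples where student and teacher agree."""
    if len(teacher_labels) != len(student_labels):
        raise ValueError("Both label lists must have the same length.")
    if not teacher_labels:
        raise ValueError("At least one prediction is required.")
    matches = sum(t == s for t, s in zip(teacher_labels, student_labels))
    return matches / len(teacher_labels)


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    teacher = ["issue", "question", "issue", "feature request"]
    student = ["issue", "question", "feature request", "feature request"]
    print(f"Agreement: {agreement(teacher, student):.2%}")
```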

See this notebook for more information on the feature engineering choices made.

How to train using your own dataset

Acknowledgements
