transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("noumanjavaid/transformers_issues_topics")

topic_model.get_topic_info()
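
Beyond the overview table, a loaded model can be queried for a single topic or applied to new text. The following is a minimal sketch, assuming the loaded object exposes BERTopic's `get_topic` and `transform` methods; the helper names `top_keywords` and `assign_topics` are hypothetical and not part of the library.

```python
from typing import List, Sequence


def top_keywords(topic_model, topic_id: int, n_words: int = 5) -> List[str]:
    """Return the highest-ranked keywords for one topic.

    BERTopic's get_topic returns a list of (word, score) pairs,
    or False when the topic ID does not exist.
    """
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return []
    return [word for word, _score in pairs[:n_words]]


def assign_topics(topic_model, docs: Sequence[str]) -> List[int]:
    """Assign each new document to a topic ID (-1 marks outliers)."""
    topics, _probs = topic_model.transform(list(docs))
    return [int(t) for t in topics]
```

With the model loaded as shown above, `top_keywords(topic_model, 0)` would list the leading words of topic 0, and `assign_topics(topic_model, ["Tokenizer crashes on empty input"])` would return one topic ID per document. The example sentence is made up for illustration.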

Topic overview

  • Number of topics: 30
  • Number of training documents: 9000
Overview of all topics:
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | models - bert - model - tensorflow - tokenizers | 14 | -1_models_bert_model_tensorflow |
| 0 | tokenizer - tokenizers - tokenization - token - tokens | 2078 | 0_tokenizer_tokenizers_tokenization_token |
| 1 | pytorch - tensorflow - modelingutilspy - attributeerror - runtimeerror | 1886 | 1_pytorch_tensorflow_modelingutilspy_attributeerror |
| 2 | trainertrain - trainer - trainers - training - evaluateduringtraining | 696 | 2_trainertrain_trainer_trainers_training |
| 3 | summarization - summaries - summary - examples - sentencepiece | 636 | 3_summarization_summaries_summary_examples |
| 4 | gpt2tokenizer - gpt2 - gpt2tokenizerfast - gpt2model - gpt | 452 | 4_gpt2tokenizer_gpt2_gpt2tokenizerfast_gpt2model |
| 5 | modelcard - modelcards - card - model - models | 451 | 5_modelcard_modelcards_card_model |
| 6 | typos - typo - fix - correction - fixed | 446 | 6_typos_typo_fix_correction |
| 7 | readmemd - readmetxt - readme - modelcard - file | 284 | 7_readmemd_readmetxt_readme_modelcard |
| 8 | t5 - t5model - t5base - tf - t5large | 256 | 8_t5_t5model_t5base_tf |
| 9 | longformer - longformers - longformerformultiplechoice - longformertokenizerfast - attentions | 254 | 9_longformer_longformers_longformerformultiplechoice_longformertokenizerfast |
| 10 | seq2seq - seq2seqtrainer - seq2seqdataset - seq2seqfinetunepy - runseq2seq | 223 | 10_seq2seq_seq2seqtrainer_seq2seqdataset_seq2seqfinetunepy |
| 11 | pipeline - pipelines - ner - pipelinesentimentanalysis - nerpipeline | 209 | 11_pipeline_pipelines_ner_pipelinesentimentanalysis |
| 12 | ci - testing - tests - test - testgeneratefp16 | 175 | 12_ci_testing_tests_test |
| 13 | deprecate - deprecation - deprecated - warnings - warning | 136 | 13_deprecate_deprecation_deprecated_warnings |
| 14 | onnxonnxruntime - onnx - 04onnxexport - 04onnxexportipynb - benchmarkonnxexport | 129 | 14_onnxonnxruntime_onnx_04onnxexport_04onnxexportipynb |
| 15 | datacollatorforlanguagemodelingfile - datacollatorforlanguagemodeling - datacollatorforlanguagemodelling - datacollatorforpermutationlanguagemodeling - labelsmoothingfactor | 100 | 15_datacollatorforlanguagemodelingfile_datacollatorforlanguagemodeling_datacollatorforlanguagemodelling_datacollatorforpermutationlanguagemodeling |
| 16 | deberta - debertav2 - debertav2initpy - debertatokenizer - debertav2xxlargemnli | 73 | 16_deberta_debertav2_debertav2initpy_debertatokenizer |
| 17 | benchmark - benchmarking - benchmarks - comparison - results | 63 | 17_benchmark_benchmarking_benchmarks_comparison |
| 18 | generationbeamsearchpy - generatebeamsearch - generatebeamsearchoutputs - beamsearch - nonbeamsearch | 61 | 18_generationbeamsearchpy_generatebeamsearch_generatebeamsearchoutputs_beamsearch |
| 19 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 56 | 19_wandbproject_wandb_wandbcallback_wandbdisabled |
| 20 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 54 | 20_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
| 21 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 49 | 21_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
| 22 | configpath - configs - config - configuration - modelconfigs | 49 | 22_configpath_configs_config_configuration |
| 23 | logging - logs - log - logger - loggingfirststep | 45 | 23_logging_logs_log_logger |
| 24 | cachedir - cache - cachedpath - caching - cached | 34 | 24_cachedir_cache_cachedpath_caching |
| 25 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 34 | 25_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
| 26 | layoutlm - layoutlmtokenizer - layout - layoutlmbaseuncased - tf | 23 | 26_layoutlm_layoutlmtokenizer_layout_layoutlmbaseuncased |
| 27 | dict - dictstr - returndict - parse - arguments | 17 | 27_dict_dictstr_returndict_parse |
| 28 | pplm - pr - deprecated - variable - ppl | 17 | 28_pplm_pr_deprecated_variable |
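
In the rows above, each label appears to be the topic ID followed by the first four keywords, joined with underscores, and the frequencies add up to the 9000 training documents. The sketch below reproduces that pattern and computes a topic's share of the corpus; the function names `make_label` and `share_of_corpus` are hypothetical, and the naming rule is inferred from the table rather than taken from the library's source.

```python
from typing import List

N_TRAINING_DOCS = 9000  # number of training documents stated in the overview


def make_label(topic_id: int, keywords: List[str], n_words: int = 4) -> str:
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


def share_of_corpus(frequency: int, total: int = N_TRAINING_DOCS) -> float:
    """Fraction of training documents assigned to a topic."""
    return frequency / total


# Topic 0 from the table: keywords in ranked order.
label = make_label(0, ["tokenizer", "tokenizers", "tokenization", "token", "tokens"])

# Topic -1 collects outliers; only 14 documents ended up there.
outlier_share = share_of_corpus(14)
```

Here `label` matches the Label column for topic 0, and `outlier_share` works out to roughly 0.16% of the training documents.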

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
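
The settings above map onto keyword arguments of the `BERTopic` constructor. The sketch below collects them in a dictionary and shows how an unfitted model with the same settings could be created. It assumes an installed BERTopic release recent enough to accept the `zeroshot_*` arguments, and it does not reproduce anything the card leaves unstated (embedding model, UMAP or HDBSCAN settings, training data). The names `HYPERPARAMETERS` and `build_topic_model` are hypothetical.

```python
# Hyperparameters exactly as listed in this model card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an unfitted BERTopic instance with the settings above.

    The import is done lazily so the dictionary can be inspected
    without BERTopic being installed.
    """
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of documents would train a new model with these settings. The result would not necessarily match the published topics, since UMAP and HDBSCAN are not deterministic by default.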

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.37
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 3.0.1
  • Transformers: 4.41.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
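
To check whether a local environment matches the versions listed above, the installed packages can be compared against them with the standard library. This is a sketch; the distribution names in `EXPECTED_VERSIONS` are the usual PyPI names (for example `umap-learn` for UMAP) and are an assumption, since the card lists display names only. `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from this model card, keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.37",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(
    expected: Dict[str, str] = EXPECTED_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (expected, installed or None)."""
    mismatches = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[dist] = (wanted, installed)
    return mismatches
```

An empty result from `version_mismatches()` means every listed package is installed at the listed version. A differing version does not necessarily prevent the model from loading.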