metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

hub_issues_topocs

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/hub_issues_topocs")

topic_model.get_topic_info()
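Beyond inspecting the topic table, the loaded model can assign topics to new text. The following is a minimal sketch, assuming the embedding model stored with this checkpoint is available at load time. The helper names `assign_topics` and `pair_docs_with_topics` are hypothetical and not part of BERTopic; `BERTopic.load` and `transform` are the library calls being illustrated.

```python
from typing import List, Sequence, Tuple

OUTLIER_TOPIC = -1  # BERTopic uses -1 for documents it treats as outliers


def pair_docs_with_topics(
    docs: Sequence[str], topics: Sequence[int]
) -> List[Tuple[str, int]]:
    """Pair each document with its topic ID, skipping outlier assignments."""
    return [(doc, int(topic)) for doc, topic in zip(docs, topics) if topic != OUTLIER_TOPIC]


def assign_topics(
    docs: Sequence[str], model_id: str = "davanstrien/hub_issues_topocs"
) -> List[Tuple[str, int]]:
    """Load the model from the Hub and predict a topic for each document."""
    # Imported lazily so the pure-Python helper above works without bertopic.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    topics, _probabilities = topic_model.transform(list(docs))
    return pair_docs_with_topics(docs, topics)
```

Calling `assign_topics(["How do I request access to this model?"])` would download the model and return the document paired with a topic ID from the table below, unless the document is classed as an outlier.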

Topic overview

  • Number of topics: 156
  • Number of training documents: 6427
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | model - version - training - add - base | 10 | Outlier Topic |
| 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings |
| 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation |
| 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support |
| 3 | request - thesis - checker - request request - work | 103 | DOI request and thesis checker |
| 4 | bloom - 176b - bloomz - bert - 7b1 | 95 | Bloom inference on BERT |
| 5 | api - inference api - hosted - inference - hosted inference | 80 | Configuring Inference API |
| 6 | report report - report - reports - look - awesome | 78 | Awesome Reports |
| 7 | use model - run model - model run - model use - tune model | 73 | Use model instructions |
| 8 | request access - access request - access - request - request requesting | 65 | Access Request Solution |
| 9 | colab - google - google colab - model google - collab | 64 | "Running Galactica on Colab" |
| 10 | json - config json - config - json file - file named | 62 | JSON configuration files |
| 11 | load model - load - model working - unable load - unable | 60 | "Model loading issues" |
| 12 | text - text generation - words - truncated - generation | 57 | Text Generation Techniques |
| 13 | label - labels - tags - classifier - entity | 57 | Document Labels |
| 14 | data - model dataset - dataset - train model - used train | 55 | Model Training Data |
| 15 | issue report - issue - report - 论文 - artists | 55 | Ethical Issues in Artists' Legal Discussion |
| 16 | loading - loading model - error loading - model error - load model | 55 | Model Loading Errors |
| 17 | error error - error - 500 error - connection - unknown error | 49 | Error 500 Connection |
| 18 | train model - train - trained - model did - model trained | 46 | Training models in Arabic |
| 19 | stable diffusion - diffusion - stable - diffusion v1 - diffusion webui | 46 | Stable Diffusion Downloads |
| 20 | question - answers - questions - tts - double | 45 | Question about Fig.2c |
| 21 | length - max - maximum - limit - sequence length | 45 | Length Limits and Token Length |
| 22 | model model - model architecture - generator - architecture - type | 42 | Model Architecture |
| 23 | commercial - license - commercial use - license license - mit | 41 | Commercial Use License |
| 24 | transformers - transformer - sentence transformers - sentence - using transformers | 40 | Issues with sentence transformers |
| 25 | huggingface - hugging face - hugging - face - using hugging | 40 | Hugging Face model usage |
| 26 | legal - legal issue - issue report - issue - report | 40 | Legal Issues Reports |
| 27 | v2 - v3 - anime - wav2vec2 - virus | 40 | Anime Virus Detection Vae |
| 28 | tutorials - thread - tricks - 26 - tips | 39 | Stable Diffusion 26+ Tutorials |
| 29 | difference - fp16 - dpm - opus - opus mt | 39 | Difference between phase1 and phase2 |
| 30 | tokenizer - using from_pretrained - loading - error loading - load | 37 | Tokenizer Loading Error |
| 31 | output - extraction - truncated - summaries - outputs | 37 | Output Extraction |
| 32 | attribute - object - attributeerror - typeerror - string | 36 | AttributeError in object attributes |
| 33 | ckpt file - ckpt - file ckpt - file - ckpt files | 36 | CKPT file location |
| 34 | dataset dataset - dataset - source dataset - datasets - source | 36 | dataset source semantic search |
| 35 | size - mismatch - discrepancy - vocab size - dimensionality | 36 | Size Mismatch Discrepancy |
| 36 | license - license license - permission - agreement - licence | 36 | License Agreement |
| 37 | model card - card - card model - building model - building | 35 | Model Card Typos |
| 38 | demo - space - spaces - gradio - cause | 35 | Troubleshooting Gradio Demo |
| 39 | commercially - does model - commercial - model used - usable | 34 | Commercial Usability of AI Model |
| 40 | automatic1111 - webui - automatic - ui - web ui | 33 | Automatic1111 WebUI |
| 41 | import - transformers - module - failed - export | 33 | ImportError in Transformers Module |
| 42 | example - examples - example use - prompt example - usage example | 33 | Example Usage |
| 43 | audio - noise - spectrogram - second - speaker | 33 | Audio Transcription and Conversion |
| 44 | cool - love - idea - amazing - great | 32 | "cool and amazing" |
| 45 | language model - language - kenlm - lm - multilingual | 32 | Language Model Inference with KenLM |
| 46 | really - nice - cool - love - amazing | 32 | amazing model |
| 47 | sagemaker - endpoint - deployment - deploy - amazon | 32 | Deploying SageMaker Endpoints |
| 48 | training training - training - training steps - general - video | 31 | "Training Steps Video" |
| 49 | tokenizer - problems - masked - tokenizer tokenizer - tokens | 31 | Tokenizer Problems |
| 50 | sd - sd2 - sd sd - does support - wd | 30 | Using SD with Different Versions |
| 51 | test - testing - sampler - discussion - split | 30 | Testing Sampler Discussion |
| 52 | argument - unexpected - keyword - typeerror - got | 30 | Unexpected keyword argument TypeError |
| 53 | float - runtimeerror expected - runtimeerror - expected - type | 30 | RuntimeErrors with Float and Half Types |
| 54 | dataset used - dataset - dataset dataset - used fine - used | 28 | Dataset Usage |
| 55 | json - json file - model architecture - inconsistency - architecture | 28 | JSON file inconsistency |
| 56 | usage - project - app - macos - usage questions | 28 | Usage with Sherpa |
| 57 | reproduce - results - result - civitai - reproducing results | 28 | Reproduce Result Difficulty |
| 58 | gene - cell - question generation - generation - geneformer | 27 | Gene Embedding Generation |
| 59 | gpu - gpus - multiple - gpu run - model multiple | 27 | Multi-GPU Model Execution |
| 60 | tokenizer use - wlop - mean - token - webui version | 26 | Tokenizer for Cantonese |
| 61 | model fine - tuning model - fine tuning - fine - tuning | 26 | Fine-Tuning the Model |
| 62 | model training - training model - training - redshift - model model | 26 | Model Training |
| 63 | bot - discord - tesla - chat - character | 26 | Tesla Discord Bot 2021 |
| 64 | work - doesn work - doesn - dont - does appear | 26 | Non-functional potty lora |
| 65 | use use - use - best - way use - methods | 26 | Best ways to use |
| 66 | report card - metadata - card - report - | 26 | Metadata Report Card |
| 67 | guide - instructions - guidance - prompt - cost | 25 | Fine-tuning guide instructions |
| 68 | code - finetuning code - finetuning - fine tuning - tuning | 25 | Fine-tuning Code Sample |
| 69 | dataset - custom dataset - dataset fine - custom - fine tuning | 25 | Custom dataset fine-tuning |
| 70 | safetensors - safetensor - version - version safetensors - safetensor version | 25 | SafeTensors Version Inquiry |
| 71 | model based - task model - model changes - bring - v7 | 25 | Model Description and Changes |
| 72 | weights - weight - flax - diffusers weights - load weights | 25 | Outdated Flax Weights |
| 73 | style - modern - mode - new - dark mode | 24 | Style in Modern Technology |
| 74 | convert - format - trying convert - safetensors - converter | 24 | Safetensors conversion error |
| 75 | checkpoint - save - checkpoint file - checkpoints - restore | 24 | Checkpoint Safety Restore |
| 76 | t5 - flan t5 - flan - google flan - xxl | 23 | T5 vs Flan-T5 Differences |
| 77 | download model - model load - download - load - model download | 23 | "Model Download" |
| 78 | access access - access - access need - need access - need | 23 | Access Request Assistance |
| 79 | model details - details model - details - information model - model access | 23 | Model Details |
| 80 | job - excellent - nice - great - congrats | 23 | Job Well Done |
| 81 | onnx - conversion - onnx conversion - convert - torchscript | 22 | ONNX Conversion Implementation |
| 82 | git - repository - repo - cloning - slow | 22 | Git repository cloning issues |
| 83 | online - 50 - 200 - buy - annotator | 22 | Buy Medications Online |
| 84 | access - request access - acces request - access request - request | 22 | Access Request |
| 85 | cuda - cuda memory - memory - cuda error - memory cuda | 22 | CUDA memory out of error |
| 86 | api model - api - inference api - model api - trying use | 22 | API Model Errors |
| 87 | training data - data training - data - training dataset - training | 22 | Data Training Examples |
| 88 | pipeline - valid - pipe - sentence similarity - similarity | 21 | Pipeline error analysis |
| 89 | tensor - tensors - device - expected - size | 21 | Tensor size mismatch errors |
| 90 | in_silico_perturber - eos_token_id - switch - 64 - encoder | 21 | Error in decoder generation |
| 91 | pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model | 21 | Missing pytorch_model.bin file |
| 92 | 404 - url - https - https huggingface - resolve | 21 | 404 error Huggingface documents |
| 93 | requirements - acess - feature request - request request - feature | 21 | System Requirements Access |
| 94 | info - technical - details - information - detailed | 21 | Technical Details Inquiry |
| 95 | hello - hi - good - translates - 100 | 20 | Greetings and Translations |
| 96 | accuracy - drop - compatibility - precision - half precision | 20 | Accuracy Drop in Precision |
| 97 | access request - request access - access - request - new | 20 | Access Request |
| 98 | file missing - log - filenotfounderror - location - sorry | 20 | File Not Found |
| 99 | model card - card - link model - link - example model | 20 | Broken link in model |
| 100 | python - kernel - 10 - pytorch - talks | 20 | Python usage and errors |
| 101 | bug - fix - racist - possible bug - thing | 19 | Bug Fix with Racist Bug |
| 102 | training code - code training - code - share - share training | 19 | "Training Code Sharing" |
| 103 | license - accept - license license - model accept - indication | 19 | Model License |
| 104 | gpt - protgpt2 - 6b - jt - gpt jt | 19 | GPT-JT-6B-v1 Abilities |
| 105 | report report - report - - - | 19 | Multiple Reports on Topic |
| 106 | tuning fine - tune fine - fine - fine tuning - tuning | 18 | Fine-tuning for domain adaptation |
| 107 | inpaint model - inpaint - ix - size model - model pruned | 18 | Inpaint Model |
| 108 | config file - config - tokenizer config - files config - file | 18 | Config File Troubleshooting |
| 109 | sample code - example - sample - copied - error example | 18 | Issues with sample code |
| 110 | nsfw - nsfw content - content - disable - safety | 18 | NSFW Content Filtering |
| 111 | length - summary - longformer - summary length - text length | 18 | Length of Summaries |
| 112 | access download - access - download - access access - download working | 18 | Access Download |
| 113 | thank - thanks - just want - pretty - request thank | 18 | Thank you efforts |
| 114 | sd v1 - v1 - ema ckpt - sd - ema | 18 | Access to sd-v1-4-full-ema.ckpt |
| 115 | padding_side - tokens - token - cls token - token id | 18 | Padding and token discrepancy |
| 116 | amd - vram - gb - gpu - 448 | 17 | "AMD GPU compatibility" |
| 117 | dataset - pretraining - dataset dataset - datasets - request dataset | 17 | Dataset Pretraining |
| 118 | version - ggml version - version ggml - ggml - pytorch version | 17 | "Version Possibility" |
| 119 | memory - leak - a100 - cuda memory - memory google | 17 | Memory-related Issues |
| 120 | trigger - words - word - trigger word - semantic | 17 | Trigger words and semantic search |
| 121 | result - results - output - score - ways | 16 | Visualizing Inference Results |
| 122 | sd - tested - sd sd - lora training - ui | 16 | Stable Diffusion LORA Training |
| 123 | ckpt file - bin - convert - weights - dreambooth | 16 | Convert Diffusion Diffusers to CKPT |
| 124 | need help - help - help help - need - started | 16 | Need Help Getting Started |
| 125 | keyerror - key - exception error - key error - codegen | 16 | KeyError Troubleshooting |
| 126 | controlnet - control - a1111 - installed - model embedding | 16 | ControlNet not working |
| 127 | implementation - issue - solved - np - experiencing | 16 | Implementation Issue Fix |
| 128 | runtimeerror - time series - everytime - process runtimeerror - try run | 16 | Time Series Runtime Error |
| 129 | use use - use - use readme - use diffusers - tk | 15 | How to use Diffusers |
| 130 | training dataset - dataset used - used dataset - nli - used training | 15 | Training Dataset Used |
| 131 | yaml files - colab pc - install run - diffusion google - train custom | 15 | Stable Diffusion Tutorials |
| 132 | spam - deleted - removed - delete - contact | 15 | Removal of Spam Discussion |
| 133 | details training - details - training - details details - details info | 14 | Training Details |
| 134 | hyper parameters - hyper - parameters - provide - provide training | 14 | Hyperparameter Optimization |
| 135 | fine tune - tune - ner - fine - emotions | 14 | Fine-tune Sentence Embeddings |
| 136 | model using - using model - examples - question lora - models used | 14 | Inkpunk Diffusion model |
| 137 | error running - running - running example - usage code - code | 14 | Error running example code |
| 138 | difference - alpaca - model difference - original model - difference model | 14 | Model Differences |
| 139 | install - locally - know install - run local - mini | 14 | "How to install locally" |
| 140 | training script - script - script training - sharing training - midi | 13 | Training Script |
| 141 | model file - missing model - corrupt - file model - file missing | 13 | Model File Issues |
| 142 | error help - help error - help - solve - try | 13 | Error Help |
| 143 | hardware - hardware requirements - requirements - gpu inference - requirements fine | 13 | Hardware Requirements for Inference |
| 144 | update - updated - channel - expired - new update | 13 | update query status |
| 145 | negative - negative prompt - negative prompts - prompts - prompt | 13 | "Negative Prompt Function" |
| 146 | unable run - unable - run unable - run - human | 13 | Unable to run on local machine |
| 147 | injection - nmkd gui - nmkd - tutorial videos - gui | 12 | Stable Diffusion Tutorial Videos |
| 148 | download download - download - request acces - know download - fim | 12 | "Download Instructions" |
| 149 | transformers - sentence transformers - huggingface transformers - different results - usage | 12 | Transformer Usage Discrepancy |
| 150 | link - broken link - broken - documentation - expired | 11 | Broken links and documentation |
| 151 | broke - padding - dead - kenlm - dropout | 11 | "Dead KenLM Finetuning" |
| 152 | training question - question training - training process - question regarding - question | 11 | Training Process Question |
| 153 | dataset training - training data - training dataset - data training - custom dataset | 11 | Training Data Quality |
| 154 | download - download download - possible download - hd 18 - hd | 11 | Troubleshooting download errors |

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
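
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how an untrained model with the same settings might be created. It is not a reproduction recipe: this card does not list the embedding model or the training documents, so fitting this object would not necessarily recover the topics shown above. The names `HYPERPARAMETERS` and `build_untrained_model` are hypothetical.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the listed hyperparameters."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    # No embedding_model is passed here because the card does not name one;
    # BERTopic would fall back to its own default when fitting.
    return BERTopic(**HYPERPARAMETERS)
```

With `min_topic_size` set to 10, clusters smaller than ten documents are not kept as separate topics, which is consistent with the smallest topics in the overview having 11 documents.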

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
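
If loading the model fails or gives unexpected results, comparing the local environment against the versions above can help narrow down the cause. The following is a minimal sketch using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) and the helper names are assumptions that do not come from this card.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Optional[Dict[str, str]] = None,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to an (expected, installed) version pair."""
    if expected is None:
        expected = EXPECTED_VERSIONS
    mismatches = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one used for training.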