	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# hub_issues_topocs

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
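	from bertopic import BERTopic

	# Download the model from the Hugging Face Hub and load it
	topic_model = BERTopic.load("davanstrien/hub_issues_topocs")

	# Return a table with one row per topic (ID, size, and name)
	topic_model.get_topic_info()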
	```
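
Beyond the overview table, individual topics can be inspected by ID. The sketch below is a minimal illustration, assuming the loaded model exposes BERTopic's standard `get_topic` method, which returns `(word, score)` pairs for a known topic. The helper names `top_keywords` and `load_and_describe` are hypothetical and not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n_words=5):
    """Join the first n_words keywords of a topic with ' - ', as in the overview table."""
    # get_topic returns a list of (word, score) pairs, or False for an unknown ID
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return None
    return " - ".join(word for word, _score in pairs[:n_words])


def load_and_describe(topic_id, repo_id="davanstrien/hub_issues_topocs"):
    """Load the model from the Hub and describe one topic (needs bertopic and network access)."""
    from bertopic import BERTopic  # imported lazily so top_keywords works without it

    topic_model = BERTopic.load(repo_id)
    return top_keywords(topic_model, topic_id)
```

Calling `load_and_describe(2)` should return a keyword string comparable to the Topic Keywords column for topic 2, provided the model downloads successfully.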

	## Topic overview

	* Number of topics: 156
	* Number of training documents: 6427

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | model - version - training - add - base | 10 | Outlier Topic |
	| 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings |
	| 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation |
	| 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support |
	| 3 | request - thesis - checker - request request - work | 103 | DOI request and thesis checker |
	| 4 | bloom - 176b - bloomz - bert - 7b1 | 95 | Bloom inference on BERT |
	| 5 | api - inference api - hosted - inference - hosted inference | 80 | Configuring Inference API |
	| 6 | report report - report - reports - look - awesome | 78 | Awesome Reports |
	| 7 | use model - run model - model run - model use - tune model | 73 | Use model instructions |
	| 8 | request access - access request - access - request - request requesting | 65 | Access Request Solution |
	| 9 | colab - google - google colab - model google - collab | 64 | "Running Galactica on Colab" |
	| 10 | json - config json - config - json file - file named | 62 | JSON configuration files |
	| 11 | load model - load - model working - unable load - unable | 60 | "Model loading issues" |
	| 12 | text - text generation - words - truncated - generation | 57 | Text Generation Techniques |
	| 13 | label - labels - tags - classifier - entity | 57 | Document Labels |
	| 14 | data - model dataset - dataset - train model - used train | 55 | Model Training Data |
	| 15 | issue report - issue - report - 论文 - artists | 55 | Ethical Issues in Artists' Legal Discussion |
	| 16 | loading - loading model - error loading - model error - load model | 55 | Model Loading Errors |
	| 17 | error error - error - 500 error - connection - unknown error | 49 | Error 500 Connection |
	| 18 | train model - train - trained - model did - model trained | 46 | Training models in Arabic |
	| 19 | stable diffusion - diffusion - stable - diffusion v1 - diffusion webui | 46 | Stable Diffusion Downloads |
	| 20 | question - answers - questions - tts - double | 45 | Question about Fig.2c |
	| 21 | length - max - maximum - limit - sequence length | 45 | Length Limits and Token Length |
	| 22 | model model - model architecture - generator - architecture - type | 42 | Model Architecture |
	| 23 | commercial - license - commercial use - license license - mit | 41 | Commercial Use License |
	| 24 | transformers - transformer - sentence transformers - sentence - using transformers | 40 | Issues with sentence transformers |
	| 25 | huggingface - hugging face - hugging - face - using hugging | 40 | Hugging Face model usage |
	| 26 | legal - legal issue - issue report - issue - report | 40 | Legal Issues Reports |
	| 27 | v2 - v3 - anime - wav2vec2 - virus | 40 | Anime Virus Detection Vae |
	| 28 | tutorials - thread - tricks - 26 - tips | 39 | Stable Diffusion 26+ Tutorials |
	| 29 | difference - fp16 - dpm - opus - opus mt | 39 | Difference between phase1 and phase2 |
	| 30 | tokenizer - using from_pretrained - loading - error loading - load | 37 | Tokenizer Loading Error |
	| 31 | output - extraction - truncated - summaries - outputs | 37 | Output Extraction |
	| 32 | attribute - object - attributeerror - typeerror - string | 36 | AttributeError in object attributes |
	| 33 | ckpt file - ckpt - file ckpt - file - ckpt files | 36 | CKPT file location |
	| 34 | dataset dataset - dataset - source dataset - datasets - source | 36 | dataset source semantic search |
	| 35 | size - mismatch - discrepancy - vocab size - dimensionality | 36 | Size Mismatch Discrepancy |
	| 36 | license - license license - permission - agreement - licence | 36 | License Agreement |
	| 37 | model card - card - card model - building model - building | 35 | Model Card Typos |
	| 38 | demo - space - spaces - gradio - cause | 35 | Troubleshooting Gradio Demo |
	| 39 | commercially - does model - commercial - model used - usable | 34 | Commercial Usability of AI Model |
	| 40 | automatic1111 - webui - automatic - ui - web ui | 33 | Automatic1111 WebUI |
	| 41 | import - transformers - module - failed - export | 33 | ImportError in Transformers Module |
	| 42 | example - examples - example use - prompt example - usage example | 33 | Example Usage |
	| 43 | audio - noise - spectrogram - second - speaker | 33 | Audio Transcription and Conversion |
	| 44 | cool - love - idea - amazing - great | 32 | "cool and amazing" |
	| 45 | language model - language - kenlm - lm - multilingual | 32 | Language Model Inference with KenLM |
	| 46 | really - nice - cool - love - amazing | 32 | amazing model |
	| 47 | sagemaker - endpoint - deployment - deploy - amazon | 32 | Deploying SageMaker Endpoints |
	| 48 | training training - training - training steps - general - video | 31 | "Training Steps Video" |
	| 49 | tokenizer - problems - masked - tokenizer tokenizer - tokens | 31 | Tokenizer Problems |
	| 50 | sd - sd2 - sd sd - does support - wd | 30 | Using SD with Different Versions |
	| 51 | test - testing - sampler - discussion - split | 30 | Testing Sampler Discussion |
	| 52 | argument - unexpected - keyword - typeerror - got | 30 | Unexpected keyword argument TypeError |
	| 53 | float - runtimeerror expected - runtimeerror - expected - type | 30 | RuntimeErrors with Float and Half Types |
	| 54 | dataset used - dataset - dataset dataset - used fine - used | 28 | Dataset Usage |
	| 55 | json - json file - model architecture - inconsistency - architecture | 28 | JSON file inconsistency |
	| 56 | usage - project - app - macos - usage questions | 28 | Usage with Sherpa |
	| 57 | reproduce - results - result - civitai - reproducing results | 28 | Reproduce Result Difficulty |
	| 58 | gene - cell - question generation - generation - geneformer | 27 | Gene Embedding Generation |
	| 59 | gpu - gpus - multiple - gpu run - model multiple | 27 | Multi-GPU Model Execution |
	| 60 | tokenizer use - wlop - mean - token - webui version | 26 | Tokenizer for Cantonese |
	| 61 | model fine - tuning model - fine tuning - fine - tuning | 26 | Fine-Tuning the Model |
	| 62 | model training - training model - training - redshift - model model | 26 | Model Training |
	| 63 | bot - discord - tesla - chat - character | 26 | Tesla Discord Bot 2021 |
	| 64 | work - doesn work - doesn - dont - does appear | 26 | Non-functional potty lora |
	| 65 | use use - use - best - way use - methods | 26 | Best ways to use |
	| 66 | report card - metadata - card - report - | 26 | Metadata Report Card |
	| 67 | guide - instructions - guidance - prompt - cost | 25 | Fine-tuning guide instructions |
	| 68 | code - finetuning code - finetuning - fine tuning - tuning | 25 | Fine-tuning Code Sample |
	| 69 | dataset - custom dataset - dataset fine - custom - fine tuning | 25 | Custom dataset fine-tuning |
	| 70 | safetensors - safetensor - version - version safetensors - safetensor version | 25 | SafeTensors Version Inquiry |
	| 71 | model based - task model - model changes - bring - v7 | 25 | Model Description and Changes |
	| 72 | weights - weight - flax - diffusers weights - load weights | 25 | Outdated Flax Weights |
	| 73 | style - modern - mode - new - dark mode | 24 | Style in Modern Technology |
	| 74 | convert - format - trying convert - safetensors - converter | 24 | Safetensors conversion error |
	| 75 | checkpoint - save - checkpoint file - checkpoints - restore | 24 | Checkpoint Safety Restore |
	| 76 | t5 - flan t5 - flan - google flan - xxl | 23 | T5 vs Flan-T5 Differences |
	| 77 | download model - model load - download - load - model download | 23 | "Model Download" |
	| 78 | access access - access - access need - need access - need | 23 | Access Request Assistance |
	| 79 | model details - details model - details - information model - model access | 23 | Model Details |
	| 80 | job - excellent - nice - great - congrats | 23 | Job Well Done |
	| 81 | onnx - conversion - onnx conversion - convert - torchscript | 22 | ONNX Conversion Implementation |
	| 82 | git - repository - repo - cloning - slow | 22 | Git repository cloning issues |
	| 83 | online - 50 - 200 - buy - annotator | 22 | Buy Medications Online |
	| 84 | access - request access - acces request - access request - request | 22 | Access Request |
	| 85 | cuda - cuda memory - memory - cuda error - memory cuda | 22 | CUDA memory out of error |
	| 86 | api model - api - inference api - model api - trying use | 22 | API Model Errors |
	| 87 | training data - data training - data - training dataset - training | 22 | Data Training Examples |
	| 88 | pipeline - valid - pipe - sentence similarity - similarity | 21 | Pipeline error analysis |
	| 89 | tensor - tensors - device - expected - size | 21 | Tensor size mismatch errors |
	| 90 | in_silico_perturber - eos_token_id - switch - 64 - encoder | 21 | Error in decoder generation |
	| 91 | pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model | 21 | Missing pytorch_model.bin file |
	| 92 | 404 - url - https - https huggingface - resolve | 21 | 404 error Huggingface documents |
	| 93 | requirements - acess - feature request - request request - feature | 21 | System Requirements Access |
	| 94 | info - technical - details - information - detailed | 21 | Technical Details Inquiry |
	| 95 | hello - hi - good - translates - 100 | 20 | Greetings and Translations |
	| 96 | accuracy - drop - compatibility - precision - half precision | 20 | Accuracy Drop in Precision |
	| 97 | access request - request access - access - request - new | 20 | Access Request |
	| 98 | file missing - log - filenotfounderror - location - sorry | 20 | File Not Found |
	| 99 | model card - card - link model - link - example model | 20 | Broken link in model |
	| 100 | python - kernel - 10 - pytorch - talks | 20 | Python usage and errors |
	| 101 | bug - fix - racist - possible bug - thing | 19 | Bug Fix with Racist Bug |
	| 102 | training code - code training - code - share - share training | 19 | "Training Code Sharing" |
	| 103 | license - accept - license license - model accept - indication | 19 | Model License |
	| 104 | gpt - protgpt2 - 6b - jt - gpt jt | 19 | GPT-JT-6B-v1 Abilities |
	| 105 | report report - report - - - | 19 | Multiple Reports on Topic |
	| 106 | tuning fine - tune fine - fine - fine tuning - tuning | 18 | Fine-tuning for domain adaptation |
	| 107 | inpaint model - inpaint - ix - size model - model pruned | 18 | Inpaint Model |
	| 108 | config file - config - tokenizer config - files config - file | 18 | Config File Troubleshooting |
	| 109 | sample code - example - sample - copied - error example | 18 | Issues with sample code |
	| 110 | nsfw - nsfw content - content - disable - safety | 18 | NSFW Content Filtering |
	| 111 | length - summary - longformer - summary length - text length | 18 | Length of Summaries |
	| 112 | access download - access - download - access access - download working | 18 | Access Download |
	| 113 | thank - thanks - just want - pretty - request thank | 18 | Thank you efforts |
	| 114 | sd v1 - v1 - ema ckpt - sd - ema | 18 | Access to sd-v1-4-full-ema.ckpt |
	| 115 | padding_side - tokens - token - cls token - token id | 18 | Padding and token discrepancy |
	| 116 | amd - vram - gb - gpu - 448 | 17 | "AMD GPU compatibility" |
	| 117 | dataset - pretraining - dataset dataset - datasets - request dataset | 17 | Dataset Pretraining |
	| 118 | version - ggml version - version ggml - ggml - pytorch version | 17 | "Version Possibility" |
	| 119 | memory - leak - a100 - cuda memory - memory google | 17 | Memory-related Issues |
	| 120 | trigger - words - word - trigger word - semantic | 17 | Trigger words and semantic search |
	| 121 | result - results - output - score - ways | 16 | Visualizing Inference Results |
	| 122 | sd - tested - sd sd - lora training - ui | 16 | Stable Diffusion LORA Training |
	| 123 | ckpt file - bin - convert - weights - dreambooth | 16 | Convert Diffusion Diffusers to CKPT |
	| 124 | need help - help - help help - need - started | 16 | Need Help Getting Started |
	| 125 | keyerror - key - exception error - key error - codegen | 16 | KeyError Troubleshooting |
	| 126 | controlnet - control - a1111 - installed - model embedding | 16 | ControlNet not working |
	| 127 | implementation - issue - solved - np - experiencing | 16 | Implementation Issue Fix |
	| 128 | runtimeerror - time series - everytime - process runtimeerror - try run | 16 | Time Series Runtime Error |
	| 129 | use use - use - use readme - use diffusers - tk | 15 | How to use Diffusers |
	| 130 | training dataset - dataset used - used dataset - nli - used training | 15 | Training Dataset Used |
	| 131 | yaml files - colab pc - install run - diffusion google - train custom | 15 | Stable Diffusion Tutorials |
	| 132 | spam - deleted - removed - delete - contact | 15 | Removal of Spam Discussion |
	| 133 | details training - details - training - details details - details info | 14 | Training Details |
	| 134 | hyper parameters - hyper - parameters - provide - provide training | 14 | Hyperparameter Optimization |
	| 135 | fine tune - tune - ner - fine - emotions | 14 | Fine-tune Sentence Embeddings |
	| 136 | model using - using model - examples - question lora - models used | 14 | Inkpunk Diffusion model |
	| 137 | error running - running - running example - usage code - code | 14 | Error running example code |
	| 138 | difference - alpaca - model difference - original model - difference model | 14 | Model Differences |
	| 139 | install - locally - know install - run local - mini | 14 | "How to install locally" |
	| 140 | training script - script - script training - sharing training - midi | 13 | Training Script |
	| 141 | model file - missing model - corrupt - file model - file missing | 13 | Model File Issues |
	| 142 | error help - help error - help - solve - try | 13 | Error Help |
	| 143 | hardware - hardware requirements - requirements - gpu inference - requirements fine | 13 | Hardware Requirements for Inference |
	| 144 | update - updated - channel - expired - new update | 13 | update query status |
	| 145 | negative - negative prompt - negative prompts - prompts - prompt | 13 | "Negative Prompt Function" |
	| 146 | unable run - unable - run unable - run - human | 13 | Unable to run on local machine |
	| 147 | injection - nmkd gui - nmkd - tutorial videos - gui | 12 | Stable Diffusion Tutorial Videos |
	| 148 | download download - download - request acces - know download - fim | 12 | "Download Instructions" |
	| 149 | transformers - sentence transformers - huggingface transformers - different results - usage | 12 | Transformer Usage Discrepancy |
	| 150 | link - broken link - broken - documentation - expired | 11 | Broken links and documentation |
	| 151 | broke - padding - dead - kenlm - dropout | 11 | "Dead KenLM Finetuning" |
	| 152 | training question - question training - training process - question regarding - question | 11 | Training Process Question |
	| 153 | dataset training - training data - training dataset - data training - custom dataset | 11 | Training Data Quality |
	| 154 | download - download download - possible download - hd 18 - hd | 11 | Troubleshooting download errors |

	</details>
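
The Topic Frequency column shows how unevenly documents are spread across topics. As a rough sketch using only figures from the table and the training document count given in this section (the helper `share_of_documents` is hypothetical), the share of training documents per topic can be computed as follows:

```python
TOTAL_DOCUMENTS = 6427  # "Number of training documents" from the overview

# Topic ID -> frequency, copied from the first rows of the table
FREQUENCIES = {-1: 10, 0: 1785, 1: 218, 2: 169}


def share_of_documents(count, total=TOTAL_DOCUMENTS):
    """Return the percentage of training documents assigned to a topic."""
    return 100.0 * count / total


for topic_id, count in FREQUENCIES.items():
    print(f"topic {topic_id}: {share_of_documents(count):.1f}% of documents")
```

On these figures, topic 0 alone covers a little over a quarter of the training documents, while the outlier topic -1 holds only 10 of them.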

	## Training hyperparameters

	* calculate_probabilities: False
	* language: None
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: None
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
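
These names correspond to arguments of the `BERTopic` constructor, so a model with the same top-level settings could be configured roughly as sketched below. This is an illustration under stated assumptions, not the original training script: the embedding, UMAP, HDBSCAN, and vectorizer sub-models are not listed in this card, so the sketch falls back on BERTopic's defaults for them, and the helper name `build_topic_model` is hypothetical.

```python
# Values copied from the "Training hyperparameters" list
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(**overrides):
    """Create an unfitted BERTopic model with the settings above (requires bertopic)."""
    from bertopic import BERTopic  # lazy import: only needed when building the model

    # Assumption: sub-models are left at BERTopic's defaults unless overridden
    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

Fitting would then be a call such as `build_topic_model().fit_transform(docs)` on a list of strings. Results will generally differ from this model unless the same documents and sub-models are used.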

	## Framework versions

	* Numpy: 1.22.4
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.3
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.31.0
	* Numba: 0.56.4
	* Plotly: 5.13.1
	* Python: 3.10.6
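
To check whether a local environment matches these versions, the installed packages can be compared against the list. The sketch below uses only the standard library's `importlib.metadata`. The mapping from the names above to PyPI distribution names (for example `umap-learn` for UMAP) and the helper `find_version_mismatches` are assumptions made for illustration.

```python
from importlib import metadata

# Distribution name on PyPI -> version listed in this card
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS, lookup=metadata.version):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_version_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed}")
```

A mismatch does not necessarily prevent the model from loading; the check only reports where the environment differs from the one recorded here.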