---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# hub_issues_topocs
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
```python
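from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("davanstrien/hub_issues_topocs")
# Return a DataFrame with one row per topic
topic_model.get_topic_info()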
```
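To look at a single topic rather than the full overview, something like the following sketch may help. It assumes that `get_topic` returns a list of `(word, score)` pairs ordered from most to least representative, and that the keyword column in the table below is the first five of those words joined with ` - `. The helper names `format_keywords` and `describe_topic` are illustrative and are not part of BERTopic.

```python
from typing import List, Tuple


def format_keywords(words_with_scores: List[Tuple[str, float]], n: int = 5) -> str:
    """Join the first n words with ' - ', mirroring the keyword column format."""
    return " - ".join(word for word, _score in words_with_scores[:n])


def describe_topic(repo_id: str, topic_id: int) -> str:
    """Load a BERTopic model from the Hub and summarize one topic's keywords."""
    # Imported lazily so the formatting helper works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # get_topic returns (word, score) pairs, or False for an unknown topic ID.
    words = topic_model.get_topic(topic_id)
    if not words:
        raise KeyError(f"Topic {topic_id} not found in {repo_id}")
    return format_keywords(words)
```

Calling `describe_topic("davanstrien/hub_issues_topocs", 1)` downloads the model, so it needs network access and an installed `bertopic`.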
## Topic overview
* Number of topics: 156
* Number of training documents: 6427
<details>
<summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | model - version - training - add - base | 10 | Outlier Topic |
| 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings |
| 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation |
| 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support |
| 3 | request - thesis - checker - request request - work | 103 | DOI request and thesis checker |
| 4 | bloom - 176b - bloomz - bert - 7b1 | 95 | Bloom inference on BERT |
| 5 | api - inference api - hosted - inference - hosted inference | 80 | Configuring Inference API |
| 6 | report report - report - reports - look - awesome | 78 | Awesome Reports |
| 7 | use model - run model - model run - model use - tune model | 73 | Use model instructions |
| 8 | request access - access request - access - request - request requesting | 65 | Access Request Solution |
| 9 | colab - google - google colab - model google - collab | 64 | "Running Galactica on Colab" |
| 10 | json - config json - config - json file - file named | 62 | JSON configuration files |
| 11 | load model - load - model working - unable load - unable | 60 | "Model loading issues" |
| 12 | text - text generation - words - truncated - generation | 57 | Text Generation Techniques |
| 13 | label - labels - tags - classifier - entity | 57 | Document Labels |
| 14 | data - model dataset - dataset - train model - used train | 55 | Model Training Data |
| 15 | issue report - issue - report - 论文 - artists | 55 | Ethical Issues in Artists' Legal Discussion |
| 16 | loading - loading model - error loading - model error - load model | 55 | Model Loading Errors |
| 17 | error error - error - 500 error - connection - unknown error | 49 | Error 500 Connection |
| 18 | train model - train - trained - model did - model trained | 46 | Training models in Arabic |
| 19 | stable diffusion - diffusion - stable - diffusion v1 - diffusion webui | 46 | Stable Diffusion Downloads |
| 20 | question - answers - questions - tts - double | 45 | Question about Fig.2c |
| 21 | length - max - maximum - limit - sequence length | 45 | Length Limits and Token Length |
| 22 | model model - model architecture - generator - architecture - type | 42 | Model Architecture |
| 23 | commercial - license - commercial use - license license - mit | 41 | Commercial Use License |
| 24 | transformers - transformer - sentence transformers - sentence - using transformers | 40 | Issues with sentence transformers |
| 25 | huggingface - hugging face - hugging - face - using hugging | 40 | Hugging Face model usage |
| 26 | legal - legal issue - issue report - issue - report | 40 | Legal Issues Reports |
| 27 | v2 - v3 - anime - wav2vec2 - virus | 40 | Anime Virus Detection Vae |
| 28 | tutorials - thread - tricks - 26 - tips | 39 | Stable Diffusion 26+ Tutorials |
| 29 | difference - fp16 - dpm - opus - opus mt | 39 | Difference between phase1 and phase2 |
| 30 | tokenizer - using from_pretrained - loading - error loading - load | 37 | Tokenizer Loading Error |
| 31 | output - extraction - truncated - summaries - outputs | 37 | Output Extraction |
| 32 | attribute - object - attributeerror - typeerror - string | 36 | AttributeError in object attributes |
| 33 | ckpt file - ckpt - file ckpt - file - ckpt files | 36 | CKPT file location |
| 34 | dataset dataset - dataset - source dataset - datasets - source | 36 | dataset source semantic search |
| 35 | size - mismatch - discrepancy - vocab size - dimensionality | 36 | Size Mismatch Discrepancy |
| 36 | license - license license - permission - agreement - licence | 36 | License Agreement |
| 37 | model card - card - card model - building model - building | 35 | Model Card Typos |
| 38 | demo - space - spaces - gradio - cause | 35 | Troubleshooting Gradio Demo |
| 39 | commercially - does model - commercial - model used - usable | 34 | Commercial Usability of AI Model |
| 40 | automatic1111 - webui - automatic - ui - web ui | 33 | Automatic1111 WebUI |
| 41 | import - transformers - module - failed - export | 33 | ImportError in Transformers Module |
| 42 | example - examples - example use - prompt example - usage example | 33 | Example Usage |
| 43 | audio - noise - spectrogram - second - speaker | 33 | Audio Transcription and Conversion |
| 44 | cool - love - idea - amazing - great | 32 | "cool and amazing" |
| 45 | language model - language - kenlm - lm - multilingual | 32 | Language Model Inference with KenLM |
| 46 | really - nice - cool - love - amazing | 32 | amazing model |
| 47 | sagemaker - endpoint - deployment - deploy - amazon | 32 | Deploying SageMaker Endpoints |
| 48 | training training - training - training steps - general - video | 31 | "Training Steps Video" |
| 49 | tokenizer - problems - masked - tokenizer tokenizer - tokens | 31 | Tokenizer Problems |
| 50 | sd - sd2 - sd sd - does support - wd | 30 | Using SD with Different Versions |
| 51 | test - testing - sampler - discussion - split | 30 | Testing Sampler Discussion |
| 52 | argument - unexpected - keyword - typeerror - got | 30 | Unexpected keyword argument TypeError |
| 53 | float - runtimeerror expected - runtimeerror - expected - type | 30 | RuntimeErrors with Float and Half Types |
| 54 | dataset used - dataset - dataset dataset - used fine - used | 28 | Dataset Usage |
| 55 | json - json file - model architecture - inconsistency - architecture | 28 | JSON file inconsistency |
| 56 | usage - project - app - macos - usage questions | 28 | Usage with Sherpa |
| 57 | reproduce - results - result - civitai - reproducing results | 28 | Reproduce Result Difficulty |
| 58 | gene - cell - question generation - generation - geneformer | 27 | Gene Embedding Generation |
| 59 | gpu - gpus - multiple - gpu run - model multiple | 27 | Multi-GPU Model Execution |
| 60 | tokenizer use - wlop - mean - token - webui version | 26 | Tokenizer for Cantonese |
| 61 | model fine - tuning model - fine tuning - fine - tuning | 26 | Fine-Tuning the Model |
| 62 | model training - training model - training - redshift - model model | 26 | Model Training |
| 63 | bot - discord - tesla - chat - character | 26 | Tesla Discord Bot 2021 |
| 64 | work - doesn work - doesn - dont - does appear | 26 | Non-functional potty lora |
| 65 | use use - use - best - way use - methods | 26 | Best ways to use |
| 66 | report card - metadata - card - report - | 26 | Metadata Report Card |
| 67 | guide - instructions - guidance - prompt - cost | 25 | Fine-tuning guide instructions |
| 68 | code - finetuning code - finetuning - fine tuning - tuning | 25 | Fine-tuning Code Sample |
| 69 | dataset - custom dataset - dataset fine - custom - fine tuning | 25 | Custom dataset fine-tuning |
| 70 | safetensors - safetensor - version - version safetensors - safetensor version | 25 | SafeTensors Version Inquiry |
| 71 | model based - task model - model changes - bring - v7 | 25 | Model Description and Changes |
| 72 | weights - weight - flax - diffusers weights - load weights | 25 | Outdated Flax Weights |
| 73 | style - modern - mode - new - dark mode | 24 | Style in Modern Technology |
| 74 | convert - format - trying convert - safetensors - converter | 24 | Safetensors conversion error |
| 75 | checkpoint - save - checkpoint file - checkpoints - restore | 24 | Checkpoint Safety Restore |
| 76 | t5 - flan t5 - flan - google flan - xxl | 23 | T5 vs Flan-T5 Differences |
| 77 | download model - model load - download - load - model download | 23 | "Model Download" |
| 78 | access access - access - access need - need access - need | 23 | Access Request Assistance |
| 79 | model details - details model - details - information model - model access | 23 | Model Details |
| 80 | job - excellent - nice - great - congrats | 23 | Job Well Done |
| 81 | onnx - conversion - onnx conversion - convert - torchscript | 22 | ONNX Conversion Implementation |
| 82 | git - repository - repo - cloning - slow | 22 | Git repository cloning issues |
| 83 | online - 50 - 200 - buy - annotator | 22 | Buy Medications Online |
| 84 | access - request access - acces request - access request - request | 22 | Access Request |
| 85 | cuda - cuda memory - memory - cuda error - memory cuda | 22 | CUDA memory out of error |
| 86 | api model - api - inference api - model api - trying use | 22 | API Model Errors |
| 87 | training data - data training - data - training dataset - training | 22 | Data Training Examples |
| 88 | pipeline - valid - pipe - sentence similarity - similarity | 21 | Pipeline error analysis |
| 89 | tensor - tensors - device - expected - size | 21 | Tensor size mismatch errors |
| 90 | in_silico_perturber - eos_token_id - switch - 64 - encoder | 21 | Error in decoder generation |
| 91 | pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model | 21 | Missing pytorch_model.bin file |
| 92 | 404 - url - https - https huggingface - resolve | 21 | 404 error Huggingface documents |
| 93 | requirements - acess - feature request - request request - feature | 21 | System Requirements Access |
| 94 | info - technical - details - information - detailed | 21 | Technical Details Inquiry |
| 95 | hello - hi - good - translates - 100 | 20 | Greetings and Translations |
| 96 | accuracy - drop - compatibility - precision - half precision | 20 | Accuracy Drop in Precision |
| 97 | access request - request access - access - request - new | 20 | Access Request |
| 98 | file missing - log - filenotfounderror - location - sorry | 20 | File Not Found |
| 99 | model card - card - link model - link - example model | 20 | Broken link in model |
| 100 | python - kernel - 10 - pytorch - talks | 20 | Python usage and errors |
| 101 | bug - fix - racist - possible bug - thing | 19 | Bug Fix with Racist Bug |
| 102 | training code - code training - code - share - share training | 19 | "Training Code Sharing" |
| 103 | license - accept - license license - model accept - indication | 19 | Model License |
| 104 | gpt - protgpt2 - 6b - jt - gpt jt | 19 | GPT-JT-6B-v1 Abilities |
| 105 | report report - report - - - | 19 | Multiple Reports on Topic |
| 106 | tuning fine - tune fine - fine - fine tuning - tuning | 18 | Fine-tuning for domain adaptation |
| 107 | inpaint model - inpaint - ix - size model - model pruned | 18 | Inpaint Model |
| 108 | config file - config - tokenizer config - files config - file | 18 | Config File Troubleshooting |
| 109 | sample code - example - sample - copied - error example | 18 | Issues with sample code |
| 110 | nsfw - nsfw content - content - disable - safety | 18 | NSFW Content Filtering |
| 111 | length - summary - longformer - summary length - text length | 18 | Length of Summaries |
| 112 | access download - access - download - access access - download working | 18 | Access Download |
| 113 | thank - thanks - just want - pretty - request thank | 18 | Thank you efforts |
| 114 | sd v1 - v1 - ema ckpt - sd - ema | 18 | Access to sd-v1-4-full-ema.ckpt |
| 115 | padding_side - tokens - token - cls token - token id | 18 | Padding and token discrepancy |
| 116 | amd - vram - gb - gpu - 448 | 17 | "AMD GPU compatibility" |
| 117 | dataset - pretraining - dataset dataset - datasets - request dataset | 17 | Dataset Pretraining |
| 118 | version - ggml version - version ggml - ggml - pytorch version | 17 | "Version Possibility" |
| 119 | memory - leak - a100 - cuda memory - memory google | 17 | Memory-related Issues |
| 120 | trigger - words - word - trigger word - semantic | 17 | Trigger words and semantic search |
| 121 | result - results - output - score - ways | 16 | Visualizing Inference Results |
| 122 | sd - tested - sd sd - lora training - ui | 16 | Stable Diffusion LORA Training |
| 123 | ckpt file - bin - convert - weights - dreambooth | 16 | Convert Diffusion Diffusers to CKPT |
| 124 | need help - help - help help - need - started | 16 | Need Help Getting Started |
| 125 | keyerror - key - exception error - key error - codegen | 16 | KeyError Troubleshooting |
| 126 | controlnet - control - a1111 - installed - model embedding | 16 | ControlNet not working |
| 127 | implementation - issue - solved - np - experiencing | 16 | Implementation Issue Fix |
| 128 | runtimeerror - time series - everytime - process runtimeerror - try run | 16 | Time Series Runtime Error |
| 129 | use use - use - use readme - use diffusers - tk | 15 | How to use Diffusers |
| 130 | training dataset - dataset used - used dataset - nli - used training | 15 | Training Dataset Used |
| 131 | yaml files - colab pc - install run - diffusion google - train custom | 15 | Stable Diffusion Tutorials |
| 132 | spam - deleted - removed - delete - contact | 15 | Removal of Spam Discussion |
| 133 | details training - details - training - details details - details info | 14 | Training Details |
| 134 | hyper parameters - hyper - parameters - provide - provide training | 14 | Hyperparameter Optimization |
| 135 | fine tune - tune - ner - fine - emotions | 14 | Fine-tune Sentence Embeddings |
| 136 | model using - using model - examples - question lora - models used | 14 | Inkpunk Diffusion model |
| 137 | error running - running - running example - usage code - code | 14 | Error running example code |
| 138 | difference - alpaca - model difference - original model - difference model | 14 | Model Differences |
| 139 | install - locally - know install - run local - mini | 14 | "How to install locally" |
| 140 | training script - script - script training - sharing training - midi | 13 | Training Script |
| 141 | model file - missing model - corrupt - file model - file missing | 13 | Model File Issues |
| 142 | error help - help error - help - solve - try | 13 | Error Help |
| 143 | hardware - hardware requirements - requirements - gpu inference - requirements fine | 13 | Hardware Requirements for Inference |
| 144 | update - updated - channel - expired - new update | 13 | update query status |
| 145 | negative - negative prompt - negative prompts - prompts - prompt | 13 | "Negative Prompt Function" |
| 146 | unable run - unable - run unable - run - human | 13 | Unable to run on local machine |
| 147 | injection - nmkd gui - nmkd - tutorial videos - gui | 12 | Stable Diffusion Tutorial Videos |
| 148 | download download - download - request acces - know download - fim | 12 | "Download Instructions" |
| 149 | transformers - sentence transformers - huggingface transformers - different results - usage | 12 | Transformer Usage Discrepancy |
| 150 | link - broken link - broken - documentation - expired | 11 | Broken links and documentation |
| 151 | broke - padding - dead - kenlm - dropout | 11 | "Dead KenLM Finetuning" |
| 152 | training question - question training - training process - question regarding - question | 11 | Training Process Question |
| 153 | dataset training - training data - training dataset - data training - custom dataset | 11 | Training Data Quality |
| 154 | download - download download - possible download - hd 18 - hd | 11 | Troubleshooting download errors |
</details>
## Training hyperparameters
* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
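As a rough sketch, the values above map onto keyword arguments of the `BERTopic` constructor as shown here. This is not the author's training script: the card does not record the embedding model or the UMAP and HDBSCAN settings, so a model built this way would not necessarily reproduce the topics listed above. The names `HYPERPARAMETERS` and `build_untrained_model` are illustrative.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the hyperparameters above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Assumption: default sub-models are used, since the card does not
    # say which embedding, UMAP, or HDBSCAN models were passed in.
    return BERTopic(**HYPERPARAMETERS)
```

The returned model would still need to be fitted with `fit_transform` on a list of documents before it produces any topics.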
## Framework versions
* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
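If loading the model fails or gives unexpected results, comparing the local environment against the versions above can be a useful first check. The sketch below uses only the standard library. The mapping from the names in the list to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, and `PINNED_VERSIONS` and `find_mismatches` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from the list above, keyed by assumed PyPI distribution name.
PINNED_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_mismatches(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED_VERSIONS).items():
        print(f"{name}: card lists {wanted}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load; it only shows where the environment differs from the one recorded here.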