metadata

tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

hub_issues_topocs

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

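```python
from bertopic import BERTopic

# Download the fitted model from the Hugging Face Hub and load it
topic_model = BERTopic.load("davanstrien/hub_issues_topocs")

# Overview table: one row per topic, with its ID, document count, and name
print(topic_model.get_topic_info())
```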
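The card tags this model for text classification, so a common next step is assigning topics to new discussion texts. The sketch below relies on only two BERTopic methods: `transform`, which returns predicted topic IDs and probabilities, and `get_topic`, which returns a topic's `(word, score)` pairs or `False` for an unknown ID. The helper name `describe_topics` is hypothetical. Because it only needs an object with those two methods, it can be tried against a stub before downloading the model.

```python
from typing import List, Tuple


def describe_topics(topic_model, docs: List[str], n_words: int = 5) -> List[Tuple[int, List[str]]]:
    """Assign each document to a topic and return (topic_id, top keywords).

    Hypothetical helper: uses only ``transform`` and ``get_topic``.
    """
    # transform() returns (predicted topic IDs, probabilities); only the IDs are used
    topics, _ = topic_model.transform(docs)
    results = []
    for topic_id in topics:
        topic_id = int(topic_id)
        # get_topic() yields [(word, score), ...], or False if the ID is unknown
        words = topic_model.get_topic(topic_id) or []
        results.append((topic_id, [word for word, _ in words[:n_words]]))
    return results
```

With the real model, call it as `describe_topics(BERTopic.load("davanstrien/hub_issues_topocs"), ["Getting an out of memory error when loading the model"])`. The returned IDs correspond to the Topic ID column of the overview table.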

Topic overview

Number of topics: 156 (including the outlier topic -1)
Number of training documents: 6427


Topic ID	Topic Keywords	Topic Frequency (documents)	Label
-1	model - version - training - add - base	10	Outlier Topic
0	yes - upscaling - embeddings - dir - 18	1785	Yes Upscaling VAE Embeddings
1	images - image - img2img - generated - black	218	Image Distortion Investigation
2	languages - language - chinese - support - multilingual	169	Multilingual Language Support
3	request - thesis - checker - request request - work	103	DOI request and thesis checker
4	bloom - 176b - bloomz - bert - 7b1	95	Bloom inference on BERT
5	api - inference api - hosted - inference - hosted inference	80	Configuring Inference API
6	report report - report - reports - look - awesome	78	Awesome Reports
7	use model - run model - model run - model use - tune model	73	Use model instructions
8	request access - access request - access - request - request requesting	65	Access Request Solution
9	colab - google - google colab - model google - collab	64	"Running Galactica on Colab"
10	json - config json - config - json file - file named	62	JSON configuration files
11	load model - load - model working - unable load - unable	60	"Model loading issues"
12	text - text generation - words - truncated - generation	57	Text Generation Techniques
13	label - labels - tags - classifier - entity	57	Document Labels
14	data - model dataset - dataset - train model - used train	55	Model Training Data
15	issue report - issue - report - 论文 - artists	55	Ethical Issues in Artists' Legal Discussion
16	loading - loading model - error loading - model error - load model	55	Model Loading Errors
17	error error - error - 500 error - connection - unknown error	49	Error 500 Connection
18	train model - train - trained - model did - model trained	46	Training models in Arabic
19	stable diffusion - diffusion - stable - diffusion v1 - diffusion webui	46	Stable Diffusion Downloads
20	question - answers - questions - tts - double	45	Question about Fig.2c
21	length - max - maximum - limit - sequence length	45	Length Limits and Token Length
22	model model - model architecture - generator - architecture - type	42	Model Architecture
23	commercial - license - commercial use - license license - mit	41	Commercial Use License
24	transformers - transformer - sentence transformers - sentence - using transformers	40	Issues with sentence transformers
25	huggingface - hugging face - hugging - face - using hugging	40	Hugging Face model usage
26	legal - legal issue - issue report - issue - report	40	Legal Issues Reports
27	v2 - v3 - anime - wav2vec2 - virus	40	Anime Virus Detection Vae
28	tutorials - thread - tricks - 26 - tips	39	Stable Diffusion 26+ Tutorials
29	difference - fp16 - dpm - opus - opus mt	39	Difference between phase1 and phase2
30	tokenizer - using from_pretrained - loading - error loading - load	37	Tokenizer Loading Error
31	output - extraction - truncated - summaries - outputs	37	Output Extraction
32	attribute - object - attributeerror - typeerror - string	36	AttributeError in object attributes
33	ckpt file - ckpt - file ckpt - file - ckpt files	36	CKPT file location
34	dataset dataset - dataset - source dataset - datasets - source	36	dataset source semantic search
35	size - mismatch - discrepancy - vocab size - dimensionality	36	Size Mismatch Discrepancy
36	license - license license - permission - agreement - licence	36	License Agreement
37	model card - card - card model - building model - building	35	Model Card Typos
38	demo - space - spaces - gradio - cause	35	Troubleshooting Gradio Demo
39	commercially - does model - commercial - model used - usable	34	Commercial Usability of AI Model
40	automatic1111 - webui - automatic - ui - web ui	33	Automatic1111 WebUI
41	import - transformers - module - failed - export	33	ImportError in Transformers Module
42	example - examples - example use - prompt example - usage example	33	Example Usage
43	audio - noise - spectrogram - second - speaker	33	Audio Transcription and Conversion
44	cool - love - idea - amazing - great	32	"cool and amazing"
45	language model - language - kenlm - lm - multilingual	32	Language Model Inference with KenLM
46	really - nice - cool - love - amazing	32	amazing model
47	sagemaker - endpoint - deployment - deploy - amazon	32	Deploying SageMaker Endpoints
48	training training - training - training steps - general - video	31	"Training Steps Video"
49	tokenizer - problems - masked - tokenizer tokenizer - tokens	31	Tokenizer Problems
50	sd - sd2 - sd sd - does support - wd	30	Using SD with Different Versions
51	test - testing - sampler - discussion - split	30	Testing Sampler Discussion
52	argument - unexpected - keyword - typeerror - got	30	Unexpected keyword argument TypeError
53	float - runtimeerror expected - runtimeerror - expected - type	30	RuntimeErrors with Float and Half Types
54	dataset used - dataset - dataset dataset - used fine - used	28	Dataset Usage
55	json - json file - model architecture - inconsistency - architecture	28	JSON file inconsistency
56	usage - project - app - macos - usage questions	28	Usage with Sherpa
57	reproduce - results - result - civitai - reproducing results	28	Reproduce Result Difficulty
58	gene - cell - question generation - generation - geneformer	27	Gene Embedding Generation
59	gpu - gpus - multiple - gpu run - model multiple	27	Multi-GPU Model Execution
60	tokenizer use - wlop - mean - token - webui version	26	Tokenizer for Cantonese
61	model fine - tuning model - fine tuning - fine - tuning	26	Fine-Tuning the Model
62	model training - training model - training - redshift - model model	26	Model Training
63	bot - discord - tesla - chat - character	26	Tesla Discord Bot 2021
64	work - doesn work - doesn - dont - does appear	26	Non-functional potty lora
65	use use - use - best - way use - methods	26	Best ways to use
66	report card - metadata - card - report -	26	Metadata Report Card
67	guide - instructions - guidance - prompt - cost	25	Fine-tuning guide instructions
68	code - finetuning code - finetuning - fine tuning - tuning	25	Fine-tuning Code Sample
69	dataset - custom dataset - dataset fine - custom - fine tuning	25	Custom dataset fine-tuning
70	safetensors - safetensor - version - version safetensors - safetensor version	25	SafeTensors Version Inquiry
71	model based - task model - model changes - bring - v7	25	Model Description and Changes
72	weights - weight - flax - diffusers weights - load weights	25	Outdated Flax Weights
73	style - modern - mode - new - dark mode	24	Style in Modern Technology
74	convert - format - trying convert - safetensors - converter	24	Safetensors conversion error
75	checkpoint - save - checkpoint file - checkpoints - restore	24	Checkpoint Safety Restore
76	t5 - flan t5 - flan - google flan - xxl	23	T5 vs Flan-T5 Differences
77	download model - model load - download - load - model download	23	"Model Download"
78	access access - access - access need - need access - need	23	Access Request Assistance
79	model details - details model - details - information model - model access	23	Model Details
80	job - excellent - nice - great - congrats	23	Job Well Done
81	onnx - conversion - onnx conversion - convert - torchscript	22	ONNX Conversion Implementation
82	git - repository - repo - cloning - slow	22	Git repository cloning issues
83	online - 50 - 200 - buy - annotator	22	Buy Medications Online
84	access - request access - acces request - access request - request	22	Access Request
85	cuda - cuda memory - memory - cuda error - memory cuda	22	CUDA memory out of error
86	api model - api - inference api - model api - trying use	22	API Model Errors
87	training data - data training - data - training dataset - training	22	Data Training Examples
88	pipeline - valid - pipe - sentence similarity - similarity	21	Pipeline error analysis
89	tensor - tensors - device - expected - size	21	Tensor size mismatch errors
90	in_silico_perturber - eos_token_id - switch - 64 - encoder	21	Error in decoder generation
91	pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model	21	Missing pytorch_model.bin file
92	404 - url - https - https huggingface - resolve	21	404 error Huggingface documents
93	requirements - acess - feature request - request request - feature	21	System Requirements Access
94	info - technical - details - information - detailed	21	Technical Details Inquiry
95	hello - hi - good - translates - 100	20	Greetings and Translations
96	accuracy - drop - compatibility - precision - half precision	20	Accuracy Drop in Precision
97	access request - request access - access - request - new	20	Access Request
98	file missing - log - filenotfounderror - location - sorry	20	File Not Found
99	model card - card - link model - link - example model	20	Broken link in model
100	python - kernel - 10 - pytorch - talks	20	Python usage and errors
101	bug - fix - racist - possible bug - thing	19	Bug Fix with Racist Bug
102	training code - code training - code - share - share training	19	"Training Code Sharing"
103	license - accept - license license - model accept - indication	19	Model License
104	gpt - protgpt2 - 6b - jt - gpt jt	19	GPT-JT-6B-v1 Abilities
105	report report - report - - -	19	Multiple Reports on Topic
106	tuning fine - tune fine - fine - fine tuning - tuning	18	Fine-tuning for domain adaptation
107	inpaint model - inpaint - ix - size model - model pruned	18	Inpaint Model
108	config file - config - tokenizer config - files config - file	18	Config File Troubleshooting
109	sample code - example - sample - copied - error example	18	Issues with sample code
110	nsfw - nsfw content - content - disable - safety	18	NSFW Content Filtering
111	length - summary - longformer - summary length - text length	18	Length of Summaries
112	access download - access - download - access access - download working	18	Access Download
113	thank - thanks - just want - pretty - request thank	18	Thank you efforts
114	sd v1 - v1 - ema ckpt - sd - ema	18	Access to sd-v1-4-full-ema.ckpt
115	padding_side - tokens - token - cls token - token id	18	Padding and token discrepancy
116	amd - vram - gb - gpu - 448	17	"AMD GPU compatibility"
117	dataset - pretraining - dataset dataset - datasets - request dataset	17	Dataset Pretraining
118	version - ggml version - version ggml - ggml - pytorch version	17	"Version Possibility"
119	memory - leak - a100 - cuda memory - memory google	17	Memory-related Issues
120	trigger - words - word - trigger word - semantic	17	Trigger words and semantic search
121	result - results - output - score - ways	16	Visualizing Inference Results
122	sd - tested - sd sd - lora training - ui	16	Stable Diffusion LORA Training
123	ckpt file - bin - convert - weights - dreambooth	16	Convert Diffusion Diffusers to CKPT
124	need help - help - help help - need - started	16	Need Help Getting Started
125	keyerror - key - exception error - key error - codegen	16	KeyError Troubleshooting
126	controlnet - control - a1111 - installed - model embedding	16	ControlNet not working
127	implementation - issue - solved - np - experiencing	16	Implementation Issue Fix
128	runtimeerror - time series - everytime - process runtimeerror - try run	16	Time Series Runtime Error
129	use use - use - use readme - use diffusers - tk	15	How to use Diffusers
130	training dataset - dataset used - used dataset - nli - used training	15	Training Dataset Used
131	yaml files - colab pc - install run - diffusion google - train custom	15	Stable Diffusion Tutorials
132	spam - deleted - removed - delete - contact	15	Removal of Spam Discussion
133	details training - details - training - details details - details info	14	Training Details
134	hyper parameters - hyper - parameters - provide - provide training	14	Hyperparameter Optimization
135	fine tune - tune - ner - fine - emotions	14	Fine-tune Sentence Embeddings
136	model using - using model - examples - question lora - models used	14	Inkpunk Diffusion model
137	error running - running - running example - usage code - code	14	Error running example code
138	difference - alpaca - model difference - original model - difference model	14	Model Differences
139	install - locally - know install - run local - mini	14	"How to install locally"
140	training script - script - script training - sharing training - midi	13	Training Script
141	model file - missing model - corrupt - file model - file missing	13	Model File Issues
142	error help - help error - help - solve - try	13	Error Help
143	hardware - hardware requirements - requirements - gpu inference - requirements fine	13	Hardware Requirements for Inference
144	update - updated - channel - expired - new update	13	update query status
145	negative - negative prompt - negative prompts - prompts - prompt	13	"Negative Prompt Function"
146	unable run - unable - run unable - run - human	13	Unable to run on local machine
147	injection - nmkd gui - nmkd - tutorial videos - gui	12	Stable Diffusion Tutorial Videos
148	download download - download - request acces - know download - fim	12	"Download Instructions"
149	transformers - sentence transformers - huggingface transformers - different results - usage	12	Transformer Usage Discrepancy
150	link - broken link - broken - documentation - expired	11	Broken links and documentation
151	broke - padding - dead - kenlm - dropout	11	"Dead KenLM Finetuning"
152	training question - question training - training process - question regarding - question	11	Training Process Question
153	dataset training - training data - training dataset - data training - custom dataset	11	Training Data Quality
154	download - download download - possible download - hd 18 - hd	11	Troubleshooting download errors
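The overview table is tab-separated, with each topic's keywords joined by " - ". A minimal sketch for loading those rows into Python records and computing each topic's share of the 6,427 training documents follows. `parse_topic_rows`, `TopicRow`, and `document_share` are hypothetical names, and the parser assumes keywords never contain a hyphen themselves, which holds for the rows above. Topic -1 is BERTopic's outlier topic, so IDs 0 to 154 are the 155 regular topics.

```python
import re
from dataclasses import dataclass
from typing import List

N_DOCUMENTS = 6427  # "Number of training documents" from the card


@dataclass
class TopicRow:
    topic_id: int
    keywords: List[str]
    count: int
    label: str


def parse_topic_rows(text: str) -> List[TopicRow]:
    """Parse tab-separated rows: ID, keywords, frequency, label."""
    rows = []
    for line in text.strip().splitlines():
        topic_id, keywords, count, label = line.split("\t")
        if topic_id == "Topic ID":  # skip the header row
            continue
        # Keywords are joined by " - "; drop empty entries left by blank keywords
        words = [w.strip() for w in re.split(r"\s*-\s*", keywords) if w.strip()]
        # Some labels are wrapped in double quotes; strip them
        rows.append(TopicRow(int(topic_id), words, int(count), label.strip().strip('"')))
    return rows


def document_share(row: TopicRow, total: int = N_DOCUMENTS) -> float:
    """Fraction of training documents assigned to this topic."""
    return row.count / total
```

For example, topic 0 alone holds 1785 / 6427 ≈ 27.8% of the training documents.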

Training hyperparameters

calculate_probabilities: False
language: None
low_memory: False
min_topic_size: 10
n_gram_range: (1, 1)
nr_topics: None
seed_topic_list: None
top_n_words: 10
verbose: True
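The Topic Keywords column is produced by BERTopic's class-based TF-IDF (c-TF-IDF). This weighting treats all documents in a topic as one class and scores word t in class c as W(t, c) = tf(t, c) · log(1 + A / f(t)). Here tf(t, c) is the word's frequency in the class, normalized by the class's total word count, f(t) is its frequency across all classes, and A is the average number of words per class. `top_n_words: 10` keeps the ten highest-scoring words per topic. The table lists five of them.

The stdlib sketch below is a simplified re-implementation of that weighting for illustration only, not BERTopic's code. It tokenizes on whitespace and counts unigrams only, the `n_gram_range: (1, 1)` case. `c_tf_idf_top_words` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def c_tf_idf_top_words(
    docs_per_topic: Dict[int, List[str]], top_n_words: int = 10
) -> Dict[int, List[Tuple[str, float]]]:
    """Simplified c-TF-IDF: one 'class document' per topic, unigrams only."""
    # Join each topic's documents into one class and count its words
    counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # f(t): frequency of each word across all classes
    total_freq: Counter = Counter()
    for c in counts.values():
        total_freq.update(c)
    # A: average number of words per class
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    result = {}
    for topic, c in counts.items():
        n_words = sum(c.values())
        scores = {
            # Normalized term frequency times log(1 + A / f(t))
            word: (freq / n_words) * math.log(1 + avg_words / total_freq[word])
            for word, freq in c.items()
        }
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[topic] = ranked[:top_n_words]
    return result
```

For `{0: ["model cuda memory", "cuda memory error"], 1: ["model license", "commercial license model"]}`, the word "license" outranks "model" in topic 1 even though both occur twice there. The difference comes from "model" also appearing in topic 0. BERTopic's own `ClassTfidfTransformer` applies this weighting to the vectorizer's sparse count matrix rather than to Python dictionaries.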

Framework versions

Numpy: 1.22.4
HDBSCAN: 0.8.33
UMAP: 0.5.3
Pandas: 1.5.3
Scikit-Learn: 1.2.2
Sentence-transformers: 2.2.2
Transformers: 4.31.0
Numba: 0.56.4
Plotly: 5.13.1
Python: 3.10.6
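When reproducing results, it can help to compare the installed packages against the versions listed above. The minimal sketch below uses the standard library's `importlib.metadata`. The mapping from the card's names to PyPI distribution names is an assumption (for example, UMAP is published as `umap-learn`), and `check_versions` is a hypothetical helper.

```python
import platform
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions listed on the card, keyed by (assumed) PyPI distribution name
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}
EXPECTED_PYTHON = "3.10.6"


def check_versions(
    lookup: Callable[[str], str] = metadata.version,
    python_version: Optional[str] = None,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, installed)} for every mismatch; None means not installed."""
    python_version = python_version or platform.python_version()
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in EXPECTED.items():
        try:
            installed: Optional[str] = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    if python_version != EXPECTED_PYTHON:
        mismatches["python"] = (EXPECTED_PYTHON, python_version)
    return mismatches
```

Running `print(check_versions())` in the target environment prints an empty dict when every listed version matches exactly.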