xsum_55555_3000_1500_test

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

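from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/xsum_55555_3000_1500_test")

# Returns a DataFrame with one row per topic.
topic_model.get_topic_info()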
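
Beyond inspecting the topics, the loaded model can assign topics to new text. The following is a minimal sketch, not part of the original card: the helper name `assign_topics` is hypothetical, and it assumes the embedding model stored with the checkpoint can be downloaded at load time. It relies only on `BERTopic.load`, `transform`, and `get_topic_info`.

```python
def assign_topics(docs, repo_id="KingKazma/xsum_55555_3000_1500_test"):
    """Return a (topic_id, topic_name) pair for each document in `docs`.

    Hypothetical helper for illustration; needs network access to fetch the model.
    """
    # Imported lazily so that defining the helper does not require bertopic.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)

    # transform() returns the predicted topic IDs and their probabilities.
    topics, _probs = topic_model.transform(docs)

    # get_topic_info() has one row per topic; index it by topic ID for lookup.
    info = topic_model.get_topic_info().set_index("Topic")
    return [(int(t), info.loc[int(t), "Name"]) for t in topics]
```

Calling `assign_topics(["The striker scored twice in the second half."])` would return one pair per input document; a topic ID of -1 marks a document treated as an outlier.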

Topic overview

  • Number of topics: 26
  • Number of training documents: 1500
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - also - would - people | 5 | -1_said_mr_also_would |
| 0 | police - said - mr - court - heard | 716 | 0_police_said_mr_court |
| 1 | syria - turkey - syrian - military - said | 112 | 1_syria_turkey_syrian_military |
| 2 | foul - win - kick - half - shot | 72 | 2_foul_win_kick_half |
| 3 | growth - year - bank - business - economy | 68 | 3_growth_year_bank_business |
| 4 | council - said - building - development - new | 63 | 4_council_said_building_development |
| 5 | england - cricket - captain - test - wicket | 48 | 5_england_cricket_captain_test |
| 6 | league - club - season - loan - transfer | 42 | 6_league_club_season_loan |
| 7 | sport - gold - world - athlete - olympic | 38 | 7_sport_gold_world_athlete |
| 8 | film - music - best - star - song | 36 | 8_film_music_best_star |
| 9 | party - labour - mr - leader - said | 33 | 9_party_labour_mr_leader |
| 10 | ireland - wales - leinster - rugby - player | 32 | 10_ireland_wales_leinster_rugby |
| 11 | care - nhs - hospital - patient - said | 27 | 11_care_nhs_hospital_patient |
| 12 | road - crash - police - collision - car | 26 | 12_road_crash_police_collision |
| 13 | dog - animal - greyhound - racing - owner | 23 | 13_dog_animal_greyhound_racing |
| 14 | ship - beach - said - lifeguard - rnli | 22 | 14_ship_beach_said_lifeguard |
| 15 | school - education - child - council - said | 20 | 15_school_education_child_council |
| 16 | wales - bill - welsh - labour - assembly | 19 | 16_wales_bill_welsh_labour |
| 17 | eu - uk - european - europe - referendum | 18 | 17_eu_uk_european_europe |
| 18 | fire - blaze - bus - flame - said | 18 | 18_fire_blaze_bus_flame |
| 19 | mr - president - besigye - maduro - election | 16 | 19_mr_president_besigye_maduro |
| 20 | race - froome - stage - second - lap | 13 | 20_race_froome_stage_second |
| 21 | rail - train - rmt - scotrail - transport | 10 | 21_rail_train_rmt_scotrail |
| 22 | planet - earth - electron - theory - mars | 10 | 22_planet_earth_electron_theory |
| 23 | ryder - cup - tour - pga - mcilroy | 7 | 23_ryder_cup_tour_pga |
| 24 | email - lazar - fbi - guccifer - ferizi | 6 | 24_email_lazar_fbi_guccifer |
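
The table suggests that each label is the topic ID followed by the first four keywords, joined with underscores, and that the frequencies count training documents. The sketch below reproduces that pattern for three rows copied from the table and reports each topic's share of the 1500 training documents. The helper `make_label` is hypothetical and only mirrors the format seen in the table; it is not a BERTopic function.

```python
# Three rows copied from the table: (topic ID, keywords, document count).
ROWS = [
    (-1, ["said", "mr", "also", "would", "people"], 5),
    (0, ["police", "said", "mr", "court", "heard"], 716),
    (1, ["syria", "turkey", "syrian", "military", "said"], 112),
]

TOTAL_DOCS = 1500  # "Number of training documents" from the overview


def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and the first `n_words` keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


for topic_id, keywords, count in ROWS:
    share = count / TOTAL_DOCS
    print(f"{make_label(topic_id, keywords):32s} {count:4d} {share:6.1%}")
```

Topic 0 alone covers 716 of the 1500 documents, a little under half of the training set, so its keywords describe a broad cluster of news stories rather than a narrow theme.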

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
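
The hyperparameters listed above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they could be passed when configuring a new model. It does not reproduce this model: the card does not state the embedding model, the UMAP and HDBSCAN settings, or the exact training documents, so those are left at library defaults here. The names `TRAINING_KWARGS` and `build_model` are illustrative.

```python
# Hyperparameters copied from the list above.
TRAINING_KWARGS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an unfitted BERTopic model with the listed hyperparameters."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**TRAINING_KWARGS)
```

The returned model is unfitted; calling `fit_transform(docs)` on a list of documents would train it.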

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
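
Loading a saved model can behave differently when installed library versions differ from those used in training. The following sketch compares the local environment with the versions listed above using only the standard library. The mapping from the listed names to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption, and `version_report` is a hypothetical helper.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def version_report(expected=EXPECTED):
    """Map each package to (wanted, installed or None, versions match)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in version_report().items():
    status = "ok" if ok else "differs"
    print(f"{name:22s} wanted {wanted:8s} installed {installed or 'missing':10s} {status}")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when diagnosing loading errors.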