
xsum_123_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")

topic_model.get_topic_info()
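
Beyond inspecting the topic table, a loaded model can also assign topics to new text. The sketch below is one way to do that, assuming the loaded model includes a usable embedding model; `describe_documents` is a hypothetical helper name, while `transform` and `get_topic` are BERTopic methods.

```python
def describe_documents(topic_model, docs, n_words=5):
    """Return (document, topic_id, top keywords) for each input document.

    `topic_model` is expected to behave like a loaded BERTopic model:
    transform(docs) returns (topics, probabilities) and get_topic(id)
    returns a list of (word, score) pairs.
    """
    topics, _probs = topic_model.transform(docs)
    rows = []
    for doc, topic_id in zip(docs, topics):
        # get_topic may return a falsy value for an unknown topic id.
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _score in words[:n_words]]
        rows.append((doc, int(topic_id), keywords))
    return rows
```

With `topic_model` loaded as shown above, `describe_documents(topic_model, ["Some news article text"])` returns one row per document. A topic ID of -1 means the document was treated as an outlier.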

Topic overview

  • Number of topics: 47 (including the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
| 0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul |
| 1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would |
| 2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic |
| 3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient |
| 4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale |
| 5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro |
| 6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road |
| 7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis |
| 8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel |
| 9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools |
| 10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird |
| 11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series |
| 12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police |
| 13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican |
| 14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service |
| 15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government |
| 16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south |
| 17 | album - band - song - music - best | 25 | 17_album_band_song_music |
| 18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death |
| 19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train |
| 20 | road - police - death - seen - found | 23 | 20_road_police_death_seen |
| 21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat |
| 22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr |
| 23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan |
| 24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android |
| 25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building |
| 26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council |
| 27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race |
| 28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival |
| 29 | site - council - street - project - plan | 15 | 29_site_council_street_project |
| 30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian |
| 31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv |
| 32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber |
| 33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet |
| 34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods |
| 35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident |
| 36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables |
| 37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people |
| 38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist |
| 39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter |
| 40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable |
| 41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana |
| 42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn |
| 43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf |
| 44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser |
| 45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant |
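
The topic frequencies are heavily skewed toward the first two topics. The sketch below illustrates this with three rows copied from the overview, computing each topic's share of the 3000 training documents and splitting a label back into its parts. The helper names are hypothetical, and the label format (topic ID followed by the first four keywords, joined by underscores) is inferred from the table rather than documented here.

```python
TOTAL_DOCS = 3000  # "Number of training documents" from the overview

# (frequency, label) pairs copied from the topic overview.
ROWS = [
    (1132, "0_win_game_half_foul"),
    (591, "1_eu_labour_party_would"),
    (5, "-1_said_mr_police_people"),
]


def parse_label(label):
    """Split a label into (topic_id, keywords), assuming 'id_kw1_kw2_...'."""
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


def share(count, total=TOTAL_DOCS):
    """Percentage of training documents, rounded to one decimal place."""
    return round(100 * count / total, 1)


for count, label in ROWS:
    topic_id, keywords = parse_label(label)
    print(topic_id, keywords, f"{share(count)}%")
```

Topics 0 and 1 together account for 1723 of the 3000 documents, so most of the remaining topics are small by comparison.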

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
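
As a rough illustration, the hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below is not a reproduction recipe: this card does not list the embedding model, the UMAP or HDBSCAN settings, or any random seed, so a model built this way would use library defaults for those and would likely produce different topics. `HYPERPARAMS` and `build_model` are hypothetical names.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the listed hyperparameters."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of documents would then train a new model with these settings.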

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
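
If loading or running the model fails, comparing the local environment with the versions above is a reasonable first check. The sketch below uses only the standard library. The mapping from the names in the list to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
import platform
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by the
# assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}
EXPECTED_PYTHON = "3.10.12"


def compare_versions(expected=EXPECTED):
    """Map each distribution to (listed, installed or None, exact match)."""
    report = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[dist] = (wanted, installed, installed == wanted)
    return report


for dist, (wanted, installed, same) in compare_versions().items():
    status = "ok" if same else "differs"
    print(f"{dist}: card lists {wanted}, installed {installed} ({status})")
print(f"python: card lists {EXPECTED_PYTHON}, running {platform.python_version()}")
```

A mismatch does not necessarily mean the model will fail to load. It only narrows down where to look.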