---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# BERTopic
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```bash
pip install -U bertopic
```
You can use the model as follows:
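```python
from bertopic import BERTopic
# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("keonju/BERTopic")
# Returns a pandas DataFrame with one row per topic
topic_model.get_topic_info()
```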
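A loaded model can also assign topics to unseen text. The following is a minimal sketch, assuming the embedding model referenced by this repository can be resolved at load time. The helper names `assign_topics`, `count_topics` and `load_and_assign` are illustrative and not part of BERTopic; only `BERTopic.load` and `transform` are library calls.

```python
from collections import Counter


def assign_topics(topic_model, docs):
    """Return one topic ID per document using a BERTopic-style model.

    `topic_model` is any object exposing `transform(docs)` that returns
    a `(topics, probabilities)` pair, as `BERTopic.transform` does.
    """
    topics, _ = topic_model.transform(docs)
    return [int(t) for t in topics]


def count_topics(topic_ids):
    """Count documents per topic; ID -1 is BERTopic's outlier topic."""
    return Counter(topic_ids)


def load_and_assign(docs, repo_id="keonju/BERTopic"):
    """Load the model from the Hub (needs `bertopic` and network access)."""
    # Imported lazily so the helpers above stay dependency-free.
    from bertopic import BERTopic

    return assign_topics(BERTopic.load(repo_id), docs)
```

Calling `load_and_assign` with a list of strings returns one topic ID per string. Each ID corresponds to a row in the Topic ID column of the topic overview.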
## Topic overview
* Number of topics: 158 (including the outlier topic -1)
* Number of training documents: 10158
<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | and - the - of - in - to | 10 | -1_and_the_of_in |
| 0 | holocene - china - the - monsoon - bp | 3858 | 0_holocene_china_the_monsoon |
| 1 | energy - biofuels - production - biodiesel - bioenergy | 291 | 1_energy_biofuels_production_biodiesel |
| 2 | coal - coals - the - basin - seams | 248 | 2_coal_coals_the_basin |
| 3 | yr - holocene - the - bp - and | 205 | 3_yr_holocene_the_bp |
| 4 | hg - mercury - mehg - of hg - in | 202 | 4_hg_mercury_mehg_of hg |
| 5 | ch4 - methane - emissions - fluxes - flux | 159 | 5_ch4_methane_emissions_fluxes |
| 6 | data - forest - spectral - for - mapping | 118 | 6_data_forest_spectral_for |
| 7 | bp - the - holocene - pollen - lake | 116 | 7_bp_the_holocene_pollen |
| 8 | wetlands - wetland - and - are - of | 104 | 8_wetlands_wetland_and_are |
| 9 | co2 - ecosystem - nee - exchange - net | 103 | 9_co2_ecosystem_nee_exchange |
| 10 | species - of - fen - the - restoration | 100 | 10_species_of_fen_the |
| 11 | peat - tropical - peatlands - palm - peatland | 98 | 11_peat_tropical_peatlands_palm |
| 12 | pb - lead - atmospheric - metal - deposition | 96 | 12_pb_lead_atmospheric_metal |
| 13 | the - lake - of the - of - poland | 93 | 13_the_lake_of the_of |
| 14 | pm2 - haze - burning - air - aerosol | 90 | 14_pm2_haze_burning_air |
| 15 | doc - catchments - carbon - organic carbon - export | 88 | 15_doc_catchments_carbon_organic carbon |
| 16 | the - carbon - of - co2 - of the | 73 | 16_the_carbon_of_co2 |
| 17 | wetland - wetlands - classification - mapping - and | 69 | 17_wetland_wetlands_classification_mapping |
| 18 | uv - ozone - o3 - isoprene - elevated | 67 | 18_uv_ozone_o3_isoprene |
| 19 | mediterranean - the - glacial - iberian - during | 66 | 19_mediterranean_the_glacial_iberian |
| 20 | media - compost - growing media - growing - biochar | 63 | 20_media_compost_growing media_growing |
| 21 | 137cs - of 137cs - sup - ce sup - radiocaesium | 63 | 21_137cs_of 137cs_sup_ce sup |
| 22 | testate - amoebae - testate amoebae - of testate - amoeba | 62 | 22_testate_amoebae_testate amoebae_of testate |
| 23 | peat - pyrolysis - lignin - gc - of | 62 | 23_peat_pyrolysis_lignin_gc |
| 24 | cu - zn - metals - peat - elements | 62 | 24_cu_zn_metals_peat |
| 25 | alkanes - alkane - chain - values - plants | 61 | 25_alkanes_alkane_chain_values |
| 26 | permafrost - active layer - thermal - ground - layer | 60 | 26_permafrost_active layer_thermal_ground |
| 27 | streams - diatom - species - macroinvertebrate - stream | 60 | 27_streams_diatom_species_macroinvertebrate |
| 28 | records - the - of - record - ireland | 60 | 28_records_the_of_record |
| 29 | water - flow - groundwater - recharge - runoff | 59 | 29_water_flow_groundwater_recharge |
| 30 | habitat - species - breeding - bird - nest | 57 | 30_habitat_species_breeding_bird |
| 31 | brgdgts - gdgts - glycerol - brgdgt - branched | 56 | 31_brgdgts_gdgts_glycerol_brgdgt |
| 32 | deposition - nitrogen - nitrogen deposition - sphagnum - of | 55 | 32_deposition_nitrogen_nitrogen deposition_sphagnum |
| 33 | oil sands - sands - fen - oil - reclamation | 54 | 33_oil sands_sands_fen_oil |
| 34 | fire - burned - severity - burning - post fire | 54 | 34_fire_burned_severity_burning |
| 35 | acidification - deposition - acid - ph - catchment | 54 | 35_acidification_deposition_acid_ph |
| 36 | farm - land - agricultural - farmers - policy | 53 | 36_farm_land_agricultural_farmers |
| 37 | cdom - doc - dom - dissolved organic - dissolved | 53 | 37_cdom_doc_dom_dissolved organic |
| 38 | redd - indonesia - deforestation - in indonesia - forest | 50 | 38_redd_indonesia_deforestation_in indonesia |
| 39 | ash - wood ash - wood - growth - of wood | 49 | 39_ash_wood ash_wood_growth |
| 40 | fungal - fungi - mycorrhizal - species - root | 49 | 40_fungal_fungi_mycorrhizal_species |
| 41 | stand - growth - models - tree - stands | 49 | 41_stand_growth_models_tree |
| 42 | smouldering - smoldering - spread - peat - combustion | 49 | 42_smouldering_smoldering_spread_peat |
| 43 | pollen - of pollen - vegetation - of - from | 49 | 43_pollen_of pollen_vegetation_of |
| 44 | arsenic - as - of as - fe - of arsenic | 49 | 44_arsenic_as_of as_fe |
| 45 | ch4 - methane - production - peat - methanogenesis | 47 | 45_ch4_methane_production_peat |
| 46 | africa - the - bp - south - late | 46 | 46_africa_the_bp_south |
| 47 | soc - carbon - soil - stocks - land | 45 | 47_soc_carbon_soil_stocks |
| 48 | soil - organic - carbon - soil organic - soils | 45 | 48_soil_organic_carbon_soil organic |
| 49 | wetlands - constructed - wetland - treatment - phosphorus | 43 | 49_wetlands_constructed_wetland_treatment |
| 50 | microbial - rare - soil - bacterial - diversity | 43 | 50_microbial_rare_soil_bacterial |
| 51 | litter - decomposition - mass loss - litter decomposition - mass | 39 | 51_litter_decomposition_mass loss_litter decomposition |
| 52 | co2 - pco2 - emissions - carbon - ch4 | 39 | 52_co2_pco2_emissions_carbon |
| 53 | soc - carbon - wetland - wetlands - soil | 39 | 53_soc_carbon_wetland_wetlands |
| 54 | countries - emissions - emission - to - climate | 38 | 54_countries_emissions_emission_to |
| 55 | services - ecosystem - ecosystem services - es - pes | 37 | 55_services_ecosystem_ecosystem services_es |
| 56 | catalyst - peat - pyrolysis - char - catalysts | 37 | 56_catalyst_peat_pyrolysis_char |
| 57 | clearfelling - water - phosphorus - buffer - nutrient | 35 | 57_clearfelling_water_phosphorus_buffer |
| 58 | forest - forests - trees - tree - stands | 35 | 58_forest_forests_trees_tree |
| 59 | carbon - climate - atmosphere - earth - carbon cycle | 34 | 59_carbon_climate_atmosphere_earth |
| 60 | tephra - volcanic - cryptotephra - eruptions - tephras | 34 | 60_tephra_volcanic_cryptotephra_eruptions |
| 61 | testate - arcellinida - coi - species - amoebae | 34 | 61_testate_arcellinida_coi_species |
| 62 | methane - methanogenic - community - methanogen - methanogens | 34 | 62_methane_methanogenic_community_methanogen |
| 63 | consolidation - soil - embankment - road - the | 33 | 63_consolidation_soil_embankment_road |
| 64 | species - spider - bogs - spiders - habitat | 33 | 64_species_spider_bogs_spiders |
| 65 | evaporation - energy - model - was - the | 33 | 65_evaporation_energy_model_was |
| 66 | phosphorus - catchment - in - tp - concentrations | 33 | 66_phosphorus_catchment_in_tp |
| 67 | co2 - ch4 - marsh - wetland - emissions | 33 | 67_co2_ch4_marsh_wetland |
| 68 | runoff - peat - channels - flow - catchment | 33 | 68_runoff_peat_channels_flow |
| 69 | nutrient - nitrogen - fertilizer - litter - of | 32 | 69_nutrient_nitrogen_fertilizer_litter |
| 70 | brazil - bp - the - of - in the | 31 | 70_brazil_bp_the_of |
| 71 | tsunami - holocene - the - volcanic - deposits | 30 | 71_tsunami_holocene_the_volcanic |
| 72 | climate change - change - climate - biodiversity - ecosystem | 30 | 72_climate change_change_climate_biodiversity |
| 73 | gpr - resistivity - radar - penetrating - penetrating radar | 29 | 73_gpr_resistivity_radar_penetrating |
| 74 | holocene - the - andes - and - bp | 29 | 74_holocene_the_andes_and |
| 75 | permafrost - soc - soil - soils - arctic | 28 | 75_permafrost_soc_soil_soils |
| 76 | policy - forest - owners - arguments - forest owners | 28 | 76_policy_forest_owners_arguments |
| 77 | bog - poland - peatland - europe - ca | 28 | 77_bog_poland_peatland_europe |
| 78 | ch4 - oxidation - methane - paddy - aom | 28 | 78_ch4_oxidation_methane_paddy |
| 79 | enzyme - enzymes - eea - soil - activities | 28 | 79_enzyme_enzymes_eea_soil |
| 80 | channel - catchment - flow - bends - model | 28 | 80_channel_catchment_flow_bends |
| 81 | soil - soil science - science - of soil - eu | 27 | 81_soil_soil science_science_of soil |
| 82 | pahs - pah - polycyclic aromatic - polycyclic - aromatic | 27 | 82_pahs_pah_polycyclic aromatic_polycyclic |
| 83 | n2o - n2o emissions - emissions - emission - nitrous | 26 | 83_n2o_n2o emissions_emissions_emission |
| 84 | peat water - adsorption - electrocoagulation - brackish peat - brackish peat water | 26 | 84_peat water_adsorption_electrocoagulation_brackish peat |
| 85 | mangrove - mangroves - carbon - coastal - b2 | 26 | 85_mangrove_mangroves_carbon_coastal |
| 86 | species - retention - alien - richness - forests | 25 | 86_species_retention_alien_richness |
| 87 | colloidal - river - elements - fe - colloids | 25 | 87_colloidal_river_elements_fe |
| 88 | sulfate - sulfur - 34s - peat - sulphur | 24 | 88_sulfate_sulfur_34s_peat |
| 89 | caribou - habitat - woodland caribou - populations - wolf | 24 | 89_caribou_habitat_woodland caribou_populations |
| 90 | food - agriculture - food system - change - covid 19 | 24 | 90_food_agriculture_food system_change |
| 91 | microbial - community - microbial community - communities - bacterial | 23 | 91_microbial_community_microbial community_communities |
| 92 | sorption - cu - ions - ii - cu ii | 22 | 92_sorption_cu_ions_ii |
| 93 | fire - fires - algorithm - frp - hotspot | 22 | 93_fire_fires_algorithm_frp |
| 94 | choice - wtp - preferences - valuation - choice experiment | 22 | 94_choice_wtp_preferences_valuation |
| 95 | nematodes - earthworm - soil - food - nematode | 22 | 95_nematodes_earthworm_soil_food |
| 96 | conservation - orangutan - habitat - forest - species | 21 | 96_conservation_orangutan_habitat_forest |
| 97 | cushion - accumulation - peat - amazonian - vegetation | 21 | 97_cushion_accumulation_peat_amazonian |
| 98 | ch4 - oxidation - ch4 oxidation - uptake - ch4 uptake | 20 | 98_ch4_oxidation_ch4 oxidation_uptake |
| 99 | tidal - sediment - coastal - delta - the | 20 | 99_tidal_sediment_coastal_delta |
| 100 | emissions - co2 - ghg - n2o - table | 20 | 100_emissions_co2_ghg_n2o |
| 101 | methane - ph - cytochrome - methanotrophs - acetic acid | 20 | 101_methane_ph_cytochrome_methanotrophs |
| 102 | patterns - model - self organization - evolutionary - self | 20 | 102_patterns_model_self organization_evolutionary |
| 103 | nitrogen - denitrification - n2o - soil - n2 | 20 | 103_nitrogen_denitrification_n2o_soil |
| 104 | birch - rotation - biomass - buds - biomass production | 19 | 104_birch_rotation_biomass_buds |
| 105 | fire - wildfire - fires - wildfires - health | 19 | 105_fire_wildfire_fires_wildfires |
| 106 | grazing - heathland - heather - moorland - england | 19 | 106_grazing_heathland_heather_moorland |
| 107 | emissions - fire - burning - fire emissions - biomass burning | 19 | 107_emissions_fire_burning_fire emissions |
| 108 | peat - landslides - failure - of peat - peat compaction | 18 | 108_peat_landslides_failure_of peat |
| 109 | biochar - straw - soil - fe - bc | 18 | 109_biochar_straw_soil_fe |
| 110 | ecosystem - respiration - carbon - ecosystem respiration - meadow | 17 | 110_ecosystem_respiration_carbon_ecosystem respiration |
| 111 | wetland - wetlands - risk - of wetland - the wetland | 17 | 111_wetland_wetlands_risk_of wetland |
| 112 | dom - thm - groundwater - molecular - organic | 17 | 112_dom_thm_groundwater_molecular |
| 113 | geochemistry - landscape geochemistry - rocks - peat - mafic | 17 | 113_geochemistry_landscape geochemistry_rocks_peat |
| 114 | tundra - ch4 - n2o - fluxes - antarctic | 16 | 114_tundra_ch4_n2o_fluxes |
| 115 | cellulose - sphagnum - isotopic - isotope - δ18ocel | 16 | 115_cellulose_sphagnum_isotopic_isotope |
| 116 | solute - transport - chloride - peat - pore | 16 | 116_solute_transport_chloride_peat |
| 117 | charcoal - fire - fires - holocene - fire history | 15 | 117_charcoal_fire_fires_holocene |
| 118 | ghg - agricultural - dairy - abatement - emissions | 15 | 118_ghg_agricultural_dairy_abatement |
| 119 | palm - oil - palm oil - sustainability - industry | 15 | 119_palm_oil_palm oil_sustainability |
| 120 | humic - humic substances - substances - acids - fluorescence | 15 | 120_humic_humic substances_substances_acids |
| 121 | canopy - ndvi - pri - lue - phenological | 15 | 121_canopy_ndvi_pri_lue |
| 122 | pollen - bog - peat - the - human impact | 15 | 122_pollen_bog_peat_the |
| 123 | marshes - tidal - marshes are - salt - or | 15 | 123_marshes_tidal_marshes are_salt |
| 124 | soil - prediction - mapping - covariates - dsm | 15 | 124_soil_prediction_mapping_covariates |
| 125 | si - of si - silicon - biogenic - protozoic | 14 | 125_si_of si_silicon_biogenic |
| 126 | et - evapotranspiration - le - wetland - rice | 14 | 126_et_evapotranspiration_le_wetland |
| 127 | forest - finland - forests - stock - management | 14 | 127_forest_finland_forests_stock |
| 128 | iodine - 129i - sorption - iodide - the sorption | 14 | 128_iodine_129i_sorption_iodide |
| 129 | palm - oil - palm oil - smallholders - certification | 14 | 129_palm_oil_palm oil_smallholders |
| 130 | dndc - model - models - soil - carbon | 14 | 130_dndc_model_models_soil |
| 131 | snow - thaw - cover - sca - data | 14 | 131_snow_thaw_cover_sca |
| 132 | stx2 - microbiota - gut - gut microbiota - microbial | 13 | 132_stx2_microbiota_gut_gut microbiota |
| 133 | dom - doc - organic - dissolved organic - of dom | 13 | 133_dom_doc_organic_dissolved organic |
| 134 | forest - cbm - ontario - cfs3 - cbm cfs3 | 13 | 134_forest_cbm_ontario_cfs3 |
| 135 | wind - wind farms - farms - onshore - onshore wind | 13 | 135_wind_wind farms_farms_onshore |
| 136 | uranium - of uranium - 232th - th - ar | 13 | 136_uranium_of uranium_232th_th |
| 137 | groundwater - springs - spring - gdes - discharge | 13 | 137_groundwater_springs_spring_gdes |
| 138 | fire - forest - boreal - burned - fires | 13 | 138_fire_forest_boreal_burned |
| 139 | metal - metals - cd - sediments - zn | 13 | 139_metal_metals_cd_sediments |
| 140 | slr - sea level - coastal - sea - sea level rise | 13 | 140_slr_sea level_coastal_sea |
| 141 | damo - methane - anaerobic - oxidation - aom | 12 | 141_damo_methane_anaerobic_oxidation |
| 142 | temperature - microbial - soil - co2 - pd | 12 | 142_temperature_microbial_soil_co2 |
| 143 | soil - respiration - root - soil respiration - enchytraeid | 12 | 143_soil_respiration_root_soil respiration |
| 144 | kerp - fusiformisporites - permian - genus - flora | 11 | 144_kerp_fusiformisporites_permian_genus |
| 145 | dust - dust deposition - dust sources - deposition - atmospheric dust | 11 | 145_dust_dust deposition_dust sources_deposition |
| 146 | methane - sources - ch4 - les - de | 11 | 146_methane_sources_ch4_les |
| 147 | n2o - n2o emissions - emissions - permafrost - n2o fluxes | 11 | 147_n2o_n2o emissions_emissions_permafrost |
| 148 | australia - mis - record - ka - crater | 11 | 148_australia_mis_record_ka |
| 149 | oc - fjords - fjord - lakes - of oc | 10 | 149_oc_fjords_fjord_lakes |
| 150 | fe - reduction - fe iii - sr10 - iron | 10 | 150_fe_reduction_fe iii_sr10 |
| 151 | loading - eutrophication - nitrogen - coastal - phytoplankton | 10 | 151_loading_eutrophication_nitrogen_coastal |
| 152 | model - wetlands - groundwater - water - the wetlands | 10 | 152_model_wetlands_groundwater_water |
| 153 | co2 - soil - co2 efflux - soil co2 efflux - soil co2 | 10 | 153_co2_soil_co2 efflux_soil co2 efflux |
| 154 | transfer - transfer functions - transfer function - testate - functions | 10 | 154_transfer_transfer functions_transfer function_testate |
| 155 | peat - spain - bog - matter - autofluorescent | 10 | 155_peat_spain_bog_matter |
| 156 | isbas - insar - subsidence - motion - deformation | 10 | 156_isbas_insar_subsidence_motion |
</details>
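In the table above, each label appears to be the topic ID joined by underscores to the first four keywords. The sketch below illustrates that convention as inferred from the rows, not taken from BERTopic's source. `make_label` and `parse_row` are hypothetical helper names.

```python
def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and the first `n_words` keywords with underscores."""
    return "_".join([str(topic_id)] + list(keywords[:n_words]))


def parse_row(row):
    """Split one Markdown table row into (id, keywords, frequency, label)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return int(topic_id), keywords.split(" - "), int(frequency), label


# A row copied from the table above.
row = "| 4 | hg - mercury - mehg - of hg - in | 202 | 4_hg_mercury_mehg_of hg |"
topic_id, keywords, frequency, label = parse_row(row)
print(make_label(topic_id, keywords) == label)  # prints True
```

Multi-word keywords such as `of hg` keep their internal space, so labels should not be split on whitespace.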
## Training hyperparameters
* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 3)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 30
* verbose: False
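The values listed above correspond to keyword arguments of the `BERTopic` constructor. The following is a hedged sketch of how a comparable, untrained model might be configured. The embedding, UMAP, HDBSCAN and vectorizer settings used for this model are not recorded in this card, so library defaults are assumed. `language` is omitted because a value of `None` typically indicates that a custom embedding model was supplied. `HYPERPARAMETERS` and `build_model` are illustrative names.

```python
# Values copied from the list above; `language` is intentionally left out.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 3),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 30,
    "verbose": False,
}


def build_model(**overrides):
    """Create an untrained BERTopic model with the listed settings.

    Requires `bertopic`; sub-models fall back to library defaults,
    which is an assumption and may differ from the original training run.
    """
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

Fitting such a model on other documents will not reproduce the topics in this card, since the training corpus and sub-model settings are not included here.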
## Framework versions
* Numpy: 1.22.4
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.30.2
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.12
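Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the list above. The sketch below uses only the standard library. The mapping from the display names above to PyPI distribution names (for example `UMAP` to `umap-learn`) is an assumption, and `version_mismatches` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.30.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected=None):
    """Return {package: (expected, installed)} for every differing package.

    `installed` is None when the package is not installed.
    """
    expected = EXPECTED if expected is None else expected
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {found}")
```

A mismatch does not necessarily prevent the model from loading; the report is only a starting point when debugging load errors.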