General Information
This is a bert-base-cased binary classification model, fine-tuned to classify a given sentence as containing advertising content or not. It uses the previous sentence as context to make more accurate predictions.
The model is used in the paper 'Leveraging Multimodal Content for Podcast Summarization', published at ACM SAC 2022.
Usage:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('morenolq/spotify-podcast-advertising-classification')
tokenizer = AutoTokenizer.from_pretrained('morenolq/spotify-podcast-advertising-classification')

desc_sentences = ["Sentence 1", "Sentence 2", "Sentence 3"]
for i, s in enumerate(desc_sentences):
    # The first sentence has no predecessor, so a placeholder is used as its context.
    if i == 0:
        context = "__START__"
    else:
        context = desc_sentences[i-1]
    # Encode (previous sentence, current sentence) as a sentence pair.
    out = tokenizer(context, s, padding="max_length",
                    max_length=256,
                    truncation=True,
                    return_attention_mask=True,
                    return_tensors='pt')
    outputs = model(**out)
    print(f"{s},{outputs}")
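The loop above prints the raw model output. A minimal sketch of turning one row of logits into a label and class probabilities follows. The mapping of index 1 to "advertising" is an assumption (it is not stated here and should be checked against `model.config.id2label`), and `logits_to_prediction` and `ASSUMED_LABELS` are hypothetical names introduced only for illustration.

```python
import math

# Assumption: index 1 is the "advertising" class; verify with model.config.id2label.
ASSUMED_LABELS = {0: "not advertising", 1: "advertising"}


def logits_to_prediction(logits):
    """Convert one row of raw logits into (label, probabilities)."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ASSUMED_LABELS[best], probs


# Made-up logits for a single sentence, for illustration only.
label, probs = logits_to_prediction([-1.2, 2.3])
print(label, [round(p, 3) for p in probs])
```

Inside the loop, the helper could be called as `logits_to_prediction(outputs.logits[0].detach().tolist())`. Wrapping the model call in `torch.no_grad()` avoids tracking gradients during inference.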
The manually annotated data used for model fine-tuning are available here.
The classification report for the model evaluation on the test split is shown below:
              precision    recall  f1-score   support

           0       0.95      0.93      0.94       256
           1       0.88      0.91      0.89       140

    accuracy                           0.92       396
   macro avg       0.91      0.92      0.92       396
weighted avg       0.92      0.92      0.92       396
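To make the two averaged rows easier to read, here is a small sketch that recomputes them from the per-class rows of the report above. The macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The function names are hypothetical. Because the per-class inputs are already rounded to two decimals, the recomputed values may differ from the reported ones in the last digit.

```python
# Per-class values copied from the report: (precision, recall, f1-score, support).
report = {
    0: (0.95, 0.93, 0.94, 256),
    1: (0.88, 0.91, 0.89, 140),
}
SUPPORT = 3  # column index of the support count


def macro_average(rows, column):
    """Unweighted mean of a metric over all classes."""
    return sum(r[column] for r in rows.values()) / len(rows)


def weighted_average(rows, column):
    """Mean of a metric over classes, weighted by class support."""
    total = sum(r[SUPPORT] for r in rows.values())
    return sum(r[column] * r[SUPPORT] for r in rows.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    macro = macro_average(report, col)
    weighted = weighted_average(report, col)
    print(f"{name}: macro={macro:.3f} weighted={weighted:.3f}")
```

The support-weighted recall is the overall fraction of correctly classified sentences, which is why it matches the accuracy row.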
If you find this model useful, please cite the following paper:
@inproceedings{10.1145/3477314.3507106,
author = {Vaiani, Lorenzo and La Quatra, Moreno and Cagliero, Luca and Garza, Paolo},
title = {Leveraging Multimodal Content for Podcast Summarization},
year = {2022},
isbn = {9781450387132},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477314.3507106},
doi = {10.1145/3477314.3507106},
booktitle = {Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing},
pages = {863–870},
numpages = {8},
keywords = {multimodal learning, multimodal features fusion, extractive summarization, deep learning, podcast summarization},
location = {Virtual Event},
series = {SAC '22}
}