Highlight Detection in Podcasts: A Multimodal Deep Learning Approach

Phuengpanyaloet, Wongsapat, Boonruengkhao, Nonpipat, Anchutin, Viktor, Pasupa, Kitsuchart and Loo, Chu Kiong (2024) Highlight Detection in Podcasts: A Multimodal Deep Learning Approach. In: The 31st International Conference on Neural Information Processing (ICONIP 2024), Lecture Notes in Computer Science, 15294. Springer Nature Singapore, Auckland, New Zealand, 1-15. ISBN 978-981-96-6599-0

Abstract

Podcasts have become a pervasive form of digital media, offering diverse content in episodes that often run for hours. However, the sheer volume of podcast episodes can make it challenging for listeners to locate the most engaging segments. Speech Emotion Recognition (SER) has advanced remarkably with the integration of deep learning techniques. This work proposes using deep learning techniques from SER to discern emotional cues within podcasts, thereby enabling the detection of highlights. The task is framed as a binary classification problem in which the positive class contains speech segments with high emotional activation. Transfer learning techniques from the computer vision and speech recognition domains are applied, using pre-trained models such as ConvNeXt, Vision Transformer, and wav2vec 2.0, which are compared with a baseline Convolutional Neural Network-Transformer hybrid. Additionally, multimodal models are introduced that learn from two distinct modalities: log mel-spectrograms and high-dimensional vector embeddings, both extracted from the raw audio data. The two modalities are combined using (i) a Simple Concatenated model and (ii) a CentralNet model. Experimental results demonstrate the effectiveness of combining two modalities over a single modality, achieving F1-scores of 0.6111 and 0.6270 for the Simple Concatenated and CentralNet models, respectively.
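
For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch, not the authors' implementation, of what fusing two modalities by simple concatenation and scoring a binary classifier with F1 can look like. The function names, the single logistic output layer, and all feature values and weights are hypothetical assumptions chosen only for illustration; the paper's actual models are deep networks whose details are not given on this page.

```python
import math
from typing import Sequence


def concat_fusion_score(
    spec_feat: Sequence[float],
    emb_feat: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Fuse two modality feature vectors by concatenation, then apply a
    logistic layer (hypothetical stand-in for a trained classifier head).

    spec_feat: features derived from a log mel-spectrogram branch (assumed).
    emb_feat:  features derived from an embedding branch (assumed).
    Returns the estimated probability of the positive ("highlight") class.
    """
    fused = list(spec_feat) + list(emb_feat)  # simple concatenation
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score of the positive class (label 1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values only; they do not come from the paper.
    prob = concat_fusion_score([0.2, -0.1], [0.5], [1.0, 1.0, 1.0], 0.0)
    label = 1 if prob >= 0.5 else 0
    print(label, f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

The F1-score is the harmonic mean of precision and recall on the positive class, which is why it is a common choice when highlight segments are rarer than non-highlight segments.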

Item Type:

Book Section

Identification Number (DOI):

Subjects:

Subjects > Computer Science > Artificial Intelligence

Subjects > Computer Science > Machine Learning

Deposited by:

Kitsuchart Pasupa

Date Deposited:

2026-01-06 22:38:51

Last Modified:

2026-01-07 15:45:04
