Hate Speech Detection in Thai Social Media with Ordinal-Imbalanced Text Classification

Pasupa, Kitsuchart, Karnbanjob, Werasut and Aksornsiri, Massakorn (2022) Hate Speech Detection in Thai Social Media with Ordinal-Imbalanced Text Classification. In: The 19th International Joint Conference on Computer Science and Software Engineering (JCSSE 2022), 22-25 June 2022, Bangkok, Thailand. (Submitted)

Abstract

Cyberbullying has become a serious problem on Thai social media. For example, some Thai people posted hate speech targeting Myanmar workers in Thailand during the COVID-19 pandemic, which might increase hate crime. It is imperative and urgent to detect cyberbullying on Thai social media. The task is a text classification problem. Moreover, hate speech has ordered severity levels, but many previous works did not account for this ordering in their models. Therefore, we developed a Thai hate-speech classification method with various loss functions to detect such hate speech accurately. We evaluated these loss functions on a corpus of ordinal-imbalanced Thai text. The results indicated that the best model in terms of F1-score was the one trained with a hybrid loss function combining an ordinal regression loss and the Pearson correlation coefficient (commonly used as a similarity function). It yielded an average F1-score of 78.38%, significantly higher (by 0.88%) than the score achieved by a conventional loss function, and an average mean squared error of 0.2478, a 5.49% relative improvement. Thus, the proposed hybrid loss function improved the efficiency of the model.
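The abstract does not give the formula of the hybrid loss, so the following is only an illustrative sketch of how an ordinal regression loss and a Pearson-correlation term could be combined. It is not the authors' implementation. The cumulative binary encoding of severity levels, the use of `1 - r` as the correlation term, the weighting parameter `alpha`, and all function names are assumptions made for this example.

```python
import math

def ordinal_targets(label, num_classes):
    # Assumed encoding: severity level k becomes K-1 binary targets,
    # where target j answers "is the label greater than j?".
    return [1.0 if label > j else 0.0 for j in range(num_classes - 1)]

def ordinal_bce(probs, labels, num_classes):
    # probs[i][j] is the predicted probability that sample i has label > j.
    # Binary cross-entropy summed over thresholds, averaged over samples.
    eps = 1e-12
    total = 0.0
    for p_row, y in zip(probs, labels):
        for p, t in zip(p_row, ordinal_targets(y, num_classes)):
            total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(labels)

def pearson_loss(scores, labels):
    # 1 - Pearson r between predicted severity scores and true levels.
    n = len(labels)
    mean_s = sum(scores) / n
    mean_l = sum(labels) / n
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(scores, labels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    denom = math.sqrt(var_s * var_l)
    if denom == 0.0:
        return 1.0  # correlation undefined; treated here as r = 0
    return 1.0 - cov / denom

def hybrid_loss(probs, labels, num_classes, alpha=0.5):
    # Predicted severity score: sum of the threshold probabilities.
    scores = [sum(row) for row in probs]
    return (alpha * ordinal_bce(probs, labels, num_classes)
            + (1.0 - alpha) * pearson_loss(scores, labels))

# Toy batch with three severity levels (0, 1, 2); values are made up.
labels = [0, 1, 2]
good = [[0.01, 0.01], [0.99, 0.01], [0.99, 0.99]]
bad = [[0.99, 0.99], [0.99, 0.01], [0.01, 0.01]]
print(hybrid_loss(good, labels, 3), hybrid_loss(bad, labels, 3))
```

In this sketch the ordinal term penalizes each threshold decision separately, while the correlation term rewards predictions whose ordering follows the true severity ordering. In an actual model both terms would be written in a framework with automatic differentiation so the combined loss can be minimized by gradient descent.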

Item Type:

Conference or Workshop Item (Paper)

Subjects:

Subjects > Computer Science > Artificial Intelligence

Subjects > Computer Science > Machine Learning

Subjects > Computer Science > Computation and Language (Computational Linguistics and Natural Language and Speech Processing)

Deposited by:

Kitsuchart Pasupa

Date Deposited:

2022-05-23 13:56:35

Last Modified:

2022-11-03 11:24:00
