Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit

Šošić, Milena; Stanković, Ranka; Graovac, Jelena

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/3219

DC Field	Value	Language
dc.contributor.author	Šošić, Milena	en_US
dc.contributor.author	Stanković, Ranka	en_US
dc.contributor.author	Graovac, Jelena	en_US
dc.date.accessioned	2026-03-19T14:00:15Z	-
dc.date.available	2026-03-19T14:00:15Z	-
dc.date.issued	2024	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/3219	-
dc.description.abstract	In the digital environment of South Slavic languages, emotion analysis of social media texts is becoming increasingly important for understanding public opinion, creating personalized content, and analyzing user interactions. This presentation describes the methodology and results of annotating a Serbian-language corpus according to Plutchik's model, which identifies eight basic emotion categories: joy, sadness, anger, fear, trust, disgust, anticipation, and surprise. The aim of the research is to analyze the emotional content of texts collected from the social networks X (formerly Twitter) and Reddit. Each collection contains around 17,000 individual messages and approximately 5,000 complete conversations. Corpus annotation involved several stages: data collection and preparation, manual annotation by experts, verification of annotation accuracy, and statistical analysis of the harmonized labels. The annotation is multi-label, so a single message may carry several emotions at once. This enabled a richer, more nuanced analysis of emotional states, which is particularly valuable for the complex emotional content found on social media. Data were collected with automated tools that downloaded Serbian-language conversations from social media accounts addressing current social, political, musical, and sports topics. During preparation, messages were further filtered to ensure content quality while preserving the conversational structure of the retrieved data. Messages were also pre-annotated automatically, using both classical and more advanced computational linguistics techniques, to make the manual labeling process more efficient. Teams of linguists and psychologists then reviewed the automatically assigned labels and judged whether each label was valid for the text it had been assigned to. 
To ensure high accuracy and consistency, standardized procedures were used to train annotators and to verify their judgments with statistical measures of annotation reliability. The reliability analysis showed that emotions in Serbian social media texts can be classified using Plutchik's model. Statistical analysis of the data revealed notable patterns in the distribution of emotions across messages. It also provided insight into users' emotional reactions to various emotional stimuli and topics. The multi-label emotion corpus Social-Emo.SR represents a significant step toward a deeper understanding of emotional dynamics among social media users. Beyond enriching linguistic resources for Serbian, the corpus opens new possibilities for research, commercial applications, and population-level mental health analysis. Applying modern methods to the corpus could enable tools that recognize and reflect the complexity of human emotions in today's digital world within the Serbian-speaking community. The corpus will be published under the open CC-BY-4.0 license.	en_US
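The abstract does not name the data format or the reliability measures the authors used. The Python sketch below shows, under stated assumptions, one way to handle multi-label annotations over Plutchik's eight emotions. It encodes each message's labels as a 0/1 vector, counts them to get a label distribution, and compares two annotators with Cohen's kappa computed separately for each emotion. Cohen's kappa, the per-emotion treatment, the function names (`to_binary_vector`, `label_distribution`, `cohen_kappa`, `per_label_kappa`), and the sample labels are illustrative assumptions. They do not reproduce the authors' procedure or data.

```python
"""Sketch: multi-label Plutchik annotations and per-emotion agreement.

Assumption: reliability is measured with Cohen's kappa for each emotion
separately; the source abstract does not name the measures actually used.
"""
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set

# Plutchik's eight basic emotions, in the order listed in the abstract.
PLUTCHIK_EMOTIONS: List[str] = [
    "joy", "sadness", "anger", "fear",
    "trust", "disgust", "anticipation", "surprise",
]


def to_binary_vector(labels: Set[str]) -> List[int]:
    """Encode one message's emotion labels as a 0/1 vector, one slot per emotion."""
    unknown = labels - set(PLUTCHIK_EMOTIONS)
    if unknown:
        raise ValueError(f"labels outside Plutchik's model: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in PLUTCHIK_EMOTIONS]


def label_distribution(annotations: Iterable[Set[str]]) -> Counter:
    """Count how many messages carry each emotion (one message may add to several counts)."""
    return Counter(emotion for labels in annotations for emotion in labels)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators' binary (0/1) decisions on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n  # share of items each annotator marked 1
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_expected == 1.0:
        return math.nan  # both annotators used one value throughout: kappa undefined
    return (p_observed - p_expected) / (1 - p_expected)


def per_label_kappa(ann_a: List[Set[str]], ann_b: List[Set[str]]) -> Dict[str, float]:
    """Compute Cohen's kappa separately for each of Plutchik's emotions."""
    vecs_a = [to_binary_vector(labels) for labels in ann_a]
    vecs_b = [to_binary_vector(labels) for labels in ann_b]
    return {
        emotion: cohen_kappa([v[i] for v in vecs_a], [v[i] for v in vecs_b])
        for i, emotion in enumerate(PLUTCHIK_EMOTIONS)
    }


if __name__ == "__main__":
    # Invented labels for four messages from two hypothetical annotators.
    annotator_a = [{"joy"}, {"anger", "disgust"}, {"fear"}, {"joy", "trust"}]
    annotator_b = [{"joy"}, {"anger"}, {"fear", "sadness"}, {"joy"}]
    print(label_distribution(annotator_a))
    for emotion, kappa in per_label_kappa(annotator_a, annotator_b).items():
        print(f"{emotion:>12}: {kappa:.2f}")
```

Treating each emotion as an independent yes/no decision fits a multi-label scheme, because one message can receive several emotions. If neither annotator ever uses an emotion, its kappa is undefined, and the sketch reports it as NaN rather than guessing a value.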
dc.language.iso	en	en_US
dc.publisher	Beograd : Filološki fakultet	en_US
dc.subject	Emotions	en_US
dc.subject	Plutchik's model	en_US
dc.subject	annotation	en_US
dc.subject	corpus	en_US
dc.subject	Social media	en_US
dc.subject	Serbian language	en_US
dc.title	Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit	en_US
dc.title.alternative	Social-Emo.SR: Emocionalna višeznačna kategorizacija konverzacionih poruka sa društvenih mreža X i Reddit	en_US
dc.type	Conference Object	en_US
dc.relation.conference	International Conference South Slavic Languages in Digital Environment (2024 ; Belgrade)	en_US
dc.relation.publication	International Conference South Slavic Languages in Digital Environment JuDig : Book of Abstracts	en_US
dc.identifier.url	https://judig.jerteh.rs/images/knjige/JUDIG-2024-book%20of%20abstracts.pdf	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.isbn	978-86-6153-754-7	en_US
dc.description.rank	M34	en_US
dc.relation.firstpage	58	en_US
dc.relation.lastpage	59	en_US
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.openairetype	Conference Object	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
item.cerifentitytype	Publications	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.orcid	0000-0002-9323-4695	-
Appears in Collections:	Research outputs
