NgramSPD: Exploring optimal n-gram model for sentiment polarity detection in different languages

Graovac, Jelena; Mladenović, Miljana; Tanasijević, Ivana

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2054

DC Field	Value	Language
dc.contributor.author	Graovac, Jelena	en_US
dc.contributor.author	Mladenović, Miljana	en_US
dc.contributor.author	Tanasijević, Ivana	en_US
dc.date.accessioned	2025-05-17T12:32:31Z	-
dc.date.available	2025-05-17T12:32:31Z	-
dc.date.issued	2019-01-01	-
dc.identifier.issn	1088467X	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/2054	-
dc.description.abstract	Due to the rapid growth of web platforms such as blogs, discussion forums, peer-to-peer networks, and various other types of social media, Sentiment Polarity Detection (SPD) (classifying texts by "positive" or "negative" orientation) has become a more important and challenging task in recent years. There is a growing need for the management and study of SPD not only in English, but also in other languages. The key to using Machine Learning (ML) for SPD lies in engineering a representative set of features. This paper explores different (byte, character and word) n-gram based text representation models in order to determine the most valuable model for the representation of text documents in various languages, which can be used successfully by ML classification techniques for solving the SPD task. The proposed n-gram models were used in conjunction with the k Nearest Neighbours (kNN), Support Vector Machine (SVM) and Maximum Entropy (MaxEnt) algorithms to determine the opinion polarity of movie reviews. The effectiveness and language independence of the proposed n-gram models were demonstrated in experiments performed on seven publicly available movie review benchmarks in Arabic, Czech, English, French, Spanish, Turkish, and Serbian, the authors' mother tongue. Formal evaluation has confirmed that the proposed byte and character n-gram models outperform the word n-gram model, and in conjunction with the presented MaxEnt algorithm outperform other supervised ML techniques used with more complex document representation approaches. In some cases (Arabic, Czech, French, Serbian and Turkish), significant improvements over the baselines have been achieved. Despite their simplicity and broad applicability, byte and character n-grams have been shown to be able to capture information on different levels: lexical and syntactic.	en_US
dc.language.iso	en	en_US
dc.publisher	Sage Journals-IOS Press	en_US
dc.relation.ispartof	Intelligent Data Analysis	en_US
dc.subject	kNN	en_US
dc.subject	MaxEnt	en_US
dc.subject	movie reviews	en_US
dc.subject	n-grams	en_US
dc.subject	Sentiment polarity detection	en_US
dc.subject	SVM	en_US
dc.title	NgramSPD: Exploring optimal n-gram model for sentiment polarity detection in different languages	en_US
dc.type	Article	en_US
dc.identifier.doi	10.3233/IDA-183879	-
dc.identifier.scopus	2-s2.0-85064405058	-
dc.identifier.isi	000464031700003	-
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/85064405058	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.issn	1088-467X	en_US
dc.description.rank	M23	en_US
dc.relation.firstpage	279	en_US
dc.relation.lastpage	296	en_US
dc.relation.volume	23	en_US
dc.relation.issue	2	en_US
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
item.grantfulltext	none	-
item.languageiso639-1	en	-
item.openairetype	Article	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.orcid	0000-0002-9323-4695	-
crisitem.author.orcid	0000-0003-3764-1269	-
Appears in Collections:	Research outputs
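
The abstract above contrasts byte, character and word n-gram representations of a document. The following is a minimal Python sketch of how the three kinds of n-grams might be extracted from a review. It is an illustration only, not the authors' implementation: the function names, the whitespace tokenisation, and the UTF-8 encoding are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text` (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def byte_ngrams(text, n, encoding="utf-8"):
    """Return all overlapping byte n-grams of `text`.

    The encoding is an assumption; non-ASCII letters span several bytes
    in UTF-8, so byte and character n-grams differ for such text.
    """
    data = text.encode(encoding)
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, using naive whitespace tokenisation."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_profile(ngrams):
    """Count n-gram frequencies; such counts could serve as classifier features."""
    return Counter(ngrams)


if __name__ == "__main__":
    review = "odličan film"  # short Serbian example containing a non-ASCII letter
    print(char_ngrams(review, 3)[:3])
    print(byte_ngrams(review, 3)[:3])
    print(word_ngrams(review, 2))
    print(ngram_profile(char_ngrams("aaab", 2)))
```

Byte and character n-grams need no language-specific tokeniser, which is consistent with the language independence the abstract claims. The frequency profiles produced here would still have to be turned into feature vectors before being passed to a kNN, SVM or MaxEnt classifier.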

Scopus citations: 14 (checked on Apr 21, 2026)
