Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2054
DC Field | Value | Language
dc.contributor.author | Graovac, Jelena | en_US
dc.contributor.author | Mladenović, Miljana | en_US
dc.contributor.author | Tanasijević, Ivana | en_US
dc.date.accessioned | 2025-05-17T12:32:31Z | -
dc.date.available | 2025-05-17T12:32:31Z | -
dc.date.issued | 2019-01-01 | -
dc.identifier.issn | 1088467X | -
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2054 | -
dc.description.abstract | Due to the rapid growth of web platforms such as blogs, discussion forums, peer-to-peer networks, and various other types of social media, Sentiment Polarity Detection (SPD) (classifying texts by "positive" or "negative" orientation) has become a more important and challenging task in recent years. There is a growing need for management and study of SPD not only in English, but also in other languages. The key reason for using Machine Learning (ML) for SPD lies in engineering a representative set of features. This paper explores different (byte, character and word) n-gram based text representation models in order to determine the most valuable model for the representation of text documents in various languages, which can be used successfully by ML classification techniques for solving the SPD task. The proposed n-gram models were used in conjunction with k Nearest Neighbourhood (kNN), Support Vector Machine (SVM) and Maximum Entropy (MaxEnt) algorithms to determine opinion polarity of the proposed movie reviews. The effectiveness and language independence of the proposed n-gram models were demonstrated in experiments performed on seven publicly available movie review benchmarks in Arabic, Czech, English, French, Spanish, Turkish, and Serbian, the last being the authors' mother tongue. Formal evaluation has confirmed that the proposed byte and character n-gram models outperform the word n-gram model, and in conjunction with the presented MaxEnt algorithm outperform other ML supervised techniques used with more complex document representation approaches. In some cases (Arabic, Czech, French, Serbian and Turkish), significant improvements over the baselines have been achieved. Despite their simplicity and broad applicability, byte and character n-grams have been shown to be able to capture information on different levels: lexical and syntactic. | en_US
dc.language.iso | en | en_US
dc.publisher | Sage Journals-IOS Press | en_US
dc.relation.ispartof | Intelligent Data Analysis | en_US
dc.subject | kNN | en_US
dc.subject | MaxEnt | en_US
dc.subject | movie reviews | en_US
dc.subject | n-grams | en_US
dc.subject | Sentiment polarity detection | en_US
dc.subject | SVM | en_US
dc.title | NgramSPD: Exploring optimal n-gram model for sentiment polarity detection in different languages | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.3233/IDA-183879 | -
dc.identifier.scopus | 2-s2.0-85064405058 | -
dc.identifier.isi | 000464031700003 | -
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/85064405058 | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.issn | 1088-467X | en_US
dc.description.rank | M23 | en_US
dc.relation.firstpage | 279 | en_US
dc.relation.lastpage | 296 | en_US
dc.relation.volume | 23 | en_US
dc.relation.issue | 2 | en_US
item.openairetype | Article | -
item.fulltext | No Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | none | -
item.languageiso639-1 | en | -
item.cerifentitytype | Publications | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.orcid | 0000-0002-9323-4695 | -
crisitem.author.orcid | 0000-0003-3764-1269 | -
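
Illustrative note: the abstract names three n-gram representations (byte, character and word). The Python sketch below shows one plausible way to extract each kind of n-gram as a frequency profile. It is not the authors' implementation: the function names, the UTF-8 encoding, and the whitespace tokenisation are all assumptions made for illustration, and the record does not describe the paper's preprocessing or feature weighting.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def byte_ngrams(text, n, encoding="utf-8"):
    """Count overlapping byte n-grams; UTF-8 is an assumed encoding."""
    data = text.encode(encoding)
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams using naive whitespace tokenisation."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

For non-ASCII text the byte and character views differ: under UTF-8 the Serbian letter "š" is one character but two bytes, so it yields one character 1-gram and two byte 1-grams. Profiles like these could then be turned into feature vectors for a classifier such as kNN, SVM or MaxEnt.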
Appears in Collections: Research outputs

Scopus citations: 14 (checked on Jun 12, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.