Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2457
DC Field | Value | Language
dc.contributor.author | Graovac, Jelena | en_US
dc.contributor.author | Pavlović Lažetić, Gordana | en_US
dc.contributor.author | Kovačević, Jovana | en_US
dc.date.accessioned | 2025-09-06T07:28:56Z | -
dc.date.available | 2025-09-06T07:28:56Z | -
dc.date.issued | 2015 | -
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2457 | -
dc.description.abstract | We introduce a new language independent text categorization technique based on n-grams profile representation of restricted size of both document and a category, an n-gram weighting factors scheme, and a simple algorithm for comparing profiles. The technique does not require any morphological analysis of texts, any preprocessing steps, or any prior information about document content or language. We apply it to the text categorization problem in two widely spoken yet paradigmatically quite different languages – English and Arabic, thus demonstrating language-independence. We used their publicly available document collections – 20-Newsgroups and Mesleh-10, respectively. Experimental results presented in terms of macro- and micro-averaged F1 measures imply that the new technique outperforms other n-gram based and bag-of-words machine learning techniques when applied to English and Arabic text categorization. | en_US
dc.language.iso | en | en_US
dc.publisher | Brazilian Computer Society Special Interest Group on Databases | en_US
dc.relation.ispartof | Journal of Information and Data Management | en_US
dc.subject | Arabic | en_US
dc.subject | byte-level n-gram | en_US
dc.subject | English | en_US
dc.subject | kNN | en_US
dc.subject | natural language text categorization | en_US
dc.title | Language Independent n-Gram-Based Text Categorization with Weighting Factors: A Case Study | en_US
dc.type | Article | en_US
dc.identifier.url | https://periodicos.ufmg.br/index.php/jidm/article/view/279 | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.issn | 2178-7107 | en_US
dc.relation.firstpage | 4 | en_US
dc.relation.lastpage | 17 | en_US
dc.relation.volume | 6 | en_US
dc.relation.issue | 1 | en_US
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.fulltext | No Fulltext | -
item.grantfulltext | none | -
item.openairetype | Article | -
item.languageiso639-1 | en | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.orcid | 0000-0002-9323-4695 | -
crisitem.author.orcid | 0000-0002-0242-2472 | -
Appears in Collections: Research outputs

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.