Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2552
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Graovac, Jelena | en_US |
| dc.date.accessioned | 2025-09-16T13:18:41Z | |
| dc.date.available | 2025-09-16T13:18:41Z | |
| dc.date.issued | 2014 | |
| dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2552 | |
| dc.description.abstract | A technique for automated categorization of text documents, based on byte-level n-gram profiles and a new dissimilarity measure between profiles is presented. K nearest neighbors classifier is used. The technique is language independent. It has been applied to four document collections in English, Chinese and Serbian: Reuters-21578 newswire articles, 20-Newsgroups, Tancorp and Ebart. The evaluation was done by using the micro- and macro-averaged function. The results obtained confirm that the presented technique, although very simple, in the case of Tancorp and 20-Newsgroups corpora achieves better results than other n-gram based techniques. As compared to other state-of-the-art methods, it performs better than “bag-of-words” K nearest neighbors classifier and in the case of 20-Newsgroups corpus it works even better than “bag-of-words” Support vector machines classifier. It can be successfully used in a variety of related problems. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Sage Journals | en_US |
| dc.relation.ispartof | Intelligent Data Analysis | en_US |
| dc.title | A variant of n-gram based language-independent text categorization | en_US |
| dc.type | Article | en_US |
| dc.identifier.doi | 10.3233/ida-140663 | |
| dc.identifier.scopus | 2-s2.0-84948397107 | |
| dc.identifier.isi | 000339703000009 | |
| dc.contributor.affiliation | Informatics and Computer Science | en_US |
| dc.relation.issn | 1088-467X | en_US |
| dc.description.rank | M23 | en_US |
| dc.relation.firstpage | 677 | en_US |
| dc.relation.lastpage | 695 | en_US |
| dc.relation.volume | 18 | en_US |
| dc.relation.issue | 4 | en_US |
| item.languageiso639-1 | en | |
| item.cerifentitytype | Publications | |
| item.openairetype | Article | |
| item.openairecristype | http://purl.org/coar/resource_type/c_18cf | |
| item.fulltext | No Fulltext | |
| item.grantfulltext | none | |
| crisitem.author.dept | Informatics and Computer Science | |
| crisitem.author.orcid | 0000-0002-9323-4695 | |
Appears in Collections: Research outputs
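
For readers unfamiliar with the approach summarized in the abstract, the general idea of byte-level n-gram profiles combined with a nearest-neighbor vote can be sketched roughly as follows. This is only an illustration, not the article's method: the record does not define the new dissimilarity measure, so the function `dissimilarity` below is a placeholder (a plain sum of absolute frequency differences), and the names `ngram_profile`, `knn_classify` and the defaults `n=3`, `size=100`, `k=3` are all assumptions made for this example.

```python
from collections import Counter


def ngram_profile(text, n=3, size=100):
    """Map the `size` most frequent byte-level n-grams to relative frequencies."""
    data = text.encode("utf-8")  # work on bytes, so no language-specific tokenizer
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.most_common(size)}


def dissimilarity(a, b):
    """Placeholder measure (assumption): summed absolute frequency difference.

    The measure proposed in the article is not given in this record.
    """
    grams = sorted(set(a) | set(b))  # fixed order keeps the float sum symmetric
    return sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in grams)


def knn_classify(text, training, k=3, n=3, size=100):
    """Label `text` by majority vote among the k least dissimilar profiles.

    `training` is a list of (profile, label) pairs built with ngram_profile.
    """
    query = ngram_profile(text, n, size)
    ranked = sorted(training, key=lambda item: dissimilarity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the profiles are computed over UTF-8 bytes, the same code runs unchanged on English, Chinese or Serbian text; any profile-to-profile measure could be substituted for the placeholder without touching the rest.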

Scopus citations: 21 (checked on Sep 24, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.