Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2552
DC Field | Value | Language
dc.contributor.author | Graovac, Jelena | en_US
dc.date.accessioned | 2025-09-16T13:18:41Z | -
dc.date.available | 2025-09-16T13:18:41Z | -
dc.date.issued | 2014 | -
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2552 | -
dc.description.abstract | A technique for automated categorization of text documents is presented, based on byte-level n-gram profiles, a new dissimilarity measure between profiles, and a K-nearest-neighbors classifier. The technique is language independent. It has been applied to four document collections in English, Chinese and Serbian: Reuters-21578 newswire articles, 20-Newsgroups, Tancorp and Ebart. Evaluation used the micro- and macro-averaged F1 measure. The results confirm that the presented technique, although very simple, achieves better results than other n-gram-based techniques on the Tancorp and 20-Newsgroups corpora. Compared with other state-of-the-art methods, it performs better than the “bag-of-words” K-nearest-neighbors classifier, and on the 20-Newsgroups corpus it even outperforms the “bag-of-words” support vector machine classifier. It can be successfully applied to a variety of related problems. | en_US
dc.language.iso | en | en_US
dc.publisher | Sage Journals | en_US
dc.relation.ispartof | Intelligent Data Analysis | en_US
dc.title | A variant of n-gram based language-independent text categorization | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.3233/ida-140663 | -
dc.identifier.scopus | 2-s2.0-84948397107 | -
dc.identifier.isi | 000339703000009 | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.issn | 1088-467X | en_US
dc.description.rank | M23 | en_US
dc.relation.firstpage | 677 | en_US
dc.relation.lastpage | 695 | en_US
dc.relation.volume | 18 | en_US
dc.relation.issue | 4 | en_US
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
item.fulltext | No Fulltext | -
item.openairetype | Article | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | none | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.orcid | 0000-0002-9323-4695 | -
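
The abstract describes the pipeline only at a high level: build a byte-level n-gram profile for each document, compare profiles with a dissimilarity measure, and classify with K nearest neighbors. The Python sketch below illustrates that pipeline under stated assumptions and is not the paper's implementation. The record does not define the paper's new dissimilarity measure, so the sketch substitutes the Common N-Gram (CNG) dissimilarity of Kešelj et al. (2003). The helper names (byte_ngram_profile, cng_dissimilarity, knn_classify), the UTF-8 encoding, and the defaults n=3, profile size 1000 and k=3 are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical helper names and defaults (n=3, size=1000, k=3) chosen for
# illustration; the record does not state the paper's parameters.

def byte_ngram_profile(text: str, n: int = 3, size: int = 1000) -> dict[bytes, float]:
    """Map the `size` most frequent byte-level n-grams of `text` to relative frequencies.

    Assumes UTF-8 encoding; working on raw bytes means no tokenizer or
    language-specific preprocessing is needed.
    """
    data = text.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values()) or 1  # guard against texts shorter than n bytes
    return {gram: c / total for gram, c in counts.most_common(size)}


def cng_dissimilarity(p1: dict[bytes, float], p2: dict[bytes, float]) -> float:
    """Common N-Gram (CNG) dissimilarity of Keselj et al. (2003).

    Stand-in only: this is NOT the new measure proposed in the paper.
    Each n-gram in either profile contributes ((f1 - f2) / ((f1 + f2) / 2)) ** 2.
    """
    d = 0.0
    for gram in p1.keys() | p2.keys():
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2  # f1 + f2 > 0 for every gram in the union
    return d


def knn_classify(text: str, training: list[tuple[dict[bytes, float], str]],
                 k: int = 3, n: int = 3, size: int = 1000) -> str:
    """Label `text` by majority vote among its k least dissimilar training profiles."""
    query = byte_ngram_profile(text, n, size)
    ranked = sorted((cng_dissimilarity(query, prof), label) for prof, label in training)
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns the first maximal entry, so a tied vote goes to the
    # label whose first neighbor is closest.
    return votes.most_common(1)[0][0]


# Usage on a toy corpus (illustrative data, not from the paper).
docs = [
    ("the stock market rose sharply today", "finance"),
    ("shares and bonds fell on the stock exchange", "finance"),
    ("the team won the football match", "sport"),
    ("the striker scored a goal in the match", "sport"),
]
train = [(byte_ngram_profile(doc), label) for doc, label in docs]
print(knn_classify("the football team scored in the final match", train, k=3))
```

Because the profiles are built from bytes, Serbian letters with diacritics and Chinese characters (two or three bytes each in UTF-8) are handled without any tokenizer, which is one way to obtain the language independence the abstract claims. Training profiles must be built with the same n and profile size that knn_classify uses for the query.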
Appears in Collections: Research outputs

Scopus™ citations: 21 (checked on Jun 13, 2026)
Page views: 4 (checked on Jun 13, 2026)

Google ScholarTM

Check

Altmetric

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.