Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/2552
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Graovac, Jelena | en_US |
dc.date.accessioned | 2025-09-16T13:18:41Z | - |
dc.date.available | 2025-09-16T13:18:41Z | - |
dc.date.issued | 2014 | - |
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2552 | - |
dc.description.abstract | A technique for automated categorization of text documents, based on byte-level n-gram profiles and a new dissimilarity measure between profiles, is presented. A K nearest neighbors classifier is used. The technique is language independent. It has been applied to four document collections in English, Chinese and Serbian: Reuters-21578 newswire articles, 20-Newsgroups, Tancorp and Ebart. The evaluation used the micro- and macro-averaged F1 measure. The results confirm that the presented technique, although very simple, achieves better results than other n-gram based techniques on the Tancorp and 20-Newsgroups corpora. Compared to other state-of-the-art methods, it performs better than the “bag-of-words” K nearest neighbors classifier, and on the 20-Newsgroups corpus it performs even better than the “bag-of-words” Support vector machines classifier. It can be successfully used in a variety of related problems. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Sage Journals | en_US |
dc.relation.ispartof | Intelligent Data Analysis | en_US |
dc.title | A variant of n-gram based language-independent text categorization | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.3233/ida-140663 | - |
dc.identifier.scopus | 2-s2.0-84948397107 | - |
dc.identifier.isi | 000339703000009 | - |
dc.contributor.affiliation | Informatics and Computer Science | en_US |
dc.relation.issn | 1088-467X | en_US |
dc.description.rank | M23 | en_US |
dc.relation.firstpage | 677 | en_US |
dc.relation.lastpage | 695 | en_US |
dc.relation.volume | 18 | en_US |
dc.relation.issue | 4 | en_US |
item.languageiso639-1 | en | - |
item.cerifentitytype | Publications | - |
item.openairetype | Article | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.fulltext | No Fulltext | - |
item.grantfulltext | none | - |
crisitem.author.dept | Informatics and Computer Science | - |
crisitem.author.orcid | 0000-0002-9323-4695 | - |
Appears in Collections: | Research outputs |