Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/495
DC Field | Value | Language
dc.contributor.author | Tomović, Andrija | en_US
dc.contributor.author | Janičić, Predrag | en_US
dc.date.accessioned | 2022-08-13T10:14:40Z | -
dc.date.available | 2022-08-13T10:14:40Z | -
dc.date.issued | 2007-01-01 | -
dc.identifier.isbn | 9783540747819 | -
dc.identifier.issn | 03029743 | en
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/495 | -
dc.description.abstract | Rapid classification of documents is of high importance in many multilingual settings (such as international institutions or Internet search engines). This has been, for years, a well-known problem, addressed by different techniques, with excellent results. We address this problem by a simple n-grams based technique, a variation of techniques of this family. Our n-grams-based classification is very robust and successful, even for 20-fold classification, and even for short text strings. We give a detailed study for different lengths of strings and size of n-grams and we explore what classification parameters give the best performance. There is no requirement for vocabularies, but only for a few training documents. As a main corpus, we used an EU set of documents in 20 languages. Experimental comparison shows that our approach gives better results than four other popular approaches. © Springer-Verlag Berlin Heidelberg 2007. | en
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_US
dc.title | A variant of N-gram based language classification | en_US
dc.type | Conference Paper | en_US
dc.relation.publication | Congress of the Italian Association for Artificial Intelligence AI*IA 2007 | en_US
dc.identifier.doi | 10.1007/978-3-540-74782-6_36 | -
dc.identifier.scopus | 2-s2.0-38049148532 | -
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/38049148532 | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.firstpage | 410 | en_US
dc.relation.lastpage | 421 | en_US
dc.relation.volume | 4733 LNAI | en_US
item.fulltext | No Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.grantfulltext | none | -
item.openairetype | Conference Paper | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.orcid | 0000-0001-8922-4948 | -
Appears in Collections: Research outputs

Scopus citations: 5 (checked on Dec 18, 2024)

Page view(s): 11 (checked on Dec 25, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.