A variant of N-gram based language classification

Tomović, Andrija; Janičić, Predrag

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/495

Title:	A variant of N-gram based language classification
Authors:	Tomović, Andrija; Janičić, Predrag
Affiliations:	Informatics and Computer Science
Issue Date:	1-Jan-2007
Rank:	M33
Publisher:	Springer
Related Publication(s):	Congress of the Italian Association for Artificial Intelligence AI*IA 2007
Journal:	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Conference:	Congress of the Italian Association for Artificial Intelligence (10 ; 2007 ; Rome)
Abstract:	Rapid classification of documents is of high importance in many multilingual settings (such as international institutions or Internet search engines). This has been, for years, a well-known problem, addressed by different techniques, with excellent results. We address this problem by a simple n-gram-based technique, a variation of techniques of this family. Our n-gram-based classification is very robust and successful, even for 20-fold classification, and even for short text strings. We give a detailed study for different lengths of strings and sizes of n-grams and we explore what classification parameters give the best performance. There is no requirement for vocabularies, but only for a few training documents. As a main corpus, we used an EU set of documents in 20 languages. Experimental comparison shows that our approach gives better results than four other popular approaches. © Springer-Verlag Berlin Heidelberg 2007.
URI:	https://research.matf.bg.ac.rs/handle/123456789/495
ISBN:	9783540747819
ISSN:	0302-9743
DOI:	10.1007/978-3-540-74782-6_36
Appears in Collections:	Research outputs
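
The abstract describes classifying a text's language from character n-grams, using only a few training documents and no vocabulary. The record does not give the authors' exact variant, so the following is only a minimal sketch of the general n-gram family, under stated assumptions: the function names (`ngram_profile`, `dissimilarity`, `classify`), the trigram size, the `top_k` cutoff, the absolute-difference measure, and the tiny training sentences are all hypothetical and chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=300):
    """Map the top_k most frequent character n-grams to relative frequencies."""
    # Normalise case and whitespace so spacing differences do not matter.
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in grams.most_common(top_k)}


def dissimilarity(p, q):
    """Sum of absolute frequency differences over the union of n-grams.

    This measure is an assumption for the sketch, not the paper's measure.
    """
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def classify(text, profiles, n=3, top_k=300):
    """Return the language whose profile is closest to the text's profile."""
    query = ngram_profile(text, n, top_k)
    return min(profiles, key=lambda lang: dissimilarity(query, profiles[lang]))


# Hypothetical toy training data; a real setup would use full documents.
training = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
    "it": "la volpe marrone veloce salta sopra il cane pigro e il gatto dorme nella casa",
}
profiles = {lang: ngram_profile(sample) for lang, sample in training.items()}

print(classify("the dog is in the house", profiles))
```

With realistic training documents, the n-gram size and the profile cutoff would be the parameters to tune, in line with the abstract's study of string lengths and n-gram sizes.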

Scopus citations:	5 (checked on Jun 10, 2026)
Page views:	16 (checked on Jun 12, 2026)