Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1663
Title: Language Identification: The Case of Serbian
Authors: Zečević, Andjelka
Vujičić Stanković, Staša 
Affiliations: Informatics and Computer Science 
Keywords: Language identification;Serbian language
Rank: M63
Publisher: Belgrade : University of Belgrade, Faculty of Mathematics
Related Publication(s): Natural Language Processing for Serbian – Resources and Application
Abstract: 
Serbian and the other national standard languages that have replaced the common standard Serbo-Croatian have a phonologically based orthography. As a consequence, Serbian can be written in two alphabets (Latin and Cyrillic) and in two dialects (Ekavian and Ijekavian), and the dialect is directly reflected in the written language. Serbian is therefore hard to identify, because very similar languages share these alphabets and dialects, and the problems typical of closely related languages are strongly pronounced in Serbian. The existing top-level tools do not give results comparable to those for other classes of languages, so it is necessary to locate the problem and use the cumulative linguistic knowledge to overcome it.
This paper summarizes the first results towards that goal. We have chosen several top-level language identification tools and tested their sensitivity to both alphabets and both dialects. For testing purposes we have created corpora encompassing newspaper articles, literary works written by Serbian authors, and translations of many widely circulated novels. The obtained results indicate that not all the tools support both the Latin and Cyrillic scripts, and confirm that language identification of documents written in the Ijekavian variant is much more error-prone than that of documents written in the Ekavian variant.
URI: https://research.matf.bg.ac.rs/handle/123456789/1663
ISBN: 978-86-7589-088-1
Appears in Collections: Research outputs
