Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1663
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Zečević, Andjelka | en_US |
| dc.contributor.author | Vujičić Stanković, Staša | en_US |
| dc.date.accessioned | 2025-03-14T12:40:33Z | - |
| dc.date.available | 2025-03-14T12:40:33Z | - |
| dc.identifier.issn | 978-86-7589-088-1 | - |
| dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/1663 | - |
| dc.description.abstract | Serbian and the other national standard languages that are used instead of the common standard Serbo-Croatian have a phonologically based orthography. A characteristic of this orthography is that Serbian can be written in two alphabets (Latin and Cyrillic) and in two dialects (Ekavian and Ijekavian), and this variation is directly reproduced in the written language. Consequently, Serbian is hard to identify, because there are very similar languages that share its alphabets and dialects. The problems typical of closely related languages are therefore strongly present in Serbian. The existing top-level tools do not give results comparable to those for other classes of languages, so it is necessary to locate the problem and use the cumulative linguistic knowledge to overcome it. This paper summarizes the first results towards that goal. We have chosen several top-level language identification tools and tested their sensitivity to both alphabets and both dialects. For testing purposes we have created corpora encompassing newspaper articles, literary works written by Serbian authors, and translations of many widely circulated novels. The obtained results indicate that not all the tools support both the Latin and Cyrillic scripts, and confirm that language identification of documents written in the Ijekavian variant is much more error-prone than that of documents written in the Ekavian variant. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Belgrade : University of Belgrade, Faculty of Mathematics | en_US |
| dc.subject | Language identification | en_US |
| dc.subject | Serbian language | en_US |
| dc.title | Language Identification: The Case of Serbian | en_US |
| dc.type | Conference Object | en_US |
| dc.relation.publication | Natural Language Processing for Serbian – Resources and Application | en_US |
| dc.identifier.url | https://jerteh.rs/wp-content/uploads/2015/05/Zecevic.pdf | - |
| dc.contributor.affiliation | Informatics and Computer Science | en_US |
| dc.relation.isbn | 978-86-7589-088-1 | en_US |
| dc.description.rank | M63 | en_US |
| dc.relation.firstpage | 101 | en_US |
| dc.relation.lastpage | 112 | en_US |
| item.languageiso639-1 | en | - |
| item.grantfulltext | none | - |
| item.fulltext | No Fulltext | - |
| item.openairetype | Conference Object | - |
| item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
| item.cerifentitytype | Publications | - |
| crisitem.author.dept | Informatics and Computer Science | - |
| crisitem.author.orcid | 0000-0002-7200-3724 | - |
Appears in Collections: Research outputs