Language Identification: The Case of Serbian

Zečević, Andjelka; Vujičić Stanković, Staša

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1663

DC Field	Value	Language
dc.contributor.author	Zečević, Andjelka	en_US
dc.contributor.author	Vujičić Stanković, Staša	en_US
dc.date.accessioned	2025-03-14T12:40:33Z	-
dc.date.available	2025-03-14T12:40:33Z	-
dc.date.issued	2014	-
dc.identifier.issn	978-86-7589-088-1	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/1663	-
dc.description.abstract	Serbian, like the other national standard languages that are used instead of the common standard Serbo-Croatian, has a phonologically based orthography. A characteristic of this orthography is that Serbian can be written in two alphabets (Latin and Cyrillic) and in two dialects (Ekavian and Ijekavian), both of which are directly reflected in the written language. Consequently, Serbian is hard to identify, because there are very similar languages that share its alphabets and dialects. The problems typical of closely related languages are therefore strongly pronounced in Serbian. The existing top-level tools do not give results comparable to those for other classes of languages, so it is necessary to locate the problem and use accumulated linguistic knowledge to overcome it. This paper summarizes the first results towards that goal. We chose several top-level language identification tools and tested their sensitivity to both alphabets and both dialects. For testing purposes, we created corpora encompassing newspaper articles, literary works written by Serbian authors, and translations of many widely circulated novels. The results indicate that not all of the tools support both the Latin and Cyrillic scripts, and confirm that language identification of documents written in the Ijekavian variant is much more error-prone than that of documents written in the Ekavian variant.	en_US
dc.language.iso	en	en_US
dc.publisher	Belgrade : University of Belgrade, Faculty of Mathematics	en_US
dc.subject	Language identification	en_US
dc.subject	Serbian language	en_US
dc.title	Language Identification: The Case of Serbian	en_US
dc.type	Conference Object	en_US
dc.relation.conference	Conference "35th Anniversary of Computational Linguistics in Serbia" (2014 ; Belgrade)	en_US
dc.relation.publication	Natural Language Processing for Serbian: Resources and Applications, Proceedings of the Conference "35th Anniversary of Computational Linguistics in Serbia"	en_US
dc.identifier.url	https://jerteh.rs/wp-content/uploads/2015/05/Zecevic.pdf	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.isbn	978-86-7589-088-1	en_US
dc.description.rank	M63	en_US
dc.relation.firstpage	101	en_US
dc.relation.lastpage	112	en_US
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.openairetype	Conference Object	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
item.cerifentitytype	Publications	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.orcid	0000-0002-7200-3724	-
Appears in Collections:	Research outputs
