Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1226
DC Field | Value | Language
dc.contributor.author | Pajić, Vesna | en_US
dc.contributor.author | Vujičić Stanković, Staša | en_US
dc.contributor.author | Stanković, Ranka | en_US
dc.contributor.author | Pajić, Miloš | en_US
dc.date.accessioned | 2022-09-29T16:08:24Z | -
dc.date.available | 2022-09-29T16:08:24Z | -
dc.date.issued | 2018-06-11 | -
dc.identifier.issn | 02640473 | en
dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/1226 | -
dc.description.abstract | Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts. Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts. Findings: By using language resources, such as electronic dictionaries and grammars, 928 multiword terms were extracted out of 1,523 multiword terms that were recognized as candidates from a corpus having 42,260 different simple word forms; 870 of these were new, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary. Originality/value: The paper presents methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well. | en
dc.relation.ispartof | Electronic Library | en
dc.subject | Data analysis | en
dc.subject | Data processing | en
dc.subject | Data retrieval | en
dc.subject | Digital documents | en
dc.subject | Document handling | en
dc.subject | Evaluation | en
dc.subject | Foreign languages | en
dc.subject | Information retrieval | en
dc.title | Semi-automatic extraction of multiword terms from domain-specific corpora | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1108/EL-06-2017-0128 | -
dc.identifier.scopus | 2-s2.0-85047317443 | -
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/85047317443 | -
dc.contributor.affiliation | Informatics and Computer Science | en_US
dc.relation.firstpage | 550 | en
dc.relation.lastpage | 567 | en
dc.relation.volume | 36 | en
dc.relation.issue | 3 | en
item.fulltext | No Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.grantfulltext | none | -
item.openairetype | Article | -
crisitem.author.dept | Informatics and Computer Science | -
crisitem.author.orcid | 0000-0002-7200-3724 | -
Appears in Collections: Research outputs

Scopus citations: 7 (checked on Dec 20, 2024)

Page view(s): 22 (checked on Dec 24, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.