Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2339
Title: Application of a Structural Support Vector Machine Method to N-gram Based Text Classification in Serbian
Other Titles: Н-грамски заснована класификација текста на српском језику применом методе структуралних подржавајућих вектора
Authors: Kovačević, Jovana 
Graovac, Jelena 
Affiliations: Informatics and Computer Science 
Informatics and Computer Science 
Keywords: hierarchical text classification; Support Vector Machine method; Ebart corpus
Issue Date: 2016
Rank: M53
Publisher: Beograd : Filološki fakultet
Beograd : Univerzitetska biblioteka "Svetozar Marković"
Beograd : Zajednica biblioteka univerziteta u Srbiji
Journal: Infotheca - Journal for Digital Humanities
Abstract: 
The paper presents classification results for a hierarchically organized document corpus in Serbian, obtained using the Support Vector Machine (SVM) method. Two techniques derived from the SVM with structural output have been applied: multiclass flat classification and hierarchical classification. The joint representation of a document and the class, or hierarchy of classes, to which the document belongs is specific to this form of the SVM method and is based on byte n-grams of different lengths. Four tf-idf statistics have been used to define the significance of an n-gram for a specific document. The techniques and statistics described have been tested on a hierarchically structured subset of the Ebart corpus of newspaper texts. The results obtained for the two types of classifiers are similar for the corpus as a whole, while the hierarchical classifier performs better for the most specific classes with a small number of texts.
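As a rough illustration of the document representation mentioned in the abstract, the sketch below extracts byte n-grams and weights them with one textbook tf-idf variant (relative frequency times log(N/df)). The abstract does not specify which four tf-idf statistics the authors used, so this formula, the UTF-8 encoding, the function names `byte_ngrams` and `tfidf`, and the toy documents are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def byte_ngrams(text, n):
    """Count byte n-grams of length n over the UTF-8 encoding of text (encoding is an assumption)."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def tfidf(docs, n):
    """Weight each byte n-gram by tf * log(N / df); one common variant, assumed here."""
    counts = [byte_ngrams(d, n) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    total = len(docs)
    weights = []
    for c in counts:
        length = sum(c.values()) or 1  # avoid division by zero for very short texts
        weights.append(
            {g: (k / length) * math.log(total / df[g]) for g, k in c.items()}
        )
    return weights


# Hypothetical toy documents, not taken from the Ebart corpus.
docs = ["šah i mat", "mat i pat"]
weights = tfidf(docs, 2)
```

Because the n-grams are taken over bytes rather than characters, a letter such as "š" contributes two bytes in UTF-8 and can therefore span an n-gram boundary. An n-gram that occurs in every document receives weight zero under this particular variant.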
URI: https://research.matf.bg.ac.rs/handle/123456789/2339
DOI: 10.18485/infotheca.2016.16.1_2.1
Appears in Collections: Research outputs

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.