Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/2339
Title: Application of a Structural Support Vector Machine Method to N-gram Based Text Classification in Serbian
Other Titles: Н-грамски заснована класификација текста на српском језику применом методе структуралних подржавајућих вектора
Authors: Kovačević, Jovana; Graovac, Jelena
Affiliations: Informatics and Computer Science; Informatics and Computer Science
Keywords: hierarchical text classification; Support Vector Machine Method; Ebart corpus
Issue Date: 2016
Rank: M53
Publisher: Beograd : Filološki fakultet; Beograd : Univerzitetska biblioteka "Svetozar Marković"; Beograd : Zajednica biblioteka univerziteta u Srbiji
Journal: Infotheca - Journal for Digital Humanities
Abstract: The paper presents classification results for a hierarchically organized document corpus in Serbian, obtained with the Support Vector Machine (SVM) method. Two techniques derived from the SVM with structured output have been applied: multiclass flat classification and hierarchical classification. The joint representation of a document and the class, or hierarchy of classes, it belongs to, which is specific to this form of the SVM method, is based on byte n-grams of different lengths. Four tf-idf statistics have been used to define the significance of an n-gram for a specific document. The techniques and statistics described have been tested on a hierarchically structured subset of the Ebart corpus of newspaper texts. The results obtained for the two types of classifiers are similar for the corpus as a whole, while the hierarchical classifier performs better for the most specific classes with a small number of texts.
URI: https://research.matf.bg.ac.rs/handle/123456789/2339
DOI: 10.18485/infotheca.2016.16.1_2.1
Appears in Collections: Research outputs
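
The abstract describes documents represented by byte n-grams weighted with tf-idf statistics. The record does not say which four tf-idf variants the paper uses, so the following is only a minimal sketch of the general idea, assuming UTF-8 encoded text and one common weighting (relative term frequency times the log of inverse document frequency). The function names `byte_ngrams` and `tfidf` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def byte_ngrams(text: str, n: int) -> list[bytes]:
    """Return all overlapping byte n-grams of the UTF-8 encoding of text."""
    data = text.encode("utf-8")  # assumption: UTF-8; the paper's encoding is not stated here
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def tfidf(docs: list[str], n: int) -> list[dict[bytes, float]]:
    """Weight each byte n-gram of each document by tf * log(N / df).

    This is one standard tf-idf variant, chosen for illustration only.
    """
    counts = [Counter(byte_ngrams(doc, n)) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq: Counter = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    total_docs = len(docs)
    weights = []
    for c in counts:
        length = sum(c.values()) or 1  # avoid division by zero for very short texts
        weights.append({
            gram: (count / length) * math.log(total_docs / doc_freq[gram])
            for gram, count in c.items()
        })
    return weights
```

Working on bytes rather than characters means that a letter such as "š", which takes two bytes in UTF-8, contributes more than one n-gram position; no tokenization or language-specific preprocessing is needed.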