Application of a Structural Support Vector Machine Method to N-gram Based Text Classification in Serbian

Kovačević, Jovana; Graovac, Jelena

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2339

Title:	Application of a Structural Support Vector Machine Method to N-gram Based Text Classification in Serbian
Other Titles:	Н-грамски заснована класификација текста на српском језику применом методе структуралних подржавајућих вектора
Authors:	Kovačević, Jovana; Graovac, Jelena
Affiliations:	Informatics and Computer Science; Informatics and Computer Science
Keywords:	hierarchical text classification; Support Vector Machine Method; Ebart corpus
Issue Date:	2016
Rank:	M53
Publisher:	Beograd : Filološki fakultet; Beograd : Univerzitetska biblioteka "Svetozar Marković"; Beograd : Zajednica biblioteka univerziteta u Srbiji
Journal:	Infotheca - Journal for Digital Humanities
Abstract:	The paper presents classification results for a hierarchically organized corpus of Serbian documents, obtained with the Support Vector Machine (SVM) method. Two techniques derived from SVM with structural output were applied: multiclass flat classification and hierarchical classification. The joint representation of a document and the class (or hierarchy of classes) it belongs to, which is specific to this form of SVM, is based on byte n-grams of different lengths. Four tf-idf statistics were used to define the significance of an n-gram for a given document. The techniques and statistics were tested on a hierarchically structured subset of the Ebart corpus of newspaper texts. Both types of classifier give similar results on the corpus as a whole, while the hierarchical classifier performs better on the most specific classes, which contain only a small number of texts.
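The abstract's central idea, a single feature vector built jointly from a document and its class or class path, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the four tf-idf statistics, so the code assumes one common variant (raw count × log(N/df)). The n-gram lengths 2 to 4 and all function names (`byte_ngrams`, `tfidf_vectors`, `ancestors`, `joint_features`, `predict`) are hypothetical. The joint feature map follows the standard structural SVM construction: the flat case copies the document vector into the block of its class, and the hierarchical case copies it into the block of the class and of every ancestor. Training the weight vector (for example, with a cutting-plane solver) is omitted.

```python
import math
from collections import Counter


def byte_ngrams(text, n_values=(2, 3, 4)):
    """Count byte n-grams of several lengths over the UTF-8 encoding of text.

    The n-gram lengths are a hypothetical choice, not taken from the paper.
    """
    data = text.encode("utf-8")
    counts = Counter()
    for n in n_values:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


def tfidf_vectors(docs, n_values=(2, 3, 4)):
    """Weight each n-gram by raw count * log(N / df).

    This is an assumed tf-idf variant; the paper compares four statistics
    that are not specified in the abstract.
    """
    counts = [byte_ngrams(d, n_values) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: one hit per document
    n_docs = len(docs)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
            for c in counts]


def ancestors(label, parent):
    """Return label followed by its ancestors, given a child -> parent map."""
    path = [label]
    while label in parent:
        label = parent[label]
        path.append(label)
    return path


def joint_features(phi, label, parent=None):
    """Joint feature map Psi(x, y) as a sparse dict keyed by (class, n-gram).

    Flat (parent is None): phi is copied into the block of y only.
    Hierarchical: phi is copied into the blocks of y and all its ancestors,
    so classes that share ancestors also share part of the feature space.
    """
    classes = [label] if parent is None else ancestors(label, parent)
    return {(c, g): w for c in classes for g, w in phi.items()}


def score(weights, psi):
    """Linear compatibility score <w, Psi(x, y)> for sparse dict vectors."""
    return sum(weights.get(k, 0.0) * v for k, v in psi.items())


def predict(weights, phi, labels, parent=None):
    """Return the candidate label with the highest compatibility score."""
    return max(labels, key=lambda y: score(weights, joint_features(phi, y, parent)))
```

With a hypothetical two-level hierarchy such as `parent = {"football": "sport", "elections": "politics"}`, `joint_features(phi, "football", parent)` places the document's n-gram weights under both `"football"` and `"sport"`. Weights learned for `"sport"` therefore also help leaf classes with few training texts, which is consistent with the abstract's observation about the most specific classes.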
URI:	https://research.matf.bg.ac.rs/handle/123456789/2339
DOI:	10.18485/infotheca.2016.16.1_2.1
Appears in Collections:	Research outputs
