Please use this identifier to cite or link to this item:
https://research.matf.bg.ac.rs/handle/123456789/1666
Title: | An Algorithm for Sentence Recovery from PDF Files |
Authors: | Pajić, Vesna; Vujičić Stanković, Staša; Pajić, Miloš |
Affiliations: | Informatics and Computer Science |
Publisher: | Beograd : Filološki fakultet, Univerzitetska biblioteka "Svetozar Marković", Zajednica visokoškolskih biblioteka Srbije |
Journal: | Infotheca: Journal for Digital Humanities |
Abstract: | The use of PDF documents in Natural Language Processing (NLP) has become an almost daily activity for researchers in computational linguistics and related fields. Extracting plain text from PDF documents with existing software tools severely distorts sentence and paragraph structure, which is a major problem for linguistically oriented research. In this paper, we present a novel algorithm... |
URI: | https://research.matf.bg.ac.rs/handle/123456789/1666 |
Appears in Collections: | Research outputs |