Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2945
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Mladenović, Miljana | en_US |
| dc.contributor.author | Kartelj, Aleksandar | en_US |
| dc.contributor.author | Vujičić Stanković, Staša | en_US |
| dc.date.accessioned | 2025-12-01T08:51:36Z | - |
| dc.date.available | 2025-12-01T08:51:36Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.uri | https://research.matf.bg.ac.rs/handle/123456789/2945 | - |
| dc.description.abstract | Recent advances in large language models (LLMs) have enabled the generation of texts that successfully simulate the stylistic and metric structures of traditional oral forms such as Serbian epic poetry, lyrical songs, and fables. However, while these models excel at replicating surface-level features, such as archaic lexicon or oral storytelling patterns, they often lack semantic integrity. That is, they can preserve formal authenticity while subtly changing the ideological, moral, or cultural messages embedded in the original. This research examines the risk that such altered texts may enter cultural heritage archives without being noticed. The problem is especially severe in digital environments that lack original metadata or traditional performance context. We conducted a series of controlled experiments using carefully designed prompts. Our goal was to assess the ability of state-of-the-art LLMs to rewrite authentic texts sourced from national digital repositories. The models were instructed to preserve key stylistic features such as structure, rhythm, and vocabulary, while introducing only subtle changes in meaning. These changes included, for example, reinterpreting a hero’s motivation or shifting the historical context of a conflict. To evaluate the robustness of detection methods, we applied both classical stylometric classifiers and modern neural network architectures that had previously been trained for paraphrase detection. The results show that LLMs can generate texts that closely mimic the original style, making them difficult to detect using standard tools. Because our classifiers focus on stylistic rather than semantic differences, they often fail when the AI preserves surface-level form but subtly alters meaning. Semantic drift, though subtle, was markedly harder to detect when the changes aligned with plausible historical reinterpretations. We argue that relying solely on stylistic features is insufficient for AI text detection in cultural heritage. We recommend using several methods in conjunction: tracking the origin of the text, comparing it to other versions, identifying unusual language patterns, and having experts review it. Moreover, we emphasize the urgent need for digital watermarking and hash-based integrity verification of archival materials, particularly in contexts where AI-generated content may be inserted into public memory systems, intentionally or unintentionally. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Beograd : Matematički fakultet | en_US |
| dc.subject | AI-generated Folklore | en_US |
| dc.subject | Oral tradition | en_US |
| dc.subject | Semantic integrity | en_US |
| dc.subject | cultural heritage | en_US |
| dc.subject | paraphrase detection | en_US |
| dc.title | Semantic Integrity and AI-Generated "Authentic" Folklore - How can oral traditions be protected? | en_US |
| dc.type | Conference Object | en_US |
| dc.relation.conference | National Conference Digitization of national heritage, old records from natural and social sciences and digital humanities (22 ; 2025 ; Belgrade) | en_US |
| dc.relation.publication | Book of abstracts : 22nd National Conference Digitization of national heritage, old records from natural and social sciences and digital humanities | en_US |
| dc.identifier.url | https://www.ncd.matf.bg.ac.rs/conferences/ncd2025/NCD2025_Book_of_Abstracts.pdf | - |
| dc.contributor.affiliation | Informatics and Computer Science | en_US |
| dc.contributor.affiliation | Informatics and Computer Science | en_US |
| dc.relation.isbn | 978-86-7589-206-9 | en_US |
| dc.description.rank | M64 | en_US |
| dc.relation.firstpage | 10 | en_US |
| dc.relation.lastpage | 10 | en_US |
| item.openairetype | Conference Object | - |
| item.languageiso639-1 | en | - |
| item.grantfulltext | none | - |
| item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
| item.fulltext | No Fulltext | - |
| item.cerifentitytype | Publications | - |
| crisitem.author.dept | Informatics and Computer Science | - |
| crisitem.author.dept | Informatics and Computer Science | - |
| crisitem.author.orcid | 0000-0001-9839-6039 | - |
| crisitem.author.orcid | 0000-0002-7200-3724 | - |
Appears in Collections: Research outputs

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.