Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/2945
Title: Semantic Integrity and AI-Generated "Authentic" Folklore - How can oral traditions be protected?
Authors: Mladenović, Miljana
Kartelj, Aleksandar 
Vujičić Stanković, Staša 
Affiliations: Informatics and Computer Science 
Informatics and Computer Science 
Keywords: AI-generated Folklore; Oral tradition; Semantic integrity; cultural heritage; paraphrase detection
Issue Date: 2025
Rank: M64
Publisher: Beograd : Matematički fakultet
Related Publication(s): Book of abstracts : 22nd National Conference Digitization of national heritage, old records from natural and social sciences and digital humanities
Conference: National Conference Digitization of national heritage, old records from natural and social sciences and digital humanities (22 ; 2025 ; Belgrade)
Abstract: 
Recent advances in large language models (LLMs) have enabled the generation of texts that successfully simulate the stylistic and metrical structures of traditional oral forms such as Serbian epic poetry, lyrical songs, and fables. However, while these models excel at replicating surface-level features, such as archaic lexicon or oral storytelling patterns, they often lack semantic integrity: they can preserve formal authenticity while subtly changing the ideological, moral, or cultural messages embedded in the original.
This research examines the risk that such altered texts may enter cultural heritage archives unnoticed. The problem is particularly severe in digital environments, where original metadata and the traditional performance context are missing. We conducted a series of controlled experiments with carefully designed prompts to assess the ability of state-of-the-art LLMs to rewrite authentic texts sourced from national digital repositories. The models were instructed to preserve key stylistic features such as structure, rhythm, and vocabulary, while introducing only subtle changes in meaning, for example reinterpreting a hero’s motivation or shifting the historical context of a conflict.
To evaluate the robustness of detection methods, we applied both classical stylometric classifiers and modern neural network architectures that had previously been trained for paraphrase detection. The results show that LLMs can generate texts that closely mimic the original style, making them difficult to detect using standard tools. Since our classifier focuses on stylistic, rather than semantic, differences, it often fails when the AI preserves surface-level form but subtly alters meaning. Semantic drift was especially hard to detect when the changes aligned with plausible historical reinterpretations.
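As a rough illustration of why surface-level features can miss a change in meaning, the sketch below compares two short texts by bag-of-words cosine similarity. This is not the classifier used in the study; the measure is a simple stand-in for surface features, and the two English sentences are hypothetical examples, not material from the experiments.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase, whitespace-separated tokens (a crude surface feature)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentences: the second one reverses the hero's motivation.
original = "the hero rode to battle to defend his people from the invaders"
altered = "the hero rode to battle to betray his people to the invaders"

score = cosine_similarity(bag_of_words(original), bag_of_words(altered))
print(f"surface similarity: {score:.2f}")
```

The score stays high even though the altered sentence inverts the moral message, which is the kind of case where a purely surface-oriented check would be expected to struggle.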
We argue that relying solely on stylistic features is insufficient for detecting AI-generated text in cultural heritage. We recommend using several methods in conjunction: tracking the origin of the text, comparing it to other versions, identifying unusual language patterns, and having experts review it. Moreover, we emphasize the urgent need for digital watermarking and hash-based integrity verification of archival materials, particularly in contexts where AI-generated content may be inserted into public memory systems, intentionally or unintentionally.
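The hash-based integrity verification mentioned above could look roughly like the following sketch. It assumes SHA-256 over NFC-normalized UTF-8 text; the abstract does not specify an algorithm or normalization, so both choices, the function names `fingerprint` and `is_unaltered`, and the sample sentence are hypothetical.

```python
import hashlib
import unicodedata


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text.

    Assumption: the text is NFC-normalized first so that visually identical
    strings with different Unicode compositions produce the same digest.
    """
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_unaltered(text: str, recorded_digest: str) -> bool:
    """Check a text against the digest recorded when it was archived."""
    return fingerprint(text) == recorded_digest


# Hypothetical archival record: the digest is stored at ingestion time.
archived_text = "the hero rode to battle to defend his people"
recorded = fingerprint(archived_text)

print(is_unaltered(archived_text, recorded))
print(is_unaltered(archived_text.replace("defend", "betray"), recorded))
```

A digest of this kind only shows that a stored text has changed since the digest was recorded; it says nothing about whether the text was authentic at the time of ingestion, which is why the abstract pairs it with provenance tracking and expert review.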
URI: https://research.matf.bg.ac.rs/handle/123456789/2945
Appears in Collections: Research outputs
