Textual Ciphers as a Tool for Better Understanding the Transformers

Provazník, Jan

Textové šifry jako nástroj pro lepší pochopení modelů Transformer

dc.contributor.advisor	Libovický, Jindřich
dc.creator	Provazník, Jan
dc.date.accessioned	2024-11-29T06:11:04Z
dc.date.available	2024-11-29T06:11:04Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/20.500.11956/192072
dc.description.abstract	Architektura Transformer je velmi populární, takže může být potenciálně významné interpretovat, co ovlivňuje její výkon. Testujeme hypotézu, že model se při práci s textem spoléhá na jeho lingvistické vlastnosti. Abychom eliminovali vliv kultury na význam, používáme úlohu pracující na úrovni znaků s Transformer modelem ByT5. Dotrénujeme ByT5-small na dešifrování vět zašifrovaných pomocí textových šifer (Vigenère, Enigma). Anotujeme evaluační dataset vět pomocí publikovaných nástrojů pro NLP. Na evaluačním datasetu zkoumáme vztahy mezi lingvistickými vlastnostmi a četností chyb dotrénovaného ByT5 při dešifrování vět. Analyzujeme korelace, trénujeme ML modely na predikci četnosti chyb věty z jejích lingvistických vlastností a interpretujeme důležitost vlastností pomocí SHAP. Nacházíme malé signifikantní korelace, ale predikce četnosti chyb z vlastností selhává. Dospíváme k závěru, že identifikované vlastnosti neposkytují vhled do výkonu Transformerů.	cs_CZ
dc.description.abstract	The Transformer architecture is very popular, so it is potentially impactful to interpret what influences its performance. We test the hypothesis that the model relies on the linguistic properties of a text when working with it. We remove interference with cultural aspects of meaning by using a character-level task with the ByT5 Transformer model. We train ByT5 to decipher sentences encrypted with text ciphers (Vigenère, Enigma). We annotate a sentence dataset with linguistic properties with published NLP tools. On this dataset, we study the relationships between the linguistic properties and the fine-tuned ByT5 decipherment error rate. We analyze correlations, train ML models to predict error rates from the properties and interpret them with SHAP. We find small significant correlations but cannot predict error rates from the properties. We conclude the properties we identified do not give much insight into the performance of the Transformer.	en_US
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	Transformer|interpretability|NLP|deep learning|ciphers	en_US
dc.subject	Transformer|interpretovatelnost|NLP|deep learning|šifry	cs_CZ
dc.title	Textual Ciphers as a Tool for Better Understanding the Transformers	en_US
dc.type	bakalářská práce	cs_CZ
dcterms.created	2024
dcterms.dateAccepted	2024-06-28
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.identifier.repId	262108
dc.title.translated	Textové šifry jako nástroj pro lepší pochopení modelů Transformer	cs_CZ
dc.contributor.referee	Kasner, Zdeněk
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	Computer Science with specialisation in Artificial Intelligence	en_US
thesis.degree.discipline	Informatika se specializací Umělá inteligence	cs_CZ
thesis.degree.program	Computer Science	en_US
thesis.degree.program	Informatika	cs_CZ
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Informatika se specializací Umělá inteligence	cs_CZ
uk.degree-discipline.en	Computer Science with specialisation in Artificial Intelligence	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Architektura Transformer je velmi populární, takže může být potenciálně významné interpretovat, co ovlivňuje její výkon. Testujeme hypotézu, že model se při práci s textem spoléhá na jeho lingvistické vlastnosti. Abychom eliminovali vliv kultury na význam, používáme úlohu pracující na úrovni znaků s Transformer modelem ByT5. Dotrénujeme ByT5-small na dešifrování vět zašifrovaných pomocí textových šifer (Vigenère, Enigma). Anotujeme evaluační dataset vět pomocí publikovaných nástrojů pro NLP. Na evaluačním datasetu zkoumáme vztahy mezi lingvistickými vlastnostmi a četností chyb dotrénovaného ByT5 při dešifrování vět. Analyzujeme korelace, trénujeme ML modely na predikci četnosti chyb věty z jejích lingvistických vlastností a interpretujeme důležitost vlastností pomocí SHAP. Nacházíme malé signifikantní korelace, ale predikce četnosti chyb z vlastností selhává. Dospíváme k závěru, že identifikované vlastnosti neposkytují vhled do výkonu Transformerů.	cs_CZ
uk.abstract.en	The Transformer architecture is very popular, so it is potentially impactful to interpret what influences its performance. We test the hypothesis that the model relies on the linguistic properties of a text when working with it. We remove interference with cultural aspects of meaning by using a character-level task with the ByT5 Transformer model. We train ByT5 to decipher sentences encrypted with text ciphers (Vigenère, Enigma). We annotate a sentence dataset with linguistic properties with published NLP tools. On this dataset, we study the relationships between the linguistic properties and the fine-tuned ByT5 decipherment error rate. We analyze correlations, train ML models to predict error rates from the properties and interpret them with SHAP. We find small significant correlations but cannot predict error rates from the properties. We conclude the properties we identified do not give much insight into the performance of the Transformer.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O