2026-06-19 –, A112 (capacity 64)
Securing a large language model today resembles an endless game of cat and mouse. Programmers try to manually write filters and prohibitions, but all it takes is one creatively written prompt and the model obediently generates dangerous content. Traditional defenses are inflexible, slow, and attackers are always one step ahead.
This talk shows how to break out of this vicious circle. We introduce our open-source framework, which is used for systematic red teaming and testing models against 25 types of prompt-based attacks. We show how to analyze AI behavior under fire. On this basis, we then introduce a new defense method based on genetic programming. Instead of manually patching holes, this "digital evolution" automatically searches for optimal rules that strengthen the model and create a defensive layer. All this without having to change a single parameter in the model's weights. You will find out why evolutionary search for system rules is more effective than an army of experts.
I am a final-year Master’s student in Machine Learning at Brno University of Technology, currently completing my thesis. As a student security researcher at VUT FIT, I previously focused on deepfake detection; today, my work centers on large language models and their safety.