DevConf.IN 2026

Mohit Sewak, Ph.D.

Dr. Mohit Sewak is a Staff AI Researcher and Engineer at Google, specializing in Safe, Secure, and Responsible Generative AI. A leading expert in the field, he is the author of two books and an inventor with over 20 patents. His recent work focuses on the security vulnerabilities of Agentic AI, specifically the "Confused Deputy" problem and the Model Context Protocol (MCP). Mohit bridges the gap between theoretical AI safety and practical cybersecurity, advocating for "Agentic Zero Trust" architectures.

Company or affiliation:

Google

Job title:

Staff AI Engineer


Session

02-13
14:45
45min
Agentic AI and Model Context Protocol's Security Vulnerabilities
Mohit Sewak, Ph.D.

We have moved from the era of Chatbots that "speak" to AI Agents that "do." By giving LLMs access to tools via the Model Context Protocol (MCP), we have unlocked incredible power—but also a catastrophic new attack surface. Recent benchmarks like InjecAgent reveal that over 50% of agentic tasks are vulnerable to injection, allowing attackers to hijack your agent to delete files, exfiltrate data, or execute code.

This talk moves beyond simple "jailbreaking" to explore the advanced vulnerabilities threatening the Agentic ecosystem. We will demonstrate how Indirect Prompt Injection (IPI) turns innocent data into malicious code, how Tool Poisoning compromises the supply chain, and how the "Confused Deputy" problem turns your helpful assistant into an insider threat. We will dissect the "Agentic Gap"—where cognitive load degrades safety training—and conclude by defining the critical shift from model safety to system-level security.

Outline:
1. The Paradigm Shift: From Informational Harm to Instrumental Harm
● The Evolution: We are shifting from Chatbots (Input/Output) to Agents (Observation/Thought/Action).
● The Threat Shift: Moving beyond "mean tweets" (reputational risk) to "operational compromise" (Instrumental Harm). We will discuss how an agent can be tricked into wiring funds or bricking a server.
● The "Buffer Overflow" of AI: How LLMs acting as Von Neumann machines fail to distinguish between user instructions (Code) and retrieved content (Data).

  1. The Mechanism of Failure: The "Agentic Gap"
    ● Cognitive Load: Drawing on recent research, we will explain the "Agentic Gap"—the phenomenon where a model's refusal training degrades significantly when it is under the "cognitive load" of tool execution.
    ● The "Artie" Persona: Using the "Artie the Intern" analogy to explain why models prioritize functional success (completing the task) over safety constraints when processing complex workflows.

  2. Advanced Taxonomy of MCP Vulnerabilities
    We will move beyond basic prompt injection to explore sophisticated attack vectors specific to the Model Context Protocol:
    ● Context Manipulation:
    ○ TopicAttack: How attackers use natural language transitions to "smooth talk" the agent into accepting malicious contexts.
    ○ WebInject: The use of steganography in images or metadata to hide commands that the agent's vision system interprets as instructions.
    ● Supply Chain & Tool Poisoning:
    ○ Schema Poisoning: Hiding malicious instructions inside the API "instruction manual" (tool definitions) that the agent reads.
    ○ Output-Based Poisoning: When a legitimate tool returns data (e.g., a weather report) containing a hidden payload that executes in the next step of the chain.
    ○ The "Evil Twin" Attack: Tool impersonation risks in the MCP ecosystem.

  3. Why Traditional Defenses Fail
    ● The Futility of "Better Prompts": Why defensive prompting and standard RLHF are mathematically insufficient against adversarial suffixes and automated red-teaming tools.
    ● The Detection Paradox: How large context windows and "Chain of Thought" reasoning can actually increase vulnerability to logic-based injection attacks.

  4. Conclusion: The Security Imperatives
    ● A brief overview of the necessary shift toward "Defense-in-Depth."
    ● Moving from "Chatbot Safety" to "Systems Security" (Architecture over Alignment).

AI, Data Science, and Emerging Tech
VYAS - 1 - Room#VY124