Back to blog

Agents Don’t Ask Permission. They Need Boundaries.

Steve Kozy

July, 31 2025

3 minutes read

read by 10

In a move that signals the next frontier of AI, OpenAI unveiled ChatGPT Agent, a powerful new tool that can autonomously interact with your digital world. Paying users now have the option to click “Tools” > “Agent Mode” and allow ChatGPT to act on their behalf—logging into accounts, sending emails, downloading files, and more.

Have you used this feature yet? Please comment.

It’s a major leap forward. But with that leap comes a wave of security implications that’s hard to ignore. In fact, this launch might be remembered as the day security became not just a checkbox—but the very foundation of enterprise AI.

Red Teaming at Warp Speed

Before rolling it out, OpenAI assembled a 16-person red teamcomposed of security researchers with PhDs in biosafety.

They were given 40 hours to break the Agent. Mission: Success.

110 attacks were launched
16 exceeded OpenAI’s internal risk thresholds
7 universal exploits were uncovered, each capable of compromising the system

The results weren’t theoretical. These were real-world, scalable vulnerabilities involving data exfiltration, command execution, and bio-threat synthesis.

Seven Attack Vectors That Changed Everything

Among the most concerning discoveries:

Visual Browser Hidden Instructions: Used to secretly exfiltrate data via manipulated web pages.
Google Drive Exploits: Forced document leaks by abusing cloud connectors.
Multi-Step Chain Attacks: Combined benign tasks into complete session takeovers.
Biological Threat Synthesis: The model could generate dangerous knowledge from public sources.

How OpenAI Responded

Rather than delay the launch indefinitely, OpenAI treated the red team’s findings as a crucible for security transformation. Several critical defenses were implemented:

Watch Mode: If the user navigates away during sensitive activity (like banking), all actions freeze.
Memory Disabled: A bold move—core memory features were turned off at launch to prevent slow-leak vulnerabilities.
Terminal Restrictions: The model’s internet access is limited to GET requests only—no command execution allowed.
Rapid Remediation Protocol: Vulnerabilities are now patched within hours, not weeks.

And most notably: 100% real-time monitoring of all agent interactions—not just sampling

Why It Matters for Enterprise AI

For CISOs and IT leaders, this sets a new baseline for what responsible AI deployment looks like:

Quantifiable Protection: Numbers—not promises—are the new benchmark.
Real-Time Visibility: If your AI tool isn’t logging 100% of interactions, you’re flying blind.
Tight Operational Boundaries: Core features like memory must be off until proven safe.
Lightning-Fast Patching: Exploits spread instantly; security must move faster.

Preparedness, Now Operationalized

OpenAI’s Keren Gu explained it best:

“Before we reached High capability, Preparedness was about analyzing capabilities and planning safeguards. Now… Preparedness safeguards have become an operational requirement.”

That’s the shift—security planning isn’t theoretical anymore. It’s embedded in the runtime.

Red Teams Are Now Architects

The final takeaway? Red teams aren’t just testing anymore. They’re shaping the architecture of modern AI.

Their work revealed the holes. OpenAI’s engineering sealed them shut. And the ChatGPT Agent emerged not just more powerful, but meaningfully safer.

The Bottom Line

The ChatGPT Agent is one of the most ambitious AI features ever released to the public. But what really stands out isn’t its power to; it’s how that power was hardened through relentless testing, high-stakes feedback, and operational humility.

In the new AI era, the strongest platforms won’t be the flashiest—they’ll be the ones that turn red teaming into their superpower.

Have you used this feature yet? I’d love some comments about your success. I say it’s not ready for prime time, yet..