The OAuth Threat Model for AI Agents
OAuth 2.0 has eight threat vectors that become significantly worse when the client is an AI agent rather than a human user. None of these are bugs in the OAuth spec. They are gaps between what the spec assumes and how agents actually behave.

Why agents break the OAuth assumptions

The OAuth 2.0 authorization framework, defined in RFC 6749 and refined by later extensions, was designed for a world where the entity requesting access is either a human sitting at a browser or a well-defined server-side application with a fixed IP range and a known operator. Both assumptions break when the client is an AI agent.

AI agents are ephemeral. They spin up for a task, request credentials, do work, and spin down — or they do not spin down, and the credentials persist past the agent's useful life. They are polymorphic. The same agent framework (LangChain, AutoGen, CrewAI) might be used for 40 different tasks with 40 different credential requirements. And they are unpredictable. An agent's scope requirements can shift based on what the LLM decides to do next, which is not something you can model in a static OAuth registration.

When you put an entity with these characteristics into a standard OAuth flow, you get eight categories of risk that traditional identity tooling does not address.

Threat 1: Token hoarding

Standard OAuth clients request a token when they need access and ideally refresh or discard it afterward. AI agents request tokens on every run — sometimes on every API call within a run — and rarely implement proper token storage or revocation. After 90 days of running a mid-complexity agentic pipeline, a typical GitHub organization will have dozens of active tokens from agents that stopped running weeks ago.

We measured this directly during a customer onboarding in January 2025. A fintech team running three LangChain agents against GitHub and Slack had 140 active OAuth tokens. Forty-one of those tokens belonged to agent instances that had been shut down after a failed experiment. Three of the tokens had admin-level GitHub access. None of the developers on the team knew those tokens existed.

Token hoarding is not malicious. It is the natural result of building agent pipelines without a credential lifecycle policy. The fix is automated token TTL with hard expiry — but that requires infrastructure most teams do not have today.

Threat 2: Scope escalation through framework defaults

LangChain, AutoGen, and similar frameworks ship with example integrations that request broad scopes by default. The GitHub integration in LangChain's documentation requests repo scope — which includes full read/write access to all repositories — because that is the scope that covers every possible use case. Most teams copy the example, never audit the scope, and end up with agents running with far more access than they need.

This is not a framework bug. The documentation is clear about what repo scope covers. The problem is that the gap between "what the agent needs" and "what the OAuth client requests" is never measured. In a properly audited deployment, an agent that only reads issue titles should have issues:read scope, not repo. In practice, most agentic deployments never go through that audit.
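The missing audit step can be made mechanical: keep a per-agent allowlist of scopes and fail closed when a token request exceeds it. A minimal sketch, with hypothetical agent names and a GitHub-style scope vocabulary:

```python
# Per-agent scope policy (assumed example values, not a real deployment).
AGENT_SCOPE_POLICY: dict[str, set[str]] = {
    "issue-triage-agent": {"issues:read"},
    "release-notes-agent": {"issues:read", "contents:read"},
}

def validate_scopes(agent_id: str, requested: set[str]) -> set[str]:
    """Fail closed: reject any token request that exceeds the agent's policy."""
    allowed = AGENT_SCOPE_POLICY.get(agent_id, set())
    excess = requested - allowed
    if excess:
        raise PermissionError(
            f"{agent_id} requested scopes beyond its policy: {sorted(excess)}"
        )
    return requested
```

Run against the LangChain example from the documentation, this check would flag the copy-pasted repo scope immediately instead of letting it ship.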

Threat 3: Cross-agent token reuse

When multiple agents in a pipeline share a credential store, they often share tokens. Agent A mints a Slack token to post a message. Agent B, doing something unrelated, finds that token in the shared store and reuses it rather than minting its own. This behavior reduces API calls and looks efficient — but it breaks the audit trail completely.

Every action Agent B takes with Agent A's token is attributed to Agent A in Slack's audit log. If Agent B is later compromised or behaves unexpectedly, the forensic trail points to the wrong agent. This is not a hypothetical — it is a standard pattern in CrewAI pipelines where agent roles share a toolset.

Threat 4: Prompt injection via OAuth redirect

This threat is specific to agents that can initiate OAuth flows on behalf of users. If an agent is operating in an environment where it processes external input (emails, Slack messages, web content) and that input contains a crafted OAuth redirect URI, the agent can be tricked into initiating an OAuth flow that redirects to an attacker-controlled domain.

The attack vector was first documented by researchers at ETH Zurich in 2024 and affects any agent that implements the Authorization Code flow without strict redirect URI validation. The agent completes what it believes is a legitimate OAuth exchange; the attacker receives the authorization code.
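Strict validation here means exact matching against pre-registered redirect URIs — no prefix, substring, or subdomain matching, since those relaxations are exactly what the injected URI exploits. A minimal sketch, assuming a hypothetical registered callback:

```python
from urllib.parse import urlparse

# Exact allowlist of redirect URIs registered with the provider
# (assumed example value).
REGISTERED_REDIRECTS = {
    "https://agent.example.com/oauth/callback",
}

def is_valid_redirect(uri: str) -> bool:
    """Accept only an exact, pre-registered HTTPS redirect URI."""
    parsed = urlparse(uri)
    return parsed.scheme == "https" and uri in REGISTERED_REDIRECTS
```

An agent that checks every redirect URI it extracts from external input against this allowlist before initiating a flow cannot be steered to an attacker-controlled domain, no matter what the prompt says.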

Threat 5: Leaked tokens in LLM context windows

LLMs process text. OAuth tokens are text. When an agent passes API responses through an LLM for parsing or summarization — which is extremely common — any tokens, secrets, or authorization headers in those responses get embedded in the model's context window. If the LLM is a hosted service (OpenAI, Anthropic, Gemini), that context is transmitted to a third-party server.

Most LLM providers do not train on API inputs, and their data handling policies are reasonably clear. But "reasonably clear" is not a compliance answer for healthcare, financial services, or any regulated industry. Agents that process API responses through LLMs need explicit data handling policies for anything that looks like a credential.
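One practical mitigation is to redact anything credential-shaped from API responses before they reach the model. The patterns below are illustrative, not exhaustive — a real deployment would maintain a much larger pattern set and treat redaction as defense in depth, not a guarantee:

```python
import re

# Common credential shapes (assumed, partial list):
# GitHub token prefixes, Authorization headers, JWT-like strings.
CREDENTIAL_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),          # GitHub token formats
    re.compile(r"(?i)bearer\s+[A-Za-z0-9\-._~+/]+=*"),  # Bearer auth headers
    re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWTs
]

def redact(text: str) -> str:
    """Replace credential-shaped substrings before text enters an LLM context."""
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Wiring this in front of every LLM call that consumes raw API responses turns "hope the provider's data policy covers us" into an enforced boundary.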

Threat 6: Revocation latency

RFC 7009 defines OAuth token revocation, but not all providers implement it at the same speed. GitHub revokes tokens within seconds. Salesforce can take up to 15 minutes for token revocation to propagate across their edge nodes. During that window, a revoked token remains valid against some Salesforce endpoints.

For human users, this is a minor inconvenience. For an AI agent that is running thousands of tasks per day, a 15-minute revocation window is meaningful. If you revoke an agent's credentials because you suspect a breach, you have a 15-minute window where that agent — or anything using its cached token — can still access your Salesforce data.
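A proxy layer can close that window locally: record the revocation the moment it is issued and deny the token on every subsequent request, regardless of whether the provider has propagated it yet. A minimal sketch, assuming a 15-minute worst-case window:

```python
import time

PROPAGATION_WINDOW = 15 * 60  # assumed worst-case provider propagation, seconds

class RevocationDenylist:
    """Deny a token locally the instant it is revoked, bridging the provider's
    propagation delay. Entries are prunable after the window, since the
    provider itself rejects the token from then on."""

    def __init__(self, window: int = PROPAGATION_WINDOW) -> None:
        self.window = window
        self._revoked: dict[str, float] = {}

    def revoke(self, token: str) -> None:
        self._revoked[token] = time.time()

    def is_denied(self, token: str) -> bool:
        revoked_at = self._revoked.get(token)
        return revoked_at is not None and time.time() - revoked_at < self.window

    def prune(self) -> None:
        cutoff = time.time() - self.window
        self._revoked = {t: ts for t, ts in self._revoked.items() if ts > cutoff}
```

This only helps for traffic that flows through the proxy — a leaked token used directly against the provider is still live until propagation completes, which is the argument for short TTLs in the first place.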

Threat 7: Overprivileged service accounts as agent backing identities

Many teams give their agents credentials that belong to a service account rather than doing a proper per-agent OAuth registration. The service account has broad permissions because it was set up to handle "everything the automations might need." Every agent that uses that service account's token inherits all of its permissions — including the ones no individual agent would ever need.

This is the agentic equivalent of sharing a root password. It is common because the alternative — registering each agent as a separate OAuth client and managing per-agent credentials — is operationally painful without dedicated tooling. That operational pain is exactly what Alter addresses.

Threat 8: Absent audit trails

OAuth providers log the tokens they issue, but they log at the application level. GitHub knows that your OAuth app made a request — it does not know which of your 12 agents made that request, what task it was running, or which human's workflow triggered it. That attribution gap means that when something goes wrong — an unexpected file deletion, an unauthorized API call — you cannot trace it back to a specific agent decision.

Building that attribution requires adding metadata to every token request: agent ID, run ID, task type, initiating user. Standard OAuth clients do not do this. Alter does.
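The metadata itself is small; the discipline is attaching it to every request. A minimal sketch of the attribution record described above — the field names are a hypothetical shape, since providers do not consume them; a proxy logs them alongside each issuance:

```python
import uuid

def token_request_context(agent_id: str, run_id: str,
                          task_type: str, initiating_user: str) -> dict:
    """Build the attribution metadata attached to a single token request."""
    return {
        "request_id": str(uuid.uuid4()),  # unique per token request
        "agent_id": agent_id,             # which of your N agents
        "run_id": run_id,                 # which execution of that agent
        "task_type": task_type,           # what it was trying to do
        "initiating_user": initiating_user,  # whose workflow triggered it
    }
```

With this record logged at mint time, a provider-side event ("your OAuth app deleted a file at 14:02") joins against a specific agent, run, and human.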

What a mitigated architecture looks like

Mitigating all eight threats requires a proxy layer that sits between your agents and every OAuth provider. The proxy intercepts each token request, validates it against a per-agent scope policy, mints a short-lived token rather than passing through a long-lived credential, and logs the full context: agent ID, task, requested scope, granted scope, TTL.
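The proxy's token path can be sketched end to end: check policy, mint a short-lived token, log the full context. This is a simplified illustration under stated assumptions — the placeholder token stands in for a real provider exchange, and the policy and log structures mirror the earlier threat discussions:

```python
import time
import uuid

def handle_token_request(agent_id: str, requested_scopes: set[str],
                         policy: dict[str, set[str]], audit_log: list,
                         ttl: int = 900) -> dict:
    """Proxy token path: policy check, short-lived mint, full-context log."""
    allowed = policy.get(agent_id, set())
    granted = requested_scopes & allowed  # downscope to the policy intersection
    if not granted:
        raise PermissionError(f"no scopes granted for {agent_id}")
    # Placeholder mint; a real proxy would exchange with the provider here.
    token = f"st_{uuid.uuid4().hex}"
    audit_log.append({
        "agent_id": agent_id,
        "requested": sorted(requested_scopes),
        "granted": sorted(granted),
        "ttl": ttl,
        "issued_at": time.time(),
    })
    return {"token": token, "scopes": granted, "expires_in": ttl}
```

Note the downscoping behavior: an agent that asks for repo plus issues:read, under a policy allowing only issues:read, gets a token with just the permitted scope rather than a hard failure.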

This architecture does not require changes to your agent code or your OAuth provider configuration. It does require a proxy you trust to enforce policy consistently. That is what we built. If you are running AI agents in production against any OAuth-protected API, these eight threats are present in your system right now. The question is whether you have the tooling to manage them.