Securing Agents with 1970s Access Control
How do we prevent AI prompt injections from sprawling? How do we ensure agents don’t leak company secrets?
AI governance is a massive pain point, but maybe we can take inspiration from access control principles from the 70s? As a fan of boring technology, I’ll take any opportunity to apply battle-tested principles when building things like agentic workflows that are arguably on the bleeding edge.
Modern access control is usually stateless. We belong to groups that grant access to resources. Right role, we’re in. But AI agents accumulate context over time, and we can make use of that.
Today, common approaches to AI security include sidecar LLMs that monitor inputs and outputs, or even prompt instructions that simply ask the model to behave. Using probabilistic models to secure probabilistic models means we never get hard guarantees.
Stateful multi-level security
Back in the 70s, the US DoD needed to ensure mathematically that mainframes could be used securely. This foundational work resulted in models such as Bell-LaPadula for protecting confidentiality, and Biba for protecting integrity. This post is not about BLP and Biba, but the main rules are worth mentioning:
Bell-LaPadula (protects confidentiality):
- No read up: subjects can't read above their clearance level.
- No write down: a subject at a given level can't write to objects at lower levels (if you have been exposed to secrets, you may leak them).
Biba (protects integrity):
- No read down: high-trust subjects can't read low-integrity data (if you have seen dirty data, you may assume it's true).
- No write up: low-integrity subjects can't write to higher-integrity objects (untrusted output would contaminate trusted data).
These are pillars of multi-level security. We track what subjects have accessed to determine what they are allowed to do next, which gives us information flow tracking, or taint tracking. Real-world implementations like SELinux MLS sometimes skip the stateful parts, using fixed labels instead. For AI agents, the statefulness is what we need in order to track what information has entered the context.
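The four rules above boil down to simple level comparisons. Here's a minimal sketch in Python; the level names and function names are my own illustration, not from any standard or real implementation:

```python
from enum import IntEnum

class Conf(IntEnum):      # confidentiality levels (illustrative)
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2

class Integ(IntEnum):     # integrity levels (illustrative)
    UNVERIFIED = 0
    VERIFIED = 1

def blp_can_read(subject: Conf, obj: Conf) -> bool:
    # Bell-LaPadula: no read up.
    return subject >= obj

def blp_can_write(subject: Conf, obj: Conf) -> bool:
    # Bell-LaPadula: no write down (exposure to secrets taints you).
    return subject <= obj

def biba_can_read(subject: Integ, obj: Integ) -> bool:
    # Biba: no read down (trusted subjects must not ingest dirty data).
    return obj >= subject

def biba_can_write(subject: Integ, obj: Integ) -> bool:
    # Biba: no write up (untrusted subjects must not contaminate trusted data).
    return subject >= obj
```

Note how BLP and Biba are mirror images of each other: confidentiality flows up, integrity flows down.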
How is this useful for AI agents?
Suppose we keep track of everything an agent accesses and is currently aware of. This means assigning levels to objects that we place in contexts, whether sourced from RAG pipelines or tool calls. For writes, we do the same for tools or workflows that modify data.
If an agent reads company data classified as INTERNAL, it can no longer write a PUBLIC press release, because it might leak internal information.
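The confidentiality side can be tracked with a "high-water mark": the context's level rises to the highest level it has read, and writes below that mark are refused. A hedged sketch, with all class and method names invented for illustration:

```python
from enum import IntEnum

class Conf(IntEnum):
    PUBLIC = 0
    INTERNAL = 1

class AgentContext:
    def __init__(self):
        # High-water mark: highest confidentiality level read so far.
        self.conf = Conf.PUBLIC

    def read(self, data: str, level: Conf) -> str:
        self.conf = max(self.conf, level)  # reading taints the context
        return data

    def can_write(self, target: Conf) -> bool:
        # No write down: an INTERNAL-tainted context
        # can't produce PUBLIC output.
        return target >= self.conf
```

The check is deterministic: no sidecar LLM has to guess whether the press release leaks anything, because the write is blocked as soon as INTERNAL data enters the context.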
But what I find more powerful is the integrity side. Suppose an agent reads information from the public web, and we classify this as UNVERIFIED. We could prevent that agent from modifying data we rely on to be accurate, because an agent that has read internet-facing data is susceptible to prompt injection.
Or if we want to be ruthless, we could say that LLM outputs are always low integrity because all models can hallucinate.
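The integrity side is the mirror image: a "low-water mark" where the context's integrity sinks to the lowest-integrity thing it has read. Again a sketch with illustrative names:

```python
from enum import IntEnum

class Integ(IntEnum):
    UNVERIFIED = 0   # e.g. public web content, or (ruthlessly) any LLM output
    TRUSTED = 1      # data we rely on to be accurate

class AgentContext:
    def __init__(self):
        # Low-water mark: lowest integrity level read so far.
        self.integ = Integ.TRUSTED

    def read(self, data: str, level: Integ) -> str:
        self.integ = min(self.integ, level)  # dirty reads sink the mark
        return data

    def can_write(self, target: Integ) -> bool:
        # No write up: a possibly prompt-injected context
        # may not modify trusted data.
        return self.integ >= target
```

Under this policy, the moment an agent touches the open web, every write to trusted stores is off the table until the context is sanitized.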
Taint tracking also has immediate infrastructure benefits. If we know the exact confidentiality level of the data in an agent’s context, we can use that state to decide where to perform inference. Highly sensitive data could be automatically routed to internal GPUs rather than third-party APIs.
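Routing then becomes a pure function of the tracked level. A sketch; the endpoint names are placeholders, not real services:

```python
from enum import IntEnum

class Conf(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2

def pick_inference_endpoint(context_conf: Conf) -> str:
    # Sensitive contexts stay on infrastructure we control;
    # everything else can use a cheaper third-party API.
    if context_conf >= Conf.INTERNAL:
        return "internal-gpu-cluster"
    return "third-party-api"
```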
Sanitization and declassification
In practice, we also need methods to sanitize information (raise integrity) and declassify it (lower confidentiality). Without them, an agent can process data within its own level, but it can never share its findings with a lower-clearance user, nor can a higher-integrity process ever trust its output.
This is a natural fit for human-in-the-loop steps. Since taint tracking already requires us to record information flow, we can piggyback review steps on the same machinery. When an agent wants to raise the integrity or lower the confidentiality of anything, we can present to the reviewer every piece of information the agent used, along with its confidentiality and integrity levels, even recursively.
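One way this review step could look: each piece of information records the pieces it was derived from, so the reviewer sees the full provenance tree before approving. A sketch, assuming labels are simple strings and all names (`Piece`, `provenance`, `declassify`) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Piece:
    text: str
    conf: str      # e.g. "INTERNAL"
    integ: str     # e.g. "UNVERIFIED"
    sources: list["Piece"] = field(default_factory=list)

def provenance(piece: Piece, depth: int = 0) -> list[str]:
    # Flatten the derivation tree, recursively, for the human reviewer.
    lines = [f"{'  ' * depth}[{piece.conf}/{piece.integ}] {piece.text}"]
    for src in piece.sources:
        lines += provenance(src, depth + 1)
    return lines

def declassify(piece: Piece, approved: bool, new_conf: str) -> Piece:
    # Lowering confidentiality is gated on explicit human approval;
    # the result keeps the original as its source.
    if not approved:
        raise PermissionError("declassification requires human review")
    return Piece(piece.text, new_conf, piece.integ, sources=[piece])
```

Because the declassified piece keeps a pointer to what it came from, an auditor can later walk the chain and see exactly who approved each level change and what evidence they saw.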