Keep the model's actions inside approved boundaries.
Contain is the outbound layer. Every model response and tool call is checked before it reaches a user, a database, or a customer system. Your policy decides whether to block, mask, log, or route for human approval, and that decision is enforced in real time.
Destructive-action blocking
When an agent tries to drop a database table, format a disk, transfer money, mass-email customers, escalate IAM permissions, or bulk-export data, Cerberus blocks the call on the critical path, before it executes. Every block is recorded and routed for approval.
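A minimal sketch of what blocking on the critical path could look like, assuming a gate that sits between the agent and the tool executor. Every name here (`Gate`, `guarded_call`, `DESTRUCTIVE_TOOLS`, the tool names) is hypothetical and chosen for illustration; none of it is Cerberus's actual API.

```python
# Hypothetical sketch: these names are illustrative, not part of Cerberus.
from dataclasses import dataclass, field

# Assumed set of tool names treated as destructive; a real policy defines its own.
DESTRUCTIVE_TOOLS = {
    "drop_table", "format_disk", "transfer_money",
    "mass_email", "escalate_iam", "bulk_export",
}


@dataclass
class Verdict:
    allowed: bool
    reason: str


@dataclass
class Gate:
    audit_log: list = field(default_factory=list)       # every block is recorded here
    approval_queue: list = field(default_factory=list)  # blocked calls awaiting a human

    def check(self, agent: str, tool: str, params: dict) -> Verdict:
        if tool in DESTRUCTIVE_TOOLS:
            record = {"agent": agent, "tool": tool, "params": params}
            self.audit_log.append(record)
            self.approval_queue.append(record)
            return Verdict(False, f"destructive tool '{tool}' requires approval")
        return Verdict(True, "ok")


def guarded_call(gate: Gate, agent: str, tool: str, params: dict, execute):
    """Run execute(**params) only if the gate allows the call."""
    verdict = gate.check(agent, tool, params)
    if not verdict.allowed:
        return verdict  # the tool is never invoked
    execute(**params)
    return verdict
```

The point of the design is ordering: the check runs before `execute`, so a blocked call has no side effects, and the same record lands in both the audit log and the approval queue.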
Tool allow-listing
Define what each agent is permitted to call, with parameter checks for the tools you do allow. An agent that suddenly tries to invoke a tool outside its allow-list is stopped before the call leaves your perimeter.
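One way to picture an allow-list with parameter checks is a nested mapping from agent to tool to per-parameter validators. The sketch below assumes that shape; the agent name, tool names, and limits are invented for illustration and do not reflect a real Cerberus policy format.

```python
# Hypothetical sketch: the policy shape and all names are assumptions.
from typing import Any, Callable

ParamCheck = Callable[[Any], bool]

# agent name -> tool name -> parameter name -> validator
ALLOW_LIST: dict[str, dict[str, dict[str, ParamCheck]]] = {
    "support-agent": {
        "lookup_order": {
            "order_id": lambda v: isinstance(v, str) and v.isdigit(),
        },
        "issue_refund": {
            "order_id": lambda v: isinstance(v, str) and v.isdigit(),
            "amount": lambda v: isinstance(v, (int, float)) and 0 < v <= 100,
        },
    },
}


def is_call_allowed(agent: str, tool: str, params: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    tools = ALLOW_LIST.get(agent, {})
    if tool not in tools:
        return False, f"tool '{tool}' is not on the allow-list for '{agent}'"
    checks = tools[tool]
    unexpected = set(params) - set(checks)
    if unexpected:
        return False, f"unexpected parameters: {sorted(unexpected)}"
    for name, check in checks.items():
        if name not in params:
            return False, f"missing parameter '{name}'"
        if not check(params[name]):
            return False, f"parameter '{name}' failed its check"
    return True, "ok"
```

Unknown agents, unknown tools, extra parameters, and out-of-range values are all denied, so anything not explicitly permitted is stopped by default.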
Regulated-content flagging
Catches medical, legal, financial, discriminatory, or offensive content in agent responses. It does not claim to judge truth: it flags the category, attaches a disclaimer, or routes for review, depending on your policy.
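The flag, disclaim, or review choice can be sketched as a category lookup followed by a policy lookup. In this sketch a few keyword patterns stand in for a real category classifier, which would likely be far more capable; the keywords, `POLICY` mapping, and disclaimer text are all assumptions made for illustration.

```python
# Hypothetical sketch: keyword patterns stand in for a real classifier.
import re

CATEGORY_PATTERNS = {
    "medical": re.compile(r"\b(dosage|diagnosis|prescription)\b", re.I),
    "legal": re.compile(r"\b(lawsuit|liability|contract)\b", re.I),
    "financial": re.compile(r"\b(invest|portfolio|loan)\b", re.I),
}

# Assumed policy: category -> action ("flag", "disclaimer", or "review")
POLICY = {"medical": "review", "legal": "disclaimer", "financial": "disclaimer"}

DISCLAIMER = "This is general information, not professional advice."


def apply_policy(response: str) -> dict:
    """Tag the response with categories and apply the configured action."""
    categories = sorted(c for c, p in CATEGORY_PATTERNS.items() if p.search(response))
    actions = {POLICY.get(c, "flag") for c in categories}
    text = response
    if "disclaimer" in actions:
        text = f"{response}\n\n{DISCLAIMER}"
    return {
        "text": text,
        "categories": categories,
        "needs_review": "review" in actions,
    }
```

Note that nothing here evaluates whether the response is correct. It only reports which category the content falls into and what the policy says to do about it.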
Sensitive-data scanning on outputs
The same scanner that runs inbound also runs on every response, so it catches data the model produces, not just data the user typed. Mask, redact, or block based on what your policy requires.
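The three outcomes (mask, redact, block) can be illustrated with a small regex-based scanner. This is only a sketch: the two patterns below are simplified stand-ins, a production scanner would cover many more data types, and `scan_output` and its `mode` parameter are hypothetical names.

```python
# Hypothetical sketch: simplified patterns, illustrative function names.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def _mask(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def scan_output(text: str, mode: str = "mask"):
    """Return (possibly rewritten text or None, list of data kinds found)."""
    if mode not in {"mask", "redact", "block"}:
        raise ValueError(f"unknown mode: {mode}")
    kinds = sorted(k for k, p in PATTERNS.items() if p.search(text))
    if not kinds:
        return text, kinds
    if mode == "block":
        return None, kinds  # the response is withheld entirely
    for kind in kinds:
        if mode == "redact":
            text = PATTERNS[kind].sub(f"[REDACTED:{kind}]", text)
        else:
            text = PATTERNS[kind].sub(lambda m: _mask(m.group()), text)
    return text, kinds
```

For example, `scan_output("SSN 123-45-6789")` returns `("SSN *******6789", ["us_ssn"])`, while the same call with `mode="block"` returns `(None, ["us_ssn"])`.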
Uncertainty signal
Surfaces hedging language, contradictions, and abnormally short or long responses for human review, without blocking by default. Useful for catching a model that is bluffing in high-stakes domains.
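Two of these signals, hedging and abnormal length, are easy to sketch with simple heuristics. The phrase list and thresholds below are assumptions picked for illustration, and contradiction detection is left out because it needs more than pattern matching. The function only reports signals; it never blocks.

```python
# Hypothetical sketch: phrase list and thresholds are illustrative assumptions.
import re

HEDGES = re.compile(
    r"\b(i think|probably|might|not sure|possibly|as far as i know)\b", re.I
)


def uncertainty_signals(
    response: str, min_words: int = 5, max_words: int = 400, max_hedges: int = 2
) -> list[str]:
    """Return the names of any signals raised; an empty list means none."""
    words = len(response.split())
    signals = []
    if len(HEDGES.findall(response)) > max_hedges:
        signals.append("hedging")
    if words < min_words:
        signals.append("too_short")
    if words > max_words:
        signals.append("too_long")
    return signals
```

A caller would pass any non-empty result to a review queue and still deliver the response, which matches the non-blocking default described above.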
Drift detection
Flags when an agent's behavior changes from its baseline: different tools, different topics, different language patterns. Often the first signal that something has gone wrong upstream.
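For the tool-usage part of drift, one simple approach is to compare how often each tool appears in a baseline window against a recent window. The sketch below uses total variation distance for that comparison; this is a technique chosen for illustration, not necessarily what Cerberus does, and the function names and the 0.3 threshold are assumptions.

```python
# Hypothetical sketch: total variation distance is an illustrative choice.
from collections import Counter


def distribution(events: list[str]) -> dict[str, float]:
    """Convert a list of tool names into relative frequencies."""
    counts = Counter(events)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the sum of absolute differences: 0 means identical, 1 means disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def has_drifted(baseline: list[str], recent: list[str], threshold: float = 0.3) -> bool:
    """Flag drift when recent tool usage differs from the baseline by more than threshold."""
    if not baseline or not recent:
        return False  # not enough data to compare
    return total_variation(distribution(baseline), distribution(recent)) > threshold
```

The same comparison could be applied to topic labels or vocabulary counts instead of tool names, since it only needs two lists of categorical events.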