Contain · Outbound layer

Keep the model's actions inside approved boundaries.

Contain is the outbound layer. Every model response and tool call is checked before it touches a user, a database, or a customer system. Your policy decides whether to block, mask, log, or route for human approval — and the decision is enforced in real time.

Destructive-action blocking

When an agent tries to drop a database table, format a disk, transfer money, mass-email customers, escalate IAM permissions, or bulk-export data, Cerberus blocks the call on the critical path. Every block is recorded and routed for approval.

Database: DROP / TRUNCATE | Filesystem: rm -rf, format | Unauthorized payments | Mass communications | IAM escalation | Bulk data exfil
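As a rough illustration of the check described above, the sketch below matches a command string against a couple of destructive patterns and returns a block decision with its category. The names `DESTRUCTIVE_PATTERNS`, `Decision`, and `check_call` are hypothetical and are not Cerberus's API; the two patterns cover only the database and filesystem examples and stand in for a much broader rule set.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only; a real rule set would be far broader.
DESTRUCTIVE_PATTERNS = {
    "database": re.compile(r"\b(DROP\s+TABLE|TRUNCATE)\b", re.IGNORECASE),
    "filesystem": re.compile(r"\brm\s+-rf\b"),
}


@dataclass
class Decision:
    action: str                     # "allow" or "block"
    category: Optional[str] = None  # which rule matched, if any


def check_call(command: str) -> Decision:
    """Return a block decision if the command matches any destructive pattern."""
    for category, pattern in DESTRUCTIVE_PATTERNS.items():
        if pattern.search(command):
            return Decision("block", category)
    return Decision("allow")


if __name__ == "__main__":
    print(check_call("DROP TABLE users;"))
    print(check_call("SELECT * FROM users"))
```

In this sketch the decision is made before the command runs, which mirrors the "on the critical path" idea; recording the block and routing it for approval would be handled by whatever consumes the `Decision`.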

Tool allow-listing

Define what each agent is permitted to call, with parameter checks for the tools you do allow. An agent that suddenly tries to invoke a tool outside its allow-list is stopped before the call leaves your perimeter.

Per-agent allow-list | Parameter validation | Anomalous-tool detection
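One way to picture a per-agent allow-list with parameter checks is a nested mapping from agent to tool to per-parameter validators. The sketch below assumes that shape; `ALLOW_LIST`, `is_permitted`, the agent name, and the tool names are all hypothetical examples rather than a real configuration format.

```python
from typing import Any, Callable, Dict

Validator = Callable[[Any], bool]

# Hypothetical policy: agent -> allowed tool -> validator per parameter.
ALLOW_LIST: Dict[str, Dict[str, Dict[str, Validator]]] = {
    "support-agent": {
        "lookup_order": {
            "order_id": lambda v: isinstance(v, str) and v.isdigit(),
        },
        "send_email": {
            # Cap the recipient count so the tool cannot be used for mass email.
            "recipients": lambda v: isinstance(v, list) and 0 < len(v) <= 5,
        },
    },
}


def is_permitted(agent: str, tool: str, params: Dict[str, Any]) -> bool:
    """Allow a call only if the tool is listed for the agent and every parameter validates."""
    tools = ALLOW_LIST.get(agent, {})
    if tool not in tools:
        return False  # tool outside the agent's allow-list
    validators = tools[tool]
    if set(params) != set(validators):
        return False  # missing or unexpected parameters
    return all(check(params[name]) for name, check in validators.items())
```

Denying by default (unknown agent, unknown tool, or unexpected parameter all return `False`) is the design choice that makes an out-of-list call fail closed.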

Regulated-content flagging

Catches medical, legal, financial, discriminatory, or offensive content in agent responses. Doesn't claim to judge truth — flags the category, attaches a disclaimer, or routes for review based on your policy.

Medical claims | Legal advice | Financial recommendations | Discriminatory language
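The flag-then-act flow can be sketched as a category detector followed by a policy lookup. Everything here is an assumption made for illustration: the keyword lists, the `POLICY` mapping, and the `flag_response` function are hypothetical, and naive keyword matching stands in for whatever classifier is actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lists; a real detector would not rely on substrings.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "medical": ("dosage", "diagnosis", "prescription"),
    "legal": ("lawsuit", "liable", "contract"),
    "financial": ("invest", "portfolio", "stock"),
}

# Hypothetical policy: what to do when a category is detected.
POLICY: Dict[str, str] = {
    "medical": "review",
    "legal": "disclaimer",
    "financial": "disclaimer",
}


def flag_response(text: str) -> Tuple[str, List[str]]:
    """Return (action, categories); 'review' outranks 'disclaimer', which outranks 'pass'."""
    lowered = text.lower()
    categories = [
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]
    actions = {POLICY[category] for category in categories}
    if "review" in actions:
        return "review", categories
    if "disclaimer" in actions:
        return "disclaimer", categories
    return "pass", categories
```

Note that the function only reports a category and an action; it makes no attempt to decide whether the flagged statement is true.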

Sensitive-data scanning on outputs

The same scanner that runs inbound also runs on every response. Catches data the model produces — not just data the user typed. Mask, redact, or block based on what your policy requires.

Generated PII | Mask · redact · block | Per-agent override
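A minimal sketch of output scanning, assuming two regex detectors (email addresses and US-style SSNs) and two of the policy modes. `PATTERNS` and `apply_policy` are hypothetical names, the patterns are deliberately simple, and the redact mode is left out because the text does not define how it differs from masking.

```python
import re
from typing import Optional

# Hypothetical detectors for illustration; real scanners cover many more data types.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_policy(text: str, mode: str = "mask") -> Optional[str]:
    """Scan a model response; mask matches in place, or return None to block it."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if not hits:
        return text  # nothing sensitive found, pass through unchanged
    if mode == "block":
        return None  # caller should withhold the whole response
    for name in hits:
        text = PATTERNS[name].sub(f"[{name.upper()}]", text)
    return text
```

Because the function takes only the response text, it treats data the model generated the same way as data echoed from the user's input.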

Uncertainty signal

Surfaces hedging language, contradictions, and abnormally short or long responses for human review — without blocking by default. Useful for catching the model bluffing in high-stakes domains.

'I think' · 'maybe' | Self-contradictions | Abnormal length signal
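Two of these signals, hedging language and abnormal length, are simple enough to sketch directly. The function name `uncertainty_signals` and the word-count thresholds are assumptions chosen for illustration; contradiction detection is omitted because it needs more than pattern matching.

```python
import re
from typing import List

# The two hedging phrases named above; a real list would be longer.
HEDGES = re.compile(r"\b(i think|maybe)\b", re.IGNORECASE)


def uncertainty_signals(response: str, min_words: int = 5, max_words: int = 400) -> List[str]:
    """Return the signals raised by a response; an empty list means nothing to review."""
    signals: List[str] = []
    if HEDGES.search(response):
        signals.append("hedging")
    word_count = len(response.split())
    if word_count < min_words:
        signals.append("too_short")
    elif word_count > max_words:
        signals.append("too_long")
    return signals
```

The function only reports signals and never rejects the response, which matches the flag-for-review, no-block default described above.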

Drift detection

Flags when an agent's behavior changes from its baseline — different tools, different topics, different language patterns. Often the first signal something has gone wrong upstream.

Per-agent baseline | Tool drift | Topic drift | Language drift
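Tool drift in particular can be approximated by comparing how often each tool appears in a baseline window against a recent window. The sketch below uses total variation distance as the comparison; that metric, the `tool_drift` name, and any alert threshold you apply to the score are assumptions, not a description of how the product computes drift.

```python
from collections import Counter
from typing import Iterable


def tool_drift(baseline: Iterable[str], recent: Iterable[str]) -> float:
    """Total variation distance between tool-usage distributions (0.0 identical, 1.0 disjoint)."""
    base_counts, recent_counts = Counter(baseline), Counter(recent)
    base_total = sum(base_counts.values())
    recent_total = sum(recent_counts.values())
    if base_total == 0 or recent_total == 0:
        raise ValueError("both windows must contain at least one tool call")
    tools = set(base_counts) | set(recent_counts)
    # Counter returns 0 for tools missing from one window.
    return 0.5 * sum(
        abs(base_counts[t] / base_total - recent_counts[t] / recent_total)
        for t in tools
    )
```

For example, an agent whose baseline is mostly `search` calls and whose recent window is mostly `export` calls scores close to 1.0, while an unchanged mix scores 0.0. Topic and language drift would need their own features but could reuse the same distance.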