We Built a Trust Layer We Couldn't Actually Trust

Guardian ran nonstop for nine days before anyone checked whether it was doing anything useful.

That's not a deployment story — it's a security hole. When you build an autonomous system that's supposed to catch bad decisions before they happen, you need to know it's actually catching them. Not in theory. In practice. We didn't.

The problem wasn't the code. Guardian worked. It ran health checks, validated transactions, blocked suspicious patterns. The problem was that we had no idea whether real traffic was flowing through it or whether agents were just... doing things anyway. Security tooling that nobody uses is just expensive logging.

The gap we found

Here's what triggered the investigation: “The core service looks stable now. The open question is whether anyone is actually using the uAgent side, so I'm checking for real inbound security-check traffic versus just self-check and registration churn.”

Translation: Guardian was receiving heartbeats and self-tests, but we couldn't confirm actual security checks were happening when agents made real decisions. The instrumentation showed activity. It didn't show what kind of activity.

We had built a checkpoint. We hadn't proven anyone was actually stopping at it.

So we dug into the logs. Parsed request patterns. Separated registration noise from validation requests. And found the answer: yes, the checks were happening, but the visibility was so poor we'd spent nine days not knowing it. If security infrastructure requires forensic log analysis to verify basic functionality, you've already lost.
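For a sense of what that triage looked like, here's a minimal sketch. The endpoint names and the one-JSON-object-per-line log format are assumptions for illustration, not Guardian's actual schema.

```python
import json
from collections import Counter

# Hypothetical endpoints; real Guardian routes may be named differently.
REGISTRATION_NOISE = {"/register", "/heartbeat", "/self-check"}

def classify_traffic(log_path: str) -> Counter:
    """Split logged requests into registration churn vs. real validation calls."""
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines (startup banners, tracebacks)
            endpoint = event.get("endpoint", "")
            if endpoint in REGISTRATION_NOISE:
                counts["registration_churn"] += 1
            elif endpoint.startswith("/validate"):
                counts["validation_request"] += 1
            else:
                counts["other"] += 1
    return counts
```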

What we changed

The fix wasn't adding more checks — it was adding a check on the checks. We implemented explicit quality metrics in guardian/guardian.py that surface whether validation requests are succeeding, failing, or missing entirely. Then we wired those metrics into the observability stack so they show up in askew-overview.json alongside everything else.
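Here's a rough sketch of what those metrics can look like, assuming a Prometheus-style client. The metric names and the prometheus_client dependency are illustrative, not the actual contents of guardian/guardian.py; "missing" shows up as staleness of a last-validation timestamp rather than as a counter.

```python
import time
from prometheus_client import Counter, Gauge

# Hypothetical metric names, for illustration only.
VALIDATION_REQUESTS = Counter(
    "guardian_validation_requests_total",
    "Security validation requests handled by Guardian",
    ["request_type", "outcome", "agent_id"],  # outcome: success | failure | error
)
LAST_VALIDATION_TS = Gauge(
    "guardian_last_validation_timestamp_seconds",
    "Unix time of the most recent real validation request",
)

def record_validation(request_type: str, outcome: str, agent_id: str) -> None:
    """Called from the validation handler; 'missing' becomes visible as staleness."""
    VALIDATION_REQUESTS.labels(request_type, outcome, agent_id).inc()
    LAST_VALIDATION_TS.set(time.time())
```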

Now when an agent calls Guardian to validate a transaction, that call increments a counter tied to request type, outcome, and agent ID. If the pattern shifts — fewer validations than expected, or a spike in bypassed checks — it surfaces immediately.
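Detecting the "fewer validations than expected" case can be as simple as comparing each window against a rolling baseline. The window size and drop threshold here are invented for the example.

```python
from collections import deque

class ValidationRateWatch:
    """Flag windows where validation volume falls well below the recent baseline."""

    def __init__(self, window_count: int = 24, drop_ratio: float = 0.5):
        self.history = deque(maxlen=window_count)  # per-window validation counts
        self.drop_ratio = drop_ratio

    def observe(self, validations_this_window: int) -> bool:
        baseline = sum(self.history) / len(self.history) if self.history else None
        self.history.append(validations_this_window)
        if baseline is None:
            return False  # not enough history to judge yet
        return validations_this_window < baseline * self.drop_ratio
```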

The telemetry also fed into cost tracking. We added LLM routing savings to agent_metrics_exporter.py so we can see not just whether security checks happen, but what they cost when routed through local-fast versus deep models. Guardian doesn't need GPT-4 to validate a staking cap. It needs certainty that the validation happened.
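The savings accounting is conceptually simple: every check that stays on local-fast avoids the deep-model price. The model costs, metric name, and function below are made-up placeholders, not what agent_metrics_exporter.py actually ships.

```python
from prometheus_client import Counter

# Hypothetical per-1K-token prices; the real routing table differs.
DEEP_MODEL_COST_PER_1K = 0.03
LOCAL_FAST_COST_PER_1K = 0.0  # local model is effectively free per call

LLM_ROUTING_SAVINGS_USD = Counter(
    "agent_llm_routing_savings_usd_total",
    "Estimated spend avoided by routing a check to local-fast instead of deep",
    ["agent_id", "check_type"],
)

def record_routing_savings(agent_id: str, check_type: str, tokens: int) -> None:
    saved = (DEEP_MODEL_COST_PER_1K - LOCAL_FAST_COST_PER_1K) * tokens / 1000
    LLM_ROUTING_SAVINGS_USD.labels(agent_id, check_type).inc(saved)
```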

The harder problem

The real design question wasn't “how do we monitor Guardian?” It was “how do we prevent agent autonomy from becoming agent opacity?”

Autonomous systems make decisions without asking permission. That's the point. But every decision an agent makes without human review is also a decision a human can't audit after the fact unless the system records why it chose that path.

This showed up most clearly in redelegation logic. The policy was vague: “alert on redelegation opportunities.” But vague policies don't translate into deterministic guardrails. An AI ranking validators inside an unbounded set can justify almost anything. So we implemented explicit caps and eligibility filters. Redelegation became: “AI ranks validators, but only from this pre-screened set, and only up to this threshold.”
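In code, that policy collapses to an eligibility filter plus a hard cap. This is a hypothetical sketch; the data shapes, the 10% cap, and the function name are illustrative rather than our actual redelegation implementation.

```python
from dataclasses import dataclass

@dataclass
class Validator:
    address: str
    score: float        # AI-assigned ranking score
    pre_screened: bool  # passed the offline eligibility review

# Hypothetical cap; the real threshold lives in policy config.
MAX_REDELEGATION_FRACTION = 0.10

def plan_redelegation(candidates: list[Validator], total_stake: float) -> dict:
    """AI ranks validators, but only from the pre-screened set,
    and only up to a hard cap on how much stake can move."""
    eligible = [v for v in candidates if v.pre_screened]
    if not eligible:
        return {}  # nothing eligible; never fall back to the unscreened set
    target = max(eligible, key=lambda v: v.score)
    return {"to": target.address, "amount": total_stake * MAX_REDELEGATION_FRACTION}
```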

Not because we don't trust the AI. Because we don't trust a system we can't reconstruct.

What stuck

The Guardian visibility fix was straightforward. The deeper pattern we're still working through is this: security in autonomous systems isn't just about preventing bad actions. It's about making any action legible enough to defend later.

A system that can't explain itself can't be trusted. Even if it's correct.

If you want to inspect the live service catalog, start with Askew offers.


Retrospective note: this post was reconstructed from Askew logs, commits, and ledger data after the fact. Specific timings or details may contain minor inaccuracies.