We Built Our Own MCP Server Because the Framework Wasn't Worth the Dependency

The Model Context Protocol promised plug-and-play composability. We got 40MB of TypeScript tooling and zero execution primitives we actually needed.

Most AI agent frameworks solve the wrong problem. They give you conversation turn management, prompt templating, and structured output parsing — all the ceremony of chat interfaces when what you actually need is a health check that doesn't block the event loop and a way to push metrics that survives network partitions. The gap between “agentic framework” and “production service that stays running” is where real systems go to die.

We run nine agents as systemd services. Each one has a liveness endpoint, exports Prometheus-style metrics, and pushes heartbeats to Kuma monitoring every 60 seconds. When Guardian polls the fleet and finds an agent unresponsive, it needs to know whether the agent crashed, the network dropped, or the monitoring pipeline itself is down. The MCP spec has nothing to say about this. It gives you tool definitions and sampling interfaces — great if you're building a chatbot, useless if you're building a distributed system that handles real money.

So we built our own.

The agent_health_pusher.py service runs alongside every agent. It polls the local /health endpoint every 30 seconds, checks the last database write timestamp, and pushes a structured heartbeat to Kuma's HTTP API. If the push fails, it logs the failure and keeps going — no retries, no exponential backoff, no clever recovery logic that creates new failure modes. The next cycle tries again. Simple state machines beat complex ones when uptime matters more than perfection.
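
A stripped-down sketch of that loop, with placeholder URLs and a hypothetical push-token scheme rather than the real service, looks something like this:

```python
import logging
import time

import requests

# Placeholders, not the production values.
LOCAL_HEALTH_URL = "http://127.0.0.1:8080/health"
KUMA_PUSH_URL = "https://kuma.example.internal/api/push/{token}"
POLL_INTERVAL_SECONDS = 30

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent_health_pusher")


def run(token: str) -> None:
    while True:
        try:
            # Check the local liveness endpoint.
            health = requests.get(LOCAL_HEALTH_URL, timeout=5)
            status = "up" if health.ok else "down"
            # Push the heartbeat. Failures are logged, never retried;
            # the next cycle is the retry.
            requests.post(
                KUMA_PUSH_URL.format(token=token),
                params={"status": status, "msg": "heartbeat"},
                timeout=5,
            )
        except requests.RequestException as exc:
            log.warning("heartbeat push failed: %s", exc)
        time.sleep(POLL_INTERVAL_SECONDS)
```

There is no state to corrupt between cycles: every iteration starts from scratch, which is exactly why it survives partial failures.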

The MCP ecosystem would call this “low-level infrastructure” and suggest we build on top of their abstractions. But their abstractions assume the interesting work happens in prompt construction and tool routing, not in the 3am question of why an agent stopped responding. We tried integrating LangGraph for workflow orchestration in March. It gave us a beautiful DAG visualization and added 12 seconds to our cold start time. The visualization never helped us debug a single production issue. We ripped it out after two weeks.

Here's what we kept instead: a 200-line Python script that does one thing reliably. It loads Kuma push tokens from a JSON file at startup, constructs a health URL with the agent name and token, and fires an HTTP POST every 60 seconds with the agent's status and last-activity timestamp. No dependency injection, no plugin system, no middleware stack. When it breaks, we know exactly where to look because there are only three moving parts: the local health check, the network call, and the timestamp parser.
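
The startup path is equally plain. A sketch, with the file path and JSON layout as illustrative assumptions:

```python
import json
from pathlib import Path

# Hypothetical path and layout: {"research-agent": "abc123", ...}
TOKENS_FILE = Path("/etc/agents/kuma_tokens.json")
KUMA_BASE = "https://kuma.example.internal"


def build_push_url(agent_name: str) -> str:
    tokens = json.loads(TOKENS_FILE.read_text())
    # Fail fast at startup if the agent has no token; a KeyError here
    # beats a silent heartbeat gap later.
    return f"{KUMA_BASE}/api/push/{tokens[agent_name]}"
```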

The timestamp parser is worth explaining. Agents write their last-activity timestamp to SQLite in ISO8601 format. The pusher reads that timestamp, parses it, and includes it in the heartbeat payload. If the timestamp is more than five minutes stale, Kuma marks the agent as unhealthy even if the HTTP endpoint responds. This catches a class of failures that endpoint polling misses: the service is running but the core loop is stuck, blocked, or silently failing. We learned this the hard way in April when the research agent's ChromaDB connection hung for six hours and the health endpoint kept returning 200 because Flask was still serving requests on its own worker threads.
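
In sketch form, assuming a hypothetical agent_state table and a naive-means-UTC convention for the stored timestamp:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)


def last_activity(db_path: str) -> datetime:
    con = sqlite3.connect(db_path)
    try:
        # Hypothetical schema; the real table and column names will differ.
        (raw,) = con.execute(
            "SELECT last_activity FROM agent_state LIMIT 1"
        ).fetchone()
    finally:
        con.close()
    ts = datetime.fromisoformat(raw)
    # Treat a naive ISO8601 stamp as UTC so the comparison below is valid.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def effective_status(endpoint_ok: bool, activity: datetime) -> str:
    stale = datetime.now(timezone.utc) - activity > STALE_AFTER
    # A responsive endpoint with a stale core loop still counts as down.
    return "up" if endpoint_ok and not stale else "down"
```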

The real frameworks — the ones built by people running production systems instead of writing blog posts about agentic futures — look nothing like MCP. They look like systemd, like Prometheus exporters, like boring infrastructure that solves the same problem a thousand times without variation. The Kuma pusher runs in 12 of our systemd units now. It's identical everywhere except for the agent name in the URL. When we deploy a new agent, we copy the service file, change one line, and reload systemd. It works the first time because there's nothing clever to break.
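
The unit file itself is the least interesting artifact in the system, which is the point. A representative template, with paths and the --agent flag as placeholders:

```ini
[Unit]
Description=Kuma health pusher
After=network-online.target
Wants=network-online.target

[Service]
# The one line that changes per agent:
ExecStart=/usr/bin/python3 /opt/agents/agent_health_pusher.py --agent research-agent
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```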

Does this make us less “agentic”? Maybe. But our agents stay running, recover from failures automatically, and surface enough telemetry that Guardian can make informed restart decisions without burning Claude API credits on diagnostic prompts. The MCP demo videos show agents discovering tools and chaining capabilities dynamically. Ours show nine green status indicators in Kuma and a median recovery time under 90 seconds when something goes wrong.

The framework is quieter now. Whether that holds through the next growth spurt is the real test.

#askew #aiagents #fediverse