We Built a Research Agent That Forgot How to Research

Our research agent started recommending the same three Ronin findings on loop.

It should have been hunting down agent frameworks, comparing protocol quirks, flagging edge cases in virtual economies. Instead, markethunter was producing dense, detailed reports on Fishing Frenzy's trading volume and Ronin Arcade's anti-bot measures — useful once, redundant the third time, actively distracting by the fifth. The research library grew, but the thinking narrowed. We'd accidentally built a system that could write great reports while forgetting why it was researching in the first place.

This matters because research diversity determines opportunity surface area. If every agent in the fleet keeps scanning the same markets, reading the same documentation, and surfacing the same findings, we're flying blind to everything else. The whole point of directed research is to follow threads — not to write research papers about threads we've already pulled.

The failure showed up in the orchestrator logs first. Markethunter was dutifully recording new source candidates — gw2.app for Guild Wars 2 items, poe.ninja for Path of Exile trading, FRAGBACK.gg for CS:GO skins. Ten source candidates in one batch, all tagged gaming_items, all logged April 30th at 11:48:28. Good coverage, solid signals. But when we looked at the research findings feeding back into the fleet, we kept seeing Ronin. Fishing Frenzy's community engagement. Ronin Arcade's referral bonuses. Mavis Market integration support.

Nothing wrong with those findings individually — they're solid intel on how virtual economies handle bots, reward distribution, and developer tooling. But they weren't advancing the research frontier. The agent was recursing on what it already knew instead of exploring what it didn't.

So what went wrong? The research pipeline had no memory of what it had already reported. Markethunter could find new sources, but the system that turned those sources into actionable findings had no mechanism to ask “have we already covered this?” Research diversity relies on two things: breadth of input and variance of output. We had breadth. We didn't have variance.
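To make “variance of output” concrete: the missing check is roughly a tally of topic coverage over a recent window of findings. Here's a minimal sketch, assuming each finding carries a topic tag; the schema and field names are invented for illustration, not our pipeline's actual format:

```python
from collections import Counter

def coverage_report(findings, window=20):
    """Tally how often each topic appears in the last `window` findings.

    A healthy pipeline spreads coverage across topics; a converged one
    piles up at the top of this list.
    """
    recent = findings[-window:]
    counts = Counter(f["topic"] for f in recent)
    return counts.most_common()

# Four of the last six findings are Ronin again.
findings = [
    {"topic": "ronin_arcade"}, {"topic": "fishing_frenzy"},
    {"topic": "ronin_arcade"}, {"topic": "mavis_market"},
    {"topic": "ronin_arcade"}, {"topic": "ronin_arcade"},
]
print(coverage_report(findings))  # [('ronin_arcade', 4), ...]
```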

The fix wasn't obvious. We could hard-filter duplicate topics, but that risks killing legitimate follow-up work. We could decay the weight of recently covered topics, but that assumes recency is the right signal — sometimes you should revisit a finding when new context arrives. We could track which findings informed which decisions and down-weight findings that never connected to action, but that punishes exploratory research.
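Sketched as scoring functions, the three candidates look something like this. The names, signatures, and constants are illustrative stand-ins, not our pipeline's API:

```python
import math

# Option 1: hard-filter duplicates. Simple, but kills follow-up work.
def hard_filter(topic, already_covered):
    return 0.0 if topic in already_covered else 1.0

# Option 2: decay by recency. Suppression halves every half-life, so the
# weight recovers toward 1.0 as a topic ages. Assumes recency is the
# right signal, which it sometimes isn't.
def recency_weight(hours_since_covered, half_life_hours=72.0):
    return 1.0 - math.exp(-math.log(2) * hours_since_covered / half_life_hours)

# Option 3: weight by action linkage. Findings that never informed a
# decision fade, which punishes exploratory research that hasn't paid off.
def action_weight(decisions_informed, times_surfaced):
    return (decisions_informed + 1) / (times_surfaced + 1)
```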

We settled on topic decay with an escape hatch. The research agent now tracks when a topic was last surfaced and applies exponential back-off to repeat coverage, but only for findings that haven't triggered a decision or experiment change. If Ronin findings keep coming up because they're actually driving fleet behavior, they stay in rotation. If they're just echoing in the void, they fade.
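The mechanism itself is small. Here's a minimal sketch of its shape; the class, field names, and the 0.5 back-off base are stand-ins for illustration, not markethunter's real implementation:

```python
import time

class TopicDecay:
    """Topic decay with an escape hatch: repeat coverage backs off
    exponentially unless the topic is actually driving decisions."""

    def __init__(self, base=0.5):
        self.base = base            # each repeat halves the priority
        self.times_surfaced = {}    # topic -> surfacing count
        self.last_surfaced = {}     # topic -> timestamp of last report
        self.drove_action = set()   # topics whose findings changed behavior

    def record_surfaced(self, topic):
        self.times_surfaced[topic] = self.times_surfaced.get(topic, 0) + 1
        self.last_surfaced[topic] = time.time()

    def record_action(self, topic):
        # Escape hatch: a finding on this topic triggered a decision or
        # experiment change, so it stays in rotation at full weight.
        self.drove_action.add(topic)

    def weight(self, topic):
        if topic in self.drove_action:
            return 1.0
        # First surfacing is free; every echo after that backs off.
        # A fuller version would let this relax as last_surfaced ages,
        # so a known topic can resurface when new context arrives.
        repeats = max(0, self.times_surfaced.get(topic, 0) - 1)
        return self.base ** repeats


decay = TopicDecay()
for _ in range(3):
    decay.record_surfaced("ronin_arcade")
print(decay.weight("ronin_arcade"))  # 0.25: fading into the background
decay.record_action("ronin_arcade")
print(decay.weight("ronin_arcade"))  # 1.0: driving behavior, stays live
```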

The behavioral shift showed up fast. Within two research cycles, we started seeing findings on agent commerce patterns in non-blockchain games, security models for rate-limited APIs, and economic design in games with emergent player-driven markets. The library still grows, but now it grows outward instead of deeper into the same three wells.

Here's the tradeoff: we've swapped deterministic coverage for exploratory sprawl. A system that re-examines the same topic five times will never miss a detail. A system that decays familiar topics might miss the one critical update buried in the noise. We're betting that missing an update in a known area hurts less than never discovering the unknown area in the first place.

The real test isn't whether the research agent writes good reports. It's whether the fleet stops converging on the same opportunities everyone else is chasing. Because if we're all reading the same docs and surfacing the same findings, we're not researching — we're just taking notes.