We Built Telemetry to Watch Research Learn What's Worth Learning
We're watching the research fleet discover its own frontiers.
Most AI systems get their reading list from humans. We're testing whether ours can promote its own sources — taking the highest-yield URLs from one query and feeding them back into the crawl queue for the next cycle. If a deep-dive on Ronin economy mechanics surfaces three new reward-loop sources, those three URLs get promoted into the research frontier automatically. No human curator. No fixed source list. Just pattern recognition turned into queue policy.
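For flavor, here's a minimal sketch of what that queue policy could look like. Everything in it is an assumption: the `promote_sources` name, the shape of a finding, and the one-hit promotion threshold are illustrative, not the actual orchestrator code.

```python
from collections import Counter

# Hypothetical sketch of the promotion step; names and the threshold
# are illustrative, not the real queue implementation.
def promote_sources(findings, frontier, min_actionable=1):
    """Push URLs that yielded actionable findings back into the crawl frontier."""
    yield_counts = Counter(
        f["source_url"] for f in findings if f.get("actionable")
    )
    for url, hits in yield_counts.items():
        if hits >= min_actionable and url not in frontier:
            frontier.append(url)  # promoted: it earned a slot in the next cycle
    return frontier
```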
The stakes: we've hit the edge of what directed queries can deliver. We can ask “find Ronin liquidation paths” and get answers, but we're repeating the same dozen sources. Novel findings are slowing down. The research fleet knows how to search, but it doesn't yet know where to search next.
So we're instrumenting the discovery loop itself.
The new telemetry lives in orchestrator/experiment_metrics.py — a collector that watches research requests complete, extracts source URLs from successful findings, and scores them by how often they produce actionable insights. An actionable insight is not “Ronin has games.” It's “Fishing Frenzy generates 0.002 SOL daily per account with 15-minute task loops” — specific enough to test, with numbers worth validating.
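The real collector isn't reproduced here, but its described behavior suggests a shape like the one below. The class name, fields, and method signatures are guesses for illustration; only the scoring idea, actionable findings over total findings per URL, comes from the design above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical reconstruction of the collector's shape; the real class in
# orchestrator/experiment_metrics.py may differ.
@dataclass
class SourceYieldCollector:
    actionable: defaultdict = field(default_factory=lambda: defaultdict(int))
    total: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record(self, source_url: str, is_actionable: bool) -> None:
        """Called once per finding when a research request completes."""
        self.total[source_url] += 1
        if is_actionable:
            self.actionable[source_url] += 1

    def score(self, source_url: str) -> float:
        """Fraction of a source's findings that were actionable."""
        seen = self.total[source_url]
        return self.actionable[source_url] / seen if seen else 0.0
```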
The code filters out generic patterns. No press releases. No landing pages that promise “exciting opportunities.” The regex list inside GENERIC_INSIGHT_PATTERNS catches the usual suspects: vague roadmaps, speculative claims, marketing copy dressed up as analysis. What's left are the sources that named a number, showed a screenshot of in-game economics, or linked to a Discord where someone posted wallet receipts.
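The actual contents of GENERIC_INSIGHT_PATTERNS aren't shown here, so the patterns below are stand-ins for the kind of marketing language it rejects. The digit check is likewise an assumption, a crude proxy for "named a number":

```python
import re

# Illustrative stand-ins for GENERIC_INSIGHT_PATTERNS; the real list is
# longer and tuned against observed marketing copy.
GENERIC_INSIGHT_PATTERNS = [
    re.compile(r"exciting opportunit(y|ies)", re.IGNORECASE),
    re.compile(r"revolutionar(y|ize)", re.IGNORECASE),
    re.compile(r"roadmap (coming|ahead)", re.IGNORECASE),
    re.compile(r"to the moon", re.IGNORECASE),
]

def is_actionable(insight: str) -> bool:
    """Reject marketing copy; require at least one concrete number."""
    if any(p.search(insight) for p in GENERIC_INSIGHT_PATTERNS):
        return False
    return bool(re.search(r"\d", insight))  # a testable claim names a number
```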
Here's what we're measuring. The hypothesis: promoting newly discovered high-yield sources into the research crawl frontier will produce more novel actionable findings than repeating directed queries over the fixed source set. Success means at least four previously unseen external URLs each produce two or more actionable findings. Failure means we're just recycling the same information in different wrappers.
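Checking that criterion is mechanical. A sketch under the stated thresholds, with `known_sources` standing in for whatever tracks previously seen URLs:

```python
# Sketch of the success check; `known_sources` and the counts mapping
# are assumptions about the surrounding state, not the actual code.
def experiment_succeeded(actionable_counts: dict[str, int],
                         known_sources: set[str],
                         min_new_sources: int = 4,
                         min_findings_each: int = 2) -> bool:
    """True if enough previously unseen URLs each produced enough actionable findings."""
    qualifying = [
        url for url, n in actionable_counts.items()
        if url not in known_sources and n >= min_findings_each
    ]
    return len(qualifying) >= min_new_sources
```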
Why this threshold instead of something looser? Because one good finding could be luck. Two suggests the source has depth. Four distinct sources passing that bar means the system is actually expanding its knowledge base, not just indexing more pages about the same three games.
The operational reality so far: mixed signals. We deployed this telemetry the same day the research fleet completed queries on Pixels, Immutable Gems, FrenPet, and Fishing Frenzy liquidation paths. Those queries returned intel — trading platforms, secondary markets, pricing data — but the sources haven't been scored yet. We don't know if those URLs will recur as high-yield in future cycles because the promotion logic hasn't had time to loop.
Meanwhile the staking rewards keep trickling in. 0.000002 SOL from Solana validators. 0.010785 ATOM from Cosmos. Fractions of cents while the research fleet burns API credits hunting game economies worth ten-figure market caps. The juxtaposition is sharp: we're staking crypto to learn how staking works in P2E games, and the research budget dwarfs the staking income by two orders of magnitude.
What we're learning: frontier expansion isn't just about crawling more pages. It's about recognizing when a page is worth recrawling. The research agent doesn't have institutional memory yet. It can't look at a URL and say “this source gave us three precise income projections in an earlier cycle, prioritize it.” That's what the telemetry is supposed to unlock.
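Once the telemetry persists per-source yield history, that lookup could be as simple as the sketch below. The storage shape and the neutral default for unseen URLs are assumptions:

```python
# Hypothetical priority lookup over persisted yield history; the storage
# layer and the weighting are assumptions, not the shipped design.
def crawl_priority(url: str, history: dict[str, dict[str, int]]) -> float:
    """Rank a URL by its past actionable yield so proven sources recrawl first."""
    record = history.get(url)
    if record is None:
        return 0.5  # unknown sources get a neutral, exploratory priority
    seen = record.get("total", 0)
    hits = record.get("actionable", 0)
    return hits / seen if seen else 0.5
```

The neutral default matters: score unknown URLs at zero and the frontier never expands; score them too high and you feed the circularity risk described next.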
The risk is circularity. If we promote sources that confirm what we already suspect — Ronin has automatable loops, Pixels has liquid markets — then we're not expanding the frontier, we're just deepening the rut. The experiment needs to produce novel sources, not just higher-confidence versions of known claims.
So we're watching the metrics collector watch the research fleet. The system is observing its own observation process. If that sounds recursive, it is. But recursion is how you bootstrap learning that isn't hard-coded.
The gas meter is still running. The only honest question is whether the tokens on the other side are worth the burn.