Trending · Performance

OpenClaw Shaved 98% Off Cold Starts. Then Found It Was Still Wasting Five Seconds Per Turn.

Three performance PRs merged the same day. The numbers are dramatic. The question is why nobody profiled the inbound path until March 2026.

March 22, 2026 · 7 min read

Before & After — Measured on Live Discord Traces

PRs #52082, #52077, #52018 combined

| Path / metric | Before | After | Change |
|---|---|---|---|
| session-maintenance-warning.ts | 9.9 s / 169 MB | 122 ms / 1.9 MB | −98.8% |
| discord/session-key-api.ts | 5.5 s / 169 MB | 313 ms / ~0 MB | −94.3% |
| models-json-ready | 5,016 ms | 565 ms | −88.7% |
| agent-run-start | 9,206 ms | 3,830 ms | −58.4% |
| first-partial response | 10,230 ms | 5,251 ms | −48.7% |
| directive-handling.persist.ts | 13 s / 160 MB | 2.16 s / 59 MB | −83.4% |

Every AI company on the planet is talking about latency right now. Anthropic touts sub-second streaming. OpenAI benchmarks time-to-first-token. Google measures cold starts in milliseconds. And here's OpenClaw — 150,000 GitHub stars, hundreds of production deployments, the de facto open-source AI agent platform — taking ten seconds to respond to a Discord message because nobody had ever profiled the inbound path.

Let that land. Ten seconds. Not because of the model. Not because of the network. Because of the import graph. Vincent Koc finally pointed a profiler at the inbound reply path this week, and what he found wasn't a bottleneck. It was architectural negligence dressed up as Node.js module resolution.

The 169 MB Problem

PR #52082 is labeled “size: XL” and that's an understatement. The core issue: when OpenClaw's inbound reply path cold-started, it eagerly loaded 169 MB of heap through monolithic module imports. Session maintenance warnings, Discord session keys, directive handling — all of them pulled in the entire dependency tree whether they needed it or not.

Koc's fix is architectural surgery. He extracted lightweight primitive helpers into separate files, created .runtime.ts boundary files that gate expensive modules behind dynamic import() calls, and moved rarely-used execution paths behind lazy-loading boundaries. Six new runtime boundary files. Session-fork became async. Call sites got await.
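The pattern is easier to see in code. A minimal sketch of what such a boundary file might look like (hypothetical names; `node:crypto` stands in for an expensive module, since the PR's actual imports aren't shown here): the heavy dependency is only loaded on first use, so cold-start paths that never touch it pay nothing.

```typescript
// Hypothetical sketch of a .runtime.ts boundary file. The expensive module
// is gated behind a dynamic import() instead of a top-level import, so it
// stays out of the cold-start heap until someone actually needs it.
type Heavy = typeof import("node:crypto");

let heavyPromise: Promise<Heavy> | null = null;

// Callers that used to rely on an eager top-level import now await this.
async function loadHeavy(): Promise<Heavy> {
  heavyPromise ??= import("node:crypto"); // first call triggers the load
  return heavyPromise;
}

export async function sessionFingerprint(id: string): Promise<string> {
  const crypto = await loadHeavy();
  return crypto.createHash("sha256").update(id).digest("hex");
}
```

This is also why "session-fork became async" and "call sites got await": once a dependency moves behind `import()`, every function on the path to it becomes a promise.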

The result: a session maintenance warning that used to load 169 MB in 9.9 seconds now loads 1.9 MB in 122 milliseconds. A Discord session key lookup dropped from 5.5 seconds and 169 MB to 313 milliseconds and effectively zero heap.

The Five-Second Rebuild Nobody Noticed

While Koc was profiling import graphs, he found another problem. Every embedded agent turn was rebuilding models.json from scratch — re-reading auth profiles, checking file mtimes, resolving provider catalogs. Five seconds per turn. In a chatbot. Where users expect sub-second responses.

PR #52077 adds a fingerprint-based cache. The cache keys on target file path, stores fingerprints combining runtime config, auth-profile mtimes, and models.json mtimes. It coalesces concurrent callers behind the same in-flight promise. It auto-invalidates when file mtimes change.
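The shape of that cache, sketched below under stated assumptions (this is a minimal illustration, not the actual PR #52077 code; `FingerprintCache` and its fields are invented names): entries are reused only while the fingerprint matches, and concurrent callers share one in-flight promise instead of each triggering a rebuild.

```typescript
// Minimal sketch of a fingerprint-keyed cache with in-flight coalescing.
// The fingerprint string would combine config, auth-profile mtimes, and
// models.json mtimes; when any input changes, the fingerprint changes and
// the stale entry is replaced.
type Entry<T> = { fingerprint: string; promise: Promise<T> };

class FingerprintCache<T> {
  private entries = new Map<string, Entry<T>>();

  get(key: string, fingerprint: string, build: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    // Fingerprint match: return the cached (possibly still in-flight)
    // promise, so concurrent callers coalesce onto one rebuild.
    if (hit && hit.fingerprint === fingerprint) return hit.promise;
    const promise = build();
    this.entries.set(key, { fingerprint, promise });
    return promise;
  }
}

// Usage: two callers with the same fingerprint trigger exactly one build.
let builds = 0;
const cache = new FingerprintCache<string>();
const fp = "cfg-v1|auth-mtime-100|models-mtime-100";
const a = cache.get("models.json", fp, async () => { builds++; return "catalog"; });
const b = cache.get("models.json", fp, async () => { builds++; return "catalog"; });
```

Auto-invalidation falls out of the key design: a changed mtime produces a new fingerprint, so the old entry is simply never matched again.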

Net savings: 4.4 to 5.0 seconds per embedded turn in steady-state. The first-partial response time dropped from 10.2 seconds to 5.2 seconds. Still not fast. But half as slow.

The Cache That Lived in the Wrong House

Meanwhile, Tak Hoffman was debugging why openclaw --dev configure felt sluggish when selecting web search providers. The answer: memoization was sitting at an overly broad shared layer instead of the specific resolver paths that needed it.

PR #52018 relocates snapshot caching into web-search-providers.runtime.ts and provider-wizard logic, using a three-level WeakMap structure keyed on config, env, and workspace. Forty-five CI checks passed. The interactive picker stopped lagging.
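A three-level WeakMap memo roughly like this (a sketch with invented names, not the PR's code) has a useful property: because each level is keyed on an object identity, entries disappear automatically when the config, env, or workspace object is garbage-collected.

```typescript
// Hypothetical three-level memo: WeakMap keyed on config, then env, then
// workspace. A lookup walks the chain, creating missing levels, and only
// computes the snapshot on a full miss.
type Snapshot = { providers: string[] };

const memo = new WeakMap<object, WeakMap<object, WeakMap<object, Snapshot>>>();

function getSnapshot(
  config: object, env: object, workspace: object,
  compute: () => Snapshot,
): Snapshot {
  let byEnv = memo.get(config);
  if (!byEnv) memo.set(config, (byEnv = new WeakMap()));
  let byWs = byEnv.get(env);
  if (!byWs) byEnv.set(env, (byWs = new WeakMap()));
  let snap = byWs.get(workspace);
  if (!snap) byWs.set(workspace, (snap = compute())); // full miss: compute once
  return snap;
}
```

The trade-off versus a plain Map is that WeakMap keys must be objects, which is exactly why the cache is keyed on the config/env/workspace objects themselves rather than serialized strings.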

The Security Footnotes

Both caching PRs drew security flags from automated review. The models.json cache pulls config-embedded secrets into memory via stableStringify(). The plugin cache serializes API keys into cache keys without hashing them. Neither cache has a TTL or a size bound.

These are rated low-severity because the data lives in process memory anyway. But they're the kind of shortcuts that accumulate. Today it's a process-global Map with no eviction. Tomorrow it's a memory leak in a long-running Discord bot that manages fifty agents.
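Both flags have cheap mitigations. One possible sketch (my illustration, not anything from the PRs): hash the secret-bearing key material so raw API keys never become Map keys, and cap the cache size so a long-running process can't grow without bound.

```typescript
import { createHash } from "node:crypto";

// Mitigation sketch (assumed, not from the PRs): hashed cache keys plus a
// size cap with FIFO eviction. Names here are invented for illustration.
const MAX_ENTRIES = 1000;
const cache = new Map<string, unknown>();

// Raw key material (which may contain API keys) is reduced to a SHA-256
// digest, so the secret never appears verbatim as a Map key.
function hashedKey(material: string): string {
  return createHash("sha256").update(material).digest("hex");
}

function put(material: string, value: unknown): void {
  if (cache.size >= MAX_ENTRIES) {
    // Map preserves insertion order, so the first key is the oldest entry.
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(hashedKey(material), value);
}
```

A real fix would likely prefer LRU over FIFO and add a TTL, but even this much closes the "raw secret as cache key" flag.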

What's Left

Koc flagged the next targets in PR #52082's description: the agent execution stack at agent-runner.ts (~4.1s / 161.5 MB) and pi-embedded.ts (~4.35s / 161.6 MB). Both need the same runtime boundary treatment. The first-partial response is still above five seconds.

The honest assessment: OpenClaw just went from unusably slow to merely slow. The 98% reduction sounds incredible because the baseline was incredible — incredibly bad. A session warning that takes ten seconds to initialize isn't a performance target; it's an accident that nobody measured.

So What?

Here's what nobody in the open-source AI space wants to hear: the gap between “we have an agent platform” and “we have a production-grade agent platform” is exactly this kind of work. Not new features. Not model integrations. Import graph surgery and cache fingerprinting and WeakMap hierarchies. The boring stuff that Anthropic and OpenAI pay platform engineers six figures to obsess over, and that open-source projects defer until someone like Koc gets fed up enough to run a profiler.

Credit where it's due: Koc and Hoffman did the profiling work that should have happened a year ago, and they shipped the fixes in a single afternoon. The project is measurably faster. But “faster than broken” isn't the same as fast. And the next two targets Koc identified — agent-runner.ts at 4.1 seconds and pi-embedded.ts at 4.35 seconds — suggest we'll be back here in a week with another round of numbers that should embarrass everyone.

DeployClaw News · Performance analysis by Carlos Simpson

DeployClaw hosts OpenClaw instances. Upstream fixes ship automatically. This publication covers development independently.