OpenClaw's Test Suite Was Leaking 35 Megabytes Per Run and Nobody Noticed for Months

March 21, 2026 · 4 min read
+35 MB heap growth per worker
3 new doctor modules
7 CI batches (was 1)
45s batch time budget

Changes at a Glance

#51884 by vincentkoc · CI Infrastructure

Recycle unit-fast CI batches

Vitest workers leaked memory through transform caches. Fix: recycle into 45-second time-bounded batches in CI.

#51876 by vincentkoc · Refactoring

Continue extracting shared doctor helpers

The doctor command's second decomposition pass. Three new focused modules with dedicated tests. Security scanner flagged a bypass.

#51409 by dongzhenye · UX Fix

Make up-to-date package status explicit

The update command's output was ambiguous enough that cron agents couldn't tell if an update was available. Now it says 'up to date' explicitly.

Nobody writes articles about CI pipelines. That's the problem.

The most important infrastructure in any open-source project is the one that tells you whether your code works. When that infrastructure quietly degrades — when test workers leak memory, when build times creep upward, when flaky tests get ignored instead of fixed — the entire project's quality starts eroding from the inside.

Three PRs merged on March 21 are the kind of maintenance work that never trends on Hacker News. They're also the kind of work that determines whether a project with 150K stars is genuinely healthy or just popular.

The Memory Leak Was in Vitest Itself

Vincent Koc took heap snapshots of OpenClaw's unit-fast CI workers and found something ugly. One worker held stable memory. The other grew continuously. The culprit: JSArrayBufferData allocations from Vite's SSR transform cache, growing by 35.04 MB per run. Every test file transformed got cached. The cache never cleared. The worker never recycled.

The fix is pragmatic, not elegant: a new splitFilesByDurationBudget helper divides test files into time-bounded batches, targeting 45 seconds each. In CI, what was one long-lived worker becomes seven short-lived ones. Each batch gets a fresh process. The transform cache dies with the process. Local behavior is unchanged — developers still get a single test lane.
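To make the batching idea concrete, here is a minimal sketch of how a duration-budget splitter could work. Only the name splitFilesByDurationBudget and the 45-second target come from the PR as described above; the signature, the TimedFile shape, and the greedy packing strategy are assumptions for illustration, not OpenClaw's actual code.

```typescript
// Hypothetical input shape: the article names the helper, not its signature.
interface TimedFile {
  file: string;
  durationMs: number; // assumed to be an estimate from a previous run
}

// Greedy split (an assumption): walk the files in order and start a new
// batch whenever adding the next file would push the batch past the budget.
function splitFilesByDurationBudget(
  files: TimedFile[],
  budgetMs: number,
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let elapsed = 0;

  for (const { file, durationMs } of files) {
    if (current.length > 0 && elapsed + durationMs > budgetMs) {
      batches.push(current);
      current = [];
      elapsed = 0;
    }
    // A single file slower than the budget still gets a batch of its own.
    current.push(file);
    elapsed += durationMs;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Example with made-up timings and a 45-second budget.
const batches = splitFilesByDurationBudget(
  [
    { file: "a.test.ts", durationMs: 30000 },
    { file: "b.test.ts", durationMs: 20000 },
    { file: "c.test.ts", durationMs: 10000 },
    { file: "d.test.ts", durationMs: 50000 },
  ],
  45000,
);
console.log(batches);
// [["a.test.ts"], ["b.test.ts", "c.test.ts"], ["d.test.ts"]]
```

Running each batch in its own process is what bounds the leak: memory the transform cache accumulates in one batch is released when that process exits, so heap growth can never exceed what a single 45-second batch produces.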

The PR also lightens supporting modules: stub objects replace full plugin instances in test targets, and a stale Discord import in the schema help file gets cleaned up. These are the micro-cleanups that prevent the next memory leak.

The Doctor Command's Second Surgery

OpenClaw's doctor command — the built-in diagnostic tool that checks your installation's health — was a monolithic function that nobody wanted to touch. Last week, PR #51753 started splitting it into provider modules. This week, PR #51876 continues the extraction with three new focused helpers:

  • exec-safe-bins.ts — scanning and repairing executable safe-bin profiles
  • legacy-tools-by-sender.ts — deprecated tool-sender configurations
  • default-account-warnings.ts — missing default-account alerts

Each module gets dedicated unit tests. The Aisle security scanner flagged a medium-severity issue: the doctor's auto-repair creates empty profile objects that bypass safe-bin argument restrictions. That's the kind of finding that only emerges when code gets modularized — when a function is buried in a 400-line monolith, nobody audits its side effects.

Greptile also caught helper functions in one new module that duplicated helpers already defined elsewhere, with subtle behavioral differences. This is the classic refactoring discovery: you split code apart and find that past developers solved the same problem twice, slightly differently, and now you have to decide which version is canonical.

When Cron Jobs Can't Read Your Output

PR #51409 is the smallest of the three, and in some ways the most telling. OpenClaw's update command displayed this when your package was current:

Update pnpm · npm latest 2026.3.13

Is that telling you an update is available? Or that you're already on the latest? If you're a human, you might guess. If you're a cron agent parsing stdout, you have no idea.

Contributor dongzhenye added an explicit “up to date” label when local and npm versions match. Seven tests cover both git and package-manager installations. A follow-up commit prevents duplicate labels when a git installation also matches npm. It's a two-commit fix for a problem that should have been caught in the original implementation.
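The labeling rule can be sketched in a few lines. Everything here is hypothetical: the function name, the parts array, and the " · " joining are assumptions modeled on the sample output above, not the code from PR #51409. The sketch only shows the two behaviors described: add "up to date" when the local and npm versions match, and never add it twice.

```typescript
const UP_TO_DATE = "up to date";

// Hypothetical helper: `parts` holds the status segments already computed
// (for example the install kind, plus any label a git check added earlier).
function describeUpdateStatus(
  parts: string[],
  localVersion: string,
  npmLatest: string,
): string {
  const out = [...parts, `npm latest ${npmLatest}`];
  // Add the explicit label only on a version match, and only once.
  if (localVersion === npmLatest && !out.includes(UP_TO_DATE)) {
    out.push(UP_TO_DATE);
  }
  return out.join(" · ");
}

const matching = describeUpdateStatus(["pnpm"], "2026.3.13", "2026.3.13");
// "pnpm · npm latest 2026.3.13 · up to date"

const alreadyLabelled = describeUpdateStatus(
  ["git", UP_TO_DATE],
  "2026.3.13",
  "2026.3.13",
);
// "git · up to date · npm latest 2026.3.13"

const behind = describeUpdateStatus(["pnpm"], "2026.3.12", "2026.3.13");
// "pnpm · npm latest 2026.3.13"
```

With an explicit label, a script can test for the literal string "up to date" instead of inferring state from the absence of other text.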

But it wasn't. Because the original implementation was tested by humans reading terminal output, not by automation parsing it. And that gap — between human-readable and machine-parseable — is one of the oldest, most persistent failures in developer tooling.

The Work Nobody Celebrates

These three PRs will generate zero tweets. They solve no customer-facing problem. They add no feature to any changelog. But they're the difference between a project that scales and one that collapses under its own weight.

Memory leaks in test infrastructure mean flaky CI, which means developers stop trusting green builds. Monolithic diagnostic tools mean nobody adds new health checks, which means problems go undetected. Ambiguous CLI output means automation breaks, which means operators go back to doing things manually.

OpenClaw fixed all three on the same day. That's not glamorous. It's just good engineering.

DeployClaw News · Reporting by Carlos Simpson · DeployClaw hosts OpenClaw instances. Upstream fixes ship automatically. This publication covers development independently.