OpenClaw's Test Suite Was Leaking 35 Megabytes Per Run and Nobody Noticed

Nobody writes articles about CI pipelines. That's the problem.

The most important infrastructure in any open-source project is the one that tells you whether your code works. When that infrastructure quietly degrades — when test workers leak memory, when build times creep upward, when flaky tests get ignored instead of fixed — the entire project's quality starts eroding from the inside.

Three PRs merged on March 21 are the kind of maintenance work that never trends on Hacker News. They're also the kind of work that determines whether a project with 150K stars is genuinely healthy or just popular.

The Memory Leak Was in Vitest Itself

Vincent Koc took heap snapshots of OpenClaw's unit-fast CI workers and found something ugly. One worker held stable memory. The other grew continuously. The culprit: JSArrayBufferData allocations from Vite's SSR transform cache, growing by 35.04 MB per run. Every transformed test file was cached. The cache never cleared. The worker never recycled.
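The difference between "stable" and "grows continuously" is easy to state in code. The sketch below is illustrative only; it is not the tooling used in the PR. It assumes you already have one memory sample per run, in MB. In Node, `process.memoryUsage().arrayBuffers` is one plausible source for those samples. The helper name `analyzeGrowth` and the 1 MB threshold are hypothetical. The input numbers are synthetic.

```typescript
// Hypothetical helper: given one memory sample per test run (in MB),
// report per-run growth and whether it looks like a steady leak.
interface GrowthReport {
  deltas: number[];       // growth between consecutive runs, in MB
  meanDeltaMb: number;    // average growth per run
  monotonic: boolean;     // true if memory never dropped between runs
  suspectedLeak: boolean; // monotonic growth above the threshold
}

function analyzeGrowth(samplesMb: number[], thresholdMb = 1): GrowthReport {
  const deltas: number[] = [];
  for (let i = 1; i < samplesMb.length; i++) {
    deltas.push(samplesMb[i] - samplesMb[i - 1]);
  }
  const meanDeltaMb =
    deltas.length === 0 ? 0 : deltas.reduce((a, b) => a + b, 0) / deltas.length;
  const monotonic = deltas.every((d) => d >= 0);
  return {
    deltas,
    meanDeltaMb,
    monotonic,
    suspectedLeak: deltas.length > 0 && monotonic && meanDeltaMb > thresholdMb,
  };
}

// Synthetic numbers for illustration: one stable worker, and one that
// gains 35 MB per run.
const stable = analyzeGrowth([120, 121, 119, 120, 121]);
const leaking = analyzeGrowth([120, 155, 190, 225, 260]);
console.log(stable.suspectedLeak, leaking.suspectedLeak); // false true
console.log(leaking.meanDeltaMb); // 35
```

Requiring monotonic growth keeps normal garbage-collection jitter, which moves memory up and down, from being reported as a leak.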

The fix is pragmatic, not elegant: a new splitFilesByDurationBudget helper divides test files into time-bounded batches, targeting 45 seconds each. In CI, what was one long-lived worker becomes seven short-lived ones. Each batch gets a fresh process. The transform cache dies with the process. Local behavior is unchanged — developers still get a single test lane.
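Duration-budget batching can be sketched as a greedy pass over the file list. This is a hypothetical stand-in for `splitFilesByDurationBudget`, not its actual code. The signature, the `TestFile` shape, and the 1-second fallback for files with no recorded timing are all assumptions. The real helper may sort or balance files differently.

```typescript
// Hypothetical stand-in for OpenClaw's splitFilesByDurationBudget helper.
// The real signature and packing strategy may differ.
interface TestFile {
  path: string;
  durationMs?: number; // measured duration from a previous run, if known
}

function splitByDurationBudget(
  files: TestFile[],
  budgetMs: number,
  fallbackMs = 1_000, // assumed estimate for files with no recorded timing
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentMs = 0;

  for (const file of files) {
    const cost = file.durationMs ?? fallbackMs;
    // Close the current batch if this file would push it over budget.
    // A single file longer than the budget still gets its own batch.
    if (current.length > 0 && currentMs + cost > budgetMs) {
      batches.push(current);
      current = [];
      currentMs = 0;
    }
    current.push(file.path);
    currentMs += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Each batch would then run in a fresh worker process, so any transform
// cache built up during that batch is freed when the process exits.
const batches = splitByDurationBudget(
  [
    { path: "a.test.ts", durationMs: 20_000 },
    { path: "b.test.ts", durationMs: 20_000 },
    { path: "c.test.ts", durationMs: 30_000 },
    { path: "d.test.ts" },
  ],
  45_000,
);
console.log(JSON.stringify(batches));
// [["a.test.ts","b.test.ts"],["c.test.ts","d.test.ts"]]
```

Keeping files in their given order makes batches deterministic between CI runs. The trade-off is that the batches are less tightly packed than a sorted bin-packing approach would produce.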

The PR also lightens supporting modules: stub objects replace full plugin instances in test targets, and a stale Discord import in the schema help file gets cleaned up. These are the kind of micro-cleanups that help prevent the next memory leak.
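The stub pattern works because a test target often only needs an object that satisfies a plugin's type, not one that does real work. The `ChannelPlugin` interface and `makeStubPlugin` factory below are hypothetical, since the article does not show OpenClaw's plugin contract. The sketch only illustrates the general technique.

```typescript
// Hypothetical plugin contract; OpenClaw's real interface will differ.
interface ChannelPlugin {
  id: string;
  describe(): string;
  connect(): Promise<void>;
}

// A full plugin might load SDKs or open connections when constructed.
// A stub satisfies the same type with inert, cheap behavior.
function makeStubPlugin(id: string): ChannelPlugin {
  return {
    id,
    describe: () => `stub:${id}`,
    connect: async () => {
      // Intentionally a no-op: tests that only inspect metadata
      // never need a live connection.
    },
  };
}

const registry = new Map<string, ChannelPlugin>();
for (const id of ["chat-a", "chat-b"]) registry.set(id, makeStubPlugin(id));
console.log(JSON.stringify(Array.from(registry.values()).map((p) => p.describe())));
// ["stub:chat-a","stub:chat-b"]
```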

The Doctor Command's Second Surgery

OpenClaw's doctor command — the built-in diagnostic tool that checks your installation's health — was a monolithic function that nobody wanted to touch. Last week, PR #51753 started splitting it into provider modules. This week, PR #51876 continues the extraction with three new focused helpers:

exec-safe-bins.ts — scanning and repairing executable safe-bin profiles
legacy-tools-by-sender.ts — deprecated tool-sender configurations
default-account-warnings.ts — missing default-account alerts
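One way to picture this extraction is as a set of small check modules behind a shared interface, each testable on its own. Everything below is hypothetical: `DoctorCheck`, `DoctorFinding`, the `AccountsConfig` shape, and `runDoctor` are illustrative names, not OpenClaw's API. The check only mirrors the intent described for default-account-warnings.ts.

```typescript
// Hypothetical shape for an extracted doctor check module. Names and the
// config shape are assumptions for illustration, not OpenClaw's API.
interface DoctorFinding {
  checkId: string;
  severity: "info" | "warn" | "error";
  message: string;
}

interface DoctorCheck<Config> {
  id: string;
  run(config: Config): DoctorFinding[];
}

interface AccountsConfig {
  accounts: Record<string, { default?: boolean }>;
}

// Similar in spirit to default-account-warnings.ts: warn when accounts
// exist but none is marked as the default.
const defaultAccountCheck: DoctorCheck<AccountsConfig> = {
  id: "default-account",
  run(config) {
    const entries = Object.values(config.accounts);
    if (entries.length === 0 || entries.some((a) => a.default === true)) return [];
    return [
      {
        checkId: "default-account",
        severity: "warn",
        message: "No default account configured",
      },
    ];
  },
};

// The doctor command becomes a loop over small, separately testable checks.
function runDoctor<C>(checks: DoctorCheck<C>[], config: C): DoctorFinding[] {
  return checks.flatMap((check) => check.run(config));
}

const findings = runDoctor([defaultAccountCheck], {
  accounts: { work: {}, personal: {} },
});
console.log(findings.length); // 1
```

With this shape, each module's unit tests can call `run` directly with a crafted config. They never need to invoke the whole doctor command.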

Each module gets dedicated unit tests. The Aisle security scanner flagged a medium-severity issue: the doctor's auto-repair creates empty profile objects that bypass safe-bin argument restrictions. Findings like this tend to surface only once code is modularized. When a function is buried in a 400-line monolith, nobody audits its side effects.
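The article does not describe the safe-bin profile schema. The sketch below therefore assumes a hypothetical `allowedArgs` allowlist field to show one common way an empty object can slip past restrictions: a validator that treats a missing field as "no restrictions". The fail-closed variant is the usual remedy. Neither function is taken from OpenClaw.

```typescript
// Hypothetical safe-bin profile shape; the real OpenClaw schema may differ.
interface SafeBinProfile {
  allowedArgs?: string[]; // argument allowlist; absent means "not configured"
}

// Fail-open check: an absent allowlist is treated as "no restrictions",
// so an auto-repair that writes `{}` silently grants every argument.
function isAllowedFailOpen(profile: SafeBinProfile, args: string[]): boolean {
  const allowed = profile.allowedArgs;
  if (!allowed) return true;
  return args.every((a) => allowed.includes(a));
}

// Fail-closed check: a missing allowlist permits no arguments until
// someone configures it explicitly.
function isAllowedFailClosed(profile: SafeBinProfile, args: string[]): boolean {
  const allowed = profile.allowedArgs ?? [];
  return args.every((a) => allowed.includes(a));
}

const repaired: SafeBinProfile = {}; // what an auto-repair might create
console.log(isAllowedFailOpen(repaired, ["--delete-all"]));   // true
console.log(isAllowedFailClosed(repaired, ["--delete-all"])); // false
```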

Greptile also caught helper functions in one of the new modules that duplicated existing ones elsewhere, with subtle behavioral differences. This is the classic refactoring discovery: you split code apart and find that past developers solved the same problem twice, slightly differently, and now you have to decide which version is canonical.
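A cheap way to make the "which version is canonical" decision concrete is differential testing: run both implementations over the same inputs and list every disagreement. The two `normalizeId` helpers below are invented for illustration. They are not the duplicates Greptile found.

```typescript
// Two hypothetical duplicates of the same helper, written at different
// times. Neither is taken from OpenClaw; they only illustrate drift.
function normalizeIdA(raw: string): string {
  return raw.trim().toLowerCase();
}

function normalizeIdB(raw: string): string {
  return raw.toLowerCase(); // does not trim surrounding whitespace
}

// Run both over shared inputs and collect every disagreement, so the
// canonical version is chosen with the differences in view.
// Results are compared with !==, so object outputs would need deep equality.
function diffImplementations<I, O>(
  inputs: I[],
  a: (input: I) => O,
  b: (input: I) => O,
): { input: I; a: O; b: O }[] {
  const mismatches: { input: I; a: O; b: O }[] = [];
  for (const input of inputs) {
    const ra = a(input);
    const rb = b(input);
    if (ra !== rb) mismatches.push({ input, a: ra, b: rb });
  }
  return mismatches;
}

const mismatches = diffImplementations(
  ["Alice", " Bob ", "CAROL"],
  normalizeIdA,
  normalizeIdB,
);
console.log(mismatches.length); // 1 (only " Bob " differs)
```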

When Cron Jobs Can't Read Your Output

PR #51409 is the smallest of the three, and in some ways the most telling. OpenClaw's update command displayed this when your package was current:

Update pnpm · npm latest 2026.3.13

Is that telling you an update is available? Or that you're already on the latest? If you're a human, you might guess. If you're a cron agent parsing stdout, you have no idea.

Contributor dongzhenye added an explicit “up to date” label when local and npm versions match. Seven tests cover both git and package-manager installations. A follow-up commit prevents duplicate labels when a git installation also matches npm. It's a two-commit fix for a problem that should have been caught in the original implementation.
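The fix can be sketched as a formatter that appends an explicit label when versions match. The sketch also collects labels in a `Set`, so a git install that also matches npm is labeled only once. The `UpdateStatus` shape, the `gitUpToDate` flag, and the label's position in the line are assumptions. Only the "up to date" wording and the original line format come from the article.

```typescript
// Hypothetical inputs; OpenClaw's real update command gathers more state.
interface UpdateStatus {
  manager: string;       // e.g. "pnpm"
  npmLatest: string;     // latest version published to npm
  localVersion: string;  // installed version
  gitUpToDate?: boolean; // set only for git installations
}

function formatUpdateLine(s: UpdateStatus): string {
  const parts = [`Update ${s.manager}`, `npm latest ${s.npmLatest}`];
  // Both signals can report "current". A Set keeps a git install that
  // also matches npm from getting the label twice.
  const labels = new Set<string>();
  if (s.gitUpToDate) labels.add("up to date");
  if (s.localVersion === s.npmLatest) labels.add("up to date");
  return [...parts, ...Array.from(labels)].join(" · ");
}

console.log(
  formatUpdateLine({ manager: "pnpm", npmLatest: "2026.3.13", localVersion: "2026.3.13" }),
); // Update pnpm · npm latest 2026.3.13 · up to date
```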

But it wasn't. Because the original implementation was tested by humans reading terminal output, not by automation parsing it. And that gap — between human-readable and machine-parseable — is one of the oldest, most persistent failures in developer tooling.
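From the automation side, the gap looks like this. A hypothetical cron-side parser can classify a line only when it carries an explicit label. Anything else has to be reported as unknown rather than guessed. Splitting on " · " is an assumption based on the output line shown above.

```typescript
type ParsedStatus = "up-to-date" | "unknown";

// Hypothetical cron-side parser: it trusts only an explicit label and
// refuses to infer status from the absence of one.
function parseUpdateLine(line: string): ParsedStatus {
  const fields = line.split(" · ").map((f) => f.trim());
  return fields.includes("up to date") ? "up-to-date" : "unknown";
}

console.log(parseUpdateLine("Update pnpm · npm latest 2026.3.13")); // unknown
console.log(parseUpdateLine("Update pnpm · npm latest 2026.3.13 · up to date")); // up-to-date
```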

The Work Nobody Celebrates

These three PRs will generate zero tweets. They solve no customer-facing problem. They add no feature to any changelog. But they're the difference between a project that scales and one that collapses under its own weight.

Memory leaks in test infrastructure mean flaky CI, which means developers stop trusting green builds. Monolithic diagnostic tools mean nobody adds new health checks, which means problems go undetected. Ambiguous CLI output means automation breaks, which means operators go back to doing things manually.

OpenClaw fixed all three on the same day. That's not glamorous. It's just good engineering.

Changes at a Glance

Recycle unit-fast CI batches

Continue extracting shared doctor helpers

Make up-to-date package status explicit