Investigation · Platform Reliability

OpenClaw's WhatsApp Users Filed Bugs for Months. The Culprit Was a Build Tool Nobody Audited.

Here's a story about three bugs that have nothing in common except the thing that matters most: nobody was looking.

March 22, 2026 · 8 min read

Let me tell you what I think about when I hear the words “AI agent platform.” I think about the WhatsApp users who spent months filing bug reports that said “outbound messages don't work,” and the maintainers who couldn't reproduce it, and the patch scripts the community wrote because they couldn't wait for an official fix. I think about the disconnect between a project with 150,000 GitHub stars and a build system that silently duplicates runtime state into seven separate copies.

That's not a typo. Seven copies. Of the same Map. In the same Node.js process.

Evidence A · PR #47433

The WhatsApp Listener That Couldn't Listen

OpenClaw's WhatsApp integration stores active web listeners in a module-level Map. When you send a message, the system looks up the listener. Simple enough — until Rolldown's code-splitting optimizer decided that module should exist in seven separate chunks.

Each chunk got its own Map instance. Chunk A registered the listener. Chunk D tried to find it. The Map was empty. “No active WhatsApp Web listener.” Auto-replies worked because they bypassed the Map entirely, using direct socket references. Outbound messages failed. Every time.
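The failure mode is easy to reproduce in miniature. Here's a hedged sketch of what happens when a bundler duplicates a module's top-level state into multiple chunks; the function and key names are illustrative, not OpenClaw's actual API:

```typescript
// Each duplicated chunk re-evaluates the module body, creating a fresh Map.
// Names are hypothetical; this models the bug, not OpenClaw's real code.
function loadChunkCopy() {
  const listeners = new Map<string, { socket: string }>(); // module-level state
  return {
    register: (id: string, l: { socket: string }) => listeners.set(id, l),
    lookup: (id: string) => listeners.get(id),
  };
}

const chunkA = loadChunkCopy(); // the chunk that registers the listener
const chunkD = loadChunkCopy(); // the chunk that handles outbound sends

chunkA.register("wa-session", { socket: "ws-1" });
console.log(chunkD.lookup("wa-session")); // undefined — chunk D's Map never saw it
```

Both chunks believe they own "the" listener registry; neither can see the other's entries, which is exactly why registration succeeded while outbound lookup failed.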

The fix, by clawdia67, is three lines: pin the Map to globalThis with a namespaced key. The first-loading chunk creates it. Every subsequent chunk reuses it. That's it.

Three lines. Months of broken outbound messaging. Community patch scripts circulating on GitHub because the official release hadn't shipped.

The Fix

const GLOBAL_KEY = '__openclaw_wa_listeners';

globalThis[GLOBAL_KEY] = globalThis[GLOBAL_KEY] ?? new Map();

const listeners = globalThis[GLOBAL_KEY];

All seven chunks now share one Map.

“The build tool was doing exactly what it was designed to do. The problem is nobody checked whether what it was designed to do was compatible with how the code actually worked.”

Evidence B · PR #52150

Zero Means Zero, Except When It Means “Create a Poll”

If you sent a message through OpenClaw with pollDurationHours: 0 in the payload — a perfectly reasonable default from serialized tool schemas — the system interpreted that zero as poll creation intent. Your message.send call would fail with: “Poll fields require action 'poll'; use action 'poll' instead of 'send'.”

This is the oldest class of bug in programming: falsy-value confusion. JavaScript's 0 is finite, so it passed the “is this a real number?” check. The guard tested for existence, not intent. Bartok9 fixed it with a value > 0 check. The PR closed issue #52118 and added regression tests.
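The shape of the fix is easy to see in a hedged reconstruction. The parameter name pollDurationHours comes from the article; the guard functions themselves are illustrative, and the actual PR may be structured differently:

```typescript
// Illustrative reconstruction of the guard — not OpenClaw's actual code.
interface SendParams {
  pollDurationHours?: number;
}

// Old guard: tests for existence of a finite number. 0 is finite,
// so a harmless serialized default of 0 reads as poll-creation intent.
const looksLikePollBuggy = (p: SendParams) =>
  Number.isFinite(p.pollDurationHours);

// Fixed guard: only a positive duration signals intent to create a poll.
const looksLikePollFixed = (p: SendParams) =>
  typeof p.pollDurationHours === "number" && p.pollDurationHours > 0;

console.log(looksLikePollBuggy({ pollDurationHours: 0 })); // true  — send rejected
console.log(looksLikePollFixed({ pollDurationHours: 0 })); // false — send proceeds
```

The one-character difference between "is this a number?" and "is this a positive number?" is the whole bug.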

I've covered AI platforms for years. The biggest threats to adoption are never the headline failures. They're the ones where a developer wires up an integration, sends a message, gets an error about polls they never created, and quietly switches to a different platform. You never hear from them. You just see the churn.

Evidence C · PR #52193

The Test Suite That Couldn't Test Itself

While the WhatsApp bug lived in production for months and the poll bug silently broke integrations, the gateway test suite — the thing that should have caught both — was itself failing. Forty-eight tests across thirteen files.

ImLukeF's PR #52193 doesn't fix any product bugs. It fixes the tests. Specifically: Vitest execution-order conflicts that caused seeded session stores to vanish between test runs. The fix moves testState into vi.hoisted() so mock factories can access it during module initialization, and adds on-disk config persistence for session store paths.
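For readers unfamiliar with the pattern: Vitest hoists vi.mock factory calls above ordinary declarations, so a mock factory that references a plain top-level variable sees it before it exists. A minimal sketch of the vi.hoisted() idiom, assuming a hypothetical './session-store' module and store shape (this is a test-setup fragment, not OpenClaw's actual test code):

```typescript
import { vi } from "vitest";

// vi.mock factories are hoisted above normal declarations, so a plain
// `const testState = …` would be unavailable inside the factory below.
// vi.hoisted() runs its callback during the hoisting phase instead.
const testState = vi.hoisted(() => ({
  sessions: new Map<string, { seeded: boolean }>(),
}));

// Hypothetical module path and API, for illustration only.
vi.mock("./session-store", () => ({
  getSession: (id: string) => testState.sessions.get(id),
}));
```

Without the hoisting, the seeded sessions appear to "vanish": the mock factory closes over an uninitialized binding and every lookup comes back empty.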

The result: forty-eight failing tests dropped to twenty-three. That's progress. It's also an admission that a project with 150,000 stars was merging PRs into a codebase where more than a quarter of the gateway tests were red.

48 failing tests before · 23 failing tests after · 13 files affected

The Pattern Nobody Wants to Talk About

I've watched the AI infrastructure space for long enough to recognize this pattern. A project ships a feature — WhatsApp support, poll integration, multi-agent orchestration — and the launch gets the stars and the blog post and the Hacker News thread. Then the feature sits in production for months, accumulating subtle failures that the test suite can't catch because the test suite is broken too.

The people who fix these bugs — clawdia67 with the globalThis singleton, Bartok9 with the zero-value guard, ImLukeF with the Vitest hoisting fix — don't get launch posts. They get squash-merged into a commit hash.

Every AI company I talk to says reliability is their differentiator. Very few of them can tell me what their test failure rate was last week. OpenClaw at least has the data. The data says: 48 out of — well, the PR doesn't say the total. That tells you something too.

The bottom line

Three bugs. Three contributors. Zero fanfare. The WhatsApp singleton should have been caught by the build pipeline's integration tests. The poll parameter bug should have been caught by type-level validation. The test instability should have blocked merges until it was fixed. None of that happened. The fixes are good. The process that allowed the bugs to live this long is the actual story.

DeployClaw News · Investigation by Carlos Simpson

DeployClaw hosts OpenClaw instances. Upstream fixes ship automatically. This publication covers development independently.