OpenClaw's Discord Agents Kept Running After Users Walked Away. For Up to Ten Minutes Each.
A two-line wiring fix. A dedup cache that ate messages. An agent config system that shipped without per-agent defaults. Three PRs from March 22 that expose how little thought went into the most important question in AI agents: what happens when they're supposed to stop?
March 22, 2026 · 6 min read
Here's a horror story for the AI infrastructure crowd. You deploy an OpenClaw agent on Discord. A user interacts with it. The interaction times out on Discord's side — maybe the user closed the app, maybe the network hiccuped, maybe Discord's own worker hit its limit. The Discord connection dies. The user moves on.
Your agent doesn't.
The ACP session, the autonomous cognitive process running behind that Discord interaction, keeps going. It's making API calls. It's consuming tokens. It's writing to memory. It's doing everything it was asked to do, for a user who has left the building. And it will keep doing it until it either finishes or hits the session timeout, which, if you recall from a previous article, defaults to ten minutes. That's up to ten minutes of ghost compute per abandoned interaction.
PR #52148 · dutifulbob
The abort signal that never arrived
Before
1. Discord worker receives message
2. Dispatches to ACP manager via runTurn()
3. Discord worker times out → abortSignal fires
4. ACP session never receives the signal. Keeps running.
After
1. Discord worker receives message
2. Dispatches to ACP manager with signal: abortSignal
3. Discord worker times out → abortSignal fires
4. ACP session receives the signal and stops
This is the part that gets me. The acpManager.runTurn() method already accepted a signal parameter. The abort machinery was built. The graceful termination path existed. Somebody designed it. Somebody tested it. Then nobody wired it up.
The Discord dispatch code called runTurn() without passing the signal. The ACP session started with no awareness that it could be cancelled. The mechanism was there. The connection wasn't. It's the exact same class of bug as the clientTools fix from CharZhou — everything works at both ends, but the wire in the middle is missing.
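To make the missing wire concrete, here is a minimal TypeScript sketch. FakeAcpManager, RunTurnParams, dispatchBefore and dispatchAfter are hypothetical stand-ins, not OpenClaw's code, and the turn is modeled as a synchronous loop that checks the signal between steps; the real runTurn() is presumably asynchronous. The only difference between the two dispatch functions is whether the worker's abortSignal is passed along.

```typescript
// Hypothetical stand-ins for illustration; not OpenClaw's actual code.

interface RunTurnParams {
  steps: number;        // units of work: API calls, tool calls, memory writes
  signal?: AbortSignal; // optional cancellation signal
}

type TurnResult = "completed" | "aborted";

class FakeAcpManager {
  executed = 0; // how many steps actually ran

  // Runs the turn one step at a time, checking the signal before each step.
  runTurn(params: RunTurnParams, onStep?: (i: number) => void): TurnResult {
    for (let i = 0; i < params.steps; i++) {
      if (params.signal?.aborted) return "aborted"; // graceful stop point
      this.executed++;
      onStep?.(i);
    }
    return "completed";
  }
}

// Before: the worker owns an abort signal but never hands it over.
function dispatchBefore(
  mgr: FakeAcpManager,
  _abortSignal: AbortSignal,
  steps: number,
  onStep?: (i: number) => void,
): TurnResult {
  return mgr.runTurn({ steps }, onStep); // signal dropped on the floor
}

// After: the same call, with the signal wired through.
function dispatchAfter(
  mgr: FakeAcpManager,
  abortSignal: AbortSignal,
  steps: number,
  onStep?: (i: number) => void,
): TurnResult {
  return mgr.runTurn({ steps, signal: abortSignal }, onStep);
}

// Simulate the Discord worker timing out after the third step.
const controller = new AbortController();
const mgr = new FakeAcpManager();
const result = dispatchAfter(mgr, controller.signal, 10, (i) => {
  if (i === 2) controller.abort();
});
console.log(result, mgr.executed); // aborted 3
```

Run the same scenario through dispatchBefore and the abort still fires, but the turn ignores it and executes all ten steps: the zombie behavior in miniature.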
Meanwhile, the Dedup Guard Was Eating Messages — PR #51950
Takhoffman's fix is about the opposite problem: messages that should have been processed but weren't. OpenClaw's Discord deduplication cache marked messages as “seen” the moment they arrived, before preflight checks and before worker processing. If the worker then failed, the message was already in the dedup cache, and with a 5-minute TTL, any retry of that message was dropped as a duplicate until the entry expired.
The fix moves the dedup claim after bot-self filtering but before the debounce queue, and releases the claim on failure. It's the difference between “optimistic” and “correct.”
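The ordering change can be sketched the same way. DedupCache, handleBefore, handleAfter and the ChatMessage shape are hypothetical; the clock is injected so the 5-minute TTL can be exercised without waiting, and the worker is synchronous, whereas the real pipeline hands messages to a debounce queue and workers first.

```typescript
// Hypothetical sketch; names and message shape are illustrative only.

const TTL_MS = 5 * 60 * 1000; // 5-minute dedup window

interface ChatMessage {
  id: string;
  authorIsSelf: boolean; // true for the bot's own messages
}

type WorkFn = (msg: ChatMessage) => void; // throws on failure

class DedupCache {
  private seen = new Map<string, number>(); // message id -> expiry time (ms)
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  // True if the caller now owns this id; false if it is a live duplicate.
  claim(id: string): boolean {
    const expiry = this.seen.get(id);
    if (expiry !== undefined && expiry > this.now()) return false;
    this.seen.set(id, this.now() + TTL_MS);
    return true;
  }

  // Forget the claim so a retry of the same message is accepted.
  release(id: string): void {
    this.seen.delete(id);
  }
}

// Before: claim on arrival, never release. A failed run blocks retries for the TTL.
function handleBefore(cache: DedupCache, msg: ChatMessage, work: WorkFn): string {
  if (!cache.claim(msg.id)) return "duplicate";
  if (msg.authorIsSelf) return "ignored";
  try {
    work(msg);
    return "processed";
  } catch {
    return "failed"; // the claim stays, so the message is stuck
  }
}

// After: drop the bot's own messages first, then claim, and release on failure.
function handleAfter(cache: DedupCache, msg: ChatMessage, work: WorkFn): string {
  if (msg.authorIsSelf) return "ignored";
  if (!cache.claim(msg.id)) return "duplicate";
  try {
    work(msg); // stands in for the debounce queue and worker
    return "processed";
  } catch {
    cache.release(msg.id);
    return "failed";
  }
}

let clock = 0;
const cache = new DedupCache(() => clock);
const msg: ChatMessage = { id: "m1", authorIsSelf: false };
handleAfter(cache, msg, () => { throw new Error("worker failed"); }); // "failed"
clock += 60_000;
console.log(handleAfter(cache, msg, () => {})); // processed
```

Releasing on failure keeps the cache's real job, suppressing genuine duplicates, while letting a failed message be retried immediately instead of waiting out the TTL.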
Agents That Couldn't Remember Their Own Settings — PR #51974
vincentkoc's PR adds per-agent defaults for thinking, reasoning, and fast mode. Before this, every agent in a multi-agent setup inherited the same global defaults. If you wanted Agent A to think and Agent B to run fast, you couldn't configure that at the agent level. You had to override it per-session or per-message.
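One way to picture per-agent defaults is as a layer between the global defaults and the per-session and per-message overrides that were previously the only options. In this sketch, thinkingDefault and reasoningDefault are the names that appear in the PR discussion; fastModeDefault, the allowed values, and the precedence order (message over session over agent over global) are assumptions for illustration.

```typescript
// Hypothetical sketch. thinkingDefault and reasoningDefault appear in the PR
// discussion; fastModeDefault, the value sets and the precedence are assumed.

type ThinkingLevel = "off" | "low" | "medium" | "high";
type Toggle = "on" | "off";

interface TurnSettings {
  thinking: ThinkingLevel;
  reasoning: Toggle;
  fast: boolean;
}

// Per-agent defaults: any field left out falls back to the global value.
interface AgentDefaults {
  thinkingDefault?: ThinkingLevel;
  reasoningDefault?: Toggle;
  fastModeDefault?: boolean;
}

// Most specific wins: message, then session, then agent, then global.
// ?? (not ||) so an explicit `fast: false` is respected rather than skipped.
function resolveSettings(
  globalDefaults: TurnSettings,
  agent: AgentDefaults = {},
  session: Partial<TurnSettings> = {},
  message: Partial<TurnSettings> = {},
): TurnSettings {
  return {
    thinking: message.thinking ?? session.thinking ?? agent.thinkingDefault ?? globalDefaults.thinking,
    reasoning: message.reasoning ?? session.reasoning ?? agent.reasoningDefault ?? globalDefaults.reasoning,
    fast: message.fast ?? session.fast ?? agent.fastModeDefault ?? globalDefaults.fast,
  };
}

const defaults: TurnSettings = { thinking: "low", reasoning: "off", fast: false };
const agentA: AgentDefaults = { thinkingDefault: "high" };                       // Agent A thinks
const agentB: AgentDefaults = { thinkingDefault: "off", fastModeDefault: true }; // Agent B runs fast

console.log(resolveSettings(defaults, agentA)); // thinking "high"; reasoning and fast from globals
console.log(resolveSettings(defaults, agentB)); // thinking "off", fast true
```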
A code reviewer flagged that, in the initial implementation, an agent configured with both reasoningDefault: "on" and any non-off thinkingDefault would run reasoning and thinking simultaneously, producing internal blocks visible to users. vincentkoc added a guard. The review worked. The original design didn't.
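The source doesn't say how the guard resolves the conflict. As a hypothetical illustration, this sketch applies a guard to already-resolved settings and lets reasoning win by switching thinking off; the real PR may pick the other side or reject the configuration outright. The types mirror the previous sketch and are equally assumed.

```typescript
// Hypothetical sketch: the policy below (reasoning wins) is an assumption;
// the source only says a guard was added.

type ThinkingLevel = "off" | "low" | "medium" | "high";
type Toggle = "on" | "off";

interface TurnSettings {
  thinking: ThinkingLevel;
  reasoning: Toggle;
  fast: boolean;
}

// Ensures reasoning and thinking are never active in the same turn.
function guardReasoningThinking(s: TurnSettings): TurnSettings {
  if (s.reasoning === "on" && s.thinking !== "off") {
    return { ...s, thinking: "off" };
  }
  return s;
}

console.log(guardReasoningThinking({ thinking: "high", reasoning: "on", fast: false }).thinking); // off
```

Running the guard on the final resolved settings, rather than only when validating agent config, means a session or message override can't reintroduce the forbidden combination.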
The Lifecycle Question Nobody's Answering
I keep coming back to the same concern. OpenClaw is building an autonomous agent platform. The agents are supposed to run for extended periods, make decisions, call tools, interact with external services. That makes lifecycle management the single most important architectural concern. When does an agent start? When does it stop? Who decides? What happens to in-flight work?
And here we are, in March 2026, fourteen months after multi-agent shipped, fixing the fact that agents couldn't be stopped from Discord and couldn't remember their own preferences. The abort signal machinery was already there. The per-agent config schema was already there. Nobody connected the dots.
10 min · Default agent timeout
5 min · Dedup TTL that blocked retries
0 · Abort signals forwarded before fix
dutifulbob's fix covers both ACP dispatch paths: normal turns and tail-after-reset sequences. But the Greptile review raised a cosmetic point that's actually architectural: the conditional spread pattern ...(params.abortSignal ? { signal: params.abortSignal } : {}) suggests the signal is treated as an afterthought — something bolted on — rather than a first-class parameter of every agent turn. When your abort path looks like an optional plugin, your architecture is telling you something.
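The difference between the two designs shows up in the types. In this hypothetical sketch, the optional version reproduces the conditional spread quoted from the review, while the required version makes every caller supply a signal, using a never-aborting one when there is genuinely nothing to cancel on. The interface and helper names are illustrative, not OpenClaw's.

```typescript
// Hypothetical sketch; interface and helper names are illustrative.

// Afterthought style: the signal is optional, so each call site must remember it.
interface TurnParamsOptional {
  prompt: string;
  signal?: AbortSignal;
}

function buildOptional(prompt: string, abortSignal?: AbortSignal): TurnParamsOptional {
  // The conditional spread: add the key only if a signal exists.
  return { prompt, ...(abortSignal ? { signal: abortSignal } : {}) };
}

// First-class style: every turn carries a signal, and omitting it is a type error.
interface TurnParamsRequired {
  prompt: string;
  signal: AbortSignal;
}

// For callers with nothing to cancel on: a signal that never fires,
// because its controller is never exposed.
const NEVER_ABORTS: AbortSignal = new AbortController().signal;

function buildRequired(prompt: string, signal: AbortSignal): TurnParamsRequired {
  return { prompt, signal };
}

// buildRequired("hello");  // compile error: the signal can't be forgotten
const turn = buildRequired("hello", NEVER_ABORTS);
console.log(turn.signal.aborted); // false
```

Both shapes behave the same at runtime. The required field pays off when someone adds a new dispatch path: forgetting the signal becomes a compile error instead of a silent zombie.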
The zombies are dead. The dedup guard releases on failure. The agents remember their settings. But if this is the state of lifecycle management on the platform's most mature integration, I have questions about the ones that are less mature.
DeployClaw News · Investigation by Carlos Simpson
DeployClaw hosts OpenClaw instances. Upstream fixes ship automatically. This publication covers development independently.