Codex Was Killing Agents That Were Still Working
Binary stuck-or-active classification couldn't distinguish a genuinely stalled session from one running a long model inference
Three States Instead of Two
Until now, OpenClaw's Codex system had a blunt instrument for monitoring session health: a session was either active or stuck. If a session looked quiet for too long, the recovery system assumed it was stale and killed it. The problem was that “quiet” and “stuck” aren't the same thing. An agent running a complex model inference, executing a long tool chain, or processing an embedded run could sit without producing visible output for minutes — and still be perfectly healthy.
The new classification splits session liveness into three distinct states. session.long_running covers sessions with active model, tool, or embedded work that's progressing normally. session.stalled identifies sessions where work exists but has stopped advancing. session.stuck remains for genuinely stale sessions with queued work and zero active operations.
The system now tracks embedded run execution, model inference calls, tool invocations, and Codex progress events. Recovery only triggers for truly stale sessions — those with queue depth but no active work. A session running a ten-minute model call will no longer get axed at the five-minute mark because the monitoring system couldn't tell busy from broken.
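To make the three-way split concrete, here is a minimal sketch of how such a classifier could be written. Only the three state names come from the documentation. The `SessionSnapshot` fields, the `STALL_AFTER_SECONDS` threshold, and the rule for deciding that work has "stopped advancing" are assumptions made for illustration, not OpenClaw's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: how long active work may go without a progress
# event before it counts as stalled. The real value is not documented here.
STALL_AFTER_SECONDS = 300.0


@dataclass
class SessionSnapshot:
    """Hypothetical view of one session's liveness inputs."""
    queue_depth: int               # queued work waiting on the session
    active_operations: int         # in-flight model, tool, or embedded-run work
    seconds_since_progress: float  # time since the last progress event


def classify(snapshot: SessionSnapshot) -> Optional[str]:
    """Return one of the three liveness states, or None for an idle session."""
    if snapshot.active_operations > 0:
        if snapshot.seconds_since_progress <= STALL_AFTER_SECONDS:
            return "session.long_running"
        return "session.stalled"
    if snapshot.queue_depth > 0:
        # Queued work but nothing in flight: the genuinely stale case.
        return "session.stuck"
    return None


def should_recover(snapshot: SessionSnapshot) -> bool:
    """Recovery is reserved for stuck sessions only."""
    return classify(snapshot) == "session.stuck"
```

In this sketch a session with in-flight work is never a recovery candidate, however quiet it looks. That is the behavior change the new classification is meant to deliver.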
There's a caveat worth noting: the activity tracking maps could accumulate unbounded session entries over time. The implementation doesn't include TTL pruning or reference counting yet, which means long-running deployments may need to watch memory consumption around this feature.
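For readers unfamiliar with the term, TTL pruning means evicting entries that have not been touched within a time window. The sketch below shows the general technique on a hypothetical `ActivityMap` class. It is not part of OpenClaw, and the class name, method names, and TTL value are all invented for illustration.

```python
import time
from typing import Dict, Optional


class ActivityMap:
    """Hypothetical per-session activity map with time-based eviction."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Record activity for a session at the given (or current) time."""
        self._last_seen[session_id] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> int:
        """Drop entries older than the TTL and return how many were removed."""
        current = time.monotonic() if now is None else now
        expired = [
            session_id
            for session_id, seen in self._last_seen.items()
            if current - seen > self.ttl_seconds
        ]
        for session_id in expired:
            del self._last_seen[session_id]
        return len(expired)

    def __len__(self) -> int:
        return len(self._last_seen)
```

Calling `prune()` on a timer keeps the map's size proportional to the number of recently active sessions rather than to every session the process has ever seen.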
Heartbeat Monitoring Upgrades From Text Parsing to Structured Tool Calls
Agents can now make deliberate notification decisions via explicit tool calls instead of encoding control flow in output text
The End of HEARTBEAT_OK
OpenClaw's background heartbeat system has been running on a convention: agents would output text like HEARTBEAT_OK to signal whether background checks should notify users. It worked, barely. The problem was that tool-capable AI models — particularly those running under the Codex harness — couldn't make structured decisions about notifications. Control flow was encoded in final text output, creating a weak contract that broke easily when models didn't follow the convention precisely.
The new heartbeat_respond tool replaces that convention with explicit, structured fields: outcome for status, notify as a boolean for user notification, summary for concrete status text, optional notificationText for custom notification content, reason for explanation, priority for classification, and nextCheck for scheduling the next heartbeat.
The practical effect is that an agent monitoring a service can now decide “this needs attention, notify the user at normal priority, and check again in 30 minutes” through a single structured tool call instead of trying to encode all of that in a text string. The old text fallback remains for legacy runs, and explicit user configuration still takes precedence over defaults.
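As a rough illustration, the arguments of that call might look like the dictionary below. The field names are the ones listed above. The concrete values (`"needs_attention"`, `"normal"`, `"30m"`), the choice of which fields are required, and the `validate_heartbeat_response` helper are assumptions for this sketch, since the real schema's enums and formats are not given here.

```python
from typing import Any, Dict, List

# Field names come from the documented tool. Treating these three as
# required and the rest as optional is an assumption of this sketch.
REQUIRED_FIELDS = {"outcome": str, "notify": bool, "summary": str}
OPTIONAL_FIELDS = {
    "notificationText": str,
    "reason": str,
    "priority": str,
    "nextCheck": str,
}


def validate_heartbeat_response(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    problems: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in args:
            problems.append(f"missing field: {name}")
        elif not isinstance(args[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in args and not isinstance(args[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    for name in args:
        if name not in REQUIRED_FIELDS and name not in OPTIONAL_FIELDS:
            problems.append(f"unknown field: {name}")
    return problems


# Hypothetical arguments for one heartbeat_respond call.
example_call = {
    "outcome": "needs_attention",  # assumed value
    "notify": True,
    "summary": "Error rate on the monitored service is above baseline",
    "reason": "Three consecutive checks exceeded the alert threshold",
    "priority": "normal",          # assumed value
    "nextCheck": "30m",            # assumed format
}
```

The point of the structure is that a malformed decision is detectable. A string where `notify` expects a boolean is a validation error, whereas a slightly misspelled `HEARTBEAT_OK` in free text fails silently.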
A related change makes Codex harness source replies default to the message tool when messages.visibleReplies isn't explicitly configured. The agent finishes its Codex turn privately and only posts to the channel when it deliberately calls message(action='send'). This prevents accidental message leakage during background processing.
Config Files Can Now Include Content From External Directories
A new OPENCLAW_INCLUDE_ROOTS environment variable lets operators approve external config sources with symlink-escape protection
Breaking Out of Config Jail, Safely
OpenClaw's $include directive for config files has always been confined to the application's config directory. That meant shared config fragments — common agent defaults, organization-wide model settings, team policy files — had to be copied into each instance's config directory. No symlinks. No references to shared mounts. Just copies.
The new OPENCLAW_INCLUDE_ROOTS environment variable lets operators specify approved root directories for config includes. The mechanism uses two layers of validation: lexical path checking against the approved roots, and runtime realpath validation during file reads to prevent symlink escapes. If a symlink tries to escape an approved root, the read fails securely with a specific error — no silent fallback, no ambiguous behavior.
The implementation distinguishes between ENOENT (file not found) and EACCES/ELOOP/EIO errors. Permission issues and symlink loops fail hard rather than silently falling through. When the path checks pass and the realpath confirms the file is within bounds, the include resolves normally. Default confinement to the config directory remains unchanged when the environment variable isn't set.
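The two validation layers and the error split can be sketched as follows. This is a minimal illustration of the technique, not OpenClaw's code. The `read_include` function, the `IncludeError` type, and the choice to report a missing file as `None` are assumptions, and the approved roots are passed in as a list because the environment variable's format is not described here.

```python
import errno
import os
from typing import List, Optional


class IncludeError(Exception):
    """Raised when an include is rejected or cannot be read safely."""


def _is_within(path: str, root: str) -> bool:
    """True if path equals root or sits beneath it (both must be absolute)."""
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # for example, different drives on Windows
        return False


def read_include(path: str, approved_roots: List[str]) -> Optional[str]:
    """Return file contents, None if the file is missing, else raise IncludeError."""
    # Layer 1: lexical check on the normalized path, before touching the disk.
    lexical = os.path.abspath(path)
    if not any(_is_within(lexical, os.path.abspath(r)) for r in approved_roots):
        raise IncludeError(f"include outside approved roots: {path}")

    # Layer 2: resolve symlinks and check the real location as well.
    real = os.path.realpath(lexical)
    if not any(_is_within(real, os.path.realpath(r)) for r in approved_roots):
        raise IncludeError(f"include escapes approved roots via symlink: {path}")

    try:
        with open(real, "r", encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return None  # assumed handling: missing files are reported, not fatal
        # EACCES, ELOOP, EIO and similar errors fail hard.
        raise IncludeError(f"cannot read include {path}: {exc}") from exc
```

The lexical check alone would accept a symlink that lives inside an approved root but points elsewhere, which is why the resolved path is checked a second time before the read.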
Doctor Command Clarifies What It Will and Won't Auto-Repair
Gateway service installs require interactive confirmation; lifecycle operations like restarts remain automatic
Drawing the Line Between Safe and Destructive Repairs
OpenClaw's doctor command has a --repair flag that applies recommended fixes and a --non-interactive flag for CI and automation pipelines. The documentation has been vague about which repairs fall into which category, leading to uncertainty about what would happen when operators ran doctor --repair --non-interactive in production.
The updated documentation now draws a clear boundary. Missing gateway service installations and stale service definition rewrites require interactive confirmation — they won't execute in non-interactive mode. Gateway LaunchAgent bootstrap, service starts, service restarts, and legacy service cleanup remain auto-repairable without confirmation. The distinction: creating or rewriting service definitions is potentially destructive, while managing existing service lifecycle is routine maintenance.
The previous wording suggested that all service repairs were blocked in non-interactive mode, which was broader than the actual implementation. Operators running automated health checks can now confidently use --non-interactive knowing that routine lifecycle repairs will execute while destructive definition changes will be reported but skipped.
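The resulting policy amounts to a small decision table, sketched below. The two categories mirror the documentation. The repair identifiers, the `decide` function, and its return values are hypothetical names chosen for this example and do not correspond to anything in the doctor command's output.

```python
# Repairs that create or rewrite service definitions (hypothetical identifiers).
NEEDS_CONFIRMATION = {
    "install_missing_gateway_service",
    "rewrite_stale_service_definition",
}

# Repairs that only manage an existing service's lifecycle (hypothetical identifiers).
AUTO_REPAIRABLE = {
    "bootstrap_gateway_launch_agent",
    "start_service",
    "restart_service",
    "cleanup_legacy_service",
}


def decide(repair: str, non_interactive: bool) -> str:
    """Return 'apply', 'prompt', or 'report_and_skip' for a single repair."""
    if repair in AUTO_REPAIRABLE:
        return "apply"
    if repair in NEEDS_CONFIRMATION:
        return "report_and_skip" if non_interactive else "prompt"
    raise ValueError(f"unknown repair: {repair}")
```

Under this model a CI run with --non-interactive still restarts a stopped service, while a missing service definition is reported and left for an operator to confirm.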
Sandbox Setup Scripts Were Invisible to npm Users
Three critical Docker setup scripts only ship with source checkouts, but documentation pointed everyone to them
The Scripts That Didn't Ship
OpenClaw's sandbox system uses Docker containers to isolate agent code execution from the host machine. Three setup scripts — sandbox-setup.sh, sandbox-common-setup.sh, and sandbox-browser-setup.sh — build the Docker images that make sandboxing work. The documentation referenced these scripts as the standard setup path. The problem: they don't exist for anyone who installed OpenClaw via npm.
The scripts are deliberately excluded from the npm package via the package.json files allowlist. They're development artifacts that only make sense with a full source checkout. But the documentation didn't make that distinction. Users who ran npm install -g openclaw and followed the sandboxing guide hit file-not-found errors with no explanation.
The fix spans five documentation files across the gateway and installation guides. Each reference to the setup scripts now includes a source-checkout qualifier, and the primary sandboxing page adds inline Docker build commands as an alternative for npm users. The Ansible and Docker installation guides received matching updates. It's a straightforward fix for a frustrating gap — the kind of documentation bug that makes users feel like they're doing something wrong when the tooling simply wasn't packaged for their install path.
Documentation Changes at a Glance
docs/extensions/codex/diagnostics.md (major): Three-state session liveness classification (long_running, stalled, stuck) documented with activity tracking details
docs/agents/heartbeat.md (major): New heartbeat_respond tool schema and structured fields documented, text fallback behavior preserved
docs/gateway/configuration.md (major): OPENCLAW_INCLUDE_ROOTS environment variable and symlink-escape protection documented
docs/cli/doctor.md (updated): Interactive vs auto-repairable service repair categories clarified for --repair and --non-interactive flags
docs/gateway/sandboxing.md (updated): Source-checkout qualifiers added, inline Docker build commands for npm users
docs/gateway/config-agents.md (updated): Source-checkout requirements for sandbox scripts clarified
docs/install/docker.md (updated): Sandbox script references updated with install-path context
docs/install/ansible.md (updated): Sandbox script references updated with install-path context
The Nuance Tax
Thursday's documentation batch is about replacing blunt instruments with precise ones. Binary session monitoring becomes three-state. Text-convention heartbeats become structured tool calls. All-or-nothing config confinement becomes allowlisted expansion. A vague “repair mode” becomes two clearly separated categories. A one-size-fits-all sandbox guide becomes install-path-aware.
Each change individually is modest. Together, they reveal something about where OpenClaw is in its maturity arc. Early projects ship features. Maturing projects ship the distinctions that features should have had from the start. The session diagnostics work is the clearest example: killing a busy agent and killing a stuck agent are fundamentally different operations, and the system finally knows the difference. The cost of these refinements is complexity — three states to monitor instead of two, structured schemas instead of simple strings, two-layer path validation instead of a directory check. Whether that complexity pays for itself depends on scale. For operators running a single personal instance, the binary approach was fine. For teams running production Codex deployments, these distinctions are the difference between reliability and frustration.