OpenClaw missing chunk after upgrade: safe recovery without data loss
Problem statement: you upgrade OpenClaw with npm while the gateway is still alive. A few minutes later the gateway cannot shut down cleanly, commands hang, or the logs mention a JavaScript file under dist/ that no longer exists. The instance may look half-upgraded: the new package is on disk, the old process is still in memory, and recovery is blocked by a hash-named chunk that disappeared during the install.
This guide is for operators who need the instance back quickly without deleting sessions, secrets, cron jobs, channel config, or browser state. The fix is not "keep reinstalling until it works." The fix is to separate process state from package state, preserve the data directories, stop the process that still points at the old chunks, and restart from one coherent installation.
- On May 4, 2026, OpenClaw issue #77087 documented a gateway that could not recover from npm chunk-hash replacement during an in-place upgrade. The report describes old chunk names being replaced while graceful shutdown still depends on the now-missing file.
- The 2026.5.4 release notes include multiple gateway startup and performance changes, including deferred sidecars, reduced hot-path imports, startup phase spans, and plugin-loader error preservation. That release context matters because upgrade incidents can expose both package replacement behavior and genuine startup regressions.
- In hosted OpenClaw operations, the durable recovery pattern is to snapshot config before package changes, keep a clear last-known-good path, and verify channel delivery after restart. A green process check alone is not enough; we also verify a model turn and at least one configured channel.
- Related same-week reports include high file descriptor failures during spawn (#77750) and gateway performance/status work in the release notes. Those are separate bugs, but they point to the same operational rule: capture evidence before forcing restarts.
What the failure looks like
The visible symptom is usually vague: OpenClaw updated, then the gateway stopped responding normally. The useful symptom is the exact log line that names the missing module or file. You may see a path under the globally installed package, a hash-suffixed JavaScript chunk, and a stack trace triggered during shutdown, config repair, plugin loading, or gateway restart. The old process keeps the code it already loaded in memory, but any lazy import it had not resolved before the upgrade now points at a file that npm has replaced.
Hash-named chunks are common in bundled Node applications. They let the build split code into smaller files. They also make in-place replacement risky when a long-running process can still request a file by its old generated name. If that file has been removed, the process can fail exactly when you need it to stop cleanly.
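The mechanism can be reproduced in miniature. The sketch below is a Python analogue, not OpenClaw's loader: the module names shutdown_a1b2c3 and shutdown_d4e5f6 are invented, and the temporary directory stands in for a package's dist/ folder. It shows how a process that resolves a generated file name lazily fails once an "upgrade" ships the same code under a new name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def simulate_in_place_upgrade() -> str:
    """Return the error a running process hits after its lazy chunk is replaced."""
    source = "def run():\n    return 'clean stop'\n"
    with tempfile.TemporaryDirectory() as dist:
        dist_path = Path(dist)
        # Old build: a hash-named chunk that is only imported on demand.
        (dist_path / "shutdown_a1b2c3.py").write_text(source)
        sys.path.insert(0, dist)
        try:
            # The upgrade lands while the process is alive: the new build
            # ships the same code under a different generated name.
            (dist_path / "shutdown_a1b2c3.py").unlink()
            (dist_path / "shutdown_d4e5f6.py").write_text(source)
            importlib.invalidate_caches()
            try:
                # The old process still asks for the old name at shutdown.
                importlib.import_module("shutdown_a1b2c3")
            except ModuleNotFoundError as exc:
                return str(exc)
            return "loaded"
        finally:
            sys.path.remove(dist)


if __name__ == "__main__":
    print(simulate_in_place_upgrade())
```

The point of the sketch is the ordering: nothing fails at upgrade time, only later when the old name is first requested.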
Do this first: preserve evidence and state
Before you run another install command, save enough information to recover and to understand what happened. You do not need a long forensic exercise, but you do need the exact error and the data paths that matter.
- Copy the first missing-file stack trace, not only the final retry error.
- Record the current OpenClaw version, the package manager used, and the install path returned by the shell.
- Snapshot config, auth profile files, session storage, and cron definitions before deleting anything.
- Pause scheduled jobs that could trigger new turns while the gateway is inconsistent.
- Tell team channels that replies may be delayed until recovery is complete.
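To capture the first missing-file error rather than the last retry, a small log scan can help. This is a minimal sketch that assumes the gateway log contains standard Node.js error text ("Cannot find module '...'" or "ENOENT: no such file or directory, ... '...'"). The helper name first_missing_file is hypothetical, and OpenClaw's actual log layout may wrap these messages differently.

```python
import re
from typing import Optional, Tuple

# Message shapes produced by Node.js itself for unresolved modules and
# missing files. The surrounding log format is not assumed.
MISSING_PATTERNS = [
    re.compile(r"Cannot find module '([^']+)'"),
    re.compile(r"ENOENT: no such file or directory, \w+ '([^']+)'"),
]


def first_missing_file(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, path) for the earliest missing-file error, or None."""
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for pattern in MISSING_PATTERNS:
            match = pattern.search(line)
            if match:
                return line_number, match.group(1)
    return None
```

Save the returned line together with the few lines of stack trace that follow it. Later retries often report a different, less useful error.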
Likely causes
Missing chunk incidents usually come from one of four causes. Treat them separately so you do not fix the wrong thing.
- In-place global npm replacement: the package directory changed while the gateway process was still running.
- Mixed binary paths: the shell points to one OpenClaw install, while the service manager starts another.
- Interrupted install: the new package directory is incomplete because the install was killed or disk space ran out.
- Plugin path drift: an externalized plugin or migrated bundled plugin points to an old path after the release-channel sync.
Recovery path: get to one coherent runtime
The goal is simple: one OpenClaw process, one package install, one config root. Do not keep a half-upgraded process alive while trying to repair lazy imports underneath it.
1. Stop active work without starting new turns
If the Control UI still works, stop long-running tasks from the UI. If it does not, pause external triggers first: cron jobs, webhooks, and noisy channels. This prevents new agent turns from writing partial state while you are recovering the gateway.
2. Save a small backup of the important state
Preserve your OpenClaw config directory, auth profiles, session files, and any custom skill or plugin config. You do not need to copy the global package directory unless you are debugging the package itself. The data directories are what keep the assistant's identity, channels, cron jobs, and memory intact.
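One way to take that backup is a single timestamped archive. The sketch below is an assumption-laden example: snapshot_state is a hypothetical helper, and the caller must supply the real locations of the config, auth, and session directories, because this guide does not name them.

```python
import tarfile
import time
from pathlib import Path
from typing import Iterable


def snapshot_state(state_paths: Iterable[Path], backup_dir: Path) -> Path:
    """Archive the given state directories into one timestamped .tar.gz file."""
    paths = [Path(p) for p in state_paths]
    # Refuse to write a partial backup: every listed path must exist first.
    for path in paths:
        if not path.exists():
            raise FileNotFoundError(f"state path missing: {path}")

    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"state-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # Entries are stored under the directory's own name, so two
            # inputs that share a final name would collide in the archive.
            tar.add(path, arcname=path.name)
    # The archive may contain secrets, so limit it to the owner.
    archive.chmod(0o600)
    return archive
```

Keep the archive outside the package directory so a later reinstall cannot touch it.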
3. Stop the old gateway process
A graceful stop may fail if it imports the missing chunk. If that happens, stop the process at the service or process-manager level. The important part is that no old OpenClaw process remains alive after package replacement. Avoid a loop where a supervisor restarts the old process repeatedly while you are inspecting the install.
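If no service manager is involved and you have to stop the process by PID, a polite-then-forceful stop is the usual pattern. This is a sketch for POSIX hosts only. stop_process is a hypothetical helper, you must find the gateway PID yourself, and where a supervisor exists its own stop command is the better tool.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_process(pid: int, grace_seconds: float = 10.0, poll_seconds: float = 0.2) -> str:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    if not pid_alive(pid):
        return "not-running"
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return "terminated"
        time.sleep(poll_seconds)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited between the last poll and the kill
    return "killed"
```

Disable automatic restarts before using anything like this. Otherwise the supervisor may bring the process back while you are still inspecting the install.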
4. Verify the binary and package path
Check which binary your shell will run and which binary your service manager will run. These must agree. If one path comes from npm and another from a different package manager, fix the service command before starting OpenClaw again. Mixed paths create the same symptoms as a broken upgrade even when the package is fine.
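The comparison can be scripted by resolving both paths through symlinks. In this sketch, same_install and resolve_binary are hypothetical helpers. The command name you pass for the shell side and the executable path you copy from the service definition are inputs you must supply.

```python
import os
import shutil
from typing import Optional


def resolve_binary(command_or_path: str) -> Optional[str]:
    """Resolve a command name or path to the real executable file, or None."""
    found = shutil.which(command_or_path)
    return os.path.realpath(found) if found else None


def same_install(shell_command: str, service_exec_path: str) -> bool:
    """True when the shell and the service resolve to the same executable."""
    shell_binary = resolve_binary(shell_command)
    service_binary = resolve_binary(service_exec_path)
    return shell_binary is not None and shell_binary == service_binary
```

Run it as the same user and with the same PATH the service uses. A different environment can resolve the same command name to a different install.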
5. Repair the package only if the install is incomplete
If the new package directory is missing many expected files, reinstall the same target version once. Do not bounce between versions blindly. If the package is complete, reinstalling is less useful than stopping the old process and starting the new one cleanly.
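A rough way to judge whether a package directory is incomplete is to check that every relative import in its bundled JavaScript points at a file that exists. The sketch below is a text heuristic, not a real JavaScript parser. missing_relative_imports is a hypothetical helper, and the layout of the installed package is assumed rather than taken from OpenClaw documentation.

```python
import re
from pathlib import Path
from typing import List

# Matches relative specifiers in static imports, dynamic import(), and require().
RELATIVE_JS_IMPORT = re.compile(
    r"""(?:from\s*|import\s*\(?\s*|require\(\s*)["'](\.{1,2}/[^"']+\.(?:js|mjs|cjs))["']"""
)


def missing_relative_imports(dist_dir: Path) -> List[str]:
    """List 'importer -> specifier' pairs whose target file does not exist."""
    missing = set()
    for source in dist_dir.rglob("*.js"):
        text = source.read_text(encoding="utf-8", errors="replace")
        for match in RELATIVE_JS_IMPORT.finditer(text):
            target = source.parent / match.group(1)
            if not target.exists():
                missing.add(f"{source.relative_to(dist_dir)} -> {match.group(1)}")
    return sorted(missing)
```

An empty result suggests the install is internally consistent, in which case the stale process is the more likely culprit. A long list points to an interrupted install.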
6. Start the gateway and verify behavior by layer
After restart, verify the gateway, then the model path, then channels. A basic health endpoint proves the process is alive. It does not prove Telegram, Discord, Slack, WhatsApp, browser control, or cron delivery is healthy.
- Gateway reports ready and stays ready for several minutes.
- A new chat turn returns a short answer with the expected model.
- Configured channels can receive and deliver one test reply.
- Cron jobs remain present but do not run until you re-enable them intentionally.
- The logs no longer mention missing chunk files under the previous package build.
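The layered order above can be encoded so that a failure is reported at the lowest broken layer. This is only a sketch: stays_ready and first_failing_layer are hypothetical helpers, and the probes themselves (health request, test chat turn, channel round trip) are callables you would write against your own setup.

```python
import time
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def stays_ready(probe: Callable[[], bool], samples: int = 5, interval_seconds: float = 60.0) -> bool:
    """Require the probe to succeed on every sample, not just once."""
    for index in range(samples):
        if not probe():
            return False
        if index < samples - 1:
            time.sleep(interval_seconds)
    return True


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first failure, or None."""
    for name, check in checks:
        try:
            passed = check()
        except Exception:
            passed = False  # a probe that raises counts as a failed layer
        if not passed:
            return name
    return None
```

Listing the checks as gateway, model, channels, cron means a channel failure is never reported while the gateway itself is still unstable.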
Edge cases that change the fix
The service restarts too quickly
Disable automatic restart while you repair the install. A tight crash loop can hide the original stack trace and write noisy logs over the evidence you need. Re-enable supervision only after one clean manual start succeeds.
The package path is correct but plugins fail
Treat plugin failures as plugin migration issues, not as proof that the whole gateway is corrupt. Check whether the failing plugin is bundled, externalized, installed from npm, or installed from ClawHub. Then update or reinstall that plugin through the supported path.
The process is alive but replies are slow
Missing chunk recovery may expose a second issue: event-loop pressure, file descriptor exhaustion, or provider latency. If the missing file error is gone but replies remain slow, switch to timing diagnostics instead of repeating the upgrade repair.
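For the file descriptor case, a quick reading of usage against the limit can confirm or rule it out. This sketch is Linux-only because it reads /proc. fd_usage is a hypothetical helper, and it returns None when the information is unavailable or the limit is reported as unlimited.

```python
import os
from typing import Optional, Tuple


def fd_usage(pid: int) -> Optional[Tuple[int, int]]:
    """Return (open_descriptors, soft_limit) for a process, or None if unknown."""
    proc_dir = f"/proc/{pid}"
    try:
        open_count = len(os.listdir(f"{proc_dir}/fd"))
        with open(f"{proc_dir}/limits", encoding="utf-8") as limits:
            for line in limits:
                if line.startswith("Max open files"):
                    # Columns: name (three words), soft limit, hard limit, unit.
                    return open_count, int(line.split()[3])
    except (FileNotFoundError, NotADirectoryError, PermissionError, ValueError):
        return None
    return None
```

A count that sits close to the soft limit while replies are slow points toward descriptor exhaustion rather than the upgrade itself.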
Typical mistakes
- Deleting the wrong directory: removing session or auth data while trying to remove the package.
- Restarting before pausing triggers: letting cron or channels start new work during recovery.
- Ignoring the service binary path: testing one binary in the shell while the service uses another.
- Reinstalling repeatedly: replacing files again without first stopping the old process that references old chunks.
- Calling it fixed after health only: skipping model, channel, and cron verification.
Prevention: stop before upgrade
The safest upgrade runbook is boring: drain, stop, snapshot, upgrade, start, verify. For a personal instance, that may take five minutes. For a team instance with live channels, it prevents a confusing half-upgraded state where the old gateway can still receive messages but cannot load the files it needs for cleanup.
If your assistant is becoming part of daily operations, treat upgrades like production changes. Keep a short rollback note, record the package version you started from, and verify one real user path after the change. That discipline is what turns OpenClaw from an exciting local agent into a dependable workflow system.
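A rollback note can be as small as one JSON file written before the upgrade starts. In this sketch the field names and the write_rollback_note helper are illustrative, not an OpenClaw convention, and the version strings are whatever your install reports.

```python
import json
import time
from pathlib import Path


def write_rollback_note(
    note_path: Path,
    previous_version: str,
    target_version: str,
    binary_path: str,
    backup_archive: str,
) -> dict:
    """Record what is needed to get back to the last known good state."""
    note = {
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "previous_version": previous_version,
        "target_version": target_version,
        "binary_path": binary_path,
        "backup_archive": backup_archive,
    }
    note_path.write_text(json.dumps(note, indent=2) + "\n", encoding="utf-8")
    return note
```

Store the note next to the state backup, not inside the package directory, so it survives a reinstall.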
FAQ
Can I fix this without losing my agents?
Yes, if you preserve the data directories and only repair the runtime package. Agents, channels, cron jobs, and memory are not supposed to live inside the global npm package directory.
Is this the same as a normal gateway crash loop?
Not exactly. A normal crash loop can come from invalid config, port conflicts, plugin errors, or host limits. A missing chunk after an upgrade specifically points to package files changing while a process or service still expects the previous build's files.
When should I move the instance instead of repairing it?
Move it when upgrades repeatedly interrupt live work, when you cannot preserve a reliable rollback path, or when channel uptime matters more than controlling the host yourself. OpenClaw Setup is designed for teams that want the agent without owning every runtime upgrade.