OpenClaw Cron CLI Hangs After Success: Safe Fix Guide

Fix OpenClaw cron commands that succeed but hang afterward. Learn how to verify real job state, avoid duplicate retries, diagnose misleading status, and recover safely.

Problem statement

You run an OpenClaw cron command, it looks like the requested action happened, and then the CLI just sits there. Maybe cron add appears to create the job, but never exits. Maybe cron list --json prints part of what you need and then hangs. Maybe a one-shot job seems registered, but later it vanishes or never fires. That combination is dangerous because it makes operators trust the wrong signal.

The real question is not whether the command exited cleanly. The real question is whether the job was persisted, whether it is still scheduled, and whether the work actually completes when it runs. Fresh issue reports over the last week show all three failure shapes: commands that succeed but hang, one-shot cron jobs that disappear silently, and isolated cron runs that report success even when they return incomplete work.

If you respond by retrying the same command several times, you can turn one control-path bug into a duplicate-job mess. If you trust a shallow status too quickly, you can miss that the real run never delivered what you expected. The safest fix is to separate command behavior from job truth, then verify the actual cron state step by step.

Evidence from the field

We have a useful first-party reference point here because our hosted cron integration was built against live OpenClaw behavior, not against a guess about how cron should work. In our own implementation work on hosted cron management, we confirmed three details that matter for this class of incident.

  • Cron jobs are persisted inside the instance at /home/node/.openclaw/cron/jobs.json.
  • The canonical control surface is the gateway RPC API, including cron.status, cron.list, cron.add, cron.update, cron.run, and cron.remove.
  • Safer integrations go through backend-to-gateway RPC instead of editing cron storage files directly.

That is why this failure pattern is so important. When the CLI becomes unreliable after the operation itself, the right recovery path is not to poke the jobs file manually or keep smashing the same command until the terminal returns. The right path is to verify whether the gateway accepted the change, whether persistence reflects it, and whether the scheduled execution behaves the way you intended.

The public reports from the last few days line up with exactly that operational distinction. One report describes cron commands succeeding but hanging. Another shows --at jobs arming and then disappearing without firing. Another shows isolated cron runs marked ok even though the task only produced a partial result. These are not the same bug, but they produce the same operator trap: a control surface that encourages false confidence.

What usually causes this

1. The command path and the cron state path drift apart

The CLI may successfully submit a request, but hang while waiting for a response tail, cleanup step, or follow-up confirmation that never arrives. In that case, the job can exist even though the terminal looks stuck.

2. One-shot schedules are more fragile than recurring jobs

A one-time schedule gives you only one chance to validate timing, persistence, and next-run computation. If the scheduling layer loses that next fire time, the job can appear armed and then disappear before you notice.

3. "Status ok" does not always mean "work complete"

A run can finish at the transport or session layer while still failing at the practical task level. That is especially important for cron jobs that depend on long model responses, failover chains, or channel delivery formatting.

4. Duplicate retries create secondary damage

When an operator assumes the first command failed and immediately retries, the instance may end up with multiple near-identical jobs, conflicting schedules, or a job that gets removed after the wrong identifier is targeted.

Diagnostics: prove the real state before you change anything

Work through these checks in order. Do not skip from a hung CLI straight to deleting files or rebuilding cron state.

Step 1: Assume the operation may already have succeeded

If a cron add, edit, or remove call hangs after you submit it, pause for a moment and treat the terminal as untrusted. The worst next move is to fire the same command again without inspection.
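One way to make "treat the terminal as untrusted" concrete is to run each write command under a hard time limit and record a hang as "unknown" instead of "failed". The following is a minimal Python sketch, not part of OpenClaw: the function name run_once and its three outcome labels are assumptions made for illustration, and the command is whatever argument list you would have typed yourself.

```python
import subprocess
import sys


def run_once(argv, timeout_s=30.0):
    """Run a command exactly once and classify the result.

    Returns "exited-ok", "exited-error", or "unknown". A timeout maps to
    "unknown" on purpose: a hung CLI says nothing about whether the
    request was accepted, so the caller should inspect state, not retry.
    """
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "unknown"
    return "exited-ok" if done.returncode == 0 else "exited-error"


if __name__ == "__main__":
    # Stand-in for a hanging CLI: a child process that sleeps past the limit.
    hang = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_once(hang, timeout_s=1.0))
```

The sketch deliberately has no retry loop. An "unknown" result is the cue to move on to the read-only checks in the next steps.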

Step 2: Check the authoritative job list

Use the most direct job-list view that is backed by gateway RPC. In practice that means checking the hosted dashboard if you have it, or running a separate read-only list or status command rather than repeating the original write command. The goal is to answer one question: does the job exist with the values you intended?

Step 3: Compare against persisted jobs

If you have shell access, inspect /home/node/.openclaw/cron/jobs.json. You are not editing it. You are using it as a persistence check. If the job is present there, the add probably succeeded even if the CLI never exited.

jq '.' /home/node/.openclaw/cron/jobs.json

Look for the exact schedule type, payload, delivery mode, and enabled flag you expected. If the file matches the requested change, treat the original command as operationally successful and move on to verification instead of retrying.
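If you prefer a scripted persistence check, the comparison above can be sketched in Python. This is an illustration under stated assumptions: the internal layout of jobs.json is not documented here, so the sketch walks every JSON object in the file instead of assuming a structure, and the field names in the demo ("name", "enabled") are hypothetical. The file is only read, never written.

```python
import json
import tempfile
from pathlib import Path


def iter_objects(node):
    """Yield every JSON object (dict) found anywhere inside node."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_objects(item)


def find_matches(jobs_path, expected):
    """Return persisted objects that contain every expected key/value pair.

    The file is opened read-only; nothing is written back.
    """
    data = json.loads(Path(jobs_path).read_text(encoding="utf-8"))
    return [
        obj
        for obj in iter_objects(data)
        if all(key in obj and obj[key] == value for key, value in expected.items())
    ]


if __name__ == "__main__":
    # Demo against a throwaway file; the layout and field names are made up.
    sample = {
        "jobs": [
            {"name": "daily-report", "enabled": True},
            {"name": "weekly-digest", "enabled": False},
        ]
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "jobs.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        hits = find_matches(path, {"name": "daily-report", "enabled": True})
        print(len(hits))
```

Read the count as a signal: zero matches means the change was probably not persisted, one match means you should treat the original command as successful, and more than one suggests an earlier retry already created a duplicate.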

Step 4: Validate next execution behavior

For recurring jobs, confirm the next run time advances correctly. For one-shot jobs, confirm there is still a pending fire time and that the job has not silently disappeared. If you only check existence, you can miss that a bad timer state makes the job inert.
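The difference between "exists" and "will actually fire" can be expressed as a small classification. The sketch below is an assumption-heavy illustration: nextRunAtMs is a HYPOTHETICAL field name for the next fire time in epoch milliseconds, so check your own job records for the real field before relying on it.

```python
import time


def timer_state(job, now_ms=None, next_run_key="nextRunAtMs"):
    """Classify one job record as "pending", "overdue", or "inert".

    next_run_key is a HYPOTHETICAL field name holding the next fire time
    in epoch milliseconds; it is not taken from OpenClaw documentation.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    next_run = job.get(next_run_key)
    # A missing, null, or non-numeric fire time means nothing is armed.
    if isinstance(next_run, bool) or not isinstance(next_run, (int, float)):
        return "inert"
    return "pending" if next_run > now_ms else "overdue"


if __name__ == "__main__":
    print(timer_state({"nextRunAtMs": 2_000}, now_ms=1_000))
    print(timer_state({"nextRunAtMs": 500}, now_ms=1_000))
    print(timer_state({"name": "no-timer"}, now_ms=1_000))
```

A job that is present but "inert", or one that stays "overdue" without running, is the case that an existence-only check misses.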

Step 5: Inspect the run outcome, not just the scheduler outcome

If the job did fire, verify the practical result. Did it produce the full answer, trigger the expected delivery, and complete within the timeout budget? A green scheduler flag is not the same as a useful task outcome.
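Those three questions can be turned into an explicit checklist so that "status ok" is only one input among several. The Python sketch below assumes you assemble the run record yourself from whatever logs and delivery evidence you have. Every field name in it (status, output, duration_s, delivered) is hypothetical and does not describe an OpenClaw run format.

```python
def verify_run(record, min_chars, timeout_s, required_phrases=()):
    """Return a list of problems; an empty list means the run looks complete.

    record is a hand-assembled dict with HYPOTHETICAL keys:
    status, output, duration_s, delivered.
    """
    problems = []
    if record.get("status") != "ok":
        problems.append("scheduler status is not ok")
    output = record.get("output") or ""
    if len(output) < min_chars:
        problems.append(
            f"output has {len(output)} chars, expected at least {min_chars}"
        )
    for phrase in required_phrases:
        if phrase not in output:
            problems.append(f"output is missing {phrase!r}")
    duration = record.get("duration_s")
    if duration is None or duration >= timeout_s:
        problems.append("run duration is unknown or reached the timeout budget")
    if not record.get("delivered", False):
        problems.append("delivery was not confirmed")
    return problems


if __name__ == "__main__":
    partial = {"status": "ok", "output": "Summary:", "duration_s": 12, "delivered": True}
    for problem in verify_run(partial, min_chars=200, timeout_s=60):
        print(problem)
```

The design choice is that a run passes only when the list is empty, so a green status with a truncated output still counts as a failure.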

Step-by-step recovery

Recovery path A: the job exists and looks correct

  1. Stop retrying the original command.
  2. Record the job identifier and the current schedule details.
  3. Run a read-only verification pass through status or list.
  4. Trigger a controlled manual run only if that makes sense for the job.
  5. Confirm the run output, timeout behavior, and delivery behavior separately.

In this case the problem is usually the CLI control path, not job creation itself. Your safest response is disciplined verification, not more writes.

Recovery path B: the job exists, but the payload or schedule is wrong

  1. Do not patch the JSON file by hand.
  2. Use one clean update through the canonical control surface.
  3. Immediately verify the persisted state after the update.
  4. Run a short validation schedule or controlled manual trigger before trusting the job in production.

Editing storage directly can leave you with a state that looks correct in the file but is out of sync with how the scheduler has already loaded the job.
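To verify the persisted state after a single update, it helps to compare a copy of the job record taken before the update with one taken after, and confirm that only the fields you meant to change actually changed. A minimal Python sketch, assuming each record is a flat JSON object; the field names in the demo are hypothetical.

```python
_MISSING = object()


def changed_fields(before, after):
    """Return {field: (old, new)} for every top-level field that differs.

    A field absent on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, _MISSING)
        new = after.get(key, _MISSING)
        if old != new:
            changes[key] = (
                None if old is _MISSING else old,
                None if new is _MISSING else new,
            )
    return changes


def unexpected_changes(before, after, intended):
    """Return changed field names that were not in the intended set."""
    return sorted(set(changed_fields(before, after)) - set(intended))


if __name__ == "__main__":
    # HYPOTHETICAL job records captured before and after one update.
    before = {"id": "job-1", "schedule": "0 9 * * *", "enabled": True}
    after = {"id": "job-1", "schedule": "0 10 * * *", "enabled": False}
    print(changed_fields(before, after))
    print(unexpected_changes(before, after, intended={"schedule"}))
```

An empty result from unexpected_changes, together with a non-empty result from changed_fields, is the outcome you want: the update landed and touched nothing else.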

Recovery path C: the job vanished or one-shot timing looks broken

  1. Capture the intended schedule and payload outside the instance.
  2. Recreate the job once, carefully.
  3. Prefer a short verification window so you can watch the full lifecycle.
  4. Confirm the job remains registered until the expected fire time.
  5. After it runs, confirm both execution and final cleanup behavior.

If the schedule is critical, avoid depending on a fragile one-shot path until you have watched at least one clean test from creation through completion.

Edge cases that waste the most time

  • Assuming a hung remove failed: you retry, then accidentally remove the recreated job too.
  • Trusting partial delivery: the run "worked" but the user only received a chopped summary or a truncated response.
  • Ignoring timeout budget: a long-running agent task may need a different timeout and failover shape than a short system event.
  • Mixing storage edits with RPC edits: this creates debugging noise instead of clarity.
  • Using one-shot jobs for critical tasks without rehearsal: you find timer problems only after the window is gone.

How to verify the fix

You have resolved this problem when all of the following are true:

  • The intended job appears exactly once in the authoritative list.
  • The persisted job definition matches the requested change.
  • The next run time or one-shot timing behaves as expected.
  • A controlled run produces the complete intended outcome.
  • You can make one additional small cron change without creating duplicate entries or conflicting state.

That final check matters. The goal is not only to recover one job. The goal is to restore trust in the control path you use day to day.
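The "exactly once" condition in the checklist above can be checked mechanically by grouping job records that are identical apart from their identifiers. This is a sketch under assumptions: it expects a list of job records you have already extracted, and the ignored field name ("id") is hypothetical, so adjust it to whatever identifier and timestamp fields your records carry.

```python
import json
from collections import defaultdict


def duplicate_groups(jobs, ignore=("id",)):
    """Group job records that are identical once ignored fields are dropped.

    Returns only groups with more than one member. The default ignore
    list is a HYPOTHETICAL identifier field name.
    """
    groups = defaultdict(list)
    for job in jobs:
        core = {key: value for key, value in job.items() if key not in ignore}
        # sort_keys makes the fingerprint independent of key order.
        fingerprint = json.dumps(core, sort_keys=True)
        groups[fingerprint].append(job)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    jobs = [
        {"id": "1", "name": "daily-report", "schedule": "0 9 * * *"},
        {"id": "2", "name": "daily-report", "schedule": "0 9 * * *"},
        {"id": "3", "name": "weekly-digest", "schedule": "0 9 * * 1"},
    ]
    for group in duplicate_groups(jobs):
        print([job["id"] for job in group])
```

Any group it reports is a candidate for cleanup through the canonical control surface, one removal at a time, with a list check after each one.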

Fix once. Stop recurring cron control-path incidents.

If this keeps coming back, you can move your existing setup to managed OpenClaw cloud hosting instead of rebuilding the same stack. Import your current instance, keep your context, and move onto a runtime with lower ops overhead.

  • Import flow in ~1 minute
  • Keep your current instance context
  • Run with managed security and reliability defaults

If you would rather compare options first, review OpenClaw cloud hosting or see the best OpenClaw hosting options before deciding.

Screenshots: OpenClaw import in the OpenClaw Setup dashboard. 1) Paste import payload. 2) Review and launch.

Typical mistakes

  • Retrying the same add command three times because the first terminal did not return.
  • Declaring success because the job exists, without verifying it executes correctly.
  • Declaring failure because the CLI hangs, even though the scheduler already accepted the change.
  • Editing jobs.json directly and creating a harder-to-diagnose scheduler mismatch.
  • Using production cron runs as the first test instead of validating on a short controlled schedule.

When reliability matters more than cron archaeology

If you are spending more time proving cron state than using it, that is the real signal. Review OpenClaw cloud hosting, compare the operating tradeoffs on the self-hosted vs managed page, and start from OpenClaw Setup if you want a clearer path into production. When browser-driven tasks are part of the workflow, keep Chrome Extension relay in the design so browser automation stays attached to the right control model.

FAQ

Can I fix this by editing the cron jobs file directly?

That should be a last-resort forensic step, not your standard recovery method. Our own hosted integration work treats gateway RPC as the canonical control surface for a reason. Direct file edits are too easy to get wrong.

Should I stop using one-shot cron jobs?

Not necessarily. But you should test them more carefully than recurring jobs, because they leave less room for operator error and give timer drift less time to show itself before the useful moment passes.

What if the run says ok but the task output is incomplete?

Treat that as a real failure, and verify at the output level rather than the status level. Check timeout settings, delivery behavior, and whether the run was expected to perform longer model work than the current execution lane reliably supports.

What if my team needs cron jobs but not shell-level debugging?

Standardize on a safer control surface, keep job definitions documented, and reduce the number of ad hoc cron writes in production. That discipline prevents most duplicate-job incidents even before an upstream fix lands.
