Blog

OpenClaw Run Now says enqueued=true, but nothing actually runs

Problem statement: you click Run Now in the Cron UI, receive a success response with enqueued: true, but no execution appears in the cron run history and no worker activity starts. This is one of the most frustrating reliability failures because the interface reports success while your business process quietly does nothing.

Recent technical signal

A recent GitHub report documents this exact behavior in production-like usage: the UI returns ok: true, enqueued: true, and still no run file is created. That confirms this is not an isolated local misconfiguration and should be treated as a real incident pattern.

What is failing under the hood

In a healthy flow, a manual trigger does four things in order: validate the cron definition, enqueue the execution, persist run metadata, and dispatch worker processing. In this incident pattern, validation and queue acknowledgment succeed, but execution state never materializes as an actual run. Likely causes include a regression in the manual-trigger code path, a queue-to-worker handoff mismatch, or environment-specific behavior where isolated targets are accepted but never instantiated.

Why this matters: silent non-execution is worse than a visible crash. A crash triggers immediate response. False success creates delayed detection, missed customer actions, and broken trust in automations. Your response plan must optimize for early detection and controlled fallback, not just eventual patching.

Fast diagnosis: 10-minute incident triage

  1. Confirm scheduler health first: run openclaw status and verify that gateway and worker components are healthy. If base services are down, this is a different incident.
  2. Test one known-good scheduled job: wait for the next expected schedule tick. If scheduled runs continue but the manual trigger fails, you have a path-specific regression.
  3. Inspect run artifacts: check the cron run history in the UI and the storage path. If no new run record exists, the failure occurs before the execution lifecycle is created.
  4. Compare target types: note whether the job uses isolated-target or session-target settings. Several recent failures cluster around manual invocation with isolated execution routing.
  5. Capture one reproducible request: save the request timestamp, job id, and API response body for maintainers. This dramatically reduces time-to-fix.
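The outcome of steps 3–5 can be reduced to a single classification you paste into the incident report. Here is a minimal sketch; the response fields (ok, enqueued) match the symptom described above, but the exact shapes are assumptions you should adapt to your actual API:

```python
# Triage helper: classify one Run Now attempt from two captured facts.
# `response` is the API response body from the manual trigger; `run_records`
# is whatever run-history entries appeared for that job afterwards.
# Field names here are hypothetical, not a documented OpenClaw schema.

def classify_trigger(response: dict, run_records: list) -> str:
    """Classify a manual-trigger attempt for the incident report."""
    if not response.get("ok"):
        return "trigger-rejected"       # API refused: a different incident
    if not response.get("enqueued"):
        return "not-enqueued"           # queue never accepted the job
    if run_records:
        return "healthy"                # a run record materialized
    return "silent-non-execution"       # this incident: ack, but no run
```

For example, `classify_trigger({"ok": True, "enqueued": True}, [])` returns `"silent-non-execution"`, which is exactly the pattern this post covers.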

Step-by-step recovery plan that keeps delivery moving

Step 1: split production jobs and test jobs

Do not debug against the same cron that handles customer-facing delivery. Clone the job into a short-interval staging schedule (for example, every 5 or 10 minutes) and keep the production cadence unchanged. This avoids a second incident created by the troubleshooting itself.

Step 2: replace manual trigger with controlled fallback

If your team uses Run Now for urgent dispatch, temporarily route urgent dispatch through a known-working path. Most teams either (a) trigger the same business function through the sessions API or (b) reduce the cron interval during the incident window. The key is to maintain predictable execution while isolating the broken button path.
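One way to wire this up is a thin dispatcher that tries the manual trigger, waits briefly for a run record, and only then falls back to the known-good route. This is a sketch under stated assumptions: trigger_run, find_run_record, and fallback_dispatch are placeholders for your real integrations (Run Now API call, run-history lookup, sessions API), not OpenClaw functions:

```python
import time

def dispatch_with_fallback(trigger_run, find_run_record, fallback_dispatch,
                           job_id: str, timeout_s: float = 30.0,
                           poll_s: float = 1.0) -> str:
    """Try the manual trigger; fall back if no run record appears in time.

    All three callables are hypothetical hooks into your own stack.
    Returns which path actually dispatched the work.
    """
    resp = trigger_run(job_id)
    if resp.get("enqueued"):
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if find_run_record(job_id):
                return "manual"            # run materialized: normal path worked
            time.sleep(poll_s)
    fallback_dispatch(job_id)              # broken button path: known-good route
    return "fallback"
```

The design choice here is deliberate: the fallback fires on missing run records, not on the enqueue acknowledgment, because the acknowledgment is precisely what this incident shows cannot be trusted.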

Step 3: add explicit run verification after every trigger

Treat “enqueued=true” as insufficient proof. After each trigger, verify that a run record exists and that its state changes from queued to running or completed. If your team has dashboards, add a temporary guardrail alert: “manual trigger acknowledged but no run record after N seconds.”

Step 4: pin incident evidence before patching

Before applying updates, record the current environment: install method, OS, runtime versions, and a snapshot of one failing job definition. Without this, teams often lose the ability to verify that the update actually fixed the same root problem.

Step 5: validate fix with acceptance criteria

  • Run Now creates a visible run record every time.
  • At least five consecutive manual triggers succeed for the same job.
  • Scheduled runs remain healthy while manual path is restored.
  • No duplicated runs appear after retry clicks.
  • Execution latency is stable versus your pre-incident baseline.
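These criteria are mechanical enough to encode as a single check you run after patching. A hedged sketch, assuming you can count run records per trigger attempt and measure trigger-to-start latencies yourself (nothing here is an OpenClaw API):

```python
def acceptance_passed(runs_per_trigger: list, latencies_s: list,
                      baseline_p50_s: float, tolerance: float = 2.0) -> bool:
    """Check the post-fix acceptance criteria from the list above.

    runs_per_trigger: run records observed per manual trigger attempt
    (exactly 1 each means no missed runs and no duplicates).
    latencies_s: trigger-to-start latency per successful run, seconds.
    baseline_p50_s: your pre-incident median latency; `tolerance` is an
    illustrative stability bound, not a recommended threshold.
    """
    if len(runs_per_trigger) < 5:
        return False                        # need five consecutive triggers
    if any(n != 1 for n in runs_per_trigger):
        return False                        # a missed run or a duplicate
    ordered = sorted(latencies_s)
    p50 = ordered[len(ordered) // 2]
    return p50 <= baseline_p50_s * tolerance
```

Scheduled-path health is intentionally left out of this function; verify it separately so a manual-path fix is never accepted on the back of scheduled runs alone.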

Edge cases teams miss (and pay for later)

Edge case 1: double-click storms. Operators often click Run Now repeatedly when they see no visible progress. If delayed queue dispatch resumes later, that can produce duplicate actions (duplicate messages, duplicate tickets, repeated webhooks). Add an operator guideline: one click, then verify artifact creation.
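If guidelines alone are not enough, a client-side debounce in whatever tooling wraps your trigger calls can enforce the rule. A minimal sketch (the cooldown value and class are illustrative, not an OpenClaw feature):

```python
import time

class TriggerDebouncer:
    """Block repeat Run Now clicks for the same job during a cooldown.

    Intended as a guard in your own wrapper around the trigger call,
    so a storm of clicks cannot queue duplicate dispatches that all
    fire when delayed queue processing resumes.
    """
    def __init__(self, cooldown_s: float = 120.0):
        self.cooldown_s = cooldown_s
        self._last = {}                     # job_id -> last allowed click time

    def allow(self, job_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last.get(job_id)
        if last is not None and now - last < self.cooldown_s:
            return False                    # swallow the repeat click
        self._last[job_id] = now
        return True
```

Note this only deduplicates on the client side; if duplicates can also arrive server-side, you still need idempotency keys in the business action itself.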

Edge case 2: target mismatch after config edits. A cron copied between environments may keep stale target references. Validation may pass because the shape is valid, but runtime resolution points to an unavailable target. Always test a newly cloned cron in the destination environment before relying on it.

Edge case 3: incident masked by retries. If your business action has retries at another layer, you may not notice missed manual runs until customer SLAs slip. Monitor time to first execution, not only eventual success.

How to verify the incident is truly closed

  1. Run a 24-hour observation with both manual and scheduled triggers active.
  2. Measure success ratio and median trigger-to-start latency for each path.
  3. Review logs for anomalies where the queue accepts a trigger but no run is created.
  4. Confirm no missed business events tied to manual emergency triggers.
  5. Write post-incident note with what changed and which safeguards remain permanent.
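Step 2 of this checklist can be computed directly from your observation-window data. A sketch, assuming you export each trigger as a small record with the path it took, whether a run actually started, and its latency (the record shape is an assumption, not an OpenClaw export format):

```python
from statistics import median

def path_metrics(events: list) -> dict:
    """Success ratio and median trigger-to-start latency per trigger path.

    events: dicts with 'path' ('manual' or 'scheduled'), 'started' (bool),
    and 'latency_s' (float, present for started runs). This record shape
    is hypothetical; map your own logs into it.
    """
    out = {}
    for path in ("manual", "scheduled"):
        rows = [e for e in events if e["path"] == path]
        started = [e for e in rows if e["started"]]
        out[path] = {
            "success_ratio": len(started) / len(rows) if rows else None,
            "median_latency_s": median(e["latency_s"] for e in started)
                                if started else None,
        }
    return out
```

Comparing the two paths side by side is the point: a healthy scheduled ratio next to a degraded manual ratio is the signature of the path-specific regression described above.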

Need faster reliability without cron firefighting?

If your team is repeatedly debugging scheduler edge cases instead of shipping product work, move to a setup that gives you predictable operations and a clean migration path.

Common mistakes during remediation

  • Treating enqueue acknowledgment as proof of business completion.
  • Hot-editing production crons while incident is active.
  • Skipping explicit acceptance criteria after upgrading.
  • Testing only one environment and assuming global resolution.
  • Ignoring run-history gaps because scheduled jobs “mostly work.”

FAQ

Should we disable cron while debugging this?

Usually no. Keep scheduled production jobs alive if they are healthy. Isolate testing to a cloned staging cron and only pause production if duplicate or unsafe actions become likely.

Is this only a Windows issue?

The documented report uses Windows, but the safer assumption is path-level regression until proven otherwise. Validate on your actual deployment stack and avoid overfitting your conclusion to one OS.

Can this break SLAs even if scheduled jobs still run?

Yes. Many teams use manual triggers for urgent customer operations or recovery actions. Losing manual trigger reliability can still violate response-time commitments.

What internal links should our team review next?

Start with the core OpenClaw setup guide, then evaluate self-hosted vs managed operations and the managed hosting path if incident load is recurring.


Fix once. Stop recurring cron manual trigger incidents.

If this keeps coming back, you can move your existing setup to managed OpenClaw cloud hosting instead of rebuilding the same stack. Import your current instance, keep your context, and move to a runtime with lower ops overhead.

  • Import flow in ~1 minute
  • Keep your current instance context
  • Run with managed security and reliability defaults

If you would rather compare options first, review OpenClaw cloud hosting or see the best OpenClaw hosting options before deciding.

1) Paste the import payload (screenshot: OpenClaw import first screen in the OpenClaw Setup dashboard)
2) Review and launch (screenshot: OpenClaw import completed screen in the OpenClaw Setup dashboard)