OpenClaw OpenRouter empty replies: how to diagnose billed requests with no answer
Problem statement: OpenClaw accepts your prompt, OpenRouter shows real usage, tokens are billed,
and sometimes openclaw models status --probe even looks healthy, but the user still gets
Agent couldn't generate a response or a blank assistant turn. This is one of the worst failure modes
because it looks like the model is alive while the actual reply path is broken.
-
GitHub issue #67575 was opened on
2026-04-16 after OpenClaw sent requests to OpenRouter successfully, OpenRouter billed the calls, but the
user saw no reply across CLI, TUI, and Telegram. Logs showed
incomplete turn detectedwithpayloads=0. -
GitHub issue #66768 was opened on
2026-04-14 after operators running
ghcr.io/openclaw/openclaw:2026.4.14saw completed assistant turns with empty content and zero token usage, while a downgrade to2026.4.12restored expected behavior. - Across OpenClaw Setup operating patterns, the reliable lesson is simple: a successful probe is not enough. We treat real end-to-end reply delivery as the acceptance test, because auth, model reachability, and user-visible output can fail in different layers.
What this failure usually means
In a normal OpenClaw flow, several things have to work in sequence. The gateway must route the request to the right provider, the provider must return a valid completion, OpenClaw must parse that result into an assistant payload, and the payload must reach the final surface, such as Telegram, the CLI, or the built-in chat. Empty reply incidents often mean the break happened in the middle of that chain, not at the beginning.
That distinction matters. Many operators waste time rotating API keys or rebuilding provider configuration when the upstream request already succeeded. If OpenRouter shows usage and your direct API tests work, you are no longer debugging provider access. You are debugging reply handling, version behavior, transcript state, or a regression in how OpenClaw turns model output into a user-visible message.
How to recognize the empty-turn pattern fast
- OpenRouter usage appears normal, including request count or spend.
- OpenClaw probe commands pass, which creates false confidence.
- The user-facing result is blank or generic, usually “Agent couldn't generate a response.”
- Logs mention incomplete turn or payloads=0.
- The problem reproduces across multiple surfaces, such as CLI plus Telegram, not just one UI.
- A downgrade or older image behaves better, which strongly suggests a release-level regression.
Likely causes, in order of probability
1) A release regression in turn assembly
The cleanest clue in the fresh reports is version sensitivity. Issue #66768 points to 2026.4.14
behaving badly while 2026.4.12 works. That does not guarantee every empty reply is caused by the
exact same bug, but it does tell you not to assume your local setup is uniquely broken. If a known-good build
restores replies, you should treat that as meaningful evidence, not a coincidence.
2) Valid provider output that OpenClaw fails to surface
The 2026-04-16 report is especially useful because the operator validated the same OpenRouter credentials and model outside OpenClaw with a direct API call. That narrows the problem sharply. The model was not dead. The API key was not invalid. The failure happened after the provider call returned. In practice, that means your investigation should move toward runtime parsing, transcript generation, or response normalization.
3) Broken session or transcript state
Empty assistant turns often leave behind damaged session evidence: an assistant message with empty
content, zero usage, or a misleading stop reason. Once that state exists, operators can start
debugging the wrong layer. That is why the safest containment step is to test with a fresh session after each
change instead of trusting one poisoned transcript to recover itself.
4) Surface-specific symptoms hiding a deeper routing failure
Telegram, TUI, and CLI can all look slightly different while still sharing the same root cause. One surface may show a generic error, another may appear idle, and another may display a blank turn. Do not assume these are separate incidents. If the same model and runtime produce no usable assistant payload anywhere, treat the issue as a shared execution-path failure until proven otherwise.
Step-by-step diagnostic flow
Step 1: prove whether OpenRouter is really the broken layer
Start with the fastest falsification test. Send a minimal direct request to OpenRouter using the same API key and model family. If that succeeds, stop spending time on auth theory. Your next task is proving where OpenClaw loses the reply.
- Check OpenRouter usage dashboard for matching timestamps.
- Use the same model ID that OpenClaw is configured to use.
- Save the exact provider response for comparison.
Step 2: inspect logs for incomplete-turn clues
Follow OpenClaw logs during a fresh reproduction. The signal you want is not just “an error happened.” You are looking for lines showing a completed stop reason but zero payloads, incomplete turns, or empty assistant content. Those clues separate this incident from ordinary timeout or auth failures.
Step 3: inspect the session transcript, not just the console
If a session transcript contains an assistant object with empty content and zero token usage, that is strong evidence that the model response never became a usable output. At that point the user-facing surface is only reporting the last stage of the failure. The transcript tells you where the execution really stopped.
Step 4: compare one fresh session on the current build and one on a known-good build
Version comparison is the fastest way to decide whether you are dealing with local misconfiguration or a fresh regression. If a downgraded image or earlier package release restores normal replies using the same provider and the same prompt, document that before you change anything else.
Step 5: isolate the surface
Reproduce in two places, not one. If CLI and Telegram both fail the same way, your problem is almost never a Telegram formatting issue. If one surface works and another fails, then you can narrow the incident to the delivery layer. This saves a lot of useless channel-level debugging.
Practical containment steps that actually help
Use a temporary known-good version
When a fresh release turns working prompts into empty replies, the business-safe move is containment, not stubbornness. Pin to the last version that passes a real end-to-end reply test. This is especially important if OpenClaw is driving customer-facing or operationally important workflows.
Reset the test session between changes
Do not keep reusing the same session transcript while you change versions, models, or provider settings. A fresh session gives you cleaner evidence and avoids misreading leftover broken state as a live failure.
Keep your test prompt boring
Use a minimal prompt like “say hello” during incident work. Complex tool-using prompts create more branches and make logs harder to compare. Your first goal is not to prove all capabilities. It is to prove that a plain assistant reply can survive the full execution path.
Document model, version, and surface together
Empty-turn bugs are easy to report badly. Always capture the exact OpenClaw version, install method, model ID, provider chain, and which surfaces failed. That level of detail made the fresh GitHub reports valuable. Vague “OpenRouter is broken” reports do not.
Fix once. Stop recurring OpenRouter empty replies.
If this keeps coming back, you can move your existing setup to managed OpenClaw cloud hosting instead of rebuilding the same stack. Import your current instance, keep your context, and move onto a runtime with lower ops overhead.
- Import flow in ~1 minute
- Keep your current instance context
- Run with managed security and reliability defaults
If you would rather compare options first, review OpenClaw cloud hosting or see the best OpenClaw hosting options before deciding.
How to verify the incident is actually closed
- Run the same short prompt in a fresh session on the target version.
- Confirm the visible reply contains real assistant text.
- Check the transcript for non-empty assistant content and non-zero usage where expected.
- Repeat on a second surface such as Telegram or built-in chat.
- Watch one production workflow long enough to ensure the fix survives real traffic, not just one test turn.
Typical mistakes that slow recovery
- Treating a successful probe as proof that the user-visible reply path is healthy.
- Rotating API keys before checking whether OpenRouter already processed the request.
- Reusing the same broken session transcript after every config change.
- Testing only Telegram and assuming the model itself is broken.
- Upgrading repeatedly during an active incident without documenting which build last worked.
When managed hosting becomes the rational next step
Not every OpenClaw deployment needs managed hosting. But this incident pattern is a good test of your current tolerance for runtime drift. If critical workflows can silently fail while still consuming model spend, and if your team keeps losing time to version archaeology, the real cost is no longer just infrastructure. It is the decision tax of constantly re-proving that the stack is trustworthy.
If that sounds familiar, compare your current setup against managed versus self-hosted tradeoffs, review OpenClaw cloud hosting, or import the instance you already run instead of rebuilding the same fragile path again.
FAQ
Can this happen only with OpenRouter?
No. The general pattern is “provider call succeeds, user-visible reply fails.” OpenRouter is just the clearest fresh example because recent reports included upstream billing proof and transcript evidence.
Should I switch models first or switch versions first?
Switch versions first if you already suspect a release regression. Model hopping without a version comparison creates noise and often hides the real fault line.
Does a fresh install guarantee a clean test?
No. One of the fresh reports came from a clean setup, which is exactly why version-level investigation mattered. A fresh install only removes some local drift. It does not protect you from a bad release.