OpenClaw subagent env vars missing: how to fix provider keys without leaking secrets
Problem statement: a provider works from the OpenClaw CLI, but the same model fails when a
spawned subagent tries to use it. The usual symptom is a provider auth error such as
401 Unauthorized - Invalid API-key provided, even though the key is present in your service file
or shell.
- GitHub issue #82481 reports a concrete case where direct CLI inference returned HTTP 200, but a spawned subagent using the same provider failed with 401 Unauthorized.
- The reported setup used a systemd environment variable, a provider config with an env reference, and a subagent model route. That is exactly the kind of boundary where a shell test can pass while a child runtime fails.
- The security review on the issue called out the important constraint: fixing this by forwarding the full host environment is risky. Provider credentials need a narrow handoff, not a blanket copy of everything the gateway can see.
What is actually failing
Environment variables are easy to misunderstand in OpenClaw because there are several execution contexts. Your terminal, the gateway service, the provider resolver, a native subagent, and an ACP process can all have different views of the environment. A successful command in your login shell proves that one context has the key. It does not prove that every child runtime has the same key.
This matters most for providers configured with values like ${ALIBABA_CLOUD_API_KEY},
${QWEN_API_KEY}, ${OPENAI_API_KEY}, or a custom provider-specific secret.
If the provider resolver sees the variable, the request signs correctly. If the spawned runtime sees an empty
value, the same model route becomes an auth failure.
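The split between contexts can be reproduced in plain shell, with no OpenClaw involved. The sketch below is only an illustration of process environments in general; DEMO_PROVIDER_KEY is a placeholder name, not a variable OpenClaw reads.

```shell
#!/bin/sh
# Sketch: the same variable seen from three contexts.
# DEMO_PROVIDER_KEY is a placeholder name, not a variable OpenClaw reads.
unset DEMO_PROVIDER_KEY

# Small probe that a fresh child shell runs: prints "set" or "empty".
probe='if [ -n "${DEMO_PROVIDER_KEY:-}" ]; then echo set; else echo empty; fi'

# 1. Assigned but not exported: visible to this shell only.
DEMO_PROVIDER_KEY="not-a-real-key"
unexported_view=$(/bin/sh -c "$probe")

# 2. Exported: children started from this shell inherit it.
export DEMO_PROVIDER_KEY
exported_view=$(/bin/sh -c "$probe")

# 3. Child started with a cleared environment, similar to a service manager
#    or a scrubbing spawn path that does not copy the parent's variables.
cleared_view=$(env -i /bin/sh -c "$probe")

echo "unexported: $unexported_view"
echo "exported:   $exported_view"
echo "cleared:    $cleared_view"
```

Only the second child sees the key. The first and third cases are the shell-level equivalent of a provider call that signs correctly in one context and fails in another.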
Why CLI success is not enough
A direct CLI test often runs from the same shell where you exported the API key. A gateway managed by systemd, Docker, a process manager, or a hosted runtime may not inherit that shell. A spawned subagent can add one more boundary. The result is a confusing split: the provider appears healthy during manual testing, but automation fails as soon as OpenClaw delegates work.
The fastest way to avoid wasted debugging is to test the exact path that failed. If the incident happened in a subagent, do not stop after a CLI probe. Reproduce with the same action that creates the child session, then compare logs from the parent and child paths.
Common causes
1. The key exists only in your interactive shell
Adding export PROVIDER_KEY=... to a terminal session does not update a systemd service that is
already running. It also does not update Docker containers, scheduled jobs, or long-lived gateway processes.
If OpenClaw was started before the variable existed, the runtime may never see it.
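On Linux, one way to see what a long-lived process actually received at start is to read its /proc/<pid>/environ file. The helper below is a sketch that prints variable names only; environ_names is defined here for illustration and is not an OpenClaw command.

```shell
#!/bin/sh
# Sketch: list variable NAMES from a NUL-separated environ file (Linux /proc).
# Values are never printed. environ_names is a helper defined here for
# illustration; it is not an OpenClaw command.
environ_names() {
    # $1: path such as /proc/<pid>/environ
    # Turn NUL separators into newlines, keep only the NAME part of NAME=value.
    tr '\0' '\n' < "$1" |
        sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' |
        sort -u
}
```

On a systemd host, the gateway PID can be read with `systemctl show openclaw --property=MainPID --value` (the unit name follows this article's examples), and the helper is then called as `environ_names /proc/<pid>/environ`. Reading another user's process usually requires root. If the expected name is missing from the output, the process was started without it, whatever your terminal shows.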
2. The gateway has the key, but the child runtime does not
This is the pattern reported in the GitHub issue. The parent process may have enough configuration to run one provider call, while the spawned child path loses the env-backed credential reference. That can happen through runtime isolation, process spawn filtering, or a provider config path that is not shared with child sessions.
3. ACP deliberately strips provider credentials
Some child process paths scrub known provider auth variables to reduce secret exposure. That is usually a good security default. The problem is not that OpenClaw should blindly keep every variable. The problem is that the operator needs a predictable way to make approved provider credentials available to the specific runtime that needs them.
4. A custom provider uses a nonstandard variable name
Built-in providers often document expected variable names. Custom providers may use names that are valid in one local setup but invisible to compatibility checks, migrations, or helper tooling. When a custom route fails, verify both the provider config and the runtime variable name.
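A quick cross-check between config and environment can catch a misspelled or undocumented variable name. The sketch below pattern-matches ${NAME} placeholders in a text file and reports the names that are unset or empty in the current environment. The file layout is an assumption, and unset_env_refs is a hypothetical helper that knows nothing about OpenClaw's real config schema.

```shell
#!/bin/sh
# Sketch: find ${NAME} references in a config file and report which names are
# unset or empty in the current environment. This only pattern-matches text;
# the config layout is an assumption, not OpenClaw's documented schema.
unset_env_refs() {
    # $1: path to a config file containing ${NAME} placeholders
    grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$1" | tr -d '${}' | sort -u |
    while read -r name; do
        # Indirect lookup; the value is tested but never printed.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

The result describes only the environment the check runs in. Running it from a login shell says nothing about the gateway service or a child runtime, so treat an empty result as a first check rather than proof.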
Safe diagnostic workflow
- Write down the failing path. Record whether the failure happens in CLI, built-in chat, cron, native subagent, ACP, Telegram, Slack, or a custom tool.
- Run one direct provider test. Confirm that the provider and model can answer from the environment you believe is configured.
- Inspect the service runtime, not just your shell. For systemd, check the unit environment and restart state. For Docker, check the container environment and recreate the container after changes.
- Reproduce with a spawned subagent. Use a tiny prompt such as "say hello" on the same provider route. If the child fails while the parent succeeds, you have confirmed a runtime-boundary problem.
- Read logs for the first auth failure. Look for missing-key, invalid-key, empty credential reference, unauthorized, or provider-specific auth messages.
- Test a narrow credential handoff. Prefer a documented provider config, managed secret, or runtime-specific allowlist over forwarding the entire process environment.
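The log-reading step can be sketched as a small search for the first line that looks like a provider auth failure. The patterns below are guesses based on the message types listed in the workflow; real OpenClaw log wording may differ, and first_auth_failure is a helper defined here for illustration.

```shell
#!/bin/sh
# Sketch: print the first log line that looks like a provider auth failure,
# prefixed with its line number. The patterns are guesses based on the message
# types named in the workflow; real log wording may differ. A bare "401" can
# also match unrelated numbers, so read the matched line before acting on it.
first_auth_failure() {
    # $1: path to a plain-text log file
    grep -n -i -m 1 -E '401|unauthorized|invalid api.?key|missing.*key|empty credential' "$1"
}
```

For a systemd service, `journalctl -u openclaw --no-pager > gateway.log` produces a file the helper can read. The first failure matters more than later ones, because retries and fallbacks tend to produce follow-on errors that hide the original cause.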
Reference checks for self-hosted Linux
These examples are for a self-hosted Linux service. Adjust names and paths for your install method. Do not paste secret values into shared tickets, screenshots, or public logs.
# Check whether the current shell has the variable.
printenv ALIBABA_CLOUD_API_KEY >/dev/null && echo "shell has key"
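# List variable names set with Environment= on the unit. The raw property prints
# values as well, so keep only the name before "=". Values that contain spaces
# may leave fragments behind, and EnvironmentFile= entries are not listed here.
systemctl show openclaw --property=Environment --value | tr ' ' '\n' | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p'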
# Restart after changing service-level environment.
systemctl daemon-reload
systemctl restart openclaw
# Confirm the gateway process was restarted recently.
systemctl status openclaw --no-pager
# Run a direct model test from the intended runtime context.
openclaw infer model run --provider alibaba-token-plan --model qwen3.6-plus "say hello"
Docker and process manager edge cases
Docker Compose users often change an .env file and then run docker compose restart.
That restarts the existing container with the environment it was created with, so changed env-file values are not
picked up. If your OpenClaw container was created with old variables, recreate it after changing provider secrets.
# See variable names visible inside the running container without printing values.
docker compose exec openclaw sh -lc 'env | cut -d= -f1 | sort | grep -E "QWEN|ALIBABA|OPENAI|ANTHROPIC"'
# Recreate the service when env-file values changed.
docker compose up -d --force-recreate openclaw
PM2, Supervisor, and custom launch scripts have similar traps. A variable exported in your terminal may not be part of the managed process environment. Restart through the same process manager that owns the gateway.
The security line: do not pass the whole environment
It is tempting to solve the problem by telling every child process to inherit everything from the gateway. That is a bad default for an agent runtime. The gateway process may have provider keys, channel tokens, database credentials, deploy keys, telemetry settings, and host-specific secrets. A child agent that does not need those values should not receive them.
A safer design is explicit. Provider credentials should be stored in a known configuration surface, resolved by OpenClaw, and made available only to the model call that needs them. For custom providers, document the variable name, model route, and child-runtime behavior so future operators do not rediscover the same split.
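The difference between a blanket copy and an explicit handoff can be shown at the shell level. The sketch below starts a child with a cleared environment plus a named allowlist. It illustrates the principle only: run_with_allowlist is a hypothetical helper, and this is not how OpenClaw spawns subagents.

```shell
#!/bin/sh
# Sketch: start a child with an explicit allowlist of variables instead of the
# full parent environment. run_with_allowlist is a hypothetical helper that
# illustrates the principle; it is not how OpenClaw spawns subagents.
run_with_allowlist() {
    # $1: space-separated variable names to hand over; remaining args: command
    allowlist=$1
    shift
    for name in $allowlist; do
        # Skip anything that is not a plain variable name before using eval.
        case $name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        # Forward only names that are actually set in the parent.
        if eval "[ -n \"\${$name+set}\" ]"; then
            eval "value=\${$name}"
            # Prepend NAME=value so env treats it as an assignment.
            set -- "$name=$value" "$@"
        fi
    done
    # env -i clears everything; PATH is re-added so the command can be found.
    env -i PATH=/usr/bin:/bin "$@"
}
```

A call such as `run_with_allowlist "DEMO_PROVIDER_KEY" some-command` gives the child the one approved variable and nothing else, so channel tokens and deploy keys in the parent never reach it. The allowlist is also a record of which credential each runtime needs.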
How OpenClaw Setup handles the operational side
In OpenClaw Setup, environment variables and provider credentials are managed from the dashboard instead of being scattered across shell profiles, systemd drop-ins, and local process managers. That does not remove the need for good provider boundaries, but it reduces the most common operator mistake: changing a variable in one place while the real runtime keeps using another.
The managed path is most useful when you rely on subagents, scheduled jobs, and provider routing together. Those features multiply runtime boundaries. A dashboard-controlled restart flow, visible runtime config, and isolated instance state make it easier to prove which configuration is actually live.
Verification checklist
- Direct provider inference returns a real answer.
- Built-in chat returns a real answer on the same provider route.
- A spawned subagent returns a real answer on the same provider route.
- Logs no longer show empty credential references, missing API key, or unauthorized provider errors.
- The key is stored in the intended runtime configuration surface, not only in an interactive shell.
- No broad child-runtime environment forwarding was introduced as a workaround.
Typical mistakes
- Testing only the CLI. The CLI can pass while subagents still fail.
- Editing .bashrc for a systemd service. systemd does not read login shell files, so the service never sees the change.
- Restarting without recreating containers. Env-file changes may not reach existing containers.
- Forwarding every secret to every child. That fixes one symptom by expanding the blast radius.
- Using custom variable names without documentation. Future operators will not know which runtime needs which key.
FAQ
Can I just put the API key directly in provider config?
Avoid plaintext provider keys in committed or shared config. Use a managed secret, an approved environment variable path, or the dashboard credential store for your hosting setup.
Is this only an Alibaba or Qwen issue?
No. The reported issue used an Alibaba Cloud route, but the pattern can affect any provider that depends on environment-variable-based credentials and child runtime delegation.
Does light context change provider credential inheritance?
No. Context size and credential visibility are separate problems. If the child runtime cannot see a provider key, reducing prompt context will not fix authentication.
What should I report upstream?
Include install method, provider config shape, variable name, direct CLI result, spawned subagent result, and the first auth error. Do not paste real API keys.
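Before attaching logs or config excerpts to a report, a redaction pass lowers the chance of pasting a live key. The sketch below masks values of assignments whose names look secret. The name patterns are assumptions, matching is limited to uppercase names, and it does not cover other forms such as bearer headers, so the output still needs a manual read.

```shell
#!/bin/sh
# Sketch: mask values of secret-looking assignments before sharing a log or
# config excerpt. The name patterns are assumptions and only match uppercase
# names; extend them for your setup and still review the output by eye.
redact_secrets() {
    # Reads text on stdin, writes the redacted text to stdout.
    sed -E 's/([A-Za-z0-9_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*[=:][[:space:]]*)[^[:space:]]+/\1<redacted>/g'
}
```

Typical use is a pipe, for example `systemctl show openclaw --property=Environment | redact_secrets`. The variable name survives, which is what a maintainer needs to see, while the value is replaced.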