
OpenClaw instance startup fails with EMFILE: complete inotify fix guide

Problem statement: you deploy a new OpenClaw instance, the pod or container crashes during startup, and logs show EMFILE errors related to file watchers. The gateway cannot start its watcher, QMD memory indexing fails, and your instance never becomes healthy.

Incident evidence
  • Documented in production: 2026-04-08 worklog entry for inotify capacity issues.
  • New instance failed with EMFILE during gateway/qmd watcher startup on dedicated OpenClaw nodes.
  • Host-level fs.inotify limits were too low for multiple instances sharing the same node.
  • Recovery required raising sysctl limits and adjusting configuration to reduce watch dependency.
  • Previously failing instance recovered after node changes and fresh pod rollout.

Why EMFILE happens during OpenClaw startup

OpenClaw uses file system watchers for several core features: gateway configuration reload, QMD memory indexing, and workspace change detection. Each watcher consumes inotify resources from the host kernel. When you run multiple instances on the same node or host, you can exceed the default inotify limits that Linux ships with.

The dangerous part is that the failure is easy to misdiagnose. The instance crashes during startup, the orchestrator restarts it, and it crashes again. You see CrashLoopBackOff or similar restart loops, but the root cause is a kernel resource limit, not an application bug or configuration error.

How inotify limits work

The Linux kernel exposes three main inotify limits through sysctl:

  • fs.inotify.max_user_instances: maximum number of inotify instances per user (default: 128).
  • fs.inotify.max_user_watches: maximum number of watches per user (default: 8192 on older kernels; since Linux 5.11 the default scales with available memory, up to 1048576).
  • fs.inotify.max_queued_events: maximum number of events queued per inotify instance (default: 16384).

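Each OpenClaw instance can consume multiple watches for workspace files, configuration directories, and memory indexing. When multiple instances share a host, these limits run out quickly. At the kernel level, EMFILE is what you get when creating a new inotify instance fails, either because max_user_instances is reached or because the process has hit its own open-file limit. Running out of max_user_watches is reported as ENOSPC instead.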

How to recognize inotify-related EMFILE failures

  • Instance crashes during startup with EMFILE or "too many open files" in logs.
  • Logs mention watcher setup, file monitoring, or inotify around the time of failure.
  • Gateway or QMD components fail to initialize their watchers.
  • CrashLoopBackOff or repeated restart loops in Kubernetes.
  • Problem appears after adding more instances to the same node.
  • Issue resolves temporarily when other instances are stopped or moved.
  • Host is Linux with default inotify limits.
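
As a rough triage aid for the symptoms above, the following sketch reads log text on standard input and prints which kernel limit each error code usually points to. The function name classify_watch_errors is hypothetical, and the mapping assumes the error codes appear verbatim in the logs; note that ENOSPC can also mean a full disk when it appears outside a watcher context.

```shell
# Read log text on stdin and print the kernel limit each watcher error code usually points to.
# classify_watch_errors is an illustrative helper, not part of OpenClaw.
classify_watch_errors() {
  local logs
  logs=$(cat)
  # EMFILE: creating an inotify instance failed, or the process ran out of file descriptors.
  if printf '%s\n' "$logs" | grep -q 'EMFILE'; then
    echo "EMFILE: check fs.inotify.max_user_instances and the process open-file limit (ulimit -n)"
  fi
  # ENOSPC from a watcher: the per-user watch limit was reached.
  if printf '%s\n' "$logs" | grep -q 'ENOSPC'; then
    echo "ENOSPC: check fs.inotify.max_user_watches"
  fi
}
```

On Kubernetes, the logs of the crashed container can be piped in with kubectl logs <pod-name> --previous | classify_watch_errors.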

Immediate fix: raise inotify limits

Step 1: Check current limits

# Check current inotify limits
cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches
cat /proc/sys/fs/inotify/max_queued_events

# Or use sysctl
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_queued_events

Step 2: Create a sysctl configuration file

# Create or edit OpenClaw-specific sysctl configuration
sudo tee /etc/sysctl.d/99-openclaw.conf > /dev/null <<EOF
# Raise inotify limits for OpenClaw instances
fs.inotify.max_user_instances = 512
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 65536
EOF

Step 3: Apply the new limits

# Apply the configuration immediately
sudo sysctl -p /etc/sysctl.d/99-openclaw.conf

# Verify the new values are active
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_queued_events
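
If you want the verification to fail loudly instead of reading values by eye, a small check like the one below can help. It is a sketch: check_inotify_limit is a hypothetical helper, and the optional third argument exists only so the function can be pointed at a test directory instead of /proc/sys/fs/inotify.

```shell
# Return 0 if the live value of an inotify limit is at least the expected minimum.
# Usage: check_inotify_limit <limit-name> <minimum> [directory holding the limit files]
check_inotify_limit() {
  local name="$1" minimum="$2"
  local dir="${3:-/proc/sys/fs/inotify}"
  local current
  # Fail with status 2 if the limit file cannot be read at all.
  current=$(cat "$dir/$name") || return 2
  if [ "$current" -ge "$minimum" ]; then
    echo "ok:  $name = $current (>= $minimum)"
  else
    echo "low: $name = $current (< $minimum)"
    return 1
  fi
}
```

With the values from Step 2, the calls would be check_inotify_limit max_user_instances 512, check_inotify_limit max_user_watches 524288, and check_inotify_limit max_queued_events 65536.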

Step 4: Restart affected instances

After raising limits, restart the affected OpenClaw instances to pick up the new capacity.

# For systemd deployments
sudo systemctl restart openclaw

# For Kubernetes, delete the pod to trigger a fresh rollout
kubectl delete pod -l app=openclaw-instance

# For Docker Compose
docker-compose restart openclaw

Configuration changes to reduce inotify dependency

Even with higher limits, you can reduce inotify usage by adjusting OpenClaw configuration to use polling or event-based indexing instead of continuous file watching.

Use hybrid reload mode for gateway

# In openclaw.json or via environment variables
{
  "gateway": {
    "reload": {
      "mode": "hybrid"
    }
  }
}

# Or via environment
export OPENCLAW_GATEWAY_RELOAD_MODE=hybrid

Hybrid mode combines periodic polling with event-driven updates, reducing reliance on continuous file watching.

Disable watch for memory search sync

# In openclaw.json, adjust memory search to use event-based indexing
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "sync": {
          "watch": false,
          "onSessionStart": true,
          "onSearch": true
        }
      }
    }
  }
}

This configuration keeps memory indexing active without continuous file watching. Indexing happens when sessions start or when searches are performed, which covers most use cases without requiring persistent watchers.

Kubernetes-specific considerations

Spread instances across nodes

In Kubernetes, use topologySpreadConstraints to distribute instances across multiple nodes instead of concentrating them on a single node. This prevents inotify exhaustion on any one node.

# Add to the pod spec of your OpenClaw deployment (spec.template.spec in a Deployment)
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: openclaw-instance

Use dedicated node pools

If you run many OpenClaw instances, consider using a dedicated node pool with tuned inotify limits. This isolates OpenClaw from other workloads and lets you set limits appropriate for your instance count.

Verification commands

# Step 1: Confirm new limits are active
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_user_watches

# Step 2: Check current inotify usage
# Count inotify instances for the openclaw user
ps -u openclaw -o pid= | xargs -I {} ls -l /proc/{}/fd 2>/dev/null | grep inotify | wc -l

# Step 3: Monitor instance startup logs for watcher messages
openclaw logs --follow | grep -Ei "watch|inotify|EMFILE"

# Step 4: Verify instance health after restart
openclaw status
openclaw gateway status

What to do if the issue persists

If raising limits does not help

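Check whether another process on the host is consuming inotify resources. With GNU find, sudo find /proc/*/fd -lname anon_inode:inotify 2>/dev/null prints one path per open inotify handle, and the PID is part of each path, so processes that appear many times are the heavy consumers. You may need to identify and relocate non-OpenClaw workloads.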
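
To rank consumers rather than scan raw listings, a sketch along these lines counts inotify handles per process and sorts the busiest first. The function name list_inotify_holders is hypothetical, and the proc root is a parameter only to make the function testable. Run it as root to see every user's processes; otherwise it only sees processes you own.

```shell
# Print "<count> <pid> <command>" for each process holding inotify instances, busiest first.
list_inotify_holders() {
  local proc_root="${1:-/proc}"
  local pid_dir fd count pid comm
  for pid_dir in "$proc_root"/[0-9]*; do
    count=0
    for fd in "$pid_dir"/fd/*; do
      # An inotify instance appears as a symlink whose target is "anon_inode:inotify".
      if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
        count=$((count + 1))
      fi
    done
    [ "$count" -gt 0 ] || continue
    pid=${pid_dir##*/}
    comm=$(cat "$pid_dir/comm" 2>/dev/null || echo "?")
    echo "$count $pid $comm"
  done | sort -rn
}
```

Calling list_inotify_holders with no arguments scans the live /proc; piping the result through head -n 10 shows the top ten consumers.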

Review logs for other errors besides EMFILE. The inotify issue may have masked a second problem. Once watchers can initialize, look for configuration errors, missing dependencies, or network issues.

If you cannot raise host limits

Use the configuration changes above to minimize inotify usage, or reduce the number of instances per host. Managed hosting environments handle these limits at the platform level.

Edge cases that complicate diagnosis

Edge case: intermittent failures

If instances sometimes start and sometimes fail, you may be at the margin of your inotify capacity. One instance starting successfully can consume enough watches to push the next attempt over the limit.
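
To see how close a host is to that margin, you can compare the number of inotify instances in use against the limit. The sketch below does this; both function names are hypothetical, and the path arguments default to the live /proc locations and exist only for testing. Because the limit applies per user, run it as the user that owns the OpenClaw processes: a non-root user can only read its own processes' descriptors, which gives a per-user count.

```shell
# Count open inotify instances by scanning fd symlinks under a proc root.
count_inotify_instances() {
  local proc_root="${1:-/proc}"
  local count=0 fd
  for fd in "$proc_root"/[0-9]*/fd/*; do
    # An inotify instance appears as a symlink whose target is "anon_inode:inotify".
    if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Print instances in use, the configured limit, and the remaining headroom.
inotify_instance_headroom() {
  local proc_root="${1:-/proc}"
  local limit_file="${2:-/proc/sys/fs/inotify/max_user_instances}"
  local used limit
  used=$(count_inotify_instances "$proc_root")
  limit=$(cat "$limit_file") || return 1
  echo "used=$used limit=$limit headroom=$((limit - used))"
}
```

A headroom value close to zero before a new instance starts is consistent with the start-sometimes, fail-sometimes pattern described above.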

Edge case: only one instance fails

Check whether that instance has a larger workspace, more repositories configured, or additional file-intensive features. Some configurations consume more watches than others.
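
One way to compare workspaces is to count their directories. inotify watches are not recursive, so a watcher that covers a whole tree needs roughly one watch per directory. Whether OpenClaw watches a given workspace recursively is an assumption here, and count_dirs is a hypothetical helper.

```shell
# Count directories under a path, including the path itself.
# This approximates the watches a recursive watcher would need for that tree.
count_dirs() {
  find "$1" -type d 2>/dev/null | wc -l | tr -d ' '
}
```

Running count_dirs against the failing instance's workspace and against a healthy one gives a quick relative comparison; a much larger number for the failing instance points to workspace size as the difference.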

Edge case: failure after upgrade

Newer OpenClaw versions may add more watchers for new features. If an upgrade introduces EMFILE failures, the fix is the same: raise limits or reduce watch dependency.

Verification checklist

  • You confirmed current inotify limits before making changes.
  • You raised max_user_instances, max_user_watches, and max_queued_events.
  • You applied sysctl changes and verified new values are active.
  • You adjusted gateway.reload.mode to "hybrid".
  • You set memory search sync to use onSessionStart and onSearch instead of watch.
  • You restarted affected instances and confirmed they start successfully.
  • You added topology spread constraints if running on Kubernetes.

Common mistakes that make this harder to resolve

  1. Mistake: only raising one limit instead of all three.
    Correction: OpenClaw needs instances, watches, and queued events. Raise all three together.
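  2. Mistake: creating the sysctl file but never loading it.
    Correction: creating the file is not enough; run sudo sysctl -p /etc/sysctl.d/99-openclaw.conf (or sudo sysctl --system) or reboot. A bare sysctl -p only reads /etc/sysctl.conf.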
  3. Mistake: restarting instances before applying sysctl changes.
    Correction: apply the new limits first, then restart instances.
  4. Mistake: concentrating all instances on one node in Kubernetes.
    Correction: use topology spread constraints to distribute instances.

When to consider managed hosting

Inotify limits are one example of host-level resources that require operational attention. Repeated resource exhaustion, manual sysctl tuning, and capacity planning are the hidden costs of self-hosting. If your team keeps paying this tax, compare the real effort against setups where these limits are handled at the platform level.

Need OpenClaw without managing kernel limits?

Hosted OpenClaw handles inotify limits, resource allocation, and instance distribution automatically. Your instances start reliably without touching sysctl or managing node capacity.

Explore managed OpenClaw hosting

FAQ

Do I need to reboot after changing sysctl settings?

No. Running sudo sysctl -p /etc/sysctl.d/99-openclaw.conf applies the changes to the running kernel immediately. You only need to restart the OpenClaw instances so that their watchers are created under the new limits.

Will these changes affect other applications on the host?

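Raising inotify limits is generally safe. The new values are per-user ceilings that apply to every user on the host, not reservations: kernel memory is consumed only for watches that are actually created (roughly 1 KB each on 64-bit systems), so other applications only consume what they need. Higher limits cause problems only if a process creates a very large number of watches on a host that is already low on memory.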

What should I read next if I am evaluating managed vs self-hosted?

Start with the OpenClaw comparison page to understand tradeoffs, review managed hosting benefits for operations overhead reduction, and see memory troubleshooting if you are running on resource-constrained hardware.
