OpenClaw Stability Guide

Never Break Your Gateway Again

Battle-tested from 7 breakdowns • 30+ days production use • ThinkPad X240

Part 1: Understanding the Break Points

"Know your enemy and know yourself, and you can fight a hundred battles with no danger of defeat." — Sun Tzu

Chapter 1: The 5 Most Common Failures

Failure #1: Tool Profile Reset (40% of breakdowns)

What happens: OpenClaw updates overwrite openclaw.json, resetting your tool profile from "full" to "messaging-only".

Symptoms:

Root cause: Update process writes default config, which uses conservative "messaging-only" profile for safety.

How to detect:

cat ~/.openclaw/config/openclaw.json | grep '"profile"'
# Should show: "profile": "full"
# If shows: "profile": "messaging-only" → BROKEN
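
The grep check matches any line that contains "profile", so unusual formatting or a second key with the same name can mislead it. A minimal sketch of a stricter check that parses the file instead, assuming the setting lives at tools.profile (the path used by the config.patch command in Chapter 10). The function name get_profile is illustrative and not part of OpenClaw:

```shell
#!/bin/bash
# Print the tool profile stored in an OpenClaw config file.
# Assumption: the setting lives at tools.profile; adjust the key path if yours differs.
get_profile() {
    local config_file="${1:-$HOME/.openclaw/config/openclaw.json}"
    python3 - "$config_file" <<'PY'
import json, sys

try:
    with open(sys.argv[1]) as fh:
        config = json.load(fh)
except (OSError, ValueError) as exc:
    sys.exit(f"cannot read config: {exc}")

tools = config.get("tools") if isinstance(config, dict) else None
profile = tools.get("profile") if isinstance(tools, dict) else None
if profile is None:
    sys.exit("no tools.profile key found")
print(profile)
PY
}
```

Called without arguments it reads the default config path. A non-zero exit status means the file is unreadable, is not valid JSON, or has no profile key.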

Fix time: 5-15 minutes | Frequency: Every 2-4 updates

⚠ïļ This is the #1 failure. Happened twice in March 2026 alone. Always check tool profile after updates.

Failure #2: JSON Syntax Errors (20% of breakdowns)

What happens: Manual config edits introduce syntax errors (missing commas, braces, quotes).

Symptoms:

How to detect:

python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null
# If error → syntax is broken
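
Since these errors come from manual edits, one way to keep them away from the gateway is to edit a copy and install it only once it parses. A sketch of that idea, using the same json.tool check. The helper name install_config_if_valid and the .bak suffix are assumptions made for illustration:

```shell
#!/bin/bash
# Install an edited config only if it parses as JSON.
# Usage: install_config_if_valid EDITED_COPY TARGET
install_config_if_valid() {
    local candidate="$1" target="$2"
    if ! python3 -m json.tool "$candidate" > /dev/null 2>&1; then
        echo "Rejected: $candidate is not valid JSON" >&2
        return 1
    fi
    # Keep the previous version next to the target before replacing it.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$candidate" "$target"
}
```

A rejected copy leaves the live config untouched, so a missing comma costs a second edit rather than a gateway outage.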

Fix time: 10-30 minutes | Frequency: 1-2 times per month

Failure #3: Token/Authentication Mismatch (15% of breakdowns)

Symptoms:

Root cause: Multiple sources of truth for tokens (openclaw.json, systemd env vars, client configs).

Fix time: 30-60 minutes | Frequency: Rare but catastrophic

Failure #4: Channel/Integration Config Loss (10% of breakdowns)

Symptoms:

Fix time: 20-40 minutes (restore from backup)

Failure #5: Resource Exhaustion (15% of breakdowns)

Symptoms:

How to detect:

# Check disk
df -h ~/.openclaw

# Check RAM
free -h

# Check processes
ps aux | grep openclaw | wc -l

Chapter 2: Why Updates Break Things

The Update Process:

  1. openclaw gateway update
  2. Downloads new version from GitHub/npm
  3. Stops current gateway process
  4. Replaces binary/node_modules
  5. ⚠ïļ MAY overwrite config files (depends on version)
  6. Restarts gateway

🎯 Update Risk Matrix

Update Type            | Risk Level | Config Safe?     | Action Required
Patch (1.2.3 → 1.2.4)  | Low        | ✅ Usually       | Verify tool profile
Minor (1.2 → 1.3)      | Medium     | ⚠️ Check after   | Full verification
Major (1.x → 2.0)      | High       | ❌ Likely broken | Backup + migration
Nightly/Dev            | Very High  | ❌ Not safe      | Test environment only
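
The matrix can be applied mechanically by comparing the version before and after an update. A rough sketch, assuming MAJOR.MINOR.PATCH version strings and assuming that nightly/dev builds carry a pre-release suffix such as 1.3.0-nightly (the guide does not say how OpenClaw labels them). The function name update_risk is hypothetical:

```shell
#!/bin/bash
# Classify an update by comparing two MAJOR.MINOR[.PATCH] version strings.
# Prints: low (patch), medium (minor), high (major), very-high (nightly/dev).
update_risk() {
    local from="${1#v}" to="${2#v}"
    # Assumption: nightly/dev builds carry a pre-release suffix like 1.3.0-nightly.
    case "$to" in
        *-*) echo "very-high"; return 0 ;;
    esac
    case "$from" in *.*) ;; *) echo "unknown"; return 1 ;; esac
    case "$to"   in *.*) ;; *) echo "unknown"; return 1 ;; esac

    local from_major="${from%%.*}" from_rest="${from#*.}"
    local to_major="${to%%.*}" to_rest="${to#*.}"
    local from_minor="${from_rest%%.*}"
    local to_minor="${to_rest%%.*}"

    # Refuse to guess when any component is missing or non-numeric.
    local part
    for part in "$from_major" "$from_minor" "$to_major" "$to_minor"; do
        case "$part" in
            ''|*[!0-9]*) echo "unknown"; return 1 ;;
        esac
    done

    if [ "$from_major" -ne "$to_major" ]; then
        echo "high"
    elif [ "$from_minor" -ne "$to_minor" ]; then
        echo "medium"
    else
        echo "low"
    fi
}
```

For example, update_risk 1.2.3 1.2.4 prints low and update_risk 1.9.0 2.0.0 prints high. A result of medium or above is the cue to run the full backup before updating.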

Chapter 3: Anatomy of a Breakdown

Timeline of a Typical Failure:

06:00 — Gateway running fine
06:15 — Auto-update runs (cron job)
06:16 — Config overwritten (tool profile reset)
06:17 — Gateway restarts with broken config
06:17 — Browser tool unavailable (but no one notices yet)
08:00 — You wake up, ask agent to screenshot something
08:01 — ❌ FAILS: "Tool 'browser' not available"
08:02 — You investigate, find config broken
08:15 — Manual fix + restart
08:16 — System working again

Downtime: 2 hours (from 06:16 to 08:16)
Actual fix time: 15 minutes
Detection delay: 1 hour 45 minutes
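
The three figures follow directly from the timeline. A small sketch of the arithmetic, in case you want to compute the same numbers for your own incidents (the helper name minutes_between is made up for this example):

```shell
#!/bin/bash
# Minutes elapsed between two HH:MM timestamps on the same day.
minutes_between() {
    local start_h="${1%%:*}" start_m="${1#*:}"
    local end_h="${2%%:*}" end_m="${2#*:}"
    # Strip one leading zero so values like 08 are not parsed as invalid octal.
    start_h="${start_h#0}"; start_m="${start_m#0}"
    end_h="${end_h#0}";     end_m="${end_m#0}"
    echo $(( (end_h * 60 + end_m) - (start_h * 60 + start_m) ))
}
```

minutes_between 06:16 08:16 prints 120 (downtime), minutes_between 06:16 08:01 prints 105 (detection delay), and minutes_between 08:01 08:16 prints 15 (fix time).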

✅ Key Insight: Prevention + monitoring reduces detection delay from hours to minutes.

Part 2: Prevention (90% of Uptime)

"An ounce of prevention is worth a pound of cure." — Benjamin Franklin

Chapter 4: Pre-Flight Checklist

Daily Pre-Flight (2 minutes)

# Quick health check
openclaw gateway status

# Check tool profile
cat ~/.openclaw/config/openclaw.json | grep '"profile"'

# Verify disk space
df -h ~/.openclaw | tail -1 | awk '{print $4 " free"}'

# Check for zombie processes
ps aux | grep -c "[o]penclaw"  # Should be 1-3, not 10+

Green flags: ✅ Gateway running, ✅ Profile: full, ✅ Disk: 5GB+ free, ✅ Processes: 1-3

Red flags: ❌ Gateway stopped, ❌ Profile: messaging-only, ❌ Disk: <1GB free, ❌ Processes: 10+

Chapter 5: Config Backup Strategy

The 3-2-1 Backup Rule

Automated Backup Script

#!/bin/bash
# scripts/backup-openclaw-config.sh

BACKUP_DIR=~/backups/openclaw
CONFIG_FILE=~/.openclaw/config/openclaw.json
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

mkdir -p "$BACKUP_DIR"
cp "$CONFIG_FILE" "$BACKUP_DIR/openclaw-$TIMESTAMP.json"

# Validate backup
python3 -m json.tool "$BACKUP_DIR/openclaw-$TIMESTAMP.json" > /dev/null && echo "✅ Backup valid"

# Keep only last 10 backups (stop if the directory is missing, so rm never runs elsewhere)
cd "$BACKUP_DIR" || exit 1
ls -t openclaw-*.json | tail -n +11 | xargs -r rm --

Cron Job (Weekly Auto-Backup)

# Edit crontab: crontab -e
# Add this line (every Sunday at 3 AM):
0 3 * * 0 /home/YOUR_USERNAME/.openclaw/workspace/scripts/backup-openclaw-config.sh

Chapter 6: Safe Update Procedure

ðŸ›Ąïļ The Safe Update Protocol

  1. Pre-Update Backup (3 min) — Run backup script, note current profile
  2. Run Update (2 min) — openclaw gateway update
  3. Immediate Verification (5 min) — Check gateway status, verify tool profile
  4. Rollback if Needed (5 min) — Restore from backup

Step-by-Step Commands

# 1. Pre-update backup
./scripts/backup-openclaw-config.sh
echo "Profile before: $(cat ~/.openclaw/config/openclaw.json | grep '"profile"')"

# 2. Run update
openclaw gateway update

# 3. Verify (CRITICAL!)
sleep 10
openclaw gateway status
PROFILE=$(cat ~/.openclaw/config/openclaw.json | grep '"profile"' | cut -d'"' -f4)
if [ "$PROFILE" != "full" ]; then
    echo "❌ Profile reset to '$PROFILE'! Restoring..."
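    # Restore the newest backup only; a bare glob matches several files and cp then fails
    cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json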
    openclaw gateway restart
fi
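
The restore step needs exactly one source file, while the backup directory usually holds several. A sketch of a helper that picks the newest backup by modification time. The name latest_backup is hypothetical, and the file pattern is the one written by the backup script in Chapter 5:

```shell
#!/bin/bash
# Print the most recently modified config backup in a directory.
# Returns non-zero if the directory holds no matching backup.
latest_backup() {
    local backup_dir="${1:-$HOME/backups/openclaw}"
    local newest
    # ls -t sorts newest first; backup names contain no spaces or newlines.
    newest=$(ls -t "$backup_dir"/openclaw-*.json 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$newest"
}
```

A restore then becomes cp "$(latest_backup)" ~/.openclaw/config/openclaw.json followed by a gateway restart.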

Chapter 7: Health Check Automation

Automated Health Check Script

#!/bin/bash
# scripts/openclaw-health-check.sh

LOG_FILE=~/backups/openclaw/health-check.log
ALERT_FILE=~/backups/openclaw/health-alerts.log
mkdir -p "$(dirname "$LOG_FILE")"

# Append a timestamped line to the health log
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

# Record a problem in both the alert log and the health log
alert() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') ❌ $*" | tee -a "$ALERT_FILE" >> "$LOG_FILE"
}

# Check 1: Gateway Status
if openclaw gateway status | grep -q "running"; then
    log "✅ Gateway: running"
else
    alert "Gateway is NOT running!"
fi

# Check 2: Tool Profile
PROFILE=$(grep '"profile"' ~/.openclaw/config/openclaw.json | cut -d'"' -f4)
if [ "$PROFILE" = "full" ]; then
    log "✅ Tool profile: full"
else
    alert "Tool profile is '$PROFILE' (should be 'full')"
fi

# Check 3: Disk Space (whole GB free, computed from KB so df's M/G/T suffix does not matter)
DISK_FREE=$(df -Pk ~/.openclaw | awk 'NR==2 {print int($4 / 1048576)}')
if [ "$DISK_FREE" -ge 2 ]; then
    log "✅ Disk space: ${DISK_FREE}GB free"
else
    alert "Low disk space: ${DISK_FREE}GB free"
fi

Cron Schedule (Every 6 Hours)

# crontab -e
# Add at 00:00, 06:00, 12:00, 18:00 daily:
0 */6 * * * /home/YOUR_USERNAME/.openclaw/workspace/scripts/openclaw-health-check.sh

Part 3: Recovery (When Prevention Fails)

"The best laid schemes o' mice an' men gang aft agley." — Robert Burns

Chapter 8: 5-Minute Diagnostic Flow

⏱ïļ Emergency Triage Timeline

Minute 0:30 — Check gateway status
Minute 1:00 — Check tool profile
Minute 2:00 — Check recent changes (updates, edits)
Minute 3:00 — Check resources (disk, RAM, processes)
Minute 4:00 — Identify failure pattern
Minute 5:00 — Execute appropriate playbook

Diagnostic Commands

# Gateway status
openclaw gateway status

# Tool profile
cat ~/.openclaw/config/openclaw.json | grep '"profile"'

# Recent changes
ls -lt ~/.openclaw/config/openclaw.json | head -1

# Resources
df -h ~/.openclaw
free -h
ps aux | grep -c "[o]penclaw"
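
Once the four readings are in hand, the "identify failure pattern" step is a short chain of comparisons. A sketch of one possible ordering, which is an assumption: it checks the config first because a broken config makes the other readings unreliable. The function name and the output labels are invented for illustration, and the 1GB threshold is the red-flag value from Chapter 4:

```shell
#!/bin/bash
# Map triage observations to the failure pattern they most likely indicate.
# Arguments: GATEWAY_STATE PROFILE JSON_OK(yes|no) DISK_FREE_GB
identify_failure() {
    local gateway_state="$1" profile="$2" json_ok="$3" disk_free_gb="$4"

    if [ "$json_ok" != "yes" ]; then
        echo "json-syntax-error"        # Chapter 9, Scenario A
    elif [ "$profile" != "full" ]; then
        echo "tool-profile-reset"       # Chapter 10
    elif [ "$disk_free_gb" -lt 1 ]; then
        echo "resource-exhaustion"      # Failure #5
    elif [ "$gateway_state" != "running" ]; then
        echo "gateway-down"             # Chapter 11
    else
        echo "no-known-pattern"         # fall back to reading the logs
    fi
}
```

For instance, identify_failure running messaging-only yes 20 prints tool-profile-reset, which points at Chapter 10.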

Chapter 9: Config Recovery Playbook

Scenario A: JSON Syntax Error

# Validate
python3 -m json.tool ~/.openclaw/config/openclaw.json

# Restore from backup
ls -lt ~/backups/openclaw/openclaw-*.json | head -3
cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json

# Restart
openclaw gateway restart

Scenario B: Token Mismatch

# Check token in config
cat ~/.openclaw/config/openclaw.json | grep -A 3 '"gateway"'

# Check token in systemd
sudo systemctl cat openclaw | grep OPENCLAW_GATEWAY_TOKEN

# Sync tokens
sudo systemctl edit openclaw
# Add: Environment="OPENCLAW_GATEWAY_TOKEN=YOUR_TOKEN"
sudo systemctl daemon-reload
openclaw gateway restart
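
When comparing tokens across sources it helps not to print the secrets into your terminal history or logs. A sketch that compares short SHA-256 fingerprints instead. Both function names are hypothetical, and how you extract each token is left to you, since this guide does not show the exact key name inside openclaw.json:

```shell
#!/bin/bash
# Print a short SHA-256 fingerprint of a token so values can be compared
# without echoing the secret itself.
token_fingerprint() {
    printf '%s' "$1" | sha256sum | cut -c1-12
}

# Succeeds only when both tokens are non-empty and identical.
tokens_match() {
    [ -n "$1" ] && [ -n "$2" ] &&
        [ "$(token_fingerprint "$1")" = "$(token_fingerprint "$2")" ]
}
```

Print token_fingerprint for the config value and for the systemd value. If the two fingerprints differ, the sources are out of sync and the sync steps above apply.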

Chapter 10: Tool Profile Restoration

🔴 MOST COMMON FIX: you'll use this every 2-4 updates.

Fix Method 1: Manual Edit (Fast)

# 1. Open config
nano ~/.openclaw/config/openclaw.json

# 2. Find and change (edit inside nano; this line is not a shell command):
#    "profile": "messaging-only"  →  "profile": "full"

# 3. Validate
python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null

# 4. Restart
openclaw gateway restart
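
The manual edit can also be scripted, which avoids typing errors inside the editor. A minimal sketch, assuming the profile lives at tools.profile (the path used by the config.patch command in Fix Method 3). The function name set_profile is illustrative and not an OpenClaw command:

```shell
#!/bin/bash
# Set tools.profile in a JSON config without opening an editor.
# Assumption: the profile lives at tools.profile, as in the config.patch example.
set_profile() {
    local config_file="$1" profile="${2:-full}"
    python3 - "$config_file" "$profile" <<'PY'
import json, os, sys, tempfile

path, profile = sys.argv[1], sys.argv[2]
with open(path) as fh:
    config = json.load(fh)          # fails here if the file is not valid JSON

config.setdefault("tools", {})["profile"] = profile

# Write to a temp file in the same directory, then rename over the original,
# so a crash mid-write cannot leave a half-written config behind.
fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
with os.fdopen(fd, "w") as fh:
    json.dump(config, fh, indent=2)
    fh.write("\n")
os.replace(tmp_path, path)
PY
}
```

Run set_profile ~/.openclaw/config/openclaw.json full and then restart the gateway. Note that the file is re-indented on write, and the new file gets the owner-only permissions of a fresh temp file.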

Fix Method 2: Restore from Backup (Safer)

cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json
openclaw gateway restart

Fix Method 3: gateway config.patch (Safest)

gateway action=config.patch raw='{"tools": {"profile": "full"}}'
openclaw gateway restart

Chapter 11: Gateway Restart Protocols

Normal Restart

openclaw gateway restart
sleep 10
openclaw gateway status

Force Restart

openclaw gateway stop
sleep 5
openclaw gateway start
sleep 10
openclaw gateway status

Emergency Restart

pkill -f "openclaw.*gateway"
sleep 5
openclaw gateway start
sleep 15
openclaw gateway status
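
The fixed sleeps in these protocols are guesses: sometimes too long, sometimes too short on a slow machine. A sketch of a polling helper that retries a check until it passes. The names wait_until and gateway_is_running are hypothetical, and the readiness probe assumes the status output contains the word "running", as the health check in Chapter 7 does:

```shell
#!/bin/bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical readiness probe built from the status check used in Chapter 7.
gateway_is_running() {
    openclaw gateway status | grep -q "running"
}
```

For example, openclaw gateway restart && wait_until 15 2 gateway_is_running waits up to roughly 30 seconds and returns as soon as the gateway reports running. A non-zero result is the signal to escalate to the next protocol.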

Part 4: Long-Term Stability

"Stability is not an accident. It is a choice."

Chapter 12: Monitoring Dashboard

Daily Status Report Script

#!/bin/bash
# scripts/generate-status-report.sh

REPORT_FILE=~/backups/openclaw/daily-status-$(date +%Y%m%d).md

cat > "$REPORT_FILE" << EOF
# OpenClaw Daily Status Report
**Date:** $(date +%Y-%m-%d)

## Gateway Status
$(openclaw gateway status 2>&1)

## Tool Profile
$(cat ~/.openclaw/config/openclaw.json | grep '"profile"')

## System Resources
Disk: $(df -h ~/.openclaw | tail -1 | awk '{print $4 " free"}')
RAM: $(free -h | grep Mem | awk '{print $3 " used / " $2 " total"}')
Processes: $(ps aux | grep -c '[o]penclaw')
EOF

Cron (Daily at 8 AM)

0 8 * * * /home/YOUR_USERNAME/.openclaw/workspace/scripts/generate-status-report.sh

Chapter 13: Weekly Maintenance Routine

📅 The Sunday Checklist (15 minutes)

  1. Review health check logs (3 min) — Look for patterns
  2. Verify backups (2 min) — Should have 4+ backups
  3. Clean old logs (3 min) — Delete logs >30 days
  4. Check for updates (2 min) — Decide: update now or wait
  5. Test recovery procedure (5 min) — Practice restore command
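
Steps 2 and 3 of the checklist are easy to script. A sketch under the assumption that logs live in ~/.openclaw/logs/ and backups follow the openclaw-*.json naming from Chapter 5. The function names are made up, and -delete and -maxdepth are find extensions available in GNU and BSD find:

```shell
#!/bin/bash
# Delete regular files older than DAYS days under LOG_DIR (default: 30 days).
clean_old_logs() {
    local log_dir="$1" days="${2:-30}"
    [ -d "$log_dir" ] || return 1
    find "$log_dir" -type f -mtime +"$days" -delete
}

# Count config backups; the checklist expects at least 4.
count_backups() {
    local backup_dir="${1:-$HOME/backups/openclaw}"
    find "$backup_dir" -maxdepth 1 -type f -name 'openclaw-*.json' | wc -l
}
```

On Sunday this reduces to clean_old_logs ~/.openclaw/logs 30 followed by a check that count_backups prints 4 or more.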

Chapter 14: Building Your Runbook

Runbook Template

# My OpenClaw Runbook
**Last updated:** DATE
**System:** ThinkPad X240 / Ubuntu 24.04 / OpenClaw v1.x

## My Common Failures

### 1. Tool Profile Reset (Happens every 2-3 updates)
**Symptoms:** Browser tool stops working
**Fix:** 
```bash
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json
openclaw gateway restart
```

## My Config Locations
- Config: ~/.openclaw/config/openclaw.json
- Backups: ~/backups/openclaw/
- Logs: ~/.openclaw/logs/

## My Custom Scripts
- scripts/backup-openclaw-config.sh
- scripts/openclaw-health-check.sh
- scripts/verify-after-update.sh

Appendices

Appendix A: Quick Reference Commands

Gateway Management

openclaw gateway status
openclaw gateway start
openclaw gateway stop
openclaw gateway restart
openclaw gateway update
openclaw --version

Config Management

cat ~/.openclaw/config/openclaw.json | grep '"profile"'
python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null && echo "✅ Valid"
cp ~/.openclaw/config/openclaw.json ~/backups/openclaw/openclaw-$(date +%Y%m%d-%H%M%S).json
cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json

Health Checks

openclaw gateway status
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
df -h ~/.openclaw
free -h
ps aux | grep -c "[o]penclaw"
pgrep -f "brave.*headless" | wc -l

Recovery Commands

pkill -f "openclaw.*gateway"
rm -rf ~/.openclaw/cache/*
rm -rf /tmp/openclaw-*
openclaw gateway stop && sleep 5 && openclaw gateway start
cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json && openclaw gateway restart

Appendix B: Config File Locations

File                              | Purpose                         | Backup Priority
~/.openclaw/config/openclaw.json  | Main gateway config             | 🔴 Critical
~/.openclaw/.env                  | Environment variables, API keys | 🔴 Critical
~/.openclaw/workspace/            | Skills, scripts, workspace      | 🟡 High
~/.openclaw/logs/                 | Gateway logs                    | 🟢 Low
~/.openclaw/cache/                | Browser cache, temp files       | 🟢 Low

Appendix C: Troubleshooting Decision Trees

Gateway Won't Start

Gateway won't start
    ↓
Can you run 'openclaw gateway status'?
├─ YES, shows "stopped" → Try start
│   └─ openclaw gateway start
└─ NO → Continue ↓

Is openclaw installed?
├─ NO → Reinstall: npm install -g openclaw
└─ YES → Continue ↓

Check logs
├─ tail -50 ~/.openclaw/logs/gateway.log
└─ Match error to playbook

Common errors:
├─ "Port already in use" → Kill process on port 18789
├─ "Config not found" → Restore config from backup
├─ "JSON syntax error" → Validate + restore config
├─ "Token invalid" → Check .env, regenerate token
└─ "Permission denied" → Check file permissions
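
The error-to-action list can be turned into a lookup. A sketch only: it assumes the gateway log contains these exact phrases, which may not match the wording of your OpenClaw version, and the function name suggest_fix is invented for this example:

```shell
#!/bin/bash
# Map gateway log text to the suggested action from the decision tree.
# Assumption: the log uses these exact phrases; adjust the patterns to your logs.
suggest_fix() {
    case "$1" in
        *"Port already in use"*) echo "Kill process on port 18789" ;;
        *"Config not found"*)    echo "Restore config from backup" ;;
        *"JSON syntax error"*)   echo "Validate + restore config" ;;
        *"Token invalid"*)       echo "Check .env, regenerate token" ;;
        *"Permission denied"*)   echo "Check file permissions" ;;
        *)                       echo "No match: read the full log"; return 1 ;;
    esac
}
```

Passing the whole log tail works too, because the patterns match across line breaks: suggest_fix "$(tail -50 ~/.openclaw/logs/gateway.log)". Only the first matching pattern is reported.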

Tools Not Working

Tools not working (browser, exec, etc.)
    ↓
Check tool profile
├─ cat ~/.openclaw/config/openclaw.json | grep '"profile"'
├─ If "messaging-only" → Restore to "full"
└─ If "full" → Continue ↓

Check gateway status
├─ openclaw gateway status
├─ If not running → Restart gateway
└─ If running → Continue ↓

Check logs
└─ tail -50 ~/.openclaw/logs/gateway.log

Appendix D: Backup Scripts

Full System Backup Script

#!/bin/bash
# scripts/full-backup.sh

BACKUP_DIR=~/backups/openclaw/full
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

mkdir -p "$BACKUP_DIR/$TIMESTAMP"

# Backup config
cp ~/.openclaw/config/openclaw.json "$BACKUP_DIR/$TIMESTAMP/"

# Backup .env
cp ~/.openclaw/.env "$BACKUP_DIR/$TIMESTAMP/"
chmod 600 "$BACKUP_DIR/$TIMESTAMP/.env"

# Backup workspace (exclude node_modules)
rsync -av --exclude 'node_modules' ~/.openclaw/workspace/ "$BACKUP_DIR/$TIMESTAMP/workspace/"

# Compress (stop if the directory is missing, so tar and rm never run elsewhere)
cd "$BACKUP_DIR" || exit 1
tar -czf "$TIMESTAMP.tar.gz" "$TIMESTAMP"
rm -rf "$TIMESTAMP"

echo "✅ Full backup: $BACKUP_DIR/$TIMESTAMP.tar.gz"
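
A backup is only useful if it can be read back. A sketch of a post-backup check, assuming the archive layout produced by the script above (a timestamped directory containing openclaw.json). The function name verify_archive is hypothetical:

```shell
#!/bin/bash
# Check that a backup archive is readable and contains the main config.
verify_archive() {
    local archive="$1"
    local listing
    # tar -tzf lists the contents and fails on a truncated or corrupt archive.
    listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
    printf '%s\n' "$listing" | grep -q '/openclaw\.json$'
}
```

Calling verify_archive on the new .tar.gz at the end of the backup script, and alerting when it fails, catches an unreadable archive on the day it is written rather than on the day you need it.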

Quick Restore Script

#!/bin/bash
# scripts/quick-restore.sh

LATEST=$(ls -t ~/backups/openclaw/openclaw-*.json 2>/dev/null | head -1)

if [ -z "$LATEST" ]; then
    echo "❌ No backups found!"
    exit 1
fi

echo "🔄 Restoring from: $LATEST"

# Validate backup (say why we stop instead of exiting silently)
python3 -m json.tool "$LATEST" > /dev/null || { echo "❌ Backup is not valid JSON: $LATEST"; exit 1; }

# Stop gateway
openclaw gateway stop || true
sleep 3

# Restore
cp "$LATEST" ~/.openclaw/config/openclaw.json

# Restart
openclaw gateway start
sleep 10

# Verify
openclaw gateway status
echo "✅ Restore complete!"