Table of Contents
- Part 1: Understanding the Break Points
- Chapter 1: The 5 Most Common Failures
- Chapter 2: Why Updates Break Things
- Chapter 3: Anatomy of a Breakdown
- Part 2: Prevention (90% of Uptime)
- Chapter 4: Pre-Flight Checklist
- Chapter 5: Config Backup Strategy
- Chapter 6: Safe Update Procedure
- Chapter 7: Health Check Automation
- Part 3: Recovery (When Prevention Fails)
- Chapter 8: 5-Minute Diagnostic Flow
- Chapter 9: Config Recovery Playbook
- Chapter 10: Tool Profile Restoration
- Chapter 11: Gateway Restart Protocols
- Part 4: Long-Term Stability
- Chapter 12: Monitoring Dashboard
- Chapter 13: Weekly Maintenance Routine
- Chapter 14: Building Your Runbook
- Appendices
- Appendix A: Quick Reference Commands
- Appendix B: Config File Locations
- Appendix C: Troubleshooting Decision Trees
- Appendix D: Backup Scripts
Part 1: Understanding the Break Points
"Know your enemy and know yourself, and you can fight a hundred battles with no danger of defeat." - Sun Tzu
Chapter 1: The 5 Most Common Failures
Failure #1: Tool Profile Reset (40% of breakdowns)
What happens: OpenClaw updates overwrite openclaw.json, resetting your tool profile from "full" to "messaging-only".
Symptoms:
- Browser tool stops working
- Agents can't execute shell commands
- File operations fail with "permission denied"
- Error: "Tool 'browser' not available in current profile"
Root cause: The update process writes the default config, which uses the conservative "messaging-only" profile for safety.
How to detect:
grep '"profile"' ~/.openclaw/config/openclaw.json
# Should show: "profile": "full"
# If it shows: "profile": "messaging-only" → BROKEN
Fix time: 5-15 minutes | Frequency: Every 2-4 updates
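The grep check above matches any line that contains "profile", so a second key with that name can confuse it. Below is a minimal sketch of a stricter check. It assumes the value lives at tools.profile (inferred from the config.patch example in Chapter 10, not confirmed against the OpenClaw schema), and get_tool_profile is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the tool profile stored in a config file.
# Assumption: the value lives at .tools.profile; adjust the key path if your config differs.
# Prints "unknown" if the file is missing, is not valid JSON, or lacks the key.
get_tool_profile() {
  python3 - "$1" <<'PY' 2>/dev/null || echo "unknown"
import json, sys
with open(sys.argv[1]) as f:
    print(json.load(f)["tools"]["profile"])
PY
}

# Example:
# [ "$(get_tool_profile ~/.openclaw/config/openclaw.json)" = "full" ] || echo "Profile needs attention"
```

Parsing the file also means a syntactically broken config reports "unknown" instead of silently matching a stale line.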
Failure #2: JSON Syntax Errors (20% of breakdowns)
What happens: Manual config edits introduce syntax errors (missing commas, braces, quotes).
Symptoms:
- Gateway won't start
- Error: "Unexpected token } in JSON at position 1234"
- Config file rejected on restart
How to detect:
python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null
# If this prints an error → syntax is broken
Fix time: 10-30 minutes | Frequency: 1-2 times per month
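Since this failure comes from hand edits, one way to avoid it is to edit a copy and install it only if it parses. The sketch below uses the same python3 -m json.tool validation as above; install_config_if_valid is a hypothetical helper name, not part of OpenClaw.

```bash
#!/bin/bash
# Hypothetical helper: replace the live config only if the edited copy is valid JSON.
# Usage: install_config_if_valid EDITED_COPY LIVE_CONFIG
install_config_if_valid() {
  local candidate=$1 target=$2
  if ! python3 -m json.tool "$candidate" > /dev/null 2>&1; then
    echo "Rejected: $candidate is not valid JSON" >&2
    return 1
  fi
  cp "$candidate" "$target"
}

# Example:
# cp ~/.openclaw/config/openclaw.json /tmp/openclaw-edit.json
# nano /tmp/openclaw-edit.json
# install_config_if_valid /tmp/openclaw-edit.json ~/.openclaw/config/openclaw.json
```

A rejected edit leaves the live config untouched, so the gateway keeps running on the last good version.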
Failure #3: Token/Authentication Mismatch (15% of breakdowns)
Symptoms:
- sessions_spawn fails: "device token mismatch"
- Agents can't connect to gateway
- Discord/Telegram bots stop responding
Root cause: Multiple sources of truth for tokens (openclaw.json, systemd env vars, client configs).
Fix time: 30-60 minutes | Frequency: Rare but catastrophic
Failure #4: Channel/Integration Config Loss (10% of breakdowns)
Symptoms:
- Can't send messages to Discord/Telegram
- Error: "Channel not found: 123456789"
- Notifications stop working
Fix time: 20-40 minutes (restore from backup)
Failure #5: Resource Exhaustion (15% of breakdowns)
Symptoms:
- Gateway crashes with OOM (Out of Memory)
- Error: "No space left on device"
- System becomes unresponsive
How to detect:
# Check disk
df -h ~/.openclaw
# Check RAM
free -h
# Check processes
ps aux | grep -c "[o]penclaw" # the [o] keeps grep from counting itself
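The df -h output is meant for reading, which makes it awkward to compare against a threshold in a script. A minimal sketch of a numeric check follows; has_free_space is a hypothetical helper name and the threshold is yours to choose.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the filesystem holding PATH has at least MIN_GB free.
# df -Pk reports 1K blocks in a fixed POSIX layout, so the value is a plain integer
# regardless of the M/G/T suffixes that df -h would print.
# Usage: has_free_space PATH MIN_GB
has_free_space() {
  local path=$1 min_gb=$2 free_kb
  free_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Example:
# has_free_space ~/.openclaw 1 || echo "Less than 1GB free"
```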
Chapter 2: Why Updates Break Things
The Update Process:
openclaw gateway update
- Downloads new version from GitHub/npm
- Stops current gateway process
- Replaces binary/node_modules
- ⚠️ MAY overwrite config files (depends on version)
- Restarts gateway
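Because the overwrite step depends on the version, it helps to detect it rather than guess. The sketch below fingerprints the config before an update and compares afterwards. Both function names are hypothetical, and cksum is used only because it is available on any POSIX system.

```bash
#!/bin/bash
# Hypothetical helpers: fingerprint a config file and detect whether it changed.
config_fingerprint() {
  cksum < "$1" | awk '{print $1 "-" $2}'   # CRC checksum and byte count
}

# Usage: config_changed SAVED_FINGERPRINT FILE   (exit status 0 means "changed")
config_changed() {
  [ "$(config_fingerprint "$2")" != "$1" ]
}

# Example:
# before=$(config_fingerprint ~/.openclaw/config/openclaw.json)
# openclaw gateway update
# config_changed "$before" ~/.openclaw/config/openclaw.json && echo "Update modified the config"
```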
🎯 Update Risk Matrix
| Update Type | Risk Level | Config Safe? | Action Required |
|---|---|---|---|
| Patch (1.2.3 → 1.2.4) | Low | Usually | Verify tool profile |
| Minor (1.2 → 1.3) | Medium | Check after | Full verification |
| Major (1.x → 2.0) | High | Likely broken | Backup + migration |
| Nightly/Dev | Very High | Not safe | Test environment only |
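To apply the matrix in a script, the update type can be derived from the old and new version strings. This is a sketch under the assumption that versions are plain MAJOR.MINOR.PATCH; classify_update is a hypothetical helper, and nightly/dev tags are not handled.

```bash
#!/bin/bash
# Hypothetical helper: classify an update as major, minor, patch, or none.
# Assumes plain MAJOR.MINOR.PATCH version strings.
# Usage: classify_update OLD_VERSION NEW_VERSION
classify_update() {
  local old_major old_minor old_patch new_major new_minor new_patch
  IFS=. read -r old_major old_minor old_patch <<< "$1"
  IFS=. read -r new_major new_minor new_patch <<< "$2"
  if [ "$old_major" != "$new_major" ]; then echo "major"
  elif [ "$old_minor" != "$new_minor" ]; then echo "minor"
  elif [ "$old_patch" != "$new_patch" ]; then echo "patch"
  else echo "none"
  fi
}

classify_update 1.2.3 1.2.4   # prints: patch
classify_update 1.9.9 2.0.0   # prints: major
```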
Chapter 3: Anatomy of a Breakdown
Timeline of a Typical Failure:
06:00 - Gateway running fine
06:15 - Auto-update runs (cron job)
06:16 - Config overwritten (tool profile reset)
06:17 - Gateway restarts with broken config
06:17 - Browser tool unavailable (but no one notices yet)
08:00 - You wake up, ask agent to screenshot something
08:01 - FAILS: "Tool 'browser' not available"
08:02 - You investigate, find config broken
08:15 - Manual fix + restart
08:16 - System working again
Downtime: 2 hours (from 06:16 to 08:16)
Actual fix time: 15 minutes
Detection delay: 1 hour 45 minutes
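The three figures above follow directly from the timeline: detection delay plus fix time equals total downtime. A small sketch that reproduces the arithmetic (minutes_between is a hypothetical helper, limited to same-day HH:MM timestamps):

```bash
#!/bin/bash
# Minutes elapsed between two same-day HH:MM timestamps.
minutes_between() {
  local start=$1 end=$2
  # 10# forces base 10 so that "08" is not rejected as an invalid octal number
  echo $(( (10#${end%:*} * 60 + 10#${end#*:}) - (10#${start%:*} * 60 + 10#${start#*:}) ))
}

minutes_between 06:16 08:01   # detection delay: 105
minutes_between 08:01 08:16   # fix time: 15
minutes_between 06:16 08:16   # total downtime: 120
```

Most of the outage is detection delay, which is the part the health checks in Chapter 7 are meant to shrink.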
Part 2: Prevention (90% of Uptime)
"An ounce of prevention is worth a pound of cure." - Benjamin Franklin
Chapter 4: Pre-Flight Checklist
Daily Pre-Flight (2 minutes)
# Quick health check
openclaw gateway status
# Check tool profile
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
# Verify disk space
df -h ~/.openclaw | tail -1 | awk '{print $4 " free"}'
# Check for zombie processes
ps aux | grep -c "[o]penclaw" # Should be 1-3, not 10+
Green flags: Gateway running, Profile: full, Disk: 5GB+ free, Processes: 1-3
Red flags: Gateway stopped, Profile: messaging-only, Disk: <1GB free, Processes: 10+
Chapter 5: Config Backup Strategy
The 3-2-1 Backup Rule
- 3 copies: Live config + Local backup + Remote backup
- 2 different media: Local disk + Cloud storage
- 1 offsite backup: GitHub private repo
Automated Backup Script
#!/bin/bash
# scripts/backup-openclaw-config.sh
BACKUP_DIR=~/backups/openclaw
CONFIG_FILE=~/.openclaw/config/openclaw.json
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
cp "$CONFIG_FILE" "$BACKUP_DIR/openclaw-$TIMESTAMP.json"
# Validate backup
python3 -m json.tool "$BACKUP_DIR/openclaw-$TIMESTAMP.json" > /dev/null && echo "✅ Backup valid"
# Keep only last 10 backups (abort if cd fails, so rm never runs in the wrong directory)
cd "$BACKUP_DIR" || exit 1
ls -t openclaw-*.json | tail -n +11 | xargs -r rm
Cron Job (Weekly Auto-Backup)
# Edit crontab: crontab -e
# Add this line (every Sunday at 3 AM):
0 3 * * 0 /home/YOUR_USERNAME/.openclaw/workspace/scripts/backup-openclaw-config.sh
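Because the script keeps only the ten newest files, identical copies of an unchanged config can push out the one older backup you actually need. A possible refinement, sketched here with a hypothetical backup_if_changed helper, is to skip the copy when the live config matches the latest backup byte for byte.

```bash
#!/bin/bash
# Hypothetical helper: back up SRC into DIR only if it differs from the newest backup.
# Prints "saved" or "unchanged".
# Usage: backup_if_changed SRC DIR
backup_if_changed() {
  local src=$1 dir=$2 latest
  mkdir -p "$dir" || return 1
  latest=$(ls -t "$dir"/openclaw-*.json 2>/dev/null | head -n 1)
  # cmp -s is silent and exits 0 only when both files are identical
  if [ -n "$latest" ] && cmp -s "$src" "$latest"; then
    echo "unchanged"
    return 0
  fi
  cp "$src" "$dir/openclaw-$(date +%Y%m%d-%H%M%S).json" && echo "saved"
}

# Example:
# backup_if_changed ~/.openclaw/config/openclaw.json ~/backups/openclaw
```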
Chapter 6: Safe Update Procedure
🛡️ The Safe Update Protocol
- Pre-Update Backup (3 min): Run backup script, note current profile
- Run Update (2 min): openclaw gateway update
- Immediate Verification (5 min): Check gateway status, verify tool profile
- Rollback if Needed (5 min): Restore from backup
Step-by-Step Commands
# 1. Pre-update backup
./scripts/backup-openclaw-config.sh
echo "Profile before: $(cat ~/.openclaw/config/openclaw.json | grep '"profile"')"
# 2. Run update
openclaw gateway update
# 3. Verify (CRITICAL!)
sleep 10
openclaw gateway status
PROFILE=$(cat ~/.openclaw/config/openclaw.json | grep '"profile"' | cut -d'"' -f4)
if [ "$PROFILE" != "full" ]; then
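echo "❌ Profile reset to '$PROFILE'! Restoring latest backup..."
# A bare openclaw-*.json glob expands to several files and makes cp fail, so pick the newest one
cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json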
openclaw gateway restart
fi
Chapter 7: Health Check Automation
Automated Health Check Script
#!/bin/bash
# scripts/openclaw-health-check.sh
LOG_FILE=~/backups/openclaw/health-check.log
ALERT_FILE=~/backups/openclaw/health-alerts.log
mkdir -p "$(dirname "$LOG_FILE")"
# Helpers: log() records a normal result; alert() records a failure in both files
log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"; }
alert() { echo "$(date '+%Y-%m-%d %H:%M:%S') ALERT: $*" | tee -a "$ALERT_FILE" >> "$LOG_FILE"; }
# Check 1: Gateway Status
if openclaw gateway status | grep -q "running"; then
  log "✅ Gateway: running"
else
  alert "Gateway is NOT running!"
fi
# Check 2: Tool Profile
PROFILE=$(grep '"profile"' ~/.openclaw/config/openclaw.json | head -1 | cut -d'"' -f4)
if [ "$PROFILE" = "full" ]; then
  log "✅ Tool profile: full"
else
  alert "Tool profile is '$PROFILE' (should be 'full')"
fi
# Check 3: Disk Space (whole GB computed from 1K blocks, so M or T suffixes cannot break the comparison)
DISK_FREE=$(df -Pk ~/.openclaw | awk 'NR==2 {print int($4 / 1048576)}')
if [ "$DISK_FREE" -gt 2 ]; then
  log "✅ Disk space: ${DISK_FREE}GB free"
else
  alert "Low disk space: ${DISK_FREE}GB free"
fi
Cron Schedule (Every 6 Hours)
# crontab -e
# Add at 00:00, 06:00, 12:00, 18:00 daily:
0 */6 * * * /home/YOUR_USERNAME/.openclaw/workspace/scripts/openclaw-health-check.sh
Part 3: Recovery (When Prevention Fails)
"The best laid schemes o' mice an' men gang aft agley." - Robert Burns
Chapter 8: 5-Minute Diagnostic Flow
⏱️ Emergency Triage Timeline
Minute 0:30 - Check gateway status
Minute 1:00 - Check tool profile
Minute 2:00 - Check recent changes (updates, edits)
Minute 3:00 - Check resources (disk, RAM, processes)
Minute 4:00 - Identify failure pattern
Minute 5:00 - Execute appropriate playbook
Diagnostic Commands
# Gateway status
openclaw gateway status
# Tool profile
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
# Recent changes
ls -lt ~/.openclaw/config/openclaw.json | head -1
# Resources
df -h ~/.openclaw
free -h
ps aux | grep -c "[o]penclaw"
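The last two triage steps (identify the pattern, pick a playbook) can be written down as a small decision function. This is only a sketch: pick_playbook is a hypothetical helper, and the order of the checks (JSON validity first, since a broken file can also explain the other two symptoms) is an assumption rather than something the guide prescribes.

```bash
#!/bin/bash
# Hypothetical triage helper: map three observations to the matching playbook in this guide.
# Usage: pick_playbook GATEWAY_RUNNING(yes/no) JSON_VALID(yes/no) PROFILE
pick_playbook() {
  local running=$1 json_valid=$2 profile=$3
  if [ "$json_valid" != "yes" ]; then
    echo "Chapter 9, Scenario A: restore config from backup"
  elif [ "$profile" != "full" ]; then
    echo "Chapter 10: restore tool profile"
  elif [ "$running" != "yes" ]; then
    echo "Chapter 11: restart gateway"
  else
    echo "No config or gateway fault found: check resources and logs"
  fi
}

pick_playbook yes yes messaging-only   # prints: Chapter 10: restore tool profile
```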
Chapter 9: Config Recovery Playbook
Scenario A: JSON Syntax Error
# Validate
python3 -m json.tool ~/.openclaw/config/openclaw.json
# Restore from backup
ls -lt ~/backups/openclaw/openclaw-*.json | head -3
# Use the full timestamped name the backup script writes (openclaw-YYYYMMDD-HHMMSS.json)
cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json
# Restart
openclaw gateway restart
Scenario B: Token Mismatch
# Check token in config
cat ~/.openclaw/config/openclaw.json | grep -A 3 '"gateway"'
# Check token in systemd
sudo systemctl cat openclaw | grep OPENCLAW_GATEWAY_TOKEN
# Sync tokens
sudo systemctl edit openclaw
# Add: Environment="OPENCLAW_GATEWAY_TOKEN=YOUR_TOKEN"
sudo systemctl daemon-reload
openclaw gateway restart
Chapter 10: Tool Profile Restoration
Fix Method 1: Manual Edit (Fast)
# 1. Open config
nano ~/.openclaw/config/openclaw.json
# 2. Find and change:
#    "profile": "messaging-only"  →  "profile": "full"
# 3. Validate
python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null
# 4. Restart
openclaw gateway restart
Fix Method 2: Restore from Backup (Safer)
cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json
openclaw gateway restart
Fix Method 3: gateway config.patch (Safest)
gateway action=config.patch raw='{"tools": {"profile": "full"}}'
openclaw gateway restart
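The manual edit in Method 1 can also be scripted, which removes the risk of a typo breaking the JSON. The sketch below assumes the profile lives at tools.profile, as the Method 3 patch suggests; set_tool_profile is a hypothetical helper and not an OpenClaw command.

```bash
#!/bin/bash
# Hypothetical helper: set .tools.profile in a JSON config without hand-editing.
# Assumption: the profile lives at .tools.profile (inferred from the config.patch example).
# Usage: set_tool_profile CONFIG_FILE PROFILE
set_tool_profile() {
  local file=$1 profile=$2 tmp
  tmp=$(mktemp) || return 1
  # Write the updated JSON to a temp file first; the live file is touched only on success
  if python3 - "$file" "$profile" > "$tmp" 2>/dev/null <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)
cfg.setdefault("tools", {})["profile"] = sys.argv[2]
json.dump(cfg, sys.stdout, indent=2)
PY
  then
    cp "$tmp" "$file" && rm -f "$tmp"   # cp into the existing file keeps its permissions
  else
    rm -f "$tmp"
    echo "Config not changed: $file could not be parsed" >&2
    return 1
  fi
}

# Example:
# set_tool_profile ~/.openclaw/config/openclaw.json full && openclaw gateway restart
```

The file is re-serialized with two-space indentation, so expect whitespace changes the first time it runs.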
Chapter 11: Gateway Restart Protocols
Normal Restart
openclaw gateway restart
sleep 10
openclaw gateway status
Force Restart
openclaw gateway stop
sleep 5
openclaw gateway start
sleep 10
openclaw gateway status
Emergency Restart
pkill -f "openclaw.*gateway"
sleep 5
openclaw gateway start
sleep 15
openclaw gateway status
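The fixed sleep values above are guesses about how long the gateway takes to come up. An alternative, sketched here with a hypothetical wait_until helper, is to poll until a check succeeds or a timeout expires.

```bash
#!/bin/bash
# Hypothetical helper: retry a command once per second until it succeeds
# or TIMEOUT seconds have passed.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
  local timeout=$1 elapsed=0
  shift
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$(( elapsed + 1 ))
  done
}

# Example (assumes the status output contains "running", as the health check script does):
# openclaw gateway restart
# wait_until 30 sh -c 'openclaw gateway status | grep -q running' || echo "Gateway did not come up"
```

A failed wait is a natural trigger for moving from the normal restart to the force or emergency variant.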
Part 4: Long-Term Stability
"Stability is not an accident. It is a choice."
Chapter 12: Monitoring Dashboard
Daily Status Report Script
#!/bin/bash
# scripts/generate-status-report.sh
REPORT_FILE=~/backups/openclaw/daily-status-$(date +%Y%m%d).md
cat > "$REPORT_FILE" << EOF
# OpenClaw Daily Status Report
**Date:** $(date +%Y-%m-%d)
## Gateway Status
$(openclaw gateway status 2>&1)
## Tool Profile
$(cat ~/.openclaw/config/openclaw.json | grep '"profile"')
## System Resources
Disk: $(df -h ~/.openclaw | tail -1 | awk '{print $4 " free"}')
RAM: $(free -h | grep Mem | awk '{print $3 " used / " $2 " total"}')
Processes: $(ps aux | grep -c '[o]penclaw')
EOF
Cron (Daily at 8 AM)
0 8 * * * /home/YOUR_USERNAME/.openclaw/workspace/scripts/generate-status-report.sh
Chapter 13: Weekly Maintenance Routine
The Sunday Checklist (15 minutes)
- Review health check logs (3 min): Look for patterns
- Verify backups (2 min): Should have 4+ backups
- Clean old logs (3 min): Delete logs >30 days
- Check for updates (2 min): Decide: update now or wait
- Test recovery procedure (5 min): Practice restore command
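The log cleanup item is the easiest one to automate. A minimal sketch, assuming logs live under ~/.openclaw/logs/ as listed in Appendix B; clean_old_logs is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: delete regular files older than DAYS days under DIR,
# printing each path as it is removed.
# Usage: clean_old_logs DIR DAYS
clean_old_logs() {
  local dir=$1 days=$2
  [ -d "$dir" ] || return 1
  # -mtime +N matches files whose age in whole days is greater than N;
  # -delete is supported by GNU and BSD find
  find "$dir" -type f -mtime +"$days" -print -delete
}

# Example:
# clean_old_logs ~/.openclaw/logs 30
```

Running it once with -delete removed from the find line gives a dry run that only lists what would go.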
Chapter 14: Building Your Runbook
Runbook Template
# My OpenClaw Runbook
**Last updated:** DATE
**System:** ThinkPad X240 / Ubuntu 24.04 / OpenClaw v1.x
## My Common Failures
### 1. Tool Profile Reset (Happens every 2-3 updates)
**Symptoms:** Browser tool stops working
**Fix:**
```bash
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json
openclaw gateway restart
```
## My Config Locations
- Config: ~/.openclaw/config/openclaw.json
- Backups: ~/backups/openclaw/
- Logs: ~/.openclaw/logs/
## My Custom Scripts
- scripts/backup-openclaw-config.sh
- scripts/openclaw-health-check.sh
- scripts/verify-after-update.sh
Appendices
Appendix A: Quick Reference Commands
Gateway Management
openclaw gateway status
openclaw gateway start
openclaw gateway stop
openclaw gateway restart
openclaw gateway update
openclaw --version
Config Management
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
python3 -m json.tool ~/.openclaw/config/openclaw.json > /dev/null && echo "✅ Valid"
cp ~/.openclaw/config/openclaw.json ~/backups/openclaw/openclaw-$(date +%Y%m%d-%H%M%S).json
cp ~/backups/openclaw/openclaw-YYYYMMDD-HHMMSS.json ~/.openclaw/config/openclaw.json
Health Checks
openclaw gateway status
cat ~/.openclaw/config/openclaw.json | grep '"profile"'
df -h ~/.openclaw
free -h
ps aux | grep -c "[o]penclaw"
pgrep -f "brave.*headless" | wc -l
Recovery Commands
pkill -f "openclaw.*gateway"
rm -rf ~/.openclaw/cache/*
rm -rf /tmp/openclaw-*
openclaw gateway stop && sleep 5 && openclaw gateway start
cp "$(ls -t ~/backups/openclaw/openclaw-*.json | head -1)" ~/.openclaw/config/openclaw.json && openclaw gateway restart
Appendix B: Config File Locations
| File | Purpose | Backup Priority |
|---|---|---|
| ~/.openclaw/config/openclaw.json | Main gateway config | 🔴 Critical |
| ~/.openclaw/.env | Environment variables, API keys | 🔴 Critical |
| ~/.openclaw/workspace/ | Skills, scripts, workspace | 🟡 High |
| ~/.openclaw/logs/ | Gateway logs | 🟢 Low |
| ~/.openclaw/cache/ | Browser cache, temp files | 🟢 Low |
Appendix C: Troubleshooting Decision Trees
Gateway Won't Start
Gateway won't start
│
Can you run 'openclaw gateway status'?
├─ YES, shows "stopped" → Try start
│  └─ openclaw gateway start
└─ NO → Continue ↓
Is openclaw installed?
├─ NO → Reinstall: npm install -g openclaw
└─ YES → Continue ↓
Check logs
├─ tail -50 ~/.openclaw/logs/gateway.log
└─ Match error to playbook
Common errors:
├─ "Port already in use" → Kill process on port 18789
├─ "Config not found" → Restore config from backup
├─ "JSON syntax error" → Validate + restore config
├─ "Token invalid" → Check .env, regenerate token
└─ "Permission denied" → Check file permissions
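The "match error to playbook" step can be expressed as a lookup. In this sketch the patterns are the error strings quoted in this guide, and real gateway log wording may differ; suggest_fix is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: map gateway log text to the action listed in the decision tree.
# Patterns come from the error strings quoted in this guide; adjust them to your real logs.
# Usage: suggest_fix "LOG TEXT"
suggest_fix() {
  case "$1" in
    *"Port already in use"*) echo "Kill process on port 18789" ;;
    *"Config not found"*)    echo "Restore config from backup" ;;
    *"JSON"*)                echo "Validate + restore config" ;;
    *"Token invalid"*)       echo "Check .env, regenerate token" ;;
    *"Permission denied"*)   echo "Check file permissions" ;;
    *)                       echo "No match: read the full log" ;;
  esac
}

# Example: check the last 50 log lines in one call (the first matching pattern wins)
# suggest_fix "$(tail -50 ~/.openclaw/logs/gateway.log)"
```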
Tools Not Working
Tools not working (browser, exec, etc.)
│
Check tool profile
├─ cat ~/.openclaw/config/openclaw.json | grep '"profile"'
├─ If "messaging-only" → Restore to "full"
└─ If "full" → Continue ↓
Check gateway status
├─ openclaw gateway status
├─ If not running → Restart gateway
└─ If running → Continue ↓
Check logs
└─ tail -50 ~/.openclaw/logs/gateway.log
Appendix D: Backup Scripts
Full System Backup Script
#!/bin/bash
# scripts/full-backup.sh
BACKUP_DIR=~/backups/openclaw/full
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR/$TIMESTAMP"
# Backup config
cp ~/.openclaw/config/openclaw.json "$BACKUP_DIR/$TIMESTAMP/"
# Backup .env
cp ~/.openclaw/.env "$BACKUP_DIR/$TIMESTAMP/"
chmod 600 "$BACKUP_DIR/$TIMESTAMP/.env"
# Backup workspace (exclude node_modules)
rsync -av --exclude 'node_modules' ~/.openclaw/workspace/ "$BACKUP_DIR/$TIMESTAMP/workspace/"
# Compress
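cd "$BACKUP_DIR" || exit 1
tar -czf "$TIMESTAMP.tar.gz" "$TIMESTAMP"
# The archive contains .env secrets, so restrict it to the owner
chmod 600 "$TIMESTAMP.tar.gz"
rm -rf "$TIMESTAMP"
echo "✅ Full backup: $BACKUP_DIR/$TIMESTAMP.tar.gz"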
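The config backup script validates its JSON copy, but the full backup has no equivalent check. A small sketch that confirms the archive can be read back before you rely on it (verify_archive is a hypothetical helper name):

```bash
#!/bin/bash
# Hypothetical helper: succeed only if ARCHIVE is a readable gzip-compressed tarball.
# tar -t lists the contents without extracting anything.
# Usage: verify_archive ARCHIVE
verify_archive() {
  tar -tzf "$1" > /dev/null 2>&1
}

# Example:
# verify_archive ~/backups/openclaw/full/YYYYMMDD-HHMMSS.tar.gz || echo "Backup archive is unreadable"
```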
Quick Restore Script
#!/bin/bash
# scripts/quick-restore.sh
# Newest backup first; stderr is silenced so an empty directory yields an empty string
LATEST=$(ls -t ~/backups/openclaw/openclaw-*.json 2>/dev/null | head -1)
if [ -z "$LATEST" ]; then
echo "❌ No backups found!"
exit 1
fi
echo "Restoring from: $LATEST"
# Validate backup
python3 -m json.tool "$LATEST" > /dev/null || exit 1
# Stop gateway
openclaw gateway stop || true
sleep 3
# Restore
cp "$LATEST" ~/.openclaw/config/openclaw.json
# Restart
openclaw gateway start
sleep 10
# Verify
openclaw gateway status
echo "✅ Restore complete!"