# Claw-Stack — Full Documentation

> Complete content of all pages on claw-stack.com/en/ for AI consumption.
> Auto-generated from src/pages/en/. Do not edit manually.

URL: https://claw-stack.com
Generated: 2026-03-09

---

## https://claw-stack.com/en/

URL: https://claw-stack.com/en/

> Home — overview of the Claw-Stack research platform

## Core Capabilities

Three systems working in concert, so your AI never misses a beat.

### Governance & Security

Policy enforcement and real-time threat interception. Every agent action is governed and audited before execution.

```
claw-guard.log
[CLAW-GUARD] 🛡️ Monitoring...
[AGENT-001]  > "aws s3 rb s3://prod --force"
[CLAW-GUARD] 🔴 BLOCKED: destructive op.
[CLAW-GUARD] ⚠️ High Risk (Policy #902).
[CLAW-GUARD] 🔒 Admin approval required.
[CLAW-GUARD] █
```

### AI Memory

Persistent, searchable memory across every session. Your AI remembers what matters.

### Live Intelligence

GitHub, HN, Reddit, YouTube — your AI reads the internet so you don't have to.

---

## https://claw-stack.com/en/showcase

URL: https://claw-stack.com/en/showcase

> **#20 of 362** — Top 6% (bearcatctf.com/scoreboard)

✅ Trinity — team TwistedPair

### Commander

### Librarian

### Operator

Example captured flag: BCCTF{D0n7_g37_m3_Tw157eD}

---

## https://claw-stack.com/en/agents

URL: https://claw-stack.com/en/agents

> Orange

- Orange — Claude Opus 4
- Researcher — Claude Sonnet 4
- Coder — Claude Sonnet 4
- Content — Claude Sonnet 4
- Meeting — Claude Sonnet 4

### CTF Agents

- Commander CIPHER — Claude Opus 4
- Operator GRUNT — Claude Sonnet 4
- Librarian SAGE — Claude Haiku 4

---

## https://claw-stack.com/en/about

URL: https://claw-stack.com/en/about

## Qiushi Wu

## Orange 🍊

---

## Documentation — Claw-Stack

URL: https://claw-stack.com/en/docs

> Complete documentation for Claw-Stack: architecture, memory system, multi-agent consensus, policy enforcement, and deployment guides.
# Documentation

Documentation for Claw-Stack — a research project exploring agent governance on top of OpenClaw.

- **What is Claw-Stack?** — Definition, value proposition, and how it differs from raw OpenClaw.
- **Architecture Overview** — Sidecar Pattern, layered wrapping, and system topology.
- **Getting Started** — Install OpenClaw, configure Claw-Stack, and run your first agent.
- **BearcatCTF 2026 Case Study** — Trinity architecture. #20 out of 362 teams; 40 of 44 challenges solved in 48 hours.

## Modules

- **Multi-Agent Consensus Protocol** — Structured turn-based agent discussions with rolling summaries and automatic consensus detection.
- **Smart Scheduler & Deadline Watch** — Temporal context injection, background task tracking, and a persistent event log for LLM agents.
- **Web Automation Operator** — 26 MCP tools for browser automation, debugging, and performance analysis via the Chrome DevTools Protocol.
- **Live Intelligence Feed** — Multi-source AI/tech content pipeline across 7 platforms with keyword scoring and a unified output schema.
- **Encrypted State Archive** — Automated encrypted, deduplicated backups of the agent workspace using restic and rclone.
- **Governance & Security** — Six-module security plugin: spotlighting, audit logging, least privilege, LLM guard, HMAC comms, memory ACL.
- **Executive Voice Interface** — WebRTC voice interface with local STT (MLX-Whisper), Edge-TTS, and Claude via OpenClaw OAuth.

---

## What is Claw-Stack? — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/what-is-claw-stack

> Claw-Stack is a personal research project that wraps OpenClaw with persistent memory, multi-agent consensus, and policy enforcement — exploring how to transform a bare execution engine into a safer, more capable agent runtime.

# What is Claw-Stack?

Claw-Stack is a personal research project that wraps OpenClaw with persistent memory, multi-agent consensus, and policy enforcement — exploring how to transform a bare execution engine into a safer, more capable agent runtime.
## The Problem with Raw OpenClaw

OpenClaw is a powerful execution engine for AI agents. It handles tool calls, context windows, and model routing well. But a raw OpenClaw installation gives you a stateless executor: the moment a session ends, memory is gone. There are no guardrails on what tools agents can call, no mechanism for agents to coordinate on high-stakes decisions, and no audit trail for what happened and why.

Running OpenClaw in production without additional infrastructure is like running a database without backups, transactions, or access control. It works — until something goes wrong.

## What Claw-Stack Adds

Claw-Stack does not fork or modify OpenClaw. It wraps it using the Sidecar Pattern, running alongside the OpenClaw process and intercepting agent actions at defined policy gates. This means OpenClaw updates are applied cleanly, without merge conflicts.

| Capability | Raw OpenClaw | Claw-Stack |
| --- | --- | --- |
| Memory across sessions | None — context resets | 3-tier persistent memory |
| Multi-agent coordination | Manual orchestration only | Consensus protocol built in |
| Tool access control | All or nothing | Per-agent allowlists |
| High-stakes approval gates | Not available | Human-in-the-loop workflows |
| Audit logging | None | Full action audit trail |
| Upgradeable without conflict | Yes | Yes — Sidecar Pattern |

## Key Value Propositions

**Memory that persists.** Agents remember facts, decisions, and lessons across sessions using a three-tier system: instant MEMORY.md recall, structured per-topic memory files, and a semantic vector index for deep retrieval.

**Consensus before action.** For high-stakes decisions — deploys, financial operations, config changes — multiple agents debate and vote before any action is executed. No single agent can unilaterally do something irreversible.

**Governed by policy, not trust.** Every agent operates under a declared policy: which tools it may use, which domains it may access, and which operations require human approval.
Violations are blocked and logged, not ignored.

**Competition-tested.** The system placed **#20 out of 362 teams (top 6%)** at BearcatCTF 2026, solving 40 of 44 challenges autonomously in 48 hours — a real-world stress test under competition conditions.

## What Claw-Stack Is Not

- Not a fork of OpenClaw. The upstream project is untouched.
- Not a cloud service. Claw-Stack is self-hosted and designed for air-gapped environments.
- Not a replacement for OpenClaw. It is a runtime layer that requires OpenClaw to function.
- Not model-specific. Claw-Stack works with any model that OpenClaw supports.

## Frequently Asked Questions

**What is OpenClaw?**
OpenClaw is an MIT-licensed, self-hosted gateway that connects messaging apps (WhatsApp, Telegram, Discord, iMessage, and more) to AI coding agents. It runs as a Node.js daemon on your own machine and handles model calls, tool invocations, multi-agent sessions, and channel routing. Claw-Stack is a personal research project built on top of OpenClaw, adding governance and memory layers through separate modules.

**Is Claw-Stack open source?**
The individual modules that make up Claw-Stack are open source and self-hosted. You run everything on your own infrastructure — there is no Claw-Stack cloud and no telemetry. See each module's repository for its specific license.

**Which AI models does it support?**
Claw-Stack is model-agnostic at the infrastructure level. The runtime layer is tested primarily with Claude Opus 4, Claude Sonnet 4, and Claude Haiku 4 — but any model supported by OpenClaw can be used.

---

## Getting Started — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/getting-started

> Install OpenClaw, configure workspace files, connect channels, and layer on Claw-Stack modules. A guide to the real setup.

# Getting Started

Claw-Stack is not a single package — it is a collection of modules built on top of OpenClaw. The first step is always getting OpenClaw running. Everything else layers on after that.
**Private beta.** Claw-Stack is an active personal research project. The modules described here are open source but not packaged as a turnkey product. For questions or collaboration, reach out at hello@claw-stack.com.

## Prerequisites

| Requirement | Notes |
| --- | --- |
| Node.js 22+ | Required by OpenClaw |
| npm | For installing OpenClaw globally |
| An API key | Anthropic, OpenAI, Gemini, or another OpenClaw-supported provider |
| macOS or Linux | Windows is not officially supported by OpenClaw |

## Step 1 — Install OpenClaw

Install OpenClaw globally via npm. OpenClaw is the Node.js gateway that Claw-Stack modules build on top of.

```
npm install -g openclaw
openclaw --version
```

For detailed installation instructions, pairing flows, and platform-specific notes, refer to the official OpenClaw documentation.

## Step 2 — Start the Gateway

The OpenClaw gateway is the central process that manages agents, channels, and tool execution. Start it with:

```
openclaw gateway start
```

On first start, OpenClaw creates `~/.openclaw/` and populates it with a default workspace. The gateway config lives at `~/.openclaw/openclaw.json`. You can also run `openclaw onboard` for a guided wizard that walks through pairing your first channel and model.

## Step 3 — Configure the Workspace

OpenClaw automatically creates `~/.openclaw/workspace/` and loads several Markdown files into every agent's context at session start. These are the primary way to configure agent behavior:

| File | Purpose |
| --- | --- |
| MEMORY.md | Loaded at session start — current projects, active tasks, key facts. Keep under ~200 lines. |
| AGENTS.md | Behavioral rules: how to route tasks, subagent patterns, session hygiene, media rules. |
| TOOLS.md | Tool usage guide: server SSH details, CLI syntax for MCP tools, search priority, etc. |
| SOUL.md | Agent personality and communication style. |
| IDENTITY.md | Agent identity: name, role, capabilities summary. |
| memory/ | Per-topic Markdown files loaded on demand: entities/, patterns/, lessons.md, etc. |

Each file is plain Markdown — edit them directly. Per-agent workspaces (e.g. `~/.openclaw/workspace-coding/`) follow the same structure and override the defaults for that agent.

## Step 4 — Connect Channels

OpenClaw connects to messaging platforms so you can talk to your agents from anywhere. Channels are configured in `~/.openclaw/openclaw.json`. Supported channels include iMessage, Telegram, Discord, WhatsApp, and others.

Use `openclaw onboard` for the guided pairing flow, or consult the OpenClaw channel docs for manual configuration. Once a channel is paired, messages from your configured accounts will route to the gateway.

## Step 5 — Add Claw-Stack Modules

Claw-Stack modules are separate projects you clone and configure alongside your running OpenClaw gateway. Each module integrates through one or more of OpenClaw's native extension points:

**MCP Servers.** Modules like `agent-time-awareness` and `chrome-devtools-mcp` expose MCP tools that the agent calls via `mcporter`. Start the MCP server, then configure it in TOOLS.md so the agent knows how to use it.

**OpenClaw Skills.** Skills are Markdown files in `~/.openclaw/workspace/skills/` that agents read before performing specialized tasks (image processing, document generation, etc.).

**Cron Tasks.** The OpenClaw cron system schedules recurring agent work — daily memory sync, backups, heartbeats. Modules like `openclaw-backup` add their backup script to the cron schedule.

**OpenClaw Plugins.** Modules like `openclaw-security` install as OpenClaw plugins that hook into before/after tool-call events via the plugin system.

Each module's README documents its specific setup steps.
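For illustration, a TOOLS.md entry documenting an MCP tool might look like the following sketch. The tool name, invocation line, and usage note are hypothetical placeholders, not any real module's interface:

```
## time-context (MCP server) — hypothetical example entry

- Purpose: inject current date/time context into agent sessions
- Invocation: call the server's `now` tool via mcporter (placeholder name)
- Use when: the task involves schedules, deadlines, or relative dates
```

Keeping entries in this shape (purpose, invocation, when-to-use) gives the agent enough to call the tool without trial and error.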
Start with the ones most relevant to your use case:

| Module | Doc page |
| --- | --- |
| Multi-Agent Consensus (agent-meeting) | Modules → Multi-Agent Consensus Protocol |
| Smart Scheduler (agent-time-awareness) | Modules → Smart Scheduler & Deadline Watch |
| Web Automation (chrome-devtools-mcp) | Modules → Web Automation Operator |
| Intelligence Feed (info-pipeline) | Modules → Live Intelligence Feed |
| Backup (openclaw-backup) | Modules → Encrypted State Archive |
| Security (openclaw-security) | Modules → Governance & Security |
| Voice Interface (voice-call) | Modules → Executive Voice Interface |

## Frequently Asked Questions

**Do I need OpenClaw to use Claw-Stack?**
Yes. Every Claw-Stack module is designed to work alongside a running OpenClaw gateway. The modules integrate through OpenClaw's extension points — MCP servers, skills, plugins, cron tasks, and workspace configuration files. None of them operates as a standalone system.

**Where does OpenClaw store everything?**
Everything lives under `~/.openclaw/`: the gateway config (`openclaw.json`), the main workspace (`workspace/`), per-agent workspaces (`workspace-coding/`, etc.), agent-specific state (`agents/`), logs, media, cron schedules, and credentials.

**Is there a web dashboard?**
OpenClaw includes a built-in control UI that you can open from the gateway. It shows chat, session status, and configuration. This is separate from any Claw-Stack modules — it is part of OpenClaw itself. Check the OpenClaw docs for how to access it.

---

## Architecture Overview — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/architecture-overview

> How Claw-Stack modules integrate with OpenClaw through skills, MCP servers, cron tasks, plugins, and workspace configuration — without modifying the upstream gateway.

# Architecture Overview

Claw-Stack is not a monolithic runtime that wraps OpenClaw.
It is a collection of independent modules — each a separate project — that integrate with a running OpenClaw gateway through its native extension points, without modifying the upstream source.

## OpenClaw: The Foundation

OpenClaw runs as a Node.js daemon (the "gateway") on your host machine. It manages AI agent sessions, routes messages from connected channels (iMessage, Telegram, Discord, WhatsApp, etc.) to agents, executes tool calls, and exposes a web control UI. All persistent configuration lives in `~/.openclaw/openclaw.json`.

Claw-Stack modules extend this gateway through mechanisms OpenClaw already provides. There is no Claw-Stack runtime process — the gateway itself is the runtime.

## Directory Structure

```
~/.openclaw/
├── openclaw.json            # Gateway config: agents, models, tool policy, channels
├── workspace/               # Main agent workspace (loaded at session start)
│   ├── MEMORY.md            # Session-start context (keep under ~200 lines)
│   ├── AGENTS.md            # Behavioral rules, routing, session hygiene
│   ├── TOOLS.md             # Tool usage guide, MCP server syntax
│   ├── SOUL.md              # Agent personality
│   ├── IDENTITY.md          # Agent identity and role
│   ├── memory/              # Per-topic memory files (loaded on demand)
│   │   ├── entities/        # Projects, contacts, servers
│   │   ├── patterns/        # Reusable patterns and workflows
│   │   ├── lessons.md       # Extracted lessons from past sessions
│   │   └── summaries/       # Session summaries
│   ├── memory-system/       # Memory organizer scripts (Claw-Stack module)
│   └── skills/              # Agent skill files
├── workspace-coding/        # Per-agent workspace for the coding agent
├── workspace-researcher/    # Per-agent workspace for the researcher agent
├── agents/                  # Agent-specific state directories
├── cron/                    # Scheduled tasks
├── subagents/               # Subagent session state
├── logs/                    # Gateway and agent logs
└── credentials/             # Encrypted credentials store
```

## Integration Points

Each
Claw-Stack module uses one or more of these extension points:

### 1. MCP Servers

Modules expose their functionality as Model Context Protocol servers. The agent calls these tools through `mcporter`. Examples: `agent-time-awareness` (time context via HTTP MCP), `chrome-devtools-mcp` (browser control via npx).

Config: TOOLS.md documents the MCP tool syntax; the agent calls them natively during sessions.

### 2. OpenClaw Plugins (Hooks)

Plugins hook into the gateway's event lifecycle: `before_tool_call`, `after_tool_call`, `llm_input`, `llm_output`. Example: `openclaw-security` installs a JS bridge plugin that calls Python security modules on each hook event.

Config: `openclaw plugins install --link ~/projects/openclaw-security/openclaw-plugin`

### 3. Workspace Configuration Files

AGENTS.md, TOOLS.md, MEMORY.md, and skills files load into agent context at session start. This is the lowest-overhead integration — no running process required. Example: `info-pipeline` reports are referenced in MEMORY.md for daily briefings.

Config: edit the Markdown files directly.

### 4. OpenClaw Cron Tasks

The gateway has a built-in cron scheduler. Modules register shell commands to run on a schedule. Example: `openclaw-backup` schedules `backup.sh` to run daily; the memory-system organizer runs on its own cron schedule.

Config: `openclaw cron add --schedule "0 4 * * *" --command "/path/to/script.sh"`

### 5. Subagent Orchestration

OpenClaw has native subagent support. The main agent can spawn subagents via `sessions_spawn`, each running with its own workspace and tool permissions. `agent-meeting` uses this to coordinate multiple agents through a structured meeting loop.

Config: `agents.list[].subagents.allowAgents` in openclaw.json.

## Agent Configuration

Agents are defined in `~/.openclaw/openclaw.json` under `agents.list`.
Each entry specifies the model, workspace path, identity, and which subagents it may spawn:

```
// ~/.openclaw/openclaw.json (excerpt)
{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-opus-4-6" },
      "workspace": "/Users/you/.openclaw/workspace",
      "subagents": { "maxConcurrent": 8, "maxSpawnDepth": 2 }
    },
    "list": [
      { "id": "main", "subagents": { "allowAgents": ["*"] } },
      {
        "id": "coding",
        "workspace": "/Users/you/.openclaw/workspace-coding",
        "model": "anthropic/claude-sonnet-4-6"
      },
      {
        "id": "meeting",
        "workspace": "/Users/you/.openclaw/workspace-meeting",
        "tools": { "allow": ["read", "memory_search", "memory_get", "session_status"] }
      }
    ]
  }
}
```

## Why Not Modify OpenClaw Directly?

Forking or patching OpenClaw creates a maintenance burden that grows with every upstream release. Using OpenClaw's native extension points means the gateway updates cleanly via `npm install -g openclaw` without merge conflicts. Each Claw-Stack module evolves independently — adding new capabilities without touching the agent execution engine.

## Frequently Asked Questions

**Is there a separate Claw-Stack runtime process?**
No. The OpenClaw gateway is the only persistent process. Claw-Stack modules either run as MCP servers (when they need to be always on) or as on-demand scripts invoked by cron or the agent. There is no Claw-Stack daemon, no IPC socket, and no policy gate process.

**Can individual modules be used without the others?**
Yes. Each module is an independent project with its own README and setup steps. You can run just `chrome-devtools-mcp` without `openclaw-backup`, or just the memory system without the voice interface.

**What happens if an MCP server module crashes?**
The agent receives a tool error from the MCP call and can handle it like any other tool failure. The OpenClaw gateway itself continues running. Other modules and agents are unaffected. Restart the crashed MCP server and the tool becomes available again in the next session.
---

## Persistent Memory System — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/persistent-memory

> How OpenClaw agents maintain persistent memory across sessions using MEMORY.md, per-topic memory files, and the memory-system organizer with SQLite, FTS5, and QMD vector search.

# Persistent Memory System

OpenClaw agents accumulate knowledge across sessions through a layered memory system: a compact index loaded at session start, per-topic Markdown files read on demand, and an organizer pipeline that extracts facts from raw session files, deduplicates them, and indexes them for semantic search.

## Three Design Inspirations

The memory-system organizer (`~/.openclaw/workspace/memory-system/`) draws on three research paradigms:

**Mem0 — Fact Extraction.** After sessions, the organizer uses an LLM (Claude Haiku via OpenClaw OAuth) to extract discrete facts from raw memory files — preferences, project state, lessons learned, active tasks. Extracted facts are structured, categorized, and stored in SQLite with FTS5.

**Zep — Temporal Decay.** Each fact carries a TTL. Personal facts and system state are permanent; task-related facts expire after 7 days, agent-related facts after 30. A decay engine periodically checks expired entries, uses an LLM to summarize them into lessons, and archives them — it never deletes them.

**MemGPT — Agent Self-Management.** Agents write new memories directly via `python -m src.tools write`. A separate lessons extractor scans session JSONL files for "failure → retry → success" patterns, runs LLM extraction, and appends structured lessons to `memory/lessons.md`.

## Memory Layers

### Layer 1 — MEMORY.md (Session-Start Context)

Loaded into every agent's context window at the start of each session. Intentionally kept short (under ~200 lines) to avoid consuming the context budget on sessions with large task descriptions. Contains only the most current, high-priority facts: recent activity, active projects, key contacts, infrastructure notes.
```
# MEMORY.md — long-term memory index
# (agent loads this at session start)

## Recent Activity
- [2026-03-05] claw-stack website: major refactor, 71 pages
- [2026-03-04] AI Town: WS + interpolation + tile atlas fixes

## Active Projects
| Project | Path | Status | Memory file |
| claw-stack | ~/projects/claw-stack/ | active | memory/website-claw-stack.md |

## Key Facts
- Owner timezone: EST
- Host: macOS, Apple Silicon, local network
```

### Layer 2 — memory/*.md (On-Demand Knowledge)

The `memory/` directory holds per-topic Markdown files that agents read when they need deep context on a specific subject. MEMORY.md stores references (links) to these files; agents fetch the full file when needed.

| Directory / File | Contents |
| --- | --- |
| memory/entities/ | Projects, contacts, servers — one file per entity |
| memory/patterns/ | Reusable patterns and workflows discovered over time |
| memory/lessons.md | Extracted lessons from past failures and retries |
| memory/summaries/ | Session summaries (written at end of sessions) |
| memory/events/ | Time-stamped event records |
| memory/structured/ | Facts exported per category by the organizer pipeline (used by QMD for vector indexing) |

### Layer 3 — QMD Semantic Search (Deep Retrieval)

OpenClaw's built-in `memory_search` tool (QMD) runs a 768-dimensional embedding model (`embeddinggemma`) over the `memory/structured/` files. The index auto-updates every 10 minutes. Queries go through `qwen3-reranker` for semantic re-ranking and query expansion. This handles questions like "what did we decide about X three months ago" without the agent scanning every file manually.
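The organizer pipeline merges extracted facts whose word overlap exceeds 60% before storing them. A minimal sketch of that check — the whitespace tokenization and the exact overlap metric are assumptions; only the 60% merge threshold comes from the pipeline description:

```python
def word_overlap(a: str, b: str) -> float:
    """Overlap between two facts' word sets (an assumed metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))

def dedupe(facts: list[str], threshold: float = 0.6) -> list[str]:
    """Keep the first occurrence; drop later facts overlapping > threshold."""
    kept: list[str] = []
    for fact in facts:
        if all(word_overlap(fact, k) <= threshold for k in kept):
            kept.append(fact)
    return kept

facts = [
    "owner timezone is EST",
    "the owner timezone is EST today",   # near-duplicate, merged away
    "host runs macOS on Apple Silicon",
]
print(dedupe(facts))
```

The real organizer merges rather than drops duplicates (and follows up with an LLM check for lessons); this sketch only illustrates the threshold logic.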
## The Organizer Pipeline

The memory-system organizer runs on a schedule (cron) and processes raw memory files through a pipeline:

```
[Organizer — runs on cron schedule]
→ scan memory/*.md (MD5 hash check, skip unchanged)
→ LLM fact extraction (Haiku via OpenClaw OAuth)
→ deduplication (word overlap > 60% → merge)
→ classification: personal / tasks / agents / system
→ TTL assignment: personal=∞, tasks=7d, agents=30d, system=∞
→ store in SQLite + FTS5 (memories.db, WAL mode)
→ export structured/*.md for QMD vector indexing
→ update INDEX.md (compact index ~2KB)

[Decay Engine — runs separately]
→ scan SQLite for expired TTLs
→ LLM summarize expired entries into lessons
→ archive (removed from index, still searchable)

[Lessons Extractor — runs separately]
→ scan session JSONL for error → retry → success patterns
→ LLM extraction: scenario / wrong approach / correct approach / reason
→ dedup (word overlap > 70%, then LLM DUPLICATE/NOVEL check)
→ append to memory/lessons.md
```

## Memory Search Script

A standalone `memory-search.py` script provides intent-aware search across agent memory:

```
# Search coding agent's memory for tmux content
uv run python memory-search.py --query "tmux claude code" --agent coding

# Search all agents, return top 3
uv run python memory-search.py --query "deployment patterns" --all-agents --top 3

# Smart mode: use Claude API for intent analysis
uv run python memory-search.py --query "how to deploy" --smart --verbose

# Search only lessons
uv run python memory-search.py --query "mistakes" --category lessons --agent main
```

## Frequently Asked Questions

**Can memory files be edited manually?**
Yes. All memory files are plain Markdown on disk. You can edit, delete, or reorganize them directly. The organizer uses MD5 hashing to detect changed files and will re-process them on the next run. Changes to MEMORY.md take effect immediately in the next session — no pipeline run needed.

**Is memory shared between agents?**
Each agent has its own workspace and memory directory. There is also a `~/.openclaw/shared/memory/` directory that all agents can read. Agents write to their own memory; the orchestrator (main agent) coordinates shared memory updates.

**What is the token cost of loading MEMORY.md at session start?**
MEMORY.md is kept under ~200 lines by design. The actual token cost varies with content density, but keeping it an index-only file (with links to the detailed entity files) minimizes context overhead while still giving the agent immediate access to current state.

---

## Policy Enforcement — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/policy-enforcement

> How OpenClaw governs agent tool access through tool allow/deny lists, sandbox modes, and the elevated exec escape hatch — all configured in openclaw.json.

# Policy Enforcement

OpenClaw governs what agents can do through three distinct control layers: sandbox (where tools run), tool policy (which tools are callable), and elevated (an exec-only escape hatch). All configuration lives in `~/.openclaw/openclaw.json`.

## The Three Control Layers

### 1. Sandbox — where tools run

Controls whether tool execution happens in a Docker container or directly on the host. Configured via `agents.defaults.sandbox.mode`:

| Mode | Effect |
| --- | --- |
| "off" | All tools run on the host (default for personal setups) |
| "non-main" | Only non-main sessions (groups, channels) are sandboxed |
| "all" | All sessions run in Docker containers |

### 2. Tool Policy — which tools are callable

Defines which built-in OpenClaw tools an agent can call. Configured with `tools.allow` and `tools.deny` at the global level, or `agents.list[].tools.allow/deny` per agent. **Deny always wins.** If allow is non-empty, everything else is blocked.

### 3. Elevated — exec-only escape hatch

When an agent is sandboxed, `elevated` lets specific exec calls run on the host instead. It does _not_ grant extra tools and does _not_ override tool allow/deny — it only affects where exec runs.
Gated by `tools.elevated.enabled` and `tools.elevated.allowFrom`.

## Tool Groups

Allow/deny lists accept `group:*` shorthands that expand to multiple tools:

| Group | Expands to |
| --- | --- |
| group:runtime | exec, bash, process |
| group:fs | read, write, edit, apply_patch |
| group:sessions | sessions_list, sessions_history, sessions_send, sessions_spawn, session_status |
| group:memory | memory_search, memory_get |
| group:ui | browser, canvas |
| group:automation | cron, gateway |
| group:messaging | message |
| group:openclaw | All built-in OpenClaw tools (excludes provider plugins) |

## Per-Agent Configuration

Each entry in `agents.list` can override global tool policy. Here is a real example — the `meeting` agent has a restricted tool allowlist so it can only read files and query memory:

```
// ~/.openclaw/openclaw.json (excerpt)
"agents": {
  "list": [
    {
      "id": "meeting",
      "workspace": "/Users/you/.openclaw/workspace-meeting",
      "model": "anthropic/claude-sonnet-4-6",
      "tools": {
        "allow": ["read", "memory_search", "memory_get", "session_status"]
      }
    },
    {
      "id": "redteam",
      "workspace": "/Users/you/.openclaw/workspace-redteam",
      "model": "anthropic/claude-opus-4-6",
      "subagents": { "allowAgents": ["operator", "librarian"] }
    }
  ]
}
```

## Agents and Their Real Access Patterns

Behavioral constraints on _how_ an agent acts are defined in the workspace AGENTS.md file — separate from the technical tool policy in openclaw.json.
Together they form the practical access model:

| Agent | Model | Constraints (AGENTS.md) |
| --- | --- | --- |
| main (Orange) | Opus | Can spawn any subagent; financial ops require human confirmation |
| coding | Sonnet | Full shell in project dirs; blocked command patterns (curl to external, rm -rf /) |
| researcher | Sonnet | Web search, web fetch, file read/write in workspace; can spawn coding + content |
| redteam (CIPHER Commander) | Opus | CTF only; can spawn operator + librarian; no external system attacks |
| operator (GRUNT) | Sonnet | Shell exec for CTF tasks; no subagents |
| meeting | Sonnet | tools.allow: [read, memory_search, memory_get, session_status] only |

## Security Audit

OpenClaw ships a built-in security audit command that flags common misconfigurations:

```
openclaw security audit
openclaw security audit --deep
openclaw security audit --fix
openclaw security audit --json
```

It checks: gateway auth exposure, browser control exposure, elevated allowlists, and filesystem permissions. Run it after any config change.

## Debugging Policy

To see exactly what sandbox and tool policy is active for a given session:

```
# Check effective policy for the default agent
openclaw sandbox explain

# Check for a specific agent
openclaw sandbox explain --agent coding

# Machine-readable output
openclaw sandbox explain --json
```

The output shows the effective sandbox mode, whether the session is sandboxed, the effective tool allow/deny (with the source: agent config, global config, or default), and elevated gates.

## Frequently Asked Questions

**What is the difference between AGENTS.md and tool policy in openclaw.json?**
AGENTS.md is a Markdown file loaded into the agent's context window — it describes behavioral rules the agent is expected to follow (routing, communication style, what requires human confirmation). Tool policy in openclaw.json is a hard technical constraint enforced by the gateway — the agent cannot call a denied tool regardless of what AGENTS.md says.
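The documented allow/deny semantics — deny always wins, and a non-empty allow list implicitly denies everything not listed — can be sketched as a small resolver. This is an illustration of the stated rules only, not OpenClaw's actual implementation, and it ignores `group:*` expansion:

```python
def is_tool_allowed(tool: str, allow: list[str], deny: list[str]) -> bool:
    """Apply the documented policy rules (sketch, not OpenClaw's code)."""
    if tool in deny:
        return False          # deny always wins
    if allow:                 # non-empty allow -> allowlist mode
        return tool in allow
    return True               # empty allow and not denied -> callable

# The restricted meeting agent's allowlist from the config excerpt:
allow = ["read", "memory_search", "memory_get", "session_status"]
print(is_tool_allowed("read", allow, []))        # on the allowlist
print(is_tool_allowed("exec", allow, []))        # implicitly denied
print(is_tool_allowed("read", allow, ["read"]))  # deny wins over allow
```

Per-agent lists would be merged with the global `tools.allow`/`tools.deny` before this check; `openclaw sandbox explain` shows the effective result.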
**Does deny always take precedence over allow?**
Yes. Per the OpenClaw docs: "deny always wins." If a tool appears in both allow and deny, it is denied. If allow is non-empty, any tool not in the allow list is implicitly denied.

**What security model does OpenClaw assume?**
OpenClaw is designed for a personal-assistant model: one trusted operator per gateway. It is not a multi-tenant security boundary for adversarial users. If multiple untrusted users can message one tool-enabled agent, they effectively share the same delegated tool authority. For separate trust boundaries, run separate gateway instances.

---

## BearcatCTF 2026 Case Study — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/bearcat-ctf-case-study

> How Claw-Stack's Trinity architecture — Commander (CIPHER), Operator (GRUNT), and Librarian (SAGE) — placed #20 out of 362 teams at BearcatCTF 2026, solving 40 of 44 challenges autonomously in 48 hours.

# BearcatCTF 2026 Case Study

At BearcatCTF 2026, Claw-Stack's Trinity architecture competed autonomously — no human solved any challenge. The system placed **#20 out of 362 teams (top 6%)**, solving **40 of 44 challenges** in the 48-hour competition window.

## The Trinity Architecture

The CTF system used a specialized three-agent configuration called the Trinity. Each agent has a distinct role, model, and permission boundary. They coordinate through a shared blackboard — a persistent key-value store that tracks challenge state, discovered credentials, and failed approaches.

### Commander — `CIPHER` (Claude Opus 4)

The strategic brain. CIPHER does full lifecycle management of each challenge: reading the challenge description, decomposing it into sub-tasks, maintaining the blackboard, spawning Operator instances for execution, and consulting Librarian for knowledge gaps. CIPHER never executes system commands directly.
βœ“Spawn Operator/Librarian instances βœ“Full blackboard read/write βœ—Cannot read flag files directly βœ—Cannot access systems outside CTF scope Operator `GRUNT` Claude Sonnet 4 The tactical executor. GRUNT receives a specific sub-task from CIPHER with full context from the blackboard, executes shell commands and exploit scripts in isolated Docker containers, reports results back as structured JSON, and handles micro-level errors (permission issues, missing dependencies) without bothering CIPHER. GRUNT's context resets between tasks β€” it is stateless by design. βœ“Shell exec in CTF containers βœ“Write exploit scripts βœ—Cannot attack real external systems βœ—Cannot run binaries on host macOS Librarian `SAGE` Claude Haiku 4 The knowledge specialist. SAGE handles all research tasks so CIPHER and GRUNT can stay focused on execution. It searches the local CTFKnowledges database for relevant techniques, queries CTFTools for available tools and usage patterns, and performs web searches for CVEs and writeups when local knowledge is insufficient. It returns a maximum of 3 results to avoid context bloat. βœ“Local knowledge base search βœ“Web search and CVE lookup βœ—Cannot execute system commands βœ—Read-only access except own lessons.md ## The Blackboard The shared blackboard was the critical innovation that prevented duplicate work and preserved state across CIPHER's long-running sessions. 
It tracked:

- **Challenge state**: unsolved / in-progress / solved / abandoned
- **Discovered assets**: IPs, ports, service banners, credentials found
- **Failed attempts**: approaches that didn't work, to prevent repetition
- **Flags captured**: confirmed flag strings submitted to the scoreboard
- **GRUNT task queue**: pending sub-tasks with priority ordering

## Challenge Category Breakdown

| Category | Solved | Notes |
|---|---|---|
| Cryptography | 8/8 | SAGE's knowledge base contained most attack patterns |
| Misc | 7/8 | One challenge required image analysis beyond current capabilities |
| Reverse Engineering | 6/7 | One challenge involved visual pattern recognition the system lacks |
| Forensics | 7/7 | Strong performance across memory dumps, disk images, and packet captures |
| Binary Exploitation (Pwn) | 5/5 | GRUNT handled buffer overflows, ROP chains, and format strings |
| OSINT | 3/5 | Image-based reconnaissance limited by weak visual analysis capabilities |
| Web | 4/4 | GRUNT excelled at SQLi, SSRF, and JWT forgery |
| Total | 40/44 (91%) | #20 / 362 teams — top 6% |

## Lessons Learned

**Blackboard prevents repetition.** Without the failed-attempt log, GRUNT repeatedly tried the same approaches on heap challenges. Once the blackboard was implemented, dead-end approaches were not revisited.

**Stateless GRUNT scales well.** Running GRUNT as a stateless executor (context reset per task) allowed CIPHER to spawn multiple parallel GRUNT instances without context-window conflicts.

**Haiku for knowledge retrieval is cost-effective.** SAGE used Claude Haiku 4, which returned answers fast and cheaply. Most knowledge retrieval does not require frontier-model reasoning — it is search and retrieval, not synthesis.

**Image analysis is the current bottleneck.** The 4 unsolved challenges (1 rev, 2 OSINT, 1 misc) all required visual/image analysis — recognizing patterns in images, reading text from screenshots, or interpreting visual clues.
This is a known weakness of current LLM-based agent systems.

## Frequently Asked Questions

**Did any human solve challenges during the competition?**

No. The system ran fully autonomously for the entire 48-hour window. The human operator monitored the dashboard but did not intervene in any challenge. All 40 flags were captured and submitted by the Trinity system without human assistance.

**What were the 4 unsolved challenges?**

One reverse engineering challenge, two OSINT challenges, and one misc challenge. All four required image or visual analysis — recognizing patterns, reading text from images, or interpreting visual clues — which is a known limitation of current LLM-based agent systems.

---

## Multi-Agent Consensus Protocol — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/multi-agent-consensus

> Design doc for agent-meeting: how the Mediator Pattern, rolling summaries, and deterministic stance detection enable structured multi-agent consensus built on OpenClaw.

# Multi-Agent Consensus Protocol

Coordinates multiple OpenClaw agents through structured turn-based discussions using the Mediator Pattern. Each agent reasons independently, embeds a stance marker in its response, and the coordinator automatically detects consensus and writes meeting minutes.

## What

`agent-meeting` is a multi-agent meeting system. A coordinator sends each agent the current rolling summary and a question; agents reply with their reasoning and a structured stance marker. After each round the coordinator extracts stances, checks for consensus, compresses the conversation history, and either advances to the next round or writes a final Markdown minutes file.

## Why

Multi-agent collaboration without structure has two failure modes: token explosion and ambiguous outcomes. Without a coordinator, agents send messages to each other in N×N patterns — every agent talks to every other agent, and the conversation history grows quadratically.
Without a formal consensus mechanism, there is no reliable way to know when the group has actually agreed on something.

Prompt-based multi-agent frameworks typically ask agents to "discuss" a topic and then summarize the conversation. This works for simple cases but breaks down when rounds are long, agents disagree, or you need a verifiable audit trail. This module formalizes the process: a fixed protocol, deterministic consensus detection, and a structured minutes file that records exactly what was said and decided.

## Architecture

| Component | Responsibility |
|---|---|
| Coordinator | Meeting orchestration loop — drives rounds, manages state, routes all messages |
| Session Manager | Spawns and sends messages to OpenClaw agent sessions |
| Summarizer | Compresses round history to ~500 tokens after each round |
| Consensus Detector | Stance extraction via regex; FULL / MAJORITY / NO consensus logic |
| Minutes Writer | Generates final Markdown minutes with full transcript |
| Timeout Handler | Per-agent 60s timeout; overall meeting 10-minute limit |

The round lifecycle:

```
Coordinator
  → send each agent: rolling summary + question
  → collect responses with [STANCE: AGREE/DISAGREE/NEUTRAL]
  → extract stances (regex, deterministic)
  → check consensus
  → if consensus reached or max rounds → write minutes
  → else → compress history to ~500 tokens → next round
```

Consensus rules:

| Result | Condition |
|---|---|
| FULL_CONSENSUS | All agents AGREE |
| MAJORITY_CONSENSUS | ≥ 2/3 of agents AGREE, zero DISAGREE |
| NO_CONSENSUS | Neither condition met — continue to next round |

## Key Design Decisions

**Mediator Pattern — no N×N chatter.** All messages route through the coordinator. Agents never communicate directly. This keeps the message graph linear (N messages per round, not N×N) and gives the coordinator full control over the conversation state.

**Rolling summaries — O(1) tokens per round.** After each round, the full transcript is compressed to ~500 tokens.
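The stance extraction and consensus rules described for the Consensus Detector can be sketched as follows; the function names are illustrative, not the module's actual API:

```python
import re

STANCE_RE = re.compile(r"\[STANCE:\s*(AGREE|DISAGREE|NEUTRAL)\]", re.IGNORECASE)

def extract_stance(response: str) -> str:
    """Deterministic regex extraction; a missing marker counts as NEUTRAL."""
    m = STANCE_RE.search(response)
    return m.group(1).upper() if m else "NEUTRAL"

def check_consensus(stances: list[str]) -> str:
    """FULL: everyone agrees. MAJORITY: >= 2/3 agree with zero disagree."""
    agree = stances.count("AGREE")
    disagree = stances.count("DISAGREE")
    if agree == len(stances):
        return "FULL_CONSENSUS"
    if disagree == 0 and agree >= (2 * len(stances)) / 3:
        return "MAJORITY_CONSENSUS"
    return "NO_CONSENSUS"

stances = [extract_stance(r) for r in [
    "I think we should ship it. [STANCE: AGREE]",
    "No objections. [stance: agree]",            # marker is case-insensitive
    "I need more data before deciding.",         # no marker -> NEUTRAL
]]
assert check_consensus(stances) == "MAJORITY_CONSENSUS"  # 2/3 agree, none disagree
```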
Each subsequent round only sees the rolling summary plus the new round's responses — not the entire history. Token usage stays roughly constant regardless of round count, making long meetings tractable.

**Stance markers — deterministic, no extra LLM call.** Agents embed `[STANCE: AGREE | DISAGREE | NEUTRAL]` in their responses. Regex parsing extracts the stance without an additional LLM call for interpretation. An unknown or missing stance is treated as NEUTRAL — it won't block a majority consensus.

**Early exit on consensus.** The meeting ends as soon as full or majority consensus is reached, even before max rounds. This avoids pointless additional rounds when agents have already agreed.

## How to Build Your Own

1. **Inject stance instructions into agent system prompts.** Every participating agent needs a system prompt that instructs it to include a stance marker. The format must be consistent and parseable by your consensus detector. Use a fixed format like `[STANCE: AGREE/DISAGREE/NEUTRAL]` — don't let agents choose their own format.

2. **Use regex for stance extraction, not LLM parsing.** Calling an LLM to interpret whether an agent agreed adds cost and latency, and creates a failure mode where the parser LLM misreads the original. A simple regex like `/\[STANCE:\s*(AGREE|DISAGREE|NEUTRAL)\]/i` is deterministic and fast.

3. **Compress after every round, not at the end.** Rolling compression must happen incrementally. Compressing only at the end defeats the purpose — the context window is already exhausted. Use your LLM to summarize the current round's responses plus the previous rolling summary into a new ~500-token summary.

4. **Set both per-agent and overall timeouts.** A single slow or stuck agent should not block the entire meeting indefinitely. Per-agent timeouts (e.g. 60s) trigger a NEUTRAL stance for that agent; an overall meeting limit (e.g. 10 minutes) ensures the coordinator eventually terminates regardless.

5. **Write minutes to disk, not just to memory.** The output of a multi-agent meeting is only as useful as its record. Write a structured Markdown file with: topic, participants, timestamps, per-round transcripts with parsed stances, rolling summaries, and the final consensus result. This becomes the audit trail for the decision.

## Frequently Asked Questions

**Does this work with any OpenClaw agents I already have configured?**

Yes. You pass the agent IDs that exist in your OpenClaw workspace. The session manager spawns them through the OpenClaw gateway rather than creating new agent configs.

**What happens if an agent doesn't include a stance marker?**

The stance is recorded as `UNKNOWN`. The coordinator treats UNKNOWN the same as NEUTRAL for consensus calculation — it won't block a majority consensus if all other agents agree.

**How does rolling summary compression work?**

After each round, the summarizer sends the full transcript to an LLM and requests a compressed summary of approximately 500 tokens. This summary becomes the context for the next round, keeping token usage roughly constant regardless of round count.

**Can I use this without OpenClaw?**

Not directly — the session manager spawns agents via the OpenClaw gateway. However, the core protocol (stance markers, rolling compression, consensus detection) is framework-agnostic and can be reimplemented against any agent runtime.

---

## Smart Scheduler & Deadline Watch — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/smart-scheduler

> Design doc for agent-time-awareness (TCS): how time context injection, task lifecycle tracking, and a persistent event log give LLM agents reliable temporal awareness.

# Smart Scheduler & Deadline Watch

Time Context Service (TCS) — a lightweight Python service that gives LLM agents accurate temporal awareness, background task tracking with timeout detection, and a persistent event log that survives context compaction.
## What

TCS solves three related problems LLM agents have with time:

1. **Temporal Context Injection.** Generates a ready-made time context block for system prompts: current timestamp, day of week, relative timezone info, and contact quiet-hours. An agent that starts every session with this context knows exactly when it is.

2. **Task Lifecycle Tracker.** Register background tasks, poll them at configurable intervals, detect timeouts, and mark them complete. Prevents agents from starting duplicate work or forgetting to follow up on async operations.

3. **Persistent Event Log.** A SQLite-backed timeline of agent events. Because it lives outside the LLM context window, it survives context compaction. Agents can query "what happened in the last 2 hours" even after their context was trimmed.

## Why

LLMs have no real-time clock. They know the world up to a training cutoff, not the current moment. Without explicit time injection, an agent cannot correctly answer "what day is it?" or reason about deadlines and scheduling. This is easily fixed at session start — but it requires a dedicated service to do it reliably.

Background task tracking is harder. An agent that kicks off a long-running process (a build, a test run, a data pipeline) needs to check back on it. Without external state, the agent either polls every turn — creating noise — or forgets entirely after context compaction. TCS keeps task state in SQLite outside the context window, so it persists across session resets.

The event log addresses the same problem for history: context compaction silently removes earlier events. A SQLite event log outside the context window provides a durable timeline that any session can query, regardless of how many times the context has been trimmed.

## Architecture

TCS runs as an MCP server, exposing all three layers as tools that agents call natively via the Model Context Protocol.
| Tool | Description |
|---|---|
| get_temporal_context | Current time context (text or JSON) for system prompt injection |
| start_task | Register a background task for tracking |
| poll_task | Check if a task should be polled now (based on configured interval) |
| finish_task | Mark a task completed or cancelled |
| list_tasks | List tasks, optionally filtered by status |
| check_timeouts | Scan all running tasks for timeout |
| log_event | Append an event to the persistent timeline |
| query_timeline | Query events by time range or type |
| search_events | Full-text search across event summaries |
| get_stats | Activity statistics |

The SQLite database is stored outside the agent's workspace — on a path that survives session restarts and context compaction. WAL mode enables concurrent reads alongside the server's writes. Task state and the event log share the same database but use separate tables.

## Key Design Decisions

**External process, not in-context state.** TCS runs as a separate service. State lives in SQLite, not in the agent's context. This is the fundamental design choice: context compaction cannot erase task state or event history because they're stored outside the context window entirely.

**Smart polling — "should I poll now?", not raw timestamps.** The `poll_task` tool returns a boolean: should the agent check on this task right now? This abstracts the interval logic away from the agent. The agent doesn't need to track last-polled timestamps itself; TCS handles it.

**MCP as the interface.** Exposing TCS via MCP means any MCP-compatible agent can use it with no custom integration. The agent treats TCS tools the same as any other tool in its toolbox. This also makes it easy to add new tools without changing the agent configuration.

**Time context injected at session start, not on every message.** The temporal context block belongs in the system prompt, not repeated in every user message.
Calling `get_temporal_context` once at session start and including the result in the system prompt is sufficient — the timestamp is accurate enough for scheduling purposes.

## How to Build Your Own

1. **Put temporal state outside the context window.** The core insight: any state that needs to survive context compaction must live in an external store. SQLite is a good choice — lightweight, file-based, no server required. WAL mode allows concurrent reads without blocking writes.

2. **Inject a time context block into every session-start system prompt.** Generate a structured block at session start: current ISO timestamp, day of week, local timezone offset, and any relevant contact quiet-hours. Format it as human-readable prose, not raw JSON — the agent's reasoning about time will be more reliable.

3. **Design poll_task as a gate, not a data source.** The agent should call `poll_task` on every turn that might involve a tracked task, but the tool should return "yes, check now" or "not yet" — not the task status itself. This prevents the agent from polling an external system too frequently.

4. **Event log schema: timestamp + type + summary + optional metadata.** Keep the event log schema minimal. The summary field should be a short human-readable string that's searchable by FTS. Store structured metadata separately. Never delete events — archive or flag them instead. Agents querying "what happened" need the full history.

5. **Per-task timeouts, not just global ones.** Different tasks have different expected durations. Store a timeout value per task at registration time. The `check_timeouts` tool scans all running tasks and flags those that have exceeded their individual timeout — not a single global limit.

## Frequently Asked Questions

**Why does an LLM agent need a separate time service?**

LLMs have no real-time clock and no persistent memory between context windows.
TCS solves both: it provides the current timestamp on demand, and its SQLite event log records what happened even after the context is compacted or the session restarted.

**What is "smart polling" for tasks?**

The `poll_task` tool returns whether enough time has elapsed since the last poll, based on the configured interval. This prevents an agent from polling a background job every message turn when it only needs to check every 30 seconds.

**Does the event log survive an OpenClaw session restart?**

Yes. Events are stored in a SQLite file outside any session state. As long as that file is present, historical events are queryable from any session.

---

## Web Automation Operator — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/web-automation

> Design doc for chrome-devtools-mcp: how the Chrome DevTools Protocol, accessibility tree UIDs, and Puppeteer wait logic make AI-driven browser automation reliable and debuggable.

# Web Automation Operator

An MCP server that exposes the Chrome DevTools Protocol to AI agents — enabling reliable browser automation, network inspection, console debugging, screenshot capture, and performance tracing from within any MCP-compatible client.

## What

`chrome-devtools-mcp` launches as an MCP server and connects to a Chrome browser instance. It provides 26 tools across 6 categories that an AI agent can call to interact with a live browser: clicking elements, filling forms, navigating pages, reading console output, capturing performance traces, and more. Automation uses Puppeteer under the hood to wait for page state after each action.

**Privacy note.** This server exposes browser contents to the MCP client. Performance tools may send trace URLs to the Google CrUX API for real-user field data. Usage statistics are collected by default; both can be disabled via server flags.

## Why

Browser automation for AI agents has two common failure modes.
Selenium and Playwright are designed for deterministic test scripts: you know exactly which element to click and in what order. An AI agent needs to observe, reason, and adapt — it doesn't know the DOM structure in advance and must discover it dynamically. Screenshot-only approaches address this by letting the agent "see" the page visually, but screenshots don't give precise element references and can't express intent as tool calls.

The Chrome DevTools Protocol (CDP) exposes the same primitives developers use in DevTools: the accessibility tree, JavaScript execution, network traffic, console output, and performance traces. An agent working with CDP can inspect the page structure precisely, identify elements by their accessibility roles, execute arbitrary JavaScript, and read exactly what the browser logged — all through a single MCP interface.

## Architecture

Three layers collaborate to handle a single agent tool call:

```
Agent
  → MCP tool call (e.g. click, fill, navigate_page)
  → chrome-devtools-mcp server
  → Puppeteer (Chrome management + wait-for-action-result)
  → Chrome DevTools Protocol (browser control)
  → Chrome instance
```

26 tools across 6 categories:

### Input Automation (8 tools)

| Tool | Description |
|---|---|
| click | Click an element by uid (single or double click) |
| drag | Drag one element onto another |
| fill | Type text into an input or select an option |
| fill_form | Fill multiple form elements in one call |
| handle_dialog | Accept or dismiss browser dialogs |
| hover | Hover over an element |
| press_key | Press a key or key combination |
| upload_file | Upload a local file via a file input element |

### Navigation (6 tools)

| Tool | Description |
|---|---|
| navigate_page | Navigate to a URL |
| new_page | Open a new browser tab |
| close_page | Close a tab by page ID |
| list_pages | List all open tabs |
| select_page | Switch focus to a tab by page ID |
| wait_for | Wait for a condition before proceeding |

### Debugging (5 tools)

| Tool | Description |
|---|---|
| take_screenshot | Capture a screenshot of the current page |
| take_snapshot | Capture the page accessibility tree snapshot (returns element UIDs) |
| evaluate_script | Execute JavaScript in the page context |
| get_console_message | Retrieve a specific console message with source-mapped stack trace |
| list_console_messages | List all console messages from the current page |

### Network (2), Performance (3), Emulation (2)

| Tool | Description |
|---|---|
| list_network_requests | List all network requests made by the page |
| get_network_request | Get details of a specific request including headers and body |
| performance_start_trace | Start a DevTools performance trace |
| performance_stop_trace | Stop the trace and return raw data |
| performance_analyze_insight | Extract actionable insights from a trace (optionally includes CrUX field data) |
| emulate | Emulate a device (mobile, tablet, etc.) |
| resize_page | Resize the browser viewport |

## Key Design Decisions

**Accessibility tree UIDs — snapshot first, then act.** Tools that interact with elements take a `uid` parameter. UIDs come from the accessibility tree returned by `take_snapshot`. The agent always calls `take_snapshot` first to get current UIDs, then passes the target UID to an action tool. This avoids brittle CSS selectors and XPath expressions that break when the DOM changes.

**Puppeteer wait-for-action-result — no polling loops.** After every interaction (click, fill, navigate), Puppeteer waits for the page to settle before returning control to the agent. The agent doesn't need to explicitly poll for page readiness. This eliminates a common class of timing bugs where the agent acts on a page before JavaScript has finished updating it.

**CDP over Selenium/Playwright.** CDP gives lower-level access than Playwright.
The agent can read console messages with source-mapped stack traces, intercept network requests, execute arbitrary JavaScript, and record DevTools-level performance traces — none of which are easily accessible through Playwright's abstraction layer.

**Managed Chrome vs. connecting to an existing instance.** By default the server launches its own Chrome with a dedicated profile. For cases where the agent needs to maintain session state (logged-in accounts) or work alongside manual testing, it can connect to an existing Chrome instance running with remote debugging enabled.

## How to Build Your Own

1. **The snapshot-then-act pattern is fundamental.** Every interaction sequence starts with a fresh snapshot. UIDs are page-state-specific; a UID from a previous snapshot may be stale after navigation or a DOM mutation. Always snapshot before acting, especially after any navigation or form submission.

2. **Implement wait-for-action-result for every interaction.** Return from a tool call only after the page has settled — not immediately after the DOM event fires. Puppeteer's `waitForNavigation` and `waitForSelector` are the right primitives. Without this, the agent will act on a page mid-transition and get inconsistent results.

3. **Use accessibility tree for elements, screenshots for visual verification.** The accessibility tree gives precise element references (role, name, state, uid). Screenshots give visual context — useful for the agent to verify that a page looks correct. Use both: snapshot for acting, screenshot for confirming the visual result looks right.

4. **Isolate user data when handling sensitive sites.** CDP exposes the full contents of the browser session to the MCP client. If the agent is browsing authenticated pages or handling credentials, use the isolated mode (temporary user data directory cleaned up after the session) to prevent cross-contamination between tasks.

5. **Combine lab traces with CrUX field data for performance analysis.** A single lab trace shows one user's experience. CrUX field data shows percentile distributions across real users. The `performance_analyze_insight` tool combines both when the URL is publicly accessible, giving the agent a fuller picture of actual user experience vs. lab conditions.

## Frequently Asked Questions

**Does the server need Chrome to be running before it starts?**

No. By default the server launches its own managed Chrome instance via Puppeteer when a tool that requires a browser is first called. Chrome does not start on server startup — only when actually needed.

**How do tools identify page elements?**

Tools that interact with elements (click, fill, hover) take a `uid` parameter. UIDs come from the page snapshot returned by `take_snapshot`. The agent calls `take_snapshot`, identifies the target element by uid, then passes that uid to the action tool.

**What is the CrUX API and when does it send data externally?**

The Chrome User Experience Report (CrUX) API provides real-user performance field data for public URLs. It is used by `performance_analyze_insight` alongside lab trace data. It can be disabled via a server flag to prevent any URL from being sent to Google's API.

---

## Live Intelligence Feed — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/live-intelligence-feed

> Design doc for info-pipeline: how a BaseCollector pattern, unified output schema, keyword scoring, and graceful degradation enable reliable multi-source AI content aggregation.

# Live Intelligence Feed

A Python pipeline that pulls AI and tech content from 7 platforms, applies keyword filtering and relevance scoring, deduplicates results, and produces a unified JSON/Markdown report for downstream agent consumption.

## What

`info-pipeline` aggregates AI and tech content from 7 heterogeneous platforms into a single scored, deduplicated feed.
Each platform has a dedicated collector that normalizes its output to a common schema. A filter/scorer stage applies global keyword matching and relevance scoring (0–100) before the report is written. The output is machine-readable JSON plus a human-readable Markdown report.

## Why

Staying current with AI developments requires monitoring many heterogeneous sources simultaneously: GitHub for new repositories, Hacker News and Reddit for community discussion, YouTube for research explanations, Product Hunt for new tools, Twitter/X for early signals, and Chinese platforms for developments that English-language sources miss. Doing this manually across 7 platforms is time-consuming and inconsistent.

Existing aggregators (RSS readers, Feedly, etc.) produce human-readable feeds but not machine-readable structured data. An AI research agent needs a scored, deduplicated, normalized feed it can query and reason about — not a list of links to read. This pipeline provides that feed in a format designed for agent consumption.
## Architecture

The pipeline has four stages (collection, filtering and scoring, deduplication, report writing):

```
[Collectors — run in parallel]
  → GitHub Trending, Hacker News, Reddit, YouTube, Product Hunt,
    X/Twitter, Chinese platforms (via MCP)
  → each normalizes to unified schema

[Filter / Scorer]
  → keyword matching against global keyword list
  → relevance score 0–100 (keyword density + platform signal)
  → deduplication by URL, then title similarity

[Report Writer]
  → unified JSON (all items)
  → Markdown report (top N items)
```

Data sources:

| Platform | Language | Notes |
|---|---|---|
| GitHub Trending | EN | Topic-filtered repos by stars/forks in the last N days |
| Hacker News | EN | Top Stories, filtered by minimum score |
| Reddit | EN | Multiple AI subreddits — no API key required |
| YouTube | EN | Configured channel playlists via Data API v3 |
| Product Hunt | EN | Daily new products via GraphQL API |
| X / Twitter | EN | Keyword search — requires Basic API tier (low-volume) |
| Chinese platforms | ZH | Zhihu, 36kr, Juejin, Sspai, InfoQ, Bilibili via trends-hub MCP |

Code structure:

| Component | Responsibility |
|---|---|
| collectors/base.py | BaseCollector with common fetch, retry, and rate-limit logic |
| collectors/*.py | One file per platform — each extends BaseCollector |
| collectors/__init__.py | ALL_COLLECTORS registry — registers all enabled collectors |
| filters/scorer.py | Keyword filtering, relevance scoring (0–100), deduplication |
| main.py | Entry point — orchestrates collector runs and report generation |

Every item across all sources shares the same JSON schema:

```json
{
  "title": "Article / project title",
  "url": "https://...",
  "source": "github",
  "score": 85,
  "published_at": "2026-02-18T10:00:00Z",
  "summary": "Short description or excerpt",
  "tags": ["llm", "open-source"]
}
```

## Key Design Decisions

**Unified output schema — normalize at the source.** Every collector's job is to normalize its platform's output to the common schema before it leaves the collector.
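The normalize-at-the-source step can be sketched in Python; the `FeedItem` dataclass mirrors the unified schema above, but the class and function names are illustrative, not the pipeline's real code:

```python
from dataclasses import dataclass, asdict

@dataclass
class FeedItem:
    """The unified schema every collector emits (fields from the doc above)."""
    title: str
    url: str
    source: str
    score: int
    published_at: str
    summary: str
    tags: list[str]

def normalize_hn(raw: dict) -> FeedItem:
    """Illustrative: map a Hacker News API item onto the unified schema."""
    return FeedItem(
        title=raw["title"],
        url=raw.get("url", f"https://news.ycombinator.com/item?id={raw['id']}"),
        source="hackernews",
        score=0,  # the relevance score is assigned later, by the scorer stage
        published_at=raw["time_iso"],
        summary=raw.get("text", ""),
        tags=[],
    )

item = normalize_hn({"id": 1, "title": "Show HN: demo", "time_iso": "2026-02-18T10:00:00Z"})
# asdict(item) now matches the unified JSON schema above
```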
The scorer and report writer deal only with the common schema — they have no platform-specific logic. This makes adding a new source a matter of implementing one new class with a single `fetch()` method.

**Config-driven, not code-driven.** Keywords, platform parameters, and source selection all live in a YAML config file. Tuning the feed doesn't require code changes. This matters when iterating quickly on which topics to follow or adjusting platform-specific thresholds like minimum HN score.

**Graceful degradation on missing API keys.** A collector with no configured API key is skipped with a warning — not an error. The pipeline still produces output from the sources that are configured. This means the pipeline can run partially (e.g. GitHub + HN only) without requiring all seven API credentials to be present.

**MCP for Chinese platforms — avoid direct scraping.** Chinese tech platforms (Zhihu, 36kr, Juejin) have complex anti-bot measures and no official English-language APIs. The `trends-hub` MCP service handles these sources separately; the pipeline calls it as a tool rather than implementing platform-specific scrapers that would need constant maintenance.

## How to Build Your Own

1. **Define the unified schema first.** Before writing any collector, define the output schema all collectors must produce. The minimum useful fields are: title, url, source, score, published_at, summary. Adding a field later requires updating every collector.

2. **Implement BaseCollector with retry and rate-limit logic.** Rate limiting and retry-on-error are needed by every platform collector. Put them in the base class. Each concrete collector only needs to implement how to fetch its platform's data and how to map it to the common schema — not how to handle HTTP errors or rate limits.

3. **Keep scoring simple — keyword density + platform weight.** A score of 0–100 based on keyword match count (normalized by text length) plus a platform-specific engagement signal (GitHub stars, HN score, Reddit upvotes) is sufficient. Don't add LLM-based scoring — it adds latency and cost without proportionate quality gains for most use cases.

4. **Deduplicate by URL first, then by title similarity.** The same story often appears across multiple platforms. URL deduplication catches exact duplicates. Title similarity (simple word overlap is sufficient) catches cases where the same article is linked with slightly different URLs or titles across platforms.

5. **Twitter API rate limits are a real constraint.** The Basic Twitter/X API tier has strict monthly request limits. Keep max_results low (10–15 per run) and run the collector less frequently than others. Build the collector to fail gracefully and not block the entire pipeline run if it hits a rate limit.

## Frequently Asked Questions

**Can I run it with only some sources enabled?**

Yes. Sources without configured API keys are skipped with a warning rather than failing the whole run. You can also run specific collectors directly. Reddit and Hacker News require no API keys and work out of the box.

**How does the relevance score work?**

The scorer checks how many of the configured global keywords appear in the title and summary. Items that match no keywords are filtered out; matches boost the score up to 100. Original platform engagement metrics (stars, HN score, upvotes) also factor in.

**What is the trends-hub MCP for Chinese platforms?**

The Chinese platform collector calls MCP tools from the `trends-hub` service (a separate open-source project) to fetch Zhihu trending, 36kr, Juejin, and others. That service must be running locally for the Chinese platform source to work.
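The URL-then-title deduplication strategy described above can be sketched as follows; the word-overlap heuristic and the 0.8 threshold are illustrative choices, not the pipeline's actual values:

```python
def dedupe(items: list[dict], overlap_threshold: float = 0.8) -> list[dict]:
    """Drop exact URL duplicates, then items whose title word-overlap
    with an already-kept item meets or exceeds the threshold."""
    kept: list[dict] = []
    seen_urls: set[str] = set()
    for item in items:
        if item["url"] in seen_urls:
            continue  # exact URL duplicate
        words = set(item["title"].lower().split())
        is_near_dup = any(
            len(words & set(k["title"].lower().split()))
            / max(1, min(len(words), len(set(k["title"].lower().split()))))
            >= overlap_threshold
            for k in kept
        )
        if not is_near_dup:
            seen_urls.add(item["url"])
            kept.append(item)
    return kept

items = [
    {"title": "New LLM framework released", "url": "https://a.example/1"},
    {"title": "New LLM framework released", "url": "https://a.example/1"},        # same URL
    {"title": "New LLM framework released today", "url": "https://b.example/2"},  # near-dup title
    {"title": "Rust 2.0 roadmap", "url": "https://c.example/3"},
]
assert len(dedupe(items)) == 2  # first and last survive
```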
---

## Encrypted State Archive β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/encrypted-state-archive

> Design doc for openclaw-backup: how restic content-addressed snapshots, rclone transport, and macOS Keychain credential storage protect the agent workspace with zero secrets on disk.

# Encrypted State Archive

Automated encrypted, deduplicated backup of the OpenClaw workspace and local projects to Google Drive β€” using restic for encryption and snapshot management, rclone for transport, with the encryption password stored in macOS Keychain.

← Module Overview

## What

`openclaw-backup` creates incremental, encrypted, point-in-time snapshots of the OpenClaw workspace and local project directories and stores them on Google Drive. Each backup run adds a new snapshot; unchanged blocks are deduplicated so only changed data is uploaded. A retention policy automatically prunes old snapshots. The encryption password never touches disk β€” it's retrieved from macOS Keychain on each run.

## Why

An agent system accumulates state that's difficult to reconstruct: memory files built up over months of sessions, project code in progress, configuration that took time to tune. Disk failure, accidental deletion, or a corrupted workspace without backups means complete loss of that state.

Standard cloud sync tools (iCloud, Dropbox) don't encrypt before upload, don't track snapshot history, and don't deduplicate. A user can't restore to "the state of my workspace three days ago" β€” only to the current synced state. restic solves all three problems: client-side encryption (Google Drive only ever sees ciphertext), content-addressed deduplication (fast incrementals), and immutable point-in-time snapshots with a configurable retention policy.

## Architecture

Two layers collaborate on each backup run:

restic β€” encryption + deduplication + snapshot management

restic creates content-addressed, encrypted snapshots.
Only changed blocks are stored; unchanged blocks reference existing data. The encryption password is read from macOS Keychain on each run β€” never written to disk.

rclone β€” Google Drive transport

rclone provides the backend that restic writes to. It handles Google Drive OAuth and the file transfer layer. The OAuth token lives in a config file that is excluded from git.

~/.openclaw/ ──┐
               β”œβ”€β”€β–Ά restic (encrypt + dedup) ──▢ rclone ──▢ Google Drive
~/projects/ β”€β”€β”€β”˜

What gets backed up:

| Path | Approx Size | Contents |
| --- | --- | --- |
| ~/.openclaw/ | ~2.9 GB | Config, workspace, memory files, logs, media |
| ~/projects/ | ~1.3 GB | Source code, experiments |

Exclusions (not backed up):

- `node_modules/` β€” reinstallable from package.json
- `.venv/` β€” recreatable Python environments
- `browser/` β€” Chromium binary cache
- `.git/objects` β€” already on GitHub

Retention policy:

| Period | Kept |
| --- | --- |
| Daily | Last 7 days |
| Weekly | Last 4 weeks |

Older snapshots are automatically pruned at the end of each backup run.

## Key Design Decisions

restic over tar/zip β€” content-addressed deduplication

restic splits files into variable-size chunks and identifies them by content hash. Only chunks that don't already exist in the repository are uploaded. This makes incremental backups fast β€” a session where only memory files changed uploads only those chunks, not the full 4 GB workspace.

macOS Keychain β€” password never on disk

The restic encryption password is stored in Keychain under a service/account key pair. The backup script retrieves it at runtime using the `security` CLI. No plaintext password ever appears in a config file, environment variable export, or shell history.

Active secret scanning in test.sh

The test script actively scans all git-tracked files for patterns that look like secrets (API keys, tokens, passwords). This runs before any backup to prevent accidentally committing credentials alongside the backup infrastructure code.
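The Keychain retrieval step uses the macOS `security` CLI. A minimal sketch of how a backup script might do it (the service/account names match the defaults this module documents; treat the code itself as illustrative, not the project's actual script):

```python
import subprocess

def keychain_cmd(service: str = "openclaw-backup",
                 account: str = "restic-password") -> list[str]:
    # Argument vector for the macOS `security` CLI;
    # -w prints only the password to stdout, nothing else.
    return ["security", "find-generic-password",
            "-s", service, "-a", account, "-w"]

def restic_password(service: str = "openclaw-backup",
                    account: str = "restic-password") -> str:
    """Fetch the restic encryption password from Keychain at runtime.

    The password lives only in this process's memory β€” it never appears
    in a config file, exported environment variable, or shell history.
    """
    out = subprocess.run(keychain_cmd(service, account),
                         check=True, capture_output=True, text=True)
    return out.stdout.rstrip("\n")
```

The retrieved value would typically be handed to restic via its environment (`RESTIC_PASSWORD`) for the duration of the subprocess call only.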
Immutable snapshots β€” never overwrite, always prune

Each backup run adds a new snapshot to the repository. Old snapshots are removed only by the retention policy pruning step, never by overwriting. At any point you can restore to any snapshot in the retention window β€” not just the most recent one.

## How to Build Your Own

1. Use restic with any rclone-supported backend. The same architecture works with any rclone backend: S3, Backblaze B2, Azure Blob, SFTP. Google Drive is used here because it's free up to 15 GB and requires no credit card. Swap the rclone remote name in the backup script to change backends.

2. Store the encryption password in a system secrets manager. On macOS, Keychain is the right choice. On Linux, use `pass` (GPG-backed) or the `RESTIC_PASSWORD` environment variable set by a secrets manager at runtime. Never store the password in a dotfile.

3. Build a comprehensive exclusion list. node_modules, Python virtualenvs, browser caches, and compiled output can easily add several gigabytes that are fully reinstallable. Exclude them. A good rule: if a directory can be recreated from committed source (package.json, requirements.txt, etc.), exclude it.

4. Test your restore process before you need it. A backup that has never been tested is not a backup. Include a dry-run restore in your testing checklist: restore to a temporary directory and verify that key files are present and intact. The worst time to discover a broken restore process is during an actual recovery.

5. Fire a system event notification on backup failure. Silent backup failures are the most dangerous kind. The backup script should send a notification (via OpenClaw system event, email, or any other channel you'll actually see) when a backup fails. A successful backup should log silently; a failure should be loud.
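Steps 1–3 amount to a small amount of glue around the restic CLI. A sketch of the command construction (paths, excludes, and retention values mirror the tables above; the rclone remote name is hypothetical, and in real use `~` would need `os.path.expanduser`):

```python
BACKUP_PATHS = ["~/.openclaw", "~/projects"]
EXCLUDES = ["node_modules", ".venv", "browser", ".git/objects"]
# rclone-backed restic repository; swap the remote name to change backends
REPO = "rclone:gdrive:openclaw-backup"

def backup_cmd(paths=BACKUP_PATHS, excludes=EXCLUDES, repo=REPO) -> list[str]:
    """`restic backup` invocation with one --exclude flag per pattern."""
    cmd = ["restic", "-r", repo, "backup", *paths]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    return cmd

def prune_cmd(repo=REPO, daily=7, weekly=4) -> list[str]:
    """Retention: keep the last 7 daily and 4 weekly snapshots, prune the rest."""
    return ["restic", "-r", repo, "forget",
            "--keep-daily", str(daily), "--keep-weekly", str(weekly), "--prune"]
```

Running `backup_cmd()` then `prune_cmd()` via `subprocess.run(..., check=True)` at the end of each run matches the "backup, then prune" cycle described above; a failed `check=True` call is the natural place to fire the failure notification from step 5.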
## Security

- No hardcoded secrets β€” all credentials use macOS Keychain or environment variables
- restic encrypts all data before upload; Google Drive stores only ciphertext
- OAuth tokens in rclone config are excluded from git via .gitignore
- test.sh actively scans for secrets in all git-tracked files

## Frequently Asked Questions

Where is the restic encryption password stored?

In macOS Keychain under service name `openclaw-backup`, account `restic-password`. The setup script generates and stores it automatically. You can also override it with the `RESTIC_PASSWORD` environment variable.

Will backups overwrite each other?

No. restic creates immutable snapshots. Each backup run adds a new snapshot; old ones are only removed when the retention policy prunes them. You can restore to any historical snapshot within the retention window.

Does this work on Linux?

The Keychain integration is macOS-specific. On Linux you would need to substitute a different secret store (e.g. `pass`) or use the `RESTIC_PASSWORD` environment variable directly. The restic and rclone commands themselves are cross-platform.

---

## Governance & Security β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/governance-security

> Design doc for openclaw-security: how six priority-tiered Python modules β€” spotlighting, audit logging, least-privilege, LLM guard, HMAC comms, and memory ACL β€” harden OpenClaw agents.

# Governance & Security

A security hardening plugin for OpenClaw agents consisting of six Python modules integrated into the OpenClaw hook system via a JavaScript bridge β€” providing prompt-injection defense, least-privilege enforcement, audit logging, inter-agent message signing, and memory access control.

← Module Overview

## What

`openclaw-security` is a six-module security layer that hooks into OpenClaw at four lifecycle points.
Each module addresses a specific AI agent threat: external data injecting instructions into the agent, agents executing unauthorized commands, inter-agent messages being forged, and memory entries being corrupted. The modules are independent β€” you can deploy them individually or together.

## Why

A raw OpenClaw agent with shell access and no additional controls is a significant attack surface. An agent that fetches web content is exposed to prompt injection: a malicious page can embed instructions that the agent treats as coming from its operator. An agent with unrestricted shell access can be directed to exfiltrate data, delete files, or make external network calls. Inter-agent messages have no authenticity guarantee β€” a compromised agent can forge messages appearing to come from trusted peers.

This module applies defenses from AI agent security research: spotlighting to isolate external data, command whitelisting to enforce least privilege, HMAC signing for inter-agent trust, and audit logging for forensics. The design is pragmatic β€” defenses are layered by priority rather than trying to solve every threat at once.
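Two of these defenses are small enough to sketch inline. The following is an illustrative, standard-library-only rendition of spotlighting boundary tags and HMAC-SHA256 message signing, not the module's actual code:

```python
import hmac
import hashlib
import json

def spotlight(external_text: str) -> str:
    """Wrap untrusted external data in structural boundary tags.

    The tag names are just markers the LLM can see; what matters is that
    every externally fetched document is delimited before entering context.
    """
    return ("[EXTERNAL_DATA_START]\n"
            + external_text
            + "\n[EXTERNAL_DATA_END]")

def sign_message(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature using the per-pair shared key."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(key, body, hashlib.sha256).hexdigest()}

def verify_message(envelope: dict, key: bytes) -> bool:
    """Recompute the HMAC and compare in constant time before processing."""
    body = json.dumps(envelope["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])
```

A receiver that drops any envelope failing `verify_message` gets exactly the property described above: a compromised agent without the pair key cannot forge or tamper with messages to a trusted peer.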
## Architecture

Six modules organized by implementation priority:

| Priority | Module | What it does |
| --- | --- | --- |
| P0 | Spotlighting | Wraps external data (web, email, webhooks) with boundary tags to isolate it from system instructions |
| P0 | Audit Logger | JSONL tool-call log with file locking, redaction, and built-in alert rules |
| P1 | Least Privilege | Per-agent command blacklist and file path whitelist β€” hard-blocks unauthorized exec calls |
| P1 | LLM Guard | Regex/llm-guard scanning of LLM inputs and outputs for injection, secrets, and malicious URLs (warn-only) |
| P1 | Agent Comms | HMAC-SHA256 message signing for inter-agent communication β€” prevents message forgery |
| P2 | Memory ACL | Memory layer write-time injection detection and signed storage |

OpenClaw hook integration β€” Python modules are called from the JS hook system via a bridge layer:

| Hook | Python Module | Effect |
| --- | --- | --- |
| before_tool_call | permission_checker.py | Block unauthorized exec calls (hard block) |
| after_tool_call | audit_logger.py | Write JSONL audit record + evaluate alert rules |
| llm_input | llm_guard_wrapper.py | Scan for injection / secrets (warn only) |
| llm_output | llm_guard_wrapper.py | Scan for secrets / malicious URLs (warn only) |

Fail-open design

All bridge calls fail open: if Python is missing, crashes, or times out, the call is allowed and a warning is logged. This avoids false-positive blockages but means security is best-effort, not guaranteed.

## Key Design Decisions

Priority tiers β€” not all defenses are equal

P0 modules (spotlighting, audit logging) have the best effort-to-value ratio: they're cheap to implement and provide immediate observability and basic injection resistance. P1 modules require more configuration. P2 is nice-to-have. Deploying only P0 already meaningfully improves the security posture.

Fail-open β€” availability over security

If the Python security layer crashes, the agent call proceeds.
This is an explicit tradeoff: a broken security layer that blocks legitimate agent operations is worse than a temporarily absent security layer. The audit log captures what happened, enabling post-hoc forensics even when active blocking fails.

LLM Guard warns, doesn't block at llm_input/llm_output

The current OpenClaw hook API doesn't support hard blocking at the llm_input/llm_output hooks, so LLM Guard detections are logged as warnings. To enforce blocking on detected injection, handle it at the agent application layer β€” LLM Guard provides the signal; the application decides the action.

JavaScript bridge β€” thin layer, Python for logic

The OpenClaw hook system is JavaScript; the security logic is Python. The bridge is intentionally thin β€” just serialization, subprocess invocation, and error handling. All security logic stays in the Python modules, which can be tested independently of OpenClaw.

## How to Build Your Own

1. Start with spotlighting and audit logging (P0). Spotlighting is cheap: wrap any externally fetched content with boundary markers before it enters the agent's context. An LLM that sees `[EXTERNAL_DATA_START] ... [EXTERNAL_DATA_END]` has a structural signal that this content is untrusted. Audit logging is also cheap: write a JSONL record after every tool call with agent ID, tool name, inputs (redacted), and timestamp.

2. Command blacklist with regex, not exact match. Exact-match command blocking is trivial to bypass (add a space, use a path prefix). Use regex patterns that match the dangerous operation regardless of minor variations. Essential patterns: `rm\s+-rf\s+/`, `curl\s+(?!.*localhost)`, `wget\s+`, `mkfs\b`.

3. Build alert rules into the audit logger, not the security modules. Alert rules (fire on sensitive file access, fire on external network calls) belong in the audit logger, not scattered across individual modules. The audit logger sees all tool calls and can apply cross-cutting rules in one place.
Three minimum rules: sensitive path access, external network calls, cross-agent sensitive content.

4. Use HMAC-SHA256 for inter-agent signing, with symmetric keys per pair. For single-machine multi-agent deployments, HMAC with a per-pair symmetric key is sufficient and simple. Each agent pair (sender, receiver) shares a secret key stored in the key manager. The receiver verifies the HMAC before processing any inter-agent message. This prevents a compromised agent from impersonating a trusted peer.

5. Keep each module standalone and independently testable. Each security module should be importable and testable without OpenClaw running. This enables unit testing of security logic without the full agent stack. The OpenClaw integration is only a thin bridge β€” the modules themselves are pure Python with no OpenClaw dependency.

## Known Limitations

- LLM Guard warns only: the llm_input / llm_output hooks don't support hard blocking in OpenClaw's current hook API. Detected threats generate warnings; blocking requires handling at the application layer.
- before_tool_call only checks exec: the JS bridge currently calls the permission checker only for the exec tool. Other tools (shell, write_file) bypass PermissionChecker unless you extend the check list in the bridge.
- Regex rules can be bypassed: LLM Guard's regex fallback lacks semantic understanding. Complex prompt injection variants may evade detection. For production, pair regex with model-level scanning.

## Frequently Asked Questions

What is "spotlighting" and why does it matter?

Spotlighting wraps externally fetched data (web results, emails, webhooks) with structural tags that mark it as untrusted external content. This makes it harder for a malicious web page to embed instructions that the agent would treat as coming from the user or system prompt β€” a common prompt injection vector.

What alert rules are built into the audit logger?
Three built-in rules: (1) sensitive file path access (.ssh, .aws, id_rsa, /etc/passwd), (2) shell commands containing curl/wget/nc targeting non-localhost hosts, and (3) cross-agent messages containing sensitive keywords. All trigger with "critical" or "warning" severity and are written to the JSONL audit log.

Can I run the tests without OpenClaw installed?

Yes. The Python modules are standalone and their tests run independently. The OpenClaw plugin integration is only needed for live hook testing.

---

## Executive Voice Interface β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/executive-voice-interface

> Design doc for voice-call: how local MLX-Whisper STT, self-hosted LiveKit WebRTC, Edge-TTS, and OpenClaw OAuth combine to create a fully local voice interface for AI agents.

# Executive Voice Interface

A voice conversation interface for OpenClaw β€” talk to your AI assistant over WebRTC from a browser or phone. Fully local speech-to-text (MLX-Whisper on Apple Silicon), free TTS (Edge-TTS), Claude via OpenClaw OAuth, and self-hosted LiveKit for audio routing.

← Module Overview

## What

`voice-call` provides a voice conversation loop: a browser or phone connects via WebRTC, speech is detected by VAD, transcribed locally by Whisper, sent to Claude (via OpenClaw OAuth), and the response is synthesized by Edge-TTS and streamed back. Claude can call agent tools during the conversation β€” reading files, running commands, searching memory, and listing active sessions. The entire STT pipeline runs on-device; no audio leaves the local machine.

## Why

Text interfaces require a screen and keyboard. Voice enables hands-free interaction while mobile, cooking, or commuting β€” situations where typing is impractical. The obvious solutions β€” OpenAI Realtime API, Gemini Live β€” send audio to cloud servers, raising privacy concerns for anyone who discusses confidential projects, personal information, or unreleased work with their AI assistant.
This module keeps the STT pipeline entirely local. MLX-Whisper large-v3-mlx-4bit runs on Apple Silicon faster than real time β€” a 10-second utterance transcribes in under a second. Only the transcribed text (and any tool results) travels to the Anthropic API. Spoken words never leave the machine.

## Architecture

Four layers connected through LiveKit:

1. Audio transport β€” LiveKit + WebRTC. LiveKit Server handles WebRTC audio routing. The browser or phone connects via WSS to the token server, which proxies to LiveKit. Tailscale provides a trusted TLS certificate so iOS Safari accepts the WebSocket connection from any network.

2. Speech-to-text β€” MLX-Whisper (local). Silero VAD detects speech in the audio stream. When speech ends, the STT module transcribes with Whisper large-v3-mlx-4bit running locally on Apple Silicon. No audio is sent to an external STT service.

3. LLM β€” Claude via OpenClaw OAuth. The LLM adapter loads the OAuth token from OpenClaw's credential store β€” no separate Anthropic API key needed. Claude can call agent tools during the conversation: file read, command execution, memory search, and session list.

4. Text-to-speech β€” Edge-TTS (free). Microsoft Edge-TTS synthesizes responses β€” free, no API key required β€” and streams audio back to the caller through LiveKit.
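The VAD gate in layer 2 is Silero VAD in the real module. As a toy illustration of the gating idea only (a bare energy threshold over fixed frames, with made-up parameter values), the control flow looks roughly like this:

```python
import math

FRAME = 160          # samples per frame (10 ms at 16 kHz)
THRESHOLD = 0.02     # RMS energy above which a frame counts as speech
HANG = 30            # trailing silent frames before declaring end-of-utterance

def speech_segments(samples: list[float]):
    """Yield (start, end) sample indices of detected speech runs.

    Each yielded segment is what would be handed to Whisper; everything
    between segments (silence) is never transcribed.
    """
    start, silent = None, 0
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[i:i + FRAME]
        rms = math.sqrt(sum(s * s for s in frame) / FRAME)
        if rms >= THRESHOLD:
            if start is None:
                start = i            # utterance begins
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= HANG:       # enough trailing silence: utterance over
                yield (start, i)
                start, silent = None, 0
    if start is not None:
        yield (start, len(samples))
```

A real VAD (Silero, WebRTC VAD) replaces the RMS threshold with a trained model, but the hang-time pattern, waiting for sustained silence before closing the segment, carries over directly.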
Source files:

| File | Responsibility |
| --- | --- |
| agent.py | Main voice agent entry point β€” LiveKit agent loop |
| stt_mlx.py | Custom STT plugin using MLX-Whisper large-v3-mlx-4bit |
| llm_anthropic.py | LLM adapter β€” Claude via OpenClaw OAuth, with tool calling |
| tts_edge.py | TTS via Microsoft Edge-TTS (free) |
| tools.py | Agent tools: read_file, run_command, search_memory, list_sessions, think_carefully |
| token_server.py | HTTPS server + WSS proxy to LiveKit + JWT token generation |
| gateway.py | OpenClaw Gateway WebSocket client (device identity auth) |
| web/index.html | Browser call UI |

Agent tools available during voice conversations:

| Tool | What it does |
| --- | --- |
| read_file | Read a file from the local filesystem (truncated to max_lines) |
| run_command | Execute a shell command with a configurable timeout |
| search_memory | Search OpenClaw memory via the qmd CLI |
| list_sessions | List active OpenClaw sessions via Gateway WebSocket |

## Key Design Decisions

Local STT β€” audio never leaves the machine

MLX-Whisper runs on Apple Silicon faster than real time. The privacy guarantee is architectural: the audio pipeline is entirely local, so there is no API call that could leak spoken content. Only the transcribed text travels to the Anthropic API β€” and even then, only if the operator is comfortable with that.

LiveKit for WebRTC β€” don't implement WebRTC directly

WebRTC signaling, ICE negotiation, and codec handling are complex. LiveKit provides a production-grade abstraction that handles multi-device routing, reconnection, and audio quality management. The alternative (raw WebRTC) would require maintaining significantly more infrastructure code.

Edge-TTS β€” free, no key, good quality

Microsoft's Edge TTS endpoint is free and produces natural-sounding speech without requiring an API key or billing account. The tradeoff: it requires an outbound connection to Microsoft's servers for each response. For fully air-gapped use, substitute an on-device TTS (e.g.
Kokoro, or system TTS).

Tailscale for remote access β€” trusted TLS without a public domain

iOS Safari requires valid TLS for WebSocket connections. Getting a Let's Encrypt certificate for a home server normally requires a public domain. Tailscale Serve proxies the token server behind the Tailscale HTTPS endpoint β€” which has a valid Let's Encrypt certificate issued to the Tailscale DNS name β€” without exposing anything to the public internet.

JWT call links β€” single-use tokens per call

Each call link is a short-lived signed JWT that authorizes one participant to join one LiveKit room. There is no persistent login session. Links can be generated on demand and expire automatically, making it easy to share a call link with a phone without creating a persistent credential.

## How to Build Your Own

1. Use VAD before STT β€” don't transcribe continuous audio. Voice Activity Detection (Silero VAD or WebRTC VAD) detects when someone is speaking and when they've finished. Only the speech segment gets passed to Whisper. Without VAD, you'd either transcribe silence (waste) or need the user to press a button to speak (awkward).

2. MLX-Whisper on Apple Silicon, faster-whisper on CUDA, API elsewhere. MLX-Whisper is Apple Silicon-specific. On CUDA hardware, faster-whisper achieves similar throughput. For cloud deployment or hardware without a GPU, use an API-based STT service and accept that audio will leave the device. Make the STT layer swappable β€” the rest of the architecture is independent of the STT choice.

3. Token server pattern β€” keep LiveKit internal. The token server is a small HTTPS server that generates JWT tokens for LiveKit room access and proxies WebSocket connections. It's the only service exposed externally (via Tailscale). LiveKit itself runs on localhost β€” it doesn't need to be reachable from the browser directly.

4.
iOS Safari requires valid TLS β€” plan for this. Safari on iOS refuses WebSocket connections to servers with self-signed certificates, even if the user manually accepts the cert. Tailscale Serve solves this cleanly. Without Tailscale, you need a public domain and Let's Encrypt, or a reverse proxy with a valid cert on a cloud host.

5. Inject MEMORY.md at session start for context continuity. Voice conversations are stateless by default β€” each call starts fresh. To maintain continuity with ongoing projects, inject the agent's MEMORY.md into the system prompt at the start of each session. The agent then has immediate context about active projects without needing to be briefed verbally at the start of every call.

## Frequently Asked Questions

Is speech transcription done locally?

Yes. MLX-Whisper runs entirely on your Apple Silicon Mac β€” audio is never sent to an external STT service. Only the transcribed text (and any tool results) travels to the Anthropic API.

Do I need a paid Anthropic API key?

No. The LLM adapter authenticates using the OAuth token that OpenClaw manages. As long as you have an active OpenClaw session with a valid Anthropic OAuth credential, no separate API key is needed.

Does this work outside my home network?

Yes, via Tailscale. The start script configures Tailscale Serve to expose the token server on your Tailscale DNS name with a Let's Encrypt certificate. Any device in your tailnet can then open a call link from any network.

Can I run this on an Intel Mac or Linux?

MLX-Whisper requires Apple Silicon. On an Intel Mac or Linux you would need to swap the STT module for a different provider β€” faster-whisper (CUDA), a system-level STT, or an API-based service. The rest of the architecture (LiveKit, Edge-TTS, the token server) is cross-platform.

---

## AI Dev Workforce β€” Claw-Stack Plugin

URL: https://claw-stack.com/en/plugins/agent-swarm

> Turn one developer into an entire AI development team.
Morning Scan, task registry, AI PR review, and intelligent failure retry β€” all orchestrated autonomously.

### Morning Scan

Module 1

Runs daily at 09:00 EST via cron. Scans GitHub Issues for new actionable items, filters already-handled tasks, and automatically dispatches them to the coding agent β€” all before you've had your first coffee.

GitHub Issues Β· Cron Β· Auto-dispatch

---

## Voice Control β€” OpenClaw

URL: https://claw-stack.com/en/plugins/voice-control

> Powered by Voice Control Plugin

# Executive Voice Interface

Command your AI. Hands-free. Command your digital workforce while driving or walking. Turn voice memos into executed tasks and complex deployments without touching a keyboard. Your best ideas happen when your hands are full. Don't lose them to a locked screen. Speak it β€” your AI executes it.

Listening... πŸŽ™οΈ Transcribing... ⚑ Executing... βœ“ Fetching quarterly project metrics from data store... βœ“ Generating executive summary (247 rows)... βœ“ Email sent to sarah@company.com βœ… Done in 4.2s

Private β€” audio never leaves your device. Commands β€” any task your agent can do, by voice.

## How It Works

### 100% Local Processing

MLX-Whisper runs on your Apple Silicon. No cloud, no API fees, no audio sent anywhere. Your conversations stay private.

### Works From Anywhere

Call your AI from your iPhone via Tailscale. Secure HTTPS, no VPN setup, works globally.

### Full Agent Control by Voice

Run shell commands, search memory, check task status, steer agents β€” anything you would type, just say it.

## Built for Real Life

### Commute

Brief your AI on today's priorities while driving to the office.

### On the Go

Capture decisions instantly β€” AI executes while you walk.

### After Calls

Summarize a client call and draft a follow-up email by voice.

### Late Night

Check agent status and redirect tasks without opening a laptop.
πŸ“– Technical Documentation β†’

---

## Blog β€” Claw-Stack

URL: https://claw-stack.com/en/blog

> Writing on AI agent architecture, multi-agent systems, memory design, and lessons from building Claw-Stack as a personal research project.

Read β†’

---