# Claw-Stack — Full Documentation

> Complete content of all pages on claw-stack.com/en/ for AI consumption.
> Auto-generated from src/pages/en/. Do not edit manually.

URL: https://claw-stack.com
Generated: 2026-03-09

---

## https://claw-stack.com/en/

URL: https://claw-stack.com/en/

> Home — overview of the Claw-Stack research platform

## Core Capabilities

Three systems working in concert, so your AI never misses a beat.

### Governance & Security

Policy enforcement and real-time threat interception. Every agent action is governed and audited before execution.

```
claw-guard.log
[CLAW-GUARD] 🛡️ Monitoring...
[AGENT-001]  > "aws s3 rb s3://prod --force"
[CLAW-GUARD] 🔴 BLOCKED: destructive op.
[CLAW-GUARD] ⚠️ High Risk (Policy #902).
[CLAW-GUARD] 🔒 Admin approval required.
[CLAW-GUARD] █
```

### AI Memory

Persistent, searchable memory across every session. Your AI remembers what matters.

### Live Intelligence

GitHub, HN, Reddit, YouTube — your AI reads the internet so you don't have to.

---

## https://claw-stack.com/en/showcase

URL: https://claw-stack.com/en/showcase

> **#20 of 362** — Top 6% (bearcatctf.com/scoreboard)

✅ Trinity — team TwistedPair

### Commander

### Librarian

### Operator

Example captured flag: BCCTF{D0n7_g37_m3_Tw157eD}

---

## https://claw-stack.com/en/agents

URL: https://claw-stack.com/en/agents

> Orange

- Orange — Claude Opus 4
- Researcher — Claude Sonnet 4
- Coder — Claude Sonnet 4
- Content — Claude Sonnet 4
- Meeting — Claude Sonnet 4

### CTF Agents

- Commander CIPHER — Claude Opus 4
- Operator GRUNT — Claude Sonnet 4
- Librarian SAGE — Claude Haiku 4

---

## https://claw-stack.com/en/about

URL: https://claw-stack.com/en/about

## Qiushi Wu

## Orange 🍊

---

## Documentation — Claw-Stack

URL: https://claw-stack.com/en/docs

> Complete documentation for Claw-Stack: architecture, memory system, multi-agent consensus, policy enforcement, and deployment guides.
# Documentation

Documentation for Claw-Stack — a research project exploring agent governance on top of OpenClaw.

- **What is Claw-Stack?** — Definition, value proposition, and how it differs from raw OpenClaw.
- **Architecture Overview** — Sidecar Pattern, layered wrapping, and system topology.
- **Getting Started** — Install OpenClaw, configure Claw-Stack, and run your first agent.
- **BearcatCTF 2026 Case Study** — Trinity architecture. #20 out of 362 teams; 40 of 44 challenges solved in 48 hours.

## Modules

- **Multi-Agent Consensus Protocol** — Structured turn-based agent discussions with rolling summaries and automatic consensus detection.
- **Smart Scheduler & Deadline Watch** — Temporal context injection, background task tracking, and a persistent event log for LLM agents.
- **Web Automation Operator** — 26 MCP tools for browser automation, debugging, and performance analysis via the Chrome DevTools Protocol.
- **Live Intelligence Feed** — Multi-source AI/tech content pipeline across 7 platforms with keyword scoring and a unified output schema.
- **Encrypted State Archive** — Automated encrypted, deduplicated backups of the agent workspace using restic and rclone.
- **Governance & Security** — Six-module security plugin: spotlighting, audit logging, least privilege, LLM guard, HMAC comms, memory ACL.
- **Executive Voice Interface** — WebRTC voice interface with local STT (MLX-Whisper), Edge-TTS, and Claude via OpenClaw OAuth.

---

## What is Claw-Stack? — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/what-is-claw-stack

> Claw-Stack is a personal research project that wraps OpenClaw with persistent memory, multi-agent consensus, and policy enforcement — exploring how to transform a bare execution engine into a safer, more capable agent runtime.

# What is Claw-Stack?

Claw-Stack is a personal research project that wraps OpenClaw with persistent memory, multi-agent consensus, and policy enforcement — exploring how to transform a bare execution engine into a safer, more capable agent runtime.
## The Problem with Raw OpenClaw

OpenClaw is a powerful execution engine for AI agents. It handles tool calls, context windows, and model routing well. But a raw OpenClaw installation gives you a stateless executor: the moment a session ends, memory is gone. There are no guardrails on what tools agents can call, no mechanism for agents to coordinate on high-stakes decisions, and no audit trail for what happened and why.

Running OpenClaw in production without additional infrastructure is like running a database without backups, transactions, or access control. It works — until something goes wrong.

## What Claw-Stack Adds

Claw-Stack does not fork or modify OpenClaw. It wraps it using the Sidecar Pattern, running alongside the OpenClaw process and intercepting agent actions at defined policy gates. This means OpenClaw updates are applied cleanly, without merge conflicts.

| Capability | Raw OpenClaw | Claw-Stack |
| --- | --- | --- |
| Memory across sessions | None — context resets | 3-tier persistent memory |
| Multi-agent coordination | Manual orchestration only | Consensus protocol built in |
| Tool access control | All or nothing | Per-agent allowlists |
| High-stakes approval gates | Not available | Human-in-the-loop workflows |
| Audit logging | None | Full action audit trail |
| Upgradeable without conflict | Yes | Yes — Sidecar Pattern |

## Key Value Propositions

**Memory that persists.** Agents remember facts, decisions, and lessons across sessions using a three-tier system: instant MEMORY.md recall, structured per-topic memory files, and a semantic vector index for deep retrieval.

**Consensus before action.** For high-stakes decisions — deploys, financial operations, config changes — multiple agents debate and vote before any action is executed. No single agent can unilaterally do something irreversible.

**Governed by policy, not trust.** Every agent operates under a declared policy: which tools it may use, which domains it may access, and which operations require human approval.
Violations are blocked and logged, not ignored.

**Competition-tested.** The system placed **#20 out of 362 teams (top 6%)** at BearcatCTF 2026, solving 40 of 44 challenges autonomously in 48 hours — a real-world stress test under competition conditions.

## What Claw-Stack Is Not

- Not a fork of OpenClaw. The upstream project is untouched.
- Not a cloud service. Claw-Stack is self-hosted and designed for air-gapped environments.
- Not a replacement for OpenClaw. It is a runtime layer that requires OpenClaw to function.
- Not model-specific. Claw-Stack works with any model that OpenClaw supports.

## Frequently Asked Questions

**What is OpenClaw?**
OpenClaw is an MIT-licensed, self-hosted gateway that connects messaging apps (WhatsApp, Telegram, Discord, iMessage, and more) to AI coding agents. It runs as a Node.js daemon on your own machine and handles model calls, tool invocations, multi-agent sessions, and channel routing. Claw-Stack is a personal research project built on top of OpenClaw, adding governance and memory layers through separate modules.

**Is Claw-Stack open source?**
The individual modules that make up Claw-Stack are open source and self-hosted. You run everything on your own infrastructure — there is no Claw-Stack cloud and no telemetry. See each module's repository for its specific license.

**Which AI models does it support?**
Claw-Stack is model-agnostic at the infrastructure level. The runtime layer is tested primarily with Claude Opus 4, Claude Sonnet 4, and Claude Haiku 4 — but any model supported by OpenClaw can be used.

---

## Getting Started — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/getting-started

> Install OpenClaw, configure workspace files, connect channels, and layer on Claw-Stack modules. A guide to the real setup.

# Getting Started

Claw-Stack is not a single package — it is a collection of modules built on top of OpenClaw. The first step is always getting OpenClaw running. Everything else layers on after that.
**Private beta.** Claw-Stack is an active personal research project. The modules described here are open source but not packaged as a turnkey product. For questions or collaboration, reach out at hello@claw-stack.com.

## Prerequisites

| Requirement | Notes |
| --- | --- |
| Node.js 22+ | Required by OpenClaw |
| npm | For installing OpenClaw globally |
| An API key | Anthropic, OpenAI, Gemini, or another OpenClaw-supported provider |
| macOS or Linux | Windows is not officially supported by OpenClaw |

## Step 1 — Install OpenClaw

Install OpenClaw globally via npm. OpenClaw is the Node.js gateway that Claw-Stack modules build on top of.

```
npm install -g openclaw
openclaw --version
```

For detailed installation instructions, pairing flows, and platform-specific notes, refer to the official OpenClaw documentation.

## Step 2 — Start the Gateway

The OpenClaw gateway is the central process that manages agents, channels, and tool execution. Start it with:

```
openclaw gateway start
```

On first start, OpenClaw creates `~/.openclaw/` and populates it with a default workspace. The gateway config lives at `~/.openclaw/openclaw.json`. You can also run `openclaw onboard` for a guided wizard that walks through pairing your first channel and model.

## Step 3 — Configure the Workspace

OpenClaw automatically creates `~/.openclaw/workspace/` and loads several Markdown files into every agent's context at session start. These are the primary way to configure agent behavior:

| File | Purpose |
| --- | --- |
| MEMORY.md | Loaded at session start — current projects, active tasks, key facts. Keep under ~200 lines. |
| AGENTS.md | Behavioral rules: how to route tasks, subagent patterns, session hygiene, media rules. |
| TOOLS.md | Tool usage guide: server SSH details, CLI syntax for MCP tools, search priority, etc. |
| SOUL.md | Agent personality and communication style. |
| IDENTITY.md | Agent identity: name, role, capabilities summary. |
| memory/ | Per-topic Markdown files loaded on demand: entities/, patterns/, lessons.md, etc. |

Each file is plain Markdown — edit them directly. Per-agent workspaces (e.g. `~/.openclaw/workspace-coding/`) follow the same structure and override the defaults for that agent.

## Step 4 — Connect Channels

OpenClaw connects to messaging platforms so you can talk to your agents from anywhere. Channels are configured in `~/.openclaw/openclaw.json`. Supported channels include iMessage, Telegram, Discord, WhatsApp, and others.

Use `openclaw onboard` for the guided pairing flow, or consult the OpenClaw channel docs for manual configuration. Once a channel is paired, messages from your configured accounts will route to the gateway.

## Step 5 — Add Claw-Stack Modules

Claw-Stack modules are separate projects you clone and configure alongside your running OpenClaw gateway. Each module integrates through one or more of OpenClaw's native extension points:

**MCP Servers.** Modules like `agent-time-awareness` and `chrome-devtools-mcp` expose MCP tools that the agent calls via `mcporter`. Start the MCP server, then configure it in TOOLS.md so the agent knows how to use it.

**OpenClaw Skills.** Skills are Markdown files in `~/.openclaw/workspace/skills/` that agents read before performing specialized tasks (image processing, document generation, etc.).

**Cron Tasks.** The OpenClaw cron system schedules recurring agent work — daily memory sync, backups, heartbeats. Modules like `openclaw-backup` add their backup script to the cron schedule.

**OpenClaw Plugins.** Modules like `openclaw-security` install as OpenClaw plugins that hook into before/after tool-call events via the plugin system.

Each module's README documents its specific setup steps.
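For illustration, a TOOLS.md entry documenting an MCP tool might look like the following sketch. The tool name, invocation line, and usage note are hypothetical placeholders, not any real module's interface:

```
## time-context (MCP server) — hypothetical example entry

- Purpose: inject current date/time context into agent sessions
- Invocation: call the server's `now` tool via mcporter (placeholder name)
- Use when: the task involves schedules, deadlines, or relative dates
```

Keeping entries in this shape (purpose, invocation, when-to-use) gives the agent enough to call the tool without trial and error.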
Start with the ones most relevant to your use case:

| Module | Doc page |
| --- | --- |
| Multi-Agent Consensus (agent-meeting) | Modules → Multi-Agent Consensus Protocol |
| Smart Scheduler (agent-time-awareness) | Modules → Smart Scheduler & Deadline Watch |
| Web Automation (chrome-devtools-mcp) | Modules → Web Automation Operator |
| Intelligence Feed (info-pipeline) | Modules → Live Intelligence Feed |
| Backup (openclaw-backup) | Modules → Encrypted State Archive |
| Security (openclaw-security) | Modules → Governance & Security |
| Voice Interface (voice-call) | Modules → Executive Voice Interface |

## Frequently Asked Questions

**Do I need OpenClaw to use Claw-Stack?**
Yes. Every Claw-Stack module is designed to work alongside a running OpenClaw gateway. The modules integrate through OpenClaw's extension points — MCP servers, skills, plugins, cron tasks, and workspace configuration files. None of them operates as a standalone system.

**Where does OpenClaw store everything?**
Everything lives under `~/.openclaw/`: the gateway config (`openclaw.json`), the main workspace (`workspace/`), per-agent workspaces (`workspace-coding/`, etc.), agent-specific state (`agents/`), logs, media, cron schedules, and credentials.

**Is there a web dashboard?**
OpenClaw includes a built-in control UI that you can open from the gateway. It shows chat, session status, and configuration. This is separate from any Claw-Stack modules — it is part of OpenClaw itself. Check the OpenClaw docs for how to access it.

---

## Architecture Overview — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/architecture-overview

> How Claw-Stack modules integrate with OpenClaw through skills, MCP servers, cron tasks, plugins, and workspace configuration — without modifying the upstream gateway.

# Architecture Overview

Claw-Stack is not a monolithic runtime that wraps OpenClaw.
It is a collection of independent modules — each a separate project — that integrate with a running OpenClaw gateway through its native extension points, without modifying the upstream source.

## OpenClaw: The Foundation

OpenClaw runs as a Node.js daemon (the "gateway") on your host machine. It manages AI agent sessions, routes messages from connected channels (iMessage, Telegram, Discord, WhatsApp, etc.) to agents, executes tool calls, and exposes a web control UI. All persistent configuration lives in `~/.openclaw/openclaw.json`.

Claw-Stack modules extend this gateway through mechanisms OpenClaw already provides. There is no Claw-Stack runtime process — the gateway itself is the runtime.

## Directory Structure

```
~/.openclaw/
├── openclaw.json            # Gateway config: agents, models, tool policy, channels
├── workspace/               # Main agent workspace (loaded at session start)
│   ├── MEMORY.md            # Session-start context (keep under ~200 lines)
│   ├── AGENTS.md            # Behavioral rules, routing, session hygiene
│   ├── TOOLS.md             # Tool usage guide, MCP server syntax
│   ├── SOUL.md              # Agent personality
│   ├── IDENTITY.md          # Agent identity and role
│   ├── memory/              # Per-topic memory files (loaded on demand)
│   │   ├── entities/        # Projects, contacts, servers
│   │   ├── patterns/        # Reusable patterns and workflows
│   │   ├── lessons.md       # Extracted lessons from past sessions
│   │   └── summaries/       # Session summaries
│   ├── memory-system/       # Memory organizer scripts (Claw-Stack module)
│   └── skills/              # Agent skill files
├── workspace-coding/        # Per-agent workspace for the coding agent
├── workspace-researcher/    # Per-agent workspace for the researcher agent
├── agents/                  # Agent-specific state directories
├── cron/                    # Scheduled tasks
├── subagents/               # Subagent session state
├── logs/                    # Gateway and agent logs
└── credentials/             # Encrypted credentials store
```

## Integration Points

Each
Claw-Stack module uses one or more of these extension points:

### 1. MCP Servers

Modules expose their functionality as Model Context Protocol servers. The agent calls these tools through `mcporter`. Examples: `agent-time-awareness` (time context via HTTP MCP), `chrome-devtools-mcp` (browser control via npx).

Config: TOOLS.md documents the MCP tool syntax; the agent calls them natively during sessions.

### 2. OpenClaw Plugins (Hooks)

Plugins hook into the gateway's event lifecycle: `before_tool_call`, `after_tool_call`, `llm_input`, `llm_output`. Example: `openclaw-security` installs a JS bridge plugin that calls Python security modules on each hook event.

Config: `openclaw plugins install --link ~/projects/openclaw-security/openclaw-plugin`

### 3. Workspace Configuration Files

AGENTS.md, TOOLS.md, MEMORY.md, and skills files load into agent context at session start. This is the lowest-overhead integration — no running process required. Example: `info-pipeline` reports are referenced in MEMORY.md for daily briefings.

Config: edit the Markdown files directly.

### 4. OpenClaw Cron Tasks

The gateway has a built-in cron scheduler. Modules register shell commands to run on a schedule. Example: `openclaw-backup` schedules `backup.sh` to run daily; the memory-system organizer runs on its own cron schedule.

Config: `openclaw cron add --schedule "0 4 * * *" --command "/path/to/script.sh"`

### 5. Subagent Orchestration

OpenClaw has native subagent support. The main agent can spawn subagents via `sessions_spawn`, each running with its own workspace and tool permissions. `agent-meeting` uses this to coordinate multiple agents through a structured meeting loop.

Config: `agents.list[].subagents.allowAgents` in openclaw.json.

## Agent Configuration

Agents are defined in `~/.openclaw/openclaw.json` under `agents.list`.
Each entry specifies the model, workspace path, identity, and which subagents it may spawn:

```
// ~/.openclaw/openclaw.json (excerpt)
{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-opus-4-6" },
      "workspace": "/Users/you/.openclaw/workspace",
      "subagents": { "maxConcurrent": 8, "maxSpawnDepth": 2 }
    },
    "list": [
      { "id": "main", "subagents": { "allowAgents": ["*"] } },
      {
        "id": "coding",
        "workspace": "/Users/you/.openclaw/workspace-coding",
        "model": "anthropic/claude-sonnet-4-6"
      },
      {
        "id": "meeting",
        "workspace": "/Users/you/.openclaw/workspace-meeting",
        "tools": { "allow": ["read", "memory_search", "memory_get", "session_status"] }
      }
    ]
  }
}
```

## Why Not Modify OpenClaw Directly?

Forking or patching OpenClaw creates a maintenance burden that grows with every upstream release. Using OpenClaw's native extension points means the gateway updates cleanly via `npm install -g openclaw` without merge conflicts. Each Claw-Stack module evolves independently — adding new capabilities without touching the agent execution engine.

## Frequently Asked Questions

**Is there a separate Claw-Stack runtime process?**
No. The OpenClaw gateway is the only persistent process. Claw-Stack modules either run as MCP servers (when they need to be always on) or as on-demand scripts invoked by cron or the agent. There is no Claw-Stack daemon, no IPC socket, and no policy gate process.

**Can individual modules be used without the others?**
Yes. Each module is an independent project with its own README and setup steps. You can run just `chrome-devtools-mcp` without `openclaw-backup`, or just the memory system without the voice interface.

**What happens if an MCP server module crashes?**
The agent receives a tool error from the MCP call and can handle it like any other tool failure. The OpenClaw gateway itself continues running. Other modules and agents are unaffected. Restart the crashed MCP server and the tool becomes available again in the next session.
---

## Persistent Memory System — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/persistent-memory

> How OpenClaw agents maintain persistent memory across sessions using MEMORY.md, per-topic memory files, and the memory-system organizer with SQLite, FTS5, and QMD vector search.

# Persistent Memory System

OpenClaw agents accumulate knowledge across sessions through a layered memory system: a compact index loaded at session start, per-topic Markdown files read on demand, and an organizer pipeline that extracts facts from raw session files, deduplicates them, and indexes them for semantic search.

## Three Design Inspirations

The memory-system organizer (`~/.openclaw/workspace/memory-system/`) draws on three research paradigms:

**Mem0 — Fact Extraction.** After sessions, the organizer uses an LLM (Claude Haiku via OpenClaw OAuth) to extract discrete facts from raw memory files — preferences, project state, lessons learned, active tasks. Extracted facts are structured, categorized, and stored in SQLite with FTS5.

**Zep — Temporal Decay.** Each fact carries a TTL. Personal facts and system state are permanent; task-related facts expire after 7 days, agent-related facts after 30. A decay engine periodically checks expired entries, uses an LLM to summarize them into lessons, and archives them — it never deletes them.

**MemGPT — Agent Self-Management.** Agents write new memories directly via `python -m src.tools write`. A separate lessons extractor scans session JSONL files for "failure → retry → success" patterns, runs LLM extraction, and appends structured lessons to `memory/lessons.md`.

## Memory Layers

### Layer 1 — MEMORY.md (Session-Start Context)

Loaded into every agent's context window at the start of each session. Intentionally kept short (under ~200 lines) to avoid consuming the context budget on sessions with large task descriptions. Contains only the most current, high-priority facts: recent activity, active projects, key contacts, infrastructure notes.
```
# MEMORY.md — long-term memory index
# (agent loads this at session start)

## Recent Activity
- [2026-03-05] claw-stack website: major refactor, 71 pages
- [2026-03-04] AI Town: WS + interpolation + tile atlas fixes

## Active Projects
| Project | Path | Status | Memory file |
| claw-stack | ~/projects/claw-stack/ | active | memory/website-claw-stack.md |

## Key Facts
- Owner timezone: EST
- Host: macOS, Apple Silicon, local network
```

### Layer 2 — memory/*.md (On-Demand Knowledge)

The `memory/` directory holds per-topic Markdown files that agents read when they need deep context on a specific subject. MEMORY.md stores references (links) to these files; agents fetch the full file when needed.

| Directory / File | Contents |
| --- | --- |
| memory/entities/ | Projects, contacts, servers — one file per entity |
| memory/patterns/ | Reusable patterns and workflows discovered over time |
| memory/lessons.md | Extracted lessons from past failures and retries |
| memory/summaries/ | Session summaries (written at end of sessions) |
| memory/events/ | Time-stamped event records |
| memory/structured/ | Facts exported per category by the organizer pipeline (used by QMD for vector indexing) |

### Layer 3 — QMD Semantic Search (Deep Retrieval)

OpenClaw's built-in `memory_search` tool (QMD) runs a 768-dimensional embedding model (`embeddinggemma`) over the `memory/structured/` files. The index auto-updates every 10 minutes. Queries go through `qwen3-reranker` for semantic re-ranking and query expansion. This handles questions like "what did we decide about X three months ago" without the agent scanning every file manually.
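The organizer pipeline merges extracted facts whose word overlap exceeds 60% before storing them. A minimal sketch of that check — the whitespace tokenization and the exact overlap metric are assumptions; only the 60% merge threshold comes from the pipeline description:

```python
def word_overlap(a: str, b: str) -> float:
    """Overlap between two facts' word sets (an assumed metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))

def dedupe(facts: list[str], threshold: float = 0.6) -> list[str]:
    """Keep the first occurrence; drop later facts overlapping > threshold."""
    kept: list[str] = []
    for fact in facts:
        if all(word_overlap(fact, k) <= threshold for k in kept):
            kept.append(fact)
    return kept

facts = [
    "owner timezone is EST",
    "the owner timezone is EST today",   # near-duplicate, merged away
    "host runs macOS on Apple Silicon",
]
print(dedupe(facts))
```

The real organizer merges rather than drops duplicates (and follows up with an LLM check for lessons); this sketch only illustrates the threshold logic.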
## The Organizer Pipeline

The memory-system organizer runs on a schedule (cron) and processes raw memory files through a pipeline:

```
[Organizer — runs on cron schedule]
→ scan memory/*.md (MD5 hash check, skip unchanged)
→ LLM fact extraction (Haiku via OpenClaw OAuth)
→ deduplication (word overlap > 60% → merge)
→ classification: personal / tasks / agents / system
→ TTL assignment: personal=∞, tasks=7d, agents=30d, system=∞
→ store in SQLite + FTS5 (memories.db, WAL mode)
→ export structured/*.md for QMD vector indexing
→ update INDEX.md (compact index ~2KB)

[Decay Engine — runs separately]
→ scan SQLite for expired TTLs
→ LLM summarize expired entries into lessons
→ archive (removed from index, still searchable)

[Lessons Extractor — runs separately]
→ scan session JSONL for error → retry → success patterns
→ LLM extraction: scenario / wrong approach / correct approach / reason
→ dedup (word overlap > 70%, then LLM DUPLICATE/NOVEL check)
→ append to memory/lessons.md
```

## Memory Search Script

A standalone `memory-search.py` script provides intent-aware search across agent memory:

```
# Search coding agent's memory for tmux content
uv run python memory-search.py --query "tmux claude code" --agent coding

# Search all agents, return top 3
uv run python memory-search.py --query "deployment patterns" --all-agents --top 3

# Smart mode: use Claude API for intent analysis
uv run python memory-search.py --query "how to deploy" --smart --verbose

# Search only lessons
uv run python memory-search.py --query "mistakes" --category lessons --agent main
```

## Frequently Asked Questions

**Can memory files be edited manually?**
Yes. All memory files are plain Markdown on disk. You can edit, delete, or reorganize them directly. The organizer uses MD5 hashing to detect changed files and will re-process them on the next run. Changes to MEMORY.md take effect immediately in the next session — no pipeline run needed.

**Is memory shared between agents?**
Each agent has its own workspace and memory directory. There is also a `~/.openclaw/shared/memory/` directory that all agents can read. Agents write to their own memory; the orchestrator (main agent) coordinates shared memory updates.

**What is the token cost of loading MEMORY.md at session start?**
MEMORY.md is kept under ~200 lines by design. The actual token cost varies with content density, but keeping it an index-only file (with links to the detailed entity files) minimizes context overhead while still giving the agent immediate access to current state.

---

## Policy Enforcement — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/policy-enforcement

> How OpenClaw governs agent tool access through tool allow/deny lists, sandbox modes, and the elevated exec escape hatch — all configured in openclaw.json.

# Policy Enforcement

OpenClaw governs what agents can do through three distinct control layers: sandbox (where tools run), tool policy (which tools are callable), and elevated (an exec-only escape hatch). All configuration lives in `~/.openclaw/openclaw.json`.

## The Three Control Layers

### 1. Sandbox — where tools run

Controls whether tool execution happens in a Docker container or directly on the host. Configured via `agents.defaults.sandbox.mode`:

| Mode | Effect |
| --- | --- |
| "off" | All tools run on the host (default for personal setups) |
| "non-main" | Only non-main sessions (groups, channels) are sandboxed |
| "all" | All sessions run in Docker containers |

### 2. Tool Policy — which tools are callable

Defines which built-in OpenClaw tools an agent can call. Configured with `tools.allow` and `tools.deny` at the global level, or `agents.list[].tools.allow/deny` per agent. **Deny always wins.** If allow is non-empty, everything else is blocked.

### 3. Elevated — exec-only escape hatch

When an agent is sandboxed, `elevated` lets specific exec calls run on the host instead. It does _not_ grant extra tools and does _not_ override tool allow/deny — it only affects where exec runs.
Gated by `tools.elevated.enabled` and `tools.elevated.allowFrom`.

## Tool Groups

Allow/deny lists accept `group:*` shorthands that expand to multiple tools:

| Group | Expands to |
| --- | --- |
| group:runtime | exec, bash, process |
| group:fs | read, write, edit, apply_patch |
| group:sessions | sessions_list, sessions_history, sessions_send, sessions_spawn, session_status |
| group:memory | memory_search, memory_get |
| group:ui | browser, canvas |
| group:automation | cron, gateway |
| group:messaging | message |
| group:openclaw | All built-in OpenClaw tools (excludes provider plugins) |

## Per-Agent Configuration

Each entry in `agents.list` can override global tool policy. Here is a real example — the `meeting` agent has a restricted tool allowlist so it can only read files and query memory:

```
// ~/.openclaw/openclaw.json (excerpt)
"agents": {
  "list": [
    {
      "id": "meeting",
      "workspace": "/Users/you/.openclaw/workspace-meeting",
      "model": "anthropic/claude-sonnet-4-6",
      "tools": {
        "allow": ["read", "memory_search", "memory_get", "session_status"]
      }
    },
    {
      "id": "redteam",
      "workspace": "/Users/you/.openclaw/workspace-redteam",
      "model": "anthropic/claude-opus-4-6",
      "subagents": { "allowAgents": ["operator", "librarian"] }
    }
  ]
}
```

## Agents and Their Real Access Patterns

Behavioral constraints on _how_ an agent acts are defined in the workspace AGENTS.md file — separate from the technical tool policy in openclaw.json.
Together they form the practical access model:

| Agent | Model | Constraints (AGENTS.md) |
| --- | --- | --- |
| main (Orange) | Opus | Can spawn any subagent; financial ops require human confirmation |
| coding | Sonnet | Full shell in project dirs; blocked command patterns (curl to external, rm -rf /) |
| researcher | Sonnet | Web search, web fetch, file read/write in workspace; can spawn coding + content |
| redteam (CIPHER Commander) | Opus | CTF only; can spawn operator + librarian; no external system attacks |
| operator (GRUNT) | Sonnet | Shell exec for CTF tasks; no subagents |
| meeting | Sonnet | tools.allow: [read, memory_search, memory_get, session_status] only |

## Security Audit

OpenClaw ships a built-in security audit command that flags common misconfigurations:

```
openclaw security audit
openclaw security audit --deep
openclaw security audit --fix
openclaw security audit --json
```

It checks: gateway auth exposure, browser control exposure, elevated allowlists, and filesystem permissions. Run it after any config change.

## Debugging Policy

To see exactly what sandbox and tool policy is active for a given session:

```
# Check effective policy for the default agent
openclaw sandbox explain

# Check for a specific agent
openclaw sandbox explain --agent coding

# Machine-readable output
openclaw sandbox explain --json
```

The output shows the effective sandbox mode, whether the session is sandboxed, the effective tool allow/deny (with the source: agent config, global config, or default), and elevated gates.

## Frequently Asked Questions

**What is the difference between AGENTS.md and tool policy in openclaw.json?**
AGENTS.md is a Markdown file loaded into the agent's context window — it describes behavioral rules the agent is expected to follow (routing, communication style, what requires human confirmation). Tool policy in openclaw.json is a hard technical constraint enforced by the gateway — the agent cannot call a denied tool regardless of what AGENTS.md says.
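The documented allow/deny semantics — deny always wins, and a non-empty allow list implicitly denies everything not listed — can be sketched as a small resolver. This is an illustration of the stated rules only, not OpenClaw's actual implementation, and it ignores `group:*` expansion:

```python
def is_tool_allowed(tool: str, allow: list[str], deny: list[str]) -> bool:
    """Apply the documented policy rules (sketch, not OpenClaw's code)."""
    if tool in deny:
        return False          # deny always wins
    if allow:                 # non-empty allow -> allowlist mode
        return tool in allow
    return True               # empty allow and not denied -> callable

# The restricted meeting agent's allowlist from the config excerpt:
allow = ["read", "memory_search", "memory_get", "session_status"]
print(is_tool_allowed("read", allow, []))        # on the allowlist
print(is_tool_allowed("exec", allow, []))        # implicitly denied
print(is_tool_allowed("read", allow, ["read"]))  # deny wins over allow
```

Per-agent lists would be merged with the global `tools.allow`/`tools.deny` before this check; `openclaw sandbox explain` shows the effective result.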
**Does deny always take precedence over allow?**
Yes. Per the OpenClaw docs: "deny always wins." If a tool appears in both allow and deny, it is denied. If allow is non-empty, any tool not in the allow list is implicitly denied.

**What security model does OpenClaw assume?**
OpenClaw is designed for a personal-assistant model: one trusted operator per gateway. It is not a multi-tenant security boundary for adversarial users. If multiple untrusted users can message one tool-enabled agent, they effectively share the same delegated tool authority. For separate trust boundaries, run separate gateway instances.

---

## BearcatCTF 2026 Case Study — Claw-Stack Docs

URL: https://claw-stack.com/en/docs/bearcat-ctf-case-study

> How Claw-Stack's Trinity architecture — Commander (CIPHER), Operator (GRUNT), and Librarian (SAGE) — placed #20 out of 362 teams at BearcatCTF 2026, solving 40 of 44 challenges autonomously in 48 hours.

# BearcatCTF 2026 Case Study

At BearcatCTF 2026, Claw-Stack's Trinity architecture competed autonomously — no human solved any challenge. The system placed **#20 out of 362 teams (top 6%)**, solving **40 of 44 challenges** in the 48-hour competition window.

## The Trinity Architecture

The CTF system used a specialized three-agent configuration called the Trinity. Each agent has a distinct role, model, and permission boundary. They coordinate through a shared blackboard — a persistent key-value store that tracks challenge state, discovered credentials, and failed approaches.

### Commander — `CIPHER` (Claude Opus 4)

The strategic brain. CIPHER does full lifecycle management of each challenge: reading the challenge description, decomposing it into sub-tasks, maintaining the blackboard, spawning Operator instances for execution, and consulting Librarian for knowledge gaps. CIPHER never executes system commands directly.
βœ“Spawn Operator/Librarian instances βœ“Full blackboard read/write βœ—Cannot read flag files directly βœ—Cannot access systems outside CTF scope Operator `GRUNT` Claude Sonnet 4 The tactical executor. GRUNT receives a specific sub-task from CIPHER with full context from the blackboard, executes shell commands and exploit scripts in isolated Docker containers, reports results back as structured JSON, and handles micro-level errors (permission issues, missing dependencies) without bothering CIPHER. GRUNT's context resets between tasks β€” it is stateless by design. βœ“Shell exec in CTF containers βœ“Write exploit scripts βœ—Cannot attack real external systems βœ—Cannot run binaries on host macOS Librarian `SAGE` Claude Haiku 4 The knowledge specialist. SAGE handles all research tasks so CIPHER and GRUNT can stay focused on execution. It searches the local CTFKnowledges database for relevant techniques, queries CTFTools for available tools and usage patterns, and performs web searches for CVEs and writeups when local knowledge is insufficient. It returns a maximum of 3 results to avoid context bloat. βœ“Local knowledge base search βœ“Web search and CVE lookup βœ—Cannot execute system commands βœ—Read-only access except own lessons.md ## The Blackboard The shared blackboard was the critical innovation that prevented duplicate work and preserved state across CIPHER's long-running sessions. 
It tracked:

- **Challenge state**: unsolved / in-progress / solved / abandoned
- **Discovered assets**: IPs, ports, service banners, credentials found
- **Failed attempts**: approaches that didn't work, to prevent repetition
- **Flags captured**: confirmed flag strings submitted to the scoreboard
- **GRUNT task queue**: pending sub-tasks with priority ordering

## Challenge Category Breakdown

| Category | Solved | Notes |
|---|---|---|
| Cryptography | 8/8 | SAGE's knowledge base contained most attack patterns |
| Misc | 7/8 | One challenge required image analysis beyond current capabilities |
| Reverse Engineering | 6/7 | One challenge involved visual pattern recognition the system lacks |
| Forensics | 7/7 | Strong performance across memory dumps, disk images, and packet captures |
| Binary Exploitation (Pwn) | 5/5 | GRUNT handled buffer overflows, ROP chains, and format strings |
| OSINT | 3/5 | Image-based reconnaissance limited by weak visual analysis capabilities |
| Web | 4/4 | GRUNT excelled at SQLi, SSRF, and JWT forgery |
| Total | 40/44 (91%) | #20 / 362 teams — top 6% |

## Lessons Learned

**Blackboard prevents repetition.** Without the failed-attempt log, GRUNT repeatedly tried the same approaches on heap challenges. Once the blackboard was implemented, dead-end approaches were not revisited.

**Stateless GRUNT scales well.** Running GRUNT as a stateless executor (context reset per task) allowed CIPHER to spawn multiple parallel GRUNT instances without context-window conflicts.

**Haiku for knowledge retrieval is cost-effective.** SAGE used Claude Haiku 4, which returned answers fast and cheaply. Most knowledge retrieval does not require frontier-model reasoning — it is search and retrieval, not synthesis.

**Image analysis is the current bottleneck.** The 4 unsolved challenges (1 rev, 2 OSINT, 1 misc) all required visual/image analysis — recognizing patterns in images, reading text from screenshots, or interpreting visual clues.
This is a known weakness of current LLM-based agent systems.

## Frequently Asked Questions

**Did any human solve challenges during the competition?**

No. The system ran fully autonomously for the entire 48-hour window. The human operator monitored the dashboard but did not intervene in any challenge. All 40 flags were captured and submitted by the Trinity system without human assistance.

**What were the 4 unsolved challenges?**

One reverse engineering challenge, two OSINT challenges, and one misc challenge. All four required image or visual analysis — recognizing patterns, reading text from images, or interpreting visual clues — which is a known limitation of current LLM-based agent systems.

---

## Multi-Agent Consensus Protocol — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/multi-agent-consensus

> Design doc for agent-meeting: how the Mediator Pattern, rolling summaries, and deterministic stance detection enable structured multi-agent consensus built on OpenClaw.

# Multi-Agent Consensus Protocol

Coordinates multiple OpenClaw agents through structured turn-based discussions using the Mediator Pattern. Each agent reasons independently, embeds a stance marker in its response, and the coordinator automatically detects consensus and writes meeting minutes.

## What

`agent-meeting` is a multi-agent meeting system. A coordinator sends each agent the current rolling summary and a question; agents reply with their reasoning and a structured stance marker. After each round the coordinator extracts stances, checks for consensus, compresses the conversation history, and either advances to the next round or writes a final Markdown minutes file.

## Why

Multi-agent collaboration without structure has two failure modes: token explosion and ambiguous outcomes. Without a coordinator, agents send messages to each other in N×N patterns — every agent talks to every other agent, and the conversation history grows quadratically.
Without a formal consensus mechanism, there is no reliable way to know when the group has actually agreed on something.

Prompt-based multi-agent frameworks typically ask agents to "discuss" a topic and then summarize the conversation. This works for simple cases but breaks down when rounds are long, agents disagree, or you need a verifiable audit trail. This module formalizes the process: a fixed protocol, deterministic consensus detection, and a structured minutes file that records exactly what was said and decided.

## Architecture

| Component | Responsibility |
|---|---|
| Coordinator | Meeting orchestration loop — drives rounds, manages state, routes all messages |
| Session Manager | Spawns and sends messages to OpenClaw agent sessions |
| Summarizer | Compresses round history to ~500 tokens after each round |
| Consensus Detector | Stance extraction via regex; FULL / MAJORITY / NO consensus logic |
| Minutes Writer | Generates final Markdown minutes with full transcript |
| Timeout Handler | Per-agent 60s timeout; overall meeting 10-minute limit |

The round lifecycle:

```
Coordinator
  → send each agent: rolling summary + question
  → collect responses with [STANCE: AGREE/DISAGREE/NEUTRAL]
  → extract stances (regex, deterministic)
  → check consensus
  → if consensus reached or max rounds → write minutes
  → else → compress history to ~500 tokens → next round
```

Consensus rules:

| Result | Condition |
|---|---|
| FULL_CONSENSUS | All agents AGREE |
| MAJORITY_CONSENSUS | ≥ 2/3 of agents AGREE, zero DISAGREE |
| NO_CONSENSUS | Neither condition met — continue to next round |

## Key Design Decisions

**Mediator Pattern — no N×N chatter.** All messages route through the coordinator. Agents never communicate directly. This keeps the message graph linear (N messages per round, not N×N) and gives the coordinator full control over the conversation state.

**Rolling summaries — O(1) tokens per round.** After each round, the full transcript is compressed to ~500 tokens.
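The stance extraction and consensus rules described for the Consensus Detector can be sketched as follows; the function names are illustrative, not the module's actual API:

```python
import re

STANCE_RE = re.compile(r"\[STANCE:\s*(AGREE|DISAGREE|NEUTRAL)\]", re.IGNORECASE)

def extract_stance(response: str) -> str:
    """Deterministic regex extraction; a missing marker counts as NEUTRAL."""
    m = STANCE_RE.search(response)
    return m.group(1).upper() if m else "NEUTRAL"

def check_consensus(stances: list[str]) -> str:
    """FULL: everyone agrees. MAJORITY: >= 2/3 agree with zero disagree."""
    agree = stances.count("AGREE")
    disagree = stances.count("DISAGREE")
    if agree == len(stances):
        return "FULL_CONSENSUS"
    if disagree == 0 and agree >= (2 * len(stances)) / 3:
        return "MAJORITY_CONSENSUS"
    return "NO_CONSENSUS"

stances = [extract_stance(r) for r in [
    "I think we should ship it. [STANCE: AGREE]",
    "No objections. [stance: agree]",            # marker is case-insensitive
    "I need more data before deciding.",         # no marker -> NEUTRAL
]]
assert check_consensus(stances) == "MAJORITY_CONSENSUS"  # 2/3 agree, none disagree
```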
Each subsequent round only sees the rolling summary plus the new round's responses — not the entire history. Token usage stays roughly constant regardless of round count, making long meetings tractable.

**Stance markers — deterministic, no extra LLM call.** Agents embed `[STANCE: AGREE | DISAGREE | NEUTRAL]` in their responses. Regex parsing extracts the stance without an additional LLM call for interpretation. An unknown or missing stance is treated as NEUTRAL — it won't block a majority consensus.

**Early exit on consensus.** The meeting ends as soon as full or majority consensus is reached, even before max rounds. This avoids pointless additional rounds when agents have already agreed.

## How to Build Your Own

1. **Inject stance instructions into agent system prompts.** Every participating agent needs a system prompt that instructs it to include a stance marker. The format must be consistent and parseable by your consensus detector. Use a fixed format like `[STANCE: AGREE/DISAGREE/NEUTRAL]` — don't let agents choose their own format.

2. **Use regex for stance extraction, not LLM parsing.** Calling an LLM to interpret whether an agent agreed adds cost and latency, and creates a failure mode where the parser LLM misreads the original. A simple regex like `/\[STANCE:\s*(AGREE|DISAGREE|NEUTRAL)\]/i` is deterministic and fast.

3. **Compress after every round, not at the end.** Rolling compression must happen incrementally. Compressing only at the end defeats the purpose — the context window is already exhausted. Use your LLM to summarize the current round's responses plus the previous rolling summary into a new ~500-token summary.

4. **Set both per-agent and overall timeouts.** A single slow or stuck agent should not block the entire meeting indefinitely. Per-agent timeouts (e.g. 60s) trigger a NEUTRAL stance for that agent; an overall meeting limit (e.g. 10 minutes) ensures the coordinator eventually terminates regardless.

5. **Write minutes to disk, not just to memory.** The output of a multi-agent meeting is only as useful as its record. Write a structured Markdown file with: topic, participants, timestamps, per-round transcripts with parsed stances, rolling summaries, and the final consensus result. This becomes the audit trail for the decision.

## Frequently Asked Questions

**Does this work with any OpenClaw agents I already have configured?**

Yes. You pass the agent IDs that exist in your OpenClaw workspace. The session manager spawns them through the OpenClaw gateway rather than creating new agent configs.

**What happens if an agent doesn't include a stance marker?**

The stance is recorded as `UNKNOWN`. The coordinator treats UNKNOWN the same as NEUTRAL for consensus calculation — it won't block a majority consensus if all other agents agree.

**How does rolling summary compression work?**

After each round, the summarizer sends the full transcript to an LLM and requests a compressed summary of approximately 500 tokens. This summary becomes the context for the next round, keeping token usage roughly constant regardless of round count.

**Can I use this without OpenClaw?**

Not directly — the session manager spawns agents via the OpenClaw gateway. However, the core protocol (stance markers, rolling compression, consensus detection) is framework-agnostic and can be reimplemented against any agent runtime.

---

## Smart Scheduler & Deadline Watch — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/smart-scheduler

> Design doc for agent-time-awareness (TCS): how time context injection, task lifecycle tracking, and a persistent event log give LLM agents reliable temporal awareness.

# Smart Scheduler & Deadline Watch

Time Context Service (TCS) — a lightweight Python service that gives LLM agents accurate temporal awareness, background task tracking with timeout detection, and a persistent event log that survives context compaction.
## What

TCS solves three related problems LLM agents have with time:

1. **Temporal Context Injection.** Generates a ready-made time context block for system prompts: current timestamp, day of week, relative timezone info, and contact quiet-hours. An agent that starts every session with this context knows exactly when it is.

2. **Task Lifecycle Tracker.** Register background tasks, poll them at configurable intervals, detect timeouts, and mark them complete. Prevents agents from starting duplicate work or forgetting to follow up on async operations.

3. **Persistent Event Log.** A SQLite-backed timeline of agent events. Because it lives outside the LLM context window, it survives context compaction. Agents can query "what happened in the last 2 hours" even after their context was trimmed.

## Why

LLMs have no real-time clock. They know the world up to a training cutoff, not the current moment. Without explicit time injection, an agent cannot correctly answer "what day is it?" or reason about deadlines and scheduling. This is easily fixed at session start — but it requires a dedicated service to do it reliably.

Background task tracking is harder. An agent that kicks off a long-running process (a build, a test run, a data pipeline) needs to check back on it. Without external state, the agent either polls every turn — creating noise — or forgets entirely after context compaction. TCS keeps task state in SQLite outside the context window, so it persists across session resets.

The event log addresses the same problem for history: context compaction silently removes earlier events. A SQLite event log outside the context window provides a durable timeline that any session can query, regardless of how many times the context has been trimmed.

## Architecture

TCS runs as an MCP server, exposing all three layers as tools that agents call natively via the Model Context Protocol.
| Tool | Description |
|---|---|
| get_temporal_context | Current time context (text or JSON) for system prompt injection |
| start_task | Register a background task for tracking |
| poll_task | Check if a task should be polled now (based on configured interval) |
| finish_task | Mark a task completed or cancelled |
| list_tasks | List tasks, optionally filtered by status |
| check_timeouts | Scan all running tasks for timeout |
| log_event | Append an event to the persistent timeline |
| query_timeline | Query events by time range or type |
| search_events | Full-text search across event summaries |
| get_stats | Activity statistics |

The SQLite database is stored outside the agent's workspace — on a path that survives session restarts and context compaction. WAL mode enables concurrent reads alongside the server's writes. Task state and the event log share the same database but use separate tables.

## Key Design Decisions

**External process, not in-context state.** TCS runs as a separate service. State lives in SQLite, not in the agent's context. This is the fundamental design choice: context compaction cannot erase task state or event history because they're stored outside the context window entirely.

**Smart polling — "should I poll now?", not raw timestamps.** The `poll_task` tool returns a boolean: should the agent check on this task right now? This abstracts the interval logic away from the agent. The agent doesn't need to track last-polled timestamps itself; TCS handles it.

**MCP as the interface.** Exposing TCS via MCP means any MCP-compatible agent can use it with no custom integration. The agent treats TCS tools the same as any other tool in its toolbox. This also makes it easy to add new tools without changing the agent configuration.

**Time context injected at session start, not on every message.** The temporal context block belongs in the system prompt, not repeated in every user message.
Calling `get_temporal_context` once at session start and including the result in the system prompt is sufficient — the timestamp is accurate enough for scheduling purposes.

## How to Build Your Own

1. **Put temporal state outside the context window.** The core insight: any state that needs to survive context compaction must live in an external store. SQLite is a good choice — lightweight, file-based, no server required. WAL mode allows concurrent reads without blocking writes.

2. **Inject a time context block into every session-start system prompt.** Generate a structured block at session start: current ISO timestamp, day of week, local timezone offset, and any relevant contact quiet-hours. Format it as human-readable prose, not raw JSON — the agent's reasoning about time will be more reliable.

3. **Design poll_task as a gate, not a data source.** The agent should call `poll_task` on every turn that might involve a tracked task, but the tool should return "yes, check now" or "not yet" — not the task status itself. This prevents the agent from polling an external system too frequently.

4. **Event log schema: timestamp + type + summary + optional metadata.** Keep the event log schema minimal. The summary field should be a short human-readable string that's searchable by FTS. Store structured metadata separately. Never delete events — archive or flag them instead. Agents querying "what happened" need the full history.

5. **Per-task timeouts, not just global ones.** Different tasks have different expected durations. Store a timeout value per task at registration time. The `check_timeouts` tool scans all running tasks and flags those that have exceeded their individual timeout — not a single global limit.

## Frequently Asked Questions

**Why does an LLM agent need a separate time service?**

LLMs have no real-time clock and no persistent memory between context windows.
TCS solves both: it provides the current timestamp on demand, and its SQLite event log records what happened even after the context is compacted or the session restarted.

**What is "smart polling" for tasks?**

The `poll_task` tool returns whether enough time has elapsed since the last poll, based on the configured interval. This prevents an agent from polling a background job every message turn when it only needs to check every 30 seconds.

**Does the event log survive an OpenClaw session restart?**

Yes. Events are stored in a SQLite file outside any session state. As long as that file is present, historical events are queryable from any session.

---

## Web Automation Operator — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/web-automation

> Design doc for chrome-devtools-mcp: how the Chrome DevTools Protocol, accessibility tree UIDs, and Puppeteer wait logic make AI-driven browser automation reliable and debuggable.

# Web Automation Operator

An MCP server that exposes the Chrome DevTools Protocol to AI agents — enabling reliable browser automation, network inspection, console debugging, screenshot capture, and performance tracing from within any MCP-compatible client.

## What

`chrome-devtools-mcp` launches as an MCP server and connects to a Chrome browser instance. It provides 26 tools across 6 categories that an AI agent can call to interact with a live browser: clicking elements, filling forms, navigating pages, reading console output, capturing performance traces, and more. Automation uses Puppeteer under the hood to wait for page state after each action.

**Privacy note.** This server exposes browser contents to the MCP client. Performance tools may send trace URLs to the Google CrUX API for real-user field data. Usage statistics are collected by default; both can be disabled via server flags.

## Why

Browser automation for AI agents has two common failure modes.
Selenium and Playwright are designed for deterministic test scripts: you know exactly which element to click and in what order. An AI agent needs to observe, reason, and adapt — it doesn't know the DOM structure in advance and must discover it dynamically. Screenshot-only approaches address this by letting the agent "see" the page visually, but screenshots don't give precise element references and can't express intent as tool calls.

The Chrome DevTools Protocol (CDP) exposes the same primitives developers use in DevTools: the accessibility tree, JavaScript execution, network traffic, console output, and performance traces. An agent working with CDP can inspect the page structure precisely, identify elements by their accessibility roles, execute arbitrary JavaScript, and read exactly what the browser logged — all through a single MCP interface.

## Architecture

Three layers collaborate to handle a single agent tool call:

```
Agent
  → MCP tool call (e.g. click, fill, navigate_page)
  → chrome-devtools-mcp server
  → Puppeteer (Chrome management + wait-for-action-result)
  → Chrome DevTools Protocol (browser control)
  → Chrome instance
```

26 tools across 6 categories:

### Input Automation (8 tools)

| Tool | Description |
|---|---|
| click | Click an element by uid (single or double click) |
| drag | Drag one element onto another |
| fill | Type text into an input or select an option |
| fill_form | Fill multiple form elements in one call |
| handle_dialog | Accept or dismiss browser dialogs |
| hover | Hover over an element |
| press_key | Press a key or key combination |
| upload_file | Upload a local file via a file input element |

### Navigation (6 tools)

| Tool | Description |
|---|---|
| navigate_page | Navigate to a URL |
| new_page | Open a new browser tab |
| close_page | Close a tab by page ID |
| list_pages | List all open tabs |
| select_page | Switch focus to a tab by page ID |
| wait_for | Wait for a condition before proceeding |

### Debugging (5 tools)

| Tool | Description |
|---|---|
| take_screenshot | Capture a screenshot of the current page |
| take_snapshot | Capture the page accessibility tree snapshot (returns element UIDs) |
| evaluate_script | Execute JavaScript in the page context |
| get_console_message | Retrieve a specific console message with source-mapped stack trace |
| list_console_messages | List all console messages from the current page |

### Network (2), Performance (3), Emulation (2)

| Tool | Description |
|---|---|
| list_network_requests | List all network requests made by the page |
| get_network_request | Get details of a specific request including headers and body |
| performance_start_trace | Start a DevTools performance trace |
| performance_stop_trace | Stop the trace and return raw data |
| performance_analyze_insight | Extract actionable insights from a trace (optionally includes CrUX field data) |
| emulate | Emulate a device (mobile, tablet, etc.) |
| resize_page | Resize the browser viewport |

## Key Design Decisions

**Accessibility tree UIDs — snapshot first, then act.** Tools that interact with elements take a `uid` parameter. UIDs come from the accessibility tree returned by `take_snapshot`. The agent always calls `take_snapshot` first to get current UIDs, then passes the target UID to an action tool. This avoids brittle CSS selectors and XPath expressions that break when the DOM changes.

**Puppeteer wait-for-action-result — no polling loops.** After every interaction (click, fill, navigate), Puppeteer waits for the page to settle before returning control to the agent. The agent doesn't need to explicitly poll for page readiness. This eliminates a common class of timing bugs where the agent acts on a page before JavaScript has finished updating it.

**CDP over Selenium/Playwright.** CDP gives lower-level access than Playwright.
The agent can read console messages with source-mapped stack traces, intercept network requests, execute arbitrary JavaScript, and record DevTools-level performance traces — none of which are easily accessible through Playwright's abstraction layer.

**Managed Chrome vs. connecting to an existing instance.** By default the server launches its own Chrome with a dedicated profile. For cases where the agent needs to maintain session state (logged-in accounts) or work alongside manual testing, it can connect to an existing Chrome instance running with remote debugging enabled.

## How to Build Your Own

1. **The snapshot-then-act pattern is fundamental.** Every interaction sequence starts with a fresh snapshot. UIDs are page-state-specific; a UID from a previous snapshot may be stale after navigation or a DOM mutation. Always snapshot before acting, especially after any navigation or form submission.

2. **Implement wait-for-action-result for every interaction.** Return from a tool call only after the page has settled — not immediately after the DOM event fires. Puppeteer's `waitForNavigation` and `waitForSelector` are the right primitives. Without this, the agent will act on a page mid-transition and get inconsistent results.

3. **Use accessibility tree for elements, screenshots for visual verification.** The accessibility tree gives precise element references (role, name, state, uid). Screenshots give visual context — useful for the agent to verify that a page looks correct. Use both: snapshot for acting, screenshot for confirming the visual result looks right.

4. **Isolate user data when handling sensitive sites.** CDP exposes the full contents of the browser session to the MCP client. If the agent is browsing authenticated pages or handling credentials, use the isolated mode (temporary user data directory cleaned up after the session) to prevent cross-contamination between tasks.

5. **Combine lab traces with CrUX field data for performance analysis.** A single lab trace shows one user's experience. CrUX field data shows percentile distributions across real users. The `performance_analyze_insight` tool combines both when the URL is publicly accessible, giving the agent a fuller picture of actual user experience vs. lab conditions.

## Frequently Asked Questions

**Does the server need Chrome to be running before it starts?**

No. By default the server launches its own managed Chrome instance via Puppeteer when a tool that requires a browser is first called. Chrome does not start on server startup — only when actually needed.

**How do tools identify page elements?**

Tools that interact with elements (click, fill, hover) take a `uid` parameter. UIDs come from the page snapshot returned by `take_snapshot`. The agent calls `take_snapshot`, identifies the target element by uid, then passes that uid to the action tool.

**What is the CrUX API and when does it send data externally?**

The Chrome User Experience Report (CrUX) API provides real-user performance field data for public URLs. It is used by `performance_analyze_insight` alongside lab trace data. It can be disabled via a server flag to prevent any URL from being sent to Google's API.

---

## Live Intelligence Feed — Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/live-intelligence-feed

> Design doc for info-pipeline: how a BaseCollector pattern, unified output schema, keyword scoring, and graceful degradation enable reliable multi-source AI content aggregation.

# Live Intelligence Feed

A Python pipeline that pulls AI and tech content from 7 platforms, applies keyword filtering and relevance scoring, deduplicates results, and produces a unified JSON/Markdown report for downstream agent consumption.

## What

`info-pipeline` aggregates AI and tech content from 7 heterogeneous platforms into a single scored, deduplicated feed.
Each platform has a dedicated collector that normalizes its output to a common schema. A filter/scorer stage applies global keyword matching and relevance scoring (0–100) before the report is written. The output is machine-readable JSON plus a human-readable Markdown report.

## Why

Staying current with AI developments requires monitoring many heterogeneous sources simultaneously: GitHub for new repositories, Hacker News and Reddit for community discussion, YouTube for research explanations, Product Hunt for new tools, Twitter/X for early signals, and Chinese platforms for developments that English-language sources miss. Doing this manually across 7 platforms is time-consuming and inconsistent.

Existing aggregators (RSS readers, Feedly, etc.) produce human-readable feeds but not machine-readable structured data. An AI research agent needs a scored, deduplicated, normalized feed it can query and reason about — not a list of links to read. This pipeline provides that feed in a format designed for agent consumption.
## Architecture

The pipeline has four stages (collection, filtering and scoring, deduplication, report writing):

```
[Collectors — run in parallel]
  → GitHub Trending, Hacker News, Reddit, YouTube, Product Hunt,
    X/Twitter, Chinese platforms (via MCP)
  → each normalizes to unified schema

[Filter / Scorer]
  → keyword matching against global keyword list
  → relevance score 0–100 (keyword density + platform signal)
  → deduplication by URL, then title similarity

[Report Writer]
  → unified JSON (all items)
  → Markdown report (top N items)
```

Data sources:

| Platform | Language | Notes |
|---|---|---|
| GitHub Trending | EN | Topic-filtered repos by stars/forks in the last N days |
| Hacker News | EN | Top Stories, filtered by minimum score |
| Reddit | EN | Multiple AI subreddits — no API key required |
| YouTube | EN | Configured channel playlists via Data API v3 |
| Product Hunt | EN | Daily new products via GraphQL API |
| X / Twitter | EN | Keyword search — requires Basic API tier (low-volume) |
| Chinese platforms | ZH | Zhihu, 36kr, Juejin, Sspai, InfoQ, Bilibili via trends-hub MCP |

Code structure:

| Component | Responsibility |
|---|---|
| collectors/base.py | BaseCollector with common fetch, retry, and rate-limit logic |
| collectors/*.py | One file per platform — each extends BaseCollector |
| collectors/__init__.py | ALL_COLLECTORS registry — registers all enabled collectors |
| filters/scorer.py | Keyword filtering, relevance scoring (0–100), deduplication |
| main.py | Entry point — orchestrates collector runs and report generation |

Every item across all sources shares the same JSON schema:

```json
{
  "title": "Article / project title",
  "url": "https://...",
  "source": "github",
  "score": 85,
  "published_at": "2026-02-18T10:00:00Z",
  "summary": "Short description or excerpt",
  "tags": ["llm", "open-source"]
}
```

## Key Design Decisions

**Unified output schema — normalize at the source.** Every collector's job is to normalize its platform's output to the common schema before it leaves the collector.
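The normalize-at-the-source step can be sketched in Python; the `FeedItem` dataclass mirrors the unified schema above, but the class and function names are illustrative, not the pipeline's real code:

```python
from dataclasses import dataclass, asdict

@dataclass
class FeedItem:
    """The unified schema every collector emits (fields from the doc above)."""
    title: str
    url: str
    source: str
    score: int
    published_at: str
    summary: str
    tags: list[str]

def normalize_hn(raw: dict) -> FeedItem:
    """Illustrative: map a Hacker News API item onto the unified schema."""
    return FeedItem(
        title=raw["title"],
        url=raw.get("url", f"https://news.ycombinator.com/item?id={raw['id']}"),
        source="hackernews",
        score=0,  # the relevance score is assigned later, by the scorer stage
        published_at=raw["time_iso"],
        summary=raw.get("text", ""),
        tags=[],
    )

item = normalize_hn({"id": 1, "title": "Show HN: demo", "time_iso": "2026-02-18T10:00:00Z"})
# asdict(item) now matches the unified JSON schema above
```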
The scorer and report writer deal only with the common schema — they have no platform-specific logic. This makes adding a new source a matter of implementing one new class with a single `fetch()` method.

**Config-driven, not code-driven.** Keywords, platform parameters, and source selection all live in a YAML config file. Tuning the feed doesn't require code changes. This matters when iterating quickly on which topics to follow or adjusting platform-specific thresholds like minimum HN score.

**Graceful degradation on missing API keys.** A collector with no configured API key is skipped with a warning — not an error. The pipeline still produces output from the sources that are configured. This means the pipeline can run partially (e.g. GitHub + HN only) without requiring all seven API credentials to be present.

**MCP for Chinese platforms — avoid direct scraping.** Chinese tech platforms (Zhihu, 36kr, Juejin) have complex anti-bot measures and no official English-language APIs. The `trends-hub` MCP service handles these sources separately; the pipeline calls it as a tool rather than implementing platform-specific scrapers that would need constant maintenance.

## How to Build Your Own

1. **Define the unified schema first.** Before writing any collector, define the output schema all collectors must produce. The minimum useful fields are: title, url, source, score, published_at, summary. Adding a field later requires updating every collector.

2. **Implement BaseCollector with retry and rate-limit logic.** Rate limiting and retry-on-error are needed by every platform collector. Put them in the base class. Each concrete collector only needs to implement how to fetch its platform's data and how to map it to the common schema — not how to handle HTTP errors or rate limits.

3. **Keep scoring simple — keyword density + platform weight.** A score of 0–100 based on keyword match count (normalized by text length) plus a platform-specific engagement signal (GitHub stars, HN score, Reddit upvotes) is sufficient. Don't add LLM-based scoring — it adds latency and cost without proportionate quality gains for most use cases.

4. **Deduplicate by URL first, then by title similarity.** The same story often appears across multiple platforms. URL deduplication catches exact duplicates. Title similarity (simple word overlap is sufficient) catches cases where the same article is linked with slightly different URLs or titles across platforms.

5. **Twitter API rate limits are a real constraint.** The Basic Twitter/X API tier has strict monthly request limits. Keep max_results low (10–15 per run) and run the collector less frequently than others. Build the collector to fail gracefully and not block the entire pipeline run if it hits a rate limit.

## Frequently Asked Questions

**Can I run it with only some sources enabled?**

Yes. Sources without configured API keys are skipped with a warning rather than failing the whole run. You can also run specific collectors directly. Reddit and Hacker News require no API keys and work out of the box.

**How does the relevance score work?**

The scorer checks how many of the configured global keywords appear in the title and summary. Items that match no keywords are filtered out; matches boost the score up to 100. Original platform engagement metrics (stars, HN score, upvotes) also factor in.

**What is the trends-hub MCP for Chinese platforms?**

The Chinese platform collector calls MCP tools from the `trends-hub` service (a separate open-source project) to fetch Zhihu trending, 36kr, Juejin, and others. That service must be running locally for the Chinese platform source to work.
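The URL-then-title deduplication strategy described above can be sketched as follows; the word-overlap heuristic and the 0.8 threshold are illustrative choices, not the pipeline's actual values:

```python
def dedupe(items: list[dict], overlap_threshold: float = 0.8) -> list[dict]:
    """Drop exact URL duplicates, then items whose title word-overlap
    with an already-kept item meets or exceeds the threshold."""
    kept: list[dict] = []
    seen_urls: set[str] = set()
    for item in items:
        if item["url"] in seen_urls:
            continue  # exact URL duplicate
        words = set(item["title"].lower().split())
        is_near_dup = any(
            len(words & set(k["title"].lower().split()))
            / max(1, min(len(words), len(set(k["title"].lower().split()))))
            >= overlap_threshold
            for k in kept
        )
        if not is_near_dup:
            seen_urls.add(item["url"])
            kept.append(item)
    return kept

items = [
    {"title": "New LLM framework released", "url": "https://a.example/1"},
    {"title": "New LLM framework released", "url": "https://a.example/1"},        # same URL
    {"title": "New LLM framework released today", "url": "https://b.example/2"},  # near-dup title
    {"title": "Rust 2.0 roadmap", "url": "https://c.example/3"},
]
assert len(dedupe(items)) == 2  # first and last survive
```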
---

## Encrypted State Archive β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/encrypted-state-archive

> Design doc for openclaw-backup: how restic content-addressed snapshots, rclone transport, and macOS Keychain credential storage protect the agent workspace with zero secrets on disk.

# Encrypted State Archive

Automated encrypted, deduplicated backup of the OpenClaw workspace and local projects to Google Drive β€” using restic for encryption and snapshot management, rclone for transport, with the encryption password stored in macOS Keychain.

← Module Overview

## What

`openclaw-backup` creates incremental, encrypted, point-in-time snapshots of the OpenClaw workspace and local project directories and stores them on Google Drive. Each backup run adds a new snapshot; unchanged blocks are deduplicated so only changed data is uploaded. A retention policy automatically prunes old snapshots. The encryption password never touches disk β€” it's retrieved from macOS Keychain on each run.

## Why

An agent system accumulates state that's difficult to reconstruct: memory files built up over months of sessions, project code in progress, configuration that took time to tune. Disk failure, accidental deletion, or a corrupted workspace without backups means complete loss of that state.

Standard cloud sync tools (iCloud, Dropbox) don't encrypt before upload, don't track snapshot history, and don't deduplicate. A user can't restore to "the state of my workspace three days ago" β€” only to the current synced state. restic solves all three problems: client-side encryption (Google Drive only ever sees ciphertext), content-addressed deduplication (fast incrementals), and immutable point-in-time snapshots with a configurable retention policy.

## Architecture

Two layers collaborate on each backup run:

restic β€” encryption + deduplication + snapshot management

restic creates content-addressed, encrypted snapshots.
Only changed blocks are stored; unchanged blocks reference existing data. The encryption password is read from macOS Keychain on each run β€” never written to disk.

rclone β€” Google Drive transport

rclone provides the backend that restic writes to. It handles Google Drive OAuth and the file transfer layer. The OAuth token lives in a config file that is excluded from git.

~/.openclaw/ ──┐
               β”œβ”€β”€β–Ά restic (encrypt + dedup) ──▢ rclone ──▢ Google Drive
~/projects/ β”€β”€β”€β”˜

What gets backed up:

| Path | Approx Size | Contents |
| --- | --- | --- |
| ~/.openclaw/ | ~2.9 GB | Config, workspace, memory files, logs, media |
| ~/projects/ | ~1.3 GB | Source code, experiments |

Exclusions (not backed up):

- `node_modules/` β€” reinstallable from package.json
- `.venv/` β€” recreatable Python environments
- `browser/` β€” Chromium binary cache
- `.git/objects` β€” already on GitHub

Retention policy:

| Period | Kept |
| --- | --- |
| Daily | Last 7 days |
| Weekly | Last 4 weeks |

Older snapshots are automatically pruned at the end of each backup run.

## Key Design Decisions

restic over tar/zip β€” content-addressed deduplication

restic splits files into variable-size chunks and identifies them by content hash. Only chunks that don't already exist in the repository are uploaded. This makes incremental backups fast β€” a session where only memory files changed uploads only those chunks, not the full 4 GB workspace.

macOS Keychain β€” password never on disk

The restic encryption password is stored in Keychain under a service/account key pair. The backup script retrieves it at runtime using the `security` CLI. No plaintext password ever appears in a config file, environment variable export, or shell history.

Active secret scanning in test.sh

The test script actively scans all git-tracked files for patterns that look like secrets (API keys, tokens, passwords). This runs before any backup to prevent accidentally committing credentials alongside the backup infrastructure code.
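The Keychain retrieval step uses the macOS `security` CLI. A minimal sketch of how a backup script might do it (the service/account names match the defaults this module documents; treat the code itself as illustrative, not the project's actual script):

```python
import subprocess

def keychain_cmd(service: str = "openclaw-backup",
                 account: str = "restic-password") -> list[str]:
    # Argument vector for the macOS `security` CLI;
    # -w prints only the password to stdout, nothing else.
    return ["security", "find-generic-password",
            "-s", service, "-a", account, "-w"]

def restic_password(service: str = "openclaw-backup",
                    account: str = "restic-password") -> str:
    """Fetch the restic encryption password from Keychain at runtime.

    The password lives only in this process's memory β€” it never appears
    in a config file, exported environment variable, or shell history.
    """
    out = subprocess.run(keychain_cmd(service, account),
                         check=True, capture_output=True, text=True)
    return out.stdout.rstrip("\n")
```

The retrieved value would typically be handed to restic via its environment (`RESTIC_PASSWORD`) for the duration of the subprocess call only.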
Immutable snapshots β€” never overwrite, always prune

Each backup run adds a new snapshot to the repository. Old snapshots are removed only by the retention policy pruning step, never by overwriting. At any point you can restore to any snapshot in the retention window β€” not just the most recent one.

## How to Build Your Own

1. Use restic with any rclone-supported backend. The same architecture works with any rclone backend: S3, Backblaze B2, Azure Blob, SFTP. Google Drive is used here because it's free up to 15 GB and requires no credit card. Swap the rclone remote name in the backup script to change backends.

2. Store the encryption password in a system secrets manager. On macOS, Keychain is the right choice. On Linux, use `pass` (GPG-backed) or the `RESTIC_PASSWORD` environment variable set by a secrets manager at runtime. Never store the password in a dotfile.

3. Build a comprehensive exclusion list. node_modules, Python virtualenvs, browser caches, and compiled output can easily add several gigabytes that are fully reinstallable. Exclude them. A good rule: if a directory can be recreated from committed source (package.json, requirements.txt, etc.), exclude it.

4. Test your restore process before you need it. A backup that has never been tested is not a backup. Include a dry-run restore in your testing checklist: restore to a temporary directory and verify that key files are present and intact. The worst time to discover a broken restore process is during an actual recovery.

5. Fire a system event notification on backup failure. Silent backup failures are the most dangerous kind. The backup script should send a notification (via OpenClaw system event, email, or any other channel you'll actually see) when a backup fails. A successful backup should log silently; a failure should be loud.
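Steps 1–3 amount to a small amount of glue around the restic CLI. A sketch of the command construction (paths, excludes, and retention values mirror the tables above; the rclone remote name is hypothetical, and in real use `~` would need `os.path.expanduser`):

```python
BACKUP_PATHS = ["~/.openclaw", "~/projects"]
EXCLUDES = ["node_modules", ".venv", "browser", ".git/objects"]
# rclone-backed restic repository; swap the remote name to change backends
REPO = "rclone:gdrive:openclaw-backup"

def backup_cmd(paths=BACKUP_PATHS, excludes=EXCLUDES, repo=REPO) -> list[str]:
    """`restic backup` invocation with one --exclude flag per pattern."""
    cmd = ["restic", "-r", repo, "backup", *paths]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    return cmd

def prune_cmd(repo=REPO, daily=7, weekly=4) -> list[str]:
    """Retention: keep the last 7 daily and 4 weekly snapshots, prune the rest."""
    return ["restic", "-r", repo, "forget",
            "--keep-daily", str(daily), "--keep-weekly", str(weekly), "--prune"]
```

Running `backup_cmd()` then `prune_cmd()` via `subprocess.run(..., check=True)` at the end of each run matches the "backup, then prune" cycle described above; a failed `check=True` call is the natural place to fire the failure notification from step 5.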
## Security

- No hardcoded secrets β€” all credentials use macOS Keychain or environment variables
- restic encrypts all data before upload; Google Drive stores only ciphertext
- OAuth tokens in rclone config are excluded from git via .gitignore
- test.sh actively scans for secrets in all git-tracked files

## Frequently Asked Questions

Where is the restic encryption password stored?

In macOS Keychain under service name `openclaw-backup`, account `restic-password`. The setup script generates and stores it automatically. You can also override it with the `RESTIC_PASSWORD` environment variable.

Will backups overwrite each other?

No. restic creates immutable snapshots. Each backup run adds a new snapshot; old ones are only removed when the retention policy prunes them. You can restore to any historical snapshot within the retention window.

Does this work on Linux?

The Keychain integration is macOS-specific. On Linux you would need to substitute a different secret store (e.g. `pass`) or use the `RESTIC_PASSWORD` environment variable directly. The restic and rclone commands themselves are cross-platform.

---

## Governance & Security β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/governance-security

> Design doc for openclaw-security: how six priority-tiered Python modules β€” spotlighting, audit logging, least-privilege, LLM guard, HMAC comms, and memory ACL β€” harden OpenClaw agents.

# Governance & Security

A security hardening plugin for OpenClaw agents consisting of six Python modules integrated into the OpenClaw hook system via a JavaScript bridge β€” providing prompt-injection defense, least-privilege enforcement, audit logging, inter-agent message signing, and memory access control.

← Module Overview

## What

`openclaw-security` is a six-module security layer that hooks into OpenClaw at four lifecycle points.
Each module addresses a specific AI agent threat: external data injecting instructions into the agent, agents executing unauthorized commands, inter-agent messages being forged, and memory entries being corrupted. The modules are independent β€” you can deploy them individually or together.

## Why

A raw OpenClaw agent with shell access and no additional controls is a significant attack surface. An agent that fetches web content is exposed to prompt injection: a malicious page can embed instructions that the agent treats as coming from its operator. An agent with unrestricted shell access can be directed to exfiltrate data, delete files, or make external network calls. Inter-agent messages have no authenticity guarantee β€” a compromised agent can forge messages appearing to come from trusted peers.

This module applies defenses from AI agent security research: spotlighting to isolate external data, command whitelisting to enforce least privilege, HMAC signing for inter-agent trust, and audit logging for forensics. The design is pragmatic β€” defenses are layered by priority rather than trying to solve every threat at once.
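Two of these defenses are small enough to sketch inline. The following is an illustrative, standard-library-only rendition of spotlighting boundary tags and HMAC-SHA256 message signing, not the module's actual code:

```python
import hmac
import hashlib
import json

def spotlight(external_text: str) -> str:
    """Wrap untrusted external data in structural boundary tags.

    The tag names are just markers the LLM can see; what matters is that
    every externally fetched document is delimited before entering context.
    """
    return ("[EXTERNAL_DATA_START]\n"
            + external_text
            + "\n[EXTERNAL_DATA_END]")

def sign_message(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature using the per-pair shared key."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(key, body, hashlib.sha256).hexdigest()}

def verify_message(envelope: dict, key: bytes) -> bool:
    """Recompute the HMAC and compare in constant time before processing."""
    body = json.dumps(envelope["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])
```

A receiver that drops any envelope failing `verify_message` gets exactly the property described above: a compromised agent without the pair key cannot forge or tamper with messages to a trusted peer.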
## Architecture

Six modules organized by implementation priority:

| Priority | Module | What it does |
| --- | --- | --- |
| P0 | Spotlighting | Wraps external data (web, email, webhooks) with boundary tags to isolate it from system instructions |
| P0 | Audit Logger | JSONL tool-call log with file locking, redaction, and built-in alert rules |
| P1 | Least Privilege | Per-agent command blacklist and file path whitelist β€” hard-blocks unauthorized exec calls |
| P1 | LLM Guard | Regex/llm-guard scanning of LLM inputs and outputs for injection, secrets, and malicious URLs (warn-only) |
| P1 | Agent Comms | HMAC-SHA256 message signing for inter-agent communication β€” prevents message forgery |
| P2 | Memory ACL | Memory layer write-time injection detection and signed storage |

OpenClaw hook integration β€” Python modules are called from the JS hook system via a bridge layer:

| Hook | Python Module | Effect |
| --- | --- | --- |
| before_tool_call | permission_checker.py | Block unauthorized exec calls (hard block) |
| after_tool_call | audit_logger.py | Write JSONL audit record + evaluate alert rules |
| llm_input | llm_guard_wrapper.py | Scan for injection / secrets (warn only) |
| llm_output | llm_guard_wrapper.py | Scan for secrets / malicious URLs (warn only) |

Fail-open design

All bridge calls fail open: if Python is missing, crashes, or times out, the call is allowed and a warning is logged. This avoids false-positive blockages but means security is best-effort, not guaranteed.

## Key Design Decisions

Priority tiers β€” not all defenses are equal

P0 modules (spotlighting, audit logging) have the best effort-to-value ratio: they're cheap to implement and provide immediate observability and basic injection resistance. P1 modules require more configuration. P2 is nice-to-have. Deploying only P0 already meaningfully improves the security posture.

Fail-open β€” availability over security

If the Python security layer crashes, the agent call proceeds.
This is an explicit tradeoff: a broken security layer that blocks legitimate agent operations is worse than a temporarily absent security layer. The audit log captures what happened, enabling post-hoc forensics even when active blocking fails.

LLM Guard warns, doesn't block at llm_input/llm_output

The current OpenClaw hook API doesn't support hard blocking at the llm_input/llm_output hooks, so LLM Guard detections are logged as warnings. To enforce blocking on detected injection, handle it at the agent application layer β€” LLM Guard provides the signal; the application decides the action.

JavaScript bridge β€” thin layer, Python for logic

The OpenClaw hook system is JavaScript; the security logic is Python. The bridge is intentionally thin β€” just serialization, subprocess invocation, and error handling. All security logic stays in the Python modules, which can be tested independently of OpenClaw.

## How to Build Your Own

1. Start with spotlighting and audit logging (P0). Spotlighting is cheap: wrap any externally fetched content with boundary markers before it enters the agent's context. An LLM that sees `[EXTERNAL_DATA_START] ... [EXTERNAL_DATA_END]` has a structural signal that this content is untrusted. Audit logging is also cheap: write a JSONL record after every tool call with agent ID, tool name, inputs (redacted), and timestamp.

2. Command blacklist with regex, not exact match. Exact-match command blocking is trivial to bypass (add a space, use a path prefix). Use regex patterns that match the dangerous operation regardless of minor variations. Essential patterns: `rm\s+-rf\s+/`, `curl\s+(?!.*localhost)`, `wget\s+`, `mkfs\b`.

3. Build alert rules into the audit logger, not the security modules. Alert rules (fire on sensitive file access, fire on external network calls) belong in the audit logger, not scattered across individual modules. The audit logger sees all tool calls and can apply cross-cutting rules in one place.
Three minimum rules: sensitive path access, external network calls, cross-agent sensitive content.

4. Use HMAC-SHA256 for inter-agent signing, with symmetric keys per pair. For single-machine multi-agent deployments, HMAC with a per-pair symmetric key is sufficient and simple. Each agent pair (sender, receiver) shares a secret key stored in the key manager. The receiver verifies the HMAC before processing any inter-agent message. This prevents a compromised agent from impersonating a trusted peer.

5. Keep each module standalone and independently testable. Each security module should be importable and testable without OpenClaw running. This enables unit testing of security logic without the full agent stack. The OpenClaw integration is only a thin bridge β€” the modules themselves are pure Python with no OpenClaw dependency.

## Known Limitations

- LLM Guard warns only: the llm_input / llm_output hooks don't support hard blocking in OpenClaw's current hook API. Detected threats generate warnings; blocking requires handling at the application layer.
- before_tool_call only checks exec: the JS bridge currently calls the permission checker only for the exec tool. Other tools (shell, write_file) bypass PermissionChecker unless you extend the check list in the bridge.
- Regex rules can be bypassed: LLM Guard's regex fallback lacks semantic understanding. Complex prompt injection variants may evade detection. For production, pair regex with model-level scanning.

## Frequently Asked Questions

What is "spotlighting" and why does it matter?

Spotlighting wraps externally fetched data (web results, emails, webhooks) with structural tags that mark it as untrusted external content. This makes it harder for a malicious web page to embed instructions that the agent would treat as coming from the user or system prompt β€” a common prompt injection vector.

What alert rules are built into the audit logger?
Three built-in rules: (1) sensitive file path access (.ssh, .aws, id_rsa, /etc/passwd), (2) shell commands containing curl/wget/nc targeting non-localhost hosts, and (3) cross-agent messages containing sensitive keywords. All trigger with "critical" or "warning" severity and are written to the JSONL audit log.

Can I run the tests without OpenClaw installed?

Yes. The Python modules are standalone and their tests run independently. The OpenClaw plugin integration is only needed for live hook testing.

---

## Executive Voice Interface β€” Claw-Stack Modules

URL: https://claw-stack.com/en/docs/modules/executive-voice-interface

> Design doc for voice-call: how local MLX-Whisper STT, self-hosted LiveKit WebRTC, Edge-TTS, and OpenClaw OAuth combine to create a fully local voice interface for AI agents.

# Executive Voice Interface

A voice conversation interface for OpenClaw β€” talk to your AI assistant over WebRTC from a browser or phone. Fully local speech-to-text (MLX-Whisper on Apple Silicon), free TTS (Edge-TTS), Claude via OpenClaw OAuth, and self-hosted LiveKit for audio routing.

← Module Overview

## What

`voice-call` provides a voice conversation loop: a browser or phone connects via WebRTC, speech is detected by VAD, transcribed locally by Whisper, sent to Claude (via OpenClaw OAuth), and the response is synthesized by Edge-TTS and streamed back. Claude can call agent tools during the conversation β€” reading files, running commands, searching memory, and listing active sessions. The entire STT pipeline runs on-device; no audio leaves the local machine.

## Why

Text interfaces require a screen and keyboard. Voice enables hands-free interaction while mobile, cooking, or commuting β€” situations where typing is impractical. The obvious solutions β€” OpenAI Realtime API, Gemini Live β€” send audio to cloud servers, raising privacy concerns for anyone who discusses confidential projects, personal information, or unreleased work with their AI assistant.
This module keeps the STT pipeline entirely local. MLX-Whisper large-v3-mlx-4bit runs on Apple Silicon faster than real time β€” a 10-second utterance transcribes in under a second. Only the transcribed text (and any tool results) travels to the Anthropic API. Spoken words never leave the machine.

## Architecture

Four layers connected through LiveKit:

1. Audio transport β€” LiveKit + WebRTC. LiveKit Server handles WebRTC audio routing. The browser or phone connects via WSS to the token server, which proxies to LiveKit. Tailscale provides a trusted TLS certificate so iOS Safari accepts the WebSocket connection from any network.

2. Speech-to-text β€” MLX-Whisper (local). Silero VAD detects speech in the audio stream. When speech ends, the STT module transcribes with Whisper large-v3-mlx-4bit running locally on Apple Silicon. No audio is sent to an external STT service.

3. LLM β€” Claude via OpenClaw OAuth. The LLM adapter loads the OAuth token from OpenClaw's credential store β€” no separate Anthropic API key needed. Claude can call agent tools during the conversation: file read, command execution, memory search, and session list.

4. Text-to-speech β€” Edge-TTS (free). Microsoft Edge-TTS synthesizes responses β€” free, no API key required β€” and streams audio back to the caller through LiveKit.
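The VAD gate in layer 2 is Silero VAD in the real module. As a toy illustration of the gating idea only (a bare energy threshold over fixed frames, with made-up parameter values), the control flow looks roughly like this:

```python
import math

FRAME = 160          # samples per frame (10 ms at 16 kHz)
THRESHOLD = 0.02     # RMS energy above which a frame counts as speech
HANG = 30            # trailing silent frames before declaring end-of-utterance

def speech_segments(samples: list[float]):
    """Yield (start, end) sample indices of detected speech runs.

    Each yielded segment is what would be handed to Whisper; everything
    between segments (silence) is never transcribed.
    """
    start, silent = None, 0
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[i:i + FRAME]
        rms = math.sqrt(sum(s * s for s in frame) / FRAME)
        if rms >= THRESHOLD:
            if start is None:
                start = i            # utterance begins
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= HANG:       # enough trailing silence: utterance over
                yield (start, i)
                start, silent = None, 0
    if start is not None:
        yield (start, len(samples))
```

A real VAD (Silero, WebRTC VAD) replaces the RMS threshold with a trained model, but the hang-time pattern, waiting for sustained silence before closing the segment, carries over directly.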
Source files:

| File | Responsibility |
| --- | --- |
| agent.py | Main voice agent entry point β€” LiveKit agent loop |
| stt_mlx.py | Custom STT plugin using MLX-Whisper large-v3-mlx-4bit |
| llm_anthropic.py | LLM adapter β€” Claude via OpenClaw OAuth, with tool calling |
| tts_edge.py | TTS via Microsoft Edge-TTS (free) |
| tools.py | Agent tools: read_file, run_command, search_memory, list_sessions, think_carefully |
| token_server.py | HTTPS server + WSS proxy to LiveKit + JWT token generation |
| gateway.py | OpenClaw Gateway WebSocket client (device identity auth) |
| web/index.html | Browser call UI |

Agent tools available during voice conversations:

| Tool | What it does |
| --- | --- |
| read_file | Read a file from the local filesystem (truncated to max_lines) |
| run_command | Execute a shell command with a configurable timeout |
| search_memory | Search OpenClaw memory via the qmd CLI |
| list_sessions | List active OpenClaw sessions via Gateway WebSocket |

## Key Design Decisions

Local STT β€” audio never leaves the machine

MLX-Whisper runs on Apple Silicon faster than real time. The privacy guarantee is architectural: the audio pipeline is entirely local, so there is no API call that could leak spoken content. Only the transcribed text travels to the Anthropic API β€” and even then, only if the operator is comfortable with that.

LiveKit for WebRTC β€” don't implement WebRTC directly

WebRTC signaling, ICE negotiation, and codec handling are complex. LiveKit provides a production-grade abstraction that handles multi-device routing, reconnection, and audio quality management. The alternative (raw WebRTC) would require maintaining significantly more infrastructure code.

Edge-TTS β€” free, no key, good quality

Microsoft's Edge TTS endpoint is free and produces natural-sounding speech without requiring an API key or billing account. The tradeoff: it requires an outbound connection to Microsoft's servers for each response. For fully air-gapped use, substitute an on-device TTS (e.g.
Kokoro, or system TTS).

Tailscale for remote access β€” trusted TLS without a public domain

iOS Safari requires valid TLS for WebSocket connections. Getting a Let's Encrypt certificate for a home server normally requires a public domain. Tailscale Serve proxies the token server behind the Tailscale HTTPS endpoint β€” which has a valid Let's Encrypt certificate issued to the Tailscale DNS name β€” without exposing anything to the public internet.

JWT call links β€” single-use tokens per call

Each call link is a short-lived signed JWT that authorizes one participant to join one LiveKit room. There is no persistent login session. Links can be generated on demand and expire automatically, making it easy to share a call link with a phone without creating a persistent credential.

## How to Build Your Own

1. Use VAD before STT β€” don't transcribe continuous audio. Voice Activity Detection (Silero VAD or WebRTC VAD) detects when someone is speaking and when they've finished. Only the speech segment gets passed to Whisper. Without VAD, you'd either transcribe silence (waste) or need the user to press a button to speak (awkward).

2. MLX-Whisper on Apple Silicon, faster-whisper on CUDA, API elsewhere. MLX-Whisper is Apple Silicon-specific. On CUDA hardware, faster-whisper achieves similar throughput. For cloud deployment or hardware without a GPU, use an API-based STT service and accept that audio will leave the device. Make the STT layer swappable β€” the rest of the architecture is independent of the STT choice.

3. Token server pattern β€” keep LiveKit internal. The token server is a small HTTPS server that generates JWT tokens for LiveKit room access and proxies WebSocket connections. It's the only service exposed externally (via Tailscale). LiveKit itself runs on localhost β€” it doesn't need to be reachable from the browser directly.

4.
iOS Safari requires valid TLS β€” plan for this. Safari on iOS refuses WebSocket connections to servers with self-signed certificates, even if the user manually accepts the cert. Tailscale Serve solves this cleanly. Without Tailscale, you need a public domain and Let's Encrypt, or a reverse proxy with a valid cert on a cloud host.

5. Inject MEMORY.md at session start for context continuity. Voice conversations are stateless by default β€” each call starts fresh. To maintain continuity with ongoing projects, inject the agent's MEMORY.md into the system prompt at the start of each session. The agent then has immediate context about active projects without needing to be briefed verbally at the start of every call.

## Frequently Asked Questions

Is speech transcription done locally?

Yes. MLX-Whisper runs entirely on your Apple Silicon Mac β€” audio is never sent to an external STT service. Only the transcribed text (and any tool results) travels to the Anthropic API.

Do I need a paid Anthropic API key?

No. The LLM adapter authenticates using the OAuth token that OpenClaw manages. As long as you have an active OpenClaw session with a valid Anthropic OAuth credential, no separate API key is needed.

Does this work outside my home network?

Yes, via Tailscale. The start script configures Tailscale Serve to expose the token server on your Tailscale DNS name with a Let's Encrypt certificate. Any device in your tailnet can then open a call link from any network.

Can I run this on an Intel Mac or Linux?

MLX-Whisper requires Apple Silicon. On an Intel Mac or Linux you would need to swap the STT module for a different provider β€” faster-whisper (CUDA), a system-level STT, or an API-based service. The rest of the architecture (LiveKit, Edge-TTS, the token server) is cross-platform.

---

## AI Dev Workforce β€” Claw-Stack Plugin

URL: https://claw-stack.com/en/plugins/agent-swarm

> Turn one developer into an entire AI development team.
Morning Scan, task registry, AI PR review, and intelligent failure retry β€” all orchestrated autonomously.

### Morning Scan

Module 1

Runs daily at 09:00 EST via cron. Scans GitHub Issues for new actionable items, filters already-handled tasks, and automatically dispatches them to the coding agent β€” all before you've had your first coffee.

GitHub Issues Β· Cron Β· Auto-dispatch

---

## Voice Control β€” OpenClaw

URL: https://claw-stack.com/en/plugins/voice-control

> Powered by Voice Control Plugin

# Executive Voice Interface

Command your AI. Hands-free. Command your digital workforce while driving or walking. Turn voice memos into executed tasks and complex deployments without touching a keyboard. Your best ideas happen when your hands are full. Don't lose them to a locked screen. Speak it β€” your AI executes it.

Listening... πŸŽ™οΈ Transcribing... ⚑ Executing... βœ“ Fetching quarterly project metrics from data store... βœ“ Generating executive summary (247 rows)... βœ“ Email sent to sarah@company.com βœ… Done in 4.2s

Private β€” audio never leaves your device. Commands β€” any task your agent can do, by voice.

## How It Works

### 100% Local Processing

MLX-Whisper runs on your Apple Silicon. No cloud, no API fees, no audio sent anywhere. Your conversations stay private.

### Works From Anywhere

Call your AI from your iPhone via Tailscale. Secure HTTPS, no VPN setup, works globally.

### Full Agent Control by Voice

Run shell commands, search memory, check task status, steer agents β€” anything you would type, just say it.

## Built for Real Life

### Commute

Brief your AI on today's priorities while driving to the office.

### On the Go

Capture decisions instantly β€” AI executes while you walk.

### After Calls

Summarize a client call and draft a follow-up email by voice.

### Late Night

Check agent status and redirect tasks without opening a laptop.
πŸ“– Technical Documentation β†’

---

## Blog β€” Claw-Stack

URL: https://claw-stack.com/en/blog

> Writing on AI agent architecture, multi-agent systems, memory design, and lessons from building Claw-Stack as a personal research project.

Read β†’

---