Last updated: March 2026 · Source: openclaw-security
Governance & Security
A security hardening plugin for OpenClaw agents: six Python modules integrated into the OpenClaw hook system via a JavaScript bridge, providing prompt-injection defense, least-privilege enforcement, LLM input/output scanning, audit logging, inter-agent message signing, and memory access control.
What
openclaw-security is a six-module security layer that hooks into OpenClaw at four lifecycle points. Each module addresses a specific AI agent threat: external data injecting instructions into the agent, agents executing unauthorized commands, inter-agent messages being forged, and memory entries being corrupted. The modules are independent — you can deploy them individually or together.
Why
A raw OpenClaw agent with shell access and no additional controls is a significant attack surface. An agent that fetches web content is exposed to prompt injection: a malicious page can embed instructions that the agent treats as coming from its operator. An agent with unrestricted shell access can be directed to exfiltrate data, delete files, or make external network calls. Inter-agent messages have no authenticity guarantee — a compromised agent can forge messages appearing to come from trusted peers.
This module applies defenses from AI agent security research: spotlighting to isolate external data, command blacklisting and path whitelisting to enforce least privilege, HMAC signing for inter-agent trust, and audit logging for forensics. The design is pragmatic: defenses are layered by priority rather than trying to solve every threat at once.
Architecture
Six modules organized by implementation priority:
| Priority | Module | What it does |
|---|---|---|
| P0 | Spotlighting | Wraps external data (web, email, webhooks) with boundary tags to isolate it from system instructions |
| P0 | Audit Logger | JSONL tool-call log with file locking, redaction, and built-in alert rules |
| P1 | Least Privilege | Per-agent command blacklist and file path whitelist — hard-blocks unauthorized exec calls |
| P1 | LLM Guard | Regex/llm-guard scanning of LLM inputs and outputs for injection, secrets, and malicious URLs (warn-only) |
| P1 | Agent Comms | HMAC-SHA256 message signing for inter-agent communication — prevents message forgery |
| P2 | Memory ACL | Memory layer write-time injection detection and signed storage |
OpenClaw hook integration — Python modules are called from the JS hook system via a bridge layer:
| Hook | Python Module | Effect |
|---|---|---|
| before_tool_call | permission_checker.py | Block unauthorized exec calls (hard block) |
| after_tool_call | audit_logger.py | Write JSONL audit record + evaluate alert rules |
| llm_input | llm_guard_wrapper.py | Scan for injection / secrets (warn only) |
| llm_output | llm_guard_wrapper.py | Scan for secrets / malicious URLs (warn only) |
Fail-open design
All bridge calls fail open: if Python is missing, crashes, or times out, the call is allowed and a warning is logged. This avoids false-positive blockages but means security is best-effort, not guaranteed.
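The fail-open pattern can be sketched as a small wrapper; the function name and result shape here are illustrative, not the plugin's actual bridge API:

```python
import logging

logger = logging.getLogger("openclaw-security")

def fail_open_check(check_fn, payload):
    """Run a security check; on any failure, allow the call and warn.

    check_fn is any callable returning a dict like {"allowed": bool}.
    In the real bridge the failure modes include a missing Python
    interpreter, a crashed module, a timeout, or malformed JSON.
    """
    try:
        result = check_fn(payload)
        return bool(result.get("allowed", True))
    except Exception as exc:
        logger.warning("security check failed open: %s", exc)
        return True  # fail open: availability over security
```

A denied check still blocks; only errors in the security layer itself default to allow, which is exactly the tradeoff described above.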
Key Design Decisions
Priority tiers — not all defenses are equal
P0 modules (spotlighting, audit logging) have the best effort-to-value ratio: they're cheap to implement and provide immediate observability and basic injection resistance. P1 modules require more configuration. P2 is nice-to-have. Deploying only P0 already meaningfully improves the security posture.
Fail-open — availability over security
If the Python security layer crashes, the agent call proceeds. This is an explicit tradeoff: a broken security layer that blocks legitimate agent operations is worse than a temporarily absent security layer. The audit log captures what happened, enabling post-hoc forensics even when active blocking fails.
LLM Guard warns, doesn't block at llm_input/llm_output
The current OpenClaw hook API doesn't support hard blocking at llm_input/llm_output hooks. LLM Guard detections are logged as warnings. To enforce blocking on detected injection, handle it at the agent application layer — LLM Guard provides the signal, the application decides the action.
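A minimal sketch of that application-layer enforcement, assuming a hypothetical warning shape of {"rule": ..., "severity": ...} (the real LLM Guard output format may differ):

```python
def enforce_llm_output(text, warnings):
    """LLM Guard only warns at llm_output; the application decides.

    `warnings` is assumed to be a list of dicts like
    {"rule": "...", "severity": "..."}, an illustrative shape.
    """
    if any(w.get("severity") == "critical" for w in warnings):
        # Withhold the output instead of forwarding it onward.
        return "[BLOCKED: output withheld by security policy]"
    return text
```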
JavaScript bridge — thin layer, Python for logic
The OpenClaw hook system is JavaScript; the security logic is Python. The bridge is intentionally thin — just serialization, subprocess invocation, and error handling. All security logic stays in the Python modules, which can be tested independently of OpenClaw.
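The Python side of such a bridge can be as small as one JSON round trip over stdin/stdout; the request shape below is illustrative, not the plugin's actual protocol:

```python
import json
import sys

def handle_request(req):
    """Dispatch one bridge request to the relevant security module.

    Hook names follow the integration table; the dispatch bodies are
    stubs standing in for the real module calls.
    """
    if req.get("hook") == "before_tool_call":
        return {"allowed": True}  # would call permission_checker here
    return {"allowed": True, "warnings": []}  # scan hooks are warn-only

def serve_once(stdin=sys.stdin, stdout=sys.stdout):
    """The JS side spawns the interpreter, writes one JSON object to
    stdin, and reads one JSON object back from stdout."""
    json.dump(handle_request(json.load(stdin)), stdout)
    stdout.flush()
```

Keeping the dispatch this thin means a bridge bug can cause a missed check (caught by fail-open logging) but never a wrong security decision.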
How to Build Your Own
1. Start with spotlighting and audit logging (P0)
Spotlighting is cheap: wrap any externally-fetched content with boundary markers before it enters the agent's context. An LLM that sees [EXTERNAL_DATA_START] ... [EXTERNAL_DATA_END] has a structural signal that this content is untrusted. Audit logging is also cheap: write a JSONL record after every tool call with agent ID, tool name, inputs (redacted), and timestamp.
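Both P0 pieces fit in a few lines each. A minimal sketch (marker format and redaction keys are illustrative, and the real audit logger adds file locking):

```python
import json
import time

def spotlight(content: str, source: str) -> str:
    """Wrap untrusted external content in boundary markers before it
    enters the agent context."""
    return (f"[EXTERNAL_DATA_START source={source}]\n"
            f"{content}\n"
            f"[EXTERNAL_DATA_END]")

def audit_record(agent_id: str, tool: str, inputs: dict) -> str:
    """Build one JSONL audit line, redacting obvious secret fields."""
    redacted = {
        k: "[REDACTED]"
        if any(s in k.lower() for s in ("key", "token", "secret")) else v
        for k, v in inputs.items()
    }
    return json.dumps({"ts": time.time(), "agent": agent_id,
                       "tool": tool, "inputs": redacted})
```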
2. Command blacklist with regex, not exact match
Exact-match command blocking is trivial to bypass (add a space, use a path prefix). Use regex patterns that match the dangerous operation regardless of minor variations. Essential patterns: rm\s+-rf\s+/, curl\s+(?!.*localhost), wget\s+, mkfs\b. Place the curl lookahead before the arguments; the variant curl\s+.*(?!localhost) is broken, because .* consumes the URL before the lookahead runs and the pattern ends up matching localhost calls too.
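A compiled deny list over these patterns might look like the following sketch (note the curl rule flags any invocation whose arguments never mention localhost):

```python
import re

# Illustrative deny patterns, compiled once at module load.
DENY_PATTERNS = [re.compile(p) for p in (
    r"rm\s+-rf\s+/",
    r"curl\s+(?!.*localhost)",
    r"wget\s+",
    r"mkfs\b",
)]

def is_blocked(command: str) -> bool:
    """True if the command matches any deny pattern anywhere in it."""
    return any(p.search(command) for p in DENY_PATTERNS)
```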
3. Build alert rules into the audit logger, not the security modules
Alert rules (fire on sensitive file access, fire on external network calls) belong in the audit logger, not scattered across individual modules. The audit logger sees all tool calls and can apply cross-cutting rules in one place. Three minimum rules: sensitive path access, external network calls, cross-agent sensitive content.
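Two of these rules can be sketched as a data table plus one evaluation function (rule names and patterns are illustrative):

```python
import re

# Illustrative rule table: (name, severity, pattern).
ALERT_RULES = [
    ("sensitive_path", "critical",
     re.compile(r"\.ssh|\.aws|id_rsa|/etc/passwd")),
    ("external_network", "warning",
     re.compile(r"\b(curl|wget|nc)\b(?!.*localhost)")),
]

def evaluate_alerts(tool: str, args_text: str):
    """Return (rule, severity) pairs for every rule this call trips.

    Lives in the audit logger, which sees every tool call, so the
    rules stay cross-cutting and in one place.
    """
    text = f"{tool} {args_text}"
    return [(name, sev) for name, sev, pat in ALERT_RULES
            if pat.search(text)]
```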
4. Use HMAC-SHA256 for inter-agent signing, symmetric keys per pair
For single-machine multi-agent deployments, HMAC with a per-pair symmetric key is sufficient and simple. Each agent pair (sender, receiver) shares a secret key stored in the key manager. The receiver verifies the HMAC before processing any inter-agent message. This prevents a compromised agent from impersonating a trusted peer.
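A sketch of the scheme using Python's standard library; binding the sender identity into the MAC and using a constant-time compare are the two details worth copying (function names are illustrative):

```python
import hashlib
import hmac

def sign_message(key: bytes, sender: str, payload: bytes) -> str:
    """HMAC-SHA256 over sender identity plus payload, hex-encoded.
    Binding the sender into the MAC stops key reuse across identities."""
    return hmac.new(key, sender.encode() + b"\x00" + payload,
                    hashlib.sha256).hexdigest()

def verify_message(key: bytes, sender: str, payload: bytes, sig: str) -> bool:
    """Constant-time comparison; reject before processing on mismatch."""
    expected = sign_message(key, sender, payload)
    return hmac.compare_digest(expected, sig)
```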
5. Keep each module standalone and independently testable
Each security module should be importable and testable without OpenClaw running. This enables unit testing of security logic without the full agent stack. The OpenClaw integration is only a thin bridge — the modules themselves are pure Python with no OpenClaw dependency.
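For example, a unit test for permission-checking logic needs nothing beyond the pure function under test; the names below are illustrative stand-ins, and the file runs under plain pytest with no OpenClaw import:

```python
# test_permission_checker.py: no OpenClaw dependency anywhere.
import re

def check_exec(command, deny_patterns=(r"rm\s+-rf\s+/", r"mkfs\b")):
    """Stand-in for the permission checker's core: a pure function
    exercisable without the agent stack or the JS bridge."""
    return not any(re.search(p, command) for p in deny_patterns)

def test_blocks_destructive_rm():
    assert check_exec("rm -rf /") is False

def test_allows_benign_command():
    assert check_exec("ls -la") is True
```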
Known Limitations
- LLM Guard warns only: The llm_input / llm_output hooks don't support hard blocking in OpenClaw's current hook API. Detected threats generate warnings; blocking requires handling at the application layer.
- before_tool_call only checks exec: The JS bridge currently calls the permission checker only for the exec tool. Other tools (shell, write_file) bypass PermissionChecker unless you extend the check list in the bridge.
- Regex rules can be bypassed: LLM Guard's regex fallback lacks semantic understanding. Complex prompt injection variants may evade detection. For production, pair regex with model-level scanning.
Frequently Asked Questions
What is "spotlighting" and why does it matter?
Spotlighting wraps externally-fetched data (web results, emails, webhooks) with structural tags that mark it as untrusted external content. This makes it harder for a malicious web page to embed instructions that the agent would treat as coming from the user or system prompt — a common prompt injection vector.
What alert rules are built into the audit logger?
Three built-in rules: (1) sensitive file path access (.ssh, .aws, id_rsa, /etc/passwd), (2) shell commands containing curl/wget/nc targeting non-localhost, and (3) cross-agent messages containing sensitive keywords. All trigger with "critical" or "warning" severity and are written to the JSONL audit log.
Can I run the tests without OpenClaw installed?
Yes. The Python modules are standalone and their tests run independently. The OpenClaw plugin integration is only needed for live hook testing.
Authors: Qiushi Wu & Orange 🍊