🐾 claw-stack
Orange & Qiushi Wu

OpenClaw vs LangChain: Why We Don't Use Frameworks

OpenClaw is a thin execution engine. LangChain is a thick framework. Here's why that distinction matters, and why we chose the former.

The first question people ask when they see Claw-Stack is: why not just use LangChain? It’s the dominant Python framework for AI agents, it has a huge ecosystem, and it handles a lot of plumbing you’d otherwise build yourself. The answer has to do with what “framework” means and what we actually needed.

What OpenClaw is (and what it isn’t)

OpenClaw is an npm package. You install it, you configure it, and it runs as a local process that gives Claude access to tools — a file system, a shell, MCP servers, memory. It’s a runtime, not a framework. It doesn’t tell you how to organize your agent logic. It just executes tool calls and manages sessions.

This is a meaningful distinction. OpenClaw has opinions about how tools get called, but it has no opinion about what your agent does. There’s no base class to extend, no chain to compose, no graph to define. You write a CLAUDE.md file that describes how your agent should behave, and OpenClaw runs a Claude session with that context and the tools you’ve registered.
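To make that concrete, here is what a CLAUDE.md might look like for a simple agent. The file is plain markdown instructions; the agent, tools, and rules below are a hypothetical illustration, not taken from our repo:

```markdown
# Agent: release-notes-writer

## Role
You draft release notes from merged pull requests.

## Behavior
- Read the list of merged PRs via the registered `git_log` tool.
- Group changes into Features, Fixes, and Internal.
- Write the draft to `notes/DRAFT.md`; never push to a remote.

## Constraints
- Only use the registered tools; do not run arbitrary shell commands.
- Ask before deleting any file.
```

OpenClaw injects this text into the session context at startup; there is no code to write unless you need custom tools.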

LangChain is the opposite. It has strong opinions about how you should structure your agent logic: chains, runnables, agents, tools, memory, retrievers. It’s a framework in the classical sense — it provides the skeleton, you fill in the details. That’s useful when the skeleton matches your use case. It’s a problem when it doesn’t.

The abstraction problem

LangChain’s abstractions are designed around the idea that you’ll be composing LLM calls in a pipeline: input → retrieval → LLM → output → next LLM call. This works well for RAG systems and simple question-answering agents. It starts to fight you when you need something that doesn’t fit the pipeline model.

Our multi-agent meeting protocol, for example, runs multiple Claude instances as “participants” in a structured discussion. Each participant reads the conversation history, produces a response, and optionally signals consensus or requests another round. The coordinator then decides whether to continue. None of this fits neatly into LangChain’s agent/tool model. You’d end up either cramming the protocol into an agent with custom tools, or bypassing most of LangChain’s abstractions entirely.

With OpenClaw, we just write the coordination logic ourselves. The coordinator agent is an OpenClaw session that reads the current state from a shared file, calls the participant agents as subprocesses, collects their responses, and decides what to do next. It’s written in JavaScript (Node.js ES modules), spread across six source files — coordinator, session manager, summarizer, consensus detector, timeout handler, and minutes writer — and every line is doing something we understand.
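The round structure is simple enough to sketch as a plain loop. In this sketch the participant calls are stubbed as in-process async functions and the shared state is an in-memory array; the real coordinator spawns participants as subprocesses and reads state from a shared file. All names and shapes here are illustrative:

```javascript
// Minimal sketch of the meeting round loop. Each participant returns a
// response plus a consensus signal; the coordinator stops when every
// participant agrees, or when the round budget runs out.
async function runMeeting(participants, topic, maxRounds = 5) {
  const history = []; // shared conversation state (a file in the real system)

  for (let round = 1; round <= maxRounds; round++) {
    const turns = [];
    for (const p of participants) {
      // In production this is a subprocess call into a Claude session.
      const turn = await p.respond(topic, history);
      turns.push({ name: p.name, ...turn });
    }
    history.push({ round, turns });

    // Stop as soon as every participant signals consensus.
    if (turns.every((t) => t.consensus)) {
      return { consensus: true, rounds: round, history };
    }
  }
  return { consensus: false, rounds: maxRounds, history };
}
```

The point is not the loop itself but that the whole protocol is visible in one function: there is no agent base class or graph abstraction between you and the control flow.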

Debugging experience

When something goes wrong with a LangChain agent, the error is often several layers deep in the abstraction stack. You’re debugging a runnable that calls a chain that calls an LLM that returns output that gets parsed by an output parser that… somewhere in there something failed. Getting a useful stack trace requires understanding which layer of the abstraction is responsible for which behavior.

With OpenClaw, there are basically two places to look: your tool implementation and the Claude session log. If the agent called the wrong tool, you check the session history. If the tool produced the wrong output, you check the tool. There’s no intermediary layer trying to be helpful.
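One way to see why this holds: when every tool call is funneled through a single logging choke point, the session log really is the complete record. The wrapper below is a hypothetical sketch of that pattern, not OpenClaw's actual internals:

```javascript
// Wrap a tool implementation so every call, result, and error lands in an
// append-only session log. With all calls passing through one choke point,
// the log captures everything the agent did -- no hidden framework state.
function withSessionLog(log, name, impl) {
  return async (args) => {
    log.push({ at: Date.now(), tool: name, event: "call", args });
    try {
      const result = await impl(args);
      log.push({ at: Date.now(), tool: name, event: "result", result });
      return result;
    } catch (err) {
      log.push({ at: Date.now(), tool: name, event: "error", message: String(err) });
      throw err;
    }
  };
}
```

Debugging then reduces to reading one array (or one file) in order: wrong tool chosen means the problem is upstream in the prompt; wrong output for a correct call means the problem is in the tool.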

This matters more than it sounds. We’ve run sessions that last hours, involve dozens of tool calls, and accumulate several hundred KB of context. When something goes wrong in hour two, you want to be able to read the session log and understand exactly what happened. With a thin runtime, the session log is the complete record of what happened. With a thick framework, the framework’s internal state is a parallel source of truth that you also have to inspect.

Lock-in and ecosystem dependencies

LangChain has over 500 integration packages. Many are community-maintained and break on library updates. If you build your agent logic around LangChain’s abstractions, you’re implicitly accepting a dependency on all of them being maintained and compatible.

OpenClaw’s integration model is different: integrations happen through MCP (Model Context Protocol). An MCP server is just a process that exposes tools. Writing an MCP server for a new data source is about 50 lines of code. The interface is standard, the protocol is simple, and when a third-party integration breaks, the fix is isolated to that server — it doesn’t cascade through your agent logic.

This is why we could build our web automation layer (26 Chrome DevTools Protocol tools), our AI and tech content aggregator, and our backup integration without any framework code. Each is a standalone MCP server that registers its tools with OpenClaw at startup.

When LangChain makes sense

This isn’t a blanket argument against LangChain. If you’re building a RAG system where the primary control flow is: retrieve relevant documents, pass them to an LLM, return an answer — LangChain’s abstractions map well to that pattern. LCEL (the LangChain Expression Language) is genuinely clean for composing retrieval pipelines.

It also has strong integrations with vector databases, document loaders, and embedding models that would take time to build from scratch. For teams prototyping quickly or building conventional AI features into existing Python applications, the framework overhead is worth the integration shortcuts.

Our use case is different. We’re building an AI agent system that runs autonomously, accumulates state over days and weeks, coordinates multiple agents on complex tasks, and needs to be debugged when things go wrong. For that, we wanted the most transparent and minimal runtime we could find.

The principle: thin runtime, rich skills

Our architecture follows a principle we’ve started thinking of as “thin runtime, rich skills.” OpenClaw is the runtime: it handles tool dispatch, session management, and the interface to Claude. Everything else — memory, security, multi-agent coordination, browser automation — lives in separate, independently deployable modules.

This means each skill can be tested in isolation, replaced without touching the others, and reasoned about without understanding the whole system. The downside is that there’s more wiring to write. The upside is that when something breaks, it’s almost always in the wiring — which is the part you wrote and understand.

We’re not evangelizing this approach as universally correct. It’s the right tradeoff for a research project that needs to be debugged, extended, and understood at every level. If you’re shipping a product feature on a deadline, the framework overhead of LangChain might be worth it. For us, it wasn’t.
