Top 6%.
40 of 44 Challenges Solved.
Team Orange deployed the Claw-Stack Trinity at BearcatCTF 2026. Building on the team's expertise in cryptography, forensics, and binary exploitation, Trinity raised the autonomous solve rate and placed the team #20 of 362 in the online division.
Rank: #20 of 362 teams
Score: 3084 (online division)
Solved: 40/44 challenges
Percentile: Top 6% (5.52% of field)
Evidence
Final Scoreboard
Team Orange, rank #20 of 362 teams, online division.
Progression
Score Over Time
Scoring continued steadily across the competition window, with no plateau. Trinity maintained momentum as challenge difficulty increased.
Architecture Evolution
Why Trinity?
The architecture evolved from a hard lesson.
Single Agent
One agent handled recon, solving, and logging. Context window filled fast on complex challenges. Slow to pivot mid-competition.
Commander · Librarian · Operator
Fully decoupled. Each agent owns one cognitive responsibility. Context stays lean. Strategy pivots in seconds.
The Framework
Trinity Architecture
Three specialized agents operating as a cognitive unit. Each owns a distinct function — together they close the loop from problem identification to verified exploit.
Commander
Strategy
Reads challenge descriptions, identifies attack surface, selects tools and approach. Monitors progress and pivots strategy when blocked. The only agent with write access to the solve plan.
Librarian
Knowledge
Retrieves and synthesizes relevant cryptographic papers, CVEs, writeups, and tool documentation in real time. Feeds structured knowledge directly into the Commander's reasoning context.
Operator
Execution
Implements exploits, runs tooling, parses outputs, and submits flags. Operates within a sandboxed environment with policy-governed tool access. Every shell command is audited before execution.
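The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration, not the Trinity implementation: the class and method names (SolvePlan, lookup, run, solve) and the stubbed knowledge table are assumptions made for the example. It shows only the shape of the loop, in which the Commander alone writes the plan, the Librarian supplies notes, and the Operator executes steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SolvePlan:
    """Plan for one challenge; in this sketch only the Commander mutates it."""
    challenge: str
    steps: list[str] = field(default_factory=list)


class Librarian:
    """Knowledge role: returns notes for a topic (hypothetical in-memory table)."""

    def __init__(self, notes: dict[str, str]) -> None:
        self._notes = notes

    def lookup(self, topic: str) -> str:
        return self._notes.get(topic, "no notes found")


class Operator:
    """Execution role: carries out one plan step and reports the result."""

    def run(self, step: str) -> str:
        # A real Operator would invoke sandboxed tooling; this stub echoes the step.
        return f"executed: {step}"


class Commander:
    """Strategy role: builds the plan from Librarian notes, then delegates."""

    def __init__(self, librarian: Librarian, operator: Operator) -> None:
        self.librarian = librarian
        self.operator = operator

    def solve(self, challenge: str, topic: str) -> list[str]:
        plan = SolvePlan(challenge)
        notes = self.librarian.lookup(topic)
        plan.steps.append(f"apply {notes}")  # the only write to the plan
        return [self.operator.run(step) for step in plan.steps]


commander = Commander(Librarian({"rsa": "example note"}), Operator())
results = commander.solve("TwistedPair", "rsa")
```

Keeping the plan write inside Commander.solve mirrors the single-writer rule stated for the Commander; the other two roles see only what they are handed.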
Live Execution Log
TwistedPair — Crypto Challenge
RSA private exponent recovery via tropical semiring residuation. Solved autonomously, end to end, in 32 minutes.
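For readers unfamiliar with the term, residuation in a tropical semiring can be illustrated with a small example. The sketch below assumes the max-plus variant (addition is max, multiplication is +) and integer entries. It does not reproduce the TwistedPair solve, whose details are not given here, and the function names are hypothetical. It shows only the standard result that the greatest x with A ⊗ x ≤ b is x_j = min_i (b_i - A[i][j]), and that the system is solvable exactly when this x reproduces b.

```python
def maxplus_matvec(A, x):
    """Max-plus product: (A ⊗ x)_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def residuate(A, b):
    """Greatest x with A ⊗ x <= b componentwise: x_j = min_i (b_i - A[i][j])."""
    return [
        min(b_i - row[j] for row, b_i in zip(A, b))
        for j in range(len(A[0]))
    ]


A = [[0, 3],
     [2, 1]]
b = [5, 3]

x_star = residuate(A, b)
solvable = maxplus_matvec(A, x_star) == b
print(x_star, solvable)  # [1, 2] True
```

Because the candidate solution comes from one pass of subtractions and minima, a linear tropical system can be inverted cheaply whenever a solution exists.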
What This Proves
Strategic Takeaways
CTF results under competition pressure are the most honest benchmark for agent system design. Here is what BearcatCTF 2026 validated.
Declarative Tooling
Agent behavior is defined in SOUL.md and config.json, so challenge-specific specialists can be spun up in minutes, not hours. No re-engineering, just new configuration.
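As a rough illustration of what configuration-only setup can look like, the sketch below builds an agent description from the text of a SOUL.md file and a config.json file. The schema is an assumption: the "name" and "tools" keys, the AgentSpec type, and the load_agent helper are hypothetical and are not taken from the actual Claw-Stack format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical in-memory description of one specialist agent."""
    name: str
    persona: str
    tools: tuple[str, ...]


def load_agent(soul_md: str, config_json: str) -> AgentSpec:
    """Combine persona text (SOUL.md) with settings (config.json)."""
    config = json.loads(config_json)
    return AgentSpec(
        name=config["name"],                   # assumed required key
        persona=soul_md.strip(),
        tools=tuple(config.get("tools", [])),  # assumed optional key
    )


spec = load_agent(
    "You are a cryptography specialist.\n",
    '{"name": "crypto", "tools": ["python3", "sage"]}',
)
```

Under this assumed layout, a new specialist is a new pair of files; no code changes are needed.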
Modular Cognitive Architecture
Separating Strategy (Commander), Knowledge (Librarian), and Execution (Operator) eliminates context bloat and lets each agent operate at peak depth within its domain.
Runtime Governance
Every shell command passed through the Policy Engine. Zero policy violations across 40 solved challenges. Autonomous operation without sacrificing auditability or safety.
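The audit-before-execute pattern can be sketched as a small allowlist check. This is an assumption-laden illustration rather than the actual Policy Engine: the PolicyEngine class here, its allowlist rule, and its audit log format are all hypothetical. It records every decision, so the log doubles as the audit trail.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class PolicyEngine:
    """Hypothetical policy: permit a command only if its program is allowlisted."""
    allowed: frozenset[str]
    audit_log: list[tuple[str, bool]] = field(default_factory=list)

    def check(self, command: str) -> bool:
        try:
            argv = shlex.split(command)
        except ValueError:
            argv = []  # unbalanced quotes: treat as not permitted
        permitted = bool(argv) and argv[0] in self.allowed
        self.audit_log.append((command, permitted))  # log every decision
        return permitted


engine = PolicyEngine(allowed=frozenset({"ls", "python3"}))
first = engine.check("ls -la")
second = engine.check("rm -rf /tmp/x")
```

A check on the program name alone is only meaningful if the approved argument list is executed directly rather than through a shell, since shell operators such as `;` or `&&` could otherwise chain in other programs.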
Explore the Architecture
The same architecture that solved 40 of 44 CTF challenges is open to explore — dig into the modules, adapt the patterns, and build your own.