Live Case Study · BearcatCTF 2026

Top 6%.
40 of 44 Challenges Solved.

Team Orange deployed the Claw-Stack Trinity at BearcatCTF 2026. Building on deep team expertise across crypto, forensics, and binary exploitation, Trinity accelerated the autonomous solve rate — placing rank #20 of 362 in the online division.

Rank: #20 of 362 teams

Score: 3084 (online division)

Solved: 40/44 challenges

Percentile: Top 6% (5.52% of field)

Evidence

Final Scoreboard

Team Orange, rank #20 of 362 teams, online division.

bearcatctf.com/scoreboard

BearcatCTF 2026 Scoreboard — Team Orange rank #20

Progression

Score Over Time

Sustained scoring velocity across the competition window — no plateau. Trinity maintained momentum as challenge difficulty increased.


BearcatCTF 2026 Score Over Time — Team Orange

Architecture Evolution

Why Trinity?

The architecture evolved from a hard lesson.

v1 Single Agent

One agent handled recon, solving, and logging. Context window filled fast on complex challenges. Slow to pivot mid-competition.

Result: Context overflow on hard challenges

v2 ✅ Trinity

Commander · Librarian · Operator

Fully decoupled. Each agent owns one cognitive responsibility. Context stays lean. Strategy pivots in seconds.

Result: Top 6% of the online division, 40/44 solved

The Framework

Trinity Architecture

Three specialized agents operating as a cognitive unit. Each owns a distinct function — together they close the loop from problem identification to verified exploit.

Strategy Layer: ⚔️ Commander (no direct tool access), dispatches to:

→ Knowledge Layer: 📚 Librarian (top-3 results only; CTF Knowledge Base · Web Search)

→ Execution Layer: ⚙️ Operator (sandboxed scope; Docker · Scripts · Blackboard)

Results return to the Commander, which synthesizes them and updates the solve plan.
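The dispatch loop in the diagram can be sketched in a few lines of Python. This is an illustrative sketch only, not Trinity's actual code: the class names, the `run_step` method, and the two stand-in agents are all hypothetical. It shows the one property the diagram states explicitly: the Commander holds no tools of its own, delegates to the other two agents, and is the single writer of the solve plan.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SolvePlan:
    """Shared plan; in this sketch only the Commander mutates it."""
    steps: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


class Commander:
    """Strategy layer: holds no tools, only handles to the other two agents."""

    def __init__(self, librarian: Callable[[str], list[str]],
                 operator: Callable[[str], str]) -> None:
        self._librarian = librarian
        self._operator = operator
        self.plan = SolvePlan()

    def run_step(self, question: str, task: str) -> SolvePlan:
        # Dispatch: a knowledge request, then an execution request.
        refs = self._librarian(question)[:3]   # keep the top-3 results only
        result = self._operator(task)
        # Synthesize: the Commander is the single writer of the plan.
        self.plan.steps.append(task)
        self.plan.findings.extend(refs + [result])
        return self.plan


# Stand-in agents (hypothetical): real ones would call a model or a sandbox.
def fake_librarian(question: str) -> list[str]:
    return [f"ref-{i} for {question}" for i in range(5)]


def fake_operator(task: str) -> str:
    return f"ran: {task}"


commander = Commander(fake_librarian, fake_operator)
plan = commander.run_step("tropical semiring", "inspect twisted_pair.py")
print(plan.steps, len(plan.findings))
```

Passing the Librarian and Operator in as plain callables keeps the Commander free of any tool imports, which is one simple way to enforce "no direct tool access" at the code level.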

Commander

Strategy

Reads challenge descriptions, identifies attack surface, selects tools and approach. Monitors progress and pivots strategy when blocked. The only agent with write access to the solve plan.

→ Decomposes multi-step challenges

→ Hypothesis generation & triage

→ Cross-challenge pattern recognition

Librarian

Knowledge

Retrieves and synthesizes relevant cryptographic papers, CVEs, writeups, and tool documentation in real time. Feeds structured knowledge directly into the Commander's reasoning context.

→ Tropical semiring & lattice theory refs

→ CVE & exploit database lookup

→ CTF writeup corpus synthesis
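The "top-3 results only" constraint on the Librarian is easy to illustrate. The sketch below is an assumption about how such a cap might look, not the retrieval method Trinity uses: it ranks a tiny hypothetical corpus by plain token overlap with the query and returns at most three titles, so the Commander's context only ever receives a short, bounded list.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(query: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Return titles of the k documents sharing the most tokens with the query."""
    q = tokenize(query)
    scored = []
    for title, body in corpus.items():
        # Counter intersection keeps the minimum count of each shared token.
        overlap = sum((q & tokenize(title + " " + body)).values())
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical corpus entries, for illustration only.
corpus = {
    "tropical-residuation": "tropical semiring residuation max plus algebra",
    "rsa-exponent-leaks": "rsa private exponent leak partial key exposure",
    "lattice-basics": "lattice reduction lll cryptanalysis rsa",
    "heap-exploitation": "glibc heap tcache poisoning",
    "web-sqli": "sql injection union based",
}

results = top_k("rsa tropical semiring exponent leak", corpus)
print(results)
```

A production retriever would more likely use embeddings or BM25; the point here is only the hard cap on what reaches the Commander.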

Operator

Execution

Implements exploits, runs tooling, parses outputs, and submits flags. Operates within a sandboxed environment with policy-governed tool access. Every shell command is audited before execution.

→ Python/pwntools exploit scripting

→ Binary & crypto tool execution

→ Sandboxed, audited, policy-governed
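A pre-execution audit of the kind described for the Operator might look like the following. This is a minimal sketch under stated assumptions: the allowlist, the egress list, and the `audit` / `run_audited` names are hypothetical and are not taken from Trinity's Policy Engine. The idea is that a command is parsed and checked first, and only an approved command ever reaches `subprocess.run`.

```python
import shlex
import subprocess
from dataclasses import dataclass

# Hypothetical policy: binaries the Operator may invoke, and binaries
# that imply network egress and are always refused.
ALLOWED_BINARIES = {"python3", "file", "strings", "openssl"}
EGRESS_BINARIES = {"curl", "wget", "nc", "ssh"}
SHELL_METACHARS = set(";|&><`$")


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def audit(command: str) -> Decision:
    """Decide whether a command may run, without executing it."""
    if SHELL_METACHARS & set(command):
        return Decision(False, "shell metacharacters are not permitted")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return Decision(False, f"unparseable: {exc}")
    if not argv:
        return Decision(False, "empty command")
    if argv[0] in EGRESS_BINARIES:
        return Decision(False, f"network egress via {argv[0]}")
    if argv[0] not in ALLOWED_BINARIES:
        return Decision(False, f"{argv[0]} is not on the allowlist")
    return Decision(True, "policy compliant")


def run_audited(command: str) -> subprocess.CompletedProcess:
    """Run a command only after it passes the audit (no shell involved)."""
    decision = audit(command)
    if not decision.allowed:
        raise PermissionError(decision.reason)
    return subprocess.run(shlex.split(command), capture_output=True,
                          text=True, timeout=30)


print(audit("strings ./twisted_pair.bin"))
print(audit("curl http://example.com/payload"))
```

Running with an argument list rather than `shell=True` means the audited tokens are exactly the tokens executed, which is why the sketch rejects shell metacharacters outright.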

Live Execution Log

TwistedPair — Crypto Challenge

RSA private exponent recovery via Tropical semiring residuation. Solved in 32 minutes, autonomous end-to-end.

trinity — TwistedPair

00:00 Challenge received. Category: Crypto. File: twisted_pair.py, output.txt

00:02 [COMMANDER] Analyzing cipher structure — non-standard RSA with Tropical semiring operations detected.

00:05 [LIBRARIAN] Retrieved: Tropical semiring residuation theory, lattice-based cryptanalysis papers, RSA exponent leak methods.

00:08 [COMMANDER] Identified RSA private exponent leak via Tropical semiring residuation. Exploit path confirmed.

00:09 [RUNTIME] Pre-execution audit: exploit script reviewed. No unauthorized egress. Policy compliant. ✓

00:11 [OPERATOR] Executed exploit. Recovered private key. Decrypting ciphertext...

00:32 [OPERATOR] Flag: BCCTF{D0n7_g37_m3_Tw157eD} ✓ Submitted.

00:32 [AUDIT] Execution log sealed. Solve time: 32 min. Agent actions: 14. Policy violations: 0.
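For readers unfamiliar with the term in the log, residuation in a tropical semiring is the standard way to "divide" when addition is replaced by max (or min) and multiplication by ordinary addition. The sketch below shows only that general operation, in the max-plus convention, which is an assumption since the challenge source is not reproduced here. It is not the TwistedPair exploit and makes no claim about how the private exponent was recovered.

```python
def maxplus_matvec(A, x):
    """Max-plus product: (A (x) x)_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def residuate(A, b):
    """Greatest x with A (x) x <= b, given by x_j = min_i (b[i] - A[i][j])."""
    rows, cols = range(len(A)), range(len(A[0]))
    return [min(b[i] - A[i][j] for i in rows) for j in cols]


# Toy instance with finite integer entries (illustrative values only).
A = [[0, 3],
     [2, 1]]
x_secret = [1, 4]

b = maxplus_matvec(A, x_secret)   # [7, 5]
x_hat = residuate(A, b)           # [3, 4]

# The residual reproduces b exactly, so it is a valid solution of A (x) x = b,
# and it dominates the original x componentwise.
print(b, x_hat, maxplus_matvec(A, x_hat) == b)
```

Note that `x_hat` differs from `x_secret`: tropical linear maps are generally not injective, and residuation returns the largest preimage. Schemes whose security rests on such a map being hard to invert are exposed when any preimage is sufficient.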

What This Proves

Strategic Takeaways

CTF results under competition pressure are the most honest benchmark for agent system design. Here is what BearcatCTF 2026 validated.

Declarative Tooling

Agent behavior defined via SOUL.md and config.json means challenge-specific specialists can be spun up in minutes, not hours. No re-engineering — just new configuration.
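To make the declarative idea concrete, here is a sketch of loading an agent definition from a `config.json` string plus the text of a `SOUL.md` file. The schema (`name`, `role`, `tools`) and the `AgentSpec` / `load_agent` names are hypothetical, since the actual file formats are not shown in this case study. The sketch only illustrates why a new specialist can be a configuration change rather than a code change.

```python
import json
from dataclasses import dataclass

VALID_ROLES = {"commander", "librarian", "operator"}


@dataclass(frozen=True)
class AgentSpec:
    name: str
    role: str
    tools: tuple[str, ...]
    soul: str  # free-text instructions, e.g. the contents of SOUL.md


def load_agent(config_json: str, soul_md: str) -> AgentSpec:
    """Build an immutable agent spec from config text (hypothetical schema)."""
    cfg = json.loads(config_json)
    role = cfg["role"]
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    tools = tuple(cfg.get("tools", []))
    # Mirror the architecture's constraint: the Commander gets no tools.
    if role == "commander" and tools:
        raise ValueError("commander must not be given tools")
    return AgentSpec(cfg["name"], role, tools, soul_md.strip())


config = '{"name": "crypto-operator", "role": "operator", "tools": ["python3", "openssl"]}'
soul = "# SOUL\nYou execute crypto tooling inside the sandbox.\n"

spec = load_agent(config, soul)
print(spec.name, spec.role, spec.tools)
```

Validating the role and tool list at load time means a misconfigured specialist fails before the competition clock is running, not midway through a solve.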

Modular Cognitive Architecture

Separating Strategy (Commander), Knowledge (Librarian), and Execution (Operator) eliminates context bloat and lets each agent operate at peak depth within its domain.

Runtime Governance

Every shell command passed through the Policy Engine. Zero policy violations across 40 solved challenges. Autonomous operation without sacrificing auditability or safety.
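One common way to make an execution log tamper-evident, in the spirit of the "log sealed" step, is a hash chain. The sketch below is an assumption about what sealing could mean and is not a description of Trinity's audit format. Each entry's digest covers the previous digest, so the final digest acts as a seal over the whole sequence.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain


def chain(entries: list[dict]) -> list[str]:
    """Return one SHA-256 digest per entry, each covering all earlier entries."""
    digests, prev = [], GENESIS
    for entry in entries:
        payload = prev + json.dumps(entry, sort_keys=True)
        prev = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        digests.append(prev)
    return digests


def verify(entries: list[dict], seal: str) -> bool:
    """Recompute the chain and compare its last digest with the stored seal."""
    digests = chain(entries)
    return bool(digests) and digests[-1] == seal


# Entries loosely modeled on the log format above (fields are hypothetical).
log = [
    {"t": "00:09", "actor": "RUNTIME", "event": "pre-execution audit passed"},
    {"t": "00:11", "actor": "OPERATOR", "event": "executed exploit"},
    {"t": "00:32", "actor": "OPERATOR", "event": "flag submitted"},
]

seal = chain(log)[-1]
print(verify(log, seal))

# Editing any earlier entry changes every later digest, so the seal no longer matches.
tampered = [dict(log[0], event="audit skipped")] + log[1:]
print(verify(tampered, seal))
```

A chain like this detects edits but does not prevent them; storing the seal somewhere the agents cannot write is what gives it value.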

Explore the Architecture

The same architecture that solved 40 of 44 CTF challenges is open to explore — dig into the modules, adapt the patterns, and build your own.

