Web Agent | Mind2Web #1

The #1 web agent on the hardest public benchmark.

Mind2Web tests 300 tasks across 136 live websites at three difficulty levels, scored by human evaluation, making it the most rigorous public eval for web agents. TinyFish scores 89.9%, beating OpenAI Operator (61.3%), Claude Computer Use (56.3%), and Browser Use (30.0%).

Performance

Mind2Web benchmark

Overall Score: 89.9% (300 tasks, 136 websites)

Hard Tasks: 81.9% (vs Operator 43.2%)

Page Observation: 15ms (C++ ABox, zero JS)

Stealth Mechanisms: 28 (C++-level anti-bot)

Mind2Web Benchmark

Difficulty Breakdown

Agent	Easy	Medium	Hard	Drop (easy to hard)
★ TinyFish	97.5%	89.9%	81.9%	−15.6pt
Operator	83.1%	58.0%	43.2%	−39.9pt
Claude CU	90.4%	49.0%	32.4%	−58.0pt
BrowserUse	55.4%	26.6%	8.1%	−47.3pt

TinyFish drops 15.6 points from easy to hard tasks, the smallest degradation of any agent tested.
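The Drop column is simply the easy-tier score minus the hard-tier score. This short Python check reproduces it from the values in the Difficulty Breakdown table and ranks agents from smallest to largest degradation; the dictionary layout and function names are illustrative only.

```python
# Scores (percent) copied from the Difficulty Breakdown table.
SCORES = {
    "TinyFish": {"easy": 97.5, "medium": 89.9, "hard": 81.9},
    "Operator": {"easy": 83.1, "medium": 58.0, "hard": 43.2},
    "Claude CU": {"easy": 90.4, "medium": 49.0, "hard": 32.4},
    "BrowserUse": {"easy": 55.4, "medium": 26.6, "hard": 8.1},
}


def easy_to_hard_drop(tiers: dict[str, float]) -> float:
    """Percentage-point drop from the easy tier to the hard tier."""
    # round() hides float noise such as 15.599999999999994
    return round(tiers["easy"] - tiers["hard"], 1)


def rank_by_degradation(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Agents sorted from smallest to largest easy-to-hard drop."""
    drops = [(agent, easy_to_hard_drop(tiers)) for agent, tiers in scores.items()]
    return sorted(drops, key=lambda pair: pair[1])


if __name__ == "__main__":
    for agent, drop in rank_by_degradation(SCORES):
        print(f"{agent}: -{drop}pt")
```

Sorting by the drop rather than by the hard score alone is what makes "smallest degradation" a separate claim from "highest hard-task score".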

Why TinyFish Wins

C++ Observation

Agent.extractABox runs inside the rendering engine and observes a page in 15ms, versus 200ms for Puppeteer.
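CDP commands are JSON messages carrying an `id`, a `method` such as `Agent.extractABox`, and optional `params`; the browser's reply echoes the same `id`. The sketch below shows that request/reply matching against a fake transport. Because the `Agent` domain is custom, the empty `params` and the result shape (`nodes` with `role`/`name`) are assumptions, and `FakeTransport` is a hypothetical stand-in for a real WebSocket connection.

```python
import itertools
import json
from typing import Any, Optional


class CdpSession:
    """Frames CDP-style commands and matches each reply to its request id.

    Assumes a synchronous transport with one reply per request; a real CDP
    WebSocket is asynchronous and also delivers unsolicited events.
    """

    def __init__(self, transport: Any) -> None:
        self._transport = transport  # hypothetical: any object with send(str) -> str
        self._ids = itertools.count(1)

    def send(self, method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
        msg_id = next(self._ids)
        frame = json.dumps({"id": msg_id, "method": method, "params": params or {}})
        reply = json.loads(self._transport.send(frame))
        if reply.get("id") != msg_id:
            raise RuntimeError(f"reply id {reply.get('id')} does not match request {msg_id}")
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "CDP error"))
        return reply["result"]


class FakeTransport:
    """Stand-in for the browser; the ABox result shape below is invented."""

    def send(self, frame: str) -> str:
        request = json.loads(frame)
        if request["method"] == "Agent.extractABox":
            result = {"nodes": [{"role": "button", "name": "Search"}]}
            return json.dumps({"id": request["id"], "result": result})
        return json.dumps({"id": request["id"], "error": {"message": "method not found"}})


if __name__ == "__main__":
    session = CdpSession(FakeTransport())
    abox = session.send("Agent.extractABox")
    print(abox["nodes"][0])  # {'role': 'button', 'name': 'Search'}
```

The point of the single command is that one round trip returns the whole observation, instead of several `Runtime.evaluate` calls assembling it from page JavaScript.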

Invisible to Sites

Custom CDP domain. No Runtime.evaluate, no Network.enable, no script injection. Nothing for anti-bot to detect.
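One way to enforce the "no detectable calls" rule on the client side is to refuse CDP methods that run script in the page or switch on instrumentation a site might notice. In this minimal sketch, the blocklist holds the two methods named above plus `Runtime.callFunctionOn` and `Page.addScriptToEvaluateOnNewDocument` as examples of script injection. The list and the `StealthViolation`/`check_command`/`audit` names are assumptions, not TinyFish's actual mechanism, which works at the C++ level.

```python
# Hypothetical blocklist of CDP methods that execute script in the page or
# enable instrumentation a site could observe.
DETECTABLE_METHODS = frozenset({
    "Runtime.evaluate",
    "Runtime.callFunctionOn",
    "Network.enable",
    "Page.addScriptToEvaluateOnNewDocument",
})


class StealthViolation(Exception):
    """Raised when an agent tries to issue a detectable CDP command."""


def check_command(method: str) -> str:
    """Pass a method name through, or raise if it is on the blocklist."""
    if method in DETECTABLE_METHODS:
        raise StealthViolation(f"{method} is not allowed in stealth mode")
    return method


def audit(methods: list[str]) -> list[str]:
    """Return the detectable methods found in a planned command sequence."""
    return [m for m in methods if m in DETECTABLE_METHODS]


if __name__ == "__main__":
    plan = ["Agent.extractABox", "Network.enable", "Input.dispatchMouseEvent"]
    print(audit(plan))  # ['Network.enable']
```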

Learns Over Time

CaR v2 extracts navigation patterns. Holdout A/B measures causal impact. Recipe replay skips LLM calls.
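A holdout A/B test estimates causal impact by withholding learned patterns from a random slice of tasks and comparing success rates between the two arms. The sketch below assumes deterministic hash-based assignment so a task always lands in the same arm. `HOLDOUT_FRACTION`, `assign_arm`, `ArmStats`, and `lift` are hypothetical names; the source does not describe how CaR v2 assigns or scores its holdout.

```python
import hashlib
from dataclasses import dataclass

HOLDOUT_FRACTION = 0.1  # hypothetical share of tasks run without learned patterns


def assign_arm(task_id: str, holdout_fraction: float = HOLDOUT_FRACTION) -> str:
    """Deterministically place a task in 'holdout' or 'treatment' by hashing its id."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket is uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "holdout" if bucket < holdout_fraction else "treatment"


@dataclass
class ArmStats:
    successes: int = 0
    trials: int = 0

    def record(self, success: bool) -> None:
        self.trials += 1
        self.successes += int(success)

    @property
    def rate(self) -> float:
        return self.successes / self.trials if self.trials else 0.0


def lift(treatment: ArmStats, holdout: ArmStats) -> float:
    """Point estimate of the success-rate change attributable to learned patterns."""
    return treatment.rate - holdout.rate
```

Random assignment is what makes the difference causal rather than correlational; in practice the lift would also need enough trials per arm, and a significance test, before being trusted.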

Cost Efficient

Only 20-30% of steps need LLM reasoning. Mechanical actions run in milliseconds via compiled recipes.
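Recipe replay can be pictured as a lookup: when a task matches a stored pattern, the saved action sequence is replayed and the LLM is skipped; otherwise the LLM plans, and the plan can be saved once it succeeds. This is a sketch under stated assumptions. Keying recipes by `(site, intent)`, the `RecipeBook` class, and the `plan_with_llm` callback are hypothetical; tool names come from the palette below.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Action = tuple[str, dict[str, Any]]  # (tool name, arguments)


@dataclass
class RecipeBook:
    """Hypothetical store of compiled recipes keyed by (site, intent)."""
    recipes: dict[tuple[str, str], list[Action]] = field(default_factory=dict)
    llm_calls: int = 0

    def add(self, site: str, intent: str, actions: list[Action]) -> None:
        """Save a plan; callers should only do this after the run succeeded."""
        self.recipes[(site, intent)] = list(actions)

    def plan(self, site: str, intent: str,
             plan_with_llm: Callable[[str, str], list[Action]]) -> list[Action]:
        recipe = self.recipes.get((site, intent))
        if recipe is not None:
            return list(recipe)  # replay: no LLM call
        self.llm_calls += 1      # fall back to LLM reasoning
        return plan_with_llm(site, intent)


if __name__ == "__main__":
    def fake_llm(site: str, intent: str) -> list[Action]:
        return [("VisitUrl", {"url": f"https://{site}"}), ("InputText", {"text": intent})]

    book = RecipeBook()
    first = book.plan("example.com", "search", fake_llm)
    book.add("example.com", "search", first)  # pretend the run succeeded
    book.plan("example.com", "search", fake_llm)
    print(book.llm_calls)  # 1
```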

Speed & Efficiency

Page Observation Latency

Puppeteer: 200ms (Runtime.evaluate + DOM)
Playwright: 150ms (JS injection + selectors)
Agent.extractABox: 15ms (single C++ call)

10-13x faster than Playwright and Puppeteer
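The 10-13x range follows from dividing each baseline latency by the 15ms Agent.extractABox call:

```latex
\frac{150\,\text{ms}}{15\,\text{ms}} = 10\times \quad (\text{Playwright}),
\qquad
\frac{200\,\text{ms}}{15\,\text{ms}} \approx 13.3\times \quad (\text{Puppeteer})
```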

Cost Architecture

LLM reasoning: 25% of steps (complex decisions only)
Compiled recipes: 45% (zero LLM cost)
Mechanical actions: 30% (millisecond execution)

Effective LLM cost reduction: ~75% vs pure LLM-per-step agents
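If every LLM call costs roughly the same, the reduction is just the share of steps that no longer reach the model. The sketch below assumes equal per-step LLM cost (the source does not state this) and a baseline agent that calls the LLM on every step; the step-kind keys are hypothetical labels for the 25/45/30 mix.

```python
STEP_MIX = {"llm_reasoning": 0.25, "compiled_recipe": 0.45, "mechanical": 0.30}


def llm_cost_reduction(step_mix: dict[str, float],
                       llm_kinds: tuple[str, ...] = ("llm_reasoning",)) -> float:
    """Fraction of LLM spend saved vs an agent that calls the LLM on every step."""
    total = sum(step_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"step shares must sum to 1, got {total}")
    llm_share = sum(step_mix[kind] for kind in llm_kinds)
    return 1.0 - llm_share


if __name__ == "__main__":
    print(f"{llm_cost_reduction(STEP_MIX):.0%}")  # 75%
```

With the 20-30% LLM share quoted earlier, the same formula gives a 70-80% reduction, so ~75% is the midpoint.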

Architecture

Tool Palette

VisitUrl, ScrollPage, SwitchTab, ClickElement, InputText, SelectOption, HoverElement, DragAndDrop, PressKey, SetRange, FillForm, Screenshot, ContentExtract, ListOptions, InspectImage, InspectPDF, FetchTool, SearchTool, WebSearch, Wait, Reconfigure

Tool categories: Navigation, Action, Batch, Observe, Data, Utility
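Agent frameworks expose tools to the model by name, so dispatch can be sketched as a registry that maps palette names to handlers and turns unknown tools or bad arguments into error results instead of crashes. This is not Google ADK's API. The handlers are stubs, and the argument names (`url`, `selector`, `ms`) are hypothetical, not the real tool signatures.

```python
from typing import Any, Callable

ToolFn = Callable[..., dict[str, Any]]
REGISTRY: dict[str, ToolFn] = {}


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that registers a handler under its palette name."""
    def register(fn: ToolFn) -> ToolFn:
        if name in REGISTRY:
            raise ValueError(f"duplicate tool {name}")
        REGISTRY[name] = fn
        return fn
    return register


@tool("VisitUrl")
def visit_url(url: str) -> dict[str, Any]:
    return {"ok": True, "navigated_to": url}   # stub: a real handler drives the browser


@tool("ClickElement")
def click_element(selector: str) -> dict[str, Any]:
    return {"ok": True, "clicked": selector}   # stub


@tool("Wait")
def wait(ms: int) -> dict[str, Any]:
    return {"ok": True, "waited_ms": ms}       # stub: does not actually sleep


def dispatch(call: dict[str, Any]) -> dict[str, Any]:
    """Route a model-emitted call such as {'name': ..., 'args': {...}} to its handler."""
    handler = REGISTRY.get(call.get("name", ""))
    if handler is None:
        return {"ok": False, "error": f"unknown tool {call.get('name')!r}"}
    try:
        return handler(**call.get("args", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return {"ok": False, "error": str(exc)}


if __name__ == "__main__":
    print(dispatch({"name": "VisitUrl", "args": {"url": "https://example.com"}}))
```

Returning errors as data lets the model see what went wrong and pick a different tool on its next step.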

Competitive Landscape

Agent	Mind2Web	Browser	Stealth	Learning
★ TinyFish	89.9%	Custom Chromium	28 C++ mechanisms	CaR v2 + recipes
OpenAI Operator	61.3%	Hosted Chrome	Standard	None
Claude CU	56.3%	Screenshot-based	N/A	None
Browser Use	30.0%	Playwright	None	None

Technology Stack

Layer	Technology	Role
Agent Framework	Google ADK v1.21+	Agent lifecycle, session management, tool dispatch
LLM	Google Gemini Flash	Decision-making, tool selection, content understanding
Browser	Custom Chromium 147	C++-level stealth, Agent.* CDP domain, 15ms ABox
Gateway	Python aiohttp	Session management, proxy pool, site model recording
Web Framework	FastAPI + uvicorn	SSE streaming, REST endpoints
Learning	CaR v2 (local JSON)	Pattern extraction, holdout A/B, recipe replay
Tracing	LangSmith + OpenTelemetry	Execution tracing, evaluation
Storage	SQLite (async)	Step snapshots, session history
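The SSE streaming in the web layer uses the `text/event-stream` format: each event is a few `field: value` lines terminated by a blank line. The sketch below encodes agent step updates as SSE frames. The `step`/`done` event names and the payload fields are hypothetical. In FastAPI, a generator like `stream_steps` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def sse_frame(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame; multi-line data becomes several data: lines."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    payload = data if isinstance(data, str) else json.dumps(data)
    lines.extend(f"data: {chunk}" for chunk in payload.split("\n"))
    return "\n".join(lines) + "\n\n"  # blank line ends the event


def stream_steps(steps: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield one 'step' frame per agent step, then a final 'done' frame."""
    for i, step in enumerate(steps, start=1):
        yield sse_frame(step, event="step", event_id=str(i))
    yield sse_frame({"status": "complete"}, event="done")


if __name__ == "__main__":
    for frame in stream_steps([{"tool": "VisitUrl"}, {"tool": "ClickElement"}]):
        print(frame, end="")
```

Setting `id` on each step lets a reconnecting client send `Last-Event-ID`, which a server could use to resume the stream.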

Active Project

In Progress | 6 Stages | ~8,600 lines

AgentBrowser Migration

Incrementally migrating ~8,600 lines of new capabilities into production. The work covers AgentBrowser driver integration, 9 new tools, the CaR v2 learning system, a prompt rewrite, and recipe replay; each stage is independently testable and reversible.

