Sonnet Took the Exam. I Just Watched.

By @azisaz - https://medium.com/@azisaz · Written with Claude Sonnet 4.6

Overview

Security assessments are time-consuming — not because the hard problems are hard, but because so much of the work is repetitive. Fuzzing injection points, iterating payloads, enumerating endpoints, writing reports — these are pattern-based tasks that follow the same loop every time. A skilled pentester shouldn't be spending their time on this.

So I tried to do something about it. I ran Claude Sonnet 4.6 via Claude Code, hooked it up to a real browser through a Playwright MCP server, and routed all traffic through Burp Suite — then pointed it at the Burp Suite Certified Practitioner (BSCP) Practice Exams and stepped back. No scripts. No hardcoded payloads. Just the model, the tools, and the target. I barely touched the keyboard.

It solved both exams. Fully. Each one required three chained vulnerabilities: a client-side attack to steal a session, a database attack to escalate privileges, and a server-side exploit to achieve remote code execution. Exam 1 took ~40 minutes. Exam 2 took ~25. Before the exams, the same workflow completed 25 PortSwigger Web Security Academy labs across 14 vulnerability categories, 19 of which were formally documented.

This article itself was written by Claude Sonnet 4.6, with a word or two from me.

The Workflow — Claude as the Pentester

This is not an AI-assisted pentest. Claude Sonnet 4.6 is the pentester. My role was minimal — spinning up the environment, pointing it at the target, and occasionally nudging it in the right direction. The actual work — reconnaissance, payload crafting, WAF bypass, gadget chain selection, out-of-band exfiltration, and report writing — was done autonomously by the model.

Claude was given access to two tools, both connected via MCP (Model Context Protocol) — an open standard that allows AI models to interact with external tools and services through a unified interface.
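To make the "unified interface" concrete: MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying a tool name and its arguments. The sketch below only builds such a message. The tool name `browser_navigate` and the target URL are hypothetical placeholders, since the tool names exposed by the custom server used here are not listed in this article.

```python
import json
from itertools import count

# JSON-RPC request ids only need to be unique within a session.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and URL, for illustration only.
example_message = make_tool_call(
    "browser_navigate", {"url": "https://target.example/login"}
)
```

The model never sees this wire format directly. The client (Claude Code in this setup) translates the model's tool-use decisions into requests of this shape and feeds the results back.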

📎 Model Context Protocol

Playwright Browser MCP

A custom Playwright MCP server gives Claude direct control over a real browser. It navigates pages, reads DOM content, fills forms, clicks buttons, observes responses, and reads exploit server logs — exactly how a human tester interacts with an application, but without the human.

The server is built on top of Playwright for Python, a browser automation library that drives Chromium, Firefox, and WebKit through a single API.

📎 Playwright for Python — Getting Started

Burp Suite MCP

Burp Suite proxies all browser traffic, and through the Burp MCP integration, Claude can inspect raw HTTP requests and responses, replay modified requests, and observe server behavior at the level of individual HTTP messages, which is critical for precise SQL injection and deserialization payload delivery.
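As a rough idea of what "replay a modified request" involves, here is a standard-library sketch that takes a raw form-encoded request as it might appear in proxy history, swaps one body parameter, and recomputes `Content-Length` so the server accepts the edited body. This is not the Burp MCP server's API. The function name `modify_form_param`, the host, and the credentials are all made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode


def modify_form_param(raw_request: str, param: str, new_value: str) -> str:
    """Replace one form-body parameter and fix Content-Length to match."""
    head, _, body = raw_request.partition("\r\n\r\n")

    # Rebuild the body with the chosen parameter swapped out.
    pairs = [
        (key, new_value if key == param else value)
        for key, value in parse_qsl(body, keep_blank_values=True)
    ]
    new_body = urlencode(pairs)

    # Keep every header as-is except Content-Length, which must match the new body.
    new_lines = []
    for line in head.split("\r\n"):
        if line.lower().startswith("content-length:"):
            line = f"Content-Length: {len(new_body.encode())}"
        new_lines.append(line)

    return "\r\n".join(new_lines) + "\r\n\r\n" + new_body


# Hypothetical captured request, for illustration only.
captured = (
    "POST /login HTTP/1.1\r\n"
    "Host: target.example\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 27\r\n"
    "\r\n"
    "username=wiener&password=pw"
)
replayed = modify_form_param(captured, "username", "administrator")
```

Working on the raw message rather than through the browser is what allows byte-exact control over a payload, which a form field or URL bar may silently re-encode.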

📎 Burp Suite MCP Server

How They Work Together

Claude Sonnet 4.6
│
├── Playwright MCP ──► Browser ──► [Burp Suite Proxy] ──► Target App
│                                          │
└── Burp Suite MCP ◄───────────────────────┘
    (read/replay HTTP traffic)
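The browser-to-proxy leg of this setup can be sketched with Playwright for Python. This is a minimal illustration, not the custom MCP server itself. It assumes Burp is listening on its default proxy address `127.0.0.1:8080`, and the helper names `burp_launch_options` and `fetch_title` are invented for this example.

```python
def burp_launch_options(proxy_host: str = "127.0.0.1", proxy_port: int = 8080) -> dict:
    """Keyword arguments for chromium.launch() that route traffic via a local proxy."""
    return {
        "headless": True,
        "proxy": {"server": f"http://{proxy_host}:{proxy_port}"},
    }


def fetch_title(url: str) -> str:
    """Open `url` in a proxied Chromium instance and return the page title."""
    # Imported lazily so the options helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**burp_launch_options())
        # An intercepting proxy re-signs TLS with its own CA, so certificate
        # errors are ignored here instead of installing that CA in the browser.
        context = browser.new_context(ignore_https_errors=True)
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Calling `fetch_title("https://example.com")` with Burp running should make the request show up in Burp's proxy history, which is the traffic the Burp MCP side then reads and replays.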