The pipeline
Recording
The browser’s MediaRecorder API captures your selected tab as a WebM video stream (VP9 codec when available, with a fallback to baseline WebM). Recording is chunked every second and assembled into a single blob when you stop. The default auto-stop limit is 30 seconds, which is enough to capture most UI interaction sequences without producing a video that is expensive to process.

The video is uploaded to Claude Scope for processing. The raw video file is used only for frame extraction and discarded immediately afterward — it is never stored permanently.

Frame extraction
Rather than analyzing every video frame, Claude Scope uses SSIM-based frame differencing to extract only the frames where the UI meaningfully changed. SSIM (Structural Similarity Index Measure) compares the structural content of consecutive frames and discards frames that are too similar to the previous one.

This keeps the number of frames small, which reduces Vision API cost and produces a cleaner, more readable timeline in the output prompt.
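The differencing step can be sketched in Python. This is a minimal global-SSIM computation over flat grayscale frames — an illustrative sketch, not Claude Scope's actual implementation — and the 0.95 threshold is an example value:

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global SSIM between two equal-length grayscale pixel sequences (0-255)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((p - my) ** 2 for p in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def extract_key_frames(frames, threshold=0.95):
    """Keep a frame only if its SSIM against the last kept frame drops
    below the threshold, i.e. the UI meaningfully changed."""
    kept = [frames[0]]
    for frame in frames[1:]:
        if ssim(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept
```

Identical frames score 1.0 and are dropped; a frame that differs sharply from the last kept one scores low and is retained for the timeline.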
Vision lane
Each extracted frame is sent to Anthropic’s Vision AI (the model configured in your Model Access settings). For every frame, the Vision lane identifies:
- Buttons — interactive controls and their labels
- Inputs — text fields, checkboxes, selects, and their current state
- Headings — structural landmarks and page hierarchy
- Links — navigation targets and their text
- Other elements — any additional components the model identifies
The Vision lane requires a valid Anthropic API key with access to a vision-capable model. If the key is missing or all frames fail analysis, processing stops with an error. Configure your key in Model Access settings.
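A per-frame request to the Anthropic Messages API can be assembled along these lines. The model name below is a placeholder (the real one comes from your Model Access settings) and the prompt wording is illustrative:

```python
import base64

def build_frame_request(frame_png: bytes, model="claude-3-5-sonnet-latest") -> dict:
    """Build an Anthropic Messages API request body asking the model to
    inventory the UI elements visible in one extracted frame."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                # Image content block: the frame, base64-encoded.
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(frame_png).decode("ascii")}},
                # Text content block: the per-frame analysis instruction.
                {"type": "text",
                 "text": "List the buttons, inputs, headings, links, and other "
                         "elements visible in this frame, with their labels."},
            ],
        }],
    }
```

Each extracted frame gets its own request, so a recording that produces few key frames is proportionally cheaper to analyze.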
Playwright lane
In parallel with the Vision lane, a Playwright-driven headless browser loads your seed URL and captures a full ARIA accessibility snapshot. This snapshot includes:
- Every interactive element by ARIA role (button, textbox, link, checkbox, etc.)
- Accessible names and labels
- Counts of each element type on the page
- The full accessible name tree for the loaded DOM
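The element-type counts can be derived from the snapshot tree with a small recursive tally. The node shape here (role / name / children) mirrors a Playwright accessibility snapshot; the function itself is an illustrative sketch rather than Claude Scope's actual code:

```python
from collections import Counter

def count_roles(node, counts=None):
    """Tally how many elements of each ARIA role appear in an accessibility
    snapshot tree of nodes shaped like {"role": ..., "name": ..., "children": [...]}."""
    counts = counts if counts is not None else Counter()
    counts[node["role"]] += 1
    for child in node.get("children", []):
        count_roles(child, counts)
    return counts
```

For example, a page with two buttons and one textbox yields `{"button": 2, "textbox": 1}` alongside the container roles.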
Synthesis
Once both lanes complete, the Synthesis stage merges the visual timeline and the Playwright accessibility snapshot into a single structured system prompt. The format of this prompt depends on the agent target you selected:
- Claude Code — Full system prompt with an inline ARIA tree, screenshot bundle references, and a visual state changelog
- Codex — Compact, diff-focused prompt optimized for GPT-4o completions
- Cursor — Formatted for the Cursor composer’s context window
- Raw — Unformatted merged output for use in any other tool
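The merge step can be pictured as a dispatch on the agent target. The section labels, tags, and target keys below are illustrative, not the exact formats Claude Scope emits:

```python
def synthesize(target: str, timeline: list, aria_tree: str) -> str:
    """Merge the visual timeline and ARIA snapshot into one prompt,
    shaped per agent target (formats here are illustrative)."""
    merged = "\n".join(timeline) + "\n\n" + aria_tree
    if target == "claude-code":
        # Full prompt: visual state changelog plus inline ARIA tree.
        return ("# UI context\n\n## Visual state changelog\n"
                + "\n".join(timeline)
                + "\n\n## ARIA tree\n" + aria_tree)
    if target == "codex":
        # Compact, diff-focused: frame-to-frame changes only.
        return "\n".join(timeline)
    if target == "cursor":
        # Wrapped for the composer's context window.
        return "<context>\n" + merged + "\n</context>"
    return merged  # "raw": unformatted merged output
```

The raw target is the safest default when pasting into a tool the other formats don't cover.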
Output
The final prompt is stored alongside the session and displayed in the recording review view. It includes:
- A visual timeline summarizing UI changes frame-by-frame
- An inline ARIA tree from the Playwright snapshot
- A raw DOM diff comparing element counts between frames
- An optional screenshot bundle (base64-encoded frame thumbnails)
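The raw DOM diff in that output can be pictured as a per-role count comparison between consecutive frames — again an illustrative helper, not the exact implementation:

```python
def diff_counts(before: dict, after: dict) -> dict:
    """Net change in element counts between two frames.
    Positive values mean elements appeared; negative, disappeared."""
    roles = set(before) | set(after)
    return {r: after.get(r, 0) - before.get(r, 0)
            for r in roles
            if after.get(r, 0) != before.get(r, 0)}
```

A frame pair where a modal opens might diff as `{"button": 2, "textbox": 1}`: the modal's controls appeared while everything else held steady.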
Vision lane vs. Playwright lane
The two analysis lanes are complementary, not redundant. Each contributes something the other cannot.

| | Vision lane | Playwright lane |
|---|---|---|
| Data source | Pixel-level frames from your recording | Live DOM loaded in a headless browser |
| What it captures | Visual appearance, UI states over time, element labels as rendered | ARIA roles, accessible names, structural element counts |
| Temporal coverage | Every extracted frame across the full recording | Single snapshot of the seed URL at inspection time |
| Handles animations/transitions | Yes — captures intermediate states | No — snapshot is taken after page load |
| Requires API key | Yes (Anthropic) | No |
| Handles SPAs / dynamic content | Yes, if the recording covers those states | Partially — depends on what renders before snapshot |