---
name: agent-browser
description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
---

# Browser Automation with agent-browser

## Quick start

```bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
```

## Core workflow

1. Navigate: `agent-browser open <url>`
2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
3. Interact using refs from the snapshot
4. Re-snapshot after navigation or significant DOM changes
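
Steps 3 and 4 can be folded into one small shell helper. This is a minimal sketch, not part of agent-browser: the function name `act_and_resnapshot` and the `AGENT_BROWSER` override variable are hypothetical, introduced only so the binary can be swapped out (for example with `echo` for a dry run).

```bash
# Hypothetical helper: run one interaction, then take a fresh interactive
# snapshot so the next step works from up-to-date refs.
# AGENT_BROWSER is an assumed override variable; it defaults to the real CLI.
act_and_resnapshot() {
  ab=${AGENT_BROWSER:-agent-browser}
  # Only re-snapshot if the interaction itself succeeded.
  "$ab" "$@" && "$ab" snapshot -i
}
```

For example, `act_and_resnapshot click @e1` runs the click and then prints a new `snapshot -i`. Chaining with `&&` means a failed interaction is not hidden behind a successful snapshot.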

## Commands

### Navigation
```bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
```

### Snapshot (page analysis)
```bash
agent-browser snapshot        # Full accessibility tree
agent-browser snapshot -i     # Interactive elements only (recommended)
agent-browser snapshot -c     # Compact output
agent-browser snapshot -d 3   # Limit depth to 3
```

### Interactions (use @refs from snapshot)
```bash
agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser fill @e2 "text"      # Clear and type
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
```

### Get information
```bash
agent-browser get text @e1    # Get element text
agent-browser get value @e1   # Get input value
agent-browser get title       # Get page title
agent-browser get url         # Get current URL
```

### Screenshots
```bash
agent-browser screenshot            # Screenshot to stdout
agent-browser screenshot path.png   # Save to file
agent-browser screenshot --full     # Full page
```

### Wait
```bash
agent-browser wait @e1                  # Wait for element
agent-browser wait 2000                 # Wait milliseconds
agent-browser wait --text "Success"     # Wait for text
agent-browser wait --load networkidle   # Wait for network idle
```

### Semantic locators (alternative to refs)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
```

## Example: Form submission

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i   # Check result
```
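
In a script, hard-coding `@e1` or `@e3` is fragile because refs are assigned per snapshot. The sketch below looks a ref up by role and name instead. It is not an agent-browser feature: `ref_for` is a hypothetical helper, and it assumes the text snapshot prints each element on its own line in the `role "name" [ref=eN]` shape shown in the comment above. The real output layout may differ, so check it before relying on this.

```bash
# Hypothetical helper: read snapshot text on stdin and print the @ref of the
# first element whose role and accessible name match.
# Assumes one element per line, formatted like: textbox "Email" [ref=e1]
# The name is used inside a regex, so names containing regex metacharacters
# (such as . or *) would need escaping first.
ref_for() {
  role=$1 name=$2
  sed -n "s/.*${role} \"${name}\".*\[ref=\([a-z0-9]*\)\].*/@\1/p" | head -n 1
}
```

A possible use is `agent-browser click "$(agent-browser snapshot -i | ref_for button Submit)"`. Where the semantic locators from the `find` command fit, they are likely the simpler choice; this helper is only useful when a script needs the ref itself.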

## Example: Authentication with saved state

```bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
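
The "later sessions" half only works if the state file exists. A script can guard for that before loading, as in this minimal sketch. The function name `load_saved_state` and the `AGENT_BROWSER` override variable are hypothetical and not part of agent-browser; only the `state load` command comes from the example above.

```bash
# Hypothetical helper: load a saved state file if it exists, otherwise fail
# with a message so the caller can run the login flow first.
# AGENT_BROWSER is an assumed override variable; it defaults to the real CLI.
load_saved_state() {
  state_file=$1
  ab=${AGENT_BROWSER:-agent-browser}
  if [ -f "$state_file" ]; then
    "$ab" state load "$state_file"
  else
    echo "no saved state at $state_file; run the login flow first" >&2
    return 1
  fi
}
```

A caller could write `load_saved_state auth.json || exit 1` and then open the dashboard URL. This only checks that the file is present; it does not detect an expired login inside the saved state.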

## Sessions (parallel browsers)

```bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
```
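
To start several sessions at once rather than one after another, the commands can be put in the background and joined with `wait`. This is a sketch under assumptions: `open_in_sessions`, the `test1`, `test2`, ... naming scheme, and the `AGENT_BROWSER` override variable are all hypothetical, and it assumes separate sessions can safely be opened concurrently.

```bash
# Hypothetical helper: open each URL in its own named session, in parallel.
# Sessions are named test1, test2, ... in argument order.
# AGENT_BROWSER is an assumed override variable; it defaults to the real CLI.
open_in_sessions() {
  ab=${AGENT_BROWSER:-agent-browser}
  i=1
  for url in "$@"; do
    "$ab" --session "test$i" open "$url" &   # run in the background
    i=$((i + 1))
  done
  wait   # block until every background open has finished
}
```

For example, `open_in_sessions site-a.com site-b.com` issues the same two `open` commands as above. Because the commands run concurrently, their output may arrive in any order.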

## JSON output (for parsing)

Add `--json` for machine-readable output:
```bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

## Debugging

```bash
agent-browser open example.com --headed   # Show browser window
agent-browser console                     # View console messages
agent-browser errors                      # View page errors
```