1---
2name: agent-browser
3-description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
4---
56# Browser Automation with agent-browser
78-## Quick start
910```bash
11-agent-browser open <url> # Navigate to page
12-agent-browser snapshot -i # Get interactive elements with refs
13-agent-browser click @e1 # Click element by ref
14-agent-browser fill @e2 "text" # Fill input by ref
15-agent-browser close # Close browser
16```
1718-## Core workflow
1920-1. Navigate: `agent-browser open <url>`
21-2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
22-3. Interact using refs from the snapshot
23-4. Re-snapshot after navigation or significant DOM changes
2425-## Commands
2627-### Navigation
28-```bash
29-agent-browser open <url> # Navigate to URL
30-agent-browser back # Go back
31-agent-browser forward # Go forward
32-agent-browser reload # Reload page
33-agent-browser close # Close browser
34-```
3536-### Snapshot (page analysis)
37-```bash
38-agent-browser snapshot # Full accessibility tree
39-agent-browser snapshot -i # Interactive elements only (recommended)
40-agent-browser snapshot -c # Compact output
41-agent-browser snapshot -d 3 # Limit depth to 3
42-agent-browser snapshot -s "#main" # Scope to CSS selector
43-```
4445-### Interactions (use @refs from snapshot)
46-```bash
47-agent-browser click @e1 # Click
48-agent-browser dblclick @e1 # Double-click
49-agent-browser focus @e1 # Focus element
50-agent-browser fill @e2 "text" # Clear and type
51-agent-browser type @e2 "text" # Type without clearing
52-agent-browser press Enter # Press key
53-agent-browser press Control+a # Key combination
54-agent-browser keydown Shift # Hold key down
55-agent-browser keyup Shift # Release key
56-agent-browser hover @e1 # Hover
57-agent-browser check @e1 # Check checkbox
58-agent-browser uncheck @e1 # Uncheck checkbox
59-agent-browser select @e1 "value" # Select dropdown
60-agent-browser scroll down 500 # Scroll page
61-agent-browser scrollintoview @e1 # Scroll element into view
62-agent-browser drag @e1 @e2 # Drag and drop
63-agent-browser upload @e1 file.pdf # Upload files
64-```
6566-### Get information
67-```bash
68-agent-browser get text @e1 # Get element text
69-agent-browser get html @e1 # Get innerHTML
70-agent-browser get value @e1 # Get input value
71-agent-browser get attr @e1 href # Get attribute
72-agent-browser get title # Get page title
73-agent-browser get url # Get current URL
74-agent-browser get count ".item" # Count matching elements
75-agent-browser get box @e1 # Get bounding box
76```
7778-### Check state
79-```bash
80-agent-browser is visible @e1 # Check if visible
81-agent-browser is enabled @e1 # Check if enabled
82-agent-browser is checked @e1 # Check if checked
83-```
8485-### Screenshots & PDF
86-```bash
87-agent-browser screenshot # Screenshot to stdout
88-agent-browser screenshot path.png # Save to file
89-agent-browser screenshot --full # Full page
90-agent-browser pdf output.pdf # Save as PDF
91-```
9293-### Video recording
94```bash
95-agent-browser record start ./demo.webm # Start recording (uses current URL + state)
96-agent-browser click @e1 # Perform actions
97-agent-browser record stop # Stop and save video
98-agent-browser record restart ./take2.webm # Stop current + start new recording
99```
100-Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
101102-### Wait
103-```bash
104-agent-browser wait @e1 # Wait for element
105-agent-browser wait 2000 # Wait milliseconds
106-agent-browser wait --text "Success" # Wait for text
107-agent-browser wait --url "**/dashboard" # Wait for URL pattern
108-agent-browser wait --load networkidle # Wait for network idle
109-agent-browser wait --fn "window.ready" # Wait for JS condition
110-```
111112-### Mouse control
113```bash
114-agent-browser mouse move 100 200 # Move mouse
115-agent-browser mouse down left # Press button
116-agent-browser mouse up left # Release button
117-agent-browser mouse wheel 100 # Scroll wheel
118-```
119120-### Semantic locators (alternative to refs)
121-```bash
122-agent-browser find role button click --name "Submit"
123-agent-browser find text "Sign In" click
124-agent-browser find label "Email" fill "user@test.com"
125-agent-browser find first ".item" click
126-agent-browser find nth 2 "a" text
127```
128129-### Browser settings
130```bash
131-agent-browser set viewport 1920 1080 # Set viewport size
132-agent-browser set device "iPhone 14" # Emulate device
133-agent-browser set geo 37.7749 -122.4194 # Set geolocation
134-agent-browser set offline on # Toggle offline mode
135-agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
136-agent-browser set credentials user pass # HTTP basic auth
137-agent-browser set media dark # Emulate color scheme
138```
139140-### Cookies & Storage
141```bash
142-agent-browser cookies # Get all cookies
143-agent-browser cookies set name value # Set cookie
144-agent-browser cookies clear # Clear cookies
145-agent-browser storage local # Get all localStorage
146-agent-browser storage local key # Get specific key
147-agent-browser storage local set k v # Set value
148-agent-browser storage local clear # Clear all
149-```
150151-### Network
152-```bash
153-agent-browser network route <url> # Intercept requests
154-agent-browser network route <url> --abort # Block requests
155-agent-browser network route <url> --body '{}' # Mock response
156-agent-browser network unroute [url] # Remove routes
157-agent-browser network requests # View tracked requests
158-agent-browser network requests --filter api # Filter requests
159```
160161-### Tabs & Windows
162-```bash
163-agent-browser tab # List tabs
164-agent-browser tab new [url] # New tab
165-agent-browser tab 2 # Switch to tab
166-agent-browser tab close # Close tab
167-agent-browser window new # New window
168-```
169170-### Frames
171```bash
172-agent-browser frame "#iframe" # Switch to iframe
173-agent-browser frame main # Back to main frame
174```
175176-### Dialogs
177-```bash
178-agent-browser dialog accept [text] # Accept dialog
179-agent-browser dialog dismiss # Dismiss dialog
180-```
181182-### JavaScript
183```bash
184-agent-browser eval "document.title" # Run JavaScript
185```
186187-## Example: Form submission
188189```bash
190-agent-browser open https://example.com/form
191-agent-browser snapshot -i
192-# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
193194-agent-browser fill @e1 "user@example.com"
195-agent-browser fill @e2 "password123"
196-agent-browser click @e3
197-agent-browser wait --load networkidle
198-agent-browser snapshot -i # Check result
199-```
200201-## Example: Authentication with saved state
202203-```bash
204-# Login once
205-agent-browser open https://app.example.com/login
206-agent-browser snapshot -i
207-agent-browser fill @e1 "username"
208-agent-browser fill @e2 "password"
209-agent-browser click @e3
210-agent-browser wait --url "**/dashboard"
211-agent-browser state save auth.json
212213-# Later sessions: load saved state
214-agent-browser state load auth.json
215-agent-browser open https://app.example.com/dashboard
216```
217218-## Sessions (parallel browsers)
219220-```bash
221-agent-browser --session test1 open site-a.com
222-agent-browser --session test2 open site-b.com
223-agent-browser session list
224-```
225226-## JSON output (for parsing)
227228-Add `--json` for machine-readable output:
229```bash
230-agent-browser snapshot -i --json
231-agent-browser get text @e1 --json
232```
233234-## Debugging
235236```bash
237-agent-browser open example.com --headed # Show browser window
238-agent-browser console # View console messages
239-agent-browser errors # View page errors
240-agent-browser record start ./debug.webm # Record from current page
241-agent-browser record stop # Save recording
242-agent-browser open example.com --headed # Show browser window
243-agent-browser --cdp 9222 snapshot # Connect via CDP
244-agent-browser console # View console messages
245-agent-browser console --clear # Clear console
246-agent-browser errors # View page errors
247-agent-browser errors --clear # Clear errors
248-agent-browser highlight @e1 # Highlight element
249-agent-browser trace start # Start recording trace
250-agent-browser trace stop trace.zip # Stop and save trace
251```
1---
2name: agent-browser
3+description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
4---
56# Browser Automation with agent-browser
78+## Core Workflow
9+10+Every browser automation follows this pattern:
11+12+1. **Navigate**: `agent-browser open <url>`
13+2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
14+3. **Interact**: Use refs to click, fill, select
15+4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
1617```bash
18+agent-browser open https://example.com/form
19+agent-browser snapshot -i
20+# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
21+22+agent-browser fill @e1 "user@example.com"
23+agent-browser fill @e2 "password123"
24+agent-browser click @e3
25+agent-browser wait --load networkidle
26+agent-browser snapshot -i # Check result
27```
2829+## Essential Commands
3031+```bash
32+# Navigation
33+agent-browser open <url> # Navigate (aliases: goto, navigate)
34+agent-browser close # Close browser
3536+# Snapshot
37+agent-browser snapshot -i # Interactive elements with refs (recommended)
38+agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
39+agent-browser snapshot -s "#selector" # Scope to CSS selector
4041+# Interaction (use @refs from snapshot)
42+agent-browser click @e1 # Click element
43+agent-browser fill @e2 "text" # Clear and type text
44+agent-browser type @e2 "text" # Type without clearing
45+agent-browser select @e1 "option" # Select dropdown option
46+agent-browser check @e1 # Check checkbox
47+agent-browser press Enter # Press key
48+agent-browser scroll down 500 # Scroll page
4950+# Get information
51+agent-browser get text @e1 # Get element text
52+agent-browser get url # Get current URL
53+agent-browser get title # Get page title
5455+# Wait
56+agent-browser wait @e1 # Wait for element
57+agent-browser wait --load networkidle # Wait for network idle
58+agent-browser wait --url "**/page" # Wait for URL pattern
59+agent-browser wait 2000 # Wait milliseconds
6061+# Capture
62+agent-browser screenshot # Screenshot to temp dir
63+agent-browser screenshot --full # Full page screenshot
64+agent-browser pdf output.pdf # Save as PDF
65```
6667+## Common Patterns
6869+### Form Submission
7071```bash
72+agent-browser open https://example.com/signup
73+agent-browser snapshot -i
74+agent-browser fill @e1 "Jane Doe"
75+agent-browser fill @e2 "jane@example.com"
76+agent-browser select @e3 "California"
77+agent-browser check @e4
78+agent-browser click @e5
79+agent-browser wait --load networkidle
80```
8182+### Authentication with State Persistence
8384```bash
85+# Login once and save state
86+agent-browser open https://app.example.com/login
87+agent-browser snapshot -i
88+agent-browser fill @e1 "$USERNAME"
89+agent-browser fill @e2 "$PASSWORD"
90+agent-browser click @e3
91+agent-browser wait --url "**/dashboard"
92+agent-browser state save auth.json
9394+# Reuse in future sessions
95+agent-browser state load auth.json
96+agent-browser open https://app.example.com/dashboard
97```
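The save-then-reuse pattern above can be wrapped in a small helper. This is a minimal sketch, not part of agent-browser: `ensure_login` and the `AGENT_BROWSER_BIN` override are hypothetical names, and the refs `@e1` to `@e3` are assumed to match the login form as in the example; a real script would read them from the snapshot output.

```bash
# Hypothetical helper: reuse saved auth state if present, otherwise log in and save it.
# AGENT_BROWSER_BIN is an assumed override (useful for dry runs); it defaults to agent-browser.
ensure_login() (
  ab="${AGENT_BROWSER_BIN:-agent-browser}"
  state_file="${1:-auth.json}"

  if [ -f "$state_file" ]; then
    # A saved state exists: load it and skip the login form entirely.
    "$ab" state load "$state_file"
  else
    "$ab" open https://app.example.com/login || exit 1
    "$ab" snapshot -i || exit 1
    # Assumption: @e1/@e2/@e3 are the username, password, and submit refs.
    "$ab" fill @e1 "$USERNAME" || exit 1
    "$ab" fill @e2 "$PASSWORD" || exit 1
    "$ab" click @e3 || exit 1
    "$ab" wait --url "**/dashboard" || exit 1
    "$ab" state save "$state_file"
  fi
)
```

Calling `ensure_login auth.json` before `agent-browser open https://app.example.com/dashboard` keeps the login steps out of every later script.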
9899+### Data Extraction
100+101```bash
102+agent-browser open https://example.com/products
103+agent-browser snapshot -i
104+agent-browser get text @e5 # Get specific element text
105+agent-browser get text body > page.txt # Get all page text
106+107+# JSON output for parsing
108+agent-browser snapshot -i --json
109+agent-browser get text @e1 --json
110```
111112+### Parallel Sessions
113+114```bash
115+agent-browser --session site1 open https://site-a.com
116+agent-browser --session site2 open https://site-b.com
117+118+agent-browser --session site1 snapshot -i
119+agent-browser --session site2 snapshot -i
120121+agent-browser session list
122```
123124+### Visual Browser (Debugging)
125126```bash
127+agent-browser --headed open https://example.com
128+agent-browser highlight @e1 # Highlight element
129+agent-browser record start demo.webm # Record session
130```
131132+### Local Files (PDFs, HTML)
133134```bash
135+# Open local files with file:// URLs
136+agent-browser --allow-file-access open file:///path/to/document.pdf
137+agent-browser --allow-file-access open file:///path/to/page.html
138+agent-browser screenshot output.png
139```
140141+### iOS Simulator (Mobile Safari)
142143```bash
144+# List available iOS simulators
145+agent-browser device list
146147+# Launch Safari on a specific device
148+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
149150+# Same workflow as desktop - snapshot, interact, re-snapshot
151+agent-browser -p ios snapshot -i
152+agent-browser -p ios tap @e1 # Tap (alias for click)
153+agent-browser -p ios fill @e2 "text"
154+agent-browser -p ios swipe up # Mobile-specific gesture
155156+# Take screenshot
157+agent-browser -p ios screenshot mobile.png
158159+# Close session (shuts down simulator)
160+agent-browser -p ios close
161```
162163+**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
164165+**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
166167+## Ref Lifecycle (Important)
168169+Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
170+171+- Clicking links or buttons that navigate
172+- Form submissions
173+- Dynamic content loading (dropdowns, modals)
174+175```bash
176+agent-browser click @e5 # Navigates to new page
177+agent-browser snapshot -i # MUST re-snapshot
178+agent-browser click @e1 # Use new refs
179```
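One way to make the re-snapshot rule hard to forget is to bundle the action and the fresh snapshot into one call. The sketch below is illustrative only: `act_and_resnapshot` and the `AGENT_BROWSER_BIN` override are hypothetical names, and it uses only subcommands that appear in this document.

```bash
# Hypothetical wrapper: perform one action, wait for the page to settle,
# then print a fresh snapshot so the next step never reuses stale refs.
# AGENT_BROWSER_BIN is an assumed override; it defaults to agent-browser.
act_and_resnapshot() {
  "${AGENT_BROWSER_BIN:-agent-browser}" "$@" || return 1
  "${AGENT_BROWSER_BIN:-agent-browser}" wait --load networkidle || return 1
  "${AGENT_BROWSER_BIN:-agent-browser}" snapshot -i
}
```

For example, `act_and_resnapshot click @e5` runs the click and then prints the new refs to use for the next interaction.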
180181+## Semantic Locators (Alternative to Refs)
182+183+When refs are unavailable or unreliable, use semantic locators:
184185```bash
186+agent-browser find text "Sign In" click
187+agent-browser find label "Email" fill "user@test.com"
188+agent-browser find role button click --name "Submit"
189+agent-browser find placeholder "Search" type "query"
190+agent-browser find testid "submit-btn" click
191```
1+---
2+name: web-search
3+description: Use when you need fast, headless web search or readable page content without a browser, especially when you only have curl/wget and need a deterministic URL-based endpoint
4+---
5+6+# Jina AI Search + Reader
7+8+## Overview
9+Use Jina AI's public endpoints to (1) search the web and (2) fetch readable page content as plain text/markdown via a single URL.
10+11+## When to Use
12+- You need quick web search results without an API key or browser
13+- You need readable page content from a URL for summarization or analysis
14+- You're in a headless environment and only have HTTP tools (curl/wget)
15+16+When NOT to use:
17+- You need advanced ranking, filters, or custom search parameters (use a full search API)
18+19+## Quick Reference
20+21+### Search (s.jina.ai)
22+```
23+https://s.jina.ai/YOUR_SEARCH_QUERY
24+```
25+26+### Reader (r.jina.ai)
27+```
28+https://r.jina.ai/YOUR_URL
29+```
30+31+## Implementation
32+33+### 1) Search for pages
34+```bash
35+curl "https://s.jina.ai/jina%20ai%20reader%20usage"
36+```
37+- URL-encode spaces and special characters in the query.
38+- Output returns search results with titles/snippets/links (plain text).
39+40+### 2) Fetch readable page content
41+```bash
42+curl "https://r.jina.ai/https://example.com/article"
43+```
44+- Prepend `https://r.jina.ai/` to any HTTP/HTTPS URL.
45+- Output is readable text/markdown for the target page.
46+47+### 3) Typical workflow
48+1. Use `s.jina.ai` to discover relevant links.
49+2. Use `r.jina.ai` to fetch readable content from those links.
50+51+## Common Mistakes
52+- Forgetting to URL-encode the search query, which results in malformed requests.
53+- Omitting the original URL scheme (http/https) after `r.jina.ai/`.
54+- Assuming `r.jina.ai` performs search (it only reads a specific URL).
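
The first two mistakes above can be guarded against by building the URLs with small helpers rather than by hand. This is a minimal sketch under stated assumptions: `urlencode`, `jina_search_url`, and `jina_reader_url` are hypothetical names, and the encoder percent-encodes every byte outside the unreserved set, which is stricter than strictly necessary.

```bash
# Hypothetical helper: percent-encode a string byte by byte.
# Runs in a subshell so its variables and LC_ALL do not leak into the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build a search URL with the query safely encoded.
jina_search_url() {
  printf 'https://s.jina.ai/%s\n' "$(urlencode "$1")"
}

# Build a reader URL; refuse targets that lack an http/https scheme.
jina_reader_url() {
  case $1 in
    http://*|https://*) printf 'https://r.jina.ai/%s\n' "$1" ;;
    *) echo "jina_reader_url: URL must start with http:// or https://" >&2
       return 1 ;;
  esac
}
```

Usage would look like `curl "$(jina_search_url 'jina ai reader usage')"` followed by `curl "$(jina_reader_url 'https://example.com/article')"`.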