@@ -1,251 +1,191 @@
 ---
 name: agent-browser
-description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
+description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
 ---
 
 # Browser Automation with agent-browser
 
-## Quick start
+## Core Workflow
+
+Every browser automation follows this pattern:
+
+1. **Navigate**: `agent-browser open <url>`
+2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+3. **Interact**: Use refs to click, fill, select
+4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
 
 ```bash
-agent-browser open <url> # Navigate to page
-agent-browser snapshot -i # Get interactive elements with refs
-agent-browser click @e1 # Click element by ref
-agent-browser fill @e2 "text" # Fill input by ref
-agent-browser close # Close browser
+agent-browser open https://example.com/form
+agent-browser snapshot -i
+# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
+
+agent-browser fill @e1 "user@example.com"
+agent-browser fill @e2 "password123"
+agent-browser click @e3
+agent-browser wait --load networkidle
+agent-browser snapshot -i # Check result
 ```
 
-## Core workflow
+## Essential Commands
 
-1. Navigate: `agent-browser open <url>`
-2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
-3. Interact using refs from the snapshot
-4. Re-snapshot after navigation or significant DOM changes
+```bash
+# Navigation
+agent-browser open <url> # Navigate (aliases: goto, navigate)
+agent-browser close # Close browser
 
-## Commands
+# Snapshot
+agent-browser snapshot -i # Interactive elements with refs (recommended)
+agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
+agent-browser snapshot -s "#selector" # Scope to CSS selector
 
-### Navigation
-```bash
-agent-browser open <url> # Navigate to URL
-agent-browser back # Go back
-agent-browser forward # Go forward
-agent-browser reload # Reload page
-agent-browser close # Close browser
-```
+# Interaction (use @refs from snapshot)
+agent-browser click @e1 # Click element
+agent-browser fill @e2 "text" # Clear and type text
+agent-browser type @e2 "text" # Type without clearing
+agent-browser select @e1 "option" # Select dropdown option
+agent-browser check @e1 # Check checkbox
+agent-browser press Enter # Press key
+agent-browser scroll down 500 # Scroll page
 
-### Snapshot (page analysis)
-```bash
-agent-browser snapshot # Full accessibility tree
-agent-browser snapshot -i # Interactive elements only (recommended)
-agent-browser snapshot -c # Compact output
-agent-browser snapshot -d 3 # Limit depth to 3
-agent-browser snapshot -s "#main" # Scope to CSS selector
-```
+# Get information
+agent-browser get text @e1 # Get element text
+agent-browser get url # Get current URL
+agent-browser get title # Get page title
 
-### Interactions (use @refs from snapshot)
-```bash
-agent-browser click @e1 # Click
-agent-browser dblclick @e1 # Double-click
-agent-browser focus @e1 # Focus element
-agent-browser fill @e2 "text" # Clear and type
-agent-browser type @e2 "text" # Type without clearing
-agent-browser press Enter # Press key
-agent-browser press Control+a # Key combination
-agent-browser keydown Shift # Hold key down
-agent-browser keyup Shift # Release key
-agent-browser hover @e1 # Hover
-agent-browser check @e1 # Check checkbox
-agent-browser uncheck @e1 # Uncheck checkbox
-agent-browser select @e1 "value" # Select dropdown
-agent-browser scroll down 500 # Scroll page
-agent-browser scrollintoview @e1 # Scroll element into view
-agent-browser drag @e1 @e2 # Drag and drop
-agent-browser upload @e1 file.pdf # Upload files
-```
+# Wait
+agent-browser wait @e1 # Wait for element
+agent-browser wait --load networkidle # Wait for network idle
+agent-browser wait --url "**/page" # Wait for URL pattern
+agent-browser wait 2000 # Wait milliseconds
 
-### Get information
-```bash
-agent-browser get text @e1 # Get element text
-agent-browser get html @e1 # Get innerHTML
-agent-browser get value @e1 # Get input value
-agent-browser get attr @e1 href # Get attribute
-agent-browser get title # Get page title
-agent-browser get url # Get current URL
-agent-browser get count ".item" # Count matching elements
-agent-browser get box @e1 # Get bounding box
+# Capture
+agent-browser screenshot # Screenshot to temp dir
+agent-browser screenshot --full # Full page screenshot
+agent-browser pdf output.pdf # Save as PDF
 ```
 
-### Check state
-```bash
-agent-browser is visible @e1 # Check if visible
-agent-browser is enabled @e1 # Check if enabled
-agent-browser is checked @e1 # Check if checked
-```
+## Common Patterns
 
-### Screenshots & PDF
-```bash
-agent-browser screenshot # Screenshot to stdout
-agent-browser screenshot path.png # Save to file
-agent-browser screenshot --full # Full page
-agent-browser pdf output.pdf # Save as PDF
-```
+### Form Submission
 
-### Video recording
 ```bash
-agent-browser record start ./demo.webm # Start recording (uses current URL + state)
-agent-browser click @e1 # Perform actions
-agent-browser record stop # Stop and save video
-agent-browser record restart ./take2.webm # Stop current + start new recording
+agent-browser open https://example.com/signup
+agent-browser snapshot -i
+agent-browser fill @e1 "Jane Doe"
+agent-browser fill @e2 "jane@example.com"
+agent-browser select @e3 "California"
+agent-browser check @e4
+agent-browser click @e5
+agent-browser wait --load networkidle
 ```
-Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
 
-### Wait
-```bash
-agent-browser wait @e1 # Wait for element
-agent-browser wait 2000 # Wait milliseconds
-agent-browser wait --text "Success" # Wait for text
-agent-browser wait --url "**/dashboard" # Wait for URL pattern
-agent-browser wait --load networkidle # Wait for network idle
-agent-browser wait --fn "window.ready" # Wait for JS condition
-```
+### Authentication with State Persistence
 
-### Mouse control
 ```bash
-agent-browser mouse move 100 200 # Move mouse
-agent-browser mouse down left # Press button
-agent-browser mouse up left # Release button
-agent-browser mouse wheel 100 # Scroll wheel
-```
+# Login once and save state
+agent-browser open https://app.example.com/login
+agent-browser snapshot -i
+agent-browser fill @e1 "$USERNAME"
+agent-browser fill @e2 "$PASSWORD"
+agent-browser click @e3
+agent-browser wait --url "**/dashboard"
+agent-browser state save auth.json
 
-### Semantic locators (alternative to refs)
-```bash
-agent-browser find role button click --name "Submit"
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "user@test.com"
-agent-browser find first ".item" click
-agent-browser find nth 2 "a" text
+# Reuse in future sessions
+agent-browser state load auth.json
+agent-browser open https://app.example.com/dashboard
 ```
 
-### Browser settings
+### Data Extraction
+
 ```bash
-agent-browser set viewport 1920 1080 # Set viewport size
-agent-browser set device "iPhone 14" # Emulate device
-agent-browser set geo 37.7749 -122.4194 # Set geolocation
-agent-browser set offline on # Toggle offline mode
-agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
-agent-browser set credentials user pass # HTTP basic auth
-agent-browser set media dark # Emulate color scheme
+agent-browser open https://example.com/products
+agent-browser snapshot -i
+agent-browser get text @e5 # Get specific element text
+agent-browser get text body > page.txt # Get all page text
+
+# JSON output for parsing
+agent-browser snapshot -i --json
+agent-browser get text @e1 --json
 ```
 
-### Cookies & Storage
+### Parallel Sessions
+
 ```bash
-agent-browser cookies # Get all cookies
-agent-browser cookies set name value # Set cookie
-agent-browser cookies clear # Clear cookies
-agent-browser storage local # Get all localStorage
-agent-browser storage local key # Get specific key
-agent-browser storage local set k v # Set value
-agent-browser storage local clear # Clear all
-```
+agent-browser --session site1 open https://site-a.com
+agent-browser --session site2 open https://site-b.com
+
+agent-browser --session site1 snapshot -i
+agent-browser --session site2 snapshot -i
 
-### Network
-```bash
-agent-browser network route <url> # Intercept requests
-agent-browser network route <url> --abort # Block requests
-agent-browser network route <url> --body '{}' # Mock response
-agent-browser network unroute [url] # Remove routes
-agent-browser network requests # View tracked requests
-agent-browser network requests --filter api # Filter requests
+agent-browser session list
 ```
 
-### Tabs & Windows
-```bash
-agent-browser tab # List tabs
-agent-browser tab new [url] # New tab
-agent-browser tab 2 # Switch to tab
-agent-browser tab close # Close tab
-agent-browser window new # New window
-```
+### Visual Browser (Debugging)
 
-### Frames
 ```bash
-agent-browser frame "#iframe" # Switch to iframe
-agent-browser frame main # Back to main frame
+agent-browser --headed open https://example.com
+agent-browser highlight @e1 # Highlight element
+agent-browser record start demo.webm # Record session
 ```
 
-### Dialogs
-```bash
-agent-browser dialog accept [text] # Accept dialog
-agent-browser dialog dismiss # Dismiss dialog
-```
+### Local Files (PDFs, HTML)
 
-### JavaScript
 ```bash
-agent-browser eval "document.title" # Run JavaScript
+# Open local files with file:// URLs
+agent-browser --allow-file-access open file:///path/to/document.pdf
+agent-browser --allow-file-access open file:///path/to/page.html
+agent-browser screenshot output.png
 ```
 
-## Example: Form submission
+### iOS Simulator (Mobile Safari)
 
 ```bash
-agent-browser open https://example.com/form
-agent-browser snapshot -i
-# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
+# List available iOS simulators
+agent-browser device list
 
-agent-browser fill @e1 "user@example.com"
-agent-browser fill @e2 "password123"
-agent-browser click @e3
-agent-browser wait --load networkidle
-agent-browser snapshot -i # Check result
-```
+# Launch Safari on a specific device
+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
 
-## Example: Authentication with saved state
+# Same workflow as desktop - snapshot, interact, re-snapshot
+agent-browser -p ios snapshot -i
+agent-browser -p ios tap @e1 # Tap (alias for click)
+agent-browser -p ios fill @e2 "text"
+agent-browser -p ios swipe up # Mobile-specific gesture
 
-```bash
-# Login once
-agent-browser open https://app.example.com/login
-agent-browser snapshot -i
-agent-browser fill @e1 "username"
-agent-browser fill @e2 "password"
-agent-browser click @e3
-agent-browser wait --url "**/dashboard"
-agent-browser state save auth.json
+# Take screenshot
+agent-browser -p ios screenshot mobile.png
 
-# Later sessions: load saved state
-agent-browser state load auth.json
-agent-browser open https://app.example.com/dashboard
+# Close session (shuts down simulator)
+agent-browser -p ios close
 ```
 
-## Sessions (parallel browsers)
+**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
 
-```bash
-agent-browser --session test1 open site-a.com
-agent-browser --session test2 open site-b.com
-agent-browser session list
-```
+**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
 
-## JSON output (for parsing)
+## Ref Lifecycle (Important)
 
-Add `--json` for machine-readable output:
+Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
+
+- Clicking links or buttons that navigate
+- Form submissions
+- Dynamic content loading (dropdowns, modals)
+
 ```bash
-agent-browser snapshot -i --json
-agent-browser get text @e1 --json
+agent-browser click @e5 # Navigates to new page
+agent-browser snapshot -i # MUST re-snapshot
+agent-browser click @e1 # Use new refs
 ```
 
-## Debugging
+## Semantic Locators (Alternative to Refs)
+
+When refs are unavailable or unreliable, use semantic locators:
 
 ```bash
-agent-browser open example.com --headed # Show browser window
-agent-browser console # View console messages
-agent-browser errors # View page errors
-agent-browser record start ./debug.webm # Record from current page
-agent-browser record stop # Save recording
-agent-browser open example.com --headed # Show browser window
-agent-browser --cdp 9222 snapshot # Connect via CDP
-agent-browser console # View console messages
-agent-browser console --clear # Clear console
-agent-browser errors # View page errors
-agent-browser errors --clear # Clear errors
-agent-browser highlight @e1 # Highlight element
-agent-browser trace start # Start recording trace
-agent-browser trace stop trace.zip # Stop and save trace
+agent-browser find text "Sign In" click
+agent-browser find label "Email" fill "user@test.com"
+agent-browser find role button click --name "Submit"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
 ```
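
The ref lifecycle rule added in this file (re-snapshot after any action that changes the page) could be wrapped in a small shell helper. The following is a minimal sketch and not part of the skill: the function name `click_and_resnapshot` and the `AGENT_BROWSER` override variable are hypothetical, and it relies only on the `click`, `wait --load networkidle`, and `snapshot -i` commands shown in the diff above.

```bash
# Hypothetical helper: click a ref, wait for the page to settle, then print
# a fresh snapshot so that later commands use current refs.
# AGENT_BROWSER is an assumed override (useful for dry runs); it defaults
# to the `agent-browser` binary.
click_and_resnapshot() (
  ab="${AGENT_BROWSER:-agent-browser}"

  # Action that may navigate or change the DOM.
  "$ab" click "$1" || exit 1

  # Give the new page time to finish loading.
  "$ab" wait --load networkidle || exit 1

  # Refs from the previous snapshot are now invalid; fetch new ones.
  "$ab" snapshot -i
)
```

Calling `click_and_resnapshot @e5` runs the three steps in order and stops early if the click or the wait fails. Setting `AGENT_BROWSER=echo` first prints the commands instead of running them, which makes it easy to check a script without a browser.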
@@ -0,0 +1,54 @@
+---
+name: web-search
+description: Use when you need fast, headless web search or readable page content without a browser, especially when you only have curl/wget and need a deterministic URL-based endpoint
+---
+
+# Jina AI Search + Reader
+
+## Overview
+Use Jina AI's public endpoints to (1) search the web and (2) fetch readable page content as plain text/markdown via a single URL.
+
+## When to Use
+- You need quick web search results without an API key or browser
+- You need readable page content from a URL for summarization or analysis
+- You're in a headless environment and only have HTTP tools (curl/wget)
+
+When NOT to use:
+- You need advanced ranking, filters, or custom search parameters (use a full search API)
+
+## Quick Reference
+
+### Search (s.jina.ai)
+```
+https://s.jina.ai/YOUR_SEARCH_QUERY
+```
+
+### Reader (r.jina.ai)
+```
+https://r.jina.ai/YOUR_URL
+```
+
+## Implementation
+
+### 1) Search for pages
+```bash
+curl "https://s.jina.ai/jina%20ai%20reader%20usage"
+```
+- URL-encode spaces and special characters in the query.
+- Output returns search results with titles/snippets/links (plain text).
+
+### 2) Fetch readable page content
+```bash
+curl "https://r.jina.ai/https://example.com/article"
+```
+- Prepend `https://r.jina.ai/` to any HTTP/HTTPS URL.
+- Output is readable text/markdown for the target page.
+
+### 3) Typical workflow
+1. Use `s.jina.ai` to discover relevant links.
+2. Use `r.jina.ai` to fetch readable content from those links.
+
+## Common Mistakes
+- Forgetting to URL-encode the search query, which results in malformed requests.
+- Omitting the original URL scheme (http/https) after `r.jina.ai/`.
+- Assuming `r.jina.ai` performs search (it only reads a specific URL).
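
The first two mistakes listed above (an unencoded query and a missing scheme) can be guarded against with a few shell helpers. This is an illustrative sketch only: the function names `urlencode`, `jina_search_url`, and `jina_reader_url` are hypothetical, the encoder assumes ASCII input, and the helpers only build URLs from the `s.jina.ai` and `r.jina.ai` patterns in the Quick Reference without sending any request.

```bash
# Percent-encode a string for use in a URL path (ASCII input assumed).
# The function body runs in a subshell so its variables do not leak.
urlencode() (
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character itself
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                  # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;          # anything else: %XX
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
)

# Build a search URL from a raw (unencoded) query.
jina_search_url() {
  printf 'https://s.jina.ai/%s\n' "$(urlencode "$1")"
}

# Build a reader URL, refusing targets that lack an http/https scheme.
jina_reader_url() {
  case "$1" in
    http://*|https://*) printf 'https://r.jina.ai/%s\n' "$1" ;;
    *) echo "error: URL must start with http:// or https://" >&2; return 1 ;;
  esac
}
```

With these in place, the typical workflow becomes `curl "$(jina_search_url "jina ai reader usage")"` followed by `curl "$(jina_reader_url "https://example.com/article")"` for each link worth reading.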