# Scrape.do

Scrape.do is a web scraping API that offers rotating residential, datacenter, and mobile proxies with headless browser support and session management. It bypasses anti-bot protections (e.g., Cloudflare, Akamai) and extracts data at scale in formats such as JSON and HTML.

- **Category:** ai web scraping
- **Auth:** API_KEY
- **Composio Managed App Available?** N/A
- **Tools:** 17
- **Triggers:** 0
- **Slug:** `SCRAPE_DO`
- **Version:** 20260316_00

## Tools

### Cancel Async Job

**Slug:** `SCRAPE_DO_CANCEL_ASYNC_JOB`

Tool to cancel an asynchronous scraping job. Use when you need to stop processing of pending tasks in a job. Completed tasks remain available.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `token` | string | Yes | Authentication token for Scrape.do API |
| `job_id` | string | Yes | Unique identifier of the job to cancel |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Create Async Scraping Job

**Slug:** `SCRAPE_DO_CREATE_ASYNC_JOB`

Tool to create an asynchronous scraping job with specified targets and options. Use when you need to scrape multiple URLs in parallel without waiting for results. Returns a job ID immediately for polling results later via the get job status action.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `Body` | string | No | HTTP request body for POST/PUT/PATCH requests |
| `Super` | boolean | No | Use residential/mobile proxy networks (default: false) |
| `Device` | string (`desktop`, `mobile`, `tablet`) | No | Device type to emulate when scraping. |
| `Method` | string (`GET`, `POST`, `PUT`, `PATCH`, `HEAD`, `DELETE`) | No | HTTP method for async scraping requests. |
| `Output` | string (`raw`, `markdown`) | No | Output format for scraped content. |
| `Render` | object | No | Options for headless browser rendering. |
| `GeoCode` | string | No | Country code for geo-targeting (e.g., 'us', 'gb', 'de') |
| `Headers` | object | No | Custom HTTP headers to send with requests |
| `Targets` | array | Yes | Array of target URLs to scrape. Each URL will be processed asynchronously. |
| `Timeout` | integer | No | Total request timeout in milliseconds (default: 60000) |
| `SessionID` | string | No | Sticky session ID to reuse same IP address across requests |
| `SetCookies` | string | No | Cookies to include with the request |
| `WebhookURL` | string | No | Webhook URL to send results to when job completes |
| `DisableRetry` | boolean | No | Disable automatic retry mechanism (default: false) |
| `RetryTimeout` | integer | No | Retry timeout per request in milliseconds (default: 15000) |
| `ForwardHeaders` | boolean | No | Use only provided headers, discard default headers (default: false) |
| `WebhookHeaders` | object | No | Additional headers to send with webhook notification |
| `RegionalGeoCode` | string | No | Regional code for more specific geo-targeting |
| `DisableRedirection` | boolean | No | Disable following HTTP redirects (default: false) |
| `TransparentResponse` | boolean | No | Return raw target website response without processing (default: false) |
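
For illustration, a minimal sketch of the request body this tool assembles. The field names follow the table above; the Async API endpoint path and exact payload shape are assumptions, not verified against the live API:

```python
import json

def build_async_job_payload(targets, geo_code=None, render=False, webhook_url=None):
    """Assemble an async-job request body from the documented parameters."""
    if not targets:
        raise ValueError("Targets is required and must contain at least one URL")
    payload = {"Targets": list(targets)}
    if geo_code:
        payload["GeoCode"] = geo_code
    if render:
        payload["Render"] = {}  # rendering-options object; empty enables defaults
    if webhook_url:
        payload["WebhookURL"] = webhook_url
    return json.dumps(payload)

body = build_async_job_payload(
    ["https://example.com/a", "https://example.com/b"],
    geo_code="us",
    webhook_url="https://my-app.example/hook",  # hypothetical callback URL
)
```

The returned job ID would then be polled via `SCRAPE_DO_GET_ASYNC_JOB`, or results pushed to the webhook if one is set.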

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Account Information

**Slug:** `SCRAPE_DO_GET_ACCOUNT_INFO`

Retrieves account information and usage statistics from Scrape.do. This action makes a GET request to the Scrape.do info endpoint to fetch:

- Subscription status
- Concurrent request limits and usage
- Monthly request limits and remaining requests
- Real-time usage statistics

Rate limit: maximum 10 requests per minute. Use the remaining-request counts to monitor credits proactively, as different scraping operations (e.g., rendered-page requests) consume varying credit amounts and exhaustion mid-run causes failures.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `token` | string | Yes | Authentication token for Scrape.do API |
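
Underneath, this is a simple GET with the token as a query parameter. A sketch (verify the exact info-endpoint path against the current Scrape.do docs):

```python
from urllib.parse import urlencode

def account_info_url(token: str) -> str:
    """Build the account-info request URL; send it with any HTTP client."""
    return f"https://api.scrape.do/info?{urlencode({'token': token})}"

url = account_info_url("YOUR_TOKEN")  # placeholder token
# import requests; info = requests.get(url).json()  # actual call
```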

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Amazon Product Offers

**Slug:** `SCRAPE_DO_GET_AMAZON_OFFERS`

Get all seller offers for any Amazon product. Retrieves every seller listing including pricing, shipping costs, seller information, and Buy Box status in structured JSON format. Use when you need to compare prices across multiple sellers or find the best deal for a specific product.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `asin` | string | Yes | Amazon Standard Identification Number (10-character product ID) |
| `geocode` | string | Yes | Country code for Amazon marketplace (e.g., us, gb, de, jp, fr, es, it, ca) |
| `zipcode` | string | Yes | Postal/ZIP code formatted according to country requirements |
| `super_mode` | boolean | No | Enable residential/mobile proxies for higher success rates. Costs 10x credits |
| `include_html` | boolean | No | When true, includes the full raw HTML alongside structured JSON |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Amazon product details

**Slug:** `SCRAPE_DO_GET_AMAZON_PRODUCT`

Extract structured product data from Amazon product detail pages (PDP). Returns comprehensive product information including title, pricing, ratings, images, best seller rankings, and technical specifications in JSON format.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `asin` | string | Yes | Amazon Standard Identification Number (10-character product ID) |
| `geocode` | string | Yes | Country code (e.g., us, gb, de, jp, fr, ca) |
| `zipcode` | string | Yes | Postal code formatted according to country requirements |
| `language` | string | No | Language code in ISO 639-1 format (e.g., EN, DE, FR) |
| `super_mode` | boolean | No | Enable residential/mobile proxies for higher success rates. Costs 10x credits |
| `include_html` | boolean | No | When true, includes the full raw HTML alongside structured JSON |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Amazon raw HTML

**Slug:** `SCRAPE_DO_GET_AMAZON_RAW_HTML`

Tool to get raw HTML from any Amazon page with ZIP code geo-targeting. Use when you need complete unprocessed HTML source from Amazon URLs with location-based targeting. Ideal for scraping pages not covered by other structured endpoints.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | Full Amazon URL to scrape (e.g., https://www.amazon.com/dp/B08N5WRWNW) |
| `super` | boolean | No | Enable residential/mobile proxies for higher success rates. Costs 10x credits. Default is false. |
| `output` | string | No | Output format - must be 'html' for raw HTML content |
| `geocode` | string | Yes | Country code for geo-targeting (e.g., us, gb, de, jp) |
| `timeout` | integer | No | Request timeout in milliseconds |
| `zipcode` | string | Yes | Postal code formatted according to country requirements (e.g., 10001 for US, SW1A 1AA for UK) |
| `language` | string | No | Language code in ISO 639-1 format (e.g., EN, DE, FR, ES) |
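
A sketch of how these inputs might be assembled into an API call. The parameter casing (`geoCode`, `zipCode`) is an assumption; the key point is that the target Amazon URL must itself be percent-encoded so its query string is not confused with Scrape.do's:

```python
from urllib.parse import urlencode

def amazon_raw_html_url(token, target_url, geocode, zipcode, super_mode=False):
    """Build the request URL; urlencode percent-encodes the target URL value."""
    params = {
        "token": token,
        "url": target_url,
        "geoCode": geocode,
        "zipCode": zipcode,  # assumed parameter name
    }
    if super_mode:
        params["super"] = "true"  # 10x credits, residential/mobile proxies
    return "https://api.scrape.do/?" + urlencode(params)

url = amazon_raw_html_url("YOUR_TOKEN", "https://www.amazon.com/dp/B08N5WRWNW",
                          "us", "10001")
```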

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Async API Account Information

**Slug:** `SCRAPE_DO_GET_ASYNC_ACCOUNT_INFO`

Tool to get account information for the Async API including concurrency limits and usage statistics. Use when you need to check available concurrency slots, active jobs, or remaining credits for Async API operations.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Async Job Details

**Slug:** `SCRAPE_DO_GET_ASYNC_JOB`

Tool to retrieve details and status of a specific asynchronous scraping job. Use when you need to check the progress, status, or results of a previously created async job. Returns job metadata including creation time, completion time, task counts, and detailed task list.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `jobID` | string | Yes | Unique identifier of the job to retrieve |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Async Task Result

**Slug:** `SCRAPE_DO_GET_ASYNC_TASK`

Tool to retrieve the result of a specific task within an asynchronous job. Returns the scraped content for that particular URL. Use when you need to check the status and result of a previously submitted async scraping task.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `token` | string | Yes | Authentication token for Scrape.do API |
| `job_id` | string | Yes | Unique identifier of the job |
| `task_id` | string | Yes | Unique identifier of the task within the job |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape webpage using scrape.do

**Slug:** `SCRAPE_DO_GET_PAGE`

A tool to scrape web pages using scrape.do's API service. Makes a basic GET request to fetch webpage content while handling anti-bot protections and proxy rotation automatically. Does not execute JavaScript by default — pages requiring client-side rendering (SPAs, dynamically loaded content) will return incomplete HTML; use SCRAPE_DO_GET_RENDER_PAGE or set render=true for those cases.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | Target web page URL to scrape |
| `super` | boolean | No | Use residential & mobile proxy networks |
| `width` | integer | No | Browser viewport width |
| `device` | string (`desktop`, `mobile`, `tablet`) | No | Device type to emulate |
| `height` | integer | No | Browser viewport height |
| `output` | string (`raw`, `markdown`) | No | Output format (raw or markdown) |
| `render` | boolean | No | Enable headless browser rendering. Use for JS-heavy pages, SPAs, or sites with anti-bot JS challenges; increase `timeout` when enabling so the full page loads before cutoff. |
| `timeout` | integer | No | Maximum request timeout in ms (5000-120000) |
| `geo_code` | string | No | Choose country for target web page (e.g. 'us', 'gb') |
| `return_json` | boolean | No | Return network requests in JSON format |
| `set_cookies` | string | No | Set cookies for target web page |
| `extra_headers` | boolean | No | Add/modify headers |
| `retry_timeout` | integer | No | Maximum retry timeout in ms (5000-55000) |
| `custom_headers` | boolean | No | Handle all request headers |
| `block_resources` | boolean | No | Block CSS and image sources |
| `disable_redirection` | boolean | No | Disable request redirection |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### List Asynchronous Scraping Jobs

**Slug:** `SCRAPE_DO_LIST_ASYNC_JOBS`

Tool to list all asynchronous scraping jobs. Returns paginated list of jobs with their status and metadata. Use when you need to retrieve job history or monitor job statuses. Supports pagination with up to 100 jobs per page.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `page` | integer | No | Page number for pagination (default: 1, minimum: 1) |
| `page_size` | integer | No | Number of jobs per page (default: 10, maximum: 100) |
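
Since pages cap at 100 jobs, fetching a full job history means iterating pages. A pagination sketch (the endpoint path and `pageSize` parameter name are assumptions):

```python
def job_list_urls(base_url, total_jobs, page_size=100):
    """Yield paginated request URLs that cover every job."""
    page_size = min(page_size, 100)       # API maximum per the table above
    pages = -(-total_jobs // page_size)   # ceiling division
    for page in range(1, pages + 1):
        yield f"{base_url}?page={page}&pageSize={page_size}"

urls = list(job_list_urls("https://async.scrape.do/jobs", 250))  # assumed endpoint
```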

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Use Scrape.do Proxy Mode

**Slug:** `SCRAPE_DO_PROXY_MODE`

This tool implements Scrape.do's Proxy Mode, which routes requests through their proxy server as an alternative to the API endpoint. It supports JavaScript-rendered pages, geolocation-based routing, device simulation, and built-in anti-bot and retry mechanisms.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The target URL to scrape |
| `device` | string | No | Device type to simulate (desktop, mobile, tablet) |
| `render` | boolean | No | Enable/disable JavaScript rendering |
| `geo_code` | string | No | Geographic location for the request (e.g., 'us', 'gb') |
| `custom_headers` | boolean | No | Whether to forward custom headers to the target website |
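
In proxy mode, scraping options are typically encoded into the proxy credentials rather than a query string. The sketch below assumes the common pattern of token-as-username and options-as-password on `proxy.scrape.do:8080`; treat the host, port, and encoding as assumptions to verify against the Scrape.do docs:

```python
def proxy_url(token, render=False, geo_code=None):
    """Build a proxy connection string with options encoded as the password."""
    options = []
    if render:
        options.append("render=true")
    if geo_code:
        options.append(f"geoCode={geo_code}")
    password = "&".join(options) or "-"  # placeholder when no options set
    return f"http://{token}:{password}@proxy.scrape.do:8080"

proxies = {"http": proxy_url("YOUR_TOKEN", render=True, geo_code="us"),
           "https": proxy_url("YOUR_TOKEN", render=True, geo_code="us")}
# import requests; requests.get("https://example.com", proxies=proxies)  # actual call
```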

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape URL using POST method

**Slug:** `SCRAPE_DO_SCRAPE_URL_POST`

Tool to scrape web pages using POST method via scrape.do API. Use when you need to send POST requests to target websites with custom request body data. Supports all parameters from GET endpoint plus request body customization for POST/PUT/PATCH methods.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | Target web page URL to scrape with POST request |
| `body` | string | No | HTTP request body for POST request. Can be JSON string, form data, or plain text |
| `super` | boolean | No | Enable residential/mobile proxies. Costs 10x credits |
| `device` | string (`desktop`, `mobile`, `tablet`) | No | Device type to emulate. |
| `render` | boolean | No | Enable JavaScript rendering with headless browser |
| `geoCode` | string | No | Country code for geo-targeting (e.g. 'us', 'gb', 'de') |
| `timeout` | integer | No | Total request timeout in milliseconds (5000-120000) |
| `sessionId` | string | No | Sticky session ID to reuse same IP address across multiple requests |
| `setCookies` | string | No | Cookies to include with the request (format: key1=value1; key2=value2) |
| `customHeaders` | boolean | No | Enable sending custom headers with the request |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Search Amazon products

**Slug:** `SCRAPE_DO_SEARCH_AMAZON`

Tool to search Amazon and scrape product listings with structured results. Performs keyword searches and returns structured product data including titles, prices, ratings, Prime status, sponsored flags, and position rankings in JSON format. Use when you need to search for products on Amazon marketplace or gather product information from search results.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `page` | integer | No | Page number for pagination (default: 1) |
| `super` | boolean | No | Enable residential/mobile proxies for higher success rates. Costs 10x credits (default: false) |
| `geocode` | string | Yes | Country code for Amazon marketplace (e.g., us, gb, de, jp, ca, fr, it, es, in) |
| `keyword` | string | Yes | Search query term (will be URL-encoded automatically) |
| `zipcode` | string | Yes | Postal/ZIP code formatted according to country requirements (e.g., 10001 for US, SW1A 1AA for UK) |
| `language` | string | No | Language code in ISO 639-1 format (e.g., EN, DE, FR, ES) |
| `include_html` | boolean | No | When true, includes the full raw HTML alongside structured JSON (default: false) |
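
The table notes that the keyword is URL-encoded automatically. A sketch of what that encoding looks like (spaces become `+`, reserved characters are percent-encoded):

```python
from urllib.parse import quote_plus

def encode_keyword(keyword: str) -> str:
    """Search keywords are URL-encoded before being sent to the search endpoint."""
    return quote_plus(keyword)

encoded = encode_keyword("wireless noise-cancelling headphones")
```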

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Block specific URLs during scraping

**Slug:** `SCRAPE_DO_SET_BLOCK_URLS`

This tool blocks specific URLs during the scraping process. It is particularly useful for excluding unwanted resources such as analytics scripts and advertisements that slow scraping or interfere with results. URL patterns give granular control, improving scraping performance and reducing tracking exposure.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `urls` | array | Yes | List of URL patterns to block during scraping. Can be full URLs or patterns. |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Set Regional Geolocation for Scraping

**Slug:** `SCRAPE_DO_SET_REGIONAL_GEO_CODE`

This tool sets broader geographic targeting by specifying a region code instead of a specific country code. This is useful when you want to scrape content from an entire region rather than a single country. Note that this feature requires super mode to be enabled and is only available on Business Plan or higher subscriptions.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The target URL to scrape with the specified regional geo code |
| `regional_geo_code` | string (`europe`, `asia`, `africa`, `oceania`, `northamerica`, `southamerica`) | Yes | The region code to target for scraping requests |
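
Because regional targeting requires super mode, a client would force that flag on whenever a region is set. A sketch (the `regionalGeoCode` query-parameter name is an assumption):

```python
from urllib.parse import urlencode

VALID_REGIONS = {"europe", "asia", "africa", "oceania",
                 "northamerica", "southamerica"}

def regional_params(token, url, region):
    """Build a query string with regional targeting; super mode is forced on."""
    if region not in VALID_REGIONS:
        raise ValueError(f"unknown region: {region}")
    return urlencode({"token": token, "url": url,
                      "super": "true",               # required for regional targeting
                      "regionalGeoCode": region})    # assumed parameter name

qs = regional_params("YOUR_TOKEN", "https://example.com", "europe")
```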

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |
