# ScrapingBee

ScrapingBee is a web scraping API that handles headless browsers and proxy rotation, allowing developers to extract HTML from any website in a single API call.

- **Category:** ai web scraping
- **Auth:** API_KEY
- **Composio Managed App Available?** N/A
- **Tools:** 5
- **Triggers:** 0
- **Slug:** `SCRAPINGBEE`
- **Version:** 20260316_00

## Tools

### ScrapingBee Data Extraction

**Slug:** `SCRAPINGBEE_DATA_EXTRACTION`

Tool to extract structured data from a webpage using CSS or XPath selectors, built on ScrapingBee's `extract_rules` feature.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The webpage URL to extract data from. |
| `wait` | integer | No | Seconds to wait before extraction (for dynamic content). |
| `device` | string ("desktop" \| "mobile") | No | Emulate device type (desktop or mobile). |
| `api_key` | string | Yes | Your ScrapingBee API key. |
| `extractor` | object | Yes | JSON object defining fields to extract and their CSS/XPath selectors. For nested selectors, use an object with 'selector' and optional 'type' keys. Misaligned or invalid selectors silently drop fields with no error, so verify that each selector matches the target DOM before large-scale use. |
| `javascript` | boolean | No | Whether to render JavaScript before extraction. |
| `country_code` | string | No | Two-letter country code for proxy geolocation (e.g., 'us', 'de'). |
| `premium_proxy` | boolean | No | Use premium proxy for higher reliability. |
| `block_resources` | boolean | No | Block images, CSS, and resources to speed up extraction. |
| `forward_headers` | object | No | Custom HTTP headers to forward to the target website. Provide as a dict, e.g., {'Accept-Language': 'en-US'}. Headers will be prefixed with 'Spb-' and forwarded to the target. |

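Because invalid selectors drop fields silently, it can help to compare the keys you asked for with the keys that came back. The following is a minimal sketch, assuming the tool returns the extracted fields as a JSON object in `data` (the output table only guarantees a string). The helper names `build_extraction_args` and `missing_fields`, the example URL, and the sample response are hypothetical.

```python
import json


def build_extraction_args(url, api_key, extractor, javascript=False):
    """Assemble an arguments dict for SCRAPINGBEE_DATA_EXTRACTION."""
    return {
        "url": url,
        "api_key": api_key,
        "extractor": extractor,
        "javascript": javascript,
    }


def missing_fields(extractor, data):
    """Return extractor keys that are absent or empty in the result.

    Assumption: `data` is a JSON object (or its string form) keyed by
    the same field names used in `extractor`.
    """
    result = json.loads(data) if isinstance(data, str) else data
    return sorted(k for k in extractor if result.get(k) in (None, "", []))


# A flat selector and a nested selector using 'selector' and 'type'.
extractor = {
    "title": "h1",
    "links": {"selector": "a", "type": "list"},
}
args = build_extraction_args("https://example.com", "YOUR_API_KEY", extractor)

# Hypothetical response in which the 'links' selector matched nothing.
sample = '{"title": "Example Domain"}'
print(missing_fields(extractor, sample))  # ['links']
```

Running a check like this on a handful of pages before a large crawl surfaces selectors that no longer match the target DOM.
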
#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### ScrapingBee HTML Fetch

**Slug:** `SCRAPINGBEE_HTML_FETCH`

Tool to fetch HTML or a screenshot via the ScrapingBee HTML API. Use when you need page markup or an image, with optional JS rendering and resource controls. For sites protected by anti-bot measures or CAPTCHAs (e.g., Cloudflare), combine `render_js=true` with `premium_proxy=true` or `stealth_proxy=true` to reduce the chance of being blocked.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The URL to scrape. |
| `wait` | integer | No | Milliseconds to wait before returning content. |
| `retry` | integer | No | Number of retries on request failure. |
| `device` | string ("desktop" \| "mobile") | No | Device type to emulate ('desktop' or 'mobile'). |
| `cookies` | string | No | Cookies to send in requests (HTTP header string). |
| `wait_for` | string | No | CSS selector to wait for before returning content. |
| `block_ads` | boolean | No | Block ads and tracking scripts. |
| `render_js` | boolean | No | Render JavaScript before returning HTML. Required for client-side rendered pages where dynamic data is absent in raw HTML. |
| `js_snippet` | string | No | JavaScript snippet to execute before returning content. |
| `screenshot` | boolean | No | Return screenshot as base64-encoded PNG. |
| `js_scenario` | string | No | JSON scenario for custom headless browser actions. |
| `country_code` | string | No | Two-letter country code for geolocation (e.g., 'us'). |
| `extract_rules` | string | No | Extraction rules as a JSON string mapping field names to CSS or XPath selectors. |
| `premium_proxy` | boolean | No | Use premium proxy for scraping. |
| `stealth_proxy` | boolean | No | Use stealth (undetectable) proxy mode. |
| `block_resources` | boolean | No | Block images and CSS resources on the page to speed up scraping. |
| `screenshot_selector` | string | No | CSS selector of element to screenshot. |
| `screenshot_full_page` | boolean | No | Capture full-page screenshot instead of only viewport. |

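The guidance on protected sites can be sketched as a small argument builder. This is an illustration only: `build_fetch_args` is a hypothetical helper, and the `instructions` shape used for `js_scenario` is an assumption about ScrapingBee's scenario format, not something this reference specifies.

```python
import json


def build_fetch_args(url, protected=False, stealth=False,
                     wait_for=None, scenario=None):
    """Assemble an arguments dict for SCRAPINGBEE_HTML_FETCH."""
    args = {"url": url}
    if protected:
        # Protected sites: render JS and pick one of the two proxy modes.
        args["render_js"] = True
        if stealth:
            args["stealth_proxy"] = True
        else:
            args["premium_proxy"] = True
    if wait_for:
        args["wait_for"] = wait_for
    if scenario is not None:
        # js_scenario is typed as a string, so serialize the object.
        args["js_scenario"] = json.dumps(scenario)
    return args


# Assumed scenario shape; check ScrapingBee's documentation before use.
scenario = {"instructions": [{"wait": 1000}]}
fetch_args = build_fetch_args(
    "https://example.com",
    protected=True,
    wait_for="#content",
    scenario=scenario,
)
print(fetch_args)
```

Keeping the proxy choice in one place avoids sending both `premium_proxy` and `stealth_proxy` by accident.
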
#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### ScrapingBee Proxy Mode

**Slug:** `SCRAPINGBEE_SCRAPING_BEE_PROXY_MODE`

Tool to fetch web content via ScrapingBee's Proxy Mode. Use when you need to route requests through ScrapingBee proxies with optional JS rendering and resource blocking.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The target URL to scrape through ScrapingBee Proxy Mode. |
| `cookies` | object | No | Cookies to send with the request as a key-value mapping. |
| `headers` | object | No | Additional HTTP headers to forward to the target site. Each header will be prefixed with 'Spb-' and forwarded when forward_headers is enabled. |
| `timeout` | integer | No | Request timeout in milliseconds. |
| `block_ads` | boolean | No | Block ads and tracking scripts to speed up scraping. |
| `render_js` | boolean | No | Enable JavaScript rendering before returning content. |
| `session_id` | integer | No | Session identifier (integer) to keep the same IP for multiple requests. Use the same number to maintain consistent IP across requests. |
| `js_scenario` | string | No | Custom JavaScript scenario name for advanced interactions. |
| `country_code` | string ("us" \| "de" \| "fr" \| "uk" \| "ca" \| "it" \| "es") | No | Two-letter country code for geolocated proxy (e.g., 'us', 'fr'). |
| `premium_proxy` | boolean | No | Use premium proxies for higher reliability. |
| `stealth_proxy` | boolean | No | Use stealth proxy mode for extra undetectability. |
| `block_resources` | boolean | No | Block images and CSS resources to speed up scraping. Only relevant when render_js is enabled. |
| `forward_headers` | boolean | No | Forward original request headers to the target site. |

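Two details in the table above are easy to miss: `headers` are only forwarded when `forward_headers` is enabled, and `country_code` accepts a fixed set of values. The sketch below encodes both, along with the integer requirement on `session_id`. `build_proxy_args` and `ALLOWED_COUNTRIES` are hypothetical names introduced for illustration.

```python
# Values copied from the country_code row of the parameter table.
ALLOWED_COUNTRIES = {"us", "de", "fr", "uk", "ca", "it", "es"}


def build_proxy_args(url, headers=None, session_id=None, country_code=None):
    """Assemble an arguments dict for SCRAPINGBEE_SCRAPING_BEE_PROXY_MODE."""
    args = {"url": url}
    if headers:
        # Pass header names unprefixed; the table states the tool adds 'Spb-'.
        args["headers"] = dict(headers)
        args["forward_headers"] = True
    if session_id is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(session_id, int) or isinstance(session_id, bool):
            raise TypeError("session_id must be an integer")
        args["session_id"] = session_id
    if country_code is not None:
        code = country_code.lower()
        if code not in ALLOWED_COUNTRIES:
            raise ValueError(f"unsupported country_code: {country_code!r}")
        args["country_code"] = code
    return args


# Reusing the same session_id keeps the same IP across both requests.
first = build_proxy_args("https://example.com/login", session_id=42)
second = build_proxy_args(
    "https://example.com/account",
    headers={"Accept-Language": "en-US"},
    session_id=42,
    country_code="US",
)
print(second)
```

Validating on the client side turns a rejected request into an immediate, readable exception.
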
#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### ScrapingBee Stealth Proxy

**Slug:** `SCRAPINGBEE_STEALTH_PROXY`

Tool to perform stealth scraping via ScrapingBee's Stealth Proxy mode. Use when you encounter anti-bot measures requiring undetectable requests.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The URL of the webpage to retrieve using stealth proxy. |
| `wait` | integer | No | Wait time in milliseconds before returning the response. |
| `device` | string ("desktop" \| "mobile") | No | Device type to emulate during rendering. Options: 'desktop' or 'mobile'. |
| `cookies` | string | No | Custom cookies in semicolon-separated format: 'name1=value1;name2=value2'. |
| `js_render` | boolean | No | Render JavaScript on the page before returning the response. |
| `country_code` | string | No | Two-letter country code for proxy geolocation (e.g., 'us', 'de'). |
| `extract_rules` | string | No | Extraction rules in JSON string for structured data. |
| `premium_proxy` | boolean | No | Use premium proxies for higher reliability. |
| `stealth_proxy` | boolean | No | Enable stealth proxy mode. Use when the target site blocks bots. |
| `block_resources` | boolean | No | Block images, styles, and fonts for faster loads. |
| `forward_headers` | boolean | No | Forward original request headers from the browser. |
| `return_page_source` | boolean | No | Return the raw page source instead of text. |

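This tool takes `cookies` as a semicolon-separated string and names its rendering flag `js_render`, whereas HTML Fetch uses `render_js`. A minimal sketch of building the arguments from a plain dict follows; `cookie_string` and `build_stealth_args` are hypothetical helpers, not part of the toolkit.

```python
def cookie_string(cookies):
    """Join a name/value mapping into 'name1=value1;name2=value2'."""
    parts = []
    for name, value in cookies.items():
        # ';' and '=' are separators in this format, so reject them.
        if ";" in name or "=" in name or ";" in str(value):
            raise ValueError(f"cookie {name!r} contains a reserved character")
        parts.append(f"{name}={value}")
    return ";".join(parts)


def build_stealth_args(url, cookies=None, render=True):
    """Assemble an arguments dict for SCRAPINGBEE_STEALTH_PROXY."""
    # Note the parameter name: js_render here, render_js in HTML Fetch.
    args = {"url": url, "stealth_proxy": True, "js_render": render}
    if cookies:
        args["cookies"] = cookie_string(cookies)
    return args


stealth_args = build_stealth_args(
    "https://example.com",
    cookies={"session": "abc123", "lang": "en"},
)
print(stealth_args["cookies"])  # session=abc123;lang=en
```
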
#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### ScrapingBee Usage Stats

**Slug:** `SCRAPINGBEE_USAGE_STATS`

Tool to retrieve usage statistics for your ScrapingBee account. Use when you need to monitor remaining credits and request count.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

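All five tools share the same three output fields, so one helper can unwrap any of their results. The sketch below assumes the result arrives as a dict with `data`, `error`, and `successful` keys, and that `data` may or may not be JSON. `ToolError` and `unwrap` are hypothetical names.

```python
import json


class ToolError(RuntimeError):
    """Raised when a tool result reports successful == False."""


def unwrap(result):
    """Return the payload of a tool result, or raise on failure."""
    if not result.get("successful"):
        raise ToolError(result.get("error") or "unknown error")
    data = result.get("data")
    if isinstance(data, str):
        try:
            # Structured results (e.g. extraction output) parse as JSON.
            return json.loads(data)
        except json.JSONDecodeError:
            # Raw HTML and other plain text are returned unchanged.
            return data
    return data


print(unwrap({"successful": True, "data": '{"title": "Example"}'}))
print(unwrap({"successful": True, "data": "<html></html>"}))
```

Checking `successful` first keeps failures from being mistaken for an empty page or an empty extraction.
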