# Parsera

Parsera is a lightweight Python library for scraping websites using large language models (LLMs).

- **Category:** ai web scraping
- **Auth:** API_KEY
- **Composio Managed App Available?** N/A
- **Tools:** 14
- **Triggers:** 0
- **Slug:** `PARSERA`
- **Version:** 20260312_00

## Tools

### Create Scraper

**Slug:** `PARSERA_CREATE_SCRAPER`

Tool to create a new empty scraper for your account. Returns a `scraper_id` that can be used with the generate endpoint to produce scraping code.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Delete Scraper

**Slug:** `PARSERA_DELETE_SCRAPER`

Tool to delete an existing scraper by its ID. Use when you need to remove a scraper that was created through the /v1/scrapers/new endpoint.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `scraper_id` | string | Yes | Unique identifier of the scraper to delete. Only scrapers created through the /v1/scrapers/new endpoint can be deleted. |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Extract Data from Webpage

**Slug:** `PARSERA_EXTRACT_DATA`

Tool to perform LLM-powered data extraction from a live webpage URL with specified attributes. Use when you need to extract structured data from web pages based on field descriptions.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The webpage URL to extract data from. |
| `mode` | string ("standard" \| "precision") | No | Extraction mode. 'standard' mode performs efficient extraction. 'precision' mode minimizes page reduction to detect data hidden in HTML tags but uses more credits. |
| `prompt` | string | No | Additional scraping instructions to guide the extraction process. |
| `cookies` | array | No | Authentication or session cookies for extraction. Each cookie should be a dictionary with cookie properties. |
| `attributes` | object | Yes | Map of field names and descriptions to extract. Supports two formats: 1) Simple format: {'field_name': 'description'}, 2) Typed format: {'field_name': {'description': '...', 'type': 'string\|integer\|number\|bool\|list\|object\|any'}}. NOTE: You cannot mix both formats in a single request - use either all simple strings or all typed objects. |
| `proxy_country` | string | No | Geographic location for proxy routing. Recommended to set as pages may be unavailable from certain locations. Use GET /v1/proxy-countries to retrieve supported countries. |
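The two `attributes` formats above can be sketched as follows. This is a minimal illustration, not a live API call; the field names and URL are examples, and the no-mixing rule is expressed as a small helper:

```python
# Hypothetical request payloads for PARSERA_EXTRACT_DATA, illustrating the
# two supported attribute formats. Field names and the URL are examples only.

# Simple format: every value is a plain description string.
simple_request = {
    "url": "https://news.ycombinator.com",
    "mode": "standard",
    "attributes": {
        "title": "Title of the story",
        "points": "Number of upvotes",
    },
}

# Typed format: every value is an object with a description and a type.
typed_request = {
    "url": "https://news.ycombinator.com",
    "mode": "precision",
    "attributes": {
        "title": {"description": "Title of the story", "type": "string"},
        "points": {"description": "Number of upvotes", "type": "integer"},
    },
}

def uses_one_format(attributes: dict) -> bool:
    """Return True if all attribute values share a single format (no mixing)."""
    kinds = {type(v) for v in attributes.values()}
    return kinds <= {str} or kinds <= {dict}
```

Note that a request mixing plain strings and typed objects would fail the helper's check, matching the constraint stated in the table.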

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Extract Markdown

**Slug:** `PARSERA_EXTRACT_MARKDOWN`

Tool to extract markdown content from a file or URL.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | No | URL of the page to extract markdown from. |
| `file_path` | string | No | Local path to the document file to be uploaded for extraction. |
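Since both parameters are optional, a request presumably supplies exactly one source. A minimal sketch, with an assumed URL and file path:

```python
# Hypothetical payloads for PARSERA_EXTRACT_MARKDOWN. The tool accepts either
# a remote URL or a local file; supplying exactly one source is assumed here.

url_request = {"url": "https://example.com/docs/intro"}
file_request = {"file_path": "/tmp/report.pdf"}

def has_single_source(payload: dict) -> bool:
    """Check that exactly one of `url` / `file_path` is set."""
    return sum(k in payload for k in ("url", "file_path")) == 1
```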

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get LLM Specifications

**Slug:** `PARSERA_GET_LLM_SPECS`

Tool to retrieve standardized LLM capabilities and pricing specifications. Use to get up-to-date information about models from various providers.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Get Proxy Countries

**Slug:** `PARSERA_GET_PROXY_COUNTRIES`

Tool to retrieve the list of available proxy countries for web scraping requests. Use when you need to know which countries are supported for proxy-based scraping.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Health Check

**Slug:** `PARSERA_HEALTH_CHECK`

Tool to verify API availability and operational status. Use to check if the Parsera service is accessible before making other API calls.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### List Agents

**Slug:** `PARSERA_LIST_AGENTS`

Tool to retrieve all available agents for the authenticated user. Use when you need to list agents that can be used for scraping tasks.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### List Scrapers

**Slug:** `PARSERA_LIST_SCRAPERS`

Tool to list all templates and legacy scrapers for the authenticated user. Use when you need to retrieve available scraper configurations.

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Parse Content (Enhanced)

**Slug:** `PARSERA_PARSE_CONTENT2`

Tool to extract structured data from raw HTML or text content using AI with advanced options. Use when you have content already loaded and need to extract specific fields with pagination or different extraction modes.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `mode` | string ("standard" \| "precision" \| "code") | No | Extraction mode selection. 'standard' for regular extraction, 'precision' for enhanced data discovery in HTML, 'code' for code extraction. |
| `prompt` | string | No | Additional instructions or context for the extraction process. |
| `content` | string | Yes | Raw HTML or plain text content to parse and extract data from. |
| `max_pages` | integer | No | Maximum number of pages to parse when pagination is enabled. |
| `attributes` | string | Yes | Field mapping defining what data to extract. Can be formatted as: (1) Object with name-description pairs: {'title': 'News title'}, (2) Object with detailed schema: {'title': {'description': 'Article title', 'type': 'string'}}, or (3) Array of Attribute objects with name, description, and type fields. Supported types: string, integer, number, bool, list, object, any, image. Default type: any. |
| `enable_pagination` | boolean | No | Enable multi-page parsing. When enabled, the parser will attempt to extract data across multiple pages. |
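The three `attributes` formats accepted by this tool can be sketched side by side. The field names, content, and page counts below are illustrative, not part of the API:

```python
# Hypothetical `attributes` values for PARSERA_PARSE_CONTENT2, one per
# supported format. Field names are examples only.

# (1) Object with name-description pairs.
by_description = {"title": "News title", "author": "Author name"}

# (2) Object with detailed schema entries.
by_schema = {
    "title": {"description": "Article title", "type": "string"},
    "published": {"description": "Publication date", "type": "string"},
}

# (3) Array of attribute objects; per the table, `type` defaults to "any"
# when omitted.
as_array = [
    {"name": "title", "description": "Article title", "type": "string"},
    {"name": "tags", "description": "Topic tags", "type": "list"},
]

# An example full request combining pagination options with format (1).
request = {
    "content": "<html><body>example content</body></html>",
    "mode": "standard",
    "attributes": by_description,
    "enable_pagination": True,
    "max_pages": 3,
}
```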

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Remove Agent

**Slug:** `PARSERA_REMOVE_AGENT`

Tool to delete an existing agent by name. Use when you need to remove a previously created agent from the Parsera platform.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `name` | string | Yes | Name of the agent to be removed. This should match the name used when the agent was created. |

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Run Scraper Template

**Slug:** `PARSERA_RUN_SCRAPER_TEMPLATE`

Tool to run a scraper template on a specified URL with optional proxy and cookies. Use when you need to execute a pre-defined scraper template to extract structured data from web pages.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | No | Target URL(s) for extraction. Can be a single URL string, an array of URL strings, or null. When multiple URLs are provided, the scraper will process each URL independently. |
| `cookies` | array | No | Browser cookies to include with the request. Useful for authenticated sessions or maintaining state. |
| `template_id` | string | Yes | Identifier for the template or scraper to execute. Template IDs prefixed with 'scraper:' route to the legacy scrapers API, while others execute as template extractions. |
| `proxy_country` | string | No | Country code for proxy routing during requests. Use to route requests through a proxy server in a specific country. See GET /v1/proxy-countries for available country codes. |
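The `template_id` routing rule and the single-vs-multiple `url` forms can be sketched as below. The IDs, template name, and URLs are placeholders, not real resources:

```python
# Hypothetical payloads for PARSERA_RUN_SCRAPER_TEMPLATE. Per the table,
# template IDs prefixed with 'scraper:' route to the legacy scrapers API.

def routes_to_legacy(template_id: str) -> bool:
    """Sketch of the routing rule described in the parameter table."""
    return template_id.startswith("scraper:")

legacy_request = {
    "template_id": "scraper:123e4567",  # example legacy scraper ID
    "url": "https://example.com/products",
}

batch_request = {
    "template_id": "price-watch",  # example template name
    "url": ["https://example.com/a", "https://example.com/b"],
    "proxy_country": "UnitedStates",
}
```

When `url` is a list, each URL is processed independently, so the two requests above differ only in how many extractions run.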

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |

### Scrape With Agent

**Slug:** `PARSERA_SCRAPE_WITH_AGENT`

Tool to run a previously generated scraper agent on a specific URL to extract structured data. Use when you need to apply an existing scraper to a webpage.

#### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | Yes | The webpage URL where data extraction will occur. |
| `name` | string | Yes | Identifier for the agent/scraper to use. For pre-built agents, prefix with 'public/' (e.g., 'public/hackernews', 'public/crunchbase'). |
| `cookies` | array | No | Authentication or session cookies for the request. Each cookie should be a dictionary with 'name' and 'value' fields. |
| `proxy_country` | string | No | Geographic location for proxy routing (e.g., 'UnitedStates', 'UnitedKingdom'). Default: 'UnitedStates'. |
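The `public/` prefix convention and the cookie dictionary shape can be illustrated with two hedged payloads; the agent names, URLs, and token below are examples only:

```python
# Hypothetical payloads for PARSERA_SCRAPE_WITH_AGENT. Pre-built agents are
# addressed with the 'public/' prefix; cookie dicts carry name/value pairs.

prebuilt_request = {
    "url": "https://news.ycombinator.com",
    "name": "public/hackernews",
    "proxy_country": "UnitedStates",
}

own_agent_request = {
    "url": "https://example.com/dashboard",
    "name": "my-dashboard-agent",  # example name of a user-created agent
    "cookies": [{"name": "session_id", "value": "example-token"}],
}

def is_prebuilt(agent_name: str) -> bool:
    """Pre-built agents are distinguished by the 'public/' prefix."""
    return agent_name.startswith("public/")
```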

#### Output

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `data` | string | Yes | Data from the action execution |
| `error` | string | No | Error if any occurred during the execution of the action |
| `successful` | boolean | Yes | Whether the action execution was successful |
