---
title: "Website"
lang: "en-US"
draft: false
description: "Learn how to set up a VDP Website connector: https://github.com/instill-ai/instill-core"
---
The Website component is a data connector that allows users to scrape websites.
It can carry out the following tasks:
- [Scrape Website](#scrape-website)
## Release Stage
`Alpha`
## Configuration
The component configuration is defined and maintained [here](https://github.com/instill-ai/component/blob/main/pkg/connector/website/v0/config/definition.json).
## Supported Tasks
### Scrape Website
Scrape the website contents.
| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_SCRAPE_WEBSITE` |
| Query (required) | `target_url` | string | The root URL to scrape. Links found on this page are followed and scraped recursively. |
| Allowed Domains | `allowed_domains` | array[string] | A list of domains that are allowed to be scraped. If empty, all domains are allowed. |
| Max Number of Pages (required) | `max_k` | integer | The maximum number of pages to return. If set to 0, all pages are returned. If set to a positive integer, at most `max_k` pages are returned. |
| Include Link Text | `include_link_text` | boolean | Whether to scrape the link and include the text of the link associated with this page in the `link_text` field. |
| Include Link HTML | `include_link_html` | boolean | Whether to scrape the link and include the raw HTML of the link associated with this page in the `link_html` field. |

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Pages | `pages` | array[object] | The scraped webpages. |
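
The `allowed_domains` and `max_k` rules from the input table can be sketched in Go as follows. This is a minimal illustration, not the connector's actual code: `domainAllowed` and `limitPages` are hypothetical helpers, and exact host matching is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// domainAllowed reports whether rawURL's host appears in allowed.
// An empty allowed list permits every domain, as the table describes.
// Hypothetical helper; exact-host matching is an assumption.
func domainAllowed(rawURL string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	for _, d := range allowed {
		if u.Hostname() == d {
			return true
		}
	}
	return false
}

// limitPages keeps at most maxK pages. A maxK of 0 (or less, in this
// sketch) means no limit. Hypothetical helper.
func limitPages(pages []string, maxK int) []string {
	if maxK <= 0 || len(pages) <= maxK {
		return pages
	}
	return pages[:maxK]
}

func main() {
	found := []string{
		"https://example.com/",
		"https://example.com/docs",
		"https://other.example.org/about",
	}
	var kept []string
	for _, p := range found {
		if domainAllowed(p, []string{"example.com"}) {
			kept = append(kept, p)
		}
	}
	fmt.Println(limitPages(kept, 1))       // prints [https://example.com/]
	fmt.Println(len(limitPages(found, 0))) // prints 3
}
```

With `allowed_domains` set to `["example.com"]`, the page on `other.example.org` is dropped before the `max_k` limit is applied.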

The task input is represented by the following Go struct:

```go
// ScrapeWebsiteInput defines the input of the scrape website task.
type ScrapeWebsiteInput struct {
	// TargetURL: The URL of the website to scrape.
	TargetURL string `json:"target_url"`
	// AllowedDomains: The list of allowed domains to scrape.
	AllowedDomains []string `json:"allowed_domains"`
	// MaxK: The maximum number of pages to scrape.
	MaxK int `json:"max_k"`
	// IncludeLinkText: Whether to include the scraped text of the scraped web page.
	IncludeLinkText *bool `json:"include_link_text"`
	// IncludeLinkHTML: Whether to include the scraped HTML of the scraped web page.
	IncludeLinkHTML *bool `json:"include_link_html"`
}
```
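
As a rough sketch of how a JSON payload maps onto `ScrapeWebsiteInput`, the program below decodes an example input with the standard `encoding/json` package. The `parseInput` helper and the payload values are illustrative assumptions; the struct is repeated only so the example compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScrapeWebsiteInput mirrors the struct shown above.
type ScrapeWebsiteInput struct {
	TargetURL       string   `json:"target_url"`
	AllowedDomains  []string `json:"allowed_domains"`
	MaxK            int      `json:"max_k"`
	IncludeLinkText *bool    `json:"include_link_text"`
	IncludeLinkHTML *bool    `json:"include_link_html"`
}

// parseInput decodes a JSON payload into ScrapeWebsiteInput.
// Hypothetical helper, not part of the connector.
func parseInput(payload []byte) (ScrapeWebsiteInput, error) {
	var in ScrapeWebsiteInput
	err := json.Unmarshal(payload, &in)
	return in, err
}

func main() {
	// Example payload; the URL and values are placeholders.
	payload := []byte(`{
		"target_url": "https://example.com",
		"allowed_domains": ["example.com"],
		"max_k": 10,
		"include_link_text": true
	}`)
	in, err := parseInput(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(in.TargetURL, in.MaxK)     // prints https://example.com 10
	fmt.Println(*in.IncludeLinkText)       // prints true
	fmt.Println(in.IncludeLinkHTML == nil) // prints true: the field was omitted
}
```

Because the two boolean options are `*bool`, an omitted field decodes to `nil` rather than `false`, so code reading the struct can tell "not set" apart from an explicit `false`.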