scrape

command module

Version: v1.0.2 (latest) | Published: Feb 18, 2019 | License: MIT | Imports: 14 | Imported by: 0

Repository

github.com/jakewarren/scrape

README

scrape

A command-line scraping utility inspired by scrape.

Features

Scrape using XPath or CSS selectors
Process HTML from a URL, STDIN, or a local file
Extract a particular attribute

Install

Option 1: Binary

Download the latest release from https://github.com/jakewarren/scrape/releases/latest

Option 2: From source

go get github.com/jakewarren/scrape

Usage

Usage of scrape:
  -A, --agent string   user agent string (default "Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)")
  -a, --attr string    attribute to scrape (default "html")
  -c, --css string     css selector
  -h, --help           usage information
  -k, --insecure       skip SSL verification
  -x, --xpath string   xpath query

Examples:

Read from URL:

❯ scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone"
/test-sites/e-commerce/allinone/product/244
/test-sites/e-commerce/allinone/product/269
/test-sites/e-commerce/allinone/product/192
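
The hrefs printed above are relative to the site root. If absolute URLs are needed, the output can be post-processed in the shell. The following is a minimal sketch, not part of scrape: `absolutize` is a hypothetical helper name, and it only prefixes a base URL to each line rather than performing full relative-URL resolution.

```shell
# Hypothetical helper (not provided by scrape): read paths from STDIN and
# prefix each one with the base URL passed as the first argument.
absolutize() {
  base="${1%/}"   # drop a single trailing slash from the base, if present
  while IFS= read -r path; do
    case "$path" in
      http://*|https://*) printf '%s\n' "$path" ;;            # already absolute, pass through
      /*)                 printf '%s%s\n' "$base" "$path" ;;  # root-relative path
      *)                  printf '%s/%s\n' "$base" "$path" ;; # naive join, assumes base is the right prefix
    esac
  done
}
```

Assuming scrape is installed and the site is reachable, the helper could be appended to the command above as `scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone" | absolutize "https://www.webscraper.io"`.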

Read from STDIN:

❯ curl -A 'Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)' -s "https://www.webscraper.io/test-sites/e-commerce/allinone" | scrape -x "//h4/a" -a href
/test-sites/e-commerce/allinone/product/223
/test-sites/e-commerce/allinone/product/280
/test-sites/e-commerce/allinone/product/278

Read from file:

❯ scrape -x "//h4/a" /tmp/webscrapetest.html
<a href="/test-sites/e-commerce/allinone/product/223" class="title" title="Aspire E1-510">Aspire E1-510</a>
<a href="/test-sites/e-commerce/allinone/product/280" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
<a href="/test-sites/e-commerce/allinone/product/278" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
