scrape

command module
v1.0.2
Warning: This package is not in the latest version of its module.
Published: Feb 18, 2019 License: MIT Imports: 14 Imported by: 0

README

scrape

A command-line scraping utility inspired by an earlier tool also named scrape.

Features

  • Scrape using XPath or CSS selectors
  • Process HTML from a URL, STDIN, or a local file
  • Extract a particular attribute

Install

Option 1: Binary

Download the latest release from https://github.com/jakewarren/scrape/releases/latest

Option 2: From source
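go install github.com/jakewarren/scrape@v1.0.2
(On Go versions older than 1.16, which do not support go install with a version suffix, use: go get github.com/jakewarren/scrape)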

Usage

Usage of scrape:
  -A, --agent string   user agent string (default "Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)")
  -a, --attr string    attribute to scrape (default "html")
  -c, --css string     css selector
  -h, --help           usage information
  -k, --insecure       skip SSL verification
  -x, --xpath string   xpath query
Examples:
Read from URL:
❯ scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone"
/test-sites/e-commerce/allinone/product/244
/test-sites/e-commerce/allinone/product/269
/test-sites/e-commerce/allinone/product/192
Read from STDIN:
❯ curl -A 'Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)' -s "https://www.webscraper.io/test-sites/e-commerce/allinone" | scrape -x "//h4/a" -a href
/test-sites/e-commerce/allinone/product/223
/test-sites/e-commerce/allinone/product/280
/test-sites/e-commerce/allinone/product/278
Read from file:
❯ scrape -x "//h4/a" /tmp/webscrapetest.html
<a href="/test-sites/e-commerce/allinone/product/223" class="title" title="Aspire E1-510">Aspire E1-510</a>
<a href="/test-sites/e-commerce/allinone/product/280" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
<a href="/test-sites/e-commerce/allinone/product/278" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>

Documentation

There is no documentation for this package.
