restify

package module
v1.5.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 15, 2021 License: Apache-2.0 Imports: 9 Imported by: 1

README

A little utility/library (written in Go) that enables REST-like access to HTML pages by scraping and parsing them into JSON.

CircleCI

usage: restify [<flags>] <url>

Flags:
  --help                      Show context-sensitive help (also try --help-long and --help-man).
  --class=CLASS               If specified, first-level elements encountered with this class will be extracted.
  --id=ID                     If specified, the element with this id will be extracted.
  --attribute=ATTRIBUTE       If specified, as key=value, the element with the given attribute name set to the given value is extracted.
  --version                   Print version and exit
  --debug                     Enable debugging output
  --user-agent="restify/1.4.0"  user-agent header to provide with request

Args:
  <url>  A URL to RESTify into JSON

Output Structure

When a successful URL retrieval and match occurs, the utility will output the JSON conversion to stdout.

The top-level structure is an array of jsonNode, where each jsonNode is structured as:

{
  "name":   "...element name...",
  "class":  "...class attribute, if present...",
  "id":     "...id attribute, if present...",
  "href":   "...href attribute, if present...",
  "text":   "...element text content, if present...",
  "elements": [
    ...jsonNodes, if present...
  ]
}

Examples

Locate the latest Minecraft Bedrock server version by picking off the <a>'s with data-platform set:

restify --attribute=data-platform https://www.minecraft.net/en-us/download/server/bedrock/

which produces:

[
  {"name":"a","attributes":{"data-platform":"serverBedrockWindows","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-win/bedrock-server-1.12.0.28.zip","text":"Download"},
  {"name":"a","attributes":{"data-platform":"serverBedrockLinux","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-linux/bedrock-server-1.12.0.28.zip","text":"Download"}
]

or to grab just the Linux instance:

restify --attribute=data-platform=serverBedrockLinux https://www.minecraft.net/en-us/download/server/bedrock/

Using as a library

The package github.com/itzg/restify provides the library functions used by the command-line utility.

Documentation

Overview

Package restify provides inter-related functions for retrieving HTML content, selecting a subset of those HTML nodes, and converting HTML nodes to a JSON representation.

The main function in the cmd package can be referenced as an example use of the functions provided here.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertHtmlToJson

func ConvertHtmlToJson(nodes []*html.Node) ([]byte, error)

ConvertHtmlToJson the given HTML nodes into JSON content where each HTML node is represented by the JsonNode structure.

func FindSubsetByAttributeName

func FindSubsetByAttributeName(root *html.Node, attribute string) []*html.Node

FindSubsetByAttributeName retrieves the HTML nodes that have the requested attribute, regardless of their values.

func FindSubsetByAttributeNameValue

func FindSubsetByAttributeNameValue(root *html.Node, attribute string, value string) []*html.Node

FindSubsetByAttributeNameValue retrieves the HTML nodes that have the requested attribute with a specific value.

func FindSubsetByClass

func FindSubsetByClass(root *html.Node, className string) []*html.Node

FindSubsetByClass locates the HTML nodes with the given root that have the given className.

func FindSubsetById

func FindSubsetById(root *html.Node, id string) (n *html.Node, ok bool)

FindSubsetById locates the HTML node within the given root that has an id attribute of given value. If the node is not found, then ok will be false.

func LoadContent

func LoadContent(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)

LoadContent retrieves the HTML content from the given url. The userAgent is optional, but if provided should conform with https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

Types

type JsonNode

type JsonNode struct {
	// Name is the name/tag of the element
	Name string `json:"name,omitempty"`
	// Attributes contains the attributs of the element other than id, class, and href
	Attributes map[string]string `json:"attributes,omitempty"`
	// Class contains the class attribute of the element
	Class string `json:"class,omitempty"`
	// Id contains the id attribute of the element
	Id string `json:"id,omitempty"`
	// Href contains the href attribute of the element
	Href string `json:"href,omitempty"`
	// Text contains the inner text of the element
	Text string `json:"text,omitempty"`
	// Elements contains the child elements of the element
	Elements []JsonNode `json:"elements,omitempty"`
}

JsonNode is a JSON-ready representation of an HTML node.

type RequestConfig added in v1.5.0

type RequestConfig func(*http.Request)

func WithHeaders added in v1.5.0

func WithHeaders(headers map[string]string) RequestConfig

WithHeaders configures additional headers in the request used in LoadContent

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL