restify

package module
v0.0.0-...-49a397f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 3, 2024 License: Apache-2.0 Imports: 13 Imported by: 0

README

A little utility/library (written in Go) that enables REST-like access to HTML pages by scraping and parsing them into JSON.

Test

usage: restify [<flags>] <url>

Flags:
  --help                      Show context-sensitive help (also try --help-long and --help-man).
  --class=CLASS               If specified, first-level elements encountered with this class will be extracted.
  --id=ID                     If specified, the element with this id will be extracted.
  --tag=TAGNAME               If specified, the first-level element with this tag name will be extracted.
  --attribute=ATTRIBUTE       If specified, as key=value, the element with the given attribute name set to the given value is extracted.
  --version                   Print version and exit
  --debug                     Enable debugging output
  --user-agent="restify/1.4.0"  user-agent header to provide with request

Args:
  <url>  A URL to RESTify into JSON

Output Structure

When a successful URL retrieval and match occurs, the utility will output the JSON conversion to stdout.

The top-level structure is an array of jsonNode, where each jsonNode is structured as:

{
  "name":   "...element name...",
  "class":  "...class attribute, if present...",
  "id":     "...id attribute, if present...",
  "href":   "...href attribute, if present...",
  "text":   "...element text content, if present...",
  "elements": [
    ...jsonNodes, if present...
  ]
}

Examples

Locate the latest Minecraft Bedrock server version by picking off the <a>'s with data-platform set:

restify --attribute=data-platform https://www.minecraft.net/en-us/download/server/bedrock/

which produces:

[
  {"name":"a","attributes":{"data-platform":"serverBedrockWindows","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-win/bedrock-server-1.12.0.28.zip","text":"Download"},
  {"name":"a","attributes":{"data-platform":"serverBedrockLinux","role":"button"},"class":"btn btn-disabled-outline mt-4 downloadlink","href":"https://minecraft.azureedge.net/bin-linux/bedrock-server-1.12.0.28.zip","text":"Download"}
]

or to grab just the Linux instance:

restify --attribute=data-platform=serverBedrockLinux https://www.minecraft.net/en-us/download/server/bedrock/

Using as a library

The package github.com/itzg/restify provides the library functions used by the command-line utility.

Documentation

Overview

Package restify provides inter-related functions for retrieving HTML content, selecting a subset of those HTML nodes, and converting HTML nodes to a JSON representation.

The main function in the cmd package can be referenced as an example use of the functions provided here.

Index

Constants

View Source
const HttpRequestTimeout = time.Second * 60

Variables

This section is empty.

Functions

func ConvertHtmlToJson

func ConvertHtmlToJson(nodes []*html.Node) ([]byte, error)

ConvertHtmlToJson the given HTML nodes into JSON content where each HTML node is represented by the JsonNode structure.

func FindSubsetByAttributeName

func FindSubsetByAttributeName(root *html.Node, attribute string) []*html.Node

FindSubsetByAttributeName retrieves the HTML nodes that have the requested attribute, regardless of their values.

func FindSubsetByAttributeNameValue

func FindSubsetByAttributeNameValue(root *html.Node, attribute string, value string) []*html.Node

FindSubsetByAttributeNameValue retrieves the HTML nodes that have the requested attribute with a specific value.

func FindSubsetByClass

func FindSubsetByClass(root *html.Node, className string) []*html.Node

FindSubsetByClass locates the HTML nodes with the given root that have the given className.

func FindSubsetById

func FindSubsetById(root *html.Node, id string) (n *html.Node, ok bool)

FindSubsetById locates the HTML node within the given root that has an id attribute of given value. If the node is not found, then ok will be false.

func FindSubsetByTagName

func FindSubsetByTagName(root *html.Node, tagName string) []*html.Node

FindSubsetByTagName retrieves the HTML nodes with the given tagName

func LoadBuffer

func LoadBuffer(buffer []byte) (*html.Node, error)

LoadFile retrieves the HTML content from the given file URL.

func LoadContent

func LoadContent(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)

LoadContent retrieves the HTML content from the given url. The userAgent is optional, but if provided should conform with https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

func LoadFile

func LoadFile(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)

func LoadReader

func LoadReader(reader *io.Reader) (*html.Node, error)

Types

type JsonNode

type JsonNode struct {
	// Name is the name/tag of the element
	Name string `json:"name,omitempty"`
	// Attributes contains the attributs of the element other than id, class, and href
	Attributes map[string]string `json:"attributes,omitempty"`
	// Class contains the class attribute of the element
	Class string `json:"class,omitempty"`
	// Id contains the id attribute of the element
	Id string `json:"id,omitempty"`
	// Href contains the href attribute of the element
	Href string `json:"href,omitempty"`
	// Text contains the inner text of the element
	Text string `json:"text,omitempty"`
	// Elements contains the child elements of the element
	Elements []JsonNode `json:"elements,omitempty"`
}

JsonNode is a JSON-ready representation of an HTML node.

type RequestConfig

type RequestConfig func(*http.Request)

func WithHeaders

func WithHeaders(headers map[string]string) RequestConfig

WithHeaders configures additional headers in the request used in LoadContent

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL