head

v0.0.0-...-8fc14ae Latest
Published: Jul 22, 2023 License: MIT Imports: 12 Imported by: 0

README

head

A Go library for parsing information in an HTML head tag.

Installation

go get github.com/gpahal/head

Usage

Processing a URL

The head.ProcessURL function takes a URL string and a *http.Client, makes a GET request to the URL using that client, and returns a new *head.Object parsed from the returned HTML. If client is nil, a default client is used.

object, err := head.ProcessURL("http://ogp.me", nil)

Parsing HTML

The head.ParseHTML function takes an io.Reader, reads and parses the HTML from it, and returns a new *head.Object.

resp, _ := http.Get("http://ogp.me")
// ignoring the error and other response attributes (like status code)
// for simplicity
defer resp.Body.Close()

object, err := head.ParseHTML(resp.Body)

Documentation

The complete API documentation is available on GoDoc.

License

Licensed under the MIT license (LICENSE or opensource.org/licenses/MIT)

Documentation

Overview

Package head provides functions for parsing information in an HTML head tag.

Functions

func DefaultGetHTML

func DefaultGetHTML(r *http.Request) (string, error)

DefaultGetHTML is used to extract the HTML from an HTTP request. It simply reads the HTTP request body and returns it.

func DefaultGetURL

func DefaultGetURL(r *http.Request) (string, error)

DefaultGetURL is used to extract the URL from an HTTP request. It first tries to read the `url` query parameter and then the HTTP request body; whichever is present is used as the URL to process.

func DefaultWriteResponse

func DefaultWriteResponse(w http.ResponseWriter, obj *Object)

DefaultWriteResponse serializes *Object as JSON and writes it to w.

Types

type ContentTypeNotHTMLError

type ContentTypeNotHTMLError string

ContentTypeNotHTMLError is returned when the Content-Type HTTP response header indicates that the content is not HTML.

func (ContentTypeNotHTMLError) Error

func (e ContentTypeNotHTMLError) Error() string

type HTMLHandler

type HTMLHandler struct {
	// GetHTML is used to extract the HTML string from an HTTP request. If
	// GetHTML is nil, DefaultGetHTML is used.
	GetHTML func(r *http.Request) (string, error)

	// WriteResponse is used by HTMLHandler to produce an HTTP response. It
	// transforms and serializes *Object and writes it to w. If WriteResponse
	// is nil, DefaultWriteResponse is used.
	WriteResponse func(w http.ResponseWriter, obj *Object)
}

HTMLHandler is an HTTP handler that responds with a serialized *Object after processing the <head> tag of an HTML string. The HTML string is obtained from the HTTP request. It can be provided as a query parameter or in the HTTP body. The GetHTML function determines how the HTML string is extracted from an HTTP request.

func (*HTMLHandler) ServeHTTP

func (h *HTMLHandler) ServeHTTP(w http.ResponseWriter, r *http.Request)

type InvalidContentTypeError

type InvalidContentTypeError string

InvalidContentTypeError is returned when the Content-Type HTTP response header is invalid and cannot be parsed.

func (InvalidContentTypeError) Error

func (e InvalidContentTypeError) Error() string
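
The distinction between an unparseable Content-Type and a parseable non-HTML one can be illustrated with mime.ParseMediaType from the standard library. This is a sketch of the idea only: whether the package uses this function, and whether it accepts media types other than text/html, is not stated in the documentation, so both are assumptions. The classifyContentType name and its return values are hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// classifyContentType reports how a Content-Type header value might be
// treated: "invalid" when it cannot be parsed, "not-html" when it parses to
// something other than text/html, and "html" otherwise.
func classifyContentType(header string) string {
	// ParseMediaType lowercases the media type and strips parameters
	// such as charset.
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return "invalid"
	}
	if mediaType != "text/html" {
		return "not-html"
	}
	return "html"
}

func main() {
	for _, h := range []string{"text/html; charset=utf-8", "image/png", ""} {
		fmt.Printf("%q: %s\n", h, classifyContentType(h))
	}
}
```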

type InvalidStatusCodeError

type InvalidStatusCodeError int

InvalidStatusCodeError is returned when the HTTP response status code is < 200 or >= 300. This is the final HTTP response after the http.Client has followed the redirect policy.

func (InvalidStatusCodeError) Error

func (e InvalidStatusCodeError) Error() string

type Link

type Link struct {
	HREF  string
	Type  string
	Title string
}

Link represents information attached to a <link> element.

type Object

type Object struct {
	Title   string `json:"title"`
	Base    string `json:"base"`
	Charset string `json:"charset"`

	Links map[string][]*Link `json:"links"`
	Metas map[string]string  `json:"metas"`
}

Object represents parsed HTML elements inside the <head> tag.
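
To show what the struct tags above imply for serialization, the sketch below copies the two declarations locally and encodes a sample value with encoding/json. The sample data and the map keys ("icon", "og:title") are made up for illustration; how the package actually keys Links and Metas, and whether DefaultWriteResponse produces exactly this output, is not confirmed by the documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link and Object are local copies of the documented declarations so the
// example is self-contained.
type Link struct {
	HREF  string
	Type  string
	Title string
}

type Object struct {
	Title   string `json:"title"`
	Base    string `json:"base"`
	Charset string `json:"charset"`

	Links map[string][]*Link `json:"links"`
	Metas map[string]string  `json:"metas"`
}

// encode returns the compact JSON form of obj.
func encode(obj *Object) (string, error) {
	b, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	obj := &Object{
		Title:   "Example",
		Charset: "utf-8",
		Links:   map[string][]*Link{"icon": {{HREF: "/favicon.ico", Type: "image/x-icon"}}},
		Metas:   map[string]string{"og:title": "Example"},
	}
	out, err := encode(obj)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Since Link declares no json tags, encoding/json emits its fields under their Go names (HREF, Type, Title), while the Object fields use the lowercase names from their tags.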

func ParseHTML

func ParseHTML(buf io.Reader) (*Object, error)

ParseHTML reads HTML from buf, parses it, and returns a new *Object.

func ProcessURL

func ProcessURL(urlString string, client *http.Client) (*Object, error)

ProcessURL makes a GET request to urlString using client and returns a new *Object parsed from the returned HTML. If client is nil, a default client is used.

type URLHandler

type URLHandler struct {
	// Client is the *http.Client used to make HTTP requests.
	Client *http.Client

	// GetURL is used to extract the URL from an HTTP request. If GetURL is
	// nil, DefaultGetURL is used.
	GetURL func(r *http.Request) (string, error)

	// WriteResponse is used by URLHandler to produce an HTTP response. It
	// transforms and serializes *Object and writes it to w. If WriteResponse
	// is nil, DefaultWriteResponse is used.
	WriteResponse func(w http.ResponseWriter, obj *Object)
}

URLHandler is an HTTP handler that responds with a serialized *Object after processing the <head> tag of the page at a URL. The URL is obtained from the HTTP request. It can be provided as a query parameter or in the HTTP body. The GetURL function determines how the URL is extracted from an HTTP request.

func (*URLHandler) ServeHTTP

func (h *URLHandler) ServeHTTP(w http.ResponseWriter, r *http.Request)
