swimmy

v1.0.5 Latest
Published: Jan 30, 2019 License: BSD-3-Clause Imports: 22 Imported by: 0

README

swimmy

Overview

Swimmy is a tool that pulls meta information from a URL and writes it out as HTML or JSON.

It is a package that fetches information about a URL and processes it, either for embedding external site information as a card or for output in JSON format.

suquiya is a beginner at Go programming, so pull requests and issues are appreciated. Please help suquiya......

Description

This package contains a CLI tool and a library.

Swimmy creates an embed card as follows:

  • First, swimmy fetches the content of the website at the given URL (an HTML or text document).
  • Second, swimmy sanitizes the HTML content and parses it to get information about the web page.
  • Third, after parsing, swimmy builds the PageData of the website and creates a card or outputs the information in JSON format.

Of course, Swimmy is not only a CLI tool but also a library. When used as a library (package), you can call the individual functions that make up swimmy.

Usage

To install:

go get -u github.com/suquiya/swimmy/swimmy

Command

swimmy url <output> <flags>

Flags:

  • -i, --IfOutputExist: defines the behavior when the output file specified by the user already exists
  • -f, --format: chooses the output format, html or json
  • -l, --list: if this flag is true, url is interpreted as the path to a file listing URLs

For more details, see swimmy's help.

Package

Swimmy can also be used as a library.

Example:

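url, ctype, content, err := swimmy.Fetch(rawURL) // rawURL: address of the page
if err != nil {
	log.Fatal(err)
}
pd := swimmy.BuildPageData(url, ctype, string(content))
jbyte, err := pd.ToJSON() // jbyte holds the PageData as JSON
if err != nil {
	log.Fatal(err)
}
os.Stdout.Write(jbyte)
cb := swimmy.DefaultCardBuilder
/*
A custom builder can also be set, like below.
cb := swimmy.NewCardBuilder(cardTemplate, classNames)
*/
if err := cb.Execute(pd, w); err != nil { // w is an io.Writer
	log.Fatal(err)
}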

To Do:

  • Implement tests
  • Add functions for favicons and images
  • Improve the README

Documentation

Overview

Package swimmy fetches and processes URL information for embedding external site information as a card or outputting it as JSON. First, swimmy fetches the URL's content (an HTML or text document). Second, swimmy sanitizes the HTML content and parses it to obtain information about the web page.

Index

Constants

const (
	// IsNotURLError indicates that the input is not a URL.
	IsNotURLError = iota
	// BadEncodeError indicates that the URL's content uses an encoding that swimmy cannot handle.
	BadEncodeError
	// InvalidContentTypeError indicates that the content is neither HTML nor text.
	InvalidContentTypeError
	// StatusError indicates a status error.
	StatusError
)

Variables

var IDCount int

IDCount is the counter used for PageData IDs.

Functions

func CPolicy

func CPolicy() *bluemonday.Policy

CPolicy returns swimmy's default policy.

func CommentifyString

func CommentifyString(input string) string

CommentifyString turns a string into a comment. It is inspired by cobra's commentifyString.

func DefaultClasses

func DefaultClasses() map[string]string

DefaultClasses returns the default class names used in a card, as a map.

func DefaultTemplate

func DefaultTemplate() *template.Template

DefaultTemplate returns swimmy's default template.

func EscapeBytes

func EscapeBytes(str []byte) string

EscapeBytes HTML-escapes a byte slice and returns the result as a string.
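
As an illustration of what HTML-escaping a byte slice looks like, here is a minimal sketch built only on the standard library. The helper name escapeBytes is hypothetical, and whether swimmy's EscapeBytes uses html.EscapeString internally is an assumption.

```go
package main

import (
	"fmt"
	"html"
)

// escapeBytes is a hypothetical stand-in for swimmy.EscapeBytes. It converts
// the byte slice to a string and escapes <, >, &, ' and " for use in HTML.
func escapeBytes(b []byte) string {
	return html.EscapeString(string(b))
}

func main() {
	fmt.Println(escapeBytes([]byte("<b>Tom & Jerry</b>")))
}
```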

func ExecLicenseTextTemp

func ExecLicenseTextTemp(templateStr string, data interface{}) (string, error)

ExecLicenseTextTemp executes a template built from templateStr with data.

func Fetch

func Fetch(url string) (string, string, []byte, error)

Fetch fetches url with DefaultContentFetcher.

func Init

func Init()

Init is the initialization function. Call it if you want to use the default variables.

func IsExistFilePath

func IsExistFilePath(val string) (bool, error)

IsExistFilePath reports whether val is the path of an existing file, confirming that it exists and is not a directory.

func IsFilePath

func IsFilePath(val string) (bool, error)

IsFilePath validates that val is a file path.

func IsPlainTextContentType

func IsPlainTextContentType(ctype string) bool

IsPlainTextContentType reports whether the given content type represents text/plain.
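
A Content-Type header often carries parameters such as a charset, so a plain string comparison is not enough. The sketch below shows one way to do the check with the standard mime package; the helper name isPlainText is hypothetical and this is not necessarily how swimmy implements it.

```go
package main

import (
	"fmt"
	"mime"
)

// isPlainText is a hypothetical stand-in for swimmy.IsPlainTextContentType.
// mime.ParseMediaType strips parameters and lowercases the media type.
func isPlainText(ctype string) bool {
	mediaType, _, err := mime.ParseMediaType(ctype)
	if err != nil {
		return false // an unparsable header is treated as "not plain text"
	}
	return mediaType == "text/plain"
}

func main() {
	fmt.Println(isPlainText("text/plain; charset=utf-8"))
	fmt.Println(isPlainText("text/html"))
}
```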

func ParseTime

func ParseTime(timeStr string) (*time.Time, string, error)

ParseTime parses a time string. It tries four formats that follow ISO 8601.
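
Trying several layouts in turn can be sketched as below. The four layouts listed are assumptions chosen for illustration, since the documentation does not name the formats swimmy tests. Treating the string result as the layout that matched is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// layouts holds assumed ISO 8601-style layouts; swimmy's actual list is not documented.
var layouts = []string{
	time.RFC3339,          // 2006-01-02T15:04:05Z07:00
	"2006-01-02T15:04:05", // date and time without a zone
	"2006-01-02T15:04",    // date and time without seconds
	"2006-01-02",          // date only
}

// parseTime is a hypothetical stand-in for swimmy.ParseTime. It returns the
// parsed time and the layout that matched, or an error if none matched.
func parseTime(s string) (*time.Time, string, error) {
	for _, layout := range layouts {
		if parsed, err := time.Parse(layout, s); err == nil {
			return &parsed, layout, nil
		}
	}
	return nil, "", fmt.Errorf("no known layout matches %q", s)
}

func main() {
	parsed, layout, err := parseTime("2019-01-30T12:34:56Z")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(parsed.Year(), layout)
}
```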

func ReadList

func ReadList(listPath string) ([]string, error)

ReadList reads a newline-delimited list.
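
Reading a newline-delimited list of URLs might look like the sketch below. The helper names splitList and readList are hypothetical, and trimming whitespace and skipping blank lines are assumptions about convenient behavior, not documented behavior of swimmy.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// splitList returns the non-empty, trimmed lines of content.
func splitList(content string) ([]string, error) {
	var items []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			items = append(items, line)
		}
	}
	return items, sc.Err()
}

// readList is a hypothetical stand-in for swimmy.ReadList: it reads the file
// at listPath and splits it into one entry per line.
func readList(listPath string) ([]string, error) {
	data, err := os.ReadFile(listPath)
	if err != nil {
		return nil, err
	}
	return splitList(string(data))
}

func main() {
	urls, err := splitList("https://example.com\n\nhttps://example.org\n")
	fmt.Println(urls, err)
}
```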

func Sanitize

func Sanitize(htmlContent string, policy ...*bluemonday.Policy) string

Sanitize sanitizes HTML or text with bluemonday.

func ShowLicense

func ShowLicense() string

ShowLicense returns swimmy's license text.

func TPolicy

func TPolicy() *bluemonday.Policy

TPolicy returns swimmy's default tag policy.

func TakeMarkedUpText

func TakeMarkedUpText(ct *html.Tokenizer, tagName []byte) string

TakeMarkedUpText takes the marked-up text between the start tag and the end tag.

func WriteCurrentString

func WriteCurrentString(tokenizer *html.Tokenizer, tokenType html.TokenType, sb *strings.Builder)

WriteCurrentString writes the string of the current tag or text to a strings.Builder.

func WriteHTML

func WriteHTML(pd *PageData, cb *CardBuilder, w io.Writer, messageWriter io.Writer, hasPrev bool) error

WriteHTML creates HTML from a PageData and writes it to w.

func WriteJSON

func WriteJSON(pd *PageData, w io.Writer, messageWriter io.Writer, hasPrev bool) error

WriteJSON writes JSON data built from a PageData. It is used by the CLI tool. It writes the data to w and returns an error if one occurs.

Types

type CardBuilder

type CardBuilder struct {
	CardTemplate *template.Template
	ClassNames   map[string]string
}

CardBuilder builds a card string from a PageData.

var DefaultCardBuilder *CardBuilder

DefaultCardBuilder is swimmy's default CardBuilder.

func NewCardBuilder

func NewCardBuilder(cardtemplate *template.Template, classnames map[string]string) *CardBuilder

NewCardBuilder creates a new instance of CardBuilder and returns it.

func (*CardBuilder) Execute

func (cb *CardBuilder) Execute(pd *PageData, w io.Writer) error

Execute builds a card by executing the HTML template.

func (*CardBuilder) WriteCardHTML

func (cb *CardBuilder) WriteCardHTML(pd *PageData, w io.Writer, minify bool)

WriteCardHTML writes the card HTML.

type ContentFetcher

type ContentFetcher struct {
	HTTPClient *http.Client
}

ContentFetcher fetches network content via Fetch(url string).

var DefaultContentFetcher *ContentFetcher

DefaultContentFetcher is swimmy's default ContentFetcher.

func NewContentFetcher

func NewContentFetcher(HTTPClient *http.Client) *ContentFetcher

NewContentFetcher creates a new instance of ContentFetcher.

func (*ContentFetcher) Fetch

func (cf *ContentFetcher) Fetch(url string) (string, string, []byte, error)

Fetch fetches the content at url with cf's HTTPClient. If you have not set a custom HTTP client, Fetch uses the DefaultClient of the net/http package.

type FetchError added in v1.0.1

type FetchError struct {
	ErrorType int
	// contains filtered or unexported fields
}

FetchError is the error struct for fetch errors.

func NewFetchError added in v1.0.1

func NewFetchError(t int, s string) *FetchError

NewFetchError creates a FetchError.

func (*FetchError) Error added in v1.0.1

func (fe *FetchError) Error() string

Error implements the error interface for FetchError.

type ImageData

type ImageData struct {
	URL        string `json:"URL"`
	SecureURL  string `json:"SecureURL"`
	FormatType string `json:"FormatType"`
	AltText    string `json:"AltText"`
	Width      int    `json:"Width"`
	Height     int    `json:"Height"`
}

ImageData stores the properties of an image.

func CreateImageData

func CreateImageData(url, secureURL, formatType, alt string, width, height int) *ImageData

CreateImageData returns a new instance of ImageData.

func NewImageData

func NewImageData() *ImageData

NewImageData returns a new, initialized (empty) instance of ImageData.

type OpenGraphProtocol

type OpenGraphProtocol struct {
	URL          string            `json:"URL"`
	SiteName     string            `json:"SiteName"`
	Title        string            `json:"Title"`
	Description  string            `json:"Description"`
	Locale       string            `json:"Locale"`
	Type         string            `json:"Type"`
	OgImage      *ImageData        `json:"OgImage"`
	TwitterImage *ImageData        `json:"TwitterImage"`
	TwitterID    string            `json:"TwitterID"`
	UpdatedTime  *time.Time        `json:"UpdatedTime"`
	OtherAttrs   map[string]string `json:"OtherAttrs"`
	OtherInfo    map[string]string `json:"OtherInfo"`
}

OpenGraphProtocol is storage for Open Graph protocol data. In swimmy it exists only to create data for embedding in a website, so it does not store video or music properties.

func NewOGP

func NewOGP() *OpenGraphProtocol

NewOGP returns a new instance of OpenGraphProtocol.

func (*OpenGraphProtocol) Set

func (ogp *OpenGraphProtocol) Set(nameAttr, contentAttr string)

Set sets meta values on ogp's fields. contentAttr is assumed to be already sanitized.
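
Mapping a meta tag's name and content onto struct fields can be sketched with a switch. The ogp type below is a hypothetical, reduced mirror of OpenGraphProtocol. The property names come from the Open Graph protocol, but which names swimmy recognizes, and where it puts unrecognized ones, is an assumption.

```go
package main

import "fmt"

// ogp is a hypothetical, reduced mirror of swimmy's OpenGraphProtocol.
type ogp struct {
	URL, SiteName, Title, Description string
	OtherAttrs                        map[string]string
}

func newOGP() *ogp { return &ogp{OtherAttrs: map[string]string{}} }

// set stores content in the field that matches name; names without a
// dedicated field are kept in OtherAttrs (an assumed fallback).
func (o *ogp) set(name, content string) {
	switch name {
	case "og:url":
		o.URL = content
	case "og:site_name":
		o.SiteName = content
	case "og:title":
		o.Title = content
	case "og:description":
		o.Description = content
	default:
		o.OtherAttrs[name] = content
	}
}

func main() {
	o := newOGP()
	o.set("og:title", "Example")
	o.set("og:video", "https://example.com/v.mp4")
	fmt.Println(o.Title, o.OtherAttrs)
}
```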

type PageData

type PageData struct {
	URL           string             `json:"URL"`
	ID            int                `json:"ID"`
	CannonicalURL string             `json:"CannonicalURL"`
	ContentType   string             `json:"ContentType"`
	Title         string             `json:"Title"`
	Description   string             `json:"Description"`
	FaviconURL    []string           `json:"FaviconURL"`
	OGP           *OpenGraphProtocol `json:"OGP"`
}

PageData is a struct that stores data (information) about the web page specified by a URL, in order to create an embed card or JSON data.

func BuildPageData

func BuildPageData(url string, ctype string, htmlContent string) *PageData

BuildPageData builds a PageData on top of a base PageData.

func ErrorPageData added in v1.0.1

func ErrorPageData(url, ctype string, content []byte, err error) *PageData

ErrorPageData returns a PageData for the case where the fetch returned an error.

func FetchAndBuildPageData

func FetchAndBuildPageData(URL string, messageWriter io.Writer) (*PageData, error)

FetchAndBuildPageData fetches information about URL and builds a PageData.

func NewPageData

func NewPageData(url string, ctype string) *PageData

NewPageData returns a new instance of PageData.

func (*PageData) ComplementBasicFields

func (pd *PageData) ComplementBasicFields()

ComplementBasicFields fills in the basic fields of the PageData if any of them are empty.

func (*PageData) IsPlainText

func (pd *PageData) IsPlainText() bool

IsPlainText reports whether the PageData is text/plain.

func (*PageData) ToJSON

func (pd *PageData) ToJSON() ([]byte, error)

ToJSON converts the PageData to JSON.
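
Because PageData's fields carry json struct tags, converting it to JSON is presumably a matter of encoding/json marshalling. The sketch below uses a hypothetical, reduced pageData type with a few of the same tags. Whether swimmy's ToJSON indents its output or post-processes it is not documented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageData is a hypothetical, reduced mirror of swimmy's PageData.
type pageData struct {
	URL        string   `json:"URL"`
	ID         int      `json:"ID"`
	Title      string   `json:"Title"`
	FaviconURL []string `json:"FaviconURL"`
}

// toJSON marshals the struct using its json tags as key names.
func (pd *pageData) toJSON() ([]byte, error) {
	return json.Marshal(pd)
}

func main() {
	pd := &pageData{URL: "https://example.com", ID: 1, Title: "Example"}
	b, err := pd.toJSON()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(b))
}
```

Note that a nil slice such as FaviconURL is encoded as null, not as an empty array.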

type PageDataBuilder

type PageDataBuilder struct {
	PreSanitizePolicy        *bluemonday.Policy
	TagContentSanitizePolicy *bluemonday.Policy
}

PageDataBuilder is the processor that creates PageData.

var DefaultPageDataBuilder *PageDataBuilder

DefaultPageDataBuilder is swimmy's default PageDataBuilder

func NewPageDataBuilder

func NewPageDataBuilder(PrePolicy, tagContentPolicy *bluemonday.Policy) *PageDataBuilder

NewPageDataBuilder generates a new instance of PageDataBuilder.

func (*PageDataBuilder) BuildPageData

func (p *PageDataBuilder) BuildPageData(url string, ctype string, htmlContent string) *PageData

BuildPageData parses the HTML content, retrieves tag information, and fills a PageData. Before parsing, it sanitizes the HTML content with its sanitize policy.

func (*PageDataBuilder) Sanitize

func (p *PageDataBuilder) Sanitize(htmlContent string) string

Sanitize sanitizes HTML content with p's sanitize policy.

func (*PageDataBuilder) TagContentSanitize

func (p *PageDataBuilder) TagContentSanitize(str string) string

TagContentSanitize sanitizes the content of a tag.

Directories

Path Synopsis
cmd
