cloudcat

package module
v0.4.3
Warning

This package is not in the latest version of its module.

Published: Aug 22, 2023 License: AGPL-3.0 Imports: 19 Imported by: 2

README

Cloudcat

Cloudcat is a tool for extracting structured data from websites using extensible YAML syntax rules.
Versions before v1.0.0 are for initial development. Anything MAY change at any time, and the public API SHOULD NOT be considered stable.

Documentation

See Wiki

License

cloudcat is distributed under the AGPL-3.0 license.

Documentation

Constants

This section is empty.

Variables

var (
	// ErrInvalidSchema reports an invalid schema.
	ErrInvalidSchema = errors.New("invalid schema")
	// ErrAliasRecursive reports an alias that refers to itself recursively.
	ErrAliasRecursive = errors.New("alias can't be recursive")
	// ErrInvalidAction reports an invalid action.
	ErrInvalidAction = errors.New("invalid action")
	// ErrInvalidStep reports an invalid step.
	ErrInvalidStep = errors.New("invalid step")
)

Functions

func Analyze

func Analyze(ctx *plugin.Context, s *Schema, content string) any

Analyze analyzes the content with the given Schema and returns the result.

func CookieToString

func CookieToString(cookies []*http.Cookie) []string

CookieToString converts a slice of *http.Cookie into a slice of strings.

func EmptyOr

func EmptyOr[T any](value, defaultValue []T) []T

EmptyOr returns defaultValue if value is empty; otherwise it returns value.
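
A minimal stand-alone sketch of the documented EmptyOr behavior follows. It is written for illustration rather than copied from cloudcat; the name emptyOr is hypothetical, and "empty" is assumed to mean len(value) == 0, so nil and zero-length slices are treated the same way.

```go
package main

import "fmt"

// emptyOr is a hypothetical stand-in for EmptyOr, written for illustration:
// it returns defaultValue when value has no elements (nil or zero-length).
func emptyOr[T any](value, defaultValue []T) []T {
	if len(value) == 0 {
		return defaultValue
	}
	return value
}

func main() {
	var none []string
	fmt.Println(emptyOr(none, []string{"fallback"})) // [fallback]
	fmt.Println(emptyOr([]int{}, []int{1, 2}))       // [1 2]
	fmt.Println(emptyOr([]int{7}, []int{1, 2}))      // [7]
}
```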

func GetElement

func GetElement(act Action, ctx *plugin.Context, content any) (string, error)

GetElement runs the action on the content and returns a single element as a string.

func GetElements

func GetElements(act Action, ctx *plugin.Context, content any) ([]string, error)

GetElements runs the action on the content and returns the matched elements as a slice of strings.

func GetString

func GetString(act Action, ctx *plugin.Context, content any) (string, error)

GetString runs the action on the content and returns the result as a string.

func GetStrings

func GetStrings(act Action, ctx *plugin.Context, content any) ([]string, error)

GetStrings runs the action on the content and returns the result as a slice of strings.

func MustResolve

func MustResolve[T any]() T

MustResolve returns the value registered for type T. It panics if no such value exists.

func MustResolveNamed

func MustResolveNamed[T any](name string) T

MustResolveNamed returns the value registered under name. It panics if no such value exists.

func Override

func Override[T any](value T) bool

Override stores the value for type T and reports whether an existing value was overridden.

func OverrideLazy

func OverrideLazy[T any](initFunc func() (T, error)) bool

OverrideLazy stores a lazily initialized value, produced by initFunc, and reports whether an existing value was overridden.

func OverrideNamed

func OverrideNamed(name string, value any) (ok bool)

OverrideNamed stores the value under name and reports whether an existing value was overridden.

func ParseCookie

func ParseCookie(cookies string) []*http.Cookie

ParseCookie parses a cookie string and returns a slice of *http.Cookie.

func ParseSetCookie

func ParseSetCookie(cookies ...string) []*http.Cookie

ParseSetCookie parses one or more Set-Cookie strings and returns a slice of *http.Cookie.
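
The three cookie helpers can be approximated with the parsers net/http already provides: (*http.Request).Cookies reads Cookie headers, (*http.Response).Cookies reads Set-Cookie headers, and (*http.Cookie).String serializes a cookie. The sketch below assumes cloudcat's helpers behave similarly; the lowercase function names are hypothetical stand-ins, not the package's API.

```go
package main

import (
	"fmt"
	"net/http"
)

// parseCookie is a hypothetical stand-in for ParseCookie. It parses a
// Cookie request-header value ("a=1; b=2") by delegating to net/http.
func parseCookie(cookies string) []*http.Cookie {
	req := http.Request{Header: http.Header{"Cookie": {cookies}}}
	return req.Cookies()
}

// parseSetCookie is a hypothetical stand-in for ParseSetCookie. Each argument
// is one Set-Cookie response-header value.
func parseSetCookie(cookies ...string) []*http.Cookie {
	resp := http.Response{Header: http.Header{"Set-Cookie": cookies}}
	return resp.Cookies()
}

// cookieToString is a hypothetical stand-in for CookieToString. It serializes
// each cookie with (*http.Cookie).String.
func cookieToString(cookies []*http.Cookie) []string {
	out := make([]string, 0, len(cookies))
	for _, c := range cookies {
		out = append(out, c.String())
	}
	return out
}

func main() {
	reqCookies := parseCookie("a=1; b=2")
	fmt.Println(cookieToString(reqCookies)) // [a=1 b=2]

	set := parseSetCookie("sid=abc; Path=/; HttpOnly")
	fmt.Println(set[0].Name, set[0].Value, set[0].Path, set[0].HttpOnly) // sid abc / true
}
```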

func Provide

func Provide[T any](value T) bool

Provide stores the value for type T and reports whether it was saved.

func ProvideLazy

func ProvideLazy[T any](initFunc func() (T, error)) bool

ProvideLazy stores a lazily initialized value, produced by initFunc, and reports whether it was saved.

func ProvideNamed

func ProvideNamed(name string, value any) (ok bool)

ProvideNamed stores the value under name and reports whether it was saved.

func ProxyFromRequest

func ProxyFromRequest(req *http.Request) (*url.URL, error)

ProxyFromRequest returns the proxy URL stored in the request's context.

func Resolve

func Resolve[T any]() (T, error)

Resolve returns the value registered for type T, or an error if no such value exists.

func ResolveNamed

func ResolveNamed[T any](name string) (value T, err error)

ResolveNamed returns the value registered under name, or an error if no such value exists.

func SetFormatter

func SetFormatter(formatHandler FormatHandler)

SetFormatter sets the FormatHandler used to format schema properties.

func WithProxyURL

func WithProxyURL(ctx context.Context, proxy *url.URL) context.Context

WithProxyURL returns a copy of the parent context that carries the given proxy URL.

func ZeroOr

func ZeroOr[T comparable](value, defaultValue T) T

ZeroOr returns defaultValue if value is the zero value of T; otherwise it returns value.
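
ZeroOr is the comparable-value counterpart of EmptyOr. A hypothetical stand-alone version, written for illustration rather than taken from cloudcat, might look like this:

```go
package main

import "fmt"

// zeroOr is a hypothetical stand-in for ZeroOr, written for illustration:
// it returns defaultValue when value equals the zero value of T.
func zeroOr[T comparable](value, defaultValue T) T {
	var zero T
	if value == zero {
		return defaultValue
	}
	return value
}

func main() {
	fmt.Println(zeroOr("", "anonymous")) // anonymous
	fmt.Println(zeroOr(0, 30))           // 30
	fmt.Println(zeroOr(5, 30))           // 5
}
```

Because the zero value triggers the fallback, an explicit 0, "" or false cannot be distinguished from an unset value.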

Types

type Action

type Action interface {
	// Left returns the left Action
	Left() Action
	// Right returns the right Action
	Right() Action
}

Action is a node in a Schema's action tree. Left and Right return its child Actions.

type And

type And struct {
	// contains filtered or unexported fields
}

And is an Action node for OperatorAnd.

func NewAnd

func NewAnd(left, right Action) *And

NewAnd creates a new And action from the left and right Actions.

func (And) Left

func (a And) Left() Action

func (And) MarshalYAML

func (a And) MarshalYAML() (any, error)

func (And) Right

func (a And) Right() Action

func (And) String

func (a And) String() string

type Cache

type Cache interface {
	Get(key string, opts ...CacheOptions) ([]byte, bool)
	Set(key string, value []byte, opts ...CacheOptions)
	Del(key string, opts ...CacheOptions)
}

Cache is an interface for storing byte values by key.

func NewCache

func NewCache() Cache

NewCache returns a new Cache that stores items in memory.

type CacheOptions

type CacheOptions struct {
	// Timeout the key expire time.
	Timeout time.Duration
	// Context is the context for the cache operation.
	Context context.Context
}

type Cookie

type Cookie interface {
	http.CookieJar

	// CookieString returns the cookies string for the given URL.
	CookieString(u *url.URL) []string
	// DeleteCookie delete the cookies for the given URL.
	DeleteCookie(u *url.URL)
}

Cookie manages storage and use of cookies in HTTP requests. Implementations of Cookie must be safe for concurrent use by multiple goroutines.

func NewCookie

func NewCookie() Cookie

NewCookie returns a new Cookie that stores cookies in memory.

type Fetch

type Fetch interface {
	// Do sends an HTTP request and returns an HTTP response, following
	// policy (such as redirects, cookies, auth) as configured on the
	// client.
	Do(*http.Request) (*http.Response, error)
}

Fetch is the HTTP client interface.

type FormatHandler

type FormatHandler interface {
	// Format converts data to the given Type.
	Format(data any, format Type) (any, error)
}

FormatHandler formats schema property values.

func GetFormatter

func GetFormatter() FormatHandler

GetFormatter returns the current formatter.
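
A FormatHandler converts an extracted value to a schema Type. The strconvFormatter below is a hypothetical example written for illustration; it handles only string input and the scalar types, and would presumably be installed with SetFormatter. How cloudcat's default formatter treats other inputs is not documented here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Type and FormatHandler mirror the declarations documented on this page
// (only the scalar Type constants are repeated here).
type Type string

const (
	StringType  Type = "string"
	NumberType  Type = "number"
	IntegerType Type = "integer"
	BooleanType Type = "boolean"
)

type FormatHandler interface {
	Format(data any, format Type) (any, error)
}

// strconvFormatter is a hypothetical FormatHandler: it converts string input
// with strconv after trimming whitespace, and passes other input through.
type strconvFormatter struct{}

func (strconvFormatter) Format(data any, format Type) (any, error) {
	s, ok := data.(string)
	if !ok {
		return data, nil
	}
	s = strings.TrimSpace(s)
	switch format {
	case StringType:
		return s, nil
	case IntegerType:
		return strconv.Atoi(s)
	case NumberType:
		return strconv.ParseFloat(s, 64)
	case BooleanType:
		return strconv.ParseBool(s)
	default:
		return nil, fmt.Errorf("unsupported format %q", format)
	}
}

func main() {
	var h FormatHandler = strconvFormatter{}
	n, err := h.Format(" 42 ", IntegerType)
	fmt.Println(n, err) // 42 <nil>
	_, err = h.Format("abc", NumberType)
	fmt.Println(err != nil) // true
}
```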

type Not

type Not struct {
	// contains filtered or unexported fields
}

Not is an Action node for OperatorNot.

func NewNot

func NewNot(left, right Action) *Not

NewNot creates a new Not action from the left and right Actions.

func (Not) Left

func (a Not) Left() Action

func (Not) MarshalYAML

func (a Not) MarshalYAML() (any, error)

func (Not) Right

func (a Not) Right() Action

func (Not) String

func (a Not) String() string

type Operator

type Operator string

Operator is an Action operator.

const (
	// OperatorAnd is the "and" Operator.
	// Given action results A and B, it returns A and B joined (A + B).
	OperatorAnd Operator = "and"
	// OperatorOr is the "or" Operator.
	// Given action results A and B, it returns B if A is nil; otherwise A.
	OperatorOr Operator = "or"
	// OperatorNot is the "not" Operator.
	// Given action results A and B, it returns B if A is not nil; otherwise nil.
	OperatorNot Operator = "not"
)
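
The operator rules above can be sketched as a function over two already-computed results. This is a simplified illustration with a hypothetical combine function: it models results as []string and treats a nil slice as "no result", whereas the real And, Or, and Not nodes operate on Action trees.

```go
package main

import "fmt"

// Operator and its constants mirror the declarations documented on this page.
type Operator string

const (
	OperatorAnd Operator = "and"
	OperatorOr  Operator = "or"
	OperatorNot Operator = "not"
)

// combine applies op to the results a (left Action) and b (right Action),
// following the documented rules. Results are modelled as []string and a
// nil slice stands for "no result"; both are assumptions of this sketch.
func combine(op Operator, a, b []string) []string {
	switch op {
	case OperatorAnd: // join A + B
		out := make([]string, 0, len(a)+len(b))
		return append(append(out, a...), b...)
	case OperatorOr: // A if A is non-nil, otherwise B
		if a != nil {
			return a
		}
		return b
	case OperatorNot: // B if A is non-nil, otherwise nil
		if a != nil {
			return b
		}
		return nil
	default:
		return nil
	}
}

func main() {
	a, b := []string{"title"}, []string{"subtitle"}
	fmt.Println(combine(OperatorAnd, a, b))   // [title subtitle]
	fmt.Println(combine(OperatorOr, nil, b))  // [subtitle]
	fmt.Println(combine(OperatorNot, nil, b)) // []
}
```

In this model Or acts as a fallback and Not as a guard: the right-hand result is kept only when the left-hand action produced something.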

type Or

type Or struct {
	// contains filtered or unexported fields
}

Or is an Action node for OperatorOr.

func NewOr

func NewOr(left, right Action) *Or

NewOr creates a new Or action from the left and right Actions.

func (Or) Left

func (a Or) Left() Action

func (Or) MarshalYAML

func (a Or) MarshalYAML() (any, error)

func (Or) Right

func (a Or) Right() Action

func (Or) String

func (a Or) String() string

type Property

type Property map[string]Schema

Property maps property names to their Schemas.

type Schema

type Schema struct {
	Type       Type     `yaml:"type"`
	Format     Type     `yaml:"format,omitempty"`
	Init       Action   `yaml:"init,omitempty"`
	Rule       Action   `yaml:"rule,omitempty"`
	Properties Property `yaml:"properties,omitempty"`
}

Schema describes a value to extract: its Type, optional Format, Init and Rule Actions, and nested Properties.

func NewSchema

func NewSchema(types ...Type) *Schema

NewSchema returns a new Schema with the given types. The first argument sets Schema.Type and the second, if present, sets Schema.Format.

func (*Schema) AddProperty

func (schema *Schema) AddProperty(field string, s Schema) *Schema

AddProperty adds field, described by the Schema s, to Schema.Properties.

func (*Schema) CloneWithType

func (schema *Schema) CloneWithType(typ Type) *Schema

CloneWithType returns a copy of the Schema with its Type set to typ; Schema.Format and Schema.Rule are copied.

func (Schema) MarshalText

func (schema Schema) MarshalText() ([]byte, error)

MarshalText encodes the receiver into UTF-8-encoded text and returns the result.

func (Schema) MarshalYAML

func (schema Schema) MarshalYAML() (any, error)

MarshalYAML encodes the Schema as YAML.

func (*Schema) SetInit

func (schema *Schema) SetInit(action Action) *Schema

SetInit sets Schema.Init to the given Action.

func (*Schema) SetProperty

func (schema *Schema) SetProperty(m Property) *Schema

SetProperty sets Schema.Properties to m.

func (*Schema) SetRule

func (schema *Schema) SetRule(action Action) *Schema

SetRule sets Schema.Rule to the given Action.

func (*Schema) UnmarshalText

func (schema *Schema) UnmarshalText(text []byte) error

UnmarshalText decodes a Schema from the form generated by MarshalText.

func (*Schema) UnmarshalYAML

func (schema *Schema) UnmarshalYAML(node *yaml.Node) (err error)

UnmarshalYAML decodes the Schema from YAML.
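
NewSchema's variadic arguments and the chaining setters are easiest to see in use. Since this page cannot import cloudcat, the sketch below re-declares a stripped-down copy of Schema and a few of its methods (YAML tags and marshaling omitted). The method bodies are assumptions based on the descriptions above, including that CloneWithType copies only Format and Rule.

```go
package main

import "fmt"

// Stripped-down copies of the documented types, for illustration only.
type Type string

const (
	StringType  Type = "string"
	IntegerType Type = "integer"
	ObjectType  Type = "object"
	ArrayType   Type = "array"
)

type Action interface {
	Left() Action
	Right() Action
}

type Property map[string]Schema

type Schema struct {
	Type       Type
	Format     Type
	Init       Action
	Rule       Action
	Properties Property
}

// NewSchema: the first type becomes Type, the optional second becomes Format.
func NewSchema(types ...Type) *Schema {
	s := new(Schema)
	if len(types) > 0 {
		s.Type = types[0]
	}
	if len(types) > 1 {
		s.Format = types[1]
	}
	return s
}

// AddProperty lazily allocates Properties and returns the receiver for chaining.
func (schema *Schema) AddProperty(field string, s Schema) *Schema {
	if schema.Properties == nil {
		schema.Properties = make(Property)
	}
	schema.Properties[field] = s
	return schema
}

func (schema *Schema) SetRule(action Action) *Schema {
	schema.Rule = action
	return schema
}

// CloneWithType copies Format and Rule; Init and Properties are assumed not
// to be copied.
func (schema *Schema) CloneWithType(typ Type) *Schema {
	return &Schema{Type: typ, Format: schema.Format, Rule: schema.Rule}
}

func main() {
	price := NewSchema(StringType, IntegerType) // extracted as string, formatted as integer
	page := NewSchema(ObjectType).AddProperty("price", *price)
	list := price.CloneWithType(ArrayType)
	fmt.Println(page.Properties["price"].Format, list.Type, list.Format) // integer array integer
}
```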

type Step

type Step struct{ K, V string }

Step is a single step of a Steps Action, holding a key K and a value V.

func (Step) MarshalYAML

func (s Step) MarshalYAML() (any, error)

MarshalYAML encodes the Step as YAML.

type Steps

type Steps []Step

Steps is a slice of Step; *Steps implements Action.

func NewSteps

func NewSteps(str ...string) *Steps

NewSteps returns a new Steps built from the given strings.

func (*Steps) Left

func (s *Steps) Left() Action

Left returns the left Action

func (Steps) MarshalYAML

func (s Steps) MarshalYAML() (any, error)

func (*Steps) Right

func (s *Steps) Right() Action

Right returns the right Action

func (*Steps) UnmarshalYAML

func (s *Steps) UnmarshalYAML(value *yaml.Node) error

type Type

type Type string

Type is the type of a Schema property.

const (
	// StringType The Type of string.
	StringType Type = "string"
	// NumberType The Type of number.
	NumberType Type = "number"
	// IntegerType The Type of integer.
	IntegerType Type = "integer"
	// BooleanType The Type of boolean.
	BooleanType Type = "boolean"
	// ObjectType The Type of object.
	ObjectType Type = "object"
	// ArrayType The Type of array.
	ArrayType Type = "array"
)

func ToType

func ToType(s any) (Type, error)

ToType converts s to a schema Type, returning an error if it is not a valid type.

Directories

Path Synopsis
core module
ctl module
fetch module
js
Package js provides the JavaScript implementation.
modules/cache
Package cache provides the cache JS implementation.
modules/cookie
Package cookie provides the cookie JS implementation.
modules/crypto
Package crypto provides the crypto JS implementation.
modules/encoding
Package encoding provides the encoding JS implementation.
modules/http
Package http provides the http JS implementation.
modulestest
Package modulestest provides the module test VM.
jsmodules module
gq
Package gq provides the goquery parser.
js
Package js provides the js parser.
json
Package json provides the json parser.
regex
Package regex provides the regexp parser.
xpath
Package xpath provides the xpath parser.
plugin module
sample
env
