Documentation
Index
- type Content
- type Extractor
- func (e *Extractor) Dump() []Rule
- func (e *Extractor) Extract(r io.Reader, rawurl string) (*Content, error)
- func (e *Extractor) ExtractURL(rawurl string) (*Content, error)
- func (e *Extractor) Init()
- func (e *Extractor) Load(r io.Reader) error
- func (e *Extractor) LoadFile(path string) error
- func (e *Extractor) Match(rawurl string) *Rule
- type IExtractor
- type Rule
Types
type Extractor
type Extractor struct {
// contains filtered or unexported fields
}
Extractor is the concrete extractor implementation.
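func (*Extractor) Extract
func (e *Extractor) Extract(r io.Reader, rawurl string) (*Content, error)
Extract reads the page body from r and extracts its contents using the wedata rule that matches rawurl. If no rule matches the URL, the page is skipped and Extract returns nil. r must be a UTF-8 byte stream.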
func (*Extractor) ExtractURL
func (e *Extractor) ExtractURL(rawurl string) (*Content, error)
ExtractURL fetches the page at rawurl, parses it, and extracts its contents.
type IExtractor
IExtractor is the extractor interface.
type Rule
type Rule struct {
	ResourceURL         string    `json:"resource_url"`
	Name                string    `json:"name"`
	CreatedBy           string    `json:"created_by"`
	DatabaseResourceURL string    `json:"database_resource_url"`
	UpdatedAt           time.Time `json:"updated_at"`
	CreatedAt           time.Time `json:"created_at"`
	Data                struct {
		URL          string `json:"url"` // URL is the regex pattern of target pages
		Type         string `json:"type"`
		Enc          string `json:"enc"` // Enc is the encoding of the page contents
		XPath        string `json:"xpath"`
		Base         string `json:"base"`
		MicroFormats string `json:"microformats"`
	}
	// contains filtered or unexported fields
}
Rule is a wedata item: its metadata plus the extraction settings in Data.