dataflowkit: github.com/slotix/dataflowkit

Directories

Path           Synopsis
cmd            Package cmd of the Dataflow kit contains the following CLI daemons:
cmd/fetch.cli  Fetcher CLI of the Dataflow kit downloads HTML content from web pages via the Fetcher service endpoint.
cmd/fetch.d    Fetcher service of the Dataflow kit downloads HTML content from web pages to feed Dataflow kit scrapers.
cmd/parse.d    Parse service of the Dataflow kit parses HTML content from web pages following the rules described in a configuration JSON file.
errs           Package errs of the Dataflow kit allows creating more detailed errors than errors.New() or fmt.Errorf().
extract        Package extract of the Dataflow kit describes the available extractors for retrieving structured data from HTML web pages.
fetch          Package fetch of the Dataflow kit is used by the fetch.d service, which downloads HTML content from web pages to feed Dataflow kit scrapers.
healthcheck    Package healthcheck of the Dataflow kit checks whether the specified services are alive.
logger         Package log of the Dataflow kit implements a modified sirupsen/logrus logger that shows the log filename and line number.
paginate       Package paginate of the Dataflow kit describes the Paginator interface for retrieving the next page from the current one.
parse          Package parse of the Dataflow kit is used by the parse.d service, which parses HTML content from web pages following the rules described in a Payload JSON file.
scrape         Package scrape of the Dataflow kit handles structured data extraction from web pages, from processing the JSON payload to encoding the scraped data in an output format such as JSON, CSV, XML
storage        Package storage of the Dataflow kit describes the Store interface for read/write operations on downloaded data and parsed results.
testserver
utils          Package utils of the Dataflow kit includes various functions and helpers used by other packages.

Updated 2018-09-20.