crawl-cache

command module

Version: v0.0.0-...-9c1906e | Published: Mar 24, 2017 | License: Apache-2.0

Repository

github.com/crackcomm/crawl-cache

README

crawl-cache

An NSQ crawl queue interceptor that caches requests.

The http://, https:// and www. prefixes are ignored.
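
As a rough illustration of what ignoring these prefixes could look like, here is a minimal Go sketch. The helper name `cacheKey` and the exact stripping order are assumptions made for this example and are not taken from the crawl-cache source.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheKey is a hypothetical helper (not part of crawl-cache's documented
// API). It strips a leading scheme and a leading "www." so that URLs
// differing only in those prefixes map to the same key.
func cacheKey(rawURL string) string {
	key := strings.TrimSpace(rawURL)
	key = strings.TrimPrefix(key, "https://")
	key = strings.TrimPrefix(key, "http://")
	key = strings.TrimPrefix(key, "www.")
	return key
}

func main() {
	urls := []string{
		"https://www.google.com/search?q=Github",
		"http://google.com/search?q=Github",
		"www.google.com/search?q=Github",
	}
	// All three URLs reduce to the same key.
	for _, u := range urls {
		fmt.Println(cacheKey(u))
	}
}
```

With a key like this, a request for any of the three spellings would count as the same cached URL.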

Usage

Example usage from command line:

# Install command line application for crawl scheduling
$ go install github.com/crackcomm/crawl/nsq/crawl-schedule
# It will consume `google_search_cache` and produce `google_search`
$ crawl-cache --topic google_search_cache:google_search &
# Schedule crawl of google search results
$ crawl-schedule \
      --topic google_search_cache \
      --callback github.com/crackcomm/go-google-search/spider.Google \
      "https://www.google.com/search?q=Github"
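
The `--topic` value above pairs an input topic with an output topic, separated by a colon. A minimal Go sketch of splitting such a value follows; the function name `parseTopicPair` and its error handling are assumptions for illustration, not the tool's actual flag parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTopicPair is a hypothetical helper that splits an "input:output"
// value, such as the one passed to --topic, into its two topic names.
func parseTopicPair(value string) (string, string, error) {
	parts := strings.SplitN(value, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("topic must look like input:output, got %q", value)
	}
	return parts[0], parts[1], nil
}

func main() {
	in, out, err := parseTopicPair("google_search_cache:google_search")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("consume %s, produce %s\n", in, out)
}
```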

Callbacks are currently ignored; only URLs are cached.
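
To make the URL-only behaviour concrete, here is a small Go sketch of a seen-set keyed by URL alone. It rests on several assumptions: the `request` and `seenCache` types are invented for this example, the store is an in-memory map (the repository lists a leveldb directory, so the real store likely differs), and a request whose URL was already seen is assumed not to be forwarded.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical stand-in for a queued crawl request.
type request struct {
	URL      string
	Callback string
}

// seenCache is an illustrative in-memory set of URLs already seen.
type seenCache struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newSeenCache() *seenCache {
	return &seenCache{seen: make(map[string]struct{})}
}

// firstVisit records the request's URL and reports whether it was new.
// The callback is deliberately not part of the key.
func (c *seenCache) firstVisit(r request) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.seen[r.URL]; ok {
		return false
	}
	c.seen[r.URL] = struct{}{}
	return true
}

func main() {
	cache := newSeenCache()
	first := request{URL: "google.com/search?q=Github", Callback: "spider.Google"}
	second := request{URL: "google.com/search?q=Github", Callback: "spider.Other"}
	fmt.Println(cache.firstVisit(first))  // true: URL not seen before
	fmt.Println(cache.firstVisit(second)) // false: same URL, different callback
}
```

Because the key is the URL alone, two requests for the same URL with different callbacks are treated as duplicates in this sketch.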

License

                             Apache License
                       Version 2.0, January 2004
                    http://www.apache.org/licenses/

Authors

Łukasz Kurowski

Source Files

main.go

Directories

cache
leveldb
