ganda

v0.1.8
Published: Sep 18, 2020 License: Apache-2.0 Imports: 8 Imported by: 0

README

What is ganda?

Ganda lets you make HTTP/HTTPS requests to hundreds to millions of URLs in just a few minutes.

It follows the Unix philosophy of "do one thing well" and is meant to be used in a chain of command-line pipes, where it makes its requests in parallel.

By default, it will echo all response bodies to standard out but can optionally save the results of each request in a directory for later analysis.

Given a file with a list of IDs in it, you could do something like:

cat id_list.txt | awk '{printf "https://api.example.com/resource/%s?apikey=foo\n", $1}' | ganda

and that will pipe a stream of urls into ganda in the format https://api.example.com/resource/<ID>?apikey=foo.

Alternatively, if you have a file full of urls (one per line), you can just tell ganda to run that:

ganda my_file_of_urls.txt

If you give ganda a -o <directory name> parameter, it will save the body of each response in a separate file inside <directory name>. If you want a single file, just redirect stdout the normal way: ... | ganda > result.txt.

For many more examples, see "Using HTTP APIs on the Command Line - Part 3 - ganda".

Installing

You currently have 3 options:

  1. On macOS, install with Homebrew:

brew tap tednaleid/homebrew-ganda
brew install ganda

  2. Download the appropriate binary from the releases page and put it in your path.

  3. Compile from source with Go:

go get -u github.com/tednaleid/ganda

This installs ganda in your $GOPATH/bin (which you want in your $PATH).

Usage

ganda help
NAME:
   ganda

USAGE:
   ganda [options] [file of urls/requests]  OR  <urls/requests on stdout> | ganda [options]

VERSION:
   1.7

DESCRIPTION:
   Pipe urls to ganda over stdout or give it a file with one url per line for it to make http requests to each url in parallel.

AUTHOR:
   Ted Naleid <contact@naleid.com>

COMMANDS:
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --output value, -o value         the output base directory to save downloaded files, if omitted will stream response bodies to stdout
   --request value, -X value        HTTP request method to use (default: "GET")
   --header value, -H value         headers to send with every request, can be used multiple times (gzip and keep-alive are already there)
   --data-template value, -d value  template string (or literal string) for the body, can use %s placeholders that will be replaced by fields 1..N from the input (all fields on a line after the url), '%%' can be used to insert a single percent symbol
   --workers value, -W value        number of concurrent workers that will be making requests, increase this for more requests in parallel (default: 1)
   --response-workers value         number of concurrent workers that will be processing responses, if not specified will be same as --workers (default: 0)
   --subdir-length value, -S value  length of hashed subdirectory name to put saved files when using -o; use 2 for > 5k urls, 4 for > 5M urls (default: 0)
   --connect-timeout value          number of seconds to wait for a connection to be established before timeout (default: 10)
   --throttle value, -t value       max number of requests to process per second, default is unlimited (default: -1)
   --insecure, -k                   if flag is present, skip verification of https certificates
   --silent, -s                     if flag is present, omit showing response code for each url only output response bodies
   --no-color                       if flag is present, don't add color to success/warn messages
   --json-envelope                  EXPERIMENTAL: emit result with JSON envelope with url, status, length, and body fields, assumes result is valid json
   --hash-body                      EXPERIMENTAL: instead of emitting full body in JSON, emit the SHA256 of the bytes of the body, useful for checksums, only has meaning with --json-envelope flag
   --discard-body                   EXPERIMENTAL: instead of emitting full body, just discard it
   --retry value                    max number of retries on transient errors (5XX status codes/timeouts) to attempt (default: 0)
   --help, -h                       show help
   --version, -v                    print the version
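The --data-template option can be combined with extra fields on each input line to send a different body per request. The following is a minimal sketch rather than an example from the ganda documentation: the endpoint https://api.example.com/resource, the helper names build_requests and post_requests, and the JSON field names are all hypothetical. It also assumes that the fields after the URL are separated by whitespace and that -H takes curl-style "Name: value" headers.

```shell
#!/bin/sh
# Hypothetical helper: turn "id name" pairs on stdin into ganda input lines
# of the form "<url> <field1> <field2>".
build_requests() {
  awk '{printf "https://api.example.com/resource %s %s\n", $1, $2}'
}

# Hypothetical helper: POST one JSON body per input line read from stdin.
# Each %s in the template is replaced by the next field after the URL
# (assumed here to be whitespace-separated).
post_requests() {
  ganda -X POST \
    -H "Content-Type: application/json" \
    -d '{"id": "%s", "name": "%s"}'
}
```

With both functions defined, something like printf '1 alice\n2 bob\n' | build_requests | post_requests would send two POST requests, one per input line.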

Example

This command takes the first 1000 words from the macOS dictionary file, then turns each of them into a Wikipedia API URL.

Those URLs are then piped into ganda, and the responses are saved in a directory named "out" under the current directory.

head -1000 /usr/share/dict/words |\
awk '{printf "https://en.wikipedia.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json\n", $1}' |\
ganda -o out --subdir-length 2

Output (shows the HTTP status code of 200 OK for each request, along with the output file where each response was saved):

Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aam&prop=revisions&rvprop=content&format=json -> out/95/https-en-wikipedia-org-w-api-php-action-query-titles-aam-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=A&prop=revisions&rvprop=content&format=json -> out/71/https-en-wikipedia-org-w-api-php-action-query-titles-A-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aal&prop=revisions&rvprop=content&format=json -> out/99/https-en-wikipedia-org-w-api-php-action-query-titles-aal-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=a&prop=revisions&rvprop=content&format=json -> out/69/https-en-wikipedia-org-w-api-php-action-query-titles-a-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aardwolf&prop=revisions&rvprop=content&format=json -> out/31/https-en-wikipedia-org-w-api-php-action-query-titles-aardwolf-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aalii&prop=revisions&rvprop=content&format=json -> out/91/https-en-wikipedia-org-w-api-php-action-query-titles-aalii-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aa&prop=revisions&rvprop=content&format=json -> out/ae/https-en-wikipedia-org-w-api-php-action-query-titles-aa-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=Aani&prop=revisions&rvprop=content&format=json -> out/7f/https-en-wikipedia-org-w-api-php-action-query-titles-Aani-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=Aaron&prop=revisions&rvprop=content&format=json -> out/db/https-en-wikipedia-org-w-api-php-action-query-titles-Aaron-prop-revisions-rvprop-content-format-json
Response:  200 https://en.wikipedia.org/w/api.php?action=query&titles=aardvark&prop=revisions&rvprop=content&format=json -> out/c4/https-en-wikipedia-org-w-api-php-action-query-titles-aardvark-prop-revisions-rvprop-content-format-json
... 990 more lines

Because ganda is designed to make many thousands of requests, you can use the --subdir-length option to avoid making your filesystem unhappy with 1M files in a single directory. That option hashes each URL and places the response in a subdirectory named after the hash (similar to how git stores its objects).
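To illustrate the idea, the sketch below derives a short subdirectory name from a hash of the URL. It is not ganda's implementation: this README does not say which hash function ganda uses, so SHA-256 is an assumption, and the function name url_subdir is hypothetical. It needs either sha256sum or shasum on the path.

```shell
#!/bin/sh
# Hypothetical helper: print the first N hex characters of the SHA-256
# digest of a URL. Usage: url_subdir <url> <length>
url_subdir() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -c "1-$2"
  else
    # Fallback for systems without sha256sum, such as stock macOS.
    printf '%s' "$1" | shasum -a 256 | cut -c "1-$2"
  fi
}
```

Under this scheme a length of 2 spreads files over at most 256 subdirectories, and a length of 4 over at most 65,536, which is why a longer prefix suits larger URL sets.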

Example run: [image: ganda example run against the Wikipedia API]
