s3grep

About

A pretty simple tool to search for text in objects stored in AWS's S3.

Usage

Installation

To install you can simply:

go get github.com/joboscribe/s3grep

Then inside the source directory:

go install

Now (assuming $GOPATH/bin is in your PATH) you should be able to run s3grep.
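
If the shell can't find s3grep, the install directory is probably missing from your PATH; a quick fix for a typical setup is:

export PATH="$PATH:$(go env GOPATH)/bin"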

AWS Credentials

Since this is built on the AWS SDK, it will use credentials in the same order of preference as laid out in the SDK documentation. I've tested it with environment variables and with a credentials file as generated by running aws configure.
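
For example, the environment-variable route looks like this (the key values below are AWS's documented placeholder credentials, not real ones):

export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFKJ/K7MDENG/bPxRfiCYEXAMPLEKEY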

Options and Arguments

s3grep [-i] [-e pattern] [-k path] [-n num] [--ignore-case] [--keep=path] [--num-workers=num] [--regexp=pattern] [pattern] [bucket] [key] [region]

The only required arguments are:

  1. pattern: a regex that object contents will be matched against (should probably be in quotes)
  2. bucket: the name of the bucket containing the objects to be searched
  3. key: a regex used to find objects in which the search will be performed (you can avoid shell expansion by putting this in quotes)
  4. region: the AWS region where the bucket is located

The optional arguments and flags are:

  • -i or --ignore-case: performs a case-insensitive match
  • -e [pattern] or --regexp [pattern]: an additional regex that will be matched against; can be used multiple times
  • -k [path] or --keep [path]: objects containing matches will be stored locally at the file path indicated by [path]
  • -n [num] or --num-workers [num]: how many S3 operations to perform in parallel (WARNING: on a *nix OS you can pretty easily run out of file descriptors if you set this much higher than 1000)

Examples

Let's say you want to search for the string "super-duper" in all the objects in the bucket named my-wonderful-secret-stuff in the us-east-1 region:

s3grep "super-duper" my-wonderful-secret-stuff ".*" us-east-1

Then you realize that there are photos and mp3s and various other file formats in that bucket, so you cancel that search and run it again, this time only in objects with keys ending in .txt:

s3grep "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Then you realize that you wanted to find all occurrences of "super-duper" regardless of case:

s3grep -i "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

This takes forever. You remember that you have several thousand files to look through and doing it 10 at a time means you'll be here all day, so you increase the number of workers to 500:

s3grep -i -n 500 "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Suddenly you remember that you want not just "super-duper" but also "awesome-possum":

s3grep -i -e "awesome-possum" "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

And that's when you think maybe it'd be a good idea to hang onto all those matching objects locally in your ~/literature directory so you can read them at your leisure:

s3grep -i -k "~/literature" -e "awesome-possum" "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Possible FAQs

Q: Why would anyone be searching inside text files stored on S3?

A: Maybe you work at a company that uses S3 to store things like, oh, I don't know, logs or digitized documents, and now you need to find all the files containing certain entries. I've had to do almost exactly that, hence my inspiration to create this tool.

Q: Is using this going to cost me money?

A: Probably; in fact, almost assuredly this is going to cost you something, though you would have to consult the S3 pricing guide to get an estimate of how much. I figure one list request per 1,000 objects in the bucket (to enumerate the relevant keys) and then at least one GET request per matching key. The math is left as an exercise for the reader.
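
As a rough illustration with made-up numbers: a bucket holding 100,000 objects, 10,000 of which match your key regex, works out to roughly 100,000 / 1,000 = 100 list requests plus 10,000 GET requests, about 10,100 requests in total; multiply each figure by your region's per-request price from the pricing guide.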

Q: Little presumptuous, don't you think, naming your work after the venerable grep?

A: I know, right? Believe me, I don't feel entirely comfortable with it, but I decided to go with that name and then try to make s3grep look and behave as much like grep as possible, even though there are some obvious differences, since grep doesn't need to know things like "region" or "bucket".

Q: OMGOSH it's taking forever!

A: Lots of big files, huh? Bummer. In terms of improvements there is certainly some low-hanging fruit, which I plan to address (grab? pick?). In the meantime, I guess it makes for a good excuse to go get something to drink, do a little burst of exercise, read an article or what have you.

Q: What's all this about file descriptors?

A: If you use the -n or --num-workers option with a high enough number (say 1024 on Linux) without increasing the default number of file descriptors per process then s3grep will (assuming there are enough objects in the relevant bucket) bump up against that limit because every HTTP request uses up a file descriptor. That said, if you are looking for snippets of text in millions of files this might not be the best tool for the job.
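
If you do need a higher worker count, one workaround on Linux or macOS is to raise the session's file descriptor limit before running the search (the numbers here are illustrative; your system's hard limit may differ):

ulimit -n 4096
s3grep -n 2048 "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1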
