s3grep

About

A pretty simple tool to search for text in objects stored in AWS's S3.

Usage

Installation

To install you can simply:

go get github.com/joboscribe/s3grep

Then inside the source directory:

go install

Now (assuming $GOPATH/bin is in your PATH) you should be able to run s3grep.
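
If the shell can't find s3grep, the install directory is probably missing from your PATH; a quick fix for a typical setup is:

export PATH="$PATH:$(go env GOPATH)/bin"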

AWS Credentials

Since this is built on the AWS SDK, it will use credentials in the same order of preference as laid out in the SDK documentation. I've tested it with environment variables and with a credentials file as generated by running aws configure.
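
For example, the environment-variable route looks like this (the key values below are AWS's documented placeholder credentials, not real ones):

export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFKJ/K7MDENG/bPxRfiCYEXAMPLEKEY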

Options and Arguments

s3grep [-i] [-e pattern] [-k path] [-n num] [--ignore-case] [--keep=path] [--num-workers=num] [--regexp=pattern] [pattern] [bucket] [key] [region]

The only required arguments are:

  1. pattern: a regex that object contents will be matched against (should probably be in quotes)
  2. bucket: the name of the bucket containing the objects to be searched
  3. key: a regex used to find objects in which the search will be performed (you can avoid shell expansion by putting this in quotes)
  4. region: the AWS region where the bucket is located

The optional arguments and flags are:

  • -i or --ignore-case: performs a case-insensitive match
  • -e [pattern] or --regexp [pattern]: an additional regex that will be matched against; can be used multiple times
  • -k [path] or --keep [path]: objects containing matches will be stored locally at the file path indicated by [path]
  • -n [num] or --num-workers [num]: how many S3 operations to perform in parallel (WARNING: on a *nix OS you can pretty easily run out of file descriptors if you set this much higher than 1000)

Examples

Let's say you want to search for the string "super-duper" in all the objects in the bucket named my-wonderful-secret-stuff in the us-east-1 region:

s3grep "super-duper" my-wonderful-secret-stuff ".*" us-east-1

Then you realize that there are photos and mp3s and various other file formats in that bucket, so you cancel that search and run it again, this time only in objects with keys ending in .txt:

s3grep "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Then you realize that you wanted to find all occurrences of "super-duper" regardless of case:

s3grep -i "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

This takes forever. You remember that you have several thousand files to look through and doing it 10 at a time means you'll be here all day, so you increase the number of workers to 500:

s3grep -i -n 500 "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Suddenly you remember that you want not just "super-duper" but also "awesome-possum":

s3grep -i -e "awesome-possum" "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

And that's when you think maybe it'd be a good idea to hang onto all those matching objects locally in your ~/literature directory so you can read them at your leisure:

s3grep -i -k "~/literature" -e "awesome-possum" "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1

Possible FAQs

Q: Why would anyone be searching inside text files stored on S3?

A: Maybe you work at a company that uses S3 to store things like, oh, I don't know, logs or digitized documents, and now you need to find all the files containing certain entries. I've had to do almost exactly that, hence my inspiration to create this tool.

Q: Is using this going to cost me money?

A: Probably; in fact, almost assuredly this is going to cost you something, though you would have to consult the S3 pricing guide to get an estimate of how much. I figure one list request per 1,000 objects in the bucket (to enumerate the relevant keys) and then at least one GET request per matching key. The math is left as an exercise for the reader.
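
As a rough illustration with made-up numbers: a bucket holding 100,000 objects, 10,000 of which match your key regex, works out to roughly 100,000 / 1,000 = 100 list requests plus 10,000 GET requests, about 10,100 requests in total; multiply each figure by your region's per-request price from the pricing guide.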

Q: Little presumptuous, don't you think, naming your work after the venerable grep?

A: I know, right? Believe me, I don't feel entirely comfortable with it, but I decided to go with that name and then try to make s3grep look and behave as much like grep as possible, even though there are some obvious differences, since grep doesn't need to know things like "region" or "bucket".

Q: OMGOSH it's taking forever!

A: Lots of big files, huh? Bummer. In terms of improvements there is certainly some low-hanging fruit, which I plan to address (grab? pick?). In the meantime, I guess it makes for a good excuse to go get something to drink, do a little burst of exercise, read an article or what have you.

Q: What's all this about file descriptors?

A: If you use the -n or --num-workers option with a high enough number (say 1024 on Linux) without increasing the default number of file descriptors per process then s3grep will (assuming there are enough objects in the relevant bucket) bump up against that limit because every HTTP request uses up a file descriptor. That said, if you are looking for snippets of text in millions of files this might not be the best tool for the job.
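
If you do need a higher worker count, one workaround on Linux or macOS is to raise the session's file descriptor limit before running the search (the numbers here are illustrative; your system's hard limit may differ):

ulimit -n 4096
s3grep -n 2048 "super-duper" my-wonderful-secret-stuff ".*\.txt" us-east-1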
