s3s

package module
v0.5.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 17, 2023 License: MIT Imports: 16 Imported by: 0

README

s3s

s3s is a go binary instead of vast-engineering/s3select.

Features

s3s query all files lower than S3 prefix.

Available below:

  • Input JSON to Output JSON
  • Input CSV to Output JSON
  • Input Application Load Balancer Logs to Output JSON
  • Input CloudFront Logs to Output JSON

Usage

$ s3s --help
NAME:
   s3s - Easy S3 select like searching in directories

USAGE:
   s3s [global options] command [command options] [arguments...]

VERSION:
   current

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug        erorr check for developer (default: false)
   --help, -h     show help
   --version, -v  print the version

   AWS:

   --max-retries value, -M value   max number of api requests to retry (default: 20)
   --region value                  region of target s3 bucket exist (default: ENV["AWS_REGION"])
   --thread-count value, -t value  max number of api requests to concurrently (default: 150)

   Input Format:

   --alb-logs, --alb_logs  (default: false)
   --cf-logs, --cf_logs    (default: false)
   --csv                   (default: false)

   Query:

   --count, -c              max number of results from each key to return (default: false)
   --limit value, -l value  max number of results from each key to return (default: 0)
   --query value, -q value  a query for S3 Select
   --where value, -w value  WHERE part of the query

   Run:

   --delve               like directory move before querying (default: false)
   --dry-run, --dry_run  pre request for s3 select (default: false)

   Target:

   --duration value  from current time if alb or cf (ex: "2h3m") (default: 0s)
   --since value     end at if alb or cf (ex: "2006-01-02 15:04:05")
   --until value     start at if alb or cf (ex: "2006-01-02 15:04:05")

s3s is execution S3 Select from json to json (default).

$ s3s s3://bucket/prefix
{"time":1654848930,"type":"speak"}
{"time":1654848969,"type":"sleep"}

// $ s3s s3://bucket/prefix_A s3://bucket/prefix_B s3://bucket/prefix_C
$ s3s -q 'SELECT * FROM S3Object s WHERE s.type = "speak"' s3://bucket/prefix
{"time":1654848930,"type":"speak"}

// alternate
// $ s3s -w 's.type = "speak"' s3://bucket/prefix

s3s can execute S3 Select from csv to json when --csv option enabled.

// 122, hello
$ s3s s3://bucket/prefix
{"_1":122,"_2":"hello"}
ALB and CF logs support

--alb-logs is a format for Application Load Balancer (ALB). --cf-logs is a format for CloudFront (CF).

Each options are tagging available instead of _1, _2, etc.

And also, --where replace column names to column numbers. But --query does not replace columns for execution raw query.

// below query is same as $ s3s --alb-logs --query="'SELECT * FROM S3Object s WHERE s.`_2` = '2022-09-01T00:00:00.000000Z'" s3://prefix
$ s3s --alb-logs --where="s.`time` = '2022-09-01T00:00:00.000000Z'" s3://prefix
index ALB CF
_1 type date
_2 time time
_3 elb x-edge-location
_4 client:port sc-bytes
_5 target:port c-ip
_6 request_processing_time cs-method
_7 target_processing_time cs(Host)
_8 response_processing_time cs-uri-stem
_9 elb_status_code sc-status
_10 target_status_code cs(Referer)
_11 received_bytes cs(User-Agent)
_12 sent_bytes cs-uri-query
_13 request cs(Cookie)
_14 user_agent x-edge-result-type
_15 ssl_cipher x-edge-request-id
_16 ssl_protocol x-host-header
_17 target_group_arn cs-protocol
_18 trace_id cs-bytes
_19 domain_name time-taken
_20 chosen_cert_arn x-forwarded-for
_21 matched_rule_priority ssl-protocol
_22 request_creation_time ssl-cipher
_23 actions_executed x-edge-response-result-type
_24 redirect_url cs-protocol-version
_25 error_reason fle-status
_26 target:port_list fle-encrypted-fields
_27 target_status_code_list c-port
_28 classification time-to-first-byte
_29 classification_reason x-edge-detailed-result-type
_30 sc-content-type
_31 sc-range-start
_32 sc-range-end

Support log range when alb and cf. time format is 2006-01-02 15:04:05 as UTC.

  • --duration is a duration from now.
  • --since is start time
  • --until is end time

However, s3s stop when you target cloudfront and using --duration or --since only, because s3s hit too many keys.

-delve, like directory move before querying

search from prefix

$ s3s -delve s3://bucket/prefix

search from bucket list

$ s3s -delve
  bucket/prefix/C/
  bucket/prefix/B/
  bucket/prefix/A/        # delve more lower path than this prefix
  Query↵ (s3://bucket/prefix/) # choose and execute s3select this prefix
> ←Back upper path        # back to parent prefix
5/5
>

Querying after Enter.

{"time":1654848930,"type":"speak"}
{"time":1654848969,"type":"sleep"}

...

bucket/prefix/A/ (print path to stderr at end)

Documentation

Index

Constants

View Source
const (
	DEFAULT_THREAD_COUNT = 150
)
View Source
const (
	ErrTimeParseFailed = "time parse failed"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Client added in v0.5.0

type Client struct {
	// contains filtered or unexported fields
}

func New added in v0.5.0

func New(ctx context.Context) (*Client, error)

func (*Client) GetS3Bucket added in v0.5.0

func (c *Client) GetS3Bucket(ctx context.Context) ([]string, error)

func (*Client) GetS3Dir added in v0.5.0

func (c *Client) GetS3Dir(ctx context.Context, bucket string, prefix string) ([]string, error)

func (*Client) GetS3Keys added in v0.5.0

func (c *Client) GetS3Keys(ctx context.Context, sender chan<- s3Object, bucket string, prefix string, info *Query) error

func (*Client) GetS3OneKey added in v0.5.0

func (c *Client) GetS3OneKey(ctx context.Context, bucket string, prefix string) (*s3Object, error)

func (*Client) OptimizateALBPrefixes added in v0.5.0

func (c *Client) OptimizateALBPrefixes(ctx context.Context, prefixes []string, keyInfo *Query) ([]string, error)

func (*Client) OptimizateCFPrefixes added in v0.5.0

func (c *Client) OptimizateCFPrefixes(ctx context.Context, prefixes []string, keyInfo *Query) ([]string, error)

func (*Client) Run added in v0.5.0

func (c *Client) Run(ctx context.Context, prefixes []string, query *Query, option *Option) (*Result, error)

type FormatType added in v0.3.0

type FormatType int
const (
	FormatTypeJSON FormatType = iota + 1
	FormatTypeCSV
	FormatTypeALBLogs
	FormatTypeCFLogs
)

type Option added in v0.5.0

type Option struct {
	IsDryRun    bool
	IsCountMode bool
	Output      string
}

type Query added in v0.5.0

type Query struct {
	FormatType FormatType
	Query      string
	Since      time.Time
	Until      time.Time
}

type Result added in v0.3.1

type Result struct {
	Count int
	Bytes int64
}

Directories

Path Synopsis
cmd
s3s
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL