pail

package module
v0.0.0-...-85d0981
Published: Dec 4, 2020 License: Apache-2.0 Imports: 30 Imported by: 1

README

===========================================
``pail`` -- Blob Storage System Abstraction
===========================================

Overview
--------

Pail is a high-level Go interface to blob storage containers like AWS's
S3 and similar services. Pail also provides implementations backed by
the local file system and by MongoDB's GridFS for testing and for
different kinds of applications.

Historically, ``pail`` is a component of `Evergreen
<https://github.com/evergreen-ci/>`_, a CI platform. This fork removes some
legacy components and drops support for older versions of Go, thereby adding
support for modules; it may see additional development.

Documentation
-------------

The core API documentation is in the `godoc
<https://godoc.org/github.com/deciduosity/pail/>`_.

Development
-----------

Contribute
~~~~~~~~~~

Feel free to open issues or submit pull requests!

The pail package is available under the terms of the Apache License (v2).

Goals
~~~~~

- Higher order bucket implementations to provide more plug-and-play operations
  for common storage patterns (archiving, compression).

- Additional backend bucket implementations to support blob storage
  systems with different APIs, including Azure and GCP.

- Alternate deduplicating "block store" storage formats.

- Add benchmarks and improve speed of common operations.
  
Buildsystem
~~~~~~~~~~~

The pail project uses a ``makefile`` to coordinate testing. Use the following
command to build the pail binary: ::

  make build

The artifact is at ``build/pail``. The makefile provides the following
targets:

``test``
   Runs all tests, sequentially, for all packages.

``test-<package>``
   Runs all tests for a specific package.

``race``, ``race-<package>``
   As with their ``test`` counterpart, these targets run tests with
   the race detector enabled.

``lint``, ``lint-<package>``
   Installs and runs ``gometalinter`` with appropriate settings to
   lint the project.
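
For example, to run the full test suite and then the linter: ::

  make test
  make lint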

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CreateAWSCredentials

func CreateAWSCredentials(awsKey, awsPassword, awsToken string) *credentials.Credentials

CreateAWSCredentials is a wrapper for creating AWS credentials.

func IsKeyNotFoundError

func IsKeyNotFoundError(err error) bool

IsKeyNotFoundError checks an error object to see if it is a key not found error.

func MakeKeyNotFoundError

func MakeKeyNotFoundError(err error) error

MakeKeyNotFoundError constructs a key not found error from an existing error of any type.

func NewKeyNotFoundError

func NewKeyNotFoundError(msg string) error

NewKeyNotFoundError creates a new error object to represent a key not found error.

func NewKeyNotFoundErrorf

func NewKeyNotFoundErrorf(msg string, args ...interface{}) error

NewKeyNotFoundErrorf creates a new error object to represent a key not found error with a formatted message.
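
Taken together, these helpers let callers detect missing keys uniformly across backends. A minimal sketch, assuming the pail and standard library packages are imported; the key name and messages are illustrative:

err := pail.NewKeyNotFoundErrorf("object %q does not exist", "logs/run-42")
fmt.Println(pail.IsKeyNotFoundError(err)) // true

// Convert an arbitrary error into a key not found error so callers
// can detect it the same way.
wrapped := pail.MakeKeyNotFoundError(errors.New("backend returned 404"))
fmt.Println(pail.IsKeyNotFoundError(wrapped)) // true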

Types

type Bucket

type Bucket interface {
	// Check validity of the bucket. This is dependent on the underlying
	// implementation.
	Check(context.Context) error

	// Produces a Writer and Reader interface to the file named by
	// the string.
	Writer(context.Context, string) (io.WriteCloser, error)
	Reader(context.Context, string) (io.ReadCloser, error)

	// Put and Get write simple byte streams (in the form of
	// io.Readers) to/from specified keys.
	//
	// TODO: consider whether these (particularly Get) are
	// substantively different from the Writer/Reader methods, or
	// might just be wrappers.
	Put(context.Context, string, io.Reader) error
	Get(context.Context, string) (io.ReadCloser, error)

	// Upload and Download move files between the local file
	// system and the specified key.
	Upload(context.Context, string, string) error
	Download(context.Context, string, string) error

	SyncBucket

	// Copy does a special copy operation that does not require
	// downloading a file. Note that CopyOptions.DestinationBucket must
	// have the same type as the calling bucket object.
	Copy(context.Context, CopyOptions) error

	// Remove the specified object(s) from the bucket.
	// RemoveMany continues on error and returns any accumulated errors.
	Remove(context.Context, string) error
	RemoveMany(context.Context, ...string) error

	// Remove all objects with the given prefix, continuing on error and
	// returning any accumulated errors.
	// Note that this operation is not atomic.
	RemovePrefix(context.Context, string) error

	// Remove all objects matching the given regular expression,
	// continuing on error and returning any accumulated errors.
	// Note that this operation is not atomic.
	RemoveMatching(context.Context, string) error

	// List provides a way to iterate over the contents of a
	// bucket (for a given prefix.)
	List(context.Context, string) (BucketIterator, error)
}

Bucket defines an interface for accessing a remote blob store, like S3. It should be generic enough to be implemented for the GCP equivalent, or even a GridFS-backed system (mostly just for kicks.)

Other goals of this project are to provide a single interface for interacting with blob storage, and to allow us to fully move off of the legacy goamz package and stabilize all blob-storage operations across all projects. There should be no interface dependencies on external packages required to use this library.

The preferred AWS SDK is here: https://docs.aws.amazon.com/sdk-for-go/api/

In no particular order:

  • Implementation constructors should make it possible to use custom http.Clients (to aid in pooling).
  • We should probably implement String() methods.
  • Use the grip package for logging.
  • Get/Put should support multipart upload/download?
  • We'll want to do retries with back-off (potentially configurable in bucketinfo?).
  • We might need variants that Put/Get byte slices rather than readers.
  • Pass contexts to requests for timeouts.
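
A minimal end-to-end sketch of the core interface, using a local bucket (see NewLocalBucket below) so the example is self-contained. The directory and key are hypothetical, and NewLocalBucket requires that the directory already exist:

package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"
	"strings"

	"github.com/deciduosity/pail"
)

func main() {
	ctx := context.Background()

	// NewLocalBucket (documented below) requires that the directory
	// already exist; this path is hypothetical.
	bucket, err := pail.NewLocalBucket(pail.LocalOptions{Path: "/tmp/pail-demo"})
	if err != nil {
		log.Fatal(err)
	}

	// Put writes a simple byte stream to the named key.
	if err := bucket.Put(ctx, "greeting.txt", strings.NewReader("hello")); err != nil {
		log.Fatal(err)
	}

	// Get returns an io.ReadCloser for the stored object.
	r, err := bucket.Get(ctx, "greeting.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	data, err := ioutil.ReadAll(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data)) // hello
}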

func NewGridFSBucket

func NewGridFSBucket(ctx context.Context, opts GridFSOptions) (Bucket, error)

NewGridFSBucket creates a Bucket instance backed by the new MongoDB driver, creating a new client and connecting to the URI. Use the Check method to verify that this bucket is operational.

func NewGridFSBucketWithClient

func NewGridFSBucketWithClient(ctx context.Context, client *mongo.Client, opts GridFSOptions) (Bucket, error)

NewGridFSBucketWithClient constructs a Bucket implementation using GridFS and the new MongoDB driver. If client is nil, this method falls back to the behavior of NewGridFSBucket. Use the Check method to verify that this bucket is operational.
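
A minimal construction sketch, assuming ctx is a context.Context and a MongoDB instance is reachable at the (hypothetical) URI; the bucket name and database are also illustrative:

bucket, err := pail.NewGridFSBucket(ctx, pail.GridFSOptions{
	Name:       "demo_files",
	Database:   "pail_demo",
	MongoDBURI: "mongodb://localhost:27017",
})
if err != nil {
	log.Fatal(err)
}
// Check verifies that the bucket is actually operational.
if err := bucket.Check(ctx); err != nil {
	log.Fatal(err)
}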

func NewLocalBucket

func NewLocalBucket(opts LocalOptions) (Bucket, error)

NewLocalBucket returns an implementation of the Bucket interface that stores files in the local file system. Returns an error if the directory doesn't exist.

func NewLocalTemporaryBucket

func NewLocalTemporaryBucket(opts LocalOptions) (Bucket, error)

NewLocalTemporaryBucket returns a "local" bucket implementation that stores resources in a temporary directory on the local file system created for this purpose. It returns an error if there were issues creating the temporary directory. This implementation does not provide a mechanism to delete the temporary directory.

func NewParallelSyncBucket

func NewParallelSyncBucket(opts ParallelBucketOptions, b Bucket) (Bucket, error)

NewParallelSyncBucket returns a layered bucket implementation that supports parallel sync operations.

func NewS3Bucket

func NewS3Bucket(options S3Options) (Bucket, error)

NewS3Bucket returns a Bucket implementation backed by S3. This implementation does not support multipart uploads; if you would like to add objects larger than 5 gigabytes, see `NewS3MultiPartBucket`.

func NewS3BucketWithHTTPClient

func NewS3BucketWithHTTPClient(client *http.Client, options S3Options) (Bucket, error)

NewS3BucketWithHTTPClient returns a Bucket implementation backed by S3 using an existing HTTP client connection. This implementation does not support multipart uploads; if you would like to add objects larger than 5 gigabytes, see `NewS3MultiPartBucket`.

func NewS3MultiPartBucket

func NewS3MultiPartBucket(options S3Options) (Bucket, error)

NewS3MultiPartBucket returns a Bucket implementation backed by S3 that supports multipart uploads for large objects.

func NewS3MultiPartBucketWithHTTPClient

func NewS3MultiPartBucketWithHTTPClient(client *http.Client, options S3Options) (Bucket, error)

NewS3MultiPartBucketWithHTTPClient returns a Bucket implementation backed by S3 with an existing HTTP client connection that supports multipart uploads for large objects.

type BucketItem

type BucketItem interface {
	Bucket() string
	Name() string
	Hash() string
	Get(context.Context) (io.ReadCloser, error)
}

BucketItem provides a basic interface for getting an object from a bucket.

type BucketIterator

type BucketIterator interface {
	Next(context.Context) bool
	Err() error
	Item() BucketItem
}

BucketIterator provides a way to interact with the contents of a bucket, as in the output of the List operation.
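
A minimal iteration sketch, assuming bucket is an existing Bucket and ctx is a context.Context; the prefix is hypothetical:

iter, err := bucket.List(ctx, "logs/")
if err != nil {
	log.Fatal(err)
}
for iter.Next(ctx) {
	fmt.Println(iter.Item().Name())
}
// Err reports any failure encountered while iterating.
if err := iter.Err(); err != nil {
	log.Fatal(err)
}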

type CopyOptions

type CopyOptions struct {
	SourceKey         string
	DestinationKey    string
	DestinationBucket Bucket
	IsDestination     bool
}

CopyOptions describes the arguments to the Copy method for moving objects between Buckets.
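
A minimal sketch, assuming src and dst are existing Buckets of the same underlying type (as Copy requires); the keys are hypothetical:

err := src.Copy(ctx, pail.CopyOptions{
	SourceKey:         "reports/latest.json",
	DestinationKey:    "backups/latest.json",
	DestinationBucket: dst,
})
if err != nil {
	log.Fatal(err)
}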

type GridFSOptions

type GridFSOptions struct {
	Name         string
	Prefix       string
	Database     string
	MongoDBURI   string
	DryRun       bool
	DeleteOnSync bool
	DeleteOnPush bool
	DeleteOnPull bool
	Verbose      bool
}

GridFSOptions support the use and creation of GridFS backed buckets.

type LocalOptions

type LocalOptions struct {
	Path         string
	Prefix       string
	DryRun       bool
	DeleteOnSync bool
	DeleteOnPush bool
	DeleteOnPull bool
	Verbose      bool
}

LocalOptions describes the configuration of a local Bucket.

type ParallelBucketOptions

type ParallelBucketOptions struct {
	// Workers sets the number of worker threads.
	Workers int
	// DryRun enables running in a mode that will not execute any
	// operations that modify the bucket.
	DryRun bool
	// DeleteOnSync will delete all objects from the target that do not
	// exist in the source after the completion of a sync operation
	// (Push/Pull).
	DeleteOnSync bool
	// DeleteOnPush will delete all objects from the target that do not
	// exist in the source after the completion of Push.
	DeleteOnPush bool
	// DeleteOnPull will delete all objects from the target that do not
	// exist in the source after the completion of Pull.
	DeleteOnPull bool
}

ParallelBucketOptions support the use and creation of parallel sync buckets.
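
A minimal sketch layering parallel sync operations over an existing bucket base; the worker count is illustrative:

parallel, err := pail.NewParallelSyncBucket(pail.ParallelBucketOptions{
	Workers: 8, // number of concurrent worker threads
}, base)
if err != nil {
	log.Fatal(err)
}
// parallel satisfies Bucket and can be used like any other bucket.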

type S3Options

type S3Options struct {
	// DryRun enables running in a mode that will not execute any
	// operations that modify the bucket.
	DryRun bool
	// DeleteOnSync will delete all objects from the target that do not
	// exist in the source after the completion of a sync operation
	// (Push/Pull).
	DeleteOnSync bool
	// DeleteOnPush will delete all objects from the target that do not
	// exist in the source after the completion of Push.
	DeleteOnPush bool
	// DeleteOnPull will delete all objects from the target that do not
	// exist in the source after the completion of Pull.
	DeleteOnPull bool
	// Compress enables gzipping of uploaded objects.
	Compress bool
	// UseSingleFileChecksums forces the bucket to checksum files before
	// running upload and download operations (rather than doing these
	// operations independently.) Useful for large files, particularly in
	// coordination with the parallel sync bucket implementations.
	UseSingleFileChecksums bool
	// Verbose sets the logging mode to "debug".
	Verbose bool
	// MaxRetries sets the number of retry attempts for s3 operations.
	MaxRetries int
	// Credentials allows the passing in of explicit AWS credentials. These
	// will override the default credentials chain. (Optional)
	Credentials *credentials.Credentials
	// SharedCredentialsFilepath, when not empty, will override the default
	// credentials chain and the Credentials value (see above). (Optional)
	SharedCredentialsFilepath string
	// SharedCredentialsProfile, when not empty, will temporarily set the
	// AWS_PROFILE environment variable to its value. (Optional)
	SharedCredentialsProfile string
	// Region specifies the AWS region.
	Region string
	// Name specifies the name of the bucket.
	Name string
	// Prefix specifies the prefix to use. (Optional)
	Prefix string
	// Permissions sets the S3 permissions to use for each object. Defaults
	// to FULL_CONTROL. See
	// `https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html`
	// for more information.
	Permissions S3Permissions
	// ContentType sets the standard MIME type of the object data. Defaults
	// to nil. See
	// `https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17`
	// for more information.
	ContentType string
}

S3Options support the use and creation of S3 backed buckets.
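
A minimal construction sketch; the bucket name and region are hypothetical, and unset fields fall back to the defaults described above:

bucket, err := pail.NewS3Bucket(pail.S3Options{
	Name:        "my-example-bucket",
	Region:      "us-east-1",
	Permissions: pail.S3PermissionsPrivate,
	MaxRetries:  10,
})
if err != nil {
	log.Fatal(err)
}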

type S3Permissions

type S3Permissions string

S3Permissions is a type that describes the object canned ACL from S3.

const (
	S3PermissionsPrivate                S3Permissions = s3.ObjectCannedACLPrivate
	S3PermissionsPublicRead             S3Permissions = s3.ObjectCannedACLPublicRead
	S3PermissionsPublicReadWrite        S3Permissions = s3.ObjectCannedACLPublicReadWrite
	S3PermissionsAuthenticatedRead      S3Permissions = s3.ObjectCannedACLAuthenticatedRead
	S3PermissionsAWSExecRead            S3Permissions = s3.ObjectCannedACLAwsExecRead
	S3PermissionsBucketOwnerRead        S3Permissions = s3.ObjectCannedACLBucketOwnerRead
	S3PermissionsBucketOwnerFullControl S3Permissions = s3.ObjectCannedACLBucketOwnerFullControl
)

Valid S3 permissions.

func (S3Permissions) Validate

func (p S3Permissions) Validate() error

Validate S3 permissions.

type SyncBucket

type SyncBucket interface {
	// Sync methods: these methods recursively and efficiently
	// copy trees of files between the remote store and the local
	// file system.
	Push(context.Context, SyncOptions) error
	Pull(context.Context, SyncOptions) error
}

SyncBucket defines an interface to access a remote blob store and synchronize the local file system tree with the remote store.

func NewS3ArchiveBucket

func NewS3ArchiveBucket(options S3Options) (SyncBucket, error)

NewS3ArchiveBucket returns a SyncBucket implementation backed by S3 that supports syncing the local file system as a single archive file in S3 rather than creating an individual object for each file. This SyncBucket is not compatible with regular Bucket implementations.

func NewS3ArchiveBucketWithHTTPClient

func NewS3ArchiveBucketWithHTTPClient(client *http.Client, options S3Options) (SyncBucket, error)

NewS3ArchiveBucketWithHTTPClient is the same as NewS3ArchiveBucket but allows the user to specify an existing HTTP client connection.

type SyncOptions

type SyncOptions struct {
	Local   string
	Remote  string
	Exclude string
}

SyncOptions describes the arguments to the sync operations (Push and Pull). Note that Exclude is a regular expression.
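
A minimal push sketch, assuming bucket implements SyncBucket and ctx is a context.Context; the paths and pattern are hypothetical:

err := bucket.Push(ctx, pail.SyncOptions{
	Local:   "/tmp/artifacts",   // local directory tree to upload
	Remote:  "builds/artifacts", // remote prefix to sync into
	Exclude: `\.tmp$`,           // regular expression of paths to skip
})
if err != nil {
	log.Fatal(err)
}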

Directories

cmd
