
package file

import "github.com/grailbio/base/file"

Package file provides basic file operations across multiple file-system types. It is designed for use in applications that operate uniformly on multiple storage types, such as local files, S3 and HTTP.

Overview

This package is designed with the following goals:

- Support popular file systems, especially S3 and the local file system.

- Define operation semantics that are implementable on all the supported file systems, yet practical and usable.

- Be extensible. Provide leeway to do things like registering new file system types or ticket-based authorization.

This package defines two key interfaces, Implementation and File.

- Implementation provides filesystem operations, such as Open, Remove, and List (directory walking).

- File implements operations on a file. It is created by Implementation.{Open,Create} calls. File is similar to Go's os.File object but provides limited functionality.

Reading and writing files

The following snippet shows registering an S3 implementation, then writing and reading an S3 file.

import (
  "context"
  "io/ioutil"

  "github.com/aws/aws-sdk-go/aws/session"
  "github.com/grailbio/base/file"
  "github.com/grailbio/base/file/s3file" // file.Implementation implementation for S3
)

func init() {
  // RegisterImplementation takes a factory; it runs on the first use of "s3".
  file.RegisterImplementation("s3", func() file.Implementation {
    return s3file.NewImplementation(
      s3file.NewDefaultProvider(session.Options{}))
  })
}

func WriteTest() error {
  ctx := context.Background()
  f, err := file.Create(ctx, "s3://grail-saito/tmp/test.txt")
  if err != nil {
    return err
  }
  if _, err := f.Writer(ctx).Write([]byte("Hello")); err != nil {
    f.Discard(ctx) // abandon the incomplete file instead of committing it
    return err
  }
  return f.Close(ctx) // Close commits the written contents
}

func ReadTest() ([]byte, error) {
  ctx := context.Background()
  f, err := file.Open(ctx, "s3://grail-saito/tmp/test.txt")
  if err != nil {
    return nil, err
  }
  data, err := ioutil.ReadAll(f.Reader(ctx))
  if err != nil {
    f.Close(ctx)
    return nil, err
  }
  return data, f.Close(ctx)
}

To open a file for reading or writing, call file.Open(ctx, "s3://bucket/key") or file.Create(ctx, "s3://bucket/key"). A File object does not implement io.Reader or io.Writer directly. Instead, you must call File.Reader or File.Writer to start reading or writing. These methods are split from the File itself so that an application can pass different contexts to different I/O operations.

File-system operations

The file package provides functions similar to those in the standard os package. For example, file.Remove(ctx, "s3://bucket/key") removes a file, and file.Stat(ctx, "s3://bucket/key") returns metadata about the file.

Pathname utility functions

The file package also provides functions that are similar to those in the standard filepath package. Functions file.Base, file.Dir, and file.Join work just like filepath.{Base,Dir,Join}, except that they handle URL pathnames correctly. For example, file.Join("s3://foo", "bar") returns "s3://foo/bar", whereas filepath.Join("s3://foo", "bar") would return "s3:/foo/bar".

Registering a filesystem implementation

Function RegisterImplementation associates an implementation with a scheme ("s3", "http", "git", etc.). A local file system implementation is automatically available without any explicit registration. RegisterImplementation is usually invoked when a process starts up, once for each supported file system type. For example:

import (
 "context"
 "io/ioutil"

 "github.com/grailbio/base/file"
 "github.com/grailbio/base/file/s3file"    // file.Implementation implementation for S3
)
func init() {
  // The scheme is given without ":" or "://".
  file.RegisterImplementation("s3", func() file.Implementation {
    return s3file.NewImplementation(...)
  })
}
func main() {
  ctx := context.Background()
  f, err := file.Open(ctx, "s3://somebucket/foo.txt")
  if err != nil { ... }
  data, err := ioutil.ReadAll(f.Reader(ctx))
  err = f.Close(ctx)
  ...
}

Once an implementation is registered, files for that scheme can be opened or created using a "scheme://name" pathname.

Differences from the os package

The file package is similar to Go's standard os package. The differences are the following.

- The file package focuses on providing a file-like API for object storage systems, such as S3 or GCS.

- Mutations to a File are restricted to whole-file writes. There is no option to overwrite a part of an existing file.

- All the operations take a context parameter.

- file.File does not implement io.Reader or io.Writer directly. One must call the File.Reader or File.Writer method to obtain a reader or writer object.

- Directories are simulated in a best-effort manner on implementations that do not support directories as first-class entities, such as S3. Lister provides IsDir() for the current path. Stat(path) returns a nil Info for directories.
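On a flat key space such as S3, a non-recursive listing has to infer "directories" from shared key prefixes. A minimal sketch, assuming an in-memory slice of keys stands in for a bucket (listOneLevel and entry are hypothetical names, not the package's API): keys directly below the prefix are reported as files, and each deeper key contributes its next path component once, as a directory.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is one result of a simulated listing.
type entry struct {
	Path  string
	IsDir bool
}

// listOneLevel lists keys one level below dir. "foo" and "foo/" are treated
// the same, only keys that continue with "/" after dir match, and the
// "/"-delimited common prefixes of deeper keys are reported as directories.
func listOneLevel(keys []string, dir string) []entry {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	seen := map[string]bool{}
	var out []entry
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue // e.g. "foo.txt" is not under "foo/"
		}
		rest := k[len(prefix):]
		if i := strings.Index(rest, "/"); i >= 0 {
			d := prefix + rest[:i]
			if !seen[d] {
				seen[d] = true
				out = append(out, entry{Path: d, IsDir: true})
			}
			continue
		}
		out = append(out, entry{Path: k})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	keys := []string{"foo.txt", "foo/bat.txt", "foo/bar/bat.txt"}
	for _, e := range listOneLevel(keys, "foo") {
		fmt.Println(e.Path, e.IsDir)
	}
	// foo/bar true
	// foo/bat.txt false
}
```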

Concurrency

The Implementation and File provide open-close consistency. More specifically, this package linearizes fileops, where a fileop is a sequence of operations that starts with Implementation.{Open,Create}, continues with read/write/stat operations on the file, and ends with File.Close. Operations such as Implementation.{Stat,Remove,List} and Lister.Scan each form a singleton fileop.

Caution: a local file system on NFS (without cache leasing) doesn't provide this guarantee. Use NFS at your own risk.

Example_localfile is an example of basic read/write operations on the local file system.

Code:

doWrite := func(ctx context.Context, data []byte, path string) {
    out, err := file.Create(ctx, path)
    if err != nil {
        panic(err)
    }
    if _, err = out.Writer(ctx).Write(data); err != nil {
        panic(err)
    }
    if err := out.Close(ctx); err != nil {
        panic(err)
    }
}

doRead := func(ctx context.Context, path string) []byte {
    in, err := file.Open(ctx, path)
    if err != nil {
        panic(err)
    }
    data, err := ioutil.ReadAll(in.Reader(ctx))
    if err != nil {
        panic(err)
    }
    if err := in.Close(ctx); err != nil {
        panic(err)
    }
    return data
}

ctx := context.Background()
doWrite(ctx, []byte("Blue box jumped over red bat"), "/tmp/foohah.txt")
fmt.Printf("Got: %s\n", string(doRead(ctx, "/tmp/foohah.txt")))

Output:

Got: Blue box jumped over red bat

Package Files

doc.go file.go implementation.go info.go localfile.go path.go util.go

func Base

func Base(path string) string

Base returns the last element of the path. It is the same as filepath.Base for a local filesystem path. Otherwise, it acts like filepath.Base, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself.

Example:

file.Base("s3://") returns "s3://".
file.Base("s3://foo/hah/") returns "hah".

Code:

fmt.Println(file.Base(""))
fmt.Println(file.Base("foo1"))
fmt.Println(file.Base("foo2/"))
fmt.Println(file.Base("/"))
fmt.Println(file.Base("s3://"))
fmt.Println(file.Base("s3://blah1"))
fmt.Println(file.Base("s3://blah2/"))
fmt.Println(file.Base("s3://foo/blah3//"))

Output:

.
foo1
foo2
/
s3://
blah1
blah2
blah3

func CloseAndReport

func CloseAndReport(ctx context.Context, f Closer, err *error)

CloseAndReport is a defer-able helper that calls f.Close and reports errors, if any, to *err. Pass your function's named return error. Example usage:

func processFile(filename string) (_ int, err error) {
  ctx := context.Background()
  f, err := file.Open(ctx, filename)
  if err != nil { ... }
  defer file.CloseAndReport(ctx, f, &err)
  ...
}

If your function returns with an error, any f.Close error will be chained appropriately.

func Dir

func Dir(path string) string

Dir returns all but the last element of the path. It is the same as filepath.Dir for a local filesystem path. Otherwise, it acts like filepath.Dir, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself. (3) The path is not cleaned; for example, repeated "/"s in the path are preserved.

Code:

fmt.Println(file.Dir("foo"))
fmt.Println(file.Dir("."))
fmt.Println(file.Dir("/a/b"))
fmt.Println(file.Dir("a/b"))
fmt.Println(file.Dir("s3://ab/cd"))
fmt.Println(file.Dir("s3://ab//cd"))
fmt.Println(file.Dir("s3://a/b/"))
fmt.Println(file.Dir("s3://a/b//"))
fmt.Println(file.Dir("s3://a//b//"))
fmt.Println(file.Dir("s3://a"))

Output:

.
.
/a
a
s3://ab
s3://ab
s3://a/b
s3://a/b
s3://a//b
s3://

func IsAbs

func IsAbs(path string) bool

IsAbs reports whether path is an absolute local path. For a non-local path, it always returns true.

Code:

fmt.Println(file.IsAbs("foo"))
fmt.Println(file.IsAbs("/foo"))
fmt.Println(file.IsAbs("s3://foo"))

Output:

false
true
true

func Join

func Join(elems ...string) string

Join joins any number of path elements into a single path, adding a separator if necessary. It is the same as filepath.Join if elems[0] is a local filesystem path. Otherwise, it works like filepath.Join, with the following differences: (1) the path separator is always '/'. (2) Elements are not cleaned; for example, if an element contains repeated "/"s in the middle, they are preserved.

Code:

fmt.Println(file.Join())
fmt.Println(file.Join(""))
fmt.Println(file.Join("foo", "bar"))
fmt.Println(file.Join("foo", ""))
fmt.Println(file.Join("foo", "/bar/"))
fmt.Println(file.Join(".", "foo:bar"))
fmt.Println(file.Join("s3://foo"))
fmt.Println(file.Join("s3://foo", "/bar/"))
fmt.Println(file.Join("s3://foo", "", "bar"))
fmt.Println(file.Join("s3://foo", "0"))
fmt.Println(file.Join("s3://foo", "abc"))
fmt.Println(file.Join("s3://foo//bar", "/", "/baz"))

Output:

foo/bar
foo
foo/bar
./foo:bar
s3://foo
s3://foo/bar
s3://foo/bar
s3://foo/0
s3://foo/abc
s3://foo//bar/baz

func MustClose

func MustClose(ctx context.Context, f Closer)

MustClose is a defer-able function that calls f.Close and panics on error.

Example:

ctx := context.Background()
f, err := file.Open(ctx, filename)
if err != nil { panic(err) }
defer file.MustClose(ctx, f)
...

func MustParsePath

func MustParsePath(path string) (scheme, suffix string)

MustParsePath is similar to ParsePath, but crashes the process on error.

func ParsePath

func ParsePath(path string) (scheme, suffix string, err error)

ParsePath parses "path" into a scheme and a suffix. The path must be either of the form "scheme://path" or just "path0/.../pathN". The latter indicates a local file.

On success, "scheme" will be the scheme part of the path, and "suffix" will be the part after "scheme://". For example, ParsePath("s3://bucket/key") will return ("s3", "bucket/key", nil).

For a local-filesystem path, this function returns ("", path, nil).

Code:

parse := func(path string) {
    scheme, suffix, err := file.ParsePath(path)
    if err != nil {
        fmt.Printf("%s 🢥 error %v\n", path, err)
        return
    }
    fmt.Printf("%s 🢥 scheme \"%s\", suffix \"%s\"\n", path, scheme, suffix)
}
parse("/tmp/test")
parse("foo://bar")
parse("foo:///bar")
parse("foo:bar")
parse("/foo:bar")

Output:

/tmp/test 🢥 scheme "", suffix "/tmp/test"
foo://bar 🢥 scheme "foo", suffix "bar"
foo:///bar 🢥 scheme "foo", suffix "/bar"
foo:bar 🢥 error parsepath foo:bar: a URL must start with 'scheme://'
/foo:bar 🢥 scheme "", suffix "/foo:bar"
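The rule these outputs imply can be sketched as follows. This is an assumption reconstructed from the example above, not the package's source: a path whose first ":" comes before any "/" is treated as a URL and must continue with "//"; every other path is local. parsePath is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePath splits "scheme://suffix" into its parts. A path with no ":"
// before its first "/" is treated as local and returned with an empty scheme.
func parsePath(p string) (scheme, suffix string, err error) {
	colon := strings.Index(p, ":")
	slash := strings.Index(p, "/")
	if colon < 0 || (slash >= 0 && slash < colon) {
		return "", p, nil // local path, e.g. "/tmp/test" or "/foo:bar"
	}
	if !strings.HasPrefix(p[colon:], "://") {
		return "", "", fmt.Errorf("parsepath %s: a URL must start with 'scheme://'", p)
	}
	return p[:colon], p[colon+len("://"):], nil
}

func main() {
	for _, p := range []string{"/tmp/test", "foo://bar", "foo:///bar", "foo:bar", "/foo:bar"} {
		scheme, suffix, err := parsePath(p)
		fmt.Printf("%q -> scheme %q, suffix %q, err %v\n", p, scheme, suffix, err)
	}
}
```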

func Presign

func Presign(ctx context.Context, path, method string, expiry time.Duration) (string, error)

Presign is a shortcut for calling ParsePath(), then calling Implementation.Presign method.

func ReadFile

func ReadFile(ctx context.Context, path string, opts ...Opts) ([]byte, error)

ReadFile reads the given file and returns the contents. A successful call returns err == nil, not err == EOF. Arg opts is passed to file.Open.

func RegisterImplementation

func RegisterImplementation(scheme string, implFactory func() Implementation)

RegisterImplementation arranges for FindImplementation(scheme) to return the implementation produced by implFactory, so that pathnames of the form scheme + "://anystring" are handled by it. Scheme is a string such as "s3" or "http".

RegisterImplementation() should generally be called when the process starts. implFactory will be invoked exactly once, upon the first request to this scheme; this allows you to register with a factory that has not yet been fully configured (e.g., it requires parsing command-line flags) as long as it will be configured before the first request.

REQUIRES: This function has not been called with the same scheme before.
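The "invoked exactly once, upon the first request" behavior can be sketched with sync.Once. The registry below is hypothetical (register, find, and the impl interface are illustrative names, not the package's API); it panics on a duplicate scheme to reflect the REQUIRES clause.

```go
package main

import (
	"fmt"
	"sync"
)

// impl stands in for file.Implementation.
type impl interface{ String() string }

// lazyImpl defers calling the factory until the scheme is first used.
type lazyImpl struct {
	once    sync.Once
	factory func() impl
	v       impl
}

func (l *lazyImpl) get() impl {
	l.once.Do(func() { l.v = l.factory() })
	return l.v
}

var (
	mu       sync.Mutex
	registry = map[string]*lazyImpl{}
)

// register associates a factory with a scheme such as "s3".
func register(scheme string, factory func() impl) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[scheme]; dup {
		panic("scheme registered twice: " + scheme)
	}
	registry[scheme] = &lazyImpl{factory: factory}
}

// find returns the implementation for scheme, or nil if none is registered.
// The factory runs on the first successful find for that scheme.
func find(scheme string) impl {
	mu.Lock()
	l := registry[scheme]
	mu.Unlock()
	if l == nil {
		return nil
	}
	return l.get()
}

type s3Stub struct{}

func (s3Stub) String() string { return "s3 stub" }

func main() {
	calls := 0
	register("s3", func() impl { calls++; return s3Stub{} })
	fmt.Println(calls) // 0: nothing has used "s3" yet
	fmt.Println(find("s3"), find("s3"))
	fmt.Println(calls)             // 1
	fmt.Println(find("gs") == nil) // true
}
```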

func Remove

func Remove(ctx context.Context, path string) error

Remove is a shortcut for calling ParsePath(), then calling Implementation.Remove method.

func RemoveAll

func RemoveAll(ctx context.Context, path string) error

RemoveAll removes path and any children it contains. It is unspecified whether empty directories are removed by this function. It removes everything it can but returns the first error it encounters. If the path does not exist, RemoveAll returns nil.

func WriteFile

func WriteFile(ctx context.Context, path string, data []byte) error

WriteFile writes data to the given file. If the file does not exist, WriteFile creates it; otherwise WriteFile truncates it before writing.

type Closer

type Closer interface {
    // Close tries to clean up the resource. Implementations can define whether
    // Close can be called more than once and whether callers should retry on error.
    Close(context.Context) error
}

Closer cleans up a resource. Generally, resource provider implementations will return a Closer when opening a resource (like File above).

type ETagged

type ETagged interface {
    // ETag is an identifier assigned to a specific version of the file.
    ETag() string
}

ETagged defines a getter for a file with an ETag.

type Error

type Error struct {
    // contains filtered or unexported fields
}

Error implements io.{Reader,Writer,Seeker,Closer}. It returns the given error to any call.

func NewError

func NewError(err error) *Error

NewError returns a new Error object that returns the given error to any Read/Write/Seek/Close call.

func (*Error) Close

func (r *Error) Close() error

Close implements io.Closer.

func (*Error) Read

func (r *Error) Read([]byte) (int, error)

Read implements io.Reader.

func (*Error) Seek

func (r *Error) Seek(int64, int) (int64, error)

Seek implements io.Seeker.

func (*Error) Write

func (r *Error) Write([]byte) (int, error)

Write implements io.Writer.

type File

type File interface {
    // String returns a diagnostic string.
    String() string

    // Name returns the path name given to file.Open or file.Create when this
    // object was created.
    Name() string

    // Stat returns file metadata.
    //
    // REQUIRES: Close has not been called
    Stat(ctx context.Context) (Info, error)

    // Reader creates an io.ReadSeeker object that operates on the file.  If
    // Reader() is called multiple times, they share the seek pointer.
    //
    // REQUIRES: Close has not been called
    Reader(ctx context.Context) io.ReadSeeker

    // Writer creates an io.Writer that writes to the file. If Writer() is
    // called multiple times, they share the seek pointer.
    //
    // REQUIRES: Close has not been called
    Writer(ctx context.Context) io.Writer

    // Discard discards a file before it is closed, relinquishing any
    // temporary resources implied by pending writes. This should be
    // used if the caller decides not to complete writing the file.
    // Discard is a best-effort operation. Discard is not defined for
    // files opened for reading. Exactly one of Discard or Close should
    // be called. No other File, io.ReadSeeker, or io.Writer methods
    // shall be called after Discard.
    Discard(ctx context.Context)

    // Close commits the contents of a written file, invalidating the
    // File and all Readers and Writers created from the file. Exactly
    // one of Discard or Close should be called. No other File,
    // io.ReadSeeker, or io.Writer methods shall be called after Close.
    Closer
}

File defines operations on a file. Implementations must be thread safe.

func Create

func Create(ctx context.Context, path string, opts ...Opts) (File, error)

Create opens the given file write-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Create.

func Open

func Open(ctx context.Context, path string, opts ...Opts) (File, error)

Open opens the given file read-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Open.

Open returns an error of kind errors.NotExist if the file at the provided path does not exist.

type Implementation

type Implementation interface {
    // String returns a diagnostic string.
    String() string

    // Open opens a file for reading. The pathname given to file.Open() is passed
    // here unchanged. Thus, it contains the URL prefix such as "s3://".
    //
    // Open returns an error of kind errors.NotExist if there is
    // no file at the provided path.
    Open(ctx context.Context, path string, opts ...Opts) (File, error)

    // Create opens a file for writing. If "path" already exists, the old contents
    // will be destroyed. If "path" does not exist already, the file will be newly
    // created.  If the directory part of the path does not exist already, it will
    // be created. The pathname given to file.Create() is passed here unchanged.
    // Thus, it contains the URL prefix such as "s3://".
    Create(ctx context.Context, path string, opts ...Opts) (File, error)

    // List finds files and directories. If "path" points to a regular file, the
    // lister will return information about the file itself and finish.
    //
    // If "path" is a directory, the lister will list files and directories under
    // the given path.  When "recursive" is set to false, List finds files "one
    // level" below dir.  Dir may end in /, but need not.  All the files and
    // directories returned by the lister will have pathnames of the form
    // dir/something.
    //
    // For key based storage engines (e.g. S3), a dir prefix not ending in "/" must
    // be followed immediately by "/" in some object keys, and only such keys
    // will be returned.
    // With "recursive=true" List finds all files whose pathnames are under "dir"
    // or its subdirectories.  All the files returned by the lister will have
    // pathnames of the form dir/something.  Directories will not be returned as
    // separate entities. For example List(ctx, "foo", true) will yield
    // "foo/bar/bat.txt", but not "foo.txt" or "foo/bar/", while
    // List(ctx, "foo", false) will yield "foo/bar" and "foo/bat.txt", but not
    // "foo.txt" or "foo/bar/bat.txt".  There is no difference in the return value
    // of List(ctx, "foo", ...) and List(ctx, "foo/", ...).
    List(ctx context.Context, path string, recursive bool) Lister

    // Stat returns the file metadata.  It returns nil if path is
    // a directory. (There is no direct test for existence of a
    // directory.)
    //
    // Stat returns an error of kind errors.NotExist if there is
    // no file at the provided path.
    Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

    // Remove removes the file. The path passed to file.Remove() is passed here
    // unchanged.
    Remove(ctx context.Context, path string) error

    // Presign returns a URL that can be used to perform the given HTTP method,
    // usually one of "GET", "PUT" or "DELETE", on the path for the duration
    // specified in expiry.
    //
    // It returns an error of kind errors.NotSupported for implementations that
    // do not support signed URLs, or that do not support the given HTTP method.
    //
    // Unlike Open and Stat, this method does not return an error of kind
    // errors.NotExist if there is no file at the provided path.
    Presign(ctx context.Context, path, method string, expiry time.Duration) (url string, err error)
}

Implementation implements operations for a file-system type. Thread safe.

func FindImplementation

func FindImplementation(scheme string) Implementation

FindImplementation returns an Implementation object registered for the given scheme. It returns nil if the scheme is not registered.

func NewLocalImplementation

func NewLocalImplementation() Implementation

NewLocalImplementation returns a new file.Implementation for the local file system that uses Go's native "os" package. This function is only for unit tests. Applications should use functions such as file.Open and file.Create to access the local file system.

type Info

type Info interface {
    // Size returns the length of the file in bytes for regular files; system-dependent for others
    Size() int64
    // ModTime returns modification time for regular files; system-dependent for others
    ModTime() time.Time
}

Info represents file metadata.

func Stat

func Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

Stat returns the given file's metadata. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Stat.

Stat returns an error of kind errors.NotExist if the file at the provided path does not exist.

type Lister

type Lister interface {
    // Scan advances the lister to the next entry.  It returns
    // false either when the scan stops because we have reached the end of the input
    // or else because there was an error.  After Scan returns, the Err method returns
    // any error that occurred during scanning.
    Scan() bool

    // Err returns the first error that occurred while scanning.
    Err() error

    // Path returns the last path that was scanned. The path always starts with
    // the directory path given to the List method.
    //
    // REQUIRES: Last call to Scan returned true.
    Path() string

    // IsDir() returns true if Path() refers to a directory in a file system
    // or a common prefix ending in "/" in S3.
    //
    // REQUIRES: Last call to Scan returned true.
    IsDir() bool

    // Info returns metadata of the file that was scanned.
    //
    // REQUIRES: Last call to Scan returned true.
    Info() Info
}

Lister lists files in a directory tree. Not thread safe.

func List

func List(ctx context.Context, prefix string, recursive bool) Lister

List finds files whose pathnames are under "prefix"; when recursive is true, it also descends into subdirectories (see Implementation.List for the exact semantics). All the files returned by the lister will have pathnames of the form prefix/something. For example, List(ctx, "foo", true) will yield "foo/bar.txt", but not "foo.txt".

Example: file.List(ctx, "s3://grail-data/foo", true)

type Opts

type Opts struct {
    // When set, this flag causes the file package to keep retrying when the file
    // is reported as not found. This flag should be set when:
    //
    // 1. you are accessing a file on S3, and
    //
    // 2. an application may have attempted to GET the same file in recent past
    // (~5 minutes). The said application may be on a different machine.
    //
    // This flag is honored only by S3 to work around the problem where S3 may
    // report a spurious KeyNotFound error after a GET request to the same file.
    // For more details, see
    // https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts,
    // section "S3 Data Consistency Model". In particular:
    //
    //   The caveat is that if you make a HEAD or GET request to the key
    //   name (to find if the object exists) before creating the object, Amazon S3
    //   provides eventual consistency for read-after-write.
    RetryWhenNotFound bool

    // When set, Close will ignore a NoSuchUpload error from S3
    // CompleteMultiPartUpload and silently return OK.
    //
    // This is to work around a bug where concurrent uploads to one file sometimes
    // cause an upload request to be lost on the server side.
    // https://console.aws.amazon.com/support/cases?region=us-west-2#/6299905521/en
    // https://github.com/yasushi-saito/s3uploaderror
    //
    // Set this flag only if:
    //
    //  1. you are writing to a file on S3, and
    //
    //  2. possible concurrent writes to the same file produce the same
    //  contents, so you are ok with taking any of them.
    //
    // If you don't set this flag, then concurrent writes to the same file may
    // fail with a NoSuchUpload error, and it is up to you to retry.
    //
    // On non-S3 file systems, this flag is ignored.
    IgnoreNoSuchUpload bool
}

Opts controls the file access requests, such as Open and Stat.

Directories

Path                Synopsis
internal/testutil
s3file              Package s3file implements grail file interface for S3.

Package file imports 13 packages and is imported by 23 packages. Updated 2020-05-06.