muxfys


README

muxfys


go get github.com/VertebrateResequencing/muxfys

muxfys is a pure Go library for temporarily mounting, in-process, multiple different remote file systems or object stores on the same mount point as a "filey" system. Currently only S3-like systems are supported.

It is high performance and easy to use, with nothing else to install and no root permissions needed (except to initially install/configure fuse: on older Linux you may need to install fuse-utils, and on macOS you'll need to install osxfuse; for both you must ensure that 'user_allow_other' is set in /etc/fuse.conf or its equivalent).

It has good S3 compatibility, working with AWS Signature Version 4 (Amazon S3, Minio, et al.) and AWS Signature Version 2 (Google Cloud Storage, OpenStack Swift, Ceph Object Gateway, Riak CS, et al.).

It allows "multiplexing": you can mount multiple different S3 buckets (or sub directories of the same bucket) on the same local directory. This makes commands you want to run against the files in your buckets much simpler, eg. instead of mounting s3://publicbucket, s3://myinputbucket and s3://myoutputbucket to separate mount points and running:

$ myexe -ref /mnt/publicbucket/refs/human/ref.fa \
    -i /mnt/myinputbucket/xyz/123/input.file \
    > /mnt/myoutputbucket/xyz/123/output.file

You could multiplex the 3 buckets (at the desired paths) onto the directory you will work from and just run:

$ myexe -ref ref.fa -i input.file > output.file

It is a "filey" system ('fys' instead of 'fs') in that it cares about performance and efficiency first, and POSIX second. It is designed around a particular use-case:

Non-interactively read a small handful of files whose paths you already know, probably a few times for small files and only once for large files, then upload a few large files. Eg. we want to mount S3 buckets that contain thousands of unchanging cache files, and a few big input files that we process using those cache files, and finally generate some results.

In particular this means we hold on to directory and file attributes forever and assume they don't change externally. Permissions are ignored and only you get read/write access.

When using muxfys, you 1) mount, 2) do something that needs the files in your S3 bucket(s), 3) unmount. Then repeat 1-3 for other things that need data in your S3 buckets.

Performance

To get a basic sense of performance, a 1GB file in a Ceph Object Gateway S3 bucket was read, twice in a row for tools with caching, using the methods that worked for me (I had to hack minfs to get it to work); units are seconds (average of 3 attempts) needed to read the whole file:

| method         | fresh | cached |
|----------------|-------|--------|
| s3cmd          | 5.9   | n/a    |
| mc             | 7.9   | n/a    |
| minfs          | 40    | n/a    |
| s3fs           | 12.1  | n/a    |
| s3fs caching   | 12.2  | 1.0    |
| muxfys         | 5.7   | n/a    |
| muxfys caching | 5.8   | 0.7    |

Ie. minfs is very slow, and muxfys is about 2x faster than s3fs, with no noticeable performance penalty for fuse mounting vs simply downloading the files you need to local disk. (You also get the benefit of being able to seek and read only small parts of the remote file, without having to download the whole thing.)

The same story holds true when performing the above test 100 times ~simultaneously; while some reads take much longer due to Ceph/network overload, muxfys remains on average twice as fast as s3fs. The only significant change is that s3cmd starts to fail.

For a real-world test, some data processing and analysis was done with samtools, a tool that can end up reading small parts of very large files. www.htslib.org/workflow was partially followed to map fastqs with 441 read pairs (extracted from an old human chr20 mapping). Mapping, sorting and calling were carried out, in addition to creating and viewing a cram. The different caching strategies used were:

  • cup == reference-related files cached, fastq files uncached, working in a normal POSIX directory
  • cuf == as cup, but working in a fuse mounted writable directory
  • uuf == as cuf, but with no caching for the reference-related files

The local(mc) method involved downloading all files with mc first, with the cached result being the maximum possible performance: that of running bwa and samtools when all required files are accessed from the local POSIX filesystem. Units are seconds (average of 3 attempts):

| method     | fresh | cached |
|------------|-------|--------|
| local(mc)  | 157   | 40     |
| s3fs.cup   | 175   | 50     |
| muxfys.cup | 80    | 45     |
| muxfys.cuf | 79    | 44     |
| muxfys.uuf | 88    | n/a    |

Ie. muxfys is about 2x faster than just downloading all required files manually, and over 2x faster than using s3fs. There isn't much performance loss when the data is cached vs maximum possible performance. There's no noticeable penalty (indeed it's a little faster) for working directly in a muxfys-mounted directory.

Finally, to compare to a highly optimised tool written in C that has built-in support (via libcurl) for reading from S3, samtools was once again used, this time to read 100bp (the equivalent of a few lines) from an 8GB indexed cram file. The builtin(mc) method involved downloading the single required cram cache file from S3 first using mc, then relying on samtools' built-in S3 support by giving it the s3:// path to the cram file; the cached result involves samtools reading this cache file and the cram's index files from the local POSIX filesystem, but it still reads cram data itself from the remote S3 system. The other methods used samtools normally, giving it paths within the fuse mount(s) created. The different caching strategies used were:

  • cu == reference-related files cached, cram-related files uncached
  • cc == everything cached
  • uu == nothing cached

Units are seconds (average of 3 attempts):

| method      | fresh | cached |
|-------------|-------|--------|
| builtin(mc) | 1.3   | 0.5    |
| s3fs.cu     | 4.3   | 1.7    |
| s3fs.cc     | 4.4   | 0.5    |
| s3fs.uu     | 4.4   | 2.2    |
| muxfys.cu   | 0.3   | 0.1    |
| muxfys.cc   | 0.3   | 0.06   |
| muxfys.uu   | 0.3   | 0.1    |

Ie. muxfys is much faster than s3fs (more than 2x faster, probably due to much faster and more efficient stating of files), and using it also gives a significant benefit over using a tool's built-in support for S3.

Status & Limitations

The only RemoteAccessor implemented so far is for S3-like object stores.

In cached mode, random reads and writes have been implemented.

In non-cached mode, random reads and serial writes have been implemented. (It is unlikely that random uncached writes will be implemented.)

Non-POSIX behaviours:

  • does not store file mode/owner/group
  • does not support hardlinks
  • symlinks are only supported temporarily in a cached writeable mount: they can be created and used, but do not get uploaded
  • atime (and typically ctime) is always the same as mtime
  • mtime of files is not stored remotely (remote file mtimes are of their upload time, and muxfys only guarantees that files are uploaded in the order of their mtimes)
  • does not upload empty directories, and can't rename remote directories
  • fsync is ignored, files are only flushed on close

Guidance

CacheData: true will usually give you the best performance. Leaving CacheDir unset also gives the best performance: if you read a small part of a large file, only the part you read will be downloaded and cached in the unique CacheDir.

Only turn on Write mode if you have to write.

Use CacheData: false if you will read more data than can be stored on local disk.

If you know that you will definitely end up reading the same data multiple times (either during a mount, or from different mounts) on the same machine, and have sufficient local disk space, use CacheData: true and set an explicit CacheDir (with a constant absolute path, eg. starting in /tmp). Doing this results in any file read downloading the whole remote file to cache it, which can be wasteful if you only need to read a small part of a large file. (But this is the only way that muxfys can coordinate the cache amongst independent processes.)
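For instance, a shared-cache configuration might look like the following sketch (the path is hypothetical, and accessor would be created as shown in the Usage section below):

remoteConfig := &muxfys.RemoteConfig{
    Accessor: accessor,
    // a constant, absolute CacheDir lets independent processes on the same
    // machine share the cache, at the cost of whole-file downloads on reads
    CacheDir: "/tmp/myapp_muxfys_cache",
}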

Usage

import "github.com/VertebrateResequencing/muxfys"

// fully manual S3 configuration
accessorConfig := &muxfys.S3Config{
    Target:    "https://s3.amazonaws.com/mybucket/subdir",
    Region:    "us-east-1",
    AccessKey: os.Getenv("AWS_ACCESS_KEY_ID"),
    SecretKey: os.Getenv("AWS_SECRET_ACCESS_KEY"),
}
accessor, err := muxfys.NewS3Accessor(accessorConfig)
if err != nil {
    log.Fatal(err)
}
remoteConfig1 := &muxfys.RemoteConfig{
    Accessor: accessor,
    CacheDir: "/tmp/muxfys/cache",
    Write:    true,
}

// or read configuration from standard AWS S3 config files and environment
// variables
accessorConfig, err = muxfys.S3ConfigFromEnvironment("default",
    "myotherbucket/another/subdir")
if err != nil {
    log.Fatalf("could not read config from environment: %s\n", err)
}
accessor, err = muxfys.NewS3Accessor(accessorConfig)
if err != nil {
    log.Fatal(err)
}
remoteConfig2 := &muxfys.RemoteConfig{
    Accessor:  accessor,
    CacheData: true,
}

cfg := &muxfys.Config{
    Mount:     "/tmp/muxfys/mount",
    CacheBase: "/tmp",
    Retries:   3,
    Verbose:   true,
}

fs, err := muxfys.New(cfg)
if err != nil {
    log.Fatalf("bad configuration: %s\n", err)
}

err = fs.Mount(remoteConfig1, remoteConfig2)
if err != nil {
    log.Fatalf("could not mount: %s\n", err)
}
fs.UnmountOnDeath()

// read from & write to files in /tmp/muxfys/mount, which contains the
// contents of mybucket/subdir and myotherbucket/another/subdir; writes will
// get uploaded to mybucket/subdir when you Unmount()

err = fs.Unmount()
if err != nil {
    log.Fatalf("could not unmount: %s\n", err)
}

logs := fs.Logs()

Provenance

There are many ways of accessing data in S3 buckets. Common tools include s3cmd for direct up/download of particular files, and s3fs for fuse-mounting a bucket. But these are not written in Go.

Amazon provide aws-sdk-go for interacting with S3, but this does not work with (my) Ceph Object Gateway and possibly other implementations of S3.

minio-go is an alternative Go library that provides good compatibility with a wide variety of S3-like systems.

There are at least 3 Go libraries for creating fuse-mounted file-systems. github.com/jacobsa/fuse was based on bazil.org/fuse, claiming higher performance. Also claiming high performance is github.com/hanwen/go-fuse.

There are at least 2 projects that implement fuse-mounting of S3 buckets:

  • github.com/minio/minfs is implemented using minio-go and bazil, but in my hands was very slow. It is designed to be run as root, requiring file-based configuration.
  • github.com/kahing/goofys is implemented using aws-sdk-go and jacobsa/fuse, making it incompatible with (my) Ceph Object Gateway.

Both are designed to be run as daemons as opposed to being used in-process.

muxfys is implemented using minio-go for compatibility, and hanwen/go-fuse for speed. (In my testing, hanwen/go-fuse and jacobsa/fuse did not have noticeably different performance characteristics, but go-fuse was easier to write for.) However, some of its read code is inspired by goofys. Thanks to minimising calls to the remote S3 system, and only implementing what S3 is generally capable of, it shares and adds to goofys' non-POSIX behaviours.

Versioning

This project adheres to Semantic Versioning. See CHANGELOG.md for a description of changes.

If you want to rely on a stable API, vendor the library, updating within a desired version. For example, you could use Glide and:

$ glide get github.com/VertebrateResequencing/muxfys#^2.0.0

Documentation

Overview

Package muxfys is a pure Go library that lets you temporarily fuse-mount, in-process, remote file systems or object stores as a "filey" system. Currently only S3-like systems are supported.

It is high performance and easy to use, with nothing else to install and no root permissions needed (except to initially install/configure fuse: on older Linux you may need to install fuse-utils, and on macOS you'll need to install osxfuse; for both you must ensure that 'user_allow_other' is set in /etc/fuse.conf or its equivalent).

It allows "multiplexing": you can mount multiple different buckets (or sub directories of the same bucket) on the same local directory. This makes commands you want to run against the files in your buckets much simpler, eg. instead of mounting s3://publicbucket, s3://myinputbucket and s3://myoutputbucket to separate mount points and running:

$ myexe -ref /mnt/publicbucket/refs/human/ref.fa \
    -i /mnt/myinputbucket/xyz/123/input.file \
    > /mnt/myoutputbucket/xyz/123/output.file

You could multiplex the 3 buckets (at the desired paths) onto the directory you will work from and just run:

$ myexe -ref ref.fa -i input.file > output.file

When using muxfys, you 1) mount, 2) do something that needs the files in your S3 bucket(s), 3) unmount. Then repeat 1-3 for other things that need data in your S3 buckets.

Usage

import "github.com/VertebrateResequencing/muxfys"

// fully manual S3 configuration
accessorConfig := &muxfys.S3Config{
    Target:    "https://s3.amazonaws.com/mybucket/subdir",
    Region:    "us-east-1",
    AccessKey: os.Getenv("AWS_ACCESS_KEY_ID"),
    SecretKey: os.Getenv("AWS_SECRET_ACCESS_KEY"),
}
accessor, err := muxfys.NewS3Accessor(accessorConfig)
if err != nil {
    log.Fatal(err)
}
remoteConfig1 := &muxfys.RemoteConfig{
    Accessor: accessor,
    CacheDir: "/tmp/muxfys/cache",
    Write:    true,
}

// or read configuration from standard AWS S3 config files and environment
// variables
accessorConfig, err = muxfys.S3ConfigFromEnvironment("default",
    "myotherbucket/another/subdir")
if err != nil {
    log.Fatalf("could not read config from environment: %s\n", err)
}
accessor, err = muxfys.NewS3Accessor(accessorConfig)
if err != nil {
    log.Fatal(err)
}
remoteConfig2 := &muxfys.RemoteConfig{
    Accessor:  accessor,
    CacheData: true,
}

cfg := &muxfys.Config{
    Mount:     "/tmp/muxfys/mount",
    CacheBase: "/tmp",
    Retries:   3,
    Verbose:   true,
}

fs, err := muxfys.New(cfg)
if err != nil {
    log.Fatalf("bad configuration: %s\n", err)
}

err = fs.Mount(remoteConfig1, remoteConfig2)
if err != nil {
    log.Fatalf("could not mount: %s\n", err)
}
fs.UnmountOnDeath()

// read from & write to files in /tmp/muxfys/mount, which contains the
// contents of mybucket/subdir and myotherbucket/another/subdir; writes will
// get uploaded to mybucket/subdir when you Unmount()

err = fs.Unmount()
if err != nil {
    log.Fatalf("could not unmount: %s\n", err)
}

logs := fs.Logs()

Extending

To add support for a new kind of remote file system or object store, simply implement the RemoteAccessor interface and supply an instance of that to RemoteConfig (a sketch of such an implementation appears after the RemoteAccessor interface definition below).

Constants

This section is empty.

Variables

This section is empty.

Functions

func SetLogHandler

func SetLogHandler(h log15.Handler)

SetLogHandler defines how log messages (globally for this package) are logged. Logs are always retrievable as strings from individual MuxFys instances using MuxFys.Logs(), but otherwise by default are discarded.

To have them logged somewhere as they are emitted, supply a github.com/inconshreveable/log15.Handler. For example, supplying log15.StderrHandler would log everything to STDERR.
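For instance, a minimal sketch of logging everything to STDERR as it happens:

import (
    log15 "github.com/inconshreveable/log15"

    "github.com/VertebrateResequencing/muxfys"
)

// send all muxfys log messages to STDERR as they are emitted
muxfys.SetLogHandler(log15.StderrHandler)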

Types

type CacheTracker

type CacheTracker struct {
	sync.Mutex
	// contains filtered or unexported fields
}

CacheTracker struct is used to track what parts of which files have been cached.

func NewCacheTracker

func NewCacheTracker() *CacheTracker

NewCacheTracker creates a new *CacheTracker.

func (*CacheTracker) CacheDelete

func (c *CacheTracker) CacheDelete(path string)

CacheDelete should be used if you delete a cache file.

func (*CacheTracker) CacheOverride

func (c *CacheTracker) CacheOverride(path string, iv Interval)

CacheOverride should be used if you do something like delete a cache file and then recreate it and cache some data inside it. This is the slightly more efficient alternative to calling CacheDelete(path) followed by Cached(path, iv).

func (*CacheTracker) CacheRename

func (c *CacheTracker) CacheRename(oldPath, newPath string)

CacheRename should be used if you rename a cache file on disk.

func (*CacheTracker) CacheTruncate

func (c *CacheTracker) CacheTruncate(path string, offset int64)

CacheTruncate should be used to update the tracker if you truncate a cache file. The internal knowledge of what you have cached for that file will then be updated to exclude anything beyond the truncation point.

func (*CacheTracker) CacheWipe

func (c *CacheTracker) CacheWipe()

CacheWipe should be used if you delete all your cache files.

func (*CacheTracker) Cached

func (c *CacheTracker) Cached(path string, iv Interval)

Cached updates the tracker with what you have now cached. Once you have stored bytes 0..9 in /abs/path/to/sparse.file, you would call: Cached("/abs/path/to/sparse.file", NewInterval(0, 10)).

func (*CacheTracker) Uncached

func (c *CacheTracker) Uncached(path string, iv Interval) Intervals

Uncached tells you what parts of a file in the given interval you haven't already cached (based on your prior Cached() calls). You would want to then cache the data in each of the returned intervals and call Cached() on each one afterwards.
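For example, a minimal sketch of this pattern (the download step itself is left as a comment):

ct := muxfys.NewCacheTracker()
ct.Cached("/abs/path/to/sparse.file", muxfys.NewInterval(0, 10)) // bytes 0..9 cached

// before reading bytes 5 onwards, ask what still needs fetching
for _, iv := range ct.Uncached("/abs/path/to/sparse.file", muxfys.NewInterval(5, 20)) {
    // download the bytes covered by iv into the cache file here, then:
    ct.Cached("/abs/path/to/sparse.file", iv)
}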

type Config

type Config struct {
	// Mount is the local directory to mount on top of (muxfys will try to
	// create this if it doesn't exist). If not supplied, defaults to the
	// subdirectory "mnt" in the current working directory. Note that mounting
	// will only succeed if the Mount directory either doesn't exist or is
	// empty.
	Mount string

	// Retries is the number of times to automatically retry failed remote
	// system requests. The default of 0 means don't retry; at least 3 is
	// recommended.
	Retries int

	// CacheBase is the base directory that will be used to create cache
	// directories when a RemoteConfig that you Mount() has CacheData true but
	// CacheDir undefined. Defaults to the current working directory.
	CacheBase string

	// Verbose results in every remote request getting an entry in the output of
	// Logs(). Errors always appear there.
	Verbose bool
}

Config struct provides the configuration of a MuxFys.

type Interval

type Interval struct {
	Start int64
	End   int64
}

Interval struct is used to describe something with a start and end. End must be greater than start.

func NewInterval

func NewInterval(start, length int64) Interval

NewInterval is a convenience for creating a new Interval when you have a length instead of an end.

func (*Interval) Length

func (i *Interval) Length() int64

Length returns the length of this interval.

func (*Interval) Merge

func (i *Interval) Merge(j Interval) bool

Merge merges the supplied interval with this interval if they overlap or are adjacent. Returns true if a merge actually occurred.

func (*Interval) Overlaps

func (i *Interval) Overlaps(j Interval) bool

Overlaps returns true if this interval overlaps with the supplied one.

func (*Interval) OverlapsOrAdjacent

func (i *Interval) OverlapsOrAdjacent(j Interval) bool

OverlapsOrAdjacent returns true if this interval overlaps with or is adjacent to the supplied one.

type Intervals

type Intervals []Interval

Intervals type is a slice of Interval.

func (Intervals) Difference

func (ivs Intervals) Difference(iv Interval) Intervals

Difference returns any portions of iv that do not overlap with any of our intervals. Assumes that all of our intervals have been Merge()d in.

func (Intervals) Merge

func (ivs Intervals) Merge(iv Interval) Intervals

Merge adds another interval to this slice of intervals, merging with any prior intervals if it overlaps with or is adjacent to them. Returns the new slice of intervals, which have the property of not overlapping with or being adjacent to each other. They are also sorted by Start if Merge() was used to add all of them.

func (Intervals) Truncate

func (ivs Intervals) Truncate(pos int64) Intervals

Truncate removes all intervals that start after the given position, and truncates any intervals that overlap with the position. Assumes that all of our intervals have been Merge()d in.
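A small sketch of how Merge() and Difference() compose (interval semantics as per NewInterval() above):

var ivs muxfys.Intervals
ivs = ivs.Merge(muxfys.NewInterval(0, 10))  // covers bytes 0..9
ivs = ivs.Merge(muxfys.NewInterval(10, 10)) // adjacent, so merged: bytes 0..19

// which parts of bytes 10..29 are not yet covered?
missing := ivs.Difference(muxfys.NewInterval(10, 20))
// missing should contain a single interval covering bytes 20..29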

type MuxFys

type MuxFys struct {
	pathfs.FileSystem

	log15.Logger
	// contains filtered or unexported fields
}

MuxFys struct is the main filey system object.

func New

func New(config *Config) (*MuxFys, error)

New returns a MuxFys that you'll use to Mount() your remote file systems or object stores; ensure you un-mount if killed by calling UnmountOnDeath(), then Unmount() when you're done. You might check Logs() afterwards. The other methods of MuxFys can be ignored in most cases.

func (*MuxFys) Access

func (fs *MuxFys) Access(name string, mode uint32, context *fuse.Context) fuse.Status

Access is ignored.

func (*MuxFys) Chmod

func (fs *MuxFys) Chmod(name string, mode uint32, context *fuse.Context) fuse.Status

Chmod is ignored.

func (*MuxFys) Chown

func (fs *MuxFys) Chown(name string, uid uint32, gid uint32, context *fuse.Context) fuse.Status

Chown is ignored.

func (*MuxFys) Create

func (fs *MuxFys) Create(name string, flags uint32, mode uint32, context *fuse.Context) (nodefs.File, fuse.Status)

Create creates a new file. mode and context are not currently used. When configured with CacheData the contents of the created file are only uploaded at Unmount() time.

func (*MuxFys) GetAttr

func (fs *MuxFys) GetAttr(name string, context *fuse.Context) (*fuse.Attr, fuse.Status)

GetAttr finds out about a given object, returning information from a permanent cache if possible. context is not currently used.

func (*MuxFys) Logs

func (fs *MuxFys) Logs() []string

Logs returns messages generated while mounted; you might call it after Unmount() to see how things went.

By default these will only be errors that occurred, but if this MuxFys was configured with Verbose on, it will also contain informational and warning messages.

If the muxfys package was configured with a log Handler (see SetLogHandler()), these same messages would have been logged as they occurred.

func (*MuxFys) Mkdir

func (fs *MuxFys) Mkdir(name string, mode uint32, context *fuse.Context) fuse.Status

Mkdir is called for a directory that doesn't exist yet. Neither mode nor context is currently used.

func (*MuxFys) Mount

func (fs *MuxFys) Mount(rcs ...*RemoteConfig) error

Mount carries out the mounting of your supplied RemoteConfigs to your configured mount point. On return, the files in your remote(s) will be accessible.

Once mounted, you can't mount again until you Unmount().

If more than 1 RemoteConfig is supplied, the remotes will become multiplexed: your mount point will show the combined contents of all your remote systems. If multiple remotes have a directory with the same name, that directory's contents will in turn show the contents of all those directories. If multiple remotes have a file with the same name in the same directory, reads will come from the first remote you configured that has that file.

func (*MuxFys) OnMount

func (fs *MuxFys) OnMount(nodeFs *pathfs.PathNodeFs)

OnMount prepares MuxFys for use once Mount() has been called.

func (*MuxFys) Open

func (fs *MuxFys) Open(name string, flags uint32, context *fuse.Context) (nodefs.File, fuse.Status)

Open is what is called when any request to read a file is made. The file must already have been stat'ed (eg. with a GetAttr() call), or we report the file doesn't exist. context is not currently used. If CacheData has been configured, we defer to openCached(). Otherwise the real implementation is in remoteFile.

func (*MuxFys) OpenDir

func (fs *MuxFys) OpenDir(name string, context *fuse.Context) ([]fuse.DirEntry, fuse.Status)

OpenDir gets the contents of the given directory for eg. `ls` purposes. It also caches the attributes of all the files within. context is not currently used.

func (*MuxFys) Readlink

func (fs *MuxFys) Readlink(name string, context *fuse.Context) (string, fuse.Status)

Readlink returns the destination of a symbolic link that was created with Symlink(). context is not currently used.

func (*MuxFys) RemoveXAttr

func (fs *MuxFys) RemoveXAttr(name string, attr string, context *fuse.Context) fuse.Status

RemoveXAttr is ignored.

func (*MuxFys) Rename

func (fs *MuxFys) Rename(oldPath string, newPath string, context *fuse.Context) fuse.Status

Rename only works where oldPath is found in the writeable remote. For files, it first remotely copies oldPath to newPath (ignoring any local changes to oldPath), renames any locally cached (and possibly modified) copy of oldPath to newPath, and finally deletes the remote oldPath; if oldPath had been modified, its changes will only be uploaded to newPath at Unmount() time. For directories, it is only capable of renaming directories you have created whilst mounted. context is not currently used.

func (*MuxFys) Rmdir

func (fs *MuxFys) Rmdir(name string, context *fuse.Context) fuse.Status

Rmdir only works for non-existent or empty dirs. context is not currently used.

func (*MuxFys) SetXAttr

func (fs *MuxFys) SetXAttr(name string, attr string, data []byte, flags int, context *fuse.Context) fuse.Status

SetXAttr is ignored.

func (*MuxFys) StatFs

func (fs *MuxFys) StatFs(name string) *fuse.StatfsOut

StatFs returns a constant (faked) set of details describing a very large file system.

func (*MuxFys) Symlink

func (fs *MuxFys) Symlink(source string, dest string, context *fuse.Context) (status fuse.Status)

Symlink creates a symbolic link. Only implemented for temporary use when configured with CacheData: you can create and use symlinks but they don't get uploaded. context is not currently used.

func (*MuxFys) Truncate

func (fs *MuxFys) Truncate(name string, offset uint64, context *fuse.Context) fuse.Status

Truncate truncates any local cached copy of the file. Only currently implemented for when configured with CacheData; the results of the Truncate are only uploaded at Unmount() time. If offset is > size of file, does nothing and returns OK. context is not currently used.

func (*MuxFys) Unlink

func (fs *MuxFys) Unlink(name string, context *fuse.Context) fuse.Status

Unlink deletes a file from the remote system, as well as any locally cached copy. context is not currently used.

func (*MuxFys) Unmount

func (fs *MuxFys) Unmount(doNotUpload ...bool) error

Unmount must be called when you're done reading from/writing to your remotes. Be sure to close any open filehandles beforehand!

It's a good idea to defer this after calling Mount(), and possibly also call UnmountOnDeath().

In CacheData mode, it is only at Unmount() that any files you created or altered get uploaded, so this may take some time. You can optionally supply a bool which if true prevents any uploads.

If a remote was not configured with a specific CacheDir but CacheData was true, the CacheDir will be deleted.
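For example, to throw away everything you created or altered in the mount instead of uploading it (a sketch, assuming fs is a mounted MuxFys):

// true means: do not upload any created or altered files
err = fs.Unmount(true)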

func (*MuxFys) UnmountOnDeath

func (fs *MuxFys) UnmountOnDeath()

UnmountOnDeath captures SIGINT (ctrl-c) and SIGTERM (kill) signals, then calls Unmount() before calling os.Exit() (with status 1 if the unmount worked, 2 otherwise) to terminate your program. Manually calling Unmount() after this cancels the signal capture. This does NOT block.

func (*MuxFys) Utimens

func (fs *MuxFys) Utimens(name string, Atime *time.Time, Mtime *time.Time, context *fuse.Context) fuse.Status

Utimens only functions when configured with CacheData and the file is already in the cache; otherwise ignored. This only gets called by direct operations like os.Chtimes() (that don't first Open()/Create() the file). context is not currently used.

type RemoteAccessor added in v0.0.2

type RemoteAccessor interface {
	// DownloadFile downloads the remote source file to the local dest path.
	DownloadFile(source, dest string) error

	// UploadFile uploads the local source path to the remote dest path,
	// recording the given contentType if possible.
	UploadFile(source, dest, contentType string) error

	// UploadData uploads a data stream in real time to the remote dest path.
	// The reader is what the remote file system or object store reads from to
	// get the data it should write to the object at dest.
	UploadData(data io.Reader, dest string) error

	// ListEntries returns a slice of all the files and directories in the given
	// remote directory (or for object stores, all files and directories with a
	// prefix of dir but excluding those that have an additional forward slash).
	ListEntries(dir string) ([]RemoteAttr, error)

	// OpenFile opens a remote file ready for reading.
	OpenFile(path string, offset int64) (io.ReadCloser, error)

	// Seek should take an object returned by OpenFile() (from the same
	// RemoteAccessor implementation) and seek to the given offset from the
	// beginning of the file.
	Seek(path string, rc io.ReadCloser, offset int64) (io.ReadCloser, error)

	// CopyFile should do a remote copy of source to dest without involving
	// the local file system.
	CopyFile(source, dest string) error

	// DeleteFile should delete the remote file at the given path.
	DeleteFile(path string) error

	// DeleteIncompleteUpload is like DeleteFile, but only called after a failed
	// Upload*() attempt.
	DeleteIncompleteUpload(path string) error

	// ErrorIsNotExists should return true if the supplied error (retrieved from
	// any of the above methods called on the same RemoteAccessor
	// implementation) indicates a file not existing.
	ErrorIsNotExists(err error) bool

	// ErrorIsNoQuota should return true if the supplied error (retrieved from
	// any of the above methods called on the same RemoteAccessor
	// implementation) indicates insufficient quota to write some data.
	ErrorIsNoQuota(err error) bool

	// Target should return a string describing the complete location details of
	// what the accessor has been configured to access. Eg. it might be a url.
	// It is only used for logging purposes, to distinguish this Accessor from
	// others.
	Target() string

	// RemotePath should return the absolute remote path given a path relative
	// to the target point the Accessor was originally configured with.
	RemotePath(relPath string) (absPath string)

	// LocalPath should return a stable non-conflicting absolute path relative
	// to the given local path for the given absolute remote path. It should
	// include directories that ensure that different targets with the same
	// directory structure and files get different local paths. The local path
	// returned from here will be used to decide where to cache files.
	LocalPath(baseDir, remotePath string) (localPath string)
}

RemoteAccessor is the interface used by remote to actually communicate with the remote file system or object store. All of the methods that return an error may be called multiple times if there's a problem, so they should be idempotent.
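As a rough illustration for the Extending section above, here is a minimal sketch (not part of muxfys; all names hypothetical) of a RemoteAccessor backed by a plain local directory. Error handling, quota detection and efficiency are deliberately simplified; S3Accessor remains the reference implementation:

package localaccessor

import (
    "io"
    "os"
    "path/filepath"
    "strings"

    "github.com/VertebrateResequencing/muxfys"
)

// LocalAccessor treats a local directory as if it were a remote store.
type LocalAccessor struct {
    base string // directory standing in for the "remote"
}

// copyFile copies one local file to another path.
func copyFile(source, dest string) error {
    in, err := os.Open(source)
    if err != nil {
        return err
    }
    defer in.Close()
    out, err := os.Create(dest)
    if err != nil {
        return err
    }
    defer out.Close()
    _, err = io.Copy(out, in)
    return err
}

func (a *LocalAccessor) DownloadFile(source, dest string) error { return copyFile(source, dest) }

func (a *LocalAccessor) UploadFile(source, dest, contentType string) error {
    return copyFile(source, dest) // contentType is meaningless locally
}

func (a *LocalAccessor) UploadData(data io.Reader, dest string) error {
    out, err := os.Create(dest)
    if err != nil {
        return err
    }
    defer out.Close()
    _, err = io.Copy(out, data)
    return err
}

func (a *LocalAccessor) ListEntries(dir string) ([]muxfys.RemoteAttr, error) {
    entries, err := os.ReadDir(dir)
    if err != nil {
        return nil, err
    }
    var attrs []muxfys.RemoteAttr
    for _, e := range entries {
        info, err := e.Info()
        if err != nil {
            return nil, err
        }
        name := filepath.Join(dir, e.Name())
        if e.IsDir() {
            name += "/" // directories get a trailing slash, per RemoteAttr
        }
        attrs = append(attrs, muxfys.RemoteAttr{Name: name, Size: info.Size(), MTime: info.ModTime()})
    }
    return attrs, nil
}

func (a *LocalAccessor) OpenFile(path string, offset int64) (io.ReadCloser, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    if _, err = f.Seek(offset, io.SeekStart); err != nil {
        f.Close()
        return nil, err
    }
    return f, nil
}

func (a *LocalAccessor) Seek(path string, rc io.ReadCloser, offset int64) (io.ReadCloser, error) {
    rc.Close() // simplest possible approach: reopen at the new offset
    return a.OpenFile(path, offset)
}

func (a *LocalAccessor) CopyFile(source, dest string) error { return copyFile(source, dest) }

func (a *LocalAccessor) DeleteFile(path string) error { return os.Remove(path) }

func (a *LocalAccessor) DeleteIncompleteUpload(path string) error { return os.Remove(path) }

func (a *LocalAccessor) ErrorIsNotExists(err error) bool { return os.IsNotExist(err) }

func (a *LocalAccessor) ErrorIsNoQuota(err error) bool { return false } // not detectable here

func (a *LocalAccessor) Target() string { return "local://" + a.base }

func (a *LocalAccessor) RemotePath(relPath string) string {
    return filepath.Join(a.base, relPath)
}

func (a *LocalAccessor) LocalPath(baseDir, remotePath string) string {
    // remotePath already embeds a.base, which keeps caches for different
    // targets separate
    return filepath.Join(baseDir, "local", strings.TrimPrefix(remotePath, "/"))
}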

type RemoteAttr added in v0.0.2

type RemoteAttr struct {
	Name  string    // Name of the file, including its full path
	Size  int64     // Size of the file in bytes
	MTime time.Time // Time the file was last modified
	MD5   string    // MD5 checksum of the file (if known)
}

RemoteAttr struct describes the attributes of a remote file or directory. Directories should have their Name property suffixed with a forward slash.

type RemoteConfig added in v0.0.2

type RemoteConfig struct {
	// Accessor is the RemoteAccessor for your desired remote file system type.
	// Currently there is only one implemented choice: an S3Accessor. When you
	// make a new one of these (by calling NewS3Accessor()), you will provide
	// all the connection details for accessing your remote file system.
	Accessor RemoteAccessor

	// CacheData enables caching of remote files that you read locally on disk.
	// Writes will also be staged on local disk prior to upload.
	CacheData bool

	// CacheDir is the directory used to cache data if CacheData is true.
	// (muxfys will try to create this if it doesn't exist). If not supplied
	// when CacheData is true, muxfys will create a unique temporary directory
	// in MuxFys' CacheBase directory (these get automatically deleted on
	// Unmount() - specified CacheDirs do not). Defining this causes CacheData
	// to be treated as true.
	CacheDir string

	// Write enables write operations in the mount. Only set true if you know
	// you really need to write.
	Write bool
}

RemoteConfig struct is how you configure what you want to mount, and how you want to cache.

type S3Accessor added in v0.0.2

type S3Accessor struct {
	// contains filtered or unexported fields
}

S3Accessor implements the RemoteAccessor interface by embedding minio-go.

func NewS3Accessor added in v0.0.2

func NewS3Accessor(config *S3Config) (*S3Accessor, error)

NewS3Accessor creates an S3Accessor for interacting with S3-like object stores.

func (*S3Accessor) CopyFile added in v0.0.2

func (a *S3Accessor) CopyFile(source, dest string) error

CopyFile implements RemoteAccessor by deferring to minio.

func (*S3Accessor) DeleteFile added in v0.0.2

func (a *S3Accessor) DeleteFile(path string) error

DeleteFile implements RemoteAccessor by deferring to minio.

func (*S3Accessor) DeleteIncompleteUpload

func (a *S3Accessor) DeleteIncompleteUpload(path string) error

DeleteIncompleteUpload implements RemoteAccessor by deferring to minio.

func (*S3Accessor) DownloadFile added in v0.0.2

func (a *S3Accessor) DownloadFile(source, dest string) error

DownloadFile implements RemoteAccessor by deferring to minio.

func (*S3Accessor) ErrorIsNoQuota

func (a *S3Accessor) ErrorIsNoQuota(err error) bool

ErrorIsNoQuota implements RemoteAccessor by looking for the QuotaExceeded error code.

func (*S3Accessor) ErrorIsNotExists added in v0.0.2

func (a *S3Accessor) ErrorIsNotExists(err error) bool

ErrorIsNotExists implements RemoteAccessor by looking for the NoSuchKey error code.

func (*S3Accessor) ListEntries added in v0.0.2

func (a *S3Accessor) ListEntries(dir string) ([]RemoteAttr, error)

ListEntries implements RemoteAccessor by deferring to minio.

func (*S3Accessor) LocalPath added in v0.0.2

func (a *S3Accessor) LocalPath(baseDir, remotePath string) string

LocalPath implements RemoteAccessor by including the initially configured host and bucket in the return value.

func (*S3Accessor) OpenFile added in v0.0.2

func (a *S3Accessor) OpenFile(path string, offset int64) (io.ReadCloser, error)

OpenFile implements RemoteAccessor by deferring to minio.

func (*S3Accessor) RemotePath added in v0.0.2

func (a *S3Accessor) RemotePath(relPath string) string

RemotePath implements RemoteAccessor by using the initially configured base path.

func (*S3Accessor) Seek added in v0.0.2

func (a *S3Accessor) Seek(path string, rc io.ReadCloser, offset int64) (io.ReadCloser, error)

Seek implements RemoteAccessor by deferring to minio.

func (*S3Accessor) Target added in v0.0.2

func (a *S3Accessor) Target() string

Target implements RemoteAccessor by returning the initial target we were configured with.

func (*S3Accessor) UploadData

func (a *S3Accessor) UploadData(data io.Reader, dest string) error

UploadData implements RemoteAccessor by deferring to minio.

func (*S3Accessor) UploadFile added in v0.0.2

func (a *S3Accessor) UploadFile(source, dest, contentType string) error

UploadFile implements RemoteAccessor by deferring to minio.

type S3Config added in v0.0.2

type S3Config struct {
	// The full URL of your bucket and possible sub-path, eg.
	// https://cog.domain.com/bucket/subpath. For performance reasons, you
	// should specify the deepest subpath that holds all your files.
	Target string

	// Region is optional if you need to use a specific region.
	Region string

	// AccessKey and SecretKey are your access credentials, and could be empty
	// strings for access to a public bucket.
	AccessKey string
	SecretKey string
}

S3Config struct lets you provide details of the S3 bucket you wish to mount. If you have s3cmd, the AWS CLI or other such tools configured to work using config files and/or environment variables, you can make one of these with the S3ConfigFromEnvironment() function.

func S3ConfigFromEnvironment added in v0.0.2

func S3ConfigFromEnvironment(profile, path string) (*S3Config, error)

S3ConfigFromEnvironment makes an S3Config with Target, AccessKey, SecretKey and possibly Region filled in for you.

It determines these by looking primarily at the given profile section of ~/.s3cfg (s3cmd's config file). If profile is an empty string, it comes from $AWS_DEFAULT_PROFILE or $AWS_PROFILE or defaults to "default".

If ~/.s3cfg doesn't exist or isn't fully specified, missing values will be taken from the file pointed to by $AWS_SHARED_CREDENTIALS_FILE, or ~/.aws/credentials (in the AWS CLI format) if that is not set.

If this file also doesn't exist, ~/.awssecret (in the format used by s3fs) is used instead.

AccessKey and SecretKey values will always preferably come from $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY respectively, if those are set.

If no config file specifies host_base, the default domain used is s3.amazonaws.com. Region is set by the $AWS_DEFAULT_REGION environment variable, or if that is not set, by checking the file pointed to by $AWS_CONFIG_FILE (~/.aws/config if unset).

To allow the use of a single configuration file, users can create a non-standard file that specifies all relevant options: use_https, host_base, region, access_key (or aws_access_key_id) and secret_key (or aws_secret_access_key) (saved in any of the files except ~/.awssecret).
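For instance, such a single file might look like this (all values hypothetical):

[default]
use_https = True
host_base = cog.domain.com
region = us-east-1
access_key = mykey
secret_key = mysecret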

The path argument should at least be the bucket name, but ideally should also specify the deepest subpath that holds all the files that need to be accessed. Because reading from a public s3.amazonaws.com bucket requires no credentials, no error is raised on failure to find any values in the environment when profile is supplied as an empty string.
