horcrux

package module
v0.0.0-...-3124197
Published: Oct 18, 2017 License: BSD-3-Clause, BSD-3-Clause Imports: 2 Imported by: 0

README

Horcrux - On Demand, Version controlled access to your Data

About Horcrux

Docker containers give developers the agility and flexibility to replicate the production/test setup in their development environment. Developers can now develop features, run unit tests, and fix issues in their local setup before pushing changes to the test/production environment. Since containers are stateless, they can be moved anywhere easily, say within a datacenter or into a cloud. But in most cases, containers have to access or modify data that traditionally lives in centralized storage. For example, the popular LEMP container stack needs to access a MySQL database. To give containers access to such data, Docker developed the concept of volume plugins, which can be associated with a container when it is created.

Now the container can move to a different location as long as the volume plugin works there as well. This solves the problem for production/test setups, but if developers have to access the centralized data, that again restricts their flexibility. At the same time, with the ever-increasing size of data, it is not possible to give each developer a separate copy of the data (say, a database); that would be prohibitively expensive. To solve this, we give you "Horcrux"...

What does Horcrux provide?

  • Horcrux provides you (the developer) a local view of the whole centralized data (database etc.), so you can develop/test your application without worrying about messing up your precious central repository.
  • The centralized repository can be located anywhere: on local servers (servers that provide scp access, Minio servers, etc.) or in the cloud (Amazon AWS S3, Microsoft Azure, Google Cloud, etc.), so you are free to access it from anywhere (within your office, at home, in-flight (just kidding)...).
  • The data volume is visible as a local FUSE filesystem in the developer/test environment.
  • When the data is accessed by the application (containers), only the particular chunk of data needed is fetched from the remote repository on demand and stored locally (in the cache). The whole access is transparent to the application/container.
  • Since only the portion of data accessed is retrieved and stored locally, you don't have to buy terabytes of storage for each developer's setup or test machine.
  • When the working data set is accessed the next time around, it is served from the local cache, blazingly fast (almost as fast as the local file system/storage :))
  • The local view provided by Horcrux is a read/write view, so the application/container can modify the data locally.
  • You can view, at any time, what has changed.
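
The on-demand fetch and local cache described above can be sketched roughly as follows. This is only an illustration under assumptions: fixed-size chunks, an in-memory map standing in for the on-disk cache, and hypothetical names (chunkSpan, chunkCache, fetch) that are not taken from the Horcrux source.

```go
package main

import "fmt"

// chunkSpan returns the first and last chunk index touched by a read of
// `length` bytes starting at `offset`, assuming fixed-size chunks.
// Hypothetical helper, not Horcrux's actual code.
func chunkSpan(offset, length, chunkSize int64) (first, last int64) {
	first = offset / chunkSize
	if length <= 0 {
		return first, first
	}
	last = (offset + length - 1) / chunkSize
	return first, last
}

// chunkCache keeps chunks that were already fetched, so repeated reads
// of the working set never go back to the remote repository.
type chunkCache struct {
	chunks  map[int64][]byte
	fetches int                    // number of remote fetches performed
	fetch   func(idx int64) []byte // stand-in for the remote access method
}

func newChunkCache(fetch func(idx int64) []byte) *chunkCache {
	return &chunkCache{chunks: make(map[int64][]byte), fetch: fetch}
}

// get returns the chunk from the cache, fetching it on a miss.
func (c *chunkCache) get(idx int64) []byte {
	if data, ok := c.chunks[idx]; ok {
		return data
	}
	c.fetches++
	data := c.fetch(idx)
	c.chunks[idx] = data
	return data
}

func main() {
	const chunkSize = 64 << 20 // 64M, the documented default chunk size

	// A 4 KiB read that straddles the boundary between chunk 0 and chunk 1.
	first, last := chunkSpan(chunkSize-1024, 4096, chunkSize)
	fmt.Println(first, last) // prints "0 1"

	cache := newChunkCache(func(idx int64) []byte { return []byte{byte(idx)} })
	for pass := 0; pass < 2; pass++ {
		for idx := first; idx <= last; idx++ {
			cache.get(idx)
		}
	}
	fmt.Println(cache.fetches) // prints "2": the second pass is served locally
}
```

Only the chunks a read actually touches are fetched, which is why the local storage footprint tracks the working set rather than the full data size.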

In future versions, we will add git-like capabilities, so you can:

  • Commit the local changes and push them to the centralized repo with a comment (only the modified portion is pushed).
  • Browse through changes in the remote repository.
  • Access (mount) any version of remote data locally (roll back/forward) to develop/troubleshoot issues with ease.
  • Best of all, you don't have to do any evil spell (just a few good ones).

We would like to call it a git for DB (though technically it is not the same :)), since it will provide all git-compatible commands (and may even provide a git extension), so you can do with your data pretty much everything you already do with git for your source code.

Getting started

Steps Overview:

  1. Install Horcrux

  2. Generate a Horcrux version for your central data

  3. Place the Horcrux version of your data anywhere you like (local servers within your LAN, AWS S3 etc). We suggest putting it in more than one place. If you don't know it yet, check out Minio, a cool object store server project. It can be used to store Horcrux as well.

  4. In the development or test environment: create Docker volumes using the Horcrux volume driver, specifying where the remote data is stored

  5. Now the volumes can be used within your containers as data volumes.

To generate the Horcrux version of the data:

Step 1: Install Horcrux

Horcrux consists of two binaries, horcrux-cli and horcrux-dv

  • horcrux-cli: Used to generate Horcrux version of the data.
  • horcrux-dv: A volume driver plugin for Docker.

Download the latest binary copy of horcrux-cli from:

For Linux
For OSX
NOTE: Need help to generate binaries for OSX
From Source Code:
Step 2: Generate Horcrux version of the data [Reducto Spell]
horcrux-cli generate
NAME:
   ./horcrux-cli generate - [options] <name> <in-dir> <out-dir>

USAGE:
   ./horcrux-cli generate [command options] [arguments...]

OPTIONS:
   --chunksize, -s "64M"	Chunk Size
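
As a rough illustration of how a --chunksize value such as "64M" could be turned into a byte count, here is a minimal sketch. The accepted suffixes (K, M, G) and the function name parseChunkSize are assumptions; the documentation only shows "64M" and a 1M minimum (CHUNKSIZE_MIN), and the real horcrux-cli parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chunkSizeMin mirrors the documented CHUNKSIZE_MIN constant (1M).
const chunkSizeMin = 1 << 20

// parseChunkSize converts strings like "64M" into a size in bytes.
// Hypothetical helper: the K/M/G suffix set is an assumption.
func parseChunkSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty chunk size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'K', 'k':
		mult = 1 << 10
	case 'M', 'm':
		mult = 1 << 20
	case 'G', 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the unit suffix
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid chunk size %q: %v", s, err)
	}
	size := n * mult
	if size < chunkSizeMin {
		return 0, fmt.Errorf("chunk size %d is below the minimum %d", size, chunkSizeMin)
	}
	return size, nil
}

func main() {
	size, err := parseChunkSize("64M")
	fmt.Println(size, err) // prints "67108864 <nil>"

	_, err = parseChunkSize("512K")
	fmt.Println(err != nil) // prints "true": below the 1M minimum
}
```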

Let's consider an example: a MySQL database stored on the database server "kural". We name the database "AMCC" (some meaningful name).

  • Original Database Server: kural
  • Name of the database: AMCC
  • Location of MySQL files: /var/lib/mysql
    For this example we use all files including log files :)


[Optional] Validate the generated Horcrux (in the Database server):
  • Server used to validate: kural
  • Mount point used: /mnt/horcrux
  • Horcrux location: /opt/horcrux
  • Access Method: Local "cp" (a.k.a Locket)
    NOTE: full path after "cp://", so three slashes in total
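
The three-slash rule follows from ordinary URL syntax: with "cp://" the host part is empty and the third slash starts the absolute path. A small sketch using Go's standard net/url package shows the difference; the helper name localPath is hypothetical, and how Horcrux itself parses the access string is not shown in this document.

```go
package main

import (
	"fmt"
	"net/url"
)

// localPath extracts the absolute path from a "cp://" access string.
// Hypothetical helper for illustration only.
func localPath(access string) (string, error) {
	u, err := url.Parse(access)
	if err != nil {
		return "", err
	}
	if u.Scheme != "cp" {
		return "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	if u.Host != "" {
		// With only two slashes, the first path element is read as a host.
		return "", fmt.Errorf("got host %q; use three slashes, e.g. cp:///opt/horcrux", u.Host)
	}
	return u.Path, nil
}

func main() {
	fmt.Println(localPath("cp:///opt/horcrux")) // prints "/opt/horcrux <nil>"

	_, err := localPath("cp://opt/horcrux")
	fmt.Println(err != nil) // prints "true": "opt" was parsed as the host
}
```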


Distribute Horcrux to remote repositories
  • For AWS S3, you can use "aws s3 sync" on /opt/horcrux-amcc
  • You can also replicate the Horcrux to a local server and give SSH access to your developers

That's it... now Horcrux can be accessed by multiple developers simultaneously, and with minimal storage in their development/test machines.

Steps to work with the Horcrux generated above

If a developer wants to access the Horcrux, they need to cast the "Revelo" spell as follows:

Step 1: Install Horcrux
  • Please refer to Install section above
Step 2: Install and Configure FUSE
  • FUSE should be installed by default in most distros (Ubuntu/Fedora/CentOS/RHEL). If not, please check your distro documentation on how to install the FUSE packages.
  • Edit /etc/fuse.conf and add the "user_allow_other" line
    user_allow_other
    
    NOTE: Right now we use the "Allow Other" FUSE mount option so the mount point can be accessed by any user on the local system. We need this so the apps within containers can access the bind mount point. It may not be a problem right now, since developers' machines are mostly used by one user (or a few trusted users). This will be addressed in a future release.
  • Make /etc/fuse.conf readable
    # chmod a+r /etc/fuse.conf
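
To double-check the FUSE configuration before starting the plugin, a small check along these lines may help. It is a sketch, not part of Horcrux: the function name allowsOther is made up, and the sample text stands in for the contents of /etc/fuse.conf.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// allowsOther reports whether the given fuse.conf contents contain an
// active (uncommented) "user_allow_other" line.
// Hypothetical helper for illustration only.
func allowsOther(conf string) bool {
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) == "user_allow_other" {
			return true
		}
	}
	return false
}

func main() {
	// Sample contents; the option is still commented out here.
	before := "# Allow non-root users to specify the allow_other mount option.\n#user_allow_other\n"
	after := before + "user_allow_other\n"

	fmt.Println(allowsOther(before)) // prints "false"
	fmt.Println(allowsOther(after))  // prints "true"
}
```

To check the real file, the string returned by os.ReadFile("/etc/fuse.conf") can be passed to the same function.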
    
Step 3: Start horcrux-dv volume plugin
# horcrux-dv >& /var/log/horcrux-dv.log &
Step 4: Create Docker volumes using Horcrux Volume driver
  • Docker volume "v1" that uses SCP access from the remote server kural

    - Here the Docker volume name is "v1"
    
    - "-d" specifies the Docker volume driver as horcrux
    
    - We use "-o" to pass options to the horcrux volume driver
    
    - First option: "--name=AMCC"; here we give the same name that was used in the generate step
    
    - Second option: "--access=scp://muthu@kural:/opt/horcrux-mysql-amcc" specifies the access method as SCP and the remote location as "kural:/opt/horcrux-mysql-amcc"
    
    
  • Docker volume "v2" that uses AWS S3 as remote location

    - Here the bucket name is "muthu.horcrux"
    
    - Region is "us-west-1"
    
    NOTE: AWS credentials are in ~/.aws/credentials


Step 5: Create Docker containers with volume v1
  • Create container test-scp using volume v1

  • Inspect container test-scp for mount points

Container test-scp can now access all the MySQL files inside the /data directory

That's pretty much it...

Happy hacking!!

Muthukumar. R - m u t h u r AT g m a i l DOT c o m

ACKNOWLEDGEMENTS

Many thanks to:

Known Issues:

  • This is version 00.03-rc, so that pretty much explains it (but not bad at all, give it a shot and let me know)
  • Only tested on Linux systems (latest version of Fedora, Ubuntu, CentOS)
  • Some local FS calls (e.g. Rename) are not there yet - just a matter of adding them, let me know if it's needed badly :)
  • Open flags like O_EXCL are b0rked (not hard to fix though, next version)
  • With scp access, volumes are not consistently visible inside the container. If you experience this, you can work around it by creating a temp container with that volume and leaving it running while you create/manage other containers for that volume.
  • Cache is not cleaned up after "docker volume rm". If it grows big, please clean it up manually for now.

Need Help:

  • Use it and let me know if it helps you..
  • Help in supporting other access methods (Google Cloud, Azure, etc..)
  • More testing and bug reports (see reporting issues)
  • Mac support
  • E m a i l m e : m u t h u r AT g m a i l DOT c o m

Release Notes:

  • Version 00.02 (02/06/2016)

    • Support for Docker 1.10+ volume plugin changes (Get, List)
  • Version 00.01 (02/05/2016)

    • Supported access methods - CP, SCP, MINIO, AWS S3
    • Tested only on Linux systems (Fedora, Ubuntu, CentOS)
    • Versioning is not yet there (coming soon...)

Documentation

Overview

Global definitions

Index

Constants

const (
	VERMAJOR = "00"
	VERMINOR = "10"
	VEREXTRA = ""
	VERSION  = VERMAJOR + "." + VERMINOR + VEREXTRA

	MINVER   = 1
	MAXVER   = 1000
	STARTVER = MINVER

	CHUNKSIZE_MIN         = (1 << 20)  // 1M
	CHUNKSIZE_DEFAULT     = (64 << 20) // 64M
	CHUNKSIZE_DEFAULT_STR = "64M"
)
const (
	CHUNK_TYPE_STATIC = 1 + iota
	CHUNK_TYPE_ROLLSUM
)
const LOGLEVEL = log.InfoLevel
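
For orientation, the iota expression above gives CHUNK_TYPE_STATIC the value 1 and CHUNK_TYPE_ROLLSUM the value 2. The sketch below also shows how many fixed-size chunks a file would need under static chunking; the helper numChunks and the rounding-up rule are assumptions, since the package does not show how it fills the NumChunks field.

```go
package main

import "fmt"

// Copied from the constants listed above.
const (
	CHUNK_TYPE_STATIC = 1 + iota
	CHUNK_TYPE_ROLLSUM
)

const CHUNKSIZE_DEFAULT = 64 << 20 // 64M

// numChunks returns how many fixed-size chunks are needed to hold
// `size` bytes, rounding up. Hypothetical helper, not from the source.
func numChunks(size, chunkSize int64) int64 {
	if size <= 0 {
		return 0
	}
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	fmt.Println(CHUNK_TYPE_STATIC, CHUNK_TYPE_ROLLSUM) // prints "1 2"

	// A 200M file with the default 64M chunk size needs 4 chunks.
	fmt.Println(numChunks(200<<20, CHUNKSIZE_DEFAULT)) // prints "4"
}
```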

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Version   string `json:"Version"`
	ChunkType int    `json:"Chunk Type"`
	ChunkSize int    `json:"Chunk Size"`
}

type Entry

type Entry struct {
	Name      string `json:"Name"`
	Prefix    string `json:"Prefix"`
	IsDir     bool   `json:"IsDir"`
	Stat      Stat   `json:"Stat"`
	NumChunks int64  `json:"Number of Chunks"`
}

type Meta

type Meta struct {
	Config   Config  `json:"Config"`
	CurrVer  string  `json:"Current Version"`
	NumFiles int     `json:"Num Files"`
	Entries  []Entry `json:"Entry List"`
}

type Stat

type Stat struct {
	Mode os.FileMode `json:"Mode"`
	Size int64       `json:"Size"`
	Uid  uint32      `json:"Uid"` //XXX Get from running pid?
	Gid  uint32      `json:"Gid"` //XXX Get from running pid?

}

Directories

Path Synopsis

  • Access interface - Provides back end chunk read/write interface
  • cp: Implements local host cp access - mostly for testing and verifying before uploading to remote location
  • s3: Implements AWS S3 access interface
  • scp: Many thanks to (1): https://blogs.oracle.com/janp/entry/how_the_scp_protocol_works and (2): https://gist.github.com/jedy/3357393 for pointing to (1) TODO: - VErify ctrl message and use its len ???
  • bazil-fuse
  • fuse: Package fuse enables writing FUSE file systems on Linux, OS X, and FreeBSD.
  • fuse/examples/clockfs: Clockfs implements a file system with the current time in a file.
  • fuse/examples/hellofs: Hellofs implements a simple "hello world" file system.
  • fuse/fs/bench: Package bench contains benchmarks.
  • fuse/syscallx: Package syscallx provides wrappers that make syscalls on various platforms more interoperable.
  • TODO: - SetAttr - Rename - Fix "du" ? - clean-up cache dirs for removed dirs/files
  • TOCHECK - Why scp doesnt bind mount first time
