copyondemand

v0.0.0-...-4384058
Published: Dec 4, 2020 License: Apache-2.0 Imports: 20 Imported by: 0

README

What is it?

copy-on-demand copies a file from one place to another, while exposing a block device that allows data to be read and written to the target file during the copy.

This is useful in scenarios where the source file is high latency and/or limited bandwidth and it is critical that the file be usable during the copy operation (e.g. disk images living on a remote NFS server, and you'd like to run a virtual machine based on that image while the disk is copying). In the NFS scenario this also allows for offloading disk write operations to the target machines, rather than potentially overloading the NFS server.

Miscellaneous other features:

  • Driver is aware of sparse source files, so it does not copy unnecessary zeros
  • Configurable background copy speed, with support for all copy-on-demand processes sharing the same bandwidth limit
  • Copy operations are optionally resumable - even in the case of driver crashes

How?

This project utilizes the "nbd" kernel module. This allows a user space application to service all the reads/writes of a block device (in the form of /dev/nbd[0-127]).

The block driver splits the source file into chunks (currently 4k), and for each chunk:

  • Read
    • If the chunk is already on the target, pass through the read to the backing file on the target
    • If the chunk is not on the target, read the chunk from the source, serve the bytes back to the user, and queue the chunk to be flushed to disk
  • Write
    • Writes are always sent to the target disk
      • [Bookkeeping is necessary to ensure that writes that aren't aligned with our block size are reconciled; this is described in detail in the "Developer docs" section]

Notice that every byte on the source is read exactly once, and no writes are ever sent back to the (presumably slow) source file. This reduces the load on, and improves the performance of, a networked filesystem for I/O-intensive workloads.
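The chunking described above comes down to integer division on byte offsets. The following is a minimal sketch, not the driver's own code: the helper name touchedBlocks is hypothetical, and only the 4k block size is taken from the text.

```go
package main

import "fmt"

// blockSize mirrors the 4k chunk size described above.
const blockSize = 4096

// touchedBlocks (hypothetical helper) returns the first and last block index,
// both inclusive, covered by a request of length bytes starting at byte
// offset off. length must be greater than zero.
func touchedBlocks(off, length uint64) (first, last uint64) {
	first = off / blockSize
	last = (off + length - 1) / blockSize
	return first, last
}

func main() {
	// Bytes 4000..4199 straddle the boundary between blocks 0 and 1.
	first, last := touchedBlocks(4000, 200)
	fmt.Println(first, last)
}
```

A request that is not aligned to the block size touches partial blocks at one or both ends, which is why the bookkeeping mentioned in the Write bullet is needed.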

Prerequisites

Load nbd and set the maximum number of nbd devices (/dev/nbd[0-127]):

/sbin/modprobe nbd nbds_max=128

It's highly recommended that you use a Linux kernel newer than 4.4. However, the code includes mitigations that allow copy-on-demand to run on 4.4.

CLI Usage:

./copy-on-demand [-v] [-vv] /dev/nbdX [source file] [backing file]

This causes /dev/nbdX to immediately appear as if it has the full content and size of [source file]. In the background the content will be transferred from [source file] to [backing file].

Any reads/writes to /dev/nbdX will first pull the blocks from [source file] (if necessary), and then perform the requested read/write operation.

Upon completion of the background transfer, all reads/writes are passed through to the backing file.

Building

Go 1.11+ is required for building.

Licensing

This software is licensed under the Apache Software License, version 2.0, as described in the LICENSE file, with the notable exception of the modified vendored copy of buse-go, which is MIT licensed, as described in LICENSE.buse.

Developer docs

Basic flow

Fundamentally copy-on-demand just copies a file from one location to another. It is agnostic to where these files are, but assumes in many places that the source file is very slow/far away (e.g. over NFS) and that the target file is relatively fast/close by (e.g. a spinning hard disk attached to the machine we're running on). During this copy, the target file appears to be fully available through an nbd block device (/dev/nbdX), even though some or all of the bytes may only be on the source. Blocks that are only available on the source will be pulled "on demand" and served back to the requesting process.

There are two main threads. The "dynamic thread" serves reads/writes coming from the user. The disk write thread flushes blocks read from the source disk to the target disk. The dynamic thread is pooled, so many dynamic reads/writes can be happening at once, whereas the disk write queue is always processed by a single thread.

It is guaranteed that no two dynamic actions will affect overlapping blocks at the same time. This is so we don't have to worry at any layer about serializing reads/writes affecting the same blocks.

Concepts

As a supplement to these definitions, most of these concepts are visualized here.

  • Block
    • Source and target are divided into arbitrarily sized blocks (const.BlockSize)
  • Block map
    • Stores the state of blocks
    • Blocks can be either "synced" or "unsynced"
    • Synced indicates that it is safe to read/write directly on the target
    • Unsynced indicates that the block only exists on the source
  • Dirty block map
    • Stores blocks that have been partially written on the target, and the ranges of the block that have been written
  • Range lock
    • Locks a range of blocks
    • Any thread reading or writing to the target disk is responsible for holding a range lock for any blocks it operates on
    • State updates of either the block map or dirty block map require a range lock for any blocks that may be updated
  • Disk flush queue
    • Contains blocks that have been read from the source, but have not yet been flushed to the target disk
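To make the block map concept concrete, here is a minimal sketch that assumes one boolean per block guarded by a read/write mutex. The type blockMap and its methods are hypothetical names for illustration; the package's real block map is unexported and may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4096 // matches the documented BlockSize

// blockMap is a hypothetical, simplified block map: one flag per block.
type blockMap struct {
	mu     sync.RWMutex
	synced []bool
	count  uint64 // number of blocks currently marked synced
}

// newBlockMap sizes the map for a file of fileSize bytes, rounding up so a
// trailing partial block still gets an entry.
func newBlockMap(fileSize uint64) *blockMap {
	n := (fileSize + blockSize - 1) / blockSize
	return &blockMap{synced: make([]bool, n)}
}

// markSynced records that block b is safe to read and write on the target.
func (m *blockMap) markSynced(b uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.synced[b] {
		m.synced[b] = true
		m.count++
	}
}

// isSynced reports whether block b is marked synced.
func (m *blockMap) isSynced(b uint64) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.synced[b]
}

// fullySynced reports whether every block is present on the target.
func (m *blockMap) fullySynced() bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.count == uint64(len(m.synced))
}

func main() {
	m := newBlockMap(10000) // 10000 bytes need 3 blocks of 4096 bytes
	m.markSynced(0)
	m.markSynced(1)
	fmt.Println(m.isSynced(1), m.isSynced(2), m.fullySynced())
	m.markSynced(2)
	fmt.Println(m.fullySynced())
}
```

The sketch locks the whole map for simplicity; the concepts above describe a range lock so that operations on disjoint blocks do not contend.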

Entry point(s)

CLI

  • main is in ./cmd/copyondemand.go and is responsible for:
    • Initializing the buse driver
    • Signal handling (^C)
    • (Optionally) Polling the health of the CoD driver to display to the user, via the SampleRate function
  • Requests coming from user space are sent (through the buse driver) to ./file_backed_device.go ReadAt and WriteAt
  • Disk flushes + dirty block reconciliation are performed in write_worker.go ProcessQueue

Use as a library

  • copy-on-demand can also be used as a library rather than a pre-built CLI tool
  • This is useful when the "source" isn't a traditional file (e.g. perhaps a file stored in chunks on an object store)
  • A basic library example can be found in ./examples/

Read walkthrough

  • Read request comes in from user
  • Determine if all affected blocks are synced
    • If yes, pass the read through to the target and return
  • Read any blocks from source that appeared to be unsynced
  • Obtain a range lock for any affected blocks
  • If any blocks are still unsynced or dirty (i.e. they weren't fixed while we were locked)
    • Read the blocks from source
    • Reconcile any dirty blocks with their target state
    • Enqueue any blocks read from source to the disk flush queue
  • Read any synced blocks
  • Merge synced blocks, with blocks dynamically pulled from source
    • (In practice these "merge" scenarios should be rare, disk boots generally result in reading and writing to ranges that are either fully synced or fully unsynced)
  • Return read bytes to caller
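The read walkthrough above can be sketched in a few lines if locking, dirty blocks, and de-duplication of queued blocks are left out. Everything here is an illustrative assumption: the device type, its fields, and the tiny 4-byte block size exist only to keep the demo readable, and in-memory byte slices stand in for the source and target files.

```go
package main

import "fmt"

const blockSize = 4 // tiny block size so the demo data stays readable

// device is a hypothetical in-memory stand-in for the driver state.
type device struct {
	source, target []byte
	synced         []bool   // one flag per block
	flushQueue     []uint64 // blocks read from source, awaiting a flush
}

// readAt fills p with the bytes at offset off. Synced blocks come from the
// target; unsynced blocks come from the source and are queued for flushing.
func (d *device) readAt(p []byte, off uint64) {
	if len(p) == 0 {
		return
	}
	end := off + uint64(len(p)) // exclusive
	for b := off / blockSize; b <= (end-1)/blockSize; b++ {
		from := d.target
		if !d.synced[b] {
			from = d.source
			d.flushQueue = append(d.flushQueue, b)
		}
		// Clamp the block's byte range to the requested range.
		lo, hi := b*blockSize, (b+1)*blockSize
		if off > lo {
			lo = off
		}
		if end < hi {
			hi = end
		}
		copy(p[lo-off:hi-off], from[lo:hi])
	}
}

func main() {
	d := &device{
		source: []byte("AAAABBBBCCCC"),
		target: []byte("....bbbb...."),
		synced: []bool{false, true, false},
	}
	p := make([]byte, 6)
	d.readAt(p, 2)
	fmt.Println(string(p), d.flushQueue) // AAbbbb [0]
}
```

The result merges two bytes pulled from the unsynced block 0 with four bytes from the synced block 1, which is the "merge" case the walkthrough calls rare in practice.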

Write walkthrough

  • Write request comes in from user
  • Obtain a range lock for any affected blocks
  • For any blocks that are fully overwritten by this operation, convert the blocks to "synced" in the block map
  • For any partially written blocks, calculate the range of the block that will be overwritten, and submit it to the dirty block map
  • Enqueue any dirty blocks to the disk flush queue
  • Write to target disk
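The split between fully overwritten and partially written blocks in the write walkthrough can be illustrated as follows. This is a sketch under assumptions: classifyWrite and dirtyRange are hypothetical names, and the real dirty block map may record ranges differently.

```go
package main

import "fmt"

const blockSize = 4096

// dirtyRange is a hypothetical record of the bytes written inside one block.
type dirtyRange struct {
	Block      uint64
	Start, End uint64 // byte offsets within the block, End exclusive
}

// classifyWrite splits a write of length bytes at offset off into blocks that
// are fully overwritten (and can be marked synced) and blocks that are only
// partially written (and must be tracked as dirty).
func classifyWrite(off, length uint64) (synced []uint64, dirty []dirtyRange) {
	if length == 0 {
		return nil, nil
	}
	end := off + length // exclusive
	for b := off / blockSize; b <= (end-1)/blockSize; b++ {
		blockStart := b * blockSize
		blockEnd := blockStart + blockSize
		lo, hi := blockStart, blockEnd
		if off > lo {
			lo = off
		}
		if end < hi {
			hi = end
		}
		if lo == blockStart && hi == blockEnd {
			synced = append(synced, b)
		} else {
			dirty = append(dirty, dirtyRange{Block: b, Start: lo - blockStart, End: hi - blockStart})
		}
	}
	return synced, dirty
}

func main() {
	// A write of 4296 bytes at offset 4000 covers the tail of block 0,
	// all of block 1, and the first 104 bytes of block 2.
	synced, dirty := classifyWrite(4000, 4296)
	fmt.Println(synced, dirty)
}
```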

Disk flush queue walkthrough

  • Dequeue item
  • If the item contains data
    • Lock the range of affected blocks
    • If any blocks have become "synced" (i.e. by a subsequent overwrite), discard them
    • For any dirty blocks, read from the target disk => apply the writes in memory
    • Write remaining and updated blocks to target
  • If item contains a dirty block id
    • Ensure the block hasn't become synced before the dequeue
    • Read block from source
    • Obtain a range lock for the block id
    • Read block from target
    • Reconcile block in memory
    • Flush to target disk
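The "reconcile block in memory" step can be pictured as overlaying the bytes already written on the target onto a fresh copy of the block from the source. The sketch below assumes that reading of the walkthrough; reconcile and byteRange are hypothetical names.

```go
package main

import "fmt"

// byteRange is a hypothetical written range within a block, End exclusive.
type byteRange struct{ Start, End int }

// reconcile builds the final content of a dirty block: bytes inside a written
// range come from the target copy, everything else comes from the source copy.
// source and target must have the same length.
func reconcile(source, target []byte, written []byteRange) []byte {
	out := make([]byte, len(source))
	copy(out, source)
	for _, r := range written {
		copy(out[r.Start:r.End], target[r.Start:r.End])
	}
	return out
}

func main() {
	source := []byte("ssssssss")
	target := []byte("ttTTtttt") // only bytes 2..3 were written by the user
	out := reconcile(source, target, []byteRange{{Start: 2, End: 4}})
	fmt.Println(string(out)) // ssTTssss
}
```

Once the reconciled block is flushed, the block holds the user's writes plus the source data everywhere else, so it can be treated as synced.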

Documentation

Constants

const BlockSize = 4096 // 4k

BlockSize is the chunk size (in bytes) that is transferred between the source and backing files

Variables

This section is empty.

Functions

func BytesToHumanReadable

func BytesToHumanReadable(b uint64) string

BytesToHumanReadable converts a raw byte count to a human readable value (e.g. 1024 becomes '1KB')

func FindMaxProcessesCounts

func FindMaxProcessesCounts(reader FileSystem, log *logrus.Logger, progressFiles []string) (int, error)

FindMaxProcessesCounts returns the largest integer found from searching the files found at the location of each progress file path

Types

type BlockRange

type BlockRange struct {
	Start uint64
	End   uint64
}

BlockRange defines a contiguous, inclusive range of blocks
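Because both ends are inclusive, length and overlap calculations differ slightly from the usual half-open convention. The sketch below redeclares the struct locally so it is self-contained; the helpers count and overlaps are hypothetical and not part of the package.

```go
package main

import "fmt"

// BlockRange is redeclared here to mirror the documented struct.
type BlockRange struct {
	Start uint64
	End   uint64
}

// count (hypothetical helper) returns how many blocks the range covers.
// With inclusive bounds this is End - Start + 1.
func count(r BlockRange) uint64 {
	return r.End - r.Start + 1
}

// overlaps (hypothetical helper) reports whether two inclusive ranges share
// at least one block.
func overlaps(a, b BlockRange) bool {
	return a.Start <= b.End && b.Start <= a.End
}

func main() {
	a := BlockRange{Start: 0, End: 3}
	b := BlockRange{Start: 3, End: 5}
	c := BlockRange{Start: 4, End: 5}
	fmt.Println(count(b), overlaps(a, b), overlaps(a, c)) // 3 true false
}
```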

type DiskActionType

type DiskActionType int

DiskActionType is an enum for read/write actions happening on a disk

const (
	// BackingWrite is a write to the backing disk
	BackingWrite DiskActionType = 0
	// BackingRead is a read to the backing disk
	BackingRead DiskActionType = 1
	// SourceRead is a read to the source disk
	SourceRead DiskActionType = 2
	// SourceWrite is a write to the source disk
	SourceWrite DiskActionType = 3
)

type DriverConfig

type DriverConfig struct {
	Source               *SyncSource
	Backing              *SyncFile
	NbdFileName          string
	ProcessFiles         []string
	Fs                   FileSystem
	Log                  *logrus.Logger
	EnableBackgroundSync bool
	Resumable            bool
}

DriverConfig contains the data needed to construct a driver. If you are using a traditional filesystem (i.e. not overwriting the source or backing file interfaces) you can use NewFileBackedDevice to construct a driver based on on-disk file names.

type File

type File interface {
	ReadOnlyFile
	io.Closer
	io.Reader
	io.Seeker
	io.Writer
	io.WriterAt
	Stat() (os.FileInfo, error)
	Sync() error
	Truncate(size int64) error
}

File provides an interface for the native file struct

type FileBackedDevice

type FileBackedDevice struct {
	Source      *SyncSource
	BackingFile *SyncFile

	SetSynced context.CancelFunc
	// contains filtered or unexported fields
}

FileBackedDevice is the main BUSE driver object. The BUSE driver calls functions on this struct when associated read/write operations are sent from the kernel.

func New

func New(config *DriverConfig) (*FileBackedDevice, error)

New constructs a FileBackedDevice with potentially overwritten source and backing file interfaces. If you are using a traditional filesystem (i.e. not overwriting the source or backing file interfaces) you can use NewFileBackedDevice to construct a driver based on on-disk file names.

func NewFileBackedDevice

func NewFileBackedDevice(
	sourceFileName string,
	backingFileName string,
	nbdFileName string,
	processFiles []string,
	fs FileSystem,
	log *logrus.Logger,
	enableBackgroundSync bool,
	resumable bool,
) (*FileBackedDevice, error)

NewFileBackedDevice constructs a FileBackedDevice based on a source file

func (*FileBackedDevice) CheckSynced

func (d *FileBackedDevice) CheckSynced()

CheckSynced checks if the backing file is fully synced, and if so cancels the sync cancellation context

func (*FileBackedDevice) Connect

func (d *FileBackedDevice) Connect() error

Connect is a blocking function that initiates the NBD device driver

func (*FileBackedDevice) Disconnect

func (d *FileBackedDevice) Disconnect()

Disconnect terminates the NBD driver connection. This call blocks while write queues are flushing, and the intent log is finalizing. Ending the program without calling disconnect will be treated as a crash for the purposes of resuming.

func (*FileBackedDevice) DriverDisconnect

func (d *FileBackedDevice) DriverDisconnect()

DriverDisconnect is called by the BUSE driver in response to disconnect requests from the kernel.

func (*FileBackedDevice) EnqueueAllDirtyBlocks

func (d *FileBackedDevice) EnqueueAllDirtyBlocks()

EnqueueAllDirtyBlocks adds all dirty blocks to the write queue

func (*FileBackedDevice) Finalize

func (d *FileBackedDevice) Finalize()

Finalize runs any necessary cleanup tasks

func (*FileBackedDevice) Flush

func (d *FileBackedDevice) Flush() error

Flush is called by the BUSE driver in response to flush requests from the kernel.

func (*FileBackedDevice) GetBackgroundCopyRate

func (d *FileBackedDevice) GetBackgroundCopyRate() uint64

GetBackgroundCopyRate gets the current background copy rate

func (*FileBackedDevice) IsFullySynced

func (d *FileBackedDevice) IsFullySynced() bool

IsFullySynced returns true if the backing file is ready to use directly

func (*FileBackedDevice) ProcessQueues

func (d *FileBackedDevice) ProcessQueues(ctx context.Context, waitGroup *sync.WaitGroup)

ProcessQueues is a non-blocking function that begins required background processing for this FileBackedDevice. Backgrounded processes increment the provided WaitGroup, and self terminate when cancellation is signalled on the provided Context.
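The contract described here (return immediately, register with the WaitGroup, stop on context cancellation) is a common Go pattern. The sketch below shows the pattern in isolation; startWorker and runAndStop are hypothetical stand-ins, not the package's implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// startWorker (hypothetical) returns immediately after launching a goroutine.
// The goroutine registers with wg and exits once ctx is cancelled.
func startWorker(ctx context.Context, wg *sync.WaitGroup, stopped *bool) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done() // a real worker would also select on its work queue here
		*stopped = true
	}()
}

// runAndStop shows the caller's side: start, cancel, then wait for shutdown.
func runAndStop() bool {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	stopped := false
	startWorker(ctx, &wg, &stopped)
	cancel()  // signal cancellation
	wg.Wait() // blocks until the worker has called Done
	return stopped
}

func main() {
	fmt.Println(runAndStop()) // true
}
```

Reading stopped after wg.Wait() is safe because the worker's Done call happens before Wait returns.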

func (*FileBackedDevice) ReadAt

func (d *FileBackedDevice) ReadAt(p []byte, off uint64) error

ReadAt is called by the BUSE driver in response to read requests from the kernel.

  • If there are no unsynced blocks in the range, pass through the read to the backing file
  • If there are unsynced blocks in the range
    • Do a continuous read from source from the first unsynced block => last unsynced block
    • Read any synced blocks from the backing file
    • If any unsynced blocks are dirty, reconcile them using the dirty block map
    • Enqueue the reconciled buffer to be flushed to disk
    • Return requested data to the user

func (*FileBackedDevice) Resume

func (d *FileBackedDevice) Resume()

Resume reads any on-disk block maps that were written by previous executions and initializes the write intent log

func (*FileBackedDevice) SampleRate

func (d *FileBackedDevice) SampleRate(actionType DiskActionType, intervalMilliseconds uint64) uint64

SampleRate returns the rate (in bytes per second) for the given interval
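As a worked illustration of the unit conversion involved, a byte count observed over an interval in milliseconds converts to bytes per second as shown below. The helper bytesPerSecond is hypothetical; how the driver actually collects its samples is not described here.

```go
package main

import "fmt"

// bytesPerSecond (hypothetical helper) converts a byte count observed over
// intervalMilliseconds into a rate in bytes per second.
func bytesPerSecond(bytes, intervalMilliseconds uint64) uint64 {
	if intervalMilliseconds == 0 {
		return 0 // avoid dividing by zero for an empty interval
	}
	return bytes * 1000 / intervalMilliseconds
}

func main() {
	// 2048 bytes in half a second is 4096 bytes per second.
	fmt.Println(bytesPerSecond(2048, 500)) // 4096
}
```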

func (*FileBackedDevice) SetBackgroundCopyRate

func (d *FileBackedDevice) SetBackgroundCopyRate(rateInBytesPerSecond uint64) bool

SetBackgroundCopyRate sets the current background copy rate. Returns whether the value was updated.

func (*FileBackedDevice) TotalSyncedBlocks

func (d *FileBackedDevice) TotalSyncedBlocks() uint64

TotalSyncedBlocks returns how many blocks are present on the backing file

func (*FileBackedDevice) Trim

func (d *FileBackedDevice) Trim(off, length uint64) error

Trim is called by the BUSE driver in response to trim requests from the kernel.

func (*FileBackedDevice) WriteAt

func (d *FileBackedDevice) WriteAt(p []byte, off uint64) error

WriteAt is called by the BUSE driver in response to write requests from the kernel.

  • Write requests are always passed straight to the backing file
  • Any blocks in the middle of a write are considered "synced" since they are fully overwritten
  • Any partially written blocks are recorded in the dirty block map, and enqueued to be fixed by the flush queue

type FileBlock

type FileBlock struct {
	// contains filtered or unexported fields
}

FileBlock holds a mutex for a block and whether it has been synced to the backing file

type FileSystem

type FileSystem interface {
	Open(name string) (File, error)
	Stat(name string) (os.FileInfo, error)
	OpenFile(name string, flag int, perm os.FileMode) (File, error)
	Rename(oldpath, newpath string) error
	Remove(name string) error
	ReadFile(name string) ([]byte, error)
}

FileSystem is an interface wrapper around the built-in os filesystem operations, for unit testability

type LocalFs

type LocalFs struct{}

LocalFs implements FileSystem using the local disk.

func (LocalFs) Open

func (LocalFs) Open(name string) (File, error)

Open opens a file with the native os function

func (LocalFs) OpenFile

func (LocalFs) OpenFile(name string, flag int, perm os.FileMode) (File, error)

OpenFile opens a file with the native os function

func (LocalFs) ReadFile

func (LocalFs) ReadFile(name string) ([]byte, error)

ReadFile reads the full binary content of a provided file

func (LocalFs) Remove

func (LocalFs) Remove(name string) error

Remove removes a file using the native os function

func (LocalFs) Rename

func (LocalFs) Rename(oldpath, newpath string) error

Rename renames a file using the native os function

func (LocalFs) Stat

func (LocalFs) Stat(name string) (os.FileInfo, error)

Stat stats a file with the native os function

type MockFile

type MockFile struct {
	mock.Mock
	File
	// contains filtered or unexported fields
}

MockFile is a fake file

func (*MockFile) Close

func (mf *MockFile) Close() error

Close mocks the close call

func (*MockFile) Fd

func (mf *MockFile) Fd() uintptr

Fd mocks a file descriptor

func (*MockFile) ReadAt

func (mf *MockFile) ReadAt(b []byte, off int64) (n int, err error)

ReadAt mocks the read call, and always reads 1's

func (*MockFile) Stat

func (mf *MockFile) Stat() (os.FileInfo, error)

Stat mocks the file stat call

func (*MockFile) Sync

func (mf *MockFile) Sync() error

Sync mocks the sync call

func (*MockFile) Truncate

func (mf *MockFile) Truncate(size int64) error

Truncate mocks the truncate call

func (*MockFile) Write

func (mf *MockFile) Write(b []byte) (n int, err error)

Write mocks the write call

func (*MockFile) WriteAt

func (mf *MockFile) WriteAt(b []byte, off int64) (n int, err error)

WriteAt mocks the writeat call

type MockFileInfo

type MockFileInfo struct {
	mock.Mock
	os.FileInfo
}

MockFileInfo returns fake file info

func (*MockFileInfo) Name

func (mfi *MockFileInfo) Name() string

Name returns a fake name

func (*MockFileInfo) Size

func (mfi *MockFileInfo) Size() int64

Size returns a fake size

type MockFs

type MockFs struct {
	mock.Mock
	FileSystem
}

MockFs is a filesystem that isn't real

func (*MockFs) OpenFile

func (mfs *MockFs) OpenFile(name string, flag int, perm os.FileMode) (File, error)

OpenFile mocks the openfile os call

func (*MockFs) ReadFile

func (mfs *MockFs) ReadFile(name string) ([]byte, error)

ReadFile mocks the readfile os call

func (*MockFs) Remove

func (mfs *MockFs) Remove(name string) error

Remove mocks the remove os call

func (*MockFs) Rename

func (mfs *MockFs) Rename(oldname, newname string) error

Rename mocks the rename os call

func (*MockFs) Stat

func (mfs *MockFs) Stat(name string) (os.FileInfo, error)

Stat mocks the stat os call

type QueueType

type QueueType int

QueueType is an enum for the type of queue you want an action from

const (
	// DynamicQueue serves actions from the user
	DynamicQueue QueueType = 0
	// BackgroundQueue copies blocks in the background
	BackgroundQueue QueueType = 1
)

type QueuedWriteAction

type QueuedWriteAction struct {
	// contains filtered or unexported fields
}

QueuedWriteAction contains the type, affected block range, and optional data to be written to disk

type RangeLocker

type RangeLocker struct {
	// contains filtered or unexported fields
}

RangeLocker implements "row level" locking for ranges of elements

func NewRangeLocker

func NewRangeLocker(blockRangePool *sync.Pool) *RangeLocker

NewRangeLocker returns a new RangeLocker

func (*RangeLocker) LockRange

func (rl *RangeLocker) LockRange(start uint64, end uint64)

LockRange locks the range of elements defined by start and end (both inclusive). This function blocks until it can obtain an exclusive lock on the provided range

func (*RangeLocker) UnlockRange

func (rl *RangeLocker) UnlockRange(start uint64, end uint64) error

UnlockRange unlocks an existing range lock, and returns error if no matching range is found
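One way to get the blocking LockRange and error-returning UnlockRange behaviour described above is a mutex-protected list of held ranges plus a condition variable. The sketch below assumes that design; the real RangeLocker takes a *sync.Pool and its internals are unexported, so treat the names and structure here as hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// rangeLocker is a hypothetical, simplified range lock.
type rangeLocker struct {
	mu     sync.Mutex
	cond   *sync.Cond
	locked [][2]uint64 // held ranges as inclusive {start, end} pairs
}

func newRangeLocker() *rangeLocker {
	rl := &rangeLocker{}
	rl.cond = sync.NewCond(&rl.mu)
	return rl
}

// conflicts reports whether [start, end] overlaps any held range.
// The caller must hold rl.mu.
func (rl *rangeLocker) conflicts(start, end uint64) bool {
	for _, r := range rl.locked {
		if start <= r[1] && r[0] <= end {
			return true
		}
	}
	return false
}

// lockRange blocks until no held range overlaps [start, end], then takes it.
func (rl *rangeLocker) lockRange(start, end uint64) {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	for rl.conflicts(start, end) {
		rl.cond.Wait()
	}
	rl.locked = append(rl.locked, [2]uint64{start, end})
}

// unlockRange releases an exactly matching range and wakes any waiters.
func (rl *rangeLocker) unlockRange(start, end uint64) error {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	for i, r := range rl.locked {
		if r[0] == start && r[1] == end {
			rl.locked = append(rl.locked[:i], rl.locked[i+1:]...)
			rl.cond.Broadcast()
			return nil
		}
	}
	return errors.New("no matching range lock")
}

func main() {
	rl := newRangeLocker()
	rl.lockRange(0, 3)
	rl.lockRange(4, 7) // disjoint from 0..3, so this does not block
	fmt.Println(rl.unlockRange(0, 3), rl.unlockRange(0, 3))
}
```

The linear scan over held ranges keeps the sketch short; a driver with many concurrent requests might prefer an interval tree.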

type ReadOnlyFile

type ReadOnlyFile interface {
	io.ReaderAt
	Fd() uintptr
}

ReadOnlyFile provides an interface for the read functions of the native file struct

type SyncFile

type SyncFile struct {
	File File
	Size uint64
}

SyncFile is a simple struct to hold a file pointer and the Stat'd size of the file

type SyncSource

type SyncSource struct {
	File ReadOnlyFile
	Size uint64
}

SyncSource is a struct that points to the read-only source

type WriteActionType

type WriteActionType int

WriteActionType is an enum for the type of action that can be in the writer queue

const (
	// WriteData actions flush data to disk that was already read from source by a dynamic read action
	WriteData WriteActionType = 0
	// FixDirtyBlock actions sync data from source, to fully sync blocks that were partially written on the target
	FixDirtyBlock WriteActionType = 1
	// SyncBlock actions sync data from source and write them to the backing file
	SyncBlock WriteActionType = 2
)

type WriterQueue

type WriterQueue struct {
	// contains filtered or unexported fields
}

WriterQueue is a thin wrapper to a channel, which allows for limiting the amount of data in the channel at once

func NewWriterQueue

func NewWriterQueue() *WriterQueue

NewWriterQueue initializes a writer queue

func (*WriterQueue) Dequeue

func (wq *WriterQueue) Dequeue(includeBackgroundItems bool) *QueuedWriteAction

Dequeue attempts to pull an item off the queue. This method is non-blocking: if the queue contains no items, it returns nil. If includeBackgroundItems is true you _may_ also get background block sync actions.

func (*WriterQueue) MakeWriteAction

func (wq *WriterQueue) MakeWriteAction() *QueuedWriteAction

MakeWriteAction constructs a QueuedWriteAction struct

func (*WriterQueue) PutWriteAction

func (wq *WriterQueue) PutWriteAction(wa *QueuedWriteAction)

PutWriteAction adds a write action back to the pool

func (*WriterQueue) TryDequeue

func (wq *WriterQueue) TryDequeue(waitMilliseconds int, includeBackgroundItems bool) *QueuedWriteAction

TryDequeue attempts to pull an item off the queue. If no message has arrived within waitMilliseconds, this method returns nil. If includeBackgroundItems is true you _may_ also get background block sync actions.

func (*WriterQueue) TryEnqueue

func (wq *WriterQueue) TryEnqueue(wa *QueuedWriteAction, timeoutMilliseconds int) bool

TryEnqueue attempts to add a write action to the write queue. timeoutMilliseconds = 0 indicates that this function should block
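The timeout semantics of TryEnqueue and TryDequeue map naturally onto a select over a channel and a timer. The sketch below is an assumption-laden illustration: it queues plain ints, bounds the queue by item count rather than by amount of data, and returns a boolean instead of a nil pointer. None of these names are the package's own.

```go
package main

import (
	"fmt"
	"time"
)

// writerQueue is a hypothetical queue backed by a buffered channel.
type writerQueue struct{ ch chan int }

func newWriterQueue(capacity int) *writerQueue {
	return &writerQueue{ch: make(chan int, capacity)}
}

// tryEnqueue adds item to the queue. A timeout of 0 blocks until there is
// room; otherwise it gives up after timeoutMilliseconds and returns false.
func (q *writerQueue) tryEnqueue(item int, timeoutMilliseconds int) bool {
	if timeoutMilliseconds == 0 {
		q.ch <- item
		return true
	}
	select {
	case q.ch <- item:
		return true
	case <-time.After(time.Duration(timeoutMilliseconds) * time.Millisecond):
		return false
	}
}

// tryDequeue waits up to waitMilliseconds for an item. The boolean result is
// false when nothing arrived in time.
func (q *writerQueue) tryDequeue(waitMilliseconds int) (int, bool) {
	select {
	case item := <-q.ch:
		return item, true
	case <-time.After(time.Duration(waitMilliseconds) * time.Millisecond):
		return 0, false
	}
}

func main() {
	q := newWriterQueue(1)
	fmt.Println(q.tryEnqueue(7, 10)) // true: the queue has room
	fmt.Println(q.tryEnqueue(8, 10)) // false: the queue is full
	fmt.Println(q.tryDequeue(10))    // 7 true
}
```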

Directories

Path Synopsis
cmd
examples
