dvid

command module
v1.0.0
Published: Aug 18, 2023 License: BSD-3-Clause Imports: 0 Imported by: 0

README

DVID is a Distributed, Versioned, Image-oriented Dataservice written to support neural reconstruction, analysis and visualization efforts at HHMI Janelia Research Center. It provides storage with branched versioning of a variety of data necessary for our research including teravoxel-scale image volumes, JSON descriptions of objects, sparse volumes, point annotations with relationships (like synapses), etc.

Its goal is to provide:

  • A framework for thinking of distribution and versioning of large-scale scientific data similar to distributed version control systems like git.
  • Easily extensible data types (e.g., annotation, keyvalue, and labelmap in figure below) that allow tailoring of APIs, access speeds, and storage space for different kinds of data.
  • The ability to use a variety of storage systems via plugin storage engines, currently limited to systems that can be viewed as (preferably ordered) key-value stores.
  • A stable science-driven HTTP API that can be implemented either by native DVID data types or by proxying to other services.

High-level architecture of DVID

How it's different from other forms of versioned data systems:

  • DVID handles large-scale data, on the order of billions or more discrete units. At this scale, storing so many files can be difficult on a local file system and can impose a lot of load even on shared file systems. Cloud storage is always an option (and is available in some DVID backends), but it adds latency and doesn't reduce the transfer time for such large numbers of files or data chunks. Database systems (including embedded ones) handle this by consolidating many bits of data into larger files, which can also be described as a sharded data approach.
  • All versions are available for queries. There is no checkout to read committed data.
  • The high-level science API uses pluggable datatypes. This allows clients to operate on domain-specific data and operations rather than operations on generic files.
  • Data can be flexibly assigned to different types of storage, so tera- to peta-scale immutable imaging data can be kept in cloud storage while smaller, frequently mutated label data can be kept on fast local NVMe SSDs. This also allows data to be partitioned across databases by data instance. Our recent datasets primarily hold local data in Badger embedded databases, also written in the Go language.
  • (Work in progress) A newer storage backend (DAGStore) will allow "chained storage" such that data published at a particular version, say on AWS Open Data, could be reused for later versions with only new modifications stored locally. This requires extending storage flexibility to versions of data across storage locations. DAGStore will greatly simplify "pull requests" where just the changes within a set of versions are transmitted between separate DVID servers.
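
To make the "no checkout" point concrete, here is a minimal sketch of reading from a branched version history. It is a toy model and not DVID's implementation: the `Store` type, its methods, and the version names are all hypothetical. Each version stores only the keys it changed, and a read at any version walks toward the root until it finds a value.

```go
package main

import "fmt"

// VersionID names a node in a hypothetical version tree.
type VersionID string

// Store is a toy versioned key-value store (not DVID's real storage layer).
// Each version records only the keys it changed plus a link to its parent.
type Store struct {
	parent map[VersionID]VersionID
	deltas map[VersionID]map[string]string
}

// NewStore creates a store holding a single empty root version.
func NewStore(root VersionID) *Store {
	return &Store{
		parent: map[VersionID]VersionID{},
		deltas: map[VersionID]map[string]string{root: {}},
	}
}

// Branch creates child as a new version descending from parent.
func (s *Store) Branch(parent, child VersionID) {
	s.parent[child] = parent
	s.deltas[child] = map[string]string{}
}

// Put records a change at one version only.
func (s *Store) Put(v VersionID, key, value string) {
	s.deltas[v][key] = value
}

// Get reads key as seen from version v by walking toward the root and
// returning the first value found. No checkout step is involved.
func (s *Store) Get(v VersionID, key string) (string, bool) {
	cur := v
	for {
		if val, found := s.deltas[cur][key]; found {
			return val, true
		}
		p, ok := s.parent[cur]
		if !ok {
			return "", false
		}
		cur = p
	}
}

func main() {
	s := NewStore("root")
	s.Put("root", "neuron/1", "v0")
	s.Branch("root", "a")
	s.Branch("root", "b")
	s.Put("a", "neuron/1", "edited-on-a")

	// Every version can be queried side by side.
	for _, v := range []VersionID{"root", "a", "b"} {
		val, _ := s.Get(v, "neuron/1")
		fmt.Println(v, val)
	}
}
```

Branch `b` still sees the root value because it never modified the key, while branch `a` sees its own edit.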

While much of the effort has been focused on the needs of the Janelia FlyEM Team, DVID can be used as a general-purpose branched versioning file system that handles billions of files and terabytes of data by creating instances of the keyvalue datatype. Our team uses the keyvalue datatype for branched versioning of JSON, configuration, and other files using the simple key-value HTTP API.
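
As a rough illustration of that key-value usage, the following Go sketch stores and reads back a small JSON file over HTTP. The URL layout (`/api/node/<uuid>/<instance>/key/<key>`), the UUID `abc123`, and the instance name `settings` are assumptions made for this example; check the `/api/help` endpoint of a running server for the actual routes. A small in-memory stub stands in for a DVID server so the sketch runs on its own.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// keyURL builds the URL for one key in a keyvalue instance.
// The /api/node/<uuid>/<instance>/key/<key> layout is an assumption here.
func keyURL(server, uuid, instance, key string) string {
	return fmt.Sprintf("%s/api/node/%s/%s/key/%s", server, uuid, instance, url.PathEscape(key))
}

// putKey stores data under the key URL with an HTTP POST.
func putKey(c *http.Client, u string, data []byte) error {
	resp, err := c.Post(u, "application/octet-stream", bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("POST %s: %s", u, resp.Status)
	}
	return nil
}

// getKey reads the value back with an HTTP GET.
func getKey(c *http.Client, u string) ([]byte, error) {
	resp, err := c.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", u, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// stubServer is an in-memory stand-in for a DVID server (hypothetical),
// keyed by request path, so that this example needs no running service.
func stubServer() *httptest.Server {
	var mu sync.Mutex
	data := map[string][]byte{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodPost:
			body, _ := io.ReadAll(r.Body)
			data[r.URL.Path] = body
		case http.MethodGet:
			v, ok := data[r.URL.Path]
			if !ok {
				http.NotFound(w, r)
				return
			}
			w.Write(v)
		case http.MethodDelete:
			delete(data, r.URL.Path)
		}
	}))
}

func main() {
	srv := stubServer()
	defer srv.Close()

	u := keyURL(srv.URL, "abc123", "settings", "config.json")
	if err := putKey(srv.Client(), u, []byte(`{"threshold": 0.5}`)); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	val, err := getKey(srv.Client(), u)
	fmt.Println(string(val), err)
}
```

Against a real server, replace `srv.URL` with the server address and use the UUID of the version you want to read or write.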

DVID aspires to be a "github for large-scale scientific data" because a variety of interrelated data (like image volume, labels, annotations, skeletons, meshes, and JSON data) can be versioned together. DVID currently handles branched versioning of large-scale data and does not provide domain-specific diff tools to compare data from versions, which would be a necessary step for user-friendly pull requests and truly collaborative data editing.

Installation

Users should install DVID from the releases. The main branch of DVID may include breaking changes required by our research work.

Developers should consult the install README where our conda-based process is described.

DVID has been tested on MacOS X, Linux (Fedora 16, CentOS 6, Ubuntu), and Windows Subsystem for Linux (WSL2). It comes out-of-the-box with several embedded key-value databases (Badger, Basho's leveldb) for storage although you can configure other storage backends.

Before launching DVID, you'll have to create a configuration file describing ports, the types of storage engines, and where the data should be stored. Both simple and complex sample configuration files are provided in the scripts/distro-files directory.

Basic Usage

Some documentation on how to start the DVID server is available on the DVID wiki. The wiki's User Guide provides simple console-based toy examples, but our team's use of DVID services is much more complex because of our variety of clients and script-based usage. Please see the neuclease python library for more realistic ways to use DVID at scale and, in particular, with larger image volumes.

More Information

Both high-level and detailed descriptions of DVID and its ecosystem can be found here:

DVID is easily extensible by adding custom data types, each of which fulfills a minimal interface (e.g., HTTP request handling). DVID's initial focus is on efficiently handling data essential for Janelia's connectomics research:

  • image and 64-bit label 3d volumes, including multiscale support
  • 2d images in XY, XZ, YZ, and arbitrary orientation
  • multiscale 2d images in XY, XZ, and YZ, similar to quadtrees
  • sparse volumes, corresponding to each unique label in a volume, that can be merged or split
  • point annotations (e.g., synapse elements) that can be quickly accessed via subvolumes or labels
  • label graphs
  • regions of interest represented via a coarse subdivision of space using block indices
  • 2d and 3d image and label data using Google BrainMaps API and other cloud-based services

Each of the above is handled by built-in data types via a Level 2 REST HTTP API implemented by Go language packages within the datatype directory. When dealing with novel data, we typically use the generic keyvalue datatype and store JSON-encoded or binary data until we understand the desired access patterns and API. When we outgrow the keyvalue type's GET, POST, and DELETE operations, we create a custom datatype package with a specialized HTTP API.
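
The idea of a datatype package that plugs in by fulfilling a small interface can be sketched as follows. This is an illustration only: the `Datatype` interface, `Register`, `dispatch`, `call`, and the `echo` type are hypothetical names, and DVID's real interfaces in the datastore and datatype packages are larger.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Datatype is a hypothetical minimal interface for a pluggable data type:
// a name used for routing plus HTTP request handling.
type Datatype interface {
	TypeName() string
	ServeHTTP(w http.ResponseWriter, r *http.Request)
}

// registry maps type names to implementations.
var registry = map[string]Datatype{}

// Register makes a datatype available to the dispatcher.
func Register(d Datatype) { registry[d.TypeName()] = d }

// echoType is a toy datatype that reports the request it received.
type echoType struct{}

func (echoType) TypeName() string { return "echo" }

func (echoType) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "echo handled %s %s", r.Method, r.URL.Path)
}

// dispatch routes /<typename>/... to the registered datatype, if any.
func dispatch(w http.ResponseWriter, r *http.Request) {
	name, _, _ := strings.Cut(strings.TrimPrefix(r.URL.Path, "/"), "/")
	d, ok := registry[name]
	if !ok {
		http.Error(w, "unknown datatype "+name, http.StatusNotFound)
		return
	}
	d.ServeHTTP(w, r)
}

// call runs one in-process request through dispatch and returns the
// status code and response body.
func call(method, path string) (int, string) {
	rec := httptest.NewRecorder()
	dispatch(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	Register(echoType{})
	fmt.Println(call(http.MethodGet, "/echo/info"))
	fmt.Println(call(http.MethodGet, "/unknown/info"))
}
```

Adding a new datatype in this sketch means writing one more type that satisfies `Datatype` and registering it; the dispatcher itself does not change.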

DVID allows you to assign different storage systems to data instances within a single repo, which allows great flexibility in optimizing storage for particular use cases. For example, easily compressed label data can be stored on fast, expensive SSDs while larger, immutable grayscale image data can be stored in petabyte-scale read-optimized systems like Google Cloud Storage.
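
A minimal sketch of per-instance storage assignment, assuming a simple rule of "use the instance's own store if one is assigned, otherwise the default". The `Assignment` type and the store and instance names are hypothetical and do not reflect DVID's actual configuration format.

```go
package main

import "fmt"

// Assignment is a hypothetical mapping from data instance names to stores.
type Assignment struct {
	Default    string            // store used when no override exists
	ByInstance map[string]string // instance name -> store name
}

// StoreFor returns the store assigned to an instance, falling back to
// the default store when the instance has no explicit assignment.
func (a Assignment) StoreFor(instance string) string {
	if store, ok := a.ByInstance[instance]; ok {
		return store
	}
	return a.Default
}

func main() {
	a := Assignment{
		Default: "local-ssd",
		ByInstance: map[string]string{
			"grayscale": "cloud-bucket", // large, immutable image data
		},
	}
	for _, inst := range []string{"grayscale", "segmentation"} {
		fmt.Printf("%s -> %s\n", inst, a.StoreFor(inst))
	}
}
```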

DVID is written in Go and supports pluggable storage backends, a REST HTTP API, and command-line access (likely to be minimized in the near future). Some components written in C, e.g., storage engines like Leveldb and fast codecs like lz4, are embedded or linked as a library.

Command-line and HTTP API documentation can be found in help constants within packages or by visiting the /api/help HTTP endpoint on a running DVID server.

Monitoring

Mutations and activity logging can be sent to a Kafka server. We use Kafka activity topics to feed Kibana for analyzing DVID performance.

Snapshot of Kibana web page for DVID metrics

Known Clients with DVID Support

Programmatic clients:

  • neuclease, python library from HHMI Janelia
  • intern, python library from Johns Hopkins APL
  • natverse, R library from Jefferis Lab
  • libdvid-cpp, C++ library from HHMI Janelia FlyEM

GUI clients:

Screenshot of an early web app prototype pulling neuron data and 2d slices from 3d grayscale data:

Web app for 3d inspection being served from and sending requests to DVID

Documentation

Overview

DVID is a Distributed, Versioned, Image-oriented Dataservice written to support neural reconstruction, analysis and visualization efforts at HHMI Janelia Research Center (http://www.janelia.org) using teravoxel-scale image volumes.

The starting point for DVID documentation is the README.md file in the DVID github repo: https://github.com/janelia-flyem/dvid

Directories

Path Synopsis
  • cmd
  • datastore: Package datastore provides versioning and persisting supported data types using one of the supported storage engines.
  • datatype: Package datatype provides interfaces for arbitrary datatypes supported in DVID.
  • annotation: Package annotation supports point annotation management and queries.
  • common/downres: Package downres provides a system for computing multi-scale 3d arrays given mutations.
  • common/labels: Package labels supports label-based data types like labelblk, labelvol, labelsurf, labelsz, etc.
  • googlevoxels: Package googlevoxels implements DVID support for multi-scale tiles and volumes in XY, XZ, and YZ orientation using the Google BrainMaps API.
  • imageblk: Package imageblk implements DVID support for image blocks of various formats (uint8, uint16, rgba8).
  • imagetile: Package imagetile implements DVID support for imagetiles in XY, XZ, and YZ orientation.
  • keyvalue: Package keyvalue implements DVID support for data using generic key-value.
  • labelarray: Package labelarray handles both volumes of label data as well as indexing to quickly find and generate sparse volumes of any particular label.
  • labelblk: Package labelblk supports only label volumes.
  • labelmap: Package labelmap handles both volumes of label data as well as indexing to quickly find and generate sparse volumes of any particular label.
  • labelsz: Package labelsz supports ranking labels by # annotations of each type.
  • labelvol: Package labelvol supports label-specific sparse volumes.
  • multichan16: Package multichan16 tailors the voxels data type for 16-bit fluorescent images with multiple channels that can be read from V3D Raw format.
  • neuronjson: This file supports the keyspace for the keyvalue data type.
  • roi: Package roi implements DVID support for Region-Of-Interest operations.
  • tarsupervoxels: Package tarsupervoxels implements DVID support for data blobs associated with supervoxels.
  • dvid: Package dvid provides types, constants, and functions that have no other dependencies and can be used by all packages within DVID.
  • server: Package server configures and launches http/rpc server and storage engines specific to the type of DVID platform: local (e.g., running on MacBook Pro), clustered, or using cloud-based services like Google Cloud.
  • storage: Package storage provides a unified interface to a number of storage engines.
  • swift: Package swift adds Openstack Swift support to DVID.
