arken

command module
v0.2.9
Published: Jan 13, 2022 License: Apache-2.0 Imports: 1 Imported by: 0

README

Arken

A Distributed Digital Archive Built for the World's Open Source and Scientific Data.

A Bit of Backstory

Many researchers, museums, and archivists are struggling to host and protect a vast amount of important public data. On the other hand, there are many of us developers, tinkerers, and general computer enthusiasts who have extra storage space on our home servers.

The goal of Arken is to build an autonomous system for organizing, balancing, and distributing this data among users who can donate their extra space.

+-------------GitHub/GitLab/Gitea------------------+
|    +----------+   +----------+    +----------+   |
|    | manifest |   | manifest |    | manifest |   |
|    +-----|----+   +-----|\---+    +-----|----+   |
+----------|--------------|-\-------------|--------+
           |              |  \           /\
           |              |   \         /  \
           |              |    \       /    \
           |              |     \     /      \
           |              |      \   /        \
           |              |       \ /          \
           v              v        v            v
        [Arken]       [Arken]<-->[Arken]<--->[Arken]

What is Arken?

Arken is a management engine that runs on top of the IPFS (InterPlanetary File System) protocol. Each instance of Arken calculates which important files are hosted by the fewest other nodes on the network and should therefore be backed up locally to reduce the risk of data loss. Arken also tracks how much space it is using on your system and respects the limits you set by deleting local copies of data that is backed up by more than 10% of the cluster.
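As a rough illustration of the storage-limit rule described above, here is a minimal sketch in Go. It is not taken from Arken's source: the `File` type and the `evictable` and `freeSpace` functions are hypothetical names, and the sketch assumes the provider count and cluster size are already known.

```go
package main

import "fmt"

// File is a hypothetical record for one tracked file (not Arken's real type).
type File struct {
	CID       string // IPFS content identifier
	SizeBytes int64  // space the file takes up locally
	Providers int    // number of other nodes currently hosting the file
}

// evictable reports whether a file is hosted by more than 10% of the
// cluster, which is the condition the README gives for safe local deletion.
func evictable(f File, clusterSize int) bool {
	return clusterSize > 0 && f.Providers*10 > clusterSize
}

// freeSpace walks the files in order and returns the CIDs to delete so that
// usage drops to the limit or below. Only evictable files are considered, so
// the result may free less space than requested.
func freeSpace(files []File, clusterSize int, used, limit int64) []string {
	var drop []string
	for _, f := range files {
		if used <= limit {
			break
		}
		if evictable(f, clusterSize) {
			drop = append(drop, f.CID)
			used -= f.SizeBytes
		}
	}
	return drop
}

func main() {
	files := []File{
		{CID: "cid-a", SizeBytes: 400, Providers: 2},
		{CID: "cid-b", SizeBytes: 300, Providers: 30},
		{CID: "cid-c", SizeBytes: 300, Providers: 50},
	}
	// 1000 bytes used, 800 allowed, 100 nodes in the cluster.
	fmt.Println(freeSpace(files, 100, 1000, 800)) // prints [cid-b]
}
```

In this sketch a rarely hosted file such as `cid-a` is never deleted, even when the node is over its limit.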

What's a Manifest?

Arken uses Manifests to transparently keep track of which files are important to the network and should be monitored and backed up if needed. Unlike a Pinset in an IPFS cluster, a Manifest is simply a plain-text Git repository made up of file identifiers. Manifests are also easy to audit, so you can actually know what data you're helping preserve. Manifest repositories can contain an arbitrary number of directories used to organize manifest files, as long as they also contain a config TOML file. This config file provides a replication factor: the number of nodes in the total network that should be storing a file at any given time.

While Manifests tell Arken which files should be stored on the subscribed nodes, they don't contain any of the data to be backed up onto the network. To import data into a Manifest, users add files to IPFS and record the file identifiers (IPFS CIDs) in a manifest file. From there, nodes in the cluster will begin pulling the data directly from the user.
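To make the idea of a manifest file concrete, the sketch below reads a list of identifiers from text. The exact manifest file layout is not described here, so the sketch assumes one identifier per line. The `parseManifest` function is a hypothetical helper, and the identifiers are placeholders rather than real CIDs.

```go
package main

import (
	"fmt"
	"strings"
)

// parseManifest extracts file identifiers from manifest text.
// Assumption: one identifier per line. Blank lines and repeated
// identifiers are skipped, and the original order is preserved.
func parseManifest(text string) []string {
	var cids []string
	seen := make(map[string]bool)
	for _, line := range strings.Split(text, "\n") {
		id := strings.TrimSpace(line)
		if id == "" || seen[id] {
			continue
		}
		seen[id] = true
		cids = append(cids, id)
	}
	return cids
}

func main() {
	// Placeholder identifiers, not real IPFS CIDs.
	manifest := "QmExampleOne\n\nQmExampleTwo\nQmExampleOne\n"
	fmt.Println(parseManifest(manifest)) // prints [QmExampleOne QmExampleTwo]
}
```

Because the manifest holds only identifiers, a reader like this never touches the archived data itself.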

Manifest Security

Since Manifests are openly available through Git repositories, they can be easily audited but can only be changed by users who have access to those Git repositories or through pull requests.

Rebalancing Data Across the Community

Arken instances periodically query IPFS for the number of other nodes hosting a particular file and attempt to replace a well-backed-up file on the local system with files that are below the optimal replication threshold.
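One way to picture a rebalancing pass is the sketch below. It is an illustration under assumptions, not Arken's actual algorithm: the `Entry` type and `planSwap` function are hypothetical, the provider counts are assumed to have been fetched from IPFS already, and the threshold is taken to be the manifest's replication factor.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical view of one manifest file during a rebalance pass.
type Entry struct {
	CID       string
	Providers int  // other nodes hosting the file, as reported by IPFS
	Local     bool // whether this node already stores the file
}

// planSwap picks the most widely hosted local file above the replication
// factor as the one to evict, and lists the non-local files below the
// factor, least-hosted first, as the ones to fetch. Nothing is evicted
// when there is nothing to fetch.
func planSwap(entries []Entry, factor int) (evict string, fetch []string) {
	var under []Entry
	best := -1
	for i, e := range entries {
		switch {
		case e.Local && e.Providers > factor:
			if best < 0 || e.Providers > entries[best].Providers {
				best = i
			}
		case !e.Local && e.Providers < factor:
			under = append(under, e)
		}
	}
	sort.SliceStable(under, func(i, j int) bool {
		return under[i].Providers < under[j].Providers
	})
	for _, e := range under {
		fetch = append(fetch, e.CID)
	}
	if best >= 0 && len(fetch) > 0 {
		evict = entries[best].CID
	}
	return evict, fetch
}

func main() {
	entries := []Entry{
		{CID: "cid-a", Providers: 40, Local: true},
		{CID: "cid-b", Providers: 3, Local: true},
		{CID: "cid-c", Providers: 2, Local: false},
		{CID: "cid-d", Providers: 1, Local: false},
		{CID: "cid-e", Providers: 9, Local: false},
	}
	evict, fetch := planSwap(entries, 5)
	fmt.Println(evict, fetch) // prints cid-a [cid-d cid-c]
}
```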

Getting Started

Tutorials:

Getting Started with Arken on a Raspberry Pi

To start running a node, you can download Arken as a Go program or as a Docker container. Running Arken as a Docker container is recommended for simplicity and ease of updating.

Docker:
docker run -d --name arken \
 -v STORAGE:/data/storage \
 -v DATABASE:/data/database \
 -v REPOSITORIES:/data/repositories \
 -v CONFIG:/data/config \
 -e ARKEN_GENERAL_POOLSIZE=2TB \
 -e ARKEN_DB_PATH=/data/database/keys.db \
 -e ARKEN_SOURCES_CONFIG=/data/config/keysets.yaml \
 -e ARKEN_SOURCES_REPOSITORIES=/data/repositories \
 -e ARKEN_SOURCES_STORAGE=/data/storage \
 -p 4001:4001 \
 --restart=always ghcr.io/arken/arken

Go Package:
go get github.com/arken/arken
go run arken

What's the process as someone who wants to back up important data?

Let's say that you are a scholar who wants to preserve some important works of humanity, or a researcher who wants to back up the DNA of an extinct animal or plant. How would you go about adding your data to the distributed file system? First, you would download and run the Arken Import Tool. Using the Arken Import Tool, you can create a manifest file of the IPFS identifiers for your data. At this point you can either upload the manifest to your own Git repository (this is best if you want to run your own pool of workers) or apply to have your data included in the Core manifest repository. The Core manifest repository consists of extremely important data to preserve and is what the community donating their extra disk space uses by default.

What's the process as someone donating their extra storage space?

Old computers or servers with some empty storage space make excellent Arken nodes. Check out our guide for configuring a Raspberry Pi with Docker, external storage, and Arken here. After installing the Arken program, you can configure it either through environment variables or through the Arken configuration file located at ~/.arken/. You can check out an example of an Arken Docker-Compose file here. The Core manifest will be available by default, but because Manifests are just Git repositories, you can add and use any Manifest you'd like. For example, you can donate space to the Core community pool but also sync a custom Manifest of some vacation pictures among your own and a few friends' machines.
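As an illustration of configuration through environment variables, the sketch below reads `ARKEN_GENERAL_POOLSIZE` (the variable from the Docker example) and converts a value such as `2TB` into bytes. How Arken itself parses this value is not documented here, so the `parseSize` helper, the accepted suffixes, and the decimal interpretation (1 TB = 10^12 bytes) are all assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// units lists the suffixes this sketch accepts, longest first so that
// "TB" is matched before the bare "B". Decimal multiples are an assumption.
var units = []struct {
	suffix string
	factor int64
}{
	{"TB", 1e12}, {"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1},
}

// parseSize converts a string such as "2TB" into a byte count.
func parseSize(s string) (int64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			n, err := strconv.ParseInt(num, 10, 64)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("invalid size %q", s)
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("size %q has no recognized unit", s)
}

func main() {
	raw, ok := os.LookupEnv("ARKEN_GENERAL_POOLSIZE")
	if !ok {
		raw = "2TB" // fall back to the value used in the Docker example
	}
	size, err := parseSize(raw)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("pool size: %d bytes\n", size)
}
```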

After configuration, that's it! Arken will continue to run in the background, identifying the files hosted by the fewest other nodes and rebalancing as necessary.

License

Copyright 2020-2022 Alec Scott & Arken Team team@arken.io

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
