l7g

Module version: v0.0.0-...-7a0a068
Published: Jun 7, 2023 License: AGPL-3.0

README

This repository has been retired. It has been superseded by https://github.com/arvados/lightning

Old README follows

l7g

l7g is the main codebase for the Lightning system being developed by Curoverse Research.

The repository contains documents, source code and pipelines for the various aspects of Lightning.

Code here should be considered "research grade" and is a work in progress.

Overview

Lightning is a system based on "genomic tiling". Genomes are split into small segments, on average roughly 250 base pairs (bp) long, and these small segments are called "tiles".

For a given population of genomes, each genomic sequence is split into tiles and redundant tile sequences are de-duplicated. Coalescing all unique tiles creates a "Lightning tile library"; a source sequence from the population pool can then be stored as position references into that library.

A compact representation of a genome can be created by storing arrays of indexes into the Lightning tile library, where each index references its underlying tile sequence.

The compact representation we've developed is called "compact genome format" (CGF). It can represent a whole genome in ~30Mb, depending on the amount of low quality data in the original genome sample.

Directory Structure

cwl-version/

Common Workflow Language (CWL) pipelines for creating Lightning data.

doc/

Lightning documentation.

go/

Go (golang) programs used by Lightning.

img/

Image directory for pictures.

prototype/

A directory for the Lightning system prototype.

proxy/

Authentication for Lightning prototype.

sandbox/

Subdirectory for experimental code.

tools/

Source and tools used by Lightning.

Directories

Path Synopsis
go
experimental/tile-library-architecture/createlibrary
Createlibrary is a command line function that parses directories of FastJ files into a tile library and writes files to a specified directory.
experimental/tile-library-architecture/genome
Package genome is a package for representing the genome, relative to a tile library, with Go data structures.
experimental/tile-library-architecture/genomestonumpy
Program genomestonumpy takes a directory to write to, a directory for a source library, a path number, and any number of directories for genomes.
experimental/tile-library-architecture/liftovergenome
Program liftovergenome takes a genome, a source library, and a destination library, along with a destination filepath and a boolean.
experimental/tile-library-architecture/mergelibraries
Program mergelibraries merges a set of given directories of SGLFv2 files together into one library, and then writes SGLF or SGLFv2 files for the new library to disk.
experimental/tile-library-architecture/structures
Package structures is a basic package to hold basic structures, methods, and functions for tile libraries and genomes.
experimental/tile-library-architecture/tile-library
Package tilelibrary is a package for implementing tile libraries in Go.
memz/rollsum
Package rollsum implements rolling checksums similar to apenwarr's bup, which is similar to librsync.
pasta
Package pasta provides primitives for manipulating PASTA streams.
sandbox
tools
cglf-tools/rollsum
Package rollsum implements rolling checksums similar to apenwarr's bup, which is similar to librsync.
cglf-tools/twobit
Package twobit implements the 2bit compact randomly-accessible file format for storing DNA sequence data.