strobemers

package module
v0.1.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 23, 2021 License: MIT Imports: 4 Imported by: 0

README

Strobemers in Go

GoDoc Go Report Card

Introduction

This is a Go implementation of the strobemers (minstrobes and randstrobes), with some differences.

The implementation of Randstrobes has a not-bad performance (2-3X slower) compared to regular k-mer, while it's 10-20X slower than ntHash. Besides, Randstrobes is only slightly slower than MinStrobes (see benchmark).

Attention

The current implementation only computes strobemers of the positive strand, because the strobes are asymmetrical and the location matters.

Installation

go get github.com/shenwei356/strobemers

Quick Start

We followed the code style of ntHash.

n := 2
l := 3
w_min := 3
w_max := 5
rs, err := strobemers.NewRandStrobes(seq, n, l, w_min, w_max)
checkError(err)

var hash uint64
var ok bool
var i int  // 0-based index
var positions []int // 0-based indexes of all strobes

rs.SetWindowShrink(true)
for {
    hash, ok = rs.Next()
    if !ok {
        break
    }

    i = rs.Index()
    positions = rs.Indexes()
}

Differences

Here are some differences compared to the original implementation, see discussion: #1, #2.

item orginal this comment
window range w_min < w_max w_min <= w_max allow a fixed position
shrinking window all w_min and w_max optional shrinking last w_max see figures below
number of strobemers len(seq)-n*l+1 len(seq)-n*l+1-(n-1)*l window shrinked
number of strobemers len(seq)-n*l+1-(n-1)*(l+w_min-1) window not shrinked
choice of min hash (h(m)+h(mj))%q (h(m)+h(mj))&q & is faster than %
final hash value (n=2) h(m1)-h(m2) h(m1)/2+h(m2)/3 keep asymmetry and avoid uint64 overflow
final hash value (n=3) h(m1)-h(m2)+2*h(m3) h(m1)/3+h(m2)/4+h(m3)/5 ~

Benchmark

method time relative_time
ntHashKmers(30) 8590 1
Kmers(30) 55579 6
MinStrobes(2,15,20,30) 104520 12
MinStrobes(3,10,20,30) 111662 13
RandStrobes(2,15,20,30) 93436 11
RandStrobes(3,10,20,30) 152461 18
$ go test . -bench=Benchmark* -benchmem \
    | grep Bench \
    | perl -pe 's/\s\s+/\t/g' \
    | csvtk cut -Ht -f 1,3-5 \
    | csvtk add-header -t -n test,time,memory,allocs \
    | csvtk pretty -t -r

                                 test           time       memory        allocs
-------------------------------------   ------------   ----------   -----------
           BenchmarkNTHash/1.00_KB-16     8590 ns/op      48 B/op   1 allocs/op
            BenchmarkKmers/1.00_KB-16    55579 ns/op      32 B/op   1 allocs/op
 BenchmarkMinStrobesOrder2/1.00_KB-16   104520 ns/op   25064 B/op   7 allocs/op
 BenchmarkMinStrobesOrder3/1.00_KB-16   111662 ns/op   25064 B/op   7 allocs/op
BenchmarkRandStrobesOrder2/1.00_KB-16    93436 ns/op    8432 B/op   3 allocs/op
BenchmarkRandStrobesOrder3/1.00_KB-16   152461 ns/op    8432 B/op   3 allocs/op

Similar Projects

References

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrIncompleteHashValues = fmt.Errorf("strobemers: incomplete hash values")

ErrIncompleteHashValues means incomplete hash values

View Source
var ErrInvalidOrder = fmt.Errorf("strobemers: strobemer order too small")

ErrInvalidOrder means

View Source
var ErrInvalidSequence = fmt.Errorf("strobemers: invalid DNA sequence")

ErrInvalidSequence means the given sequence is invalid

View Source
var ErrInvalidWindowOffsets = fmt.Errorf("strobemers: window offset should be > 0, and wMin <= wMax")

ErrInvalidWindowOffsets means invalid window offsets

View Source
var ErrOrderNotSupported = fmt.Errorf("strobemers: strobemer order not supported")

ErrOrderNotSupported means a big strobemer order is not supported.

View Source
var ErrPrimeNumberTooSmall = fmt.Errorf("strobemers: the primer number is too small")
View Source
var ErrSequenceTooShort = fmt.Errorf("strobemers: sequence too short")

ErrSequenceTooShort means the sequence is too short

View Source
var ErrStrobeLengthTooSmall = fmt.Errorf("strobemers: strobe length too small")

ErrStrobeLengthTooSmall means the strobe length is too small

Functions

This section is empty.

Types

type IdxValue

type IdxValue struct {
	Idx int    // index
	Val uint64 // hash
}

type MinStrobes

type MinStrobes struct {
	// contains filtered or unexported fields
}

MinStrobes is a iterator for MinStrobes

func NewMinStrobes

func NewMinStrobes(seq *[]byte, n int, l int, wMin int, wMax int) (*MinStrobes, error)

NewMinStrobes creates a MinStrobes iterator. Parametems:

n    - strobemer order
l    - strobes length
wMin - minimum window offset, wMin > 0
wMax - maximum window offset, wMin <= wMax.

func (*MinStrobes) Index

func (ms *MinStrobes) Index() int

Index returns the current index (0-based) of strobemers

func (*MinStrobes) Indexes

func (ms *MinStrobes) Indexes() []int

Indexes returns current indexes (0-based) of strobes

func (*MinStrobes) Next

func (ms *MinStrobes) Next() (uint64, bool)

Next returns the next hash value of randstrobe

func (*MinStrobes) SetPrime

func (ms *MinStrobes) SetPrime(q uint64)

SetPrime sets the prime number (q) in minimizing h(m)+h(mj) mod q. In this package, we use (h(m)+h(mj)) & q, where q = roundup(q) - 1. The value should not be too small, at least 256.

func (*MinStrobes) SetWindowShrink

func (ms *MinStrobes) SetWindowShrink(shrink bool)

SetWindowShrink decides whether shrink the search window at positions near the end of the sequence. Default is true.

type RandStrobes

type RandStrobes struct {
	// contains filtered or unexported fields
}

RandStrobes is a iterator for randstrobes

func NewRandStrobes

func NewRandStrobes(seq *[]byte, n int, l int, wMin int, wMax int) (*RandStrobes, error)

NewRandStrobes creates a RandStrobes iterator. Parameters:

n    - strobemer order
l    - strobes length
wMin - minimum window offset, wMin > 0
wMax - maximum window offset, wMin <= wMax.

func (*RandStrobes) Index

func (rs *RandStrobes) Index() int

Index returns the current index (0-based) of strobemers

func (*RandStrobes) Indexes

func (rs *RandStrobes) Indexes() []int

Indexes returns current indexes (0-based) of strobes

func (*RandStrobes) Next

func (rs *RandStrobes) Next() (uint64, bool)

Next returns the next hash value of randstrobe

func (*RandStrobes) SetPrime

func (rs *RandStrobes) SetPrime(q uint64)

SetPrime sets the prime number (q) in minimizing h(m)+h(mj) mod q. In this package, we use (h(m)+h(mj)) & q, where q = roundup(q) - 1. The value should not be too small, at least 256.

func (*RandStrobes) SetWindowShrink

func (rs *RandStrobes) SetWindowShrink(shrink bool)

SetWindowShrink decides whether shrink the search window at positions near the end of the sequence. Default is true.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL