base: github.com/grailbio/base/simd Index | Files

package simd

import "github.com/grailbio/base/simd"

Package simd provides access to SIMD-based implementations of several common operations on byte arrays which the compiler cannot be trusted to autovectorize within the next several years.

The backend assumes SSE4.2 is available: init() checks for SSE4.2 support, and panics when it isn't there. The interface is designed to allow the backend to also autodetect e.g. AVX2/AVX-512 and opportunistically use those instructions, without any changes to properly written higher-level code. Implementation of the AVX2 part of this is in progress.

The central constraint driving this package's design is the standard Go compiler's inability to inline short assembly functions; see

https://groups.google.com/forum/#!topic/golang-nuts/yVOfeHYCIT4
https://github.com/golang/go/issues/4947#issuecomment-66075571

for more details. As a consequence, it is critical to push looping logic down to the assembly level as well, otherwise function call overhead is overwhelming. Conversely, due to the much higher development burden of this type of code, we don't go beyond that point; this package is essentially a collection of for loops.

Two classes of functions are exported:

- Functions with 'Unsafe' in their names are very performant, but are memory-unsafe, do not validate documented preconditions, and may have the unusual property of reading/writing to a few bytes *past* the end of the given slices. The MakeUnsafe() function and its relatives allocate byte-slices with sufficient extra capacity for all Unsafe functions with the latter property to work properly.

- Their safe analogues work properly on ordinary slices, and often panic when documented preconditions are not met. When a precondition is not explicitly checked (due to computational cost), safe functions may return garbage values when the condition is not met, but they are memory-safe: they will not corrupt unrelated memory or perform out-of-bounds read operations.

Index

Package Files

add_amd64.go and_amd64.go cmp_amd64.go count_amd64.go doc.go invmask_amd64.go multibyte_amd64.go or_amd64.go simd_amd64.go xor_amd64.go

Constants

const BitsPerWord = BytesPerWord * 8

BitsPerWord is the number of bits in a machine word.

const BytesPerWord = 8

BytesPerWord is the number of bytes in a machine word. We don't use unsafe.Sizeof(uintptr(1)) since there are advantages to having this as an untyped constant, and there's essentially no drawback since this is an _amd64-specific file.

const Log2BytesPerWord = uint(3)

Log2BytesPerWord is log2(BytesPerWord). This is relevant for manual bit-shifting when we know that's a safe way to divide and the compiler does not (e.g. dividend is of signed int type).

func Accumulate8 Uses

func Accumulate8(src []byte) int

Accumulate8 returns the sum of the (unsigned) bytes in src[].

func Accumulate8Greater Uses

func Accumulate8Greater(src []byte, val byte) int

Accumulate8Greater returns the sum of all bytes in src[] greater than the given value.

func AddConst8 Uses

func AddConst8(dst, src []byte, val byte)

AddConst8 sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

func AddConst8Inplace Uses

func AddConst8Inplace(main []byte, val byte)

AddConst8Inplace adds the given constant to every byte of main[], with unsigned overflow.

func AddConst8Unsafe Uses

func AddConst8Unsafe(dst, src []byte, val byte)

AddConst8Unsafe sets dst[pos] := src[pos] + val for every byte in src (with the usual unsigned overflow).

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AddConst8UnsafeInplace Uses

func AddConst8UnsafeInplace(main []byte, val byte)

AddConst8UnsafeInplace adds the given constant to every byte of main[], with unsigned overflow.

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

func And Uses

func And(dst, src1, src2 []byte)

And sets dst[pos] := src1[pos] & src2[pos] for every position in dst. It panics if slice lengths don't match.

func AndConst8 Uses

func AndConst8(dst, src []byte, val byte)

AndConst8 sets dst[pos] := src[pos] & val for every position in dst. It panics if slice lengths don't match.

func AndConst8Inplace Uses

func AndConst8Inplace(main []byte, val byte)

AndConst8Inplace sets main[pos] := main[pos] & val for every position in main[].

func AndConst8Unsafe Uses

func AndConst8Unsafe(dst, src []byte, val byte)

AndConst8Unsafe sets dst[pos] := src[pos] & val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AndConst8UnsafeInplace Uses

func AndConst8UnsafeInplace(main []byte, val byte)

AndConst8UnsafeInplace sets main[pos] := main[pos] & val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func AndInplace Uses

func AndInplace(main, arg []byte)

AndInplace sets main[pos] := arg[pos] & main[pos] for every position in main[]. It panics if slice lengths don't match.

func AndUnsafe Uses

func AndUnsafe(dst, src1, src2 []byte)

AndUnsafe sets dst[pos] := src1[pos] & src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func AndUnsafeInplace Uses

func AndUnsafeInplace(main, arg []byte)

AndUnsafeInplace sets main[pos] := main[pos] & arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func BitFromEveryByte Uses

func BitFromEveryByte(dst, src []byte, bitIdx int)

BitFromEveryByte fills dst[] with a bitarray containing every 8th bit from src[], starting with bitIdx, where bitIdx is in [0,7]. If len(src) is not divisible by 8, extra bits in the last filled byte of dst are set to zero.

For example, if src[] is

0x1f 0x33 0x0d 0x00 0x51 0xcc 0x34 0x59 0x44

and bitIdx is 2, bit 2 from every byte is

1    0    1    0    0    1    1    0    1

so dst[] is filled with

0x65 0x01.

- It panics if len(dst) < (len(src) + 7) / 8, or bitIdx isn't in [0,7]. - If dst is larger than necessary, the extra bytes are not changed.

func BytesPerVec Uses

func BytesPerVec() int

BytesPerVec is an accessor for the bytesPerVec package variable.

func Count2Bytes Uses

func Count2Bytes(src []byte, val1, val2 byte) int

Count2Bytes counts the number of bytes in src[] which are equal to either val1 or val2. (bytes.Count() should be good enough for a single byte.)

func Count3Bytes Uses

func Count3Bytes(src []byte, val1, val2, val3 byte) int

Count3Bytes counts the number of bytes in src[] which are equal to val1, val2, or val3.

func CountNibblesInSet Uses

func CountNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

CountNibblesInSet counts the number of nibbles in src[] which are in the given set. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the table. It may return a garbage result on invalid input. (However, it won't corrupt memory.)

func CountNibblesInTwoSets Uses

func CountNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

CountNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

func CountUnpackedNibblesInSet Uses

func CountUnpackedNibblesInSet(src []byte, tablePtr *NibbleLookupTable) int

CountUnpackedNibblesInSet counts the number of bytes in src[] which are in the given set, assuming all bytes are <16. The set must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the table. It may crash or return a garbage result on invalid input. (However, it won't corrupt memory.)

func CountUnpackedNibblesInTwoSets Uses

func CountUnpackedNibblesInTwoSets(src []byte, table1Ptr, table2Ptr *NibbleLookupTable) (int, int)

CountUnpackedNibblesInTwoSets counts the number of bytes in src[] which are in the given two sets, assuming all bytes are <16. The sets must be represented as table[x] == 1 when value x is in the set, and table[x] == 0 when x isn't.

WARNING: This function does not validate the tables. It may crash or return garbage results on invalid input. (However, it won't corrupt memory.)

func DivUpPow2 Uses

func DivUpPow2(dividend, divisor int, log2Divisor uint) int

DivUpPow2 efficiently divides a number by a power-of-2 divisor. (This works for negative dividends since the language specifies arithmetic right-shifts of signed numbers. I'm pretty sure this doesn't have a performance penalty.)

func FirstGreater8 Uses

func FirstGreater8(arg []byte, val byte, startPos int) int

FirstGreater8 scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

func FirstGreater8Unsafe Uses

func FirstGreater8Unsafe(arg []byte, val byte, startPos int) int

FirstGreater8Unsafe scans arg[startPos:] for the first value larger than the given constant, returning its position if one is found, or len(arg) if all bytes are <= (or startPos >= len).

This should only be used when greater values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. startPos is nonnegative.

2. cap(arg) >= RoundUpPow2(len, bytesPerVec).

func FirstLeq8 Uses

func FirstLeq8(arg []byte, val byte, startPos int) int

FirstLeq8 scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

func FirstLeq8Unsafe Uses

func FirstLeq8Unsafe(arg []byte, val byte, startPos int) int

FirstLeq8Unsafe scans arg[startPos:] for the first value <= the given constant, returning its position if one is found, or len(arg) if all bytes are greater (or startPos >= len).

This should only be used when <= values are usually present at ~5% or lower frequency. Above that, use a simple for loop.

See warning for FirstGreater8Unsafe.

func FirstUnequal8 Uses

func FirstUnequal8(arg1, arg2 []byte, startPos int) int

FirstUnequal8 scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). It panics if the lengths are not identical, or startPos is negative.

This is essentially an extension of bytes.Compare().

func FirstUnequal8Unsafe Uses

func FirstUnequal8Unsafe(arg1, arg2 []byte, startPos int) int

FirstUnequal8Unsafe scans arg1[startPos:] and arg2[startPos:] for the first mismatching byte, returning its position if one is found, or the common length if all bytes match (or startPos >= len). This has essentially the same speed as bytes.Compare().

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. The second assumption is always satisfied when the last potentially-size-increasing operation on arg1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for arg2[].

1. len(arg1) == len(arg2).

2. Capacities are at least RoundUpPow2(len, bytesPerVec).

func IndexU16 Uses

func IndexU16(main []uint16, val uint16) int

IndexU16 returns the index of the first instance of val in main, or -1 if val is not present in main.

func Interleave8 Uses

func Interleave8(dst, even, odd []byte)

Interleave8 sets the bytes in dst[] as follows:

if pos is even, dst[pos] := even[pos/2]
if pos is odd, dst[pos] := odd[pos/2]

It panics if ((len(dst) + 1) / 2) != len(even), or (len(dst) / 2) != len(odd).

func Interleave8Unsafe Uses

func Interleave8Unsafe(dst, even, odd []byte)

Interleave8Unsafe sets the bytes in dst[] as follows:

if pos is even, dst[pos] := even[pos/2]
if pos is odd, dst[pos] := odd[pos/2]

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on dst[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for even[] and odd[].

1. len(even) = (len(dst) + 1) / 2, and len(odd) = len(dst) / 2.

2. cap(dst) >= RoundUpPow2(len(dst) + 1, bytesPerVec), cap(even) >= RoundUpPow2(len(even) + 1, bytesPerVec), and cap(odd) >= RoundUpPow2(len(odd) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func Invmask Uses

func Invmask(dst, src1, src2 []byte)

Invmask sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst. It panics if slice lengths don't match.

func InvmaskConst8 Uses

func InvmaskConst8(dst, src []byte, val byte)

InvmaskConst8 sets dst[pos] := src[pos] &^ val for every position in dst. It panics if slice lengths don't match.

func InvmaskConst8Inplace Uses

func InvmaskConst8Inplace(main []byte, val byte)

InvmaskConst8Inplace sets main[pos] := main[pos] &^ val for every position in main[].

func InvmaskConst8Unsafe Uses

func InvmaskConst8Unsafe(dst, src []byte, val byte)

InvmaskConst8Unsafe sets dst[pos] := src[pos] &^ val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func InvmaskConst8UnsafeInplace Uses

func InvmaskConst8UnsafeInplace(main []byte, val byte)

InvmaskConst8UnsafeInplace sets main[pos] := main[pos] &^ val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func InvmaskInplace Uses

func InvmaskInplace(main, arg []byte)

InvmaskInplace sets main[pos] := arg[pos] &^ main[pos] for every position in main[]. It panics if slice lengths don't match.

func InvmaskUnsafe Uses

func InvmaskUnsafe(dst, src1, src2 []byte)

InvmaskUnsafe sets dst[pos] := src1[pos] &^ src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func InvmaskUnsafeInplace Uses

func InvmaskUnsafeInplace(main, arg []byte)

InvmaskUnsafeInplace sets main[pos] := main[pos] &^ arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func MakeUnsafe Uses

func MakeUnsafe(len int) []byte

MakeUnsafe returns a byte slice of the given length which is guaranteed to have enough capacity for all Unsafe functions in this package to work. (It is not itself an unsafe function: allocated memory is zero-initialized.) Note that Unsafe functions occasionally have other caveats: e.g. PopcntUnsafe also requires relevant bytes past the end of the slice to be zeroed out.

func MaskThenCountByte Uses

func MaskThenCountByte(src []byte, mask, val byte) int

MaskThenCountByte counts the number of bytes in src[] satisfying

src[pos] & mask == val.

func Memset16Raw Uses

func Memset16Raw(dst, valPtr unsafe.Pointer, nElem int)

Memset16Raw assumes dst points to an array of nElem 2-byte elements, and valPtr points to a single 2-byte element. It fills dst with copies of *valPtr.

func Memset32Raw Uses

func Memset32Raw(dst, valPtr unsafe.Pointer, nElem int)

Memset32Raw assumes dst points to an array of nElem 4-byte elements, and valPtr points to a single 4-byte element. It fills dst with copies of *valPtr.

func Memset8 Uses

func Memset8(dst []byte, val byte)

Memset8 sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case.)

func Memset8Unsafe Uses

func Memset8Unsafe(dst []byte, val byte)

Memset8Unsafe sets all values of dst[] to the given byte. (This is intended for val != 0. It is better to use a range-for loop for val == 0 since the compiler has a hardcoded optimization for that case; see https://github.com/golang/go/issues/5373 .)

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(dst), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of dst[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on dst[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

func Or Uses

func Or(dst, src1, src2 []byte)

Or sets dst[pos] := src1[pos] | src2[pos] for every position in dst. It panics if slice lengths don't match.

func OrConst8 Uses

func OrConst8(dst, src []byte, val byte)

OrConst8 sets dst[pos] := src[pos] | val for every position in dst. It panics if slice lengths don't match.

func OrConst8Inplace Uses

func OrConst8Inplace(main []byte, val byte)

OrConst8Inplace sets main[pos] := main[pos] | val for every position in main[].

func OrConst8Unsafe Uses

func OrConst8Unsafe(dst, src []byte, val byte)

OrConst8Unsafe sets dst[pos] := src[pos] | val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func OrConst8UnsafeInplace Uses

func OrConst8UnsafeInplace(main []byte, val byte)

OrConst8UnsafeInplace sets main[pos] := main[pos] | val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func OrInplace Uses

func OrInplace(main, arg []byte)

OrInplace sets main[pos] := arg[pos] | main[pos] for every position in main[]. It panics if slice lengths don't match.

func OrUnsafe Uses

func OrUnsafe(dst, src1, src2 []byte)

OrUnsafe sets dst[pos] := src1[pos] | src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func OrUnsafeInplace Uses

func OrUnsafeInplace(main, arg []byte)

OrUnsafeInplace sets main[pos] := main[pos] | arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

func PackedNibbleLookup Uses

func PackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

PackedNibbleLookup sets the bytes in dst[] as follows:

if pos is even, dst[pos] := table[src[pos / 2] & 15]
if pos is odd, dst[pos] := table[src[pos / 2] >> 4]

It panics if len(src) != (len(dst) + 1) / 2.

Nothing bad happens if len(dst) is odd and some high bits in the last src[] byte are set, though it's generally good practice to ensure that case doesn't come up.

func PackedNibbleLookupUnsafe Uses

func PackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

PackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

if pos is even, dst[pos] := table[src[pos / 2] & 15]
if pos is odd, dst[pos] := table[src[pos / 2] >> 4]

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-#3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) == (len(dst) + 1) / 2.

2. Capacity of src is at least RoundUpPow2(len(src) + 1, bytesPerVec), and the same is true for dst.

3. The caller does not care if a few bytes past the end of dst[] are changed.

func Popcnt Uses

func Popcnt(bytes []byte) int

Popcnt returns the number of set bits in the given []byte.

Some effort has been made to make this run acceptably fast on relatively short arrays, since I expect knowing how to do so to be helpful when working with hundreds of millions of .bam reads with ~75 bytes of base data and ~150 bytes of quality data. Interestingly, moving the leading-byte handling code to assembly didn't make a difference.

Some single-threaded benchmark results calling Popcnt 99999999 times on a 14-byte unaligned array:

C implementation: 0.219-0.232s
This code: 0.606-0.620s
C implementation using memcpy for trailing bytes: 0.964-0.983s

So Go's extra looping and function call overhead can almost triple runtime in the short-array limit, but that's actually not as bad as the 4.5x overhead of trusting memcpy to handle trailing bytes.

func PopcntUnsafe Uses

func PopcntUnsafe(bytes []byte) int

PopcntUnsafe returns the number of set bits in the given []byte, assuming that any trailing bytes up to the next multiple of BytesPerWord are zeroed out.

func RemakeUnsafe Uses

func RemakeUnsafe(bufptr *[]byte, len int)

RemakeUnsafe reuses the given buffer if it has sufficient capacity; otherwise it does the same thing as MakeUnsafe. It does NOT preserve existing contents of buf[]; use ResizeUnsafe() for that.

func RepeatI16 Uses

func RepeatI16(dst []int16, val int16)

RepeatI16 fills dst[] with the given int16.

func RepeatU16 Uses

func RepeatU16(dst []uint16, val uint16)

RepeatU16 fills dst[] with the given uint16.

func ResizeUnsafe Uses

func ResizeUnsafe(bufptr *[]byte, len int)

ResizeUnsafe changes the length of buf and ensures it has enough extra capacity to be passed to this package's Unsafe functions. Existing buf[] contents are preserved (with possible truncation), though when length is increased, new bytes might not be zero-initialized.

func Reverse16InplaceRaw Uses

func Reverse16InplaceRaw(main unsafe.Pointer, nElem int)

Reverse16InplaceRaw assumes main points to an array of ct 2-byte elements, and reverses it in-place.

func Reverse16Raw Uses

func Reverse16Raw(dst, src unsafe.Pointer, nElem int)

Reverse16Raw assumes dst and src both point to arrays of ct 2-byte elements, and sets dst[pos] := src[ct - 1 - pos] for each position.

func Reverse8 Uses

func Reverse8(dst, src []byte)

Reverse8 sets dst[pos] := src[len(src) - 1 - pos] for every position in src. It panics if len(src) != len(dst).

func Reverse8Inplace Uses

func Reverse8Inplace(main []byte)

Reverse8Inplace reverses the bytes in main[]. (There is no unsafe version of this function.)

func Reverse8Unsafe Uses

func Reverse8Unsafe(dst, src []byte)

Reverse8Unsafe sets dst[pos] := src[len(src) - 1 - pos] for every position in src.

WARNING: This does not verify len(dst) == len(src); call the safe version of this function if you want that.

func ReverseI16 Uses

func ReverseI16(dst, src []int16)

ReverseI16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

func ReverseI16Inplace Uses

func ReverseI16Inplace(main []int16)

ReverseI16Inplace reverses a []int16 in-place.

func ReverseU16 Uses

func ReverseU16(dst, src []uint16)

ReverseU16 sets dst[len(src) - 1 - pos] := src[pos] for each position in src. It panics if len(src) != len(dst).

func ReverseU16Inplace Uses

func ReverseU16Inplace(main []uint16)

ReverseU16Inplace reverses a []uint16 in-place.

func RoundUpPow2 Uses

func RoundUpPow2(val, alignment int) int

RoundUpPow2 returns val rounded up to a multiple of alignment, assuming alignment is a power of 2.

func SubtractFromConst8 Uses

func SubtractFromConst8(dst, src []byte, val byte)

SubtractFromConst8 sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow). It panics if len(src) != len(dst).

func SubtractFromConst8Inplace Uses

func SubtractFromConst8Inplace(main []byte, val byte)

SubtractFromConst8Inplace subtracts every byte of main[] from the given constant, with unsigned underflow.

func SubtractFromConst8Unsafe Uses

func SubtractFromConst8Unsafe(dst, src []byte, val byte)

SubtractFromConst8Unsafe sets dst[pos] := val - src[pos] for every byte in src (with the usual unsigned overflow).

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe() or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func SubtractFromConst8UnsafeInplace Uses

func SubtractFromConst8UnsafeInplace(main []byte, val byte)

SubtractFromConst8UnsafeInplace subtracts every byte of main[] from the given constant, with unsigned underflow.

WARNING: This is a function designed to be used in inner loops, which assumes without checking that capacity is at least RoundUpPow2(len(main), bytesPerVec). It also assumes that the caller does not care if a few bytes past the end of main[] are changed. Use the safe version of this function if any of these properties are problematic. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

func UnpackedNibbleLookup Uses

func UnpackedNibbleLookup(dst, src []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookup sets the bytes in dst[] as follows:

if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
otherwise, set dst[pos] := 0

It panics if len(src) != len(dst).

func UnpackedNibbleLookupInplace Uses

func UnpackedNibbleLookupInplace(main []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupInplace replaces the bytes in main[] as follows:

if value < 128, set to table[value & 15]
otherwise, set to 0

func UnpackedNibbleLookupS Uses

func UnpackedNibbleLookupS(dst []byte, src string, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupS is a variant of UnpackedNibbleLookup() that takes string src.

func UnpackedNibbleLookupUnsafe Uses

func UnpackedNibbleLookupUnsafe(dst, src []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupUnsafe sets the bytes in dst[] as follows:

if src[pos] < 128, set dst[pos] := table[src[pos] & 15]
otherwise, set dst[pos] := 0

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) are equal.

2. Capacities are at least RoundUpPow2(len(src) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func UnpackedNibbleLookupUnsafeInplace Uses

func UnpackedNibbleLookupUnsafeInplace(main []byte, tablePtr *NibbleLookupTable)

UnpackedNibbleLookupUnsafeInplace replaces the bytes in main[] as follows:

if value < 128, set to table[value & 15]
otherwise, set to 0

WARNING: This is a function designed to be used in inner loops, which makes assumptions about capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Make,Remake}Unsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) must be at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func XcapUnsafe Uses

func XcapUnsafe(bufptr *[]byte)

XcapUnsafe is shorthand for ResizeUnsafe's most common use case (no length change, just want to ensure sufficient capacity).

func Xor Uses

func Xor(dst, src1, src2 []byte)

Xor sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst. It panics if slice lengths don't match.

func XorConst8 Uses

func XorConst8(dst, src []byte, val byte)

XorConst8 sets dst[pos] := src[pos] ^ val for every position in dst. It panics if slice lengths don't match.

func XorConst8Inplace Uses

func XorConst8Inplace(main []byte, val byte)

XorConst8Inplace sets main[pos] := main[pos] ^ val for every position in main[].

func XorConst8Unsafe Uses

func XorConst8Unsafe(dst, src []byte, val byte)

XorConst8Unsafe sets dst[pos] := src[pos] ^ val for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for dst[].

1. len(src) and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func XorConst8UnsafeInplace Uses

func XorConst8UnsafeInplace(main []byte, val byte)

XorConst8UnsafeInplace sets main[pos] := main[pos] ^ val for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. These assumptions are always satisfied when the last potentially-size-increasing operation on main[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe().

1. cap(main) is at least RoundUpPow2(len(main) + 1, bytesPerVec).

2. The caller does not care if a few bytes past the end of main[] are changed.

func XorInplace Uses

func XorInplace(main, arg []byte)

XorInplace sets main[pos] := arg[pos] ^ main[pos] for every position in main[]. It panics if slice lengths don't match.

func XorUnsafe Uses

func XorUnsafe(dst, src1, src2 []byte)

XorUnsafe sets dst[pos] := src1[pos] ^ src2[pos] for every position in dst.

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on src1[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for src2[] and dst[].

1. len(src1), len(src2), and len(dst) must be equal.

2. Capacities are at least RoundUpPow2(len(dst) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of dst[] are changed.

func XorUnsafeInplace Uses

func XorUnsafeInplace(main, arg []byte)

XorUnsafeInplace sets main[pos] := main[pos] ^ arg[pos] for every position in main[].

WARNING: This is a function designed to be used in inner loops, which makes assumptions about length and capacity which aren't checked at runtime. Use the safe version of this function when that's a problem. Assumptions #2-3 are always satisfied when the last potentially-size-increasing operation on arg[] is {Re}makeUnsafe(), ResizeUnsafe(), or XcapUnsafe(), and the same is true for main[].

1. len(arg) and len(main) must be equal.

2. Capacities are at least RoundUpPow2(len(main) + 1, bytesPerVec).

3. The caller does not care if a few bytes past the end of main[] are changed.

type NibbleLookupTable Uses

type NibbleLookupTable struct {
    // contains filtered or unexported fields
}

NibbleLookupTable represents a parallel-byte-substitution operation f, where every byte b in a byte-slice is replaced with

f(b) := shuffle[0][b & 15] for b <= 127, and
f(b) := 0 for b > 127.

(The second part is usually irrelevant in practice, but must be defined this way to allow _mm_shuffle_epi8()/_mm256_shuffle_epi8()/_mm512_shuffle_epi8() to be used to implement the operation efficiently.) It's named NibbleLookupTable rather than ByteLookupTable since only the bottom nibble of each byte can be used for table lookup. It potentially stores multiple adjacent copies of the lookup table since that speeds up the AVX2 and AVX-512 use cases (the table can be loaded with a single _mm256_loadu_si256 operation, instead of e.g. _mm_loadu_si128 followed by _mm256_set_m128i with the same argument twice), and the typical use case involves initializing very few tables and using them many, many times.

func MakeNibbleLookupTable Uses

func MakeNibbleLookupTable(table [16]byte) (t NibbleLookupTable)

MakeNibbleLookupTable generates a NibbleLookupTable from a [16]byte.

func (*NibbleLookupTable) Get Uses

func (t *NibbleLookupTable) Get(b byte) byte

Get performs the b <= 127 part of the lookup operation described above. The b > 127 branch is omitted because in many use cases (e.g. PackedNibbleLookup below), it can be proven that b > 127 is impossible, and removing the if-statement is a significant performance win when it's possible.

Package simd imports 5 packages (graph) and is imported by 4 packages. Updated 2020-08-29. Refresh now. Tools for package owners.