lcss

package module
v1.0.0-...-dfc501d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 20, 2018 License: Apache-2.0 Imports: 2 Imported by: 5

README

go-lcss logo

go-lcss

Fast Longest Common Substring algorithm in Go.

GoDoc Travis build Status Code coverage Go Report Card Apache 2.0 license

OverviewHow To UseInstallationContributionsLicense


Overview

Longest Common Substring (don't confuse with Longest Common Subsequence) is a well-known computer science problem of finding the longest substring which appears in all the given strings. It can be efficiently solved in log-linear time and linear space. The implemented algorithm uses the code to build suffix arrays which was copy-pasted from the Go's standard library - it is internal to index/suffixarray and unfortunately cannot be invoked directly. There is a blog post which gives some general understanding of that algorithm, though the actual implementation is quite different (and optimized).

How To Use

There are two functions: "basic" and "advanced". The former is straightforward:

import "gopkg.in/vmarkovtsev/go-lcss.v1"

lcss.LongestCommonSubstring([]byte("ABABC"), []byte("BABCA"), []byte("ABCBA"))
// []byte("ABC")

The "advanced" function allows to cache the suffix arrays. It can be useful since the construction of suffix arrays dominates over the run time of the algorithm:


import "gopkg.in/vmarkovtsev/go-lcss.v1"

s1, s2, s3 := []byte("ABABC"), []byte("BABCA"), []byte("ABCBA")
lcss.LongestCommonSubstringWithSuffixArrays(
	[][]byte{s1, s2, s3},
	[][]int{lcss.Qsufsort(s1), lcss.Qsufsort(s2), lcss.Qsufsort(s3)})
// []byte("ABC")

Installation

go get gopkg.in/vmarkovtsev/go-lcss.v1

The code supports building under Go >= 1.8.

Contributions

...are pretty much welcome! See contributing.md and code_of_conduct.md.

License

Apache 2.0, see LICENSE. It allows you to use this code in proprietary projects.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func LongestCommonSubstring

func LongestCommonSubstring(strs ...[]byte) []byte

LongestCommonSubstring returns the longest substring which is present in all the given strings. https://en.wikipedia.org/wiki/Longest_common_substring_problem Not to be confused with the Longest Common Subsequence. Complexity: * time: sum of `n_i*log(n_i)` where `n_i` is the length of each string. * space: sum of `n_i`. Returns a byte slice which is never a nil.

### Algorithm. We build suffix arrays for each of the passed string and then follow the same procedure as in merge sort: pick the least suffix in the lexicographical order. It is possible because the suffix arrays are already sorted. We record the last encountered suffixes from each of the strings and measure the longest common prefix of those at each "merge sort" step. The string comparisons are optimized by maintaining the char-level prefix tree of the "heads" of the suffix array sequences.

func LongestCommonSubstringWithSuffixArrays

func LongestCommonSubstringWithSuffixArrays(strs [][]byte, suffixes [][]int) []byte

LongestCommonSubstringWithSuffixArrays returns the longest substring which is present in all the given strings. The corresponding suffix arrays must be precomputed with `lcss.Qsufsort`. https://en.wikipedia.org/wiki/Longest_common_substring_problem Not to be confused with the Longest Common Subsequence. Complexity: * time: sum of the string lengths. * space: sum of the string lengths. Returns a byte slice which is never a nil.

### Algorithm. We follow the same procedure as in merge sort: pick the least suffix in the lexicographical order. It is possible because the suffix arrays are already sorted. We record the last encountered suffixes from each of the strings and measure the longest common prefix of those at each "merge sort" step. The string comparisons are optimized by maintaining the char-level prefix tree of the "heads" of the suffix array sequences.

func Qsufsort

func Qsufsort(data []byte) []int

Qsufsort constructs the suffix array for a given string.

Types

This section is empty.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL