Documentation ¶
Index ¶
- func Binomial(n, k int) int
- func Bitmasks2DNA(bitmasks map[string][]bool) (string, error)
- func CheckIsDNA(input string) bool
- func Complement(input string) (string, error)
- func ConstructDeBruijnGraphKmers(kmers []string) (map[string][]string, error)
- func ConstructDeBruijnGraphString(text string, k int) (map[string][]string, error)
- func CountHammingNeighbors(n, d, c int) (int, error)
- func CountKmersMismatches(input string, k, d int) (int, error)
- func CountNucleotides(dna string) (map[string]int, error)
- func CountNucleotidesArray(dna string) ([]int, error)
- func DNA2Bitmasks(input string) (map[string][]bool, error)
- func EqualBoolSlices(a, b []bool) bool
- func EqualIntSlices(a, b []int) bool
- func EqualStringSlices(a, b []string) bool
- func Factorial(n int) int
- func FindApproximateOccurrences(pattern, text string, d int) ([]int, error)
- func FindClumps(genome string, k, L, t int) ([]string, error)
- func FindMotifs(dna []string, k, d int) ([]string, error)
- func FindOccurrences(pattern, genome string) ([]int, error)
- func FrequencyArray(input string, k int) ([]int, error)
- func GetSortedKeys(m map[string][]string) ([]string, error)
- func GibbsSampler(dna []string, k, t, n int) ([]string, int, error)
- func GreedyMotifSearch(dna []string, k, t int, pseudocounts bool) ([]string, error)
- func GreedyMotifSearchNoPseudocounts(dna []string, k, t int) ([]string, error)
- func GreedyMotifSearchPseudocounts(dna []string, k, t int) ([]string, error)
- func HammingDistance(p, q string) (int, error)
- func KeySetIntersection(input []map[string]int) ([]string, error)
- func KmerComposition(input string, k int) ([]string, error)
- func KmerHistogram(input string, k int) (map[string]int, error)
- func KmerHistogramMismatches(input string, k, d int) (map[string]int, error)
- func KmerInOrderList(dna string, k int) ([]string, error)
- func ManyGibbsSamplers(dna []string, k, t, n, n_starts int) ([]string, error)
- func ManyRandomMotifSearches(dna []string, k, t, n int) ([]string, error)
- func MedianString(dna []string, k int) ([]string, error)
- func MinKmerDistance(pattern, text string) (int, error)
- func MinKmerDistances(pattern string, inputs []string) (int, error)
- func MinSkewPositions(genome string) ([]int, error)
- func MoreFrequentThanNKmers(input string, k, N int) ([]string, error)
- func MostFrequentKmers(input string, k int) ([]string, error)
- func MostFrequentKmersMismatches(input string, k, d int) ([]string, error)
- func MostFrequentKmersMismatchesRevComp(input string, k, d int) ([]string, error)
- func NumberToPattern(n, k int) (string, error)
- func OverlapGraph(patterns []string) (map[string][]string, error)
- func PatternCount(input string, pattern string) int
- func PatternToNumber(input string) (int, error)
- func ProfileMostProbableKmer(dna string, k int, profile [][]float32) (string, error)
- func ProfileMostProbableKmers(dna string, k int, profile [][]float32) ([]string, error)
- func ProfileMostProbableKmersGreedy(dna string, k int, profile [][]float32) (string, error)
- func RandomMotifSearchPseudocounts(dna []string, k, t int) ([]string, int, error)
- func ReadLines(path string) ([]string, error)
- func ReadMatrix32(lines []string, k int) ([][]float32, error)
- func ReconstructGenomeFromPath(contigs []string) (string, error)
- func ReconstructGenomeFromPath_old(contigs []string) (string, error)
- func ReverseComplement(input string) (string, error)
- func ReverseString(s string) string
- func SPrintOverlapGraph(overlap_graph map[string][]string, one_edge_per_line bool) (string, error)
- func TheseFloatsAreEqual(a, b float32) bool
- func VisitHammingNeighbors(input string, d int) ([]string, error)
- func WriteLines(lines []string, path string) error
- type DirGraph
- type Node
- type ScoredMotifMatrix
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Bitmasks2DNA ¶
Convert four bitmasks (one each for ATGC) into a DNA string.
func CheckIsDNA ¶
Given an alleged DNA input string, iterate through it character by character to ensure that it only contains ATGC. Returns true if this is DNA (ATGC only), false otherwise.
func Complement ¶
Given a DNA input string, find the complement. The complement swaps Gs and Cs, and As and Ts.
func CountHammingNeighbors ¶
Given an input string of DNA of length n, a maximum Hamming distance of d, and a number of codons c, determine the number of Hamming neighbors of distance less than or equal to d using a combinatorics formula.
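The combinatorics formula is not spelled out in this description. A minimal sketch, assuming c is the alphabet size (4 for DNA) and that the count includes the input string itself: the number of strings within distance d is the sum over i = 0..d of Binomial(n, i) * (c-1)^i. The names binomial and countNeighbors are hypothetical and are not taken from the package source.

```go
package main

import "fmt"

// binomial returns C(n, k) using the multiplicative formula.
// Hypothetical helper, not the package's Binomial.
func binomial(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	if k > n-k {
		k = n - k
	}
	result := 1
	for i := 1; i <= k; i++ {
		// After this step result equals C(n-k+i, i), so the division is exact.
		result = result * (n - k + i) / i
	}
	return result
}

// countNeighbors sums C(n, i) * (c-1)^i for i = 0..d.
// Assumption: c is the alphabet size and distance 0 (the string itself) is counted.
func countNeighbors(n, d, c int) int {
	total := 0
	power := 1 // (c-1)^i
	for i := 0; i <= d && i <= n; i++ {
		total += binomial(n, i) * power
		power *= c - 1
	}
	return total
}

func main() {
	fmt.Println(countNeighbors(3, 1, 4)) // 10
}
```

For a 3-mer with d = 1 this gives 1 + 3*3 = 10, which should match the number of strings an enumeration such as VisitHammingNeighbors produces for the same inputs.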
func CountKmersMismatches ¶
Count the number of times a given kmer and any Hamming neighbors (distance d or less) occur in the input string.
func CountNucleotides ¶
Count the number of each type of nucleotide ACGT.
func CountNucleotidesArray ¶
Count the number of each type of nucleotide ACGT and return as an array in order A, C, G, T.
func DNA2Bitmasks ¶
Convert a DNA string into four bitmasks: one each for ATGC. That is, for the DNA string AATCCGCT, it would become:
    bitmask[A] = 11000000
    bitmask[T] = 00100001
    bitmask[C] = 00011010
    bitmask[G] = 00000100
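The conversion can be sketched as follows. This is an illustration only: dnaToBitmasks and maskString are hypothetical names, and the error handling for non-ATGC characters is an assumption rather than the package's documented behavior.

```go
package main

import "fmt"

// dnaToBitmasks returns one []bool per nucleotide; masks[x][i] is true
// when dna[i] == x. Hypothetical sketch, not the package's source.
func dnaToBitmasks(dna string) (map[string][]bool, error) {
	masks := map[string][]bool{}
	for _, n := range []string{"A", "T", "G", "C"} {
		masks[n] = make([]bool, len(dna))
	}
	for i := 0; i < len(dna); i++ {
		mask, ok := masks[string(dna[i])]
		if !ok {
			// Assumption: anything other than A, T, G, C is rejected.
			return nil, fmt.Errorf("invalid nucleotide %q at index %d", dna[i], i)
		}
		mask[i] = true
	}
	return masks, nil
}

// maskString renders a bitmask as a string of 0s and 1s.
func maskString(mask []bool) string {
	out := make([]byte, len(mask))
	for i, set := range mask {
		out[i] = '0'
		if set {
			out[i] = '1'
		}
	}
	return string(out)
}

func main() {
	masks, err := dnaToBitmasks("AATCCGCT")
	if err != nil {
		panic(err)
	}
	for _, n := range []string{"A", "T", "C", "G"} {
		fmt.Printf("bitmask[%s] = %s\n", n, maskString(masks[n]))
	}
}
```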
func EqualBoolSlices ¶
Utility function: check if two boolean arrays/array slices are equal. This is necessary because of squirrely behavior when comparing arrays (of type [1]bool) and slices (of type []bool).
func EqualIntSlices ¶
Check if two int arrays/array slices are equal.
func EqualStringSlices ¶
Utility function: check if two string arrays/array slices are equal. This is necessary because of squirrely behavior when comparing arrays (of type [1]string) and slices (of type []string).
func FindApproximateOccurrences ¶
Given a large string (text) and a string (pattern), find the zero-based indices where we have an occurrence of pattern or a string with Hamming distance d or less from pattern.
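One straightforward way to do this is to slide a window of len(pattern) across text and count mismatches in each window. The sketch below assumes that approach; approxOccurrences is a hypothetical name and the package may implement the search differently.

```go
package main

import "fmt"

// approxOccurrences returns the zero-based start indices of every window of
// text whose Hamming distance from pattern is at most d.
// Hypothetical sketch, not the package's source.
func approxOccurrences(pattern, text string, d int) []int {
	var hits []int
	k := len(pattern)
	for i := 0; i+k <= len(text); i++ {
		mismatches := 0
		// Stop comparing as soon as the window exceeds the allowed distance.
		for j := 0; j < k && mismatches <= d; j++ {
			if text[i+j] != pattern[j] {
				mismatches++
			}
		}
		if mismatches <= d {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	fmt.Println(approxOccurrences("ACG", "ACGTTCGAAT", 1)) // [0 4]
}
```

Index 0 is an exact match (ACG) and index 4 (TCG) differs in one position; every other window differs in at least two.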
func FindClumps ¶
Find k-mers (patterns) of length k occurring at least t times over an interval of length L in a genome.
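A minimal brute-force sketch of the clump search, assuming every window of length L is scanned independently and the result is returned sorted. findClumps (lowercase) is a hypothetical name; the package's own implementation may use a faster incremental update.

```go
package main

import (
	"fmt"
	"sort"
)

// findClumps returns every k-mer that occurs at least t times inside some
// window of length L. Hypothetical brute-force sketch.
func findClumps(genome string, k, L, t int) []string {
	if k <= 0 || L < k {
		return nil
	}
	found := map[string]bool{}
	for start := 0; start+L <= len(genome); start++ {
		window := genome[start : start+L]
		counts := map[string]int{}
		for i := 0; i+k <= len(window); i++ {
			kmer := window[i : i+k]
			counts[kmer]++
			if counts[kmer] >= t {
				found[kmer] = true
			}
		}
	}
	result := make([]string, 0, len(found))
	for kmer := range found {
		result = append(result, kmer)
	}
	sort.Strings(result) // Assumption: sorted output for deterministic results.
	return result
}

func main() {
	fmt.Println(findClumps("AAAACGTAAAA", 2, 4, 3)) // [AA]
}
```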
func FindMotifs ¶
Given a collection of strings Dna and an integer d, a k-mer is a (k,d)-motif if it appears in every string from Dna with at most d mismatches. Return all (k,d)-motifs in dna.
func FindOccurrences ¶
Given a large string (genome) and a string (pattern), find the zero-based indices where pattern occurs in genome.
func FrequencyArray ¶
Generate and return the frequency array for an input string for all kmers of a given length k.
To do this, we assemble the kmer histogram map, then convert that into the frequency array.
func GetSortedKeys ¶
Utility method: given a map of string to []string, extract a list of all string keys, sort them, and return the sorted list.
func GibbsSampler ¶
Implement a Gibbs sampler with pseudocounts. The Gibbs sampler starts with random kmers, and samples kmers randomly generated from a Profile matrix. Better sampling makes the algorithm faster.
func GreedyMotifSearch ¶
Given an integer k (kmer size) and t (len(dna)), return a collection of kmer strings that have the lowest score (highest similarity). If at any step you find more than one Profile-most probable k-mer in a given DNA string, use the one occurring first. Boolean pseudocounts turns on/off pseudocounts.
func GreedyMotifSearchNoPseudocounts ¶
Run a greedy motif search using regular counts.
func GreedyMotifSearchPseudocounts ¶
Run a greedy motif search using pseudocounts.
func HammingDistance ¶
Compute the Hamming distance between two strings. The Hamming distance is defined as the number of characters different between two strings.
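As an illustration, the distance can be computed with a single pass over both strings. The sketch below assumes that inputs of unequal length are an error, which this page does not state; hammingDistance (lowercase) is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingDistance counts the positions at which p and q differ.
// Assumption: strings of unequal length are rejected with an error.
func hammingDistance(p, q string) (int, error) {
	if len(p) != len(q) {
		return 0, errors.New("strings must have equal length")
	}
	d := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := hammingDistance("GGGCCGTTGGT", "GGACCGTTGAC")
	fmt.Println(d, err) // 3 <nil>
}
```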
func KeySetIntersection ¶
Find the intersection of the key sets for a slice of string to integer maps.
func KmerComposition ¶
Given an input DNA string, generate a set of all k-mers of length k in the input string.
func KmerHistogram ¶
Return the histogram of kmers of length k found in the given input.
func KmerHistogramMismatches ¶
Return the histogram of all kmers of length k that are in the input, or whose Hamming neighbors within distance d are in the input.
func KmerInOrderList ¶
Return a list of kmers of length k that occur in a DNA string. This list preserves order in which the kmers appear in DNA. This list does not include duplicates.
func ManyGibbsSamplers ¶
Driver function to run the Gibbs sampler multiple times and keep the best of all runs. n is the number of inner loops in one run of the Gibbs sampler. n_starts is the number of times the Gibbs sampler is run.
func ManyRandomMotifSearches ¶
Driver function to run multiple random motif searches and keep the best of all runs.
func MinKmerDistance ¶
Given a k-mer pattern and a longer string text, find the minimum distance from k-mer pattern to any possible k-mer in text.
func MinKmerDistances ¶
Given a k-mer pattern and a set of strings, find the sum (L1 norm) of the shortest distances from k-mer pattern to each input string.
func MinSkewPositions ¶
The skew of a genome is the difference between the number of G and C nucleotides that have occurred cumulatively in a given strand of DNA. This function computes the positions in the genome at which the cumulative skew is minimized.
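A minimal sketch of the running-skew computation. It assumes position i refers to the skew after the first i nucleotides (so position 0 is the empty prefix with skew 0); the package may index positions differently. minSkewPositions (lowercase) is a hypothetical name.

```go
package main

import "fmt"

// minSkewPositions returns every position at which the cumulative
// (#G - #C) skew reaches its minimum.
// Assumption: position i is the skew of the prefix genome[:i].
func minSkewPositions(genome string) []int {
	skew, lowest := 0, 0
	positions := []int{0}
	for i := 0; i < len(genome); i++ {
		switch genome[i] {
		case 'G':
			skew++
		case 'C':
			skew--
		}
		switch {
		case skew < lowest:
			// New minimum: discard earlier positions.
			lowest = skew
			positions = []int{i + 1}
		case skew == lowest:
			positions = append(positions, i+1)
		}
	}
	return positions
}

func main() {
	fmt.Println(minSkewPositions("GCCCAG")) // [4 5]
}
```

For GCCCAG the running skew is 0, 1, 0, -1, -2, -2, -1, so the minimum of -2 is reached at positions 4 and 5.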
func MoreFrequentThanNKmers ¶
Find the kmer(s) in the kmer histogram exceeding a count of N, and return them as a slice of strings.
func MostFrequentKmers ¶
Find the most frequent kmer(s) in the kmer histogram, and return them as a slice of strings.
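The histogram-then-maximum approach can be sketched as below. kmerHistogram and mostFrequentKmers (lowercase) are hypothetical names, and sorting the result is an assumption made so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// kmerHistogram counts every k-mer in input, including overlapping ones.
func kmerHistogram(input string, k int) map[string]int {
	hist := make(map[string]int)
	for i := 0; k > 0 && i+k <= len(input); i++ {
		hist[input[i:i+k]]++
	}
	return hist
}

// mostFrequentKmers returns all k-mers that share the highest count.
// Hypothetical sketch, not the package's source.
func mostFrequentKmers(input string, k int) []string {
	hist := kmerHistogram(input, k)
	best := 0
	for _, count := range hist {
		if count > best {
			best = count
		}
	}
	var result []string
	for kmer, count := range hist {
		if count == best {
			result = append(result, kmer)
		}
	}
	sort.Strings(result) // map iteration order is random, so sort.
	return result
}

func main() {
	fmt.Println(mostFrequentKmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)) // [CATG GCAT]
}
```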
func MostFrequentKmersMismatches ¶
Find the most frequent kmer(s) of length k in the given input string. Include mismatches of Hamming distance <= d.
func MostFrequentKmersMismatchesRevComp ¶
Find the most frequent kmer(s) of length k in the given input string and its reverse complement. Include mismatches of Hamming distance <= d.
func NumberToPattern ¶
NumberToPattern converts an integer n and a kmer length k into the corresponding kmer string.
NOTE: We should be a little more careful about integer overflow, as that can easily happen for large k.
func OverlapGraph ¶
Construct the overlap graph of a collection of kmers.
Given: arbitrary collection of kmers.
Create: graph having 1 node for each kmer in kmer patterns.
Connect: kmers Pattern and Pattern' by directed edge if Suffix(Pattern) is equal to Prefix(Pattern').
The resulting graph is called the overlap graph on these k-mers, denoted Overlap(Patterns).
Return the overlap graph Overlap(Patterns), in the form of an adjacency list.
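A minimal sketch of building the adjacency list with a pairwise comparison, where Suffix is the last k-1 characters and Prefix is the first k-1 characters. overlapGraph (lowercase) is a hypothetical name; skipping the comparison of a list element with itself is an assumption, and nodes without outgoing edges are simply absent from the map in this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlapGraph maps each k-mer to the k-mers whose prefix equals its suffix.
// Hypothetical O(n^2) sketch, not the package's source.
func overlapGraph(patterns []string) map[string][]string {
	graph := make(map[string][]string)
	for i, a := range patterns {
		for j, b := range patterns {
			if i == j || len(a) == 0 || len(a) != len(b) {
				continue
			}
			if a[1:] == b[:len(b)-1] {
				graph[a] = append(graph[a], b)
			}
		}
	}
	return graph
}

func main() {
	graph := overlapGraph([]string{"ATGCG", "GCATG", "CATGC", "AGGCA", "GGCAT"})
	keys := make([]string, 0, len(graph))
	for src := range graph {
		keys = append(keys, src)
	}
	sort.Strings(keys)
	for _, src := range keys {
		fmt.Printf("%s -> %s\n", src, strings.Join(graph[src], ","))
	}
}
```

With this input the sketch yields four edges: AGGCA -> GGCAT, CATGC -> ATGCG, GCATG -> CATGC, and GGCAT -> GCATG.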
func PatternCount ¶
Count occurrences of a substring pattern in a string input.
func PatternToNumber ¶
PatternToNumber transforms a kmer of a given length into a corresponding integer indicating its lexicographic ordering among all kmers of length k.
    A = 0
    C = 1
    G = 2
    T = 3

Example for k = 3:

    C G T
    | | |
    | | T --> 3 * 4^{k-3}
    | G ----> 2 * 4^{k-2}
    C ------> 1 * 4^{k-1}
This basically boils down to transforming a number between base 10 (integer) and base 4 (DNA)
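The base-4 conversion in both directions can be sketched as follows. patternToNumber and numberToPattern (lowercase) are hypothetical names, and the error cases are assumptions. As the NumberToPattern note warns, an int overflows for large k (beyond 31 on a 64-bit platform); this sketch does not guard against that.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const alphabet = "ACGT" // A = 0, C = 1, G = 2, T = 3

// patternToNumber reads the k-mer as a base-4 number, most significant
// digit first. Hypothetical sketch, not the package's source.
func patternToNumber(pattern string) (int, error) {
	n := 0
	for i := 0; i < len(pattern); i++ {
		digit := strings.IndexByte(alphabet, pattern[i])
		if digit < 0 {
			return 0, fmt.Errorf("invalid nucleotide %q", pattern[i])
		}
		n = n*4 + digit
	}
	return n, nil
}

// numberToPattern is the inverse: it writes n as k base-4 digits.
func numberToPattern(n, k int) (string, error) {
	if n < 0 || k < 0 {
		return "", errors.New("n and k must be non-negative")
	}
	buf := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		buf[i] = alphabet[n%4]
		n /= 4
	}
	return string(buf), nil
}

func main() {
	n, _ := patternToNumber("CGT")
	p, _ := numberToPattern(n, 3)
	fmt.Println(n, p) // 27 CGT
}
```

For CGT the value is 1*16 + 2*4 + 3 = 27, matching the diagram above.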
func ProfileMostProbableKmer ¶
Only return the _most_ probable kmer.
func ProfileMostProbableKmers ¶
Given a profile matrix, and given a DNA input string, evaluate the probability of every kmer in the DNA string and find the most probable kmer in the text - the kmer that was most likely to have been generated by profile among all kmers in text.
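A minimal sketch of the probability evaluation. The layout of the profile matrix is not documented on this page, so the sketch assumes profile[row][col] with four rows in the order A, C, G, T and k columns; that layout, the name mostProbableKmer, and the first-occurrence tie-break are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// mostProbableKmer returns the k-mer of dna with the highest probability
// under profile. Assumption: profile has 4 rows (A, C, G, T) and at least
// k columns; ties go to the earliest k-mer.
func mostProbableKmer(dna string, k int, profile [][]float32) (string, error) {
	if k <= 0 || k > len(dna) || len(profile) != 4 {
		return "", errors.New("invalid k or profile shape")
	}
	for _, row := range profile {
		if len(row) < k {
			return "", errors.New("profile row shorter than k")
		}
	}
	best, bestProb := "", float32(-1)
	for i := 0; i+k <= len(dna); i++ {
		prob := float32(1)
		for j := 0; j < k; j++ {
			row := strings.IndexByte("ACGT", dna[i+j])
			if row < 0 {
				return "", fmt.Errorf("invalid nucleotide %q", dna[i+j])
			}
			prob *= profile[row][j]
		}
		if prob > bestProb {
			best, bestProb = dna[i:i+k], prob
		}
	}
	return best, nil
}

func main() {
	profile := [][]float32{
		{0.7, 0.1}, // A
		{0.1, 0.7}, // C
		{0.1, 0.1}, // G
		{0.1, 0.1}, // T
	}
	kmer, err := mostProbableKmer("GTACGT", 2, profile)
	fmt.Println(kmer, err) // AC <nil>
}
```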
This particular method does not pay attention to order of occurrence of kmers.
func ProfileMostProbableKmersGreedy ¶
This uses a probability matrix and evaluates all possible kmers in a DNA string to determine which kmers in the DNA string match the profile most closely.
The greedy version maintains the order in which kmers occur in the original DNA string, and stops as soon as the first match is found.
func RandomMotifSearchPseudocounts ¶
Run a random motif search with pseudocounts.
func ReadMatrix32 ¶
ReadMatrix32 takes a set of lines containing a multidimensional array of floating point values, k elements per line, n lines, and returns a slice of slices with size slice[k][n] and with type float32.
func ReconstructGenomeFromPath ¶
Given a set of kmers that overlap such that the last k-1 symbols of pattern i equal the first k-1 symbols of pattern i+1 for all i = 1 to n-1, return a string of length k + n - 1 where the ith kmer is equal to pattern i.
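Because consecutive kmers overlap by k-1 symbols, each kmer after the first contributes exactly one new character. A minimal sketch under that assumption; genomeFromPath is a hypothetical name and the validation errors are not taken from the package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// genomeFromPath joins a path of k-mers that overlap by k-1 symbols.
// Hypothetical sketch, not the package's source.
func genomeFromPath(kmers []string) (string, error) {
	if len(kmers) == 0 || len(kmers[0]) == 0 {
		return "", errors.New("need at least one non-empty k-mer")
	}
	k := len(kmers[0])
	var sb strings.Builder
	sb.WriteString(kmers[0])
	for i := 1; i < len(kmers); i++ {
		prev, cur := kmers[i-1], kmers[i]
		if len(cur) != k || prev[1:] != cur[:k-1] {
			return "", fmt.Errorf("k-mer %d does not overlap its predecessor", i)
		}
		sb.WriteByte(cur[k-1]) // only the last symbol is new
	}
	return sb.String(), nil
}

func main() {
	genome, err := genomeFromPath([]string{"ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"})
	fmt.Println(genome, err) // ACCGAAGCT <nil>
}
```

Here k = 5 and n = 5, so the result has length k + n - 1 = 9.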
func ReconstructGenomeFromPath_old ¶
Given a genome path, i.e., a set of k-mers that overlap by some unknown number (up to k-1) of characters each, assemble the paths into a single string containing the genome.
Note: This solved a problem that is slightly more general than the problem actually given - here we assume the number of characters overlapping is unknown, but the problem on Rosalind.info says it's always 1.
func ReverseComplement ¶
Given a DNA input string, find the reverse complement. The complement swaps Gs and Cs, and As and Ts. The reverse complement reverses that.
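Both steps can be done in one pass by writing each complemented nucleotide into the mirrored position. A minimal sketch; reverseComplement (lowercase) is a hypothetical name and rejecting non-ACGT characters is an assumption.

```go
package main

import "fmt"

// reverseComplement complements each nucleotide and reverses the order.
// Hypothetical sketch, not the package's source.
func reverseComplement(dna string) (string, error) {
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		var c byte
		switch dna[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'G':
			c = 'C'
		case 'C':
			c = 'G'
		default:
			return "", fmt.Errorf("invalid nucleotide %q at index %d", dna[i], i)
		}
		out[len(dna)-1-i] = c // mirrored position performs the reversal
	}
	return string(out), nil
}

func main() {
	rc, err := reverseComplement("AAAACCCGGT")
	fmt.Println(rc, err) // ACCGGGTTTT <nil>
}
```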
func ReverseString ¶
ReverseString returns its argument string reversed rune-wise left to right. See https://github.com/golang/example/blob/master/stringutil/reverse.go
func SPrintOverlapGraph ¶
Print string representation of an overlap graph (map of string to []string) with the form: "SRC -> DEST" (no double quotes, one edge per line) and return the resulting string. The edges are ordered.
func TheseFloatsAreEqual ¶
Check if two floats are equal, to within some small tolerance.
func VisitHammingNeighbors ¶
Given an input string of DNA, generate variations of said string that are a Hamming distance of less than or equal to d.
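One common way to enumerate these variations is the recursive neighborhood construction: build the neighbors of the suffix, then either keep the first symbol or, when the suffix still has mismatch budget left, try all four nucleotides. The sketch below assumes that approach; neighbors and mismatches are hypothetical names and the package may generate the set differently.

```go
package main

import "fmt"

// mismatches counts differing positions of two equal-length strings.
func mismatches(p, q string) int {
	n := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			n++
		}
	}
	return n
}

// neighbors returns every string within Hamming distance d of pattern,
// including pattern itself. Hypothetical recursive sketch.
func neighbors(pattern string, d int) []string {
	if d <= 0 || len(pattern) == 0 {
		return []string{pattern}
	}
	if len(pattern) == 1 {
		return []string{"A", "C", "G", "T"}
	}
	var out []string
	suffix := pattern[1:]
	for _, s := range neighbors(suffix, d) {
		if mismatches(suffix, s) < d {
			// Budget left: the first symbol may be anything.
			for _, n := range "ACGT" {
				out = append(out, string(n)+s)
			}
		} else {
			// Budget used up: the first symbol must match.
			out = append(out, pattern[:1]+s)
		}
	}
	return out
}

func main() {
	fmt.Println(len(neighbors("ACG", 1))) // 10
}
```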
func WriteLines ¶
WriteLines writes the lines to the given file.
Types ¶
type DirGraph ¶
type DirGraph struct {
// contains filtered or unexported fields
}
Directed graph type
type ScoredMotifMatrix ¶
type ScoredMotifMatrix struct {
// contains filtered or unexported fields
}
Create a struct to hold a set of motifs (kmers) and their associated score. We continually assemble many of these possible sets of motifs, checking to find a set of motifs with a minimum score. The score is not updated dynamically; see UpdateScore().
func (*ScoredMotifMatrix) AddMotif ¶
func (s *ScoredMotifMatrix) AddMotif(motif string) error
Add a motif to the motif matrix.
func (*ScoredMotifMatrix) MakeProfile ¶
func (s *ScoredMotifMatrix) MakeProfile(pseudocounts bool) ([][]float32, error)
func (*ScoredMotifMatrix) UpdateScore ¶
func (s *ScoredMotifMatrix) UpdateScore() error
Update the value of the score of a ScoredMotifMatrix. This assembles a kmer composed of the most common nucleotide per position, then computes the sum of the Hamming distances from that kmer for all motifs.
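The scoring rule described above can be sketched on a plain slice of motifs. Since the struct's fields are unexported, this sketch does not use ScoredMotifMatrix at all; consensusScore is a hypothetical name, and breaking ties in the order A, C, G, T is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// consensusScore builds the consensus k-mer (most common nucleotide per
// column) and sums the Hamming distances of all motifs from it.
// Hypothetical sketch; ties are broken in the order A, C, G, T.
func consensusScore(motifs []string) (string, int, error) {
	if len(motifs) == 0 {
		return "", 0, errors.New("no motifs")
	}
	k := len(motifs[0])
	for _, m := range motifs {
		if len(m) != k {
			return "", 0, errors.New("motifs must have equal length")
		}
	}
	consensus := make([]byte, k)
	score := 0
	for col := 0; col < k; col++ {
		counts := map[byte]int{}
		for _, m := range motifs {
			counts[m[col]]++
		}
		best := byte('A')
		for _, n := range []byte("CGT") {
			if counts[n] > counts[best] {
				best = n
			}
		}
		consensus[col] = best
		// Every motif that differs from the consensus here adds 1.
		score += len(motifs) - counts[best]
	}
	return string(consensus), score, nil
}

func main() {
	consensus, score, err := consensusScore([]string{"ACGT", "ACGA", "TCGT"})
	fmt.Println(consensus, score, err) // ACGT 2 <nil>
}
```

Summing per-column mismatches is equivalent to summing the Hamming distance of each motif from the consensus, which is why the sketch never compares whole strings.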