zoekt: github.com/google/zoekt

package zoekt

import "github.com/google/zoekt"


Package Files

api.go bits.go contentprovider.go eval.go hititer.go indexbuilder.go indexdata.go indexfile_linux.go matchiter.go matchtree.go read.go section.go toc.go write.go


const FeatureVersion = 8

FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.

    2: Rank field for shards.
    3: Rank documents within shards.
    4: Dedup file bugfix.
    5: Remove max line size limit.
    6: Include '#' into the LineFragment template.
    7: Record skip reasons in the index.
    8: Record source path in the index.

const IndexFormatVersion = 15

IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.

    5: subrepositories.
    6: remove size prefix for posting varint list.
    7: move subrepos into Repository struct.
    8: move repoMetaData out of indexMetadata.
    9: use bigendian uint64 for trigrams.
    10: sections for rune offsets.
    11: file ends in rune offsets.
    12: 64-bit branchmasks.
    13: content checksums.
    14: languages.
    15: rune based symbol sections.
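The two version constants suggest a simple staleness check for an existing shard. The following is a minimal sketch only: the type shardVersions and the helper needsReindex are hypothetical names that are not part of the package, and the comparison rule (any format mismatch, or an older feature version) is an assumption rather than the library's documented policy.

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	FeatureVersion     = 8
	IndexFormatVersion = 15
)

// shardVersions is a hypothetical stand-in for the two version fields
// that IndexMetadata records for a shard.
type shardVersions struct {
	IndexFormatVersion  int
	IndexFeatureVersion int
}

// needsReindex is a hypothetical helper. It reports true when the shard
// was written with a different on-disk format or an older feature version.
func needsReindex(v shardVersions) bool {
	return v.IndexFormatVersion != IndexFormatVersion ||
		v.IndexFeatureVersion < FeatureVersion
}

func main() {
	fmt.Println(needsReindex(shardVersions{15, 8})) // matches both constants
	fmt.Println(needsReindex(shardVersions{15, 7})) // older feature version
	fmt.Println(needsReindex(shardVersions{14, 8})) // older on-disk format
}
```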


var DebugScore = false

DebugScore controls whether we collect data on how match scores are constructed. Intended for use in tests.

var Version string

Filled by the linker (see build-deploy.sh)

func CheckText

func CheckText(content []byte, maxTrigramCount int) error

CheckText returns a reason why the given contents are probably not source texts.

func ReadMetadata

func ReadMetadata(inf IndexFile) (*Repository, *IndexMetadata, error)

ReadMetadata returns the metadata of index shard without reading the index data. The IndexFile is not closed.

func SortFilesByScore

func SortFilesByScore(ms []FileMatch)

SortFilesByScore sorts a slice of results by score.
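As an illustration of what sorting results by score can look like, here is a small sketch. The type fileMatch is a hypothetical, trimmed-down stand-in for FileMatch, and the best-first (descending) order is an assumption based on the FileMatch.Score comment "the higher, the better"; the page does not state the order SortFilesByScore actually uses.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMatch is a hypothetical stand-in holding only the fields needed here.
type fileMatch struct {
	FileName string
	Score    float64
}

// sortByScore orders matches so that the highest score comes first.
// The stable sort keeps the input order for equal scores.
func sortByScore(ms []fileMatch) {
	sort.SliceStable(ms, func(i, j int) bool {
		return ms[i].Score > ms[j].Score
	})
}

func main() {
	ms := []fileMatch{{"a.go", 1}, {"b.go", 3}, {"c.go", 2}}
	sortByScore(ms)
	for _, m := range ms {
		fmt.Println(m.FileName, m.Score)
	}
}
```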

type Document

type Document struct {
    Name              string
    Content           []byte
    Branches          []string
    SubRepositoryPath string
    Language          string

    // If set, something is wrong with the file contents, and this
    // is the reason it wasn't indexed.
    SkipReason string

    // Document sections for symbols. Offsets should use bytes.
    Symbols []DocumentSection
}

Document holds a document (file) to index.

type DocumentSection

type DocumentSection struct {
    Start, End uint32
}

type FileMatch

type FileMatch struct {
    // Ranking; the higher, the better.
    Score float64 // TODO - hide this field?

    // For debugging. Needs DebugScore set, but public so tests in
    // other packages can print some diagnostics.
    Debug string

    FileName string

    // Repository is the globally unique name of the repo of the
    // match
    Repository  string
    Branches    []string
    LineMatches []LineMatch

    // Only set if requested
    Content []byte

    // Checksum of the content.
    Checksum []byte

    // Detected language of the result.
    Language string

    // SubRepositoryName is the globally unique name of the repo,
    // if it came from a subrepository
    SubRepositoryName string

    // SubRepositoryPath holds the prefix where the subrepository
    // was mounted.
    SubRepositoryPath string

    // Commit SHA1 (hex) of the (sub)repo holding the file.
    Version string
}

FileMatch contains all the matches within a file.

type IndexBuilder

type IndexBuilder struct {
    // contains filtered or unexported fields
}

IndexBuilder builds a single index shard.

func NewIndexBuilder

func NewIndexBuilder(r *Repository) (*IndexBuilder, error)

NewIndexBuilder creates a fresh IndexBuilder. The passed in Repository contains repo metadata, and may be set to nil.

func (*IndexBuilder) Add

func (b *IndexBuilder) Add(doc Document) error

Add a file which only occurs in certain branches.

func (*IndexBuilder) AddFile

func (b *IndexBuilder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for Add.

func (*IndexBuilder) ContentSize

func (b *IndexBuilder) ContentSize() uint32

ContentSize returns the number of content bytes so far ingested.

func (*IndexBuilder) Write

func (b *IndexBuilder) Write(out io.Writer) error

type IndexFile

type IndexFile interface {
    Read(off uint32, sz uint32) ([]byte, error)
    Size() (uint32, error)
    Name() string
}

IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
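Because IndexFile is a small interface, it can be satisfied by something other than a file on disk. The sketch below shows a hypothetical in-memory implementation, memIndexFile, which is not part of the package. The interface is redeclared locally so the example compiles on its own, and the bounds-check behaviour is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexFile mirrors the interface documented above so that this sketch
// is self-contained.
type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Name() string
}

// memIndexFile is a hypothetical implementation backed by a byte slice.
// The slice is never modified, so concurrent reads are safe.
type memIndexFile struct {
	name string
	data []byte
}

// Read returns a sub-slice of the backing data without copying it,
// much like a memory-mapped file would.
func (f *memIndexFile) Read(off, sz uint32) ([]byte, error) {
	// Compare in uint64 so that off+sz cannot wrap around.
	if uint64(off)+uint64(sz) > uint64(len(f.data)) {
		return nil, errors.New("read out of bounds")
	}
	return f.data[off : off+sz], nil
}

func (f *memIndexFile) Size() (uint32, error) { return uint32(len(f.data)), nil }

func (f *memIndexFile) Name() string { return f.name }

func main() {
	var f IndexFile = &memIndexFile{name: "mem", data: []byte("hello, shard")}
	b, err := f.Read(7, 5)
	fmt.Println(string(b), err)
}
```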

func NewIndexFile

func NewIndexFile(f *os.File) (IndexFile, error)

NewIndexFile returns a new index file. The index file takes ownership of the passed in file, and may close it.

type IndexMetadata

type IndexMetadata struct {
    IndexFormatVersion  int
    IndexFeatureVersion int
    IndexTime           time.Time
    PlainASCII          bool
    LanguageMap         map[string]byte
    ZoektVersion        string
}

IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.

type LineFragmentMatch

type LineFragmentMatch struct {
    // Offset within the line, in bytes.
    LineOffset int

    // Offset from file start, in bytes.
    Offset uint32

    // Number of bytes that match.
    MatchLength int
}

LineFragmentMatch is a segment of matching text within a line.

type LineMatch

type LineMatch struct {
    // The line in which a match was found.
    Line       []byte
    LineStart  int
    LineEnd    int
    LineNumber int

    // If set, this was a match on the filename.
    FileName bool

    // The higher the better. Only ranks the quality of the match
    // within the file, does not take rank of file into account
    Score         float64
    LineFragments []LineFragmentMatch
}

LineMatch holds the matches within a single line in a file.
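The byte offsets in LineFragmentMatch can be used to cut the matched text out of LineMatch.Line. The following sketch uses hypothetical local copies of the two types, reduced to the fields involved, and a hypothetical helper named fragments. Skipping out-of-range fragments is a defensive choice made for this example, not documented behaviour.

```go
package main

import "fmt"

// Hypothetical local copies of the documented types, reduced to the
// fields this sketch needs.
type lineFragmentMatch struct {
	LineOffset  int // offset within the line, in bytes
	MatchLength int // number of bytes that match
}

type lineMatch struct {
	Line          []byte
	LineFragments []lineFragmentMatch
}

// fragments returns the matched substrings of a line. Fragments that
// fall outside the line are skipped.
func fragments(m lineMatch) []string {
	var out []string
	for _, f := range m.LineFragments {
		end := f.LineOffset + f.MatchLength
		if f.LineOffset < 0 || f.MatchLength < 0 || end > len(m.Line) {
			continue
		}
		out = append(out, string(m.Line[f.LineOffset:end]))
	}
	return out
}

func main() {
	m := lineMatch{
		Line: []byte("foo bar foo"),
		LineFragments: []lineFragmentMatch{
			{LineOffset: 0, MatchLength: 3},
			{LineOffset: 8, MatchLength: 3},
		},
	}
	fmt.Println(fragments(m))
}
```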

type RepoList

type RepoList struct {
    Repos   []*RepoListEntry
    Crashes int
}

RepoList holds a set of Repository metadata.

type RepoListEntry

type RepoListEntry struct {
    Repository    Repository
    IndexMetadata IndexMetadata
    Stats         RepoStats
}

type RepoStats

type RepoStats struct {
    // Repos is used for aggregating the number of repositories.
    Repos int

    // Shards is the total number of search shards.
    Shards int

    // Documents holds the number of documents or files.
    Documents int

    // IndexBytes is the amount of RAM used for index overhead.
    IndexBytes int64

    // ContentBytes is the amount of RAM used for raw content.
    ContentBytes int64
}

RepoStats holds statistics of a (collection of) repositories.

func (*RepoStats) Add

func (s *RepoStats) Add(o *RepoStats)

type Repository

type Repository struct {
    // The repository name
    Name string

    // The repository URL.
    URL string

    // The physical source where this repo came from, e.g. full
    // path to the zip filename or git repository directory. This
    // will not be exposed in the UI, but can be used to detect
    // orphaned index shards.
    Source string

    // The branches indexed in this repo.
    Branches []RepositoryBranch

    // Nil if this is not the super project.
    SubRepoMap map[string]*Repository

    // URL template to link to the commit of a branch
    CommitURLTemplate string

    // The repository URL for getting to a file.  Has access to
    // {{Branch}}, {{Path}}
    FileURLTemplate string

    // The URL fragment to add to a file URL for line numbers. has
    // access to {{LineNumber}}. The fragment should include the
    // separator, generally '#' or ';'.
    LineFragmentTemplate string

    // All zoekt.* configuration settings.
    RawConfig map[string]string

    // Importance of the repository, bigger is more important
    Rank uint16

    // IndexOptions is a hash of the options used to create the index for the
    // repo.
    IndexOptions string
}

Repository holds repository metadata.

type RepositoryBranch

type RepositoryBranch struct {
    Name    string
    Version string
}

RepositoryBranch describes an indexed branch, which is a name combined with a version.

type SearchOptions

type SearchOptions struct {
    // Return an upper-bound estimate of eligible documents in
    // stats.ShardFilesConsidered.
    EstimateDocCount bool

    // Return the whole file.
    Whole bool

    // Maximum number of matches: skip all processing of an index
    // shard after we have found this many non-overlapping matches.
    ShardMaxMatchCount int

    // Maximum number of matches: stop looking for more matches
    // once we have this many matches across shards.
    TotalMaxMatchCount int

    // Maximum number of important matches: skip processing a
    // shard after we have found this many important matches.
    ShardMaxImportantMatch int

    // Maximum number of important matches across shards.
    TotalMaxImportantMatch int

    // Abort the search after this much time has passed.
    MaxWallTime time.Duration

    // Trim the number of results after collating and sorting the
    // results
    MaxDocDisplayCount int
}

func (*SearchOptions) SetDefaults

func (o *SearchOptions) SetDefaults()

func (*SearchOptions) String

func (s *SearchOptions) String() string
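The MaxDocDisplayCount field of SearchOptions is described as trimming results after collating and sorting. A minimal sketch of such a trim step follows. The helper trimResults is hypothetical, file names stand in for FileMatch values, and treating a zero or negative limit as "no limit" is an assumption that this page does not confirm.

```go
package main

import "fmt"

// trimResults is a hypothetical helper that keeps at most
// maxDocDisplayCount entries of an already sorted result list.
// A limit of zero or less is assumed to mean "do not trim".
func trimResults(files []string, maxDocDisplayCount int) []string {
	if maxDocDisplayCount > 0 && len(files) > maxDocDisplayCount {
		return files[:maxDocDisplayCount]
	}
	return files
}

func main() {
	sorted := []string{"best.go", "good.go", "ok.go"}
	fmt.Println(trimResults(sorted, 2))
	fmt.Println(trimResults(sorted, 0))
}
```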

type SearchResult

type SearchResult struct {
    Files []FileMatch

    // RepoURLs holds a repo => template string map.
    RepoURLs map[string]string

    // LineFragments holds a repo => template string map, for
    // the line number fragment.
    LineFragments map[string]string
}

SearchResult contains search matches and extra data.

type Searcher

type Searcher interface {
    Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

    // List lists repositories. The query `q` can only contain
    // query.Repo atoms.
    List(ctx context.Context, q query.Q) (*RepoList, error)

    // Describe the searcher for debug messages.
    String() string
}

func NewSearcher

func NewSearcher(r IndexFile) (Searcher, error)

NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e. []byte members should be copied into fresh buffers if the result is to survive closing the shard.
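Copying a []byte member into a fresh buffer can be done with a small helper. The sketch below uses a hypothetical function cloneBytes and simulates shard-backed memory with an ordinary slice; it does not call into the package.

```go
package main

import "fmt"

// cloneBytes is a hypothetical helper that returns a copy of b which
// shares no memory with the original slice.
func cloneBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	// shard stands in for memory owned by the searcher.
	shard := []byte("matched line")
	kept := cloneBytes(shard)

	// Overwrite the original to simulate the shard going away.
	for i := range shard {
		shard[i] = 0
	}
	fmt.Println(string(kept))
}
```

Applying such a copy to fields like LineMatch.Line or FileMatch.Content before the shard is closed keeps the retained data independent of the searcher's memory.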

type Stats

type Stats struct {
    // Amount of I/O for reading contents.
    ContentBytesLoaded int64

    // Amount of I/O for reading from index.
    IndexBytesLoaded int64

    // Number of search shards that had a crash.
    Crashes int

    // Wall clock time for this search
    Duration time.Duration

    // Number of files containing a match.
    FileCount int

    // Number of files in shards that we considered.
    ShardFilesConsidered int

    // Files that we evaluated. Equivalent to files for which all
    // atom matches (including negations) evaluated to true.
    FilesConsidered int

    // Files for which we loaded file content to verify substring matches
    FilesLoaded int

    // Candidate files whose contents weren't examined because we
    // gathered enough matches.
    FilesSkipped int

    // Shards that we did not process because a query was canceled.
    ShardsSkipped int

    // Number of non-overlapping matches
    MatchCount int

    // Number of candidate matches as a result of searching ngrams.
    NgramMatches int

    // Wall clock time for queued search.
    Wait time.Duration
}

Stats contains interesting numbers on the search.

func (*Stats) Add

func (s *Stats) Add(o Stats)


Directories

build                              Package build implements a more convenient interface for building zoekt indices.
cmd/zoekt-archive-index            Command zoekt-archive-index indexes an archive.
cmd/zoekt-git-clone                This binary fetches all repos of a user or organization and clones them.
cmd/zoekt-mirror-bitbucket-server  This binary fetches all repos of a project, and of a specific type, in case these are specified, and clones them.
cmd/zoekt-mirror-github            This binary fetches all repos of a user or organization and clones them.
cmd/zoekt-mirror-gitiles           This binary fetches all repos of a Gitiles host.
cmd/zoekt-mirror-gitlab            This binary fetches all repos for a user from gitlab.
cmd/zoekt-repo-index               zoekt-repo-index indexes a repo-based repository.
cmd/zoekt-test                     zoekt-test compares the search engine results with raw substring search.
gitindex                           Package gitindex provides functions for indexing Git repositories.

Package zoekt imports 21 packages and is imported by 14 packages. Updated 2019-09-10.