zoekt: github.com/google/zoekt

package zoekt

import "github.com/google/zoekt"

Package Files

api.go bits.go contentprovider.go eval.go hititer.go indexbuilder.go indexdata.go indexfile_linux.go matchiter.go matchtree.go read.go section.go toc.go write.go

Constants

const FeatureVersion = 8

FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.

2: Rank field for shards.
3: Rank documents within shards.
4: Dedup file bugfix.
5: Remove max line size limit.
6: Include '#' into the LineFragment template.
7: Record skip reasons in the index.
8: Record source path in the index.

const IndexFormatVersion = 15

IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.

5: Subrepositories.
6: Remove size prefix for posting varint list.
7: Move subrepos into Repository struct.
8: Move repoMetaData out of indexMetadata.
9: Use big-endian uint64 for trigrams.
10: Sections for rune offsets.
11: File ends in rune offsets.
12: 64-bit branchmasks.
13: Content checksums.
14: Languages.
15: Rune-based symbol sections.
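
The two constants can be compared against the version fields that IndexMetadata records for a shard. The sketch below is illustrative only: needsReindex and shardVersions are hypothetical names, the constants are local copies so the example compiles on its own, and treating any mismatch as "reindex" is an assumption, not a description of how zoekt itself decides.

```go
package main

import "fmt"

// Local copies of the constants documented above, so the sketch is
// self-contained. Real code would use zoekt.IndexFormatVersion and
// zoekt.FeatureVersion.
const (
	indexFormatVersion = 15
	featureVersion     = 8
)

// shardVersions mirrors the two version fields of IndexMetadata.
type shardVersions struct {
	IndexFormatVersion  int
	IndexFeatureVersion int
}

// needsReindex is a hypothetical helper. It reports whether a shard was
// written with a different format or feature version than this build.
func needsReindex(v shardVersions) bool {
	return v.IndexFormatVersion != indexFormatVersion ||
		v.IndexFeatureVersion != featureVersion
}

func main() {
	old := shardVersions{IndexFormatVersion: 15, IndexFeatureVersion: 7}
	cur := shardVersions{IndexFormatVersion: 15, IndexFeatureVersion: 8}
	fmt.Println(needsReindex(old), needsReindex(cur))
}
```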

Variables

var DebugScore = false

DebugScore controls whether we collect data on how match scores are constructed. Intended for use in tests.

var Version string

Filled by the linker (see build-deploy.sh)

func CheckText

func CheckText(content []byte) error

CheckText returns a reason why the given contents are probably not source texts.

func ReadMetadata

func ReadMetadata(inf IndexFile) (*Repository, *IndexMetadata, error)

ReadMetadata returns the metadata of an index shard without reading the index data. The IndexFile is not closed.

func SortFilesByScore

func SortFilesByScore(ms []FileMatch)

SortFilesByScore sorts a slice of file matches by score.
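
As a rough illustration of what sorting by score means, the sketch below orders a slice of a local stand-in type from highest to lowest score. The fileMatch type and sortByScore function are hypothetical. Descending order is an assumption, chosen to agree with the "the higher, the better" comment on FileMatch.Score.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMatch is a local stand-in for the FileName and Score fields of
// FileMatch.
type fileMatch struct {
	FileName string
	Score    float64
}

// sortByScore orders matches from highest to lowest score. The stable
// sort keeps equally scored matches in their original order.
func sortByScore(ms []fileMatch) {
	sort.SliceStable(ms, func(i, j int) bool {
		return ms[i].Score > ms[j].Score
	})
}

func main() {
	ms := []fileMatch{{"a.go", 1.5}, {"b.go", 9}, {"c.go", 4}}
	sortByScore(ms)
	for _, m := range ms {
		fmt.Println(m.FileName, m.Score)
	}
}
```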

type Document

type Document struct {
    Name              string
    Content           []byte
    Branches          []string
    SubRepositoryPath string
    Language          string

    // If set, something is wrong with the file contents, and this
    // is the reason it wasn't indexed.
    SkipReason string

    // Document sections for symbols. Offsets should use bytes.
    Symbols []DocumentSection
}

Document holds a document (file) to index.

type DocumentSection

type DocumentSection struct {
    Start, End uint32
}
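
Document.Symbols expects byte offsets, which differ from rune offsets as soon as the content contains non-ASCII text. The following sketch computes a byte range for the first occurrence of a symbol name. The documentSection type is a local copy, sectionFor is a hypothetical helper, and treating End as exclusive is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// documentSection is a local copy of DocumentSection.
type documentSection struct {
	Start, End uint32
}

// sectionFor returns the byte range of the first occurrence of sym in
// content. End is treated as exclusive here, which is an assumption.
func sectionFor(content []byte, sym string) (documentSection, bool) {
	i := bytes.Index(content, []byte(sym))
	if i < 0 {
		return documentSection{}, false
	}
	return documentSection{Start: uint32(i), End: uint32(i + len(sym))}, true
}

func main() {
	content := []byte("// größe\nfunc Größe() int { return 1 }\n")
	sec, ok := sectionFor(content, "Größe")
	if !ok {
		fmt.Println("symbol not found")
		return
	}
	// The section is 7 bytes long although the name has only 5 runes.
	fmt.Println(sec.End-sec.Start, string(content[sec.Start:sec.End]))
}
```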

type FileMatch

type FileMatch struct {
    // Ranking; the higher, the better.
    Score float64 // TODO - hide this field?

    // For debugging. Needs DebugScore set, but public so tests in
    // other packages can print some diagnostics.
    Debug string

    FileName string

    // Repository is the globally unique name of the repo of the
    // match
    Repository  string
    Branches    []string
    LineMatches []LineMatch

    // Only set if requested
    Content []byte

    // Checksum of the content.
    Checksum []byte

    // Detected language of the result.
    Language string

    // SubRepositoryName is the globally unique name of the repo,
    // if it came from a subrepository
    SubRepositoryName string

    // SubRepositoryPath holds the prefix where the subrepository
    // was mounted.
    SubRepositoryPath string

    // Commit SHA1 (hex) of the (sub)repo holding the file.
    Version string
}

FileMatch contains all the matches within a file.

type IndexBuilder

type IndexBuilder struct {
    // contains filtered or unexported fields
}

IndexBuilder builds a single index shard.

func NewIndexBuilder

func NewIndexBuilder(r *Repository) (*IndexBuilder, error)

NewIndexBuilder creates a fresh IndexBuilder. The passed in Repository contains repo metadata, and may be set to nil.

func (*IndexBuilder) Add

func (b *IndexBuilder) Add(doc Document) error

Add a file which only occurs in certain branches.

func (*IndexBuilder) AddFile

func (b *IndexBuilder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for Add.

func (*IndexBuilder) ContentSize

func (b *IndexBuilder) ContentSize() uint32

ContentSize returns the number of content bytes so far ingested.

func (*IndexBuilder) Write

func (b *IndexBuilder) Write(out io.Writer) error

type IndexFile

type IndexFile interface {
    Read(off uint32, sz uint32) ([]byte, error)
    Size() (uint32, error)
    Close()
    Name() string
}

IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
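
Because IndexFile is a small interface, it can be satisfied by something other than a file on disk. The sketch below is a hypothetical in-memory implementation of the kind one might write for tests. The interface is repeated locally so the example compiles on its own, and memFile is not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// indexFile repeats the IndexFile method set shown above so that this
// sketch compiles without importing the package.
type indexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memFile is a hypothetical in-memory implementation. It is safe for
// concurrent reads as long as nobody mutates data.
type memFile struct {
	name string
	data []byte
}

func (f *memFile) Read(off, sz uint32) ([]byte, error) {
	// Use 64-bit arithmetic so that off+sz cannot wrap around.
	if uint64(off)+uint64(sz) > uint64(len(f.data)) {
		return nil, errors.New("memFile: read out of bounds")
	}
	// Return a sub-slice without copying, as an mmap'd file could.
	return f.data[off : off+sz], nil
}

func (f *memFile) Size() (uint32, error) { return uint32(len(f.data)), nil }
func (f *memFile) Close()                {}
func (f *memFile) Name() string          { return f.name }

// Compile-time check that memFile satisfies the interface.
var _ indexFile = (*memFile)(nil)

func main() {
	f := &memFile{name: "demo.zoekt", data: []byte("hello, shard")}
	b, err := f.Read(7, 5)
	fmt.Printf("%q %v\n", b, err)
}
```

Since Read returns a slice into the backing array rather than a copy, callers that keep the bytes around should copy them first.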

func NewIndexFile

func NewIndexFile(f *os.File) (IndexFile, error)

NewIndexFile returns a new index file. The index file takes ownership of the passed in file, and may close it.

type IndexMetadata

type IndexMetadata struct {
    IndexFormatVersion  int
    IndexFeatureVersion int
    IndexTime           time.Time
    PlainASCII          bool
    LanguageMap         map[string]byte
    ZoektVersion        string
}

IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.

type LineFragmentMatch

type LineFragmentMatch struct {
    // Offset within the line, in bytes.
    LineOffset int

    // Offset from file start, in bytes.
    Offset uint32

    // Number of bytes that match.
    MatchLength int
}

LineFragmentMatch is a segment of matching text within a line.

type LineMatch

type LineMatch struct {
    // The line in which a match was found.
    Line       []byte
    LineStart  int
    LineEnd    int
    LineNumber int

    // If set, this was a match on the filename.
    FileName bool

    // The higher the better. Only ranks the quality of the match
    // within the file, does not take rank of file into account
    Score         float64
    LineFragments []LineFragmentMatch
}

LineMatch holds the matches within a single line in a file.
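
Since LineOffset and MatchLength are both expressed in bytes relative to the line, the matched text can be sliced directly out of Line. The sketch below uses local stand-in types with only the fields it needs. The fragments helper is hypothetical and defensively skips fragments whose bounds fall outside the line.

```go
package main

import "fmt"

// Local stand-ins for the fields used from LineFragmentMatch and
// LineMatch.
type lineFragmentMatch struct {
	LineOffset  int
	MatchLength int
}

type lineMatch struct {
	Line          []byte
	LineNumber    int
	LineFragments []lineFragmentMatch
}

// fragments returns the matched substrings of a line, skipping any
// fragment whose bounds fall outside the line.
func fragments(m lineMatch) []string {
	var out []string
	for _, f := range m.LineFragments {
		end := f.LineOffset + f.MatchLength
		if f.LineOffset < 0 || f.MatchLength < 0 || end > len(m.Line) {
			continue
		}
		out = append(out, string(m.Line[f.LineOffset:end]))
	}
	return out
}

func main() {
	m := lineMatch{
		Line:          []byte("func NewSearcher(r IndexFile)"),
		LineNumber:    388,
		LineFragments: []lineFragmentMatch{{LineOffset: 5, MatchLength: 11}},
	}
	fmt.Println(fragments(m))
}
```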

type RepoList

type RepoList struct {
    Repos   []*RepoListEntry
    Crashes int
}

RepoList holds a set of Repository metadata.

type RepoListEntry

type RepoListEntry struct {
    Repository    Repository
    IndexMetadata IndexMetadata
    Stats         RepoStats
}

type RepoStats

type RepoStats struct {
    // Repos is used for aggregating the number of repositories.
    Repos int

    // Shards is the total number of search shards.
    Shards int

    // Documents holds the number of documents or files.
    Documents int

    // IndexBytes is the amount of RAM used for index overhead.
    IndexBytes int64

    // ContentBytes is the amount of RAM used for raw content.
    ContentBytes int64
}

Statistics of a (collection of) repositories.

func (*RepoStats) Add

func (s *RepoStats) Add(o *RepoStats)

type Repository

type Repository struct {
    // The repository name
    Name string

    // The repository URL.
    URL string

    // The physical source where this repo came from, e.g. full
    // path to the zip filename or git repository directory. This
    // will not be exposed in the UI, but can be used to detect
    // orphaned index shards.
    Source string

    // The branches indexed in this repo.
    Branches []RepositoryBranch

    // Nil if this is not the super project.
    SubRepoMap map[string]*Repository

    // URL template to link to the commit of a branch
    CommitURLTemplate string

    // The repository URL for getting to a file.  Has access to
    // {{Branch}}, {{Path}}
    FileURLTemplate string

    // The URL fragment to add to a file URL for line numbers. Has
    // access to {{LineNumber}}. The fragment should include the
    // separator, generally '#' or ';'.
    LineFragmentTemplate string

    // All zoekt.* configuration settings.
    RawConfig map[string]string

    // Importance of the repository, bigger is more important
    Rank uint16

    // IndexOptions is a hash of the options used to create the index for the
    // repo.
    IndexOptions string
}

Repository holds repository metadata.
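
To show how FileURLTemplate and LineFragmentTemplate might combine into a link, the sketch below expands two made-up templates. It assumes the templates are Go text/template strings whose values are addressed as {{.Branch}}, {{.Path}} and {{.LineNumber}}. The field comments above write the names without a leading dot, so the exact syntax should be checked against the indexer in use. The expand helper and the example.com URLs are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// expand renders a URL template with the given values.
func expand(tpl string, vars map[string]string) (string, error) {
	t, err := template.New("url").Parse(tpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, vars); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical templates for an example host.
	fileTpl := "https://example.com/repo/blob/{{.Branch}}/{{.Path}}"
	lineTpl := "#L{{.LineNumber}}"

	file, err := expand(fileTpl, map[string]string{"Branch": "master", "Path": "api.go"})
	if err != nil {
		fmt.Println("file template:", err)
		return
	}
	frag, err := expand(lineTpl, map[string]string{"LineNumber": "42"})
	if err != nil {
		fmt.Println("line template:", err)
		return
	}
	// The fragment already carries its separator, so plain
	// concatenation is enough.
	fmt.Println(file + frag)
}
```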

type RepositoryBranch

type RepositoryBranch struct {
    Name    string
    Version string
}

RepositoryBranch describes an indexed branch, which is a name combined with a version.

type SearchOptions

type SearchOptions struct {
    // Return an upper-bound estimate of eligible documents in
    // stats.ShardFilesConsidered.
    EstimateDocCount bool

    // Return the whole file.
    Whole bool

    // Maximum number of matches: skip all further processing of
    // an index shard once we find this many non-overlapping matches.
    ShardMaxMatchCount int

    // Maximum number of matches: stop looking for more matches
    // once we have this many matches across shards.
    TotalMaxMatchCount int

    // Maximum number of important matches: skip processing a
    // shard once we find this many important matches.
    ShardMaxImportantMatch int

    // Maximum number of important matches across shards.
    TotalMaxImportantMatch int

    // Abort the search after this much time has passed.
    MaxWallTime time.Duration

    // Trim the number of results after collating and sorting the
    // results
    MaxDocDisplayCount int
}

func (*SearchOptions) SetDefaults

func (o *SearchOptions) SetDefaults()

func (*SearchOptions) String

func (s *SearchOptions) String() string

type SearchResult

type SearchResult struct {
    Stats
    Files []FileMatch

    // RepoURLs holds a repo => template string map.
    RepoURLs map[string]string

    // LineFragments holds a repo => template string map, for
    // the line number fragment.
    LineFragments map[string]string
}

SearchResult contains search matches and extra data.

type Searcher

type Searcher interface {
    Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

    // List lists repositories. The query `q` can only contain
    // query.Repo atoms.
    List(ctx context.Context, q query.Q) (*RepoList, error)
    Close()

    // Describe the searcher for debug messages.
    String() string
}

func NewSearcher

func NewSearcher(r IndexFile) (Searcher, error)

NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e. []byte members should be copied into fresh buffers if the result is to survive closing the shard.
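
The lifetime rule is easiest to see with a small simulation. In the sketch below, a byte slice stands in for shard memory, and a local lineMatch type stands in for a result that aliases it. The detach helper is hypothetical. It copies the bytes so that the result stays intact after the backing memory changes.

```go
package main

import "fmt"

// lineMatch is a local stand-in for the Line field of LineMatch.
type lineMatch struct {
	Line []byte
}

// detach returns a copy whose Line no longer aliases shard memory.
func detach(m lineMatch) lineMatch {
	m.Line = append([]byte(nil), m.Line...)
	return m
}

func main() {
	// Pretend this slice is memory owned by an open shard.
	shard := []byte("needle in a haystack")
	result := lineMatch{Line: shard[:6]}
	kept := detach(result)

	// Simulate the shard memory being reused after Close.
	copy(shard, "XXXXXX")

	// result.Line changed along with the shard, kept.Line did not.
	fmt.Printf("%s %s\n", result.Line, kept.Line)
}
```

With a real mmap'd shard, reading an uncopied slice after Close may fault rather than return stale bytes, so the copy has to happen before the Searcher is closed.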

type Stats

type Stats struct {
    // Amount of I/O for reading contents.
    ContentBytesLoaded int64

    // Amount of I/O for reading from index.
    IndexBytesLoaded int64

    // Number of search shards that had a crash.
    Crashes int

    // Wall clock time for this search
    Duration time.Duration

    // Number of files containing a match.
    FileCount int

    // Number of files in shards that we considered.
    ShardFilesConsidered int

    // Files that we evaluated. Equivalent to files for which all
    // atom matches (including negations) evaluated to true.
    FilesConsidered int

    // Files for which we loaded file content to verify substring matches
    FilesLoaded int

    // Candidate files whose contents weren't examined because we
    // gathered enough matches.
    FilesSkipped int

    // Shards that we did not process because a query was canceled.
    ShardsSkipped int

    // Number of non-overlapping matches
    MatchCount int

    // Number of candidate matches as a result of searching ngrams.
    NgramMatches int

    // Wall clock time for queued search.
    Wait time.Duration
}

Stats contains interesting numbers on the search.

func (*Stats) Add

func (s *Stats) Add(o Stats)

Directories

Path                                Synopsis
build                               Package build implements a more convenient interface for building zoekt indices.
cmd
cmd/zoekt
cmd/zoekt-archive-index             Command zoekt-archive-index indexes an archive.
cmd/zoekt-git-clone                 This binary fetches all repos of a user or organization and clones them.
cmd/zoekt-git-index
cmd/zoekt-index
cmd/zoekt-indexserver
cmd/zoekt-mirror-bitbucket-server   This binary fetches all repos of a project, and of a specific type, in case these are specified, and clones them.
cmd/zoekt-mirror-gerrit
cmd/zoekt-mirror-github             This binary fetches all repos of a user or organization and clones them.
cmd/zoekt-mirror-gitiles            This binary fetches all repos of a Gitiles host.
cmd/zoekt-mirror-gitlab             This binary fetches all repos for a user from gitlab.
cmd/zoekt-repo-index                zoekt-repo-index indexes a repo-based repository.
cmd/zoekt-test                      zoekt-test compares the search engine results with raw substring search
cmd/zoekt-webserver
ctags
gitindex                            Package gitindex provides functions for indexing Git repositories.
query
shards
web

Package zoekt imports 21 packages and is imported by 14 packages. Updated 2019-06-25.