tagdb

command module
v0.0.0-...-72cd24a Latest
Published: Dec 4, 2023 License: GPL-3.0 Imports: 9 Imported by: 0

README

tagdb

Tagdb is a text search engine that offers fast word completion and real-time searches.

Tagdb is an external search engine (a pure inverted index) that stores URLs and tags, allowing you to index files, web pages, and anything else you can reference via a URL. It can also store line numbers for a file, allowing you to jump straight to your search result.
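
To make the "pure inverted index" idea concrete, here is a minimal sketch in Go. It is not tagdb's actual storage code: the `Index` type and its `Add` and `Lookup` methods are hypothetical names chosen for illustration, and lowercasing tags is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Index is a toy inverted index: each tag maps to the set of URLs that carry it.
// (Hypothetical type, not taken from the tagdb source.)
type Index map[string]map[string]struct{}

// Add records that url carries every tag in tags. Tags are lowercased (an assumption).
func (ix Index) Add(url string, tags ...string) {
	for _, t := range tags {
		t = strings.ToLower(t)
		if ix[t] == nil {
			ix[t] = make(map[string]struct{})
		}
		ix[t][url] = struct{}{}
	}
}

// Lookup returns the URLs stored under tag, sorted so the output is stable.
func (ix Index) Lookup(tag string) []string {
	set := ix[strings.ToLower(tag)]
	urls := make([]string, 0, len(set))
	for u := range set {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	ix := Index{}
	ix.Add("README.md", "quick", "brown", "fox")
	ix.Add("notes/todo.txt", "quick", "fix")
	fmt.Println(ix.Lookup("quick")) // prints [README.md notes/todo.txt]
}
```

The lookup cost depends on the number of URLs under one tag, not on the total amount of indexed text, which is what makes this structure attractive for real-time search.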

Installation

Build
go build -o release/tagshell cmd/tagshell/tagshell.go
go build -o release/tagquery cmd/tagquery/tagquery.go
go build -o release/tagserver cmd/tagserver/tagserver.go
go build -o release/tagloader cmd/tagloader/tagloader.go
Start

First, start tagserver:

./tagserver &

Then load some files:

./tagloader -verbose .

Then run a search:

./tagquery quick brown fox

You should see output like this:

./tagquery quick brown fox
2017/04/06 18:47:01 Searching for [quick brown fox]
3: otherfiles/testsearch.txt(1)
3: README.md(29)
2017/04/06 18:47:01 Search complete
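
If you want to consume this output from another program, the result lines can be parsed with a small helper. The sketch below assumes each result line has the shape `score: path(line)`, where the leading number is the match score and the number in parentheses is the line number. That reading is inferred from the sample above, not from a documented format, and `Result` and `ParseResult` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Result is one parsed result line (hypothetical type).
type Result struct {
	Score int    // assumed meaning: match score
	Path  string // path or URL of the record
	Line  int    // assumed meaning: line number within the file
}

// resultLine matches lines such as "3: README.md(29)".
var resultLine = regexp.MustCompile(`^(\d+): (.+)\((\d+)\)$`)

// ParseResult returns ok=false for anything that is not a result line,
// including the timestamped log lines.
func ParseResult(s string) (Result, bool) {
	m := resultLine.FindStringSubmatch(s)
	if m == nil {
		return Result{}, false
	}
	score, err1 := strconv.Atoi(m[1])
	line, err2 := strconv.Atoi(m[3])
	if err1 != nil || err2 != nil {
		return Result{}, false
	}
	return Result{Score: score, Path: m[2], Line: line}, true
}

func main() {
	r, ok := ParseResult("3: README.md(29)")
	fmt.Printf("%+v %v\n", r, ok) // prints {Score:3 Path:README.md Line:29} true
}
```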

Use

tagshell

tagshell is a simple command line GUI that uses predictive, real-time search to list your results and jump to them.

Start typing your search until you see the results you want, then press the down arrow to select the result you want to examine. Pressing the right arrow opens that file.
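
One common way to implement predictive word completion is a binary search over a sorted word list. The sketch below illustrates that general technique only; it is not how tagshell or tagserver is confirmed to work, and `Complete` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Complete returns up to limit words from sorted (which must be in ascending
// order) that start with prefix. All words sharing a prefix are contiguous in
// a sorted list, so one binary search finds where they begin.
func Complete(sorted []string, prefix string, limit int) []string {
	var out []string
	for i := sort.SearchStrings(sorted, prefix); i < len(sorted) && len(out) < limit; i++ {
		if !strings.HasPrefix(sorted[i], prefix) {
			break
		}
		out = append(out, sorted[i])
	}
	return out
}

func main() {
	words := []string{"brown", "fox", "quack", "quick", "quiet"}
	fmt.Println(Complete(words, "qui", 10)) // prints [quick quiet]
}
```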

tagloader

tagloader recursively scans files and directories, indexing their contents.

  -addRecord
        Add record from the command line
  -debug
        Display additional debug information
  -noContents
        Do not look inside files
  -parallel int
        Maximum number of simultaneous inserts to attempt (default 1)
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")
  -verbose
        Show files as they are loaded

-verbose will print every filename as it is scanned.

By default, tagloader will treat the entire contents of the file as one "search result". It reads the entire file, building a tag list, and then stores that list. There are two options to control this:

-noContents will ignore the file contents and only store the file path (split up by usual word boundaries). Searches will only return a file if your search word occurs in the file name. -noContents is handy for indexing things like mp3 collections and photographs, where the contents contain no text.

-everyLine will store every line in a text file separately, so search results can return multiple lines in the same file. You can then jump to the correct line using programs like tagshell.
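
The three indexing modes described above (whole file, -noContents, -everyLine) can be sketched as follows. This is an illustration under assumptions, not tagloader's code: `Record`, `Words`, `WholeFile`, `NameOnly` and `EveryLine` are hypothetical names, "usual word boundaries" is approximated as "anything that is not a letter or digit", tags are lowercased, and blank lines are skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
)

// Record is a hypothetical stand-in for one stored search result.
type Record struct {
	Path string
	Line int // 0 when the record covers the whole file
	Tags []string
}

// Words lowercases text and splits it on anything that is not a letter or digit.
func Words(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// WholeFile models the default mode: one record per file, tagged from its contents.
func WholeFile(path, contents string) Record {
	return Record{Path: path, Tags: Words(contents)}
}

// NameOnly models -noContents: the tags come from the path alone.
func NameOnly(path string) Record {
	return Record{Path: path, Tags: Words(path)}
}

// EveryLine models -everyLine: one record per non-empty line, numbered from 1.
// bufio.Scanner's default limit applies, so extremely long lines would stop the scan.
func EveryLine(path, contents string) []Record {
	var recs []Record
	sc := bufio.NewScanner(strings.NewReader(contents))
	for n := 1; sc.Scan(); n++ {
		if tags := Words(sc.Text()); len(tags) > 0 {
			recs = append(recs, Record{Path: path, Line: n, Tags: tags})
		}
	}
	return recs
}

func main() {
	fmt.Println(NameOnly("music/Daft Punk.mp3").Tags) // prints [music daft punk mp3]
	for _, r := range EveryLine("a.txt", "quick brown\n\nlazy dog\n") {
		fmt.Println(r.Line, r.Tags)
	}
}
```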

Tagloader creates a record in the database using the path to the file (based on the command line argument). It does no further processing of the path, and won't even normalise it. So if you give it a relative path, it will store relative paths, which will make it difficult to find the file again if you search for it while in another directory.

Relative paths are useful for things like indexing a webserver directory, so you can later build a full URL from the relative path and the server name. Absolute paths are more useful if you plan to access the files from the command line or other programs.
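
As an example of the web server case, a stored relative path can be turned into a full URL with the standard `net/url` package. This is a sketch: `FullURL` is a hypothetical helper and `https://example.com/site/` is a placeholder base address.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"path/filepath"
)

// FullURL joins a stored relative file path onto a server base URL.
func FullURL(base, rel string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// ToSlash converts OS-specific separators; path.Join cleans "./" and "../".
	u.Path = path.Join("/", u.Path, filepath.ToSlash(rel))
	u.RawPath = "" // let String() derive the escaped form from Path
	return u.String(), nil
}

func main() {
	full, err := FullURL("https://example.com/site/", "docs/intro.html")
	if err != nil {
		panic(err)
	}
	fmt.Println(full) // prints https://example.com/site/docs/intro.html
}
```

For the absolute-path case, calling `filepath.Abs` on each argument before passing it to tagloader gives the same effect from a wrapper script or program.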

tagquery

tagquery searches the database, and can also command the database to shut down.

  -completeMatch
        Do not return partial matches
  -fingerprint
        Display the tag fingerprint for each result
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")
  -shutdown
        Shutdown the server
  -status
        Report status
-completeMatch

By default, tagdb shows you partial matches. If a record matches only some of the tags you provided, it is still returned, with a lower score than a record that matches all of them. This is slower and clutters up the results, so you can request -completeMatch, which returns only records that match every search term you provided.
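
The difference between partial and complete matching can be sketched with a simple scoring function. This assumes the score is the number of query tags found on a record, which is consistent with the sample output in the Start section but is not confirmed by the source; `Hit` and `Search` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit pairs a record URL with its score (hypothetical type).
type Hit struct {
	URL   string
	Score int
}

// Search scores each record by how many query tags it carries. With
// completeMatch set, records missing any query tag are dropped.
func Search(records map[string][]string, query []string, completeMatch bool) []Hit {
	var hits []Hit
	for url, tags := range records {
		have := make(map[string]bool, len(tags))
		for _, t := range tags {
			have[t] = true
		}
		score := 0
		for _, q := range query {
			if have[q] {
				score++
			}
		}
		if score == 0 || (completeMatch && score < len(query)) {
			continue
		}
		hits = append(hits, Hit{URL: url, Score: score})
	}
	// Highest score first; ties broken by URL so the order is deterministic.
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].URL < hits[j].URL
	})
	return hits
}

func main() {
	records := map[string][]string{
		"README.md": {"quick", "brown", "fox"},
		"notes.txt": {"quick", "fix"},
	}
	query := []string{"quick", "brown", "fox"}
	fmt.Println(Search(records, query, false)) // prints [{README.md 3} {notes.txt 1}]
	fmt.Println(Search(records, query, true))  // prints [{README.md 3}]
}
```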

-shutdown

Order the server to quit. This will take several seconds or minutes, depending on which storage layer you chose for your data.

-status

Print some server statistics.

tagserver

tagserver is the main database, which listens for JSON-RPC requests and serves answers.
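
For readers unfamiliar with JSON-RPC in Go, the sketch below shows one request and reply using the standard `net/rpc/jsonrpc` package (JSON-RPC 1.0). Everything specific here is an assumption: the service name `Searcher`, the method `Find`, and the `Query` and `Reply` types are invented for illustration and are not tagserver's real API. It runs the server and client in one process over `net.Pipe`, so no running tagserver is needed.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
	"sort"
	"strings"
)

// Searcher is a hypothetical RPC service; tagserver's real method names may differ.
type Searcher struct{ docs map[string]string }

type Query struct{ Words []string }
type Reply struct{ URLs []string }

// Find returns every URL whose text contains all of the query words.
func (s *Searcher) Find(q Query, r *Reply) error {
	for url, text := range s.docs {
		all := true
		for _, w := range q.Words {
			if !strings.Contains(text, w) {
				all = false
				break
			}
		}
		if all {
			r.URLs = append(r.URLs, url)
		}
	}
	sort.Strings(r.URLs)
	return nil
}

// RoundTrip serves docs over an in-memory connection and sends one JSON-RPC call.
func RoundTrip(docs map[string]string, words ...string) ([]string, error) {
	srv := rpc.NewServer()
	if err := srv.RegisterName("Searcher", &Searcher{docs: docs}); err != nil {
		return nil, err
	}
	clientConn, serverConn := net.Pipe()
	go srv.ServeCodec(jsonrpc.NewServerCodec(serverConn))

	client := jsonrpc.NewClient(clientConn)
	defer client.Close()

	var reply Reply
	err := client.Call("Searcher.Find", Query{Words: words}, &reply)
	return reply.URLs, err
}

func main() {
	docs := map[string]string{"a.txt": "quick brown fox", "b.txt": "lazy dog"}
	urls, err := RoundTrip(docs, "quick", "fox")
	fmt.Println(urls, err) // prints [a.txt] <nil>
}
```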

  -config string
        Config file to load settings from (default "tagdb.conf")
  -cpuprofile string
        write cpu profile to file
  -debug
        Print extra debugging information.  Default: false
  -preAlloc int
        Allocate this many entries at startup.  Default: 1000000 (default 1000000)
-config

Read a configuration file. The default file is "tagdb.conf", in the current directory.

-preAlloc

If the database files run out of room, they must be extended, and this takes some time. Preallocating entries at startup can reduce how often this happens. Only implemented for some storage methods.

fetchbot

fetchbot crawls a website and adds it to the database.

  -match string
        Only follow URLs that match this regular expression
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")

Example:

./fetchbot --match "rock" -debug https://www.rockpapershotgun.com/
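
The effect of -match can be sketched with the standard `regexp` package. This assumes the pattern is applied as an unanchored match against the whole URL, which the flag description suggests but does not state; `FollowLinks` is a hypothetical helper, not fetchbot's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// FollowLinks keeps only the links that match pattern anywhere in the URL.
func FollowLinks(pattern string, links []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var keep []string
	for _, l := range links {
		if re.MatchString(l) {
			keep = append(keep, l)
		}
	}
	return keep, nil
}

func main() {
	links := []string{
		"https://www.rockpapershotgun.com/news",
		"https://example.com/",
	}
	keep, err := FollowLinks("rock", links)
	fmt.Println(keep, err) // prints [https://www.rockpapershotgun.com/news] <nil>
}
```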

Documentation

There is no documentation for this package.

Directories

Path              Synopsis
cmd
csearchquery      status.go
csearchshell      status.go
indexer           loader.go
pick              status.go
tagloader         loader.go
tagquery          status.go
tagserver         tagserver.go
tagshell          status.go
silo
silo
lsmkv/entities    ent contains common types used throughout various lsmkv (sub-)packages
lsmkv/roaringset  The "roaringset" package contains all the LSM business logic that is unique to the "RoaringSet" strategy
