tagdb

command module

Version: v0.0.0-...-72cd24a | Published: Dec 4, 2023 | License: GPL-3.0 | Imports: 9 | Imported by: 0

Repository

github.com/donomii/tagdb

tagdb

Tagdb is a text search engine that offers fast word completion and real-time searches.

Tagdb is an external search engine (a pure inverted index) that stores URLs and tags, allowing you to index files, webpages, and anything else you can reference via a URL. It can also store line numbers for a file, allowing you to jump straight to your search result.
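To make the "pure inverted index" idea concrete, here is a minimal sketch in Go. It is not tagdb's actual storage code: the Index type and its Add and Lookup methods are hypothetical names chosen for illustration, and splitting text on whitespace is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Index is a hypothetical inverted index: each tag maps to the set of
// URLs that carry it.
type Index map[string]map[string]struct{}

// Add records every whitespace-separated word of text as a tag for url.
func (ix Index) Add(url, text string) {
	for _, tag := range strings.Fields(strings.ToLower(text)) {
		if ix[tag] == nil {
			ix[tag] = make(map[string]struct{})
		}
		ix[tag][url] = struct{}{}
	}
}

// Lookup returns the URLs stored under tag, sorted for stable output.
func (ix Index) Lookup(tag string) []string {
	var urls []string
	for u := range ix[strings.ToLower(tag)] {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	ix := Index{}
	ix.Add("README.md", "the quick brown fox")
	ix.Add("otherfiles/testsearch.txt", "quick brown dog")
	fmt.Println(ix.Lookup("quick"))
}
```

Because the index only maps tags to references, the indexed content itself can live anywhere a URL can point to.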

Installation

Build

go build -o release/tagshell cmd/tagshell/tagshell.go
go build -o release/tagquery cmd/tagquery/tagquery.go 
go build -o release/tagserver cmd/tagserver/tagserver.go
go build -o release/tagloader cmd/tagloader/tagloader.go

Start

then, from the release directory, start tagserver

./tagserver &

then load some files

./tagloader -verbose .

then run a search with

./tagquery quick brown fox

and you will see

./tagquery quick brown fox
2017/04/06 18:47:01 Searching for [quick brown fox]
3: otherfiles/testsearch.txt(1)
3: README.md(29)
2017/04/06 18:47:01 Search complete

Use

tagshell

tagshell is a simple command-line interface that uses predictive, real-time search to list your results and jump to them.

Start typing your search until you see the results you want, then press the down arrow to select the result you want to examine. Press the right arrow to open that file.

tagloader

tagloader recursively scans files and directories, indexing their contents.

  -addRecord
        Add record from the command line
  -debug
        Display additional debug information
  -noContents
        Do not look inside files
  -parallel int
        Maximum number of simultaneous inserts to attempt (default 1)
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")
  -verbose
        Show files as they are loaded

-verbose will print every filename as it is scanned.

By default, tagloader will treat the entire contents of the file as one "search result". It reads the entire file, building a tag list, and then stores that list. There are two options to control this:

-noContents will ignore the file contents and only store the file path (split up by usual word boundaries). Searches will only return a file if your search word occurs in the file name. -noContents is handy for indexing things like mp3 collections and photographs, where the contents contain no text.

-everyLine will store every line in a text file separately, so search results can return multiple lines in the same file. You can then jump to the correct line using programs like tagshell.
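The two modes above can be sketched in Go as follows. This is an illustration only, not tagloader's code: the Record type and the splitTags, pathRecord and lineRecords helpers are hypothetical, and treating every non-letter, non-digit character as a word boundary is an assumption about what "usual word boundaries" means.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
)

// splitTags lowercases s and splits it wherever a rune is neither a
// letter nor a digit (an assumed definition of "word boundary").
func splitTags(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Record is a hypothetical search result: a path, an optional line
// number, and the tags stored for it.
type Record struct {
	Path string
	Line int // 0 means the record covers the whole file
	Tags []string
}

// pathRecord mimics the idea behind -noContents: only the path is indexed.
func pathRecord(path string) Record {
	return Record{Path: path, Tags: splitTags(path)}
}

// lineRecords mimics the idea behind -everyLine: one record per
// non-empty line, each remembering its line number.
func lineRecords(path, contents string) []Record {
	var recs []Record
	sc := bufio.NewScanner(strings.NewReader(contents))
	for n := 1; sc.Scan(); n++ {
		if tags := splitTags(sc.Text()); len(tags) > 0 {
			recs = append(recs, Record{Path: path, Line: n, Tags: tags})
		}
	}
	return recs
}

func main() {
	fmt.Println(pathRecord("music/Daft Punk - One_More.mp3").Tags)
	for _, r := range lineRecords("notes.txt", "quick fox\n\nlazy dog\n") {
		fmt.Printf("%s(%d): %v\n", r.Path, r.Line, r.Tags)
	}
}
```

Storing the line number with each record is what lets a front end such as tagshell open the file at the matching line.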

Tagloader creates a record in the database using the path to the file (based on the command line argument). It does no further processing of the path, and won't even normalise it. So if you give it a relative path, it will store relative paths, which will make it difficult to find the file again if you search for it while in another directory.

Relative paths are useful for things like indexing a webserver directory, so you can later build a full URL from the relative path and the server name. Absolute paths are more useful if you plan to access the files from the command line or other programs.
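As a sketch of both uses, the Go snippet below builds a full URL from a stored relative path and, separately, converts a relative path to an absolute one before indexing. The recordURL helper and the example.com base URL are hypothetical; tagdb itself does not provide them.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"path/filepath"
)

// recordURL joins a server base URL and a stored relative path.
// It is a hypothetical helper, not part of tagdb.
func recordURL(base, rel string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// Convert OS-specific separators to slashes, then join onto the base path.
	u.Path = path.Join("/", u.Path, filepath.ToSlash(rel))
	return u.String(), nil
}

func main() {
	full, err := recordURL("http://example.com/files/", "docs/readme.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(full)

	// For command-line use, resolve the path against the current
	// directory before handing it to the loader.
	abs, err := filepath.Abs("docs/readme.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(abs)
}
```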

tagquery

tagquery searches the database, and can also command the database to shut down.

  -completeMatch
        Do not return partial matches
  -fingerprint
        Display the tag fingerprint for each result
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")
  -shutdown
        Shutdown the server
  -status
        Report status

-completeMatch

By default, tagdb returns partial matches: a record that matches only some of the tags you provided is still returned, with a lower score than a record that matches all of them. Partial matching is slower and clutters the results, so you can request -completeMatch, which returns only records that match all of your search terms.
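The difference between the two modes can be sketched in Go as below. This assumes that a record's score is simply the number of search terms it matches, and that a complete match means every search term is present; the score and matches functions are hypothetical and may not reflect how tagserver actually ranks results.

```go
package main

import "fmt"

// score counts how many of the query tags appear among the record's
// tags (an assumed scoring rule, for illustration only).
func score(query, recordTags []string) int {
	have := make(map[string]bool, len(recordTags))
	for _, t := range recordTags {
		have[t] = true
	}
	n := 0
	for _, q := range query {
		if have[q] {
			n++
		}
	}
	return n
}

// matches reports whether a record would be returned. With
// completeMatch set, every query tag must be present; otherwise one
// matching tag is enough.
func matches(query, recordTags []string, completeMatch bool) bool {
	s := score(query, recordTags)
	if completeMatch {
		return s == len(query)
	}
	return s > 0
}

func main() {
	query := []string{"quick", "brown", "fox"}
	full := []string{"the", "quick", "brown", "fox"}
	partial := []string{"quick", "brown", "dog"}

	fmt.Println(score(query, full), matches(query, full, true))
	fmt.Println(score(query, partial), matches(query, partial, false), matches(query, partial, true))
}
```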

-shutdown

Order the server to quit. This will take several seconds or minutes, depending on which storage layer you chose for your data.

-status

Print some server statistics.

tagserver

tagserver is the main database, which listens for JSON-RPC requests and serves answers.

  -config string
        Config file to load settings from (default "tagdb.conf")
  -cpuprofile string
        write cpu profile to file
  -debug
        Print extra debugging information.  Default: false
  -preAlloc int
        Allocate this many entries at startup.  Default: 1000000 (default 1000000)

-config

Read a configuration file. The default file is "tagdb.conf", in the current directory.

-preAlloc

If the database files run out of room, they must be extended, which takes some time. Preallocating entries at startup reduces how often this happens. Only implemented for some storage methods.

fetchbot

fetchbot crawls a website and adds it to the database.

  -match string
        Only follow URLs that match this regular expression
  -server string
        Server IP and Port.  Default: 127.0.0.1:6781 (default "127.0.0.1:6781")

Example:

./fetchbot --match "rock" -debug https://www.rockpapershotgun.com/
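The effect of -match can be sketched in Go with the standard regexp package. The filterLinks function and the sample links are hypothetical; this shows the general technique of filtering discovered URLs against a regular expression, not fetchbot's own implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterLinks keeps only the links that match the given regular
// expression. It is a hypothetical helper, not taken from fetchbot.
func filterLinks(pattern string, links []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var keep []string
	for _, l := range links {
		if re.MatchString(l) {
			keep = append(keep, l)
		}
	}
	return keep, nil
}

func main() {
	links := []string{
		"https://www.rockpapershotgun.com/about",
		"https://example.com/elsewhere",
	}
	keep, err := filterLinks("rock", links)
	if err != nil {
		panic(err)
	}
	fmt.Println(keep)
}
```

Since the expression is matched anywhere in the URL, a pattern such as "rock" keeps the crawl on the example site as long as no off-site URL happens to contain the same text.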

Source Files

main.go

Directories

Path	Synopsis
cmd
csearchquery	status.go
csearchshell	status.go
fetchbot
indexer	loader.go
pick	status.go
tagloader	loader.go
tagquery	status.go
tagserver	tagserver.go
tagshell	status.go
tagbrowser	silo
lsmkv
lsmkv/entities	ent contains common types used throughout various lsmkv (sub-)packages
lsmkv/rbtree
lsmkv/roaringset	The "roaringset" package contains all the LSM business logic that is unique to the "RoaringSet" strategy
lsmkv/segmentindex
