Rummage

package module
v0.0.0-...-1b26253
Warning: This package is not in the latest version of its module.
Published: Sep 20, 2022 License: Apache-2.0 Imports: 22 Imported by: 0

README

Rummage

Rummage is an open, collaborative, and decentralised search mechanism for IPFS. A Rummage node can crawl content on IPFS and add it to the index, which is itself stored in a decentralised manner on IPFS.

Rummage currently supports parsing PDFs and HTML files stored on IPFS.

How Rummage Works

Rummage divides the crawled data into two levels of indices.

  • High Level Index (HLI)
  • Key Word Index (KWI)

The High Level Index stores, for each keyword, the CID of that keyword's KWI, turning the two levels into a highly indexable mesh. The HLI is updated whenever a new crawl is submitted to the network.
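To make the two-level lookup concrete, here is a minimal Go sketch. It assumes an in-memory map stands in for IPFS and that content IDs are hex SHA-256 digests rather than real CIDs; the names store, hli, kwi and lookup are hypothetical and are not part of this package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// store is a hypothetical in-memory stand-in for IPFS: content is addressed
// by the hex SHA-256 of its JSON bytes (not a real IPFS CID).
type store map[string][]byte

func (s store) put(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	id := hex.EncodeToString(sum[:])
	s[id] = b
	return id
}

func (s store) get(id string, v any) error {
	b, ok := s[id]
	if !ok {
		return fmt.Errorf("no content for %s", id)
	}
	return json.Unmarshal(b, v)
}

// hli maps each keyword to the content ID of that keyword's KWI.
type hli map[string]string

// kwi lists the documents (by content ID) that contain one keyword.
type kwi []string

// lookup resolves a keyword in two hops: HLI -> KWI content ID -> document IDs.
func lookup(s store, index hli, keyword string) ([]string, error) {
	kwiID, ok := index[keyword]
	if !ok {
		return nil, nil // keyword has never been crawled
	}
	var docs kwi
	if err := s.get(kwiID, &docs); err != nil {
		return nil, err
	}
	return docs, nil
}

func main() {
	s := store{}
	index := hli{
		"ipfs":   s.put(kwi{"doc-a", "doc-b"}),
		"search": s.put(kwi{"doc-b"}),
	}
	docs, err := lookup(s, index, "ipfs")
	fmt.Println(docs, err) // [doc-a doc-b] <nil>
}
```

Keeping only KWI pointers in the HLI means a search fetches one small HLI entry plus the single KWI it needs, rather than the whole index.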

The client supports two actions:

  • Search
  • Crawl

Search queries the index, while Crawl lets clients submit crawl data to the network.

Installation

git clone https://github.com/adekunle-oOo/rummage.git

Install dependencies

sudo apt-get install g++ make autoconf automake libtool autoconf-archive \
    pkg-config libpng-dev libjpeg8-dev libtiff5-dev zlib1g-dev
# Build and install Leptonica, the image-processing library Tesseract depends on
wget http://www.leptonica.org/source/leptonica-1.81.1.tar.gz
tar xf leptonica-1.81.1.tar.gz
cd leptonica-1.81.1 && \
    ./configure && \
    make && \
    sudo make install && \
    sudo ldconfig
sudo apt-get install tesseract-ocr libtesseract-dev

Install Packages

go get -t github.com/otiai10/gosseract
go get github.com/ipfs/go-ipfs-api
go get github.com/wealdtech/go-ens/v3
go get github.com/otiai10/gosseract/v2

Build

go build Rummage/CLI/.

Run (BETA)

./CLI

Documentation

Constants

const (
	StopCharacter = "\r\n\r\n"
)

Variables

var (
	Shell  *ipfsapi.Shell
	Client *ethclient.Client
	HLI    string
)

Global variables used throughout the package.

Functions

func ConnectClient

func ConnectClient(Infura string, HLI string, ip string, port int, passW string) (*ipfsapi.Shell, *ethclient.Client)

ConnectClient sets up the local connections (IPFS, Ethereum gateway, gateway server address, etc.) used by clients of the CLI application.

func ConnectServer

func ConnectServer(Infura string, HLI string) (*ipfsapi.Shell, *ethclient.Client)

ConnectServer sets up the local connections (IPFS, Ethereum gateway, etc.) used by the gateway server that runs the web interface.

func CreateIndexEntryClient1

func CreateIndexEntryClient1(data []string, cid string) error

CreateIndexEntryClient1 creates index entries for the document cid from the client.

func CreateIndexEntryServer1

func CreateIndexEntryServer1(data []string, cid string) error

CreateIndexEntryServer1 creates index entries from the server. It checks whether an entry exists in the HLI and whether an entry for the specific document exists in the KWI; if not, it adds the entry and publishes it to IPNS.
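The check-then-add flow described above can be sketched with in-memory maps. This is an illustration under stated assumptions, not the package's implementation: the index type, the kwi/<keyword> naming scheme, and the publish counter (standing in for an IPNS publish) are all hypothetical.

```go
package main

import "fmt"

// Hypothetical in-memory model of the two index levels:
// hli maps keyword -> KWI name, kwis maps KWI name -> set of document CIDs.
type index struct {
	hli       map[string]string
	kwis      map[string]map[string]bool
	published int // counts simulated IPNS publishes
}

func newIndex() *index {
	return &index{hli: map[string]string{}, kwis: map[string]map[string]bool{}}
}

// addEntry looks the keyword up in the HLI, then looks the document up in
// that keyword's KWI, and only writes (and "publishes") when something is missing.
func (ix *index) addEntry(keyword, cid string) bool {
	kwiName, ok := ix.hli[keyword]
	if !ok {
		kwiName = "kwi/" + keyword // hypothetical naming scheme
		ix.hli[keyword] = kwiName
		ix.kwis[kwiName] = map[string]bool{}
	}
	if ix.kwis[kwiName][cid] {
		return false // already indexed: nothing to publish
	}
	ix.kwis[kwiName][cid] = true
	ix.published++ // stand-in for republishing the updated index to IPNS
	return true
}

func main() {
	ix := newIndex()
	for _, kw := range []string{"ipfs", "search", "ipfs"} {
		fmt.Println(kw, ix.addEntry(kw, "QmDoc"))
	}
	fmt.Println("publishes:", ix.published) // publishes: 2
}
```

Skipping the write when the document is already present avoids an unnecessary IPNS publish, which is the comparatively slow step.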

func DoCrawlClient

func DoCrawlClient(name string, t string) error

DoCrawlClient initiates a crawl of name for content type t from the client, and uses the server to update the HLI entry on IPNS. Files are currently stored locally; future releases will use a temporary directory (tmpdir).

func DoCrawlServer

func DoCrawlServer(name string, t string) error

DoCrawlServer initiates a crawl of name for content type t on the server, and therefore performs the IPNS name-publishing step locally. Files are currently stored locally; future releases will use a temporary directory (tmpdir).

func DoSearchClient

func DoSearchClient(searchTerms []string) error

DoSearchClient runs a search for searchTerms from the client.

func ExtractPdfDataOCR

func ExtractPdfDataOCR(name string) ([]string, error)

ExtractPdfDataOCR extracts keywords from the PDF name using Tesseract OCR.

Types

type IncorrrectInput

type IncorrrectInput struct{}

func (*IncorrrectInput) Error

func (zz *IncorrrectInput) Error() string

type QueryResult

type QueryResult struct {
	SearchTerm string `json:"searchTerm"`
	CID        string `json:"CID"`
	Metadata   string `json:"metadata"`
}

QueryResult is the structure used to return search results.

func DoSearch1

func DoSearch1(query string) ([]QueryResult, error)

DoSearch1 runs query from the gateway server and returns the matching results.
