search-engine

module
v0.0.0-...-c4765d5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 19, 2018 License: MIT

README

Search Engine in Go

MIT license Build Status

This is experimental project baked with go-colly and blevesearch to build search engine in Go.

Prerequisite

To use package in blevex (blevesearch's extension), it is required to install C dependency as following steps.

This is verified against Debian GNU/Linux 8.11 (jessie)

$ sudo apt-get install libleveldb-dev libstemmer-dev libicu-dev build-essential
$ cd ~
$ git clone https://github.com/blevesearch/cld2.git
$ cd cld2/internal/
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

Minimum requirement for ICU analyzer and tokenizer, libicu-dev only is required. Thus, go ahead and simply install it by issuing this command in prompt

$ sudo apt-get install libicu-dev

Installation

Use git clone or go get to download project to your go workspace in $GOPATH then run dep ensure to initialise project.


$ go get github.com/atthakorn/search-engine
$ cd $GOPATH/src/github.com/atthakorn/search-engine
$ dep ensure

Config

Here is the list of parameter you can find in .config.yml


# sites to crawl
entryPoint:
- https://en.wikipedia.org/wiki/Main_Page


# max depth for crawler to follow
maxDepth: 2

# max worker
parallelism: 1

# random delay (second)
delay: 1


# path to store scraped data from crawler
dataPath: "data/crawl"

# path to store indexed data
indexPath: "data/index"


# http address, 0.0.0.0:8080 or :8080, it means listening all ipv4 address in local machine
httpAddress: ":8080"

Crawling & Indexing

To crawling websites, just cd to project root and run

$ go run cmd/crawl/main.go

Once, crawl complete, the scraped data will be kept at ./data/crawl/*.json

Now you can index crawler's scraped data by issuing following command

$ go run cmd/index/main.go

The data will be indexed by boltdb, the file will be located at ./data/index/*

Start Server

Search Engine comes with simple search application , you can start http server by following command

$ go run cmd/http/main.go

and you can open application via browser at http://localhost:8080

License

This project is licensed under MIT License

Directories

Path Synopsis
cmd
internal
web

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL