Documentation ¶
Overview ¶
Package search provides the core search results.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ElasticSearch ¶
type ElasticSearch struct {
*document.ElasticSearch
}
ElasticSearch embeds our main Elasticsearch instance
func (*ElasticSearch) Fetch ¶
func (e *ElasticSearch) Fetch(q string, lang language.Tag, region language.Region, number int, offset int, votes []vote.Result) (*Results, error)
Fetch returns search results for a search query.

https://www.elastic.co/guide/en/elasticsearch/guide/current/one-lang-docs.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_single_query_string.html#know-your-data

The idea here is to first filter out docs that do not want to be indexed. We then search multiple fields for the search query, giving more weight to certain fields: domain > path, path > title, title > description. We search both the standard analyzer and the language-specific analyzer. We also give extra weight to bigram matches (whether trigrams would help is an open question): https://www.elastic.co/guide/en/elasticsearch/guide/current/shingles.html

Note: "It is not useful to mix not_analyzed fields with analyzed fields in multi_match queries."

TODO: a better domain name method; we could use a regex ('.*hendrix'), a prefix query, etc.
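The weighting scheme above can be sketched as an Elasticsearch query body. The field names, boost values, and the `index` filter flag below are illustrative assumptions, not the package's actual mapping:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildQuery sketches the kind of boosted multi_match body Fetch might
// send to Elasticsearch. Field names and boosts are assumptions made for
// illustration only.
func buildQuery(q string) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				// First filter out docs that do not want to be indexed.
				"filter": map[string]any{
					"term": map[string]any{"index": true},
				},
				"must": map[string]any{
					// Search several analyzed fields, weighting
					// domain > path > title > description.
					"multi_match": map[string]any{
						"query": q,
						"fields": []string{
							"domain^4", "path^3", "title^2", "description",
							// hypothetical language-specific analyzer fields
							"title.lang", "description.lang",
						},
					},
				},
			},
		},
	}
	return json.MarshalIndent(body, "", "  ")
}

func main() {
	b, err := buildQuery("jimi hendrix")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```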
type Fetcher ¶
type Fetcher interface {
Fetch(q string, lang language.Tag, region language.Region, number int, page int, votes []vote.Result) (*Results, error)
}
Fetcher outlines the methods used to retrieve the core search results
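Because callers depend on the Fetcher interface rather than the ElasticSearch type directly, a test double can stand in for the real backend. A minimal sketch, with the external language.Tag, language.Region, vote.Result, and Results types reduced to local stand-ins so it is self-contained:

```go
package main

import "fmt"

// Results is a local stand-in for the package's Results type.
type Results struct{ Count int64 }

// fetcher mirrors the shape of the Fetcher interface, with the external
// parameter types simplified to strings and ints for illustration.
type fetcher interface {
	Fetch(q, lang, region string, number, page int) (*Results, error)
}

// stubFetcher is a hypothetical test double satisfying fetcher.
type stubFetcher struct{ results *Results }

func (s *stubFetcher) Fetch(q, lang, region string, number, page int) (*Results, error) {
	return s.results, nil
}

func main() {
	var f fetcher = &stubFetcher{results: &Results{Count: 42}}
	r, err := f.Fetch("hendrix", "en", "US", 25, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Count) // prints 42
}
```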
type Results ¶
type Results struct {
	Count      int64                `json:"count"`
	Page       string               `json:"page"`
	Previous   string               `json:"previous"`
	Next       string               `json:"next"`
	Last       string               `json:"last"`
	Pagination []string             `json:"-"`
	Documents  []*document.Document `json:"links"`
}
Results are the core search results from a query
func (*Results) AddPagination ¶
AddPagination adds pagination to the search results
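The docs do not show AddPagination's signature or link format, so the following is only a sketch of the underlying arithmetic: deriving the last page and previous/next page numbers from a hit count and page size (all names here are hypothetical):

```go
package main

import "fmt"

// paginate derives page numbers from a result count, in the spirit of
// AddPagination. The real method's signature and link strings are not
// shown in the docs; this is an assumption for illustration only.
func paginate(count int64, perPage, current int) (pages []int, prev, next int) {
	// Ceiling division gives the last page number.
	last := int((count + int64(perPage) - 1) / int64(perPage))
	for p := 1; p <= last; p++ {
		pages = append(pages, p)
	}
	prev, next = current-1, current+1
	if prev < 1 {
		prev = 0 // no previous page
	}
	if next > last {
		next = 0 // no next page
	}
	return pages, prev, next
}

func main() {
	pages, prev, next := paginate(53, 10, 2)
	fmt.Println(pages, prev, next) // prints [1 2 3 4 5 6] 1 3
}
```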
Directories ¶
Path | Synopsis
---|---
crawler | Package crawler is a distributed web crawler.
cmd | Command crawler demonstrates how to run the crawler
queue | Package queue manages the queue for a distributed crawler
robots | Package robots handles caching robots.txt files
document | Package document parses URLs and the HTML of a webpage
vote | Package vote handles storing and retrieving user votes on urls