proxy

package module
v1.0.0
Published: Jan 12, 2017 License: MIT Imports: 15 Imported by: 1

README

Go SOLR Proxy

Production-ready, Docker-friendly proxy/document cache/load balancer for SOLR servers.

Why?

We at Trip.com work heavily with SOLR, and we had some really painful use cases that we wanted to solve.

1. Load Balancing

(Diagram: SOLR load balancing)

We use multiple SOLR slaves and we want a way to load balance between them.

Also, while backing up an instance we take down Jetty for that instance, so SOLR is effectively down as far as the application is concerned. We want to put that instance in standby mode until the backup is finished.

2. Single configurable endpoint

Since we are using SOLR heavily, we want to be able to switch out servers without deploying to production or changing configuration files. We want a single, load-balanced access point to all servers, the same endpoint for the master and all the slaves.

This way, we can reindex a full cluster from scratch and just swap it into production with no configuration change in the app.

3. Partial update document cache

Implementing partial updates in SOLR is crucial for indexing speed.

Say you want to update a single field across all documents: without partial updates, you have to reindex your entire cluster, which can take days.

SOLR partially supports this, allowing you to update a field in a document; however, there is a limitation (Read Here).

The linked documentation describes this limitation pretty clearly:

(Screenshot: SOLR partial updates limitation, from the linked documentation)

Getting around that limitation

ElasticSearch bypasses this limitation smartly:

In Updating a Whole Document, we said that the way to update a document is to retrieve it, change it, and then reindex the whole document. This is true. However, using the update API, we can make partial updates like incrementing a counter in a single request.

We also said that documents are immutable: they cannot be changed, only replaced. The update API must obey the same rules. Externally, it appears as though we are partially updating a document in place. Internally, however, the update API simply manages the same retrieve-change-reindex process that we have already described. The difference is that this process happens within a shard, thus avoiding the network overhead of multiple requests. By reducing the time between the retrieve and reindex steps, we also reduce the likelihood of there being conflicting changes from other processes.

Here's the plan:

Each of our documents starts like this:

<add><doc boost="0.5"><field name="id">Restaurant 4000000690324</field>

As you can see, there's a unique id for that document.

(Diagram: the proxy saving the document)

Now that the proxy sits in the middle, it will save the document to S3 under Restaurant/4000000690324 and also forward the request to SOLR for the actual update.

Now you have a document cache that you can grab a document from and update a single field. Since you send that document through the same pipeline, the new version also ends up on S3 and in SOLR.

Since there is no database connection and no rebuilding of the XML when reindexing a document, the update process is very fast.
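
The pieces of that pipeline are exposed as exported functions in this package. The sketch below shows roughly how an update body can be parsed and cached; the import path, core name, and bucket values are assumptions, not taken from the project.

package main

import (
	"fmt"
	"log"

	proxy "github.com/example/solr_proxy" // hypothetical import path; use the real module path
)

func main() {
	// The same update body the application would normally POST to SOLR.
	body := []byte(`<add><doc boost="0.5"><field name="id">Restaurant 4000000690324</field></doc></add>`)

	add, err := proxy.ParseXMLDocument(body)
	if err != nil {
		log.Fatal(err)
	}

	// For the document above this should yield "Restaurant" and "4000000690324",
	// which together form the S3 key Restaurant/4000000690324.
	name, id := add.GetNameAndId()
	fmt.Println(name, id)

	doc := add.GetSolrDocument("restaurants") // core name is an example value

	awsConfig := &proxy.AWSConfig{
		RegionName:   "us-west-2",
		S3Endpoint:   "https://s3-us-west-2.amazonaws.com",
		BucketName:   "my-solr-documents", // example bucket
		BucketPrefix: "cache",             // example prefix
	}

	// Cache stores the document in S3 so a single field can later be updated
	// without rebuilding the XML from the database.
	if err := doc.Cache(awsConfig); err != nil {
		log.Fatal(err)
	}
}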

Usage


usage: solr_proxy [<flags>]

Flags:
  --help           Show help (also see --help-long and --help-man).
  --listen-port=LISTEN-PORT
                   Which port should the proxy listen on
  --master=MASTER  Location to your master server
  --slaves=SLAVES  Comma separated list of servers that act as slaves
  --aws-region="us-west-2"
                   Which AWS region should it use for the cache
  --aws-endpoint="https://s3-us-west-2.amazonaws.com"
                   AWS Endpoint for s3
  --bucket-name=BUCKET-NAME
                   What's the bucket name you want to save the documents in
  --log-location="stdout"
                   Where do you want to keep logs
  --bucket-prefix=BUCKET-PREFIX
                   Prefix after the bucket name before the filename
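
For example, a proxy in front of one master and two slaves might be started like this (host names, port, and bucket name are placeholder values):

  solr_proxy --listen-port=8984 \
    --master=http://solr-master:8983 \
    --slaves=http://solr-slave-1:8983,http://solr-slave-2:8983 \
    --bucket-name=my-solr-documents \
    --log-location=stdout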

View configuration when running

When the proxy is running, it exposes a web-accessible configuration JSON.
You can access it by going to /proxy/configuration.

It looks like this:

(Screenshot: example /proxy/configuration output)
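
A minimal sketch of reading it from a client, assuming the proxy listens on port 8984 (a placeholder value):

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// The proxy serves its current configuration as JSON at /proxy/configuration.
	resp, err := http.Get("http://localhost:8984/proxy/configuration")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}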

Deployment (Docker)

Check out docker for more details.

CHANGELOG

View Here

Status

This is under active development at Trip.com.

We are currently testing it on staging and on a small percentage of production traffic.
Once it is handling all production traffic, I will release 1.0 with some benchmarks and data.

Contributors

  • @kensodev Github
  • @kenegozi Github Even though Ken is not in the commit logs (yet), he contributed a lot: we paired, and he provided plenty of feedback and insights.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AWSConfig

type AWSConfig struct {
	RegionName    string
	S3Endpoint    string
	BucketName    string
	BucketPrefix  string
	DisableUpload bool
}

type Add

type Add struct {
	Doc Document `xml:"doc"`
	// contains filtered or unexported fields
}

func ParseXMLDocument

func ParseXMLDocument(content []byte) (*Add, error)

func (*Add) GetNameAndId

func (d *Add) GetNameAndId() (string, string)

func (*Add) GetSolrDocument

func (d *Add) GetSolrDocument(CoreName string) *SolrDocument

type Configuration added in v0.5.0

type Configuration struct {
	// contains filtered or unexported fields
}

func NewConfigurationRender added in v0.5.0

func NewConfigurationRender(proxyConfig *ProxyConfig) (config *Configuration)

func (*Configuration) ServeHTTP added in v0.5.0

func (c *Configuration) ServeHTTP(w http.ResponseWriter, req *http.Request)

type DocField

type DocField struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Document

type Document struct {
	Field []DocField `xml:"field"`
}

type Proxy

type Proxy struct {
	// contains filtered or unexported fields
}

func NewProxy

func NewProxy(proxyConfig *ProxyConfig) *Proxy

func (*Proxy) ServeHTTP

func (p *Proxy) ServeHTTP(w http.ResponseWriter, req *http.Request)

type ProxyConfig

type ProxyConfig struct {
	Master      string
	Slaves      []string
	AwsConfig   *AWSConfig
	LogLocation string
}
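
For reference, a minimal sketch of wiring these types together and serving the proxy over HTTP; the import path, server addresses, listen port, and bucket name are assumptions.

package main

import (
	"log"
	"net/http"

	proxy "github.com/example/solr_proxy" // hypothetical import path; use the real module path
)

func main() {
	cfg := &proxy.ProxyConfig{
		Master: "http://solr-master:8983",
		Slaves: []string{"http://solr-slave-1:8983", "http://solr-slave-2:8983"},
		AwsConfig: &proxy.AWSConfig{
			RegionName: "us-west-2",
			S3Endpoint: "https://s3-us-west-2.amazonaws.com",
			BucketName: "my-solr-documents",
		},
		LogLocation: "stdout",
	}

	// Proxy implements http.Handler via ServeHTTP, so it can be served directly.
	p := proxy.NewProxy(cfg)
	log.Fatal(http.ListenAndServe(":8984", p))
}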

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

func NewReader

func NewReader(slaves []string) (r *Reader)

func (*Reader) ServeHTTP

func (reader *Reader) ServeHTTP(w http.ResponseWriter, req *http.Request)

type RequestReader

type RequestReader struct {
	*bytes.Buffer
}

func (RequestReader) Close

func (m RequestReader) Close() error

type SolrDocument

type SolrDocument struct {
	Core string
	Id   string
	Name string
	// contains filtered or unexported fields
}

func (*SolrDocument) Cache

func (d *SolrDocument) Cache(awsConfig *AWSConfig) error

func (*SolrDocument) GetDocumentName added in v0.2.0

func (d *SolrDocument) GetDocumentName(awsConfig *AWSConfig) string

type Updater

type Updater struct {
	// contains filtered or unexported fields
}

func NewUpdater

func NewUpdater(master string) (updater *Updater)

func (*Updater) ServeHTTP

func (updater *Updater) ServeHTTP(w http.ResponseWriter, req *http.Request, awsConfig *AWSConfig, CoreName string)

Directories

Path Synopsis
cmd
