solrdump

package module

v0.1.11 (latest) · Published: Mar 16, 2024 · License: GPL-3.0 · Imports: 8 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Not yet a stable version: a module is considered stable from major version v1, and this release is v0.1.11.

Repository

github.com/ubleipzig/solrdump

README

Export documents from a SOLR index as JSON, quickly and simply, from the command line.

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

Requesting a large number of documents from SOLR can lead to deep paging problems:

When you wish to fetch a very large number of sorted results from Solr to feed into an external system, using very large values for the start or rows parameters can be very inefficient.

See also: Fetching A Large Number of Sorted Results: Cursors

As an alternative to increasing the "start" parameter to request subsequent pages of sorted results, Solr supports using a "Cursor" to scan through results. Cursors in Solr are a logical concept, that doesn't involve caching any state information on the server. Instead the sort values of the last document returned to the client are used to compute a "mark" representing a logical point in the ordered space of sort values.
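
The cursor idea can be tried out without a server. The following Go sketch is a simplified in-memory simulation, not Solr's algorithm: documents are reduced to their sorted ids, and, as an assumption made for illustration, the mark is just the last id returned instead of the opaque encoded sort values Solr produces. As with Solr, the scan starts with the mark * and ends when the next mark equals the mark that was sent. The names page and scanAll are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// page returns up to rows ids that sort strictly after mark, plus the next
// mark. "*" means "start from the beginning", as in Solr.
// Assumption: the mark is simply the last id returned; real Solr marks are
// opaque, encoded sort values.
func page(sortedIDs []string, mark string, rows int) ([]string, string) {
	start := 0
	if mark != "*" {
		// First position whose id sorts after the mark.
		start = sort.Search(len(sortedIDs), func(i int) bool { return sortedIDs[i] > mark })
	}
	end := start + rows
	if end > len(sortedIDs) {
		end = len(sortedIDs)
	}
	docs := sortedIDs[start:end]
	if len(docs) == 0 {
		// No more results: hand back the mark that was sent.
		return docs, mark
	}
	return docs, docs[len(docs)-1]
}

// scanAll walks all ids page by page and stops once the mark no longer
// changes, the termination rule Solr documents for cursors.
func scanAll(sortedIDs []string, rows int) []string {
	var all []string
	mark := "*"
	for {
		docs, next := page(sortedIDs, mark, rows)
		all = append(all, docs...)
		if next == mark {
			return all
		}
		mark = next
	}
}

func main() {
	fmt.Println(scanAll([]string{"a", "b", "c", "d", "e"}, 2))
}
```

Note that the last request always returns no documents: that empty page, with an unchanged mark, is what tells the client it is done.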

Requirements

SOLR 4.7 or higher, since the cursor mechanism was introduced in SOLR 4.7 (2014-02-25); see also efficient deep paging with cursors.

This project has been developed for Project finc at Leipzig University Library.

Installation

Via a Debian or RPM package.

Or via go tool:

$ go install github.com/ubleipzig/solrdump/cmd/solrdump@latest

Usage

$ solrdump -h
Usage of solrdump:
  -fl string
        field or fields to export, separate multiple values by comma
  -q string
        SOLR query (default "*:*")
  -rows int
        number of rows returned per request (default 1000)
  -server string
        SOLR server, host, port and collection (default "http://localhost:8983/solr/example")
  -sort string
        sort order (only unique fields allowed) (default "id asc")
  -verbose
        show progress
  -version
        show version and exit

Export id and title field for all documents:

$ solrdump -server https://localhost:8983/solr/biblio -q '*:*' -fl id,title
{"id":"0000001864","title":"Veröffentlichungen des Museums für Völkerkunde zu Leipzig"}
{"id":"0000002001","title":"Festschrift zur Feier des 500jährigen Bestehens der ... /"}
...

Export documents matching a query and postprocess with jq:

$ solrdump -server https://localhost:8983/solr/biblio -q 'title:"topic model"' -fl id,title | \
  jq -r .title | \
  head -10

A generic approach to topic models and its application to virtual communities /
Topic models for image retrieval on large scale databases
On the use of language models and topic models in the web new algorithms for filtering, ...
Integration von Topic Models und Netzwerkanalyse bei der Bestimmung des Kundenwertes
Time dynamic topic models /
...

Instant search as a one-liner

Using solrdump + jq + fzf (or peco).

$ solrdump -server http://solr.io/solr/biblio -q 'title:"leipzig"' -fl 'id,source_id,title' | \
    jq -rc '[.source_id, .title[:80]] | @tsv' | fzf -e

...

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func PrependSchema

func PrependSchema(s string) string

PrependSchema prepends the http:// scheme to s if it is missing.
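
The documentation states only the behavior, not the rule used to detect a missing scheme. A hypothetical sketch, assuming s counts as having a scheme when it starts with http:// or https://; it is an illustration, not the package's source:

```go
package main

import (
	"fmt"
	"strings"
)

// prependSchema is a hypothetical reimplementation for illustration only.
// Assumption: a string "has a scheme" if it starts with http:// or https://.
func prependSchema(s string) string {
	if strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://") {
		return s
	}
	return "http://" + s
}

func main() {
	fmt.Println(prependSchema("localhost:8983/solr/example"))
	fmt.Println(prependSchema("https://localhost:8983/solr/biblio"))
}
```

A helper like this lets a server address given as host:port/path be used directly as a request URL.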

Types

type Dumper

type Dumper struct {
	Writer                      io.Writer
	Server                      string
	Fields                      string
	Sort                        string
	Query                       string
	NumRows                     int
	Wt                          string
	SkipCertificateVerification bool
	Verbose                     bool
}

Dumper runs a data extraction from a SOLR server.

func (*Dumper) Run

func (d *Dumper) Run() error

type Response

type Response struct {
	Header struct {
		Status int `json:"status"`
		QTime  int `json:"QTime"`
		Params struct {
			Query      string `json:"q"`
			CursorMark string `json:"cursorMark"`
			Sort       string `json:"sort"`
			Rows       string `json:"rows"`
		} `json:"params"`
	} `json:"header"`
	Response struct {
		NumFound int               `json:"numFound"`
		Start    int               `json:"start"`
		Docs     []json.RawMessage `json:"docs"` // dependent on SOLR schema
	} `json:"response"`
	NextCursorMark string `json:"nextCursorMark"`
}

Response is a SOLR response.
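
A sketch decoding a hand-written sample page into a copy of the Response type documented above; the sample data and the helper decode are made up for illustration. One detail worth checking: standard Solr JSON responses name the header object responseHeader, while the documented tag is header, so encoding/json leaves Header zero-valued for such input. NumFound, Docs and NextCursorMark, which a cursor loop relies on, decode normally.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response is a copy of the type documented above.
type Response struct {
	Header struct {
		Status int `json:"status"`
		QTime  int `json:"QTime"`
		Params struct {
			Query      string `json:"q"`
			CursorMark string `json:"cursorMark"`
			Sort       string `json:"sort"`
			Rows       string `json:"rows"`
		} `json:"params"`
	} `json:"header"`
	Response struct {
		NumFound int               `json:"numFound"`
		Start    int               `json:"start"`
		Docs     []json.RawMessage `json:"docs"` // dependent on SOLR schema
	} `json:"response"`
	NextCursorMark string `json:"nextCursorMark"`
}

// sample is hand-written example data in the shape of a Solr JSON page.
const sample = `{
  "responseHeader": {"status": 0, "QTime": 1,
    "params": {"q": "*:*", "cursorMark": "*", "sort": "id asc", "rows": "2"}},
  "response": {"numFound": 5, "start": 0, "docs": [{"id": "1"}, {"id": "2"}]},
  "nextCursorMark": "example-mark"
}`

// decode unmarshals one response page.
func decode(b []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(b, &r)
	return r, err
}

func main() {
	r, err := decode([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Response.NumFound, len(r.Response.Docs), r.NextCursorMark)
	for _, d := range r.Response.Docs {
		fmt.Println(string(d)) // raw document JSON, schema-dependent
	}
	// The sample's key is "responseHeader", not "header", so this is empty.
	fmt.Printf("header query: %q\n", r.Header.Params.Query)
}
```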

Source Files

dump.go

Directories

Path	Synopsis
cmd/solrdump	https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
