README

PeerDB Search is an opinionated open source search system incorporating best practices in search and user interfaces/experience to provide intuitive, fast, and easy-to-use search over semantic, structured, and full-text data. Its user interface automatically adapts to the data and search results and provides relevant filters. The goal of the user interface is to allow users without technical knowledge to easily find the results they want, without having to write queries.

Demos:

Installation

You can run PeerDB Search behind a reverse proxy (which should support HTTP2), or simply run it directly (it is safe to do so). PeerDB Search is compiled into a single backend binary with the frontend files embedded, so the backend serves those as well.

The releases page contains a list of stable versions. Each includes:

  • A statically compiled binary.
  • Docker images.
  • A Nix package.
Static binary

The latest stable statically compiled binary for Linux (amd64) is available at:

https://gitlab.com/peerdb/search/-/releases/permalink/latest/downloads/linux-amd64/peerdb-search

You can also download older versions on the releases page.

The latest successfully built development (main branch) binary is available at:

https://gitlab.com/peerdb/search/-/jobs/artifacts/main/raw/peerdb-search-linux-amd64?job=docker

Docker

Docker images for stable versions are available as:

registry.gitlab.com/peerdb/search/tag/<version>:latest

<version> is a version string with . replaced with -. E.g., v0.1.0 becomes v0-1-0.
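
For illustration, a minimal Go sketch of this tag mangling (the helper is ours, not part of PeerDB Search):

package main

import (
	"fmt"
	"strings"
)

// imageForVersion maps a release version like "v0.1.0" to the Docker
// image path described above, replacing "." with "-" in the tag path.
func imageForVersion(version string) string {
	return "registry.gitlab.com/peerdb/search/tag/" + strings.ReplaceAll(version, ".", "-") + ":latest"
}

func main() {
	fmt.Println(imageForVersion("v0.1.0"))
	// Output: registry.gitlab.com/peerdb/search/tag/v0-1-0:latest
}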

The Docker image contains only the PeerDB Search binary, which is the image's entrypoint. If you need a shell as well, use the debug version of the image:

registry.gitlab.com/peerdb/search/tag/<version>:latest-debug

In that case you have to override the entrypoint (i.e., pass --entrypoint sh to docker run).

The latest successfully built development (main branch) image is available as:

registry.gitlab.com/peerdb/search/branch/main:latest

Nix

You can build a binary yourself using Nix. For the latest stable version, run:

nix-build -E "with import <nixpkgs> { }; callPackage (import (fetchTarball https://gitlab.com/peerdb/search/-/releases/permalink/latest/downloads/nix/nix.tgz)) { }"

The built binary is available at ./result/bin/search.

If you download a nix.tgz file for an older version, you can build the binary with:

nix-build -E "with import <nixpkgs> { }; callPackage (import (fetchTarball file://$(pwd)/nix.tgz)) { }"

To build the latest development (main branch) binary, run:

nix-build -E "with import <nixpkgs> { }; callPackage (import (fetchTarball https://gitlab.com/peerdb/search/-/jobs/artifacts/main/raw/nix.tgz?job=nix)) { }"

Usage

PeerDB Search requires an ElasticSearch instance. To run one locally you can use Docker:

docker network create peerdb
docker run -d --network peerdb --name elasticsearch -p 127.0.0.1:9200:9200 \
 -e network.bind_host=0.0.0.0 -e network.publish_host=elasticsearch -e ES_JAVA_OPTS="-Xmx1000m" \
 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "ingest.geoip.downloader.enabled=false" \
 elasticsearch:7.16.3

Feel free to change any of the above parameters (e.g., remove ES_JAVA_OPTS if you have enough memory). The parameters above are primarily meant for development on a local machine.
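
Once the container is running, you can verify that ElasticSearch is reachable. A minimal Go sketch querying the root endpoint, which returns cluster information as JSON (curl http://localhost:9200 works equally well):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The local instance started above listens on 127.0.0.1:9200.
	resp, err := http.Get("http://localhost:9200")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Expect a 200 response with JSON describing the cluster.
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}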

The ElasticSearch instance needs an index containing documents in the PeerDB Search schema and configured with the PeerDB Search mapping. If you already have such an index, proceed to run PeerDB Search; otherwise first populate ElasticSearch with data.

Next, to run PeerDB Search you need an HTTPS TLS certificate (as required by HTTP2). When running locally you can use mkcert, a tool that creates a local CA keypair which is then used to create a TLS certificate. To install mkcert, use Go 1.19 or newer:

go install filippo.io/mkcert@latest
mkcert -install
mkcert localhost 127.0.0.1 ::1

This creates two files, localhost+2.pem and localhost+2-key.pem, which you can provide to PeerDB Search as:

./search -k localhost+2.pem -K localhost+2-key.pem

When running with Docker, you have to provide them to the container through a volume, e.g.:

docker run -d --network peerdb --name peerdb-search -p 8080:8080 -v "$(pwd):/data" \
 registry.gitlab.com/peerdb/search/branch/main:latest -e http://elasticsearch:9200 \
 -k /data/localhost+2.pem -K /data/localhost+2-key.pem

Open https://localhost:8080/ in your browser to access the web interface.

Temporarily accepting a self-signed certificate in the browser is not recommended because not all browser features work in that case. If you want to use a self-signed certificate instead of mkcert, add the certificate to your browser's certificate store.

When running directly on the public Internet (it is safe to do so), PeerDB Search can obtain an HTTPS TLS certificate from Let's Encrypt automatically:

docker run -d --network peerdb --name peerdb-search -p 443:8080 -v "$(pwd):/data" \
 registry.gitlab.com/peerdb/search/branch/main:latest -e http://elasticsearch:9200 \
 -D public.domain.example.com -E name@example.com -C /data/letsencrypt

PeerDB Search would then be available at https://public.domain.example.com.

When using Let's Encrypt you accept its Terms of Service.

Populating with data

The power of PeerDB Search comes from having data in the ElasticSearch index organized into documents following the PeerDB Search schema. The schema is designed to allow describing almost any data. Moreover, data and the properties used to describe data can be changed at runtime without having to reconfigure PeerDB Search, e.g., filters adapt automatically. The schema also allows multiple data sources to be used and merged together.

The PeerDB Search document schema is fully described in JSON Schema and is available here. At a high level, documents look like:

{
  "_id": "22 character ID",
  "name": {
    "en": "name in English"
  },
  "score": 1.0,
  "active": {
    "id": [
      {
        "_id": "22 character ID",
        "confidence": 1.0,
        "prop": {
          "_id": "22 character property ID",
          "name": {
            "en": "property name in English"
          },
          "score": 1.0
        },
        "id": "external ID value"
      }
    ],
    "ref": [...],
    "text": [...],
    "amount": [...],
    "rel": [...],
    "file": [...],
    "time": [...]
  }
}

Besides core metadata (_id, name, and score), all other data is organized into claims (seen under active claims above), which are in turn grouped by claim (data) type. For example, id claims are used to store external ID values, and prop is a reference to a property document which describes the ID value.
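
The Go types documented below mirror this schema. As a sketch, the skeleton above expressed in Go (assuming the gitlab.com/peerdb/search import path, and that Name is a language-to-string map, as its JSON form suggests):

package main

import (
	"fmt"

	"gitlab.com/peerdb/search"
)

func main() {
	doc := search.Document{
		CoreDocument: search.CoreDocument{
			ID:    search.Identifier("22 character ID"),
			Name:  search.Name{"en": "name in English"},
			Score: 1.0,
		},
		Active: &search.ClaimTypes{
			// An "id" claim storing an external ID value.
			Identifier: search.IdentifierClaims{
				{
					CoreClaim: search.CoreClaim{
						ID:         search.Identifier("22 character ID"),
						Confidence: 1.0,
					},
					Prop: search.DocumentReference{
						ID:    search.Identifier("22 character property ID"),
						Name:  search.Name{"en": "property name in English"},
						Score: 1.0,
					},
					Identifier: "external ID value",
				},
			},
		},
	}
	fmt.Println(doc.Reference().ID)
}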

Which properties you use and how you map your data to PeerDB Search documents is left to you. We do suggest that you first populate the index with the core PeerDB Search properties. You can do that by running:

./search populate

This also creates an ElasticSearch index if it does not yet exist and configures it with the PeerDB Search mapping. Otherwise you have to create such an index yourself.

Then you populate the index with documents for the properties of your data. For example, if you have a blog post like:

{
  "id": 123,
  "title": "Some title",
  "body": "Some <b>blog post</b> body HTML",
  "author": {
    "username": "foobar",
    "displayName": "Foo Bar"
  }
}

To convert the blog post:

  • You could create two documents, one for a title property and another for a body property. But you could also decide to map title to the core name metadata, and body to the existing DESCRIPTION (for shorter HTML contents, shown in search results as well) or ARTICLE (for longer HTML contents) core property (and its label HAS_ARTICLE).
  • Maybe you also want to record the original blog post ID and author username, so create property documents for them as well.
  • The author's displayName can be mapped to the name core metadata.
  • Another property document is needed for the author property.
  • Documents should also have additional claims to describe the relations between them. Properties should be marked as properties, together with the claim type they are meant to be used with. It is also useful to create properties for user documents and blog post documents, which can in turn be more general (core) items.

Assuming that the author does not yet have its document, you could convert the above blog post into the following two PeerDB Search documents:

{
  "_id": "LcrxeiU9XjxosmX8kiPCx6",
  "name": {
    "en": "Foo Bar"
  },
  "score": 1.0,
  "active": {
    "id": [
      {
        "_id": "9TfczNe5aa4LrKqWQWfnFF",
        "confidence": 1.0,
        "prop": {
          "_id": "Hx5zknvxsmPRiLFbGMPeiZ",
          "name": {
            "en": "author username"
          },
          "score": 1.0
        },
        "id": "foobar"
      }
    ],
    "rel": [
      {
        "_id": "sgpzxwxPyn51j5VfH992ZQ",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "6asppjBfRGSTt5Df7Zvomb",
          "name": {
            "en": "user"
          },
          "score": 1.0
        }
      }
    ]
  }
}
{
  "_id": "MpGZyd7grTBPYhMhETAuHV",
  "name": {
    "en": "Some title"
  },
  "score": 1.0,
  "active": {
    "id": [
      {
        "_id": "Ci3A1tLF6MHZ4y5zBibvGg",
        "confidence": 1.0,
        "prop": {
          "_id": "8mu7vrUK7zJ4Me2JwYUG6t",
          "name": {
            "en": "blog post ID"
          },
          "score": 1.0
        },
        "id": "123"
      }
    ],
    "text": [
      {
        "_id": "VdX1HZm1ETw8K77nLTV6yt",
        "confidence": 1.0,
        "prop": {
          "_id": "FJJLydayUgDuqFsRK2ZtbR",
          "name": {
            "en": "article"
          },
          "score": 1.0
        },
        "html": {
          "en": "Some <b>blog post</b> body HTML"
        }
      }
    ],
    "rel": [
      {
        "_id": "xbufQEChDXvtg3hh4i1PvT",
        "confidence": 1.0,
        "prop": {
          "_id": "fmUeT7JN8qPuFw28Vdredm",
          "name": {
            "en": "author"
          },
          "score": 1.0
        },
        "to": {
          "_id": "LcrxeiU9XjxosmX8kiPCx6",
          "name": {
            "en": "Foo Bar"
          },
          "score": 1.0
        }
      },
      {
        "_id": "LJNg7QaiMxE1crjMiijpaN",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "3APZXnK3uofpdEJV55Po18",
          "name": {
            "en": "blog post"
          },
          "score": 1.0
        }
      },
      {
        "_id": "UxhYEJY6mpA147eujZ489B",
        "confidence": 1.0,
        "prop": {
          "_id": "5SoFeEFk5aWXUYFC1EZFec",
          "name": {
            "en": "label"
          },
          "score": 1.0
        },
        "to": {
          "_id": "MQYs7JmAR3tge25eTPS8XT",
          "name": {
            "en": "has article"
          },
          "score": 1.0
        }
      }
    ]
  }
}
Some additional documents need to exist in the index as well:
{
  "_id": "Hx5zknvxsmPRiLFbGMPeiZ",
  "name": {
    "en": "author username"
  },
  "score": 1.0,
  "active": {
    "rel": [
      {
        "_id": "5zZZ6nJFKuA5oNBu9QbsYY",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HohteEmv2o7gPRnJ5wukVe",
          "name": {
            "en": "property"
          },
          "score": 1.0
        }
      },
      {
        "_id": "jjcWxq9VoVLhKqV2tnqz1A",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "UJEVrqCGa9f3vAWi2mNWc7",
          "name": {
            "en": "\"identifier\" claim type"
          },
          "score": 1.0
        }
      }
    ]
  }
}
{
  "_id": "8mu7vrUK7zJ4Me2JwYUG6t",
  "name": {
    "en": "blog post ID"
  },
  "score": 1.0,
  "active": {
    "rel": [
      {
        "_id": "DWYDFZ2DbasS4Tyehnko2U",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HohteEmv2o7gPRnJ5wukVe",
          "name": {
            "en": "property"
          },
          "score": 1.0
        }
      },
      {
        "_id": "ivhonZLA2ktDwyMawBLuKV",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "UJEVrqCGa9f3vAWi2mNWc7",
          "name": {
            "en": "\"identifier\" claim type"
          },
          "score": 1.0
        }
      }
    ]
  }
}
{
  "_id": "fmUeT7JN8qPuFw28Vdredm",
  "name": {
    "en": "author"
  },
  "score": 1.0,
  "active": {
    "rel": [
      {
        "_id": "gK8nXxJ3AXErTmGPoAVF78",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HohteEmv2o7gPRnJ5wukVe",
          "name": {
            "en": "property"
          },
          "score": 1.0
        }
      },
      {
        "_id": "pDBnb32Vd2VPd2UjPnd1eS",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HZ41G8fECg4CjfZN4dmYwf",
          "name": {
            "en": "\"relation\" claim type"
          },
          "score": 1.0
        }
      }
    ]
  }
}
{
  "_id": "6asppjBfRGSTt5Df7Zvomb",
  "name": {
    "en": "user"
  },
  "score": 1.0,
  "active": {
    "rel": [
      {
        "_id": "79m7fNMHy7SRinSmB3WARM",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HohteEmv2o7gPRnJ5wukVe",
          "name": {
            "en": "property"
          },
          "score": 1.0
        }
      },
      {
        "_id": "EuGC7ZRK7weuHNoZLDC8ah",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "6HpkLLj1iSK3XBhgHpc6n3",
          "name": {
            "en": "item"
          },
          "score": 1.0
        }
      }
    ]
  }
}
{
  "_id": "3APZXnK3uofpdEJV55Po18",
  "name": {
    "en": "blog post"
  },
  "score": 1.0,
  "active": {
    "rel": [
      {
        "_id": "c24VwrPEMwZUhRgECzSn1b",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "HohteEmv2o7gPRnJ5wukVe",
          "name": {
            "en": "property"
          },
          "score": 1.0
        }
      },
      {
        "_id": "faByPL18Y1ZH2NAhea4FBy",
        "confidence": 1.0,
        "prop": {
          "_id": "2fjzZyP7rv8E4aHnBc6KAa",
          "name": {
            "en": "is"
          },
          "score": 1.0
        },
        "to": {
          "_id": "6HpkLLj1iSK3XBhgHpc6n3",
          "name": {
            "en": "item"
          },
          "score": 1.0
        }
      }
    ]
  }
}

To populate search with The Museum of Modern Art (MoMA) artists and artworks (from this dataset), clone the repository and run (you need Go 1.19 or newer):

make moma
./moma

Runtime is a few minutes. If you also want to add articles (to have more full-text data) and (more) images from MoMA's website, run instead:

./moma --website-data

Fetching data from the website takes time, so runtime is around 12 hours.

To populate search with English Wikipedia articles, Wikimedia Commons files, and Wikidata data, clone the repository and run (you need Go 1.19 or newer):

make wikipedia
./wikipedia

This will do multiple passes:

  • wikidata downloads the Wikidata dump and imports its data into search (70 GB download, runtime 2 days).
  • commons-files populates search with Wikimedia Commons files from the images table SQL dump (10 GB download, runtime 1 day).
  • wikipedia-files populates search with Wikipedia files from the images table SQL dump (100 MB download, runtime 10 minutes).
  • commons (20 GB download, runtime 3 days).
  • wikipedia-articles downloads the Wikipedia articles HTML dump and imports the articles (100 GB download, runtime 0.5 days).
  • wikipedia-file-descriptions downloads the Wikipedia files HTML dump and imports file descriptions (2 GB download, runtime 1 hour).
  • wikipedia-categories downloads the Wikipedia categories HTML dump and imports their articles as descriptions (2 GB download, runtime 1 hour).
  • wikipedia-templates uses the API to fetch data about Wikipedia templates (runtime 0.5 days).
  • commons-file-descriptions uses the API to fetch descriptions of Wikimedia Commons files (runtime 35 days).
  • commons-categories uses the API to fetch data about Wikimedia Commons categories (runtime 4 days).
  • commons-templates uses the API to fetch data about Wikimedia Commons templates (runtime 2.5 hours).
  • prepare goes over the imported documents and processes them for PeerDB Search (runtime 6 days).
  • optimize forces merging of ElasticSearch segments (runtime a few hours).

The whole process requires a substantial amount of disk space (at least 1.5 TB), bandwidth, and time (weeks). Because of this you might want to run only a subset of the passes.

To populate only with Wikidata (references to Wikimedia Commons files will not be available):

./wikipedia wikidata
./wikipedia prepare

To populate with Wikidata and with basic metadata of Wikimedia Commons files:

./wikipedia wikidata
./wikipedia commons-files
./wikipedia prepare

Or maybe you also want English Wikipedia articles:

./wikipedia wikidata
./wikipedia commons-files
./wikipedia wikipedia-articles
./wikipedia prepare

Configuration

PeerDB Search can be configured through CLI arguments and a config file. CLI arguments take precedence over the config file. The config file is a YAML file whose structure corresponds to the structure of the CLI flags and commands. Run ./search --help for a list of available flags and commands. If no command is specified, the serve command is the default.

Each PeerDB Search instance can serve multiple sites, and Let's Encrypt can be used to obtain HTTPS TLS certificates for them automatically. An example config file for all demos is available in demos.yml. It configures the sites, their titles, and the ElasticSearch index for each site. To use the config file with Docker, you could do:

docker run -d --network peerdb --name peerdb-search -p 443:8080 -v "$(pwd):/data" \
 registry.gitlab.com/peerdb/search/branch/main:latest -e http://elasticsearch:9200 \
 -E name@example.com -C /data/letsencrypt -c /data/demos.yml
Size of documents filter

PeerDB Search can filter on the size of documents, but this requires the mapper-size ElasticSearch plugin to be installed and the size field to be enabled in the index. If you use the populate command to create the index, you can enable the size field with the --size-field argument:

./search --size-field populate

Alternatively, you can set sizeField in the site configuration.

If you use Docker to run ElasticSearch, you can create a custom Docker image with the plugin installed, or install the plugin in your running container:

docker exec -t -i elasticsearch bin/elasticsearch-plugin install mapper-size
docker restart elasticsearch
Use with ElasticSearch alias

If you use an ElasticSearch alias instead of an index, PeerDB Search will provide a filter for documents based on the index they come from.

Development

During PeerDB Search development, run the backend and frontend as separate processes. The backend proxies frontend requests to Vite, which compiles the frontend files and serves them, hot-reloading the frontend as necessary.

Backend

The backend is implemented in Go (requires Go 1.19 or newer) and provides an HTTP2 API. Node 16 or newer is required as well.

Clone the repository and run:

make search
./search -d -k localhost+2.pem -K localhost+2-key.pem

localhost+2.pem and localhost+2-key.pem are the TLS certificate files generated as described in the Usage section. The backend listens at https://localhost:8080/.

The -d CLI argument makes the backend proxy unknown (non-API) requests to the frontend. In this mode, placeholders in HTML files are not rendered.

You can also run make watch to reload the backend on file changes. You have to install CompileDaemon first:

go install github.com/githubnemo/CompileDaemon@latest
Frontend

The frontend is implemented in TypeScript and Vue; during development we use Vite. Vite compiles the frontend files and serves them. It also watches frontend files for changes, recompiles them, and hot-reloads the frontend as necessary. Node 16 or newer is required.

To install all dependencies and run the frontend for development:

npm install
npm run serve

Open https://localhost:8080/ in your browser. This connects you to the backend, which proxies unknown (non-API) requests to the frontend.

GitHub mirror

There is also a read-only GitHub mirror available, if you need to fork the project there.

Funding

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.

Documentation

Index

Constants

const (
	ActiveClaimThreshold = 0.5
)

Variables

var (

	// CoreProperties is a map from a core property ID to a document describing it.
	CoreProperties = map[string]Document{}
)

Functions

func EnsureIndex

func EnsureIndex(ctx context.Context, httpClient *http.Client, logger zerolog.Logger, url, index string, sizeField bool) (*elastic.Client, errors.E)

EnsureIndex creates an instance of the ElasticSearch client and makes sure the index for PeerDB documents exists, creating it if needed. It does not update the configuration of an existing index if it differs from what the current implementation of EnsureIndex would create.
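
A minimal sketch of calling EnsureIndex (the ElasticSearch URL, index name, and zerolog setup are illustrative assumptions):

package main

import (
	"context"
	"net/http"
	"os"

	"github.com/rs/zerolog"

	"gitlab.com/peerdb/search"
)

func main() {
	logger := zerolog.New(os.Stderr).With().Timestamp().Logger()
	// Create the "docs" index with the PeerDB Search mapping if it
	// does not exist yet (without the size field enabled).
	esClient, errE := search.EnsureIndex(context.Background(), http.DefaultClient, logger, "http://localhost:9200", "docs", false)
	if errE != nil {
		logger.Fatal().Err(errE).Msg("cannot ensure index")
	}
	_ = esClient
}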

func GenerateCoreProperties added in v0.2.0

func GenerateCoreProperties(properties []struct {
	Name            string
	DescriptionHTML string
	Is              []string
},
)

func GetClient

func GetClient(httpClient *http.Client, logger zerolog.Logger, url string) (*elastic.Client, errors.E)

func InsertOrReplaceDocument added in v0.2.0

func InsertOrReplaceDocument(processor *elastic.BulkProcessor, index string, doc *Document)

InsertOrReplaceDocument inserts or replaces the document based on its ID.
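
Continuing the EnsureIndex sketch above, documents are queued through an elastic.BulkProcessor (assuming the olivere/elastic client API for creating one):

// Reuses esClient and logger from the EnsureIndex sketch.
processor, err := esClient.BulkProcessor().Workers(2).Do(context.Background())
if err != nil {
	logger.Fatal().Err(err).Msg("cannot start bulk processor")
}
defer processor.Close()

doc := &search.Document{} // e.g., a document built as shown in the README above
search.InsertOrReplaceDocument(processor, "docs", doc)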

func SaveCoreProperties added in v0.2.0

func SaveCoreProperties(ctx context.Context, log zerolog.Logger, esClient *elastic.Client, processor *elastic.BulkProcessor, index string) errors.E

func UpdateDocument added in v0.2.0

func UpdateDocument(processor *elastic.BulkProcessor, index string, seqNo, primaryTerm int64, doc *Document)

UpdateDocument updates the document in the index, if it has not changed in the database since it was fetched (based on seqNo and primaryTerm).

func ValidAmountUnit

func ValidAmountUnit(unit string) bool

Types

type AmountClaim

type AmountClaim struct {
	CoreClaim

	Prop   DocumentReference `json:"prop"`
	Amount float64           `json:"amount"`
	Unit   AmountUnit        `json:"unit"`
}

type AmountClaims

type AmountClaims = []AmountClaim

type AmountRangeClaim

type AmountRangeClaim struct {
	CoreClaim

	Prop  DocumentReference `json:"prop"`
	Lower float64           `json:"lower"`
	Upper float64           `json:"upper"`
	Unit  AmountUnit        `json:"unit"`
}

type AmountRangeClaims

type AmountRangeClaims = []AmountRangeClaim

type AmountUnit

type AmountUnit int
const (
	AmountUnitCustom AmountUnit = iota
	AmountUnitNone
	AmountUnitRatio
	AmountUnitKilogramPerKilogram
	AmountUnitKilogram
	AmountUnitKilogramPerCubicMetre
	AmountUnitMetre
	AmountUnitSquareMetre
	AmountUnitMetrePerSecond
	AmountUnitVolt
	AmountUnitWatt
	AmountUnitPascal
	AmountUnitCoulomb
	AmountUnitJoule
	AmountUnitCelsius
	AmountUnitRadian
	AmountUnitHertz
	AmountUnitDollar
	AmountUnitByte
	AmountUnitPixel
	AmountUnitSecond
)

func (AmountUnit) MarshalJSON

func (u AmountUnit) MarshalJSON() ([]byte, error)

func (*AmountUnit) UnmarshalJSON

func (u *AmountUnit) UnmarshalJSON(b []byte) error

type Claim

type Claim interface {
	GetID() Identifier
	GetConfidence() Confidence
	AddMeta(claim Claim) errors.E
	GetMetaByID(id Identifier) Claim
	RemoveMetaByID(id Identifier) Claim
	VisitMeta(visitor visitor) errors.E
}

type ClaimTypes

type ClaimTypes struct {
	Identifier   IdentifierClaims   `json:"id,omitempty"`
	Reference    ReferenceClaims    `json:"ref,omitempty"`
	Text         TextClaims         `json:"text,omitempty"`
	String       StringClaims       `json:"string,omitempty"`
	Amount       AmountClaims       `json:"amount,omitempty"`
	AmountRange  AmountRangeClaims  `json:"amountRange,omitempty"`
	Relation     RelationClaims     `json:"rel,omitempty"`
	File         FileClaims         `json:"file,omitempty"`
	NoValue      NoValueClaims      `json:"none,omitempty"`
	UnknownValue UnknownValueClaims `json:"unknown,omitempty"`
	Time         TimeClaims         `json:"time,omitempty"`
	TimeRange    TimeRangeClaims    `json:"timeRange,omitempty"`
}

func (*ClaimTypes) Size

func (c *ClaimTypes) Size() int

func (*ClaimTypes) Visit

func (c *ClaimTypes) Visit(visitor visitor) errors.E

type Confidence

type Confidence = Score

type CoreClaim

type CoreClaim struct {
	ID         Identifier  `json:"_id"`
	Confidence Confidence  `json:"confidence"`
	Meta       *ClaimTypes `json:"meta,omitempty"`
}

func (*CoreClaim) AddMeta

func (cc *CoreClaim) AddMeta(claim Claim) errors.E

func (CoreClaim) GetConfidence

func (cc CoreClaim) GetConfidence() Confidence

func (CoreClaim) GetID

func (cc CoreClaim) GetID() Identifier

func (*CoreClaim) GetMeta

func (cc *CoreClaim) GetMeta(propID Identifier) []Claim

func (*CoreClaim) GetMetaByID

func (cc *CoreClaim) GetMetaByID(id Identifier) Claim

func (*CoreClaim) RemoveMetaByID

func (cc *CoreClaim) RemoveMetaByID(id Identifier) Claim

func (*CoreClaim) VisitMeta

func (cc *CoreClaim) VisitMeta(visitor visitor) errors.E

type CoreDocument

type CoreDocument struct {
	ID     Identifier `json:"-"`
	Name   Name       `json:"name"`
	Score  Score      `json:"score"`
	Scores Scores     `json:"scores,omitempty"`
}

type Document

type Document struct {
	CoreDocument

	Mnemonic Mnemonic    `json:"mnemonic,omitempty"`
	Active   *ClaimTypes `json:"active,omitempty"`
	Inactive *ClaimTypes `json:"inactive,omitempty"`
}

func (*Document) Add

func (d *Document) Add(claim Claim) errors.E

func (*Document) AllClaims

func (d *Document) AllClaims() []Claim

func (*Document) Get

func (d *Document) Get(propID Identifier) []Claim

func (*Document) GetByID

func (d *Document) GetByID(id Identifier) Claim

func (Document) Reference

func (d Document) Reference() DocumentReference

func (*Document) Remove

func (d *Document) Remove(propID Identifier) []Claim

func (*Document) RemoveByID

func (d *Document) RemoveByID(id Identifier) Claim

func (*Document) Visit

func (d *Document) Visit(visitor visitor) errors.E
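
A short sketch of adding and looking up claims (assuming Add initializes the document's claim containers as needed; *IdentifierClaim satisfies Claim through its embedded CoreClaim):

doc := search.Document{}
claim := &search.IdentifierClaim{
	CoreClaim: search.CoreClaim{
		ID:         search.Identifier("9TfczNe5aa4LrKqWQWfnFF"),
		Confidence: 1.0,
	},
	Prop: search.DocumentReference{
		ID:    search.Identifier("Hx5zknvxsmPRiLFbGMPeiZ"),
		Name:  search.Name{"en": "author username"},
		Score: 1.0,
	},
	Identifier: "foobar",
}
if errE := doc.Add(claim); errE != nil {
	panic(errE)
}
// Get returns all claims with the given property ID.
for _, c := range doc.Get(search.Identifier("Hx5zknvxsmPRiLFbGMPeiZ")) {
	fmt.Println(c.GetID(), c.GetConfidence())
}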

type DocumentReference

type DocumentReference struct {
	ID     Identifier `json:"_id"`
	Name   Name       `json:"name"`
	Score  Score      `json:"score"`
	Scores Scores     `json:"scores,omitempty"`
}

func GetCorePropertyReference added in v0.2.0

func GetCorePropertyReference(mnemonic string) DocumentReference

type FileClaim

type FileClaim struct {
	CoreClaim

	Prop    DocumentReference `json:"prop"`
	Type    string            `json:"type"`
	URL     string            `json:"url"`
	Preview []string          `json:"preview,omitempty"`
}

type FileClaims

type FileClaims = []FileClaim

type Handler

type Handler func(http.ResponseWriter, *http.Request, Params)

type Identifier

type Identifier string

func GetCorePropertyID added in v0.2.0

func GetCorePropertyID(mnemonic string) Identifier

func GetID

func GetID(namespace uuid.UUID, args ...interface{}) Identifier
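
A hedged sketch of deriving an identifier (which uuid package the module uses is an assumption; github.com/google/uuid is shown, and the namespace value is made up):

// GetID presumably derives a stable identifier from the namespace and
// arguments, so re-running an import yields the same IDs for the same inputs.
namespace := uuid.MustParse("12345678-1234-5678-1234-567812345678")
id := search.GetID(namespace, "blog post", 123)
fmt.Println(id)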

type IdentifierClaim

type IdentifierClaim struct {
	CoreClaim

	Prop       DocumentReference `json:"prop"`
	Identifier string            `json:"id"`
}

type IdentifierClaims

type IdentifierClaims = []IdentifierClaim

type Mnemonic

type Mnemonic string

type Name

type NoValueClaim

type NoValueClaim struct {
	CoreClaim

	Prop DocumentReference `json:"prop"`
}

type NoValueClaims

type NoValueClaims = []NoValueClaim

type Params

type Params map[string]string

type ReferenceClaim

type ReferenceClaim struct {
	CoreClaim

	Prop DocumentReference `json:"prop"`
	IRI  string            `json:"iri"`
}

type ReferenceClaims

type ReferenceClaims = []ReferenceClaim

type RelationClaim

type RelationClaim struct {
	CoreClaim

	Prop DocumentReference `json:"prop"`
	To   DocumentReference `json:"to"`
}

type RelationClaims

type RelationClaims = []RelationClaim

type Router

type Router struct {
	NotFound         Handler
	MethodNotAllowed Handler
	NotAcceptable    Handler
	Panic            func(w http.ResponseWriter, req *http.Request, err interface{})
	// contains filtered or unexported fields
}

func NewRouter

func NewRouter() *Router

func (*Router) APIPath added in v0.2.0

func (r *Router) APIPath(name string, params Params, query string) (string, errors.E)

func (*Router) Error

func (r *Router) Error(w http.ResponseWriter, req *http.Request, code int)

func (*Router) Handle

func (r *Router) Handle(name, method, path string, api bool, handler Handler) errors.E

func (*Router) Path

func (r *Router) Path(name string, params Params, query string) (string, errors.E)

func (*Router) ServeHTTP

func (r *Router) ServeHTTP(w http.ResponseWriter, req *http.Request)
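
A sketch of registering a handler (the route name, path, and serving setup are illustrative; HTTP2 in practice requires TLS as described in the README):

router := search.NewRouter()
errE := router.Handle("Healthz", http.MethodGet, "/healthz", false, func(w http.ResponseWriter, req *http.Request, _ search.Params) {
	w.WriteHeader(http.StatusOK)
})
if errE != nil {
	panic(errE)
}
// Router implements http.Handler through ServeHTTP.
_ = http.ListenAndServe("localhost:8080", router)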

type Score

type Score float64

type Scores

type Scores map[string]Score

Score name to score mapping.

type Service

type Service struct {
	ESClient       *elastic.Client
	Log            zerolog.Logger
	Sites          map[string]Site
	Development    string
	Version        string
	BuildTimestamp string
	Revision       string
	Router         *Router
	// contains filtered or unexported fields
}

func NewService

func NewService(esClient *elastic.Client, log zerolog.Logger, version, buildTimestamp, revision string, sites map[string]Site, development string) (*Service, errors.E)
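
Continuing the earlier sketches, a hedged wiring of a service (the version strings and how the sites map is keyed are assumptions):

service, errE := search.NewService(esClient, logger, "v0.2.0", "2022-11-08T00:00:00Z", "abcdef", map[string]search.Site{
	"localhost:8080": {Index: "docs", Title: "PeerDB Search"},
}, "")
if errE != nil {
	logger.Fatal().Err(errE).Msg("cannot create service")
}
handler, errE := service.RouteWith(search.NewRouter(), "v0.2.0")
if errE != nil {
	logger.Fatal().Err(errE).Msg("cannot create handler")
}
_ = handler // serve it with your own HTTP2/TLS server setup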

func (*Service) BadRequest

func (s *Service) BadRequest(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) ConnContext

func (s *Service) ConnContext(ctx context.Context, c net.Conn) context.Context

func (*Service) DocumentGet added in v0.2.0

func (s *Service) DocumentGet(w http.ResponseWriter, req *http.Request, params Params)

DocumentGet is a GET/HEAD HTTP request handler which returns HTML frontend for a document given its ID as a parameter.

func (*Service) DocumentGetAPIGet added in v0.2.0

func (s *Service) DocumentGetAPIGet(w http.ResponseWriter, req *http.Request, params Params)

DocumentGetAPIGet is a GET/HEAD HTTP request handler which returns a document given its ID as a parameter. It supports compression based on accepted content encoding and range requests.

func (*Service) DocumentSearch added in v0.2.0

func (s *Service) DocumentSearch(w http.ResponseWriter, req *http.Request, _ Params)

DocumentSearch is a GET/HEAD HTTP request handler which returns HTML frontend for searching documents. If search state is invalid, it redirects to a valid one.

func (*Service) DocumentSearchAPIGet added in v0.2.0

func (s *Service) DocumentSearchAPIGet(w http.ResponseWriter, req *http.Request, _ Params)

DocumentSearchAPIGet is a GET/HEAD HTTP request handler which searches the ElasticSearch index using the provided search state and returns to the client a JSON array of IDs of found documents. If the search state is invalid, it returns correct query parameters as JSON. It supports compression based on accepted content encoding and range requests. It returns search metadata (e.g., total results) as PeerDB HTTP response headers.

func (*Service) DocumentSearchAPIPost added in v0.2.0

func (s *Service) DocumentSearchAPIPost(w http.ResponseWriter, req *http.Request, _ Params)

DocumentSearchAPIPost is a POST HTTP request handler which stores the search state and returns query parameters for the GET endpoint as JSON or redirects to the GET endpoint based on search ID.

func (*Service) DocumentSearchAmountFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchAmountFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchFiltersAPIGet added in v0.2.0

func (s *Service) DocumentSearchFiltersAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchIndexFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchIndexFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchRelFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchRelFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchSizeFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchSizeFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchStringFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchStringFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) DocumentSearchTimeFilterAPIGet added in v0.2.0

func (s *Service) DocumentSearchTimeFilterAPIGet(w http.ResponseWriter, req *http.Request, params Params)

func (*Service) HomeGet added in v0.2.0

func (s *Service) HomeGet(w http.ResponseWriter, req *http.Request, _ Params)

HomeGet is a GET/HEAD HTTP request handler which returns HTML frontend for the home page.

func (*Service) HomeGetAPIGet added in v0.2.0

func (s *Service) HomeGetAPIGet(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) ImmutableFile added in v0.2.0

func (s *Service) ImmutableFile(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) InternalServerError

func (s *Service) InternalServerError(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) MethodNotAllowed

func (s *Service) MethodNotAllowed(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) NotAcceptable

func (s *Service) NotAcceptable(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) NotFound

func (s *Service) NotFound(w http.ResponseWriter, req *http.Request, _ Params)

NotFound is an HTTP request handler which returns a 404 error to the client.

func (*Service) Proxy

func (s *Service) Proxy(w http.ResponseWriter, req *http.Request, _ Params)

func (*Service) RouteWith

func (s *Service) RouteWith(router *Router, version string) (http.Handler, errors.E)

func (*Service) StaticFile

func (s *Service) StaticFile(w http.ResponseWriter, req *http.Request, _ Params)

type Site added in v0.2.0

type Site struct {
	Domain string `json:"domain,omitempty"`
	Index  string `json:"index"`
	Title  string `json:"title"`
	// contains filtered or unexported fields
}

type StringClaim

type StringClaim struct {
	CoreClaim

	Prop   DocumentReference `json:"prop"`
	String string            `json:"string"`
}

type StringClaims

type StringClaims = []StringClaim

type TextClaim

type TextClaim struct {
	CoreClaim

	Prop DocumentReference      `json:"prop"`
	HTML TranslatableHTMLString `json:"html"`
}

type TextClaims

type TextClaims = []TextClaim

type TimeClaim

type TimeClaim struct {
	CoreClaim

	Prop      DocumentReference `json:"prop"`
	Timestamp Timestamp         `json:"timestamp"`
	Precision TimePrecision     `json:"precision"`
}

type TimeClaims

type TimeClaims = []TimeClaim

type TimePrecision

type TimePrecision int
const (
	TimePrecisionGigaYears TimePrecision = iota
	TimePrecisionHundredMegaYears
	TimePrecisionTenMegaYears
	TimePrecisionMegaYears
	TimePrecisionHundredKiloYears
	TimePrecisionTenKiloYears
	TimePrecisionKiloYears
	TimePrecisionHundredYears
	TimePrecisionTenYears
	TimePrecisionYear
	TimePrecisionMonth
	TimePrecisionDay
	TimePrecisionHour
	TimePrecisionMinute
	TimePrecisionSecond
)

func (TimePrecision) MarshalJSON

func (p TimePrecision) MarshalJSON() ([]byte, error)

func (*TimePrecision) UnmarshalJSON

func (p *TimePrecision) UnmarshalJSON(b []byte) error

type TimeRangeClaim

type TimeRangeClaim struct {
	CoreClaim

	Prop      DocumentReference `json:"prop"`
	Lower     Timestamp         `json:"lower"`
	Upper     Timestamp         `json:"upper"`
	Precision TimePrecision     `json:"precision"`
}

type TimeRangeClaims

type TimeRangeClaims = []TimeRangeClaim

type Timestamp

type Timestamp time.Time

func (Timestamp) MarshalJSON

func (t Timestamp) MarshalJSON() ([]byte, error)

func (Timestamp) String

func (t Timestamp) String() string

func (*Timestamp) UnmarshalJSON

func (t *Timestamp) UnmarshalJSON(data []byte) error
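
A small usage sketch (the exact JSON serialization format is not shown here, so only String is used):

// Timestamp is a defined type over time.Time, so a plain conversion
// is enough to construct one.
ts := search.Timestamp(time.Date(2022, time.November, 8, 0, 0, 0, 0, time.UTC))
fmt.Println(ts.String())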

type TranslatableHTMLString

type TranslatableHTMLString map[string]string

Language to HTML string mapping.

type TranslatablePlainString

type TranslatablePlainString map[string]string

Language to plain string mapping.

type UnknownValueClaim

type UnknownValueClaim struct {
	CoreClaim

	Prop DocumentReference `json:"prop"`
}

type UnknownValueClaims

type UnknownValueClaims = []UnknownValueClaim

type VisitResult

type VisitResult int
const (
	Keep VisitResult = iota
	KeepAndStop
	Drop
	DropAndStop
)

Directories

Path Synopsis
cmd
Package provides functions to generate PeerDB identifiers.
internal
cli
es
