substreams-sink-clickhouse

module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 14, 2023 License: Apache-2.0

README

Substreams Sink Clickhouse

This is a command line tool to quickly sync a Substreams to a Clickhouse database.

Quickstart
  1. Install substreams-sink-clickhouse by using the pre-built binary release available in the releases page. Extract substreams-sink-clickhouse binary into a folder and ensure this folder is referenced globally via your PATH environment variable.

    Note Or install from source directly go install github.com/Aleno1/substreams-sink-clickhouse/cmd/substreams-sink-clickhouse@latest.

  2. Start Docker Compose:

    docker compose up
    

    Note Feel free to skip this step if you already have a running Clickhouse instance accessible, don't forget to update the connection string in the command below.

  3. Setup Clickhouse

    Connect to Clickhouse

    docker compose exec ch_server clickhouse-client -u dev-node --password insecure-change-me-in-prod -h localhost
    

    And create necessary tables to run the sink

        CREATE TABLE block_meta
    (
        id          String, 
        at          String,
        number      Int32,
        hash        String,
        parent_hash String,
        timestamp   String,
        PRIMARY KEY (id),
    )
    ENGINE = MergeTree()
    ORDER BY id;
    
    CREATE TABLE cursors
    (
        id         String,
        cursor     String,
        block_num  Int64,
        block_id   String,
        PRIMARY KEY (id)
    ) ENGINE = MergeTree()
    ORDER BY id;
    

    Note Each create table query must be run independently as clickhouse doesn't support multiple create table queries at once.

  4. Run the sink

    Use the precompiled Ethereum Block Meta substreams

    Note To connect to Substreams you will need an authentication token, follow this guide to obtain one.

    substreams-sink-clickhouse run \
        "clickhouse://dev-node:insecure-change-me-in-prod@localhost:8123" \
        "mainnet.eth.streamingfast.io:443" \
        https://github.com/streamingfast/substreams-eth-block-meta/releases/download/v0.4.1/substreams-eth-block-meta-v0.4.1.spkg \
        db_out
    
Output Module

To be accepted by substreams-sink-clickhouse, your module output's type must be a sf.substreams.sink.database.v1.DatabaseChanges message. The Rust crate substreams-data-change contains bindings and helpers to implement it easily. Some project implementing db_out module for reference:

By convention, we name the map module that emits sf.substreams.sink.database.v1.DatabaseChanges output db_out.

Clickhouse DSN

The connection string is provided using a simple string format respecting the URL specification. The DSN format is:

    clickhouse://<user>:<password>@<host>:<port>/<dbname>[?<options>]

Where is URL query parameters in = format, multiple options are separated by & signs. Supported options can be seen on libpq official documentation. The options , , , and should not be passed in as they are automatically extracted from the DSN URL.

Improvements

Current implementation of substreams-sink-clickhouse uses the http interface which has better language support than the native one. However, it is more limited than the native interface and worse performances.

Reimplementing this sink using the native interface would improve performance.

Use an in memory table for cursor table instead of writing it on disk. (Memory table have no primary keys)

Directories

Path Synopsis
cmd
pb

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL