mmdb-to-clickhouse
Import MMDB files into ClickHouse.
This repository contains a dummy example.mmdb
file for testing purpose.
To get real data, check out the free Country & ASN database from IPinfo which is also supported by this tool.
Features
- Automatically infers the table schema from the MMDB file
- Supports nested records by flattening them
- Stores data in a partitioned table to keep track of history
- Creates an IP trie dictionary for fast lookups
Current limitations
- The schema is inferred from the first record only. If subsequent records have additional fields, those will be ignored.
- The
network
and partition
names are reserved and must not be present in the MMDB file
Installation
Download the latest release for your operating system and run:
./mmdb-to-clickhouse -h
You can also run it through Docker with:
docker run --rm -it ghcr.io/maxmouchet/mmdb-to-clickhouse -h
Example usage
First start a ClickHouse instance:
docker run --name clickhouse --rm -d -p 9000:9000 clickhouse/clickhouse-server
Then download the example MMDB file:
wget https://github.com/maxmouchet/mmdb-to-clickhouse/raw/main/example.mmdb
And run mmdb-to-clickhouse
:
./mmdb-to-clickhouse -dsn clickhouse://localhost:9000 -mmdb example.mmdb -name example_mmdb -test
The output should look like the following:
2023/12/21 12:18:10 Schema: network String, country String, partition Date
2023/12/21 12:18:10 Creating example_mmdb_history
2023/12/21 12:18:10 Creating example_mmdb
2023/12/21 12:18:10 Dropping partition 2023-12-21
2023/12/21 12:18:10 Inserted 1 rows
2023/12/21 12:18:10 Running test query: SELECT dictGet('example_mmdb', 'country', IPv6StringToNum('1.1.1.1'))
2023/12/21 12:18:10 Test query result: WW
This will create two tables:
example_mmdb_history
: a partitioned table which keeps the last 30 days of history by default (see the -ttl
option)
example_mmdb
: an in-memory IP trie dictionary which always uses the latest partition from example_mmdb_history
. This dictionary enables very fast IP lookups.
Open a REPL and inspect the tables:
docker exec -it clickhouse clickhouse client
SHOW TABLES
-- ┌─name─────────────────┐
-- │ example_mmdb │
-- │ example_mmdb_history │
-- └──────────────────────┘
SELECT * FROM example_mmdb_history
-- ┌─network───┬─country─┬──partition─┐
-- │ 0.0.0.0/0 │ WW │ 2023-12-21 │
-- └───────────┴─────────┴────────────┘
SELECT * FROM example_mmdb
-- ┌─network───┬─country─┬──partition─┐
-- │ 0.0.0.0/0 │ WW │ 2023-12-21 │
-- └───────────┴─────────┴────────────┘
SELECT dictGet('example_mmdb', 'country', IPv6StringToNum('1.1.1.1')) AS country
-- ┌─country─┐
-- │ WW │
-- └─────────┘
To clean up just remove the ClickHouse instance:
docker rm -f clickhouse