Flock
Flock loads real Twitter streams into Dgraph so that the data can be explored with graph traversals.
Flock has two parts:
- Tweet loader: connects to real-time Tweets via the Twitter Developer API and
loads a graph model of Twitter into Dgraph via mutations.
- Query client: runs interesting graph queries on the Tweet data stored in Dgraph.
Here is the graph schema of Flock:
Running Flock
We need to create a Twitter developer account and an app to be able to fetch a stream of Tweets
using their APIs. Let's start with how to create a Twitter developer account.
Setup
- Clone the Flock repository:
$ git clone https://github.com/dgraph-io/flock.git
$ cd flock
- Export the persistent data directory. Since Dgraph runs in Docker containers, it is useful
to mount a directory on the host machine so that data persists across multiple runs.
$ mkdir ./data
$ export DATA_DIR=$(pwd)/data
- If you're running Linux, you can add the current user to the docker group to use Docker as
a non-root user. The newgrp command starts a new shell with the updated group membership,
which is needed for the change to take effect without logging out and back in.
$ sudo usermod -aG docker $USER
$ newgrp docker
- Ensure that credentials.json, containing your Twitter credentials, exists in the root
directory of Flock.
- Start the Dgraph servers and Ratel with Docker Compose, then visit http://localhost:8000 in
your browser to view the Ratel UI.
$ docker-compose up
- In another terminal, start Flock:
$ docker-compose -f docker-compose-flock.yml up
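The setup steps above can be checked from a script before (re)starting Flock. The sketch below bundles four small shell helpers: check_data_dir, in_group, check_credentials and wait_until. All four names are hypothetical and do not ship with Flock. The credentials check assumes the file holds the four Twitter OAuth 1.0a values under the keys consumer_key, consumer_secret, access_token and access_secret; those key names are an assumption, so adjust them to whatever Flock actually expects. The usage line further assumes that Dgraph Alpha's default HTTP port 8080 is published by docker-compose.yml and that Dgraph is already running from the earlier docker-compose up.

```shell
# Preflight helpers for the Flock setup (hypothetical, not part of Flock).
# Source this file in bash, or any sh that supports `local`.

# check_data_dir: succeed only if $1 is an absolute path to an existing directory.
check_data_dir() {
  case "${1:-}" in
    /*) [ -d "$1" ] || { echo "DATA_DIR does not exist: $1" >&2; return 1; } ;;
    *)  echo "DATA_DIR must be set to an absolute path" >&2; return 1 ;;
  esac
}

# in_group: succeed if group $1 appears as a whole word in the space-separated
# list $2 (e.g. the output of `id -nG`), so "dockerroot" does not match "docker".
in_group() {
  local g
  for g in $2; do               # unquoted on purpose: split the list into words
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

# check_credentials: rough grep-based check (not a JSON parser) that file $1
# exists and mentions each expected key. Key names are an ASSUMPTION.
check_credentials() {
  local file="${1:-credentials.json}" key status=0
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  for key in consumer_key consumer_secret access_token access_secret; do
    grep -q "\"$key\"" "$file" || { echo "$file lacks \"$key\"" >&2; status=1; }
  done
  return "$status"
}

# wait_until: run the remaining arguments as a command up to $1 times,
# sleeping $2 seconds between failed attempts. 0 on first success, else 1.
wait_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage (port 8080 and the /health endpoint are Dgraph Alpha defaults;
# check docker-compose.yml for the actual port mapping):
# check_data_dir "$DATA_DIR" && in_group docker "$(id -nG)" && check_credentials \
#   && wait_until 30 2 curl -fsS http://localhost:8080/health >/dev/null \
#   && docker-compose -f docker-compose-flock.yml up
```

wait_until takes the command as separate arguments rather than a string, so nothing is re-parsed by the shell; the same helper works for any readiness probe, not only curl.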
Flock will begin printing periodic log messages that report its loading rate. You're good to
go if commit_rate is higher than 0/sec, which means data has been successfully committed to
Dgraph.
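To check this from a script instead of by eye, the sketch below extracts the first number after the commit_rate label in a log line and succeeds if it is greater than zero. The exact log format is an assumption (something like `commit_rate: 12/sec`), and commit_rate_ok is a hypothetical helper, not part of Flock; adjust the pattern if Flock prints the rate differently.

```shell
# commit_rate_ok (hypothetical helper): succeed if the first number after
# "commit_rate" in log line $1 is greater than zero. The log format is an
# ASSUMPTION; adjust the grep pattern to match Flock's actual output.
commit_rate_ok() {
  local rate
  rate=$(printf '%s\n' "$1" \
    | grep -oE 'commit_rate[^0-9]*[0-9]+(\.[0-9]+)?' \
    | head -n 1 \
    | grep -oE '[0-9]+(\.[0-9]+)?$')
  # awk compares numerically, so fractional rates work; an empty rate
  # (no match) evaluates to 0 and fails the check.
  awk -v r="$rate" 'BEGIN { exit !(r + 0 > 0) }'
}

# Example usage: stop watching once the first non-zero rate appears.
# docker-compose -f docker-compose-flock.yml logs -f | while read -r line; do
#   commit_rate_ok "$line" && { echo "data is being committed"; break; }
# done
```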
Running Flock for a few minutes is enough to collect data for some interesting queries. To
stop Flock, press Ctrl+C in the terminal where it is running.
$ docker-compose -f docker-compose-flock.yml up
...
<Ctrl+C>
Killing flock ... done