dagger

command module

Version: v0.0.0-...-9bb8190 | Published: Feb 17, 2021 | License: MIT | Imports: 3 | Imported by: 0

Repository

github.com/nsaje/dagger

README

Dagger

Dagger is a dynamic realtime stream processing framework based on publishing and subscribing to streams of data.

Computations are executed on a stream (or streams) of data only if someone is actually subscribed to and using that computation. For example, a process that persists data streams to a DB subscribes to a node's CPU utilization averaged over one minute. A computation averaging CPU utilization over each minute therefore runs for as long as that process is running. When an administrator opens a dashboard that displays a realtime chart of CPU utilization averaged over 5 seconds, a second computation spins up to supply that 5-second average. When the administrator closes the dashboard, that computation shuts down, because no one is interested in its data anymore.

This way, computation and network data transfer happen only when necessary, which conserves compute cycles and reduces network traffic.
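
The subscription-driven lifecycle described above amounts to reference counting per computation. The following is a minimal in-process sketch, not Dagger's implementation. The `registry` type and its methods are hypothetical, and the `start`/`stop` prints stand in for spinning a worker up and down. In Dagger itself this bookkeeping is coordinated through the KV store.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical in-process model of subscription-driven
// computations: a computation runs only while it has at least one subscriber.
type registry struct {
	mu   sync.Mutex
	subs map[string]int // computation name -> number of active subscribers
}

func newRegistry() *registry {
	return &registry{subs: map[string]int{}}
}

// Subscribe adds a subscriber and starts the computation for the first one.
func (r *registry) Subscribe(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.subs[name]++
	if r.subs[name] == 1 {
		fmt.Println("start", name) // stand-in for spinning up a worker
	}
}

// Unsubscribe removes a subscriber and stops the computation after the last one.
func (r *registry) Unsubscribe(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.subs[name] == 0 {
		return // not subscribed: nothing to do
	}
	r.subs[name]--
	if r.subs[name] == 0 {
		delete(r.subs, name)
		fmt.Println("stop", name) // stand-in for shutting the worker down
	}
}

// Running reports whether anyone still needs the computation.
func (r *registry) Running(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.subs[name] > 0
}

func main() {
	r := newRegistry()
	r.Subscribe("avg(cpu_util, 1min)")   // the DB persister subscribes
	r.Subscribe("avg(cpu_util, 5sec)")   // the admin opens the dashboard
	r.Unsubscribe("avg(cpu_util, 5sec)") // the admin closes the dashboard
	fmt.Println(r.Running("avg(cpu_util, 1min)"), r.Running("avg(cpu_util, 5sec)"))
}
```

Running it prints `start` for both averages, `stop` for the 5-second one, and then `true false`. The one-minute computation keeps running because its subscriber is still attached.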

This software is a prototype in the making, developed for the purpose of my master's thesis.

Properties

Multilang plugins

Stream processing plugins can be written in any language and communicate with Dagger via JSON-RPC.
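
The sketch below shows what a JSON-RPC plugin can look like when written in Go. It uses the standard library's `net/rpc/jsonrpc`, which speaks JSON-RPC 1.0. The README does not say which JSON-RPC version, transport or method names Dagger expects, so `Plugin.Process`, `Record` and `Result` are all hypothetical. An in-memory `net.Pipe` replaces the real transport to keep the example self-contained.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
)

// Record is a hypothetical shape for a stream record sent to a plugin.
type Record struct {
	Stream    string
	Timestamp int64
	Value     float64
}

// Result is a hypothetical plugin reply.
type Result struct {
	Value float64
}

// Plugin is a toy computation exposed over JSON-RPC; the method name
// "Plugin.Process" is illustrative, not Dagger's actual plugin protocol.
type Plugin struct{}

// Process doubles the incoming value.
func (p *Plugin) Process(rec Record, res *Result) error {
	res.Value = rec.Value * 2
	return nil
}

func main() {
	// A real plugin would use stdin/stdout or a socket; an in-memory pipe
	// keeps this sketch self-contained.
	serverConn, clientConn := net.Pipe()

	srv := rpc.NewServer()
	if err := srv.Register(&Plugin{}); err != nil {
		log.Fatal(err)
	}
	go srv.ServeCodec(jsonrpc.NewServerCodec(serverConn))

	client := jsonrpc.NewClient(clientConn)
	defer client.Close()

	var res Result
	rec := Record{Stream: "cpu_util", Timestamp: 1, Value: 21}
	if err := client.Call("Plugin.Process", rec, &res); err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.Value) // 42
}
```

Because the wire format is plain JSON, the Go client side could equally talk to a plugin written in Python, Node.js or any other language with a JSON-RPC library.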

Exactly-once delivery

Ensured using retries and efficient deduplication.
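
Retries alone give at-least-once delivery, because a lost acknowledgement causes the same record to be sent again. Deduplication on the receiving side turns that into exactly-once processing. The sketch below illustrates the idea under two assumptions: every record carries a unique producer-assigned ID, and the seen-set is a plain map. The README only says Dagger's deduplication is "efficient". A real implementation would also have to prune the set, which this sketch does not do. All names are hypothetical.

```go
package main

import "fmt"

// Record carries a unique ID assigned by the producer (assumed here).
type Record struct {
	ID    string
	Value float64
}

// Receiver processes each record ID at most once, so retried deliveries
// are acknowledged but not reprocessed. The map grows without bound in
// this sketch.
type Receiver struct {
	seen      map[string]bool
	processed []Record
}

func NewReceiver() *Receiver { return &Receiver{seen: map[string]bool{}} }

// Deliver processes r unless its ID was seen before; either way the
// sender may treat the call as acknowledged.
func (rc *Receiver) Deliver(r Record) {
	if rc.seen[r.ID] {
		return // duplicate caused by a retry: drop it
	}
	rc.seen[r.ID] = true
	rc.processed = append(rc.processed, r)
}

// sendWithRetry redelivers r until an ack gets through; lostAcks simulates
// how many acknowledgements the network drops first.
func sendWithRetry(rc *Receiver, r Record, lostAcks int) (attempts int) {
	for {
		attempts++
		rc.Deliver(r) // the record always arrives...
		if attempts > lostAcks {
			return attempts // ...and this time the ack arrives too
		}
	}
}

func main() {
	rc := NewReceiver()
	n := sendWithRetry(rc, Record{ID: "cpu_util/1", Value: 0.5}, 2)
	fmt.Println(n, len(rc.processed)) // 3 1
}
```

The record is sent three times but processed once. That separation, with retries on the sender and deduplication on the receiver, is the pattern the property refers to.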

Timestamp-ordered processing

Even when processing records from different streams, Dagger processes them in timestamp order. This is achieved using low watermarks, also used in Google's MillWheel stream processing system. A low watermark indicates that all records up to a certain point in time have been received.
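
Here is a minimal sketch of how low watermarks enable ordered processing across streams. Records are buffered, each input stream advances its own watermark, and a record is released only once its timestamp is at or below the minimum watermark over all inputs. The `Merger` type and the "no more records with timestamp <= wm" convention are assumptions for illustration. Late records that arrive behind the watermark are not handled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Record is a timestamped value from a named stream.
type Record struct {
	Stream    string
	Timestamp int64
}

// Merger buffers records from several streams and releases them in
// timestamp order once every input's low watermark has passed them.
type Merger struct {
	watermarks map[string]int64 // per-stream low watermark
	buffer     []Record
}

func NewMerger(streams ...string) *Merger {
	m := &Merger{watermarks: map[string]int64{}}
	for _, s := range streams {
		m.watermarks[s] = 0 // nothing is known to be complete yet
	}
	return m
}

// Add buffers a record until it is safe to release.
func (m *Merger) Add(r Record) { m.buffer = append(m.buffer, r) }

// Advance notes that stream s will send no more records with a timestamp
// <= wm, then returns every buffered record that is now safe to process.
func (m *Merger) Advance(s string, wm int64) []Record {
	if wm > m.watermarks[s] {
		m.watermarks[s] = wm
	}
	low := int64(math.MaxInt64)
	for _, w := range m.watermarks {
		if w < low {
			low = w
		}
	}
	sort.SliceStable(m.buffer, func(i, j int) bool {
		return m.buffer[i].Timestamp < m.buffer[j].Timestamp
	})
	n := 0
	for n < len(m.buffer) && m.buffer[n].Timestamp <= low {
		n++
	}
	ready := append([]Record(nil), m.buffer[:n]...)
	m.buffer = m.buffer[n:]
	return ready
}

func main() {
	m := NewMerger("a", "b")
	m.Add(Record{"a", 3})
	m.Add(Record{"b", 1})
	m.Add(Record{"a", 5})
	fmt.Println(m.Advance("a", 5)) // []: stream b may still send older records
	fmt.Println(m.Advance("b", 4)) // [{b 1} {a 3}]
}
```

The first call releases nothing even though stream `a` is complete up to 5. The slower stream `b` holds the combined low watermark back until it advances.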

Fault tolerant

Achieved without a separate database system: multiple computations are executed in parallel, enabling failover without data loss.

Kafka-like rewinds

Following the approach Kafka pioneered, Dagger persists all streams to disk. This means you can choose the point from which you want to receive stream data, or even rewind to an earlier position in time.
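
Rewinds rely on streams being stored as append-only logs. The in-memory sketch below is only a stand-in for Dagger's on-disk format, which the README does not describe. It assumes records are appended in non-decreasing timestamp order, which lets a subscriber's starting point be found by binary search. `Log`, `Append` and `ReadFrom` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is one entry in a stream.
type Record struct {
	Timestamp int64
	Value     float64
}

// Log is an in-memory stand-in for a persisted, append-only stream.
// Records must be appended in non-decreasing timestamp order.
type Log struct {
	records []Record
}

// Append adds a record to the end of the log and returns its offset.
func (l *Log) Append(r Record) int {
	l.records = append(l.records, r)
	return len(l.records) - 1
}

// ReadFrom returns a copy of every record with Timestamp >= from, letting
// a subscriber start at, or rewind to, any point in time.
func (l *Log) ReadFrom(from int64) []Record {
	i := sort.Search(len(l.records), func(i int) bool {
		return l.records[i].Timestamp >= from
	})
	return append([]Record(nil), l.records[i:]...)
}

func main() {
	var l Log
	for ts := int64(1); ts <= 5; ts++ {
		l.Append(Record{Timestamp: ts, Value: float64(ts) * 10})
	}
	fmt.Println(l.ReadFrom(4))      // [{4 40} {5 50}]
	fmt.Println(len(l.ReadFrom(0))) // 5: a full rewind
}
```

Because nothing is deleted on read, many subscribers can read the same log from different positions independently.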

Decentralized

No single point of failure. Workers are fail-fast and coordinate via a consistent KV store such as Consul or etcd.
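
Coordinating through a consistent KV store usually relies on an atomic "create if absent" primitive. Consul provides this through check-and-set and etcd through transactions. With that primitive, only one of several competing workers can claim a computation. The sketch below hides the store behind a small hypothetical `KV` interface and backs it with an in-memory map, so it does not reflect the real Consul or etcd client APIs. The `computations/<name>/owner` key layout is also an assumption. In practice the key would be tied to a Consul session or an etcd lease, so that a crashed, fail-fast worker's claim expires. That expiry is simulated here with an explicit `Delete`.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical minimal view of a consistent key-value store.
type KV interface {
	// PutIfAbsent stores value under key only if key does not exist yet
	// and reports whether it did so.
	PutIfAbsent(key, value string) bool
	Delete(key string)
}

// memKV is an in-memory KV used only to make the sketch runnable.
type memKV struct {
	mu sync.Mutex
	m  map[string]string
}

func (s *memKV) PutIfAbsent(key, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.m[key]; ok {
		return false
	}
	s.m[key] = value
	return true
}

func (s *memKV) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, key)
}

// claim lets a worker take ownership of a computation; at most one
// competing worker succeeds while the key exists.
func claim(kv KV, worker, computation string) bool {
	return kv.PutIfAbsent("computations/"+computation+"/owner", worker)
}

func main() {
	kv := &memKV{m: map[string]string{}}
	var wg sync.WaitGroup
	var mu sync.Mutex
	owners := 0
	for _, w := range []string{"worker-1", "worker-2", "worker-3"} {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			if claim(kv, w, "avg(cpu_util, 5min)") {
				mu.Lock()
				owners++
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("owners:", owners) // owners: 1

	// Simulate the owner's lease expiring after a crash: another worker can take over.
	kv.Delete("computations/avg(cpu_util, 5min)/owner")
	fmt.Println(claim(kv, "worker-2", "avg(cpu_util, 5min)")) // true
}
```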

Easy deployment

Written in Go, so all you need to deploy is a Consul or etcd cluster and a Go binary.

Dynamic work assignment example

1: the monitored node has a 'cpu_util' stream available

monitored node
+---------------+
|               |
| pub: cpu_util |
|               |
+---------------+

2: a subscriber comes along that wants a stream of 'cpu_util' averaged over 5 minutes (say, a process that persists this to a DB)

monitored node                                                        subscriber
+---------------+                                                    +---------------------------+
|               |                                                    |                           |
| pub: cpu_util |                                                    | sub: avg(cpu_util, 5min)  |
|               |                                                    |                           |
+---------------+                                                    +---------------------------+

3: a Dagger worker spins up, starts computing this average and makes it available to subscribers

monitored node           worker node                                  subscriber
+---------------+       +-----------------------------------+        +---------------------------+
|               |       |                                   |        |                           |
| pub: cpu_util +-------> computation: avg(cpu_util, 5min)  +--------> sub: avg(cpu_util, 5min)  |
|               |       |                                   |        |                           |
+---------------+       +-----------------------------------+        +---------------------------+

4: another subscriber shows up, this time wanting 'cpu_util' averaged over 10 seconds (say, an admin opens a web dashboard that displays live data)

monitored node           worker node                                  subscriber
+---------------+       +-----------------------------------+        +---------------------------+
|               |       |                                   |        |                           |
| pub: cpu_util +-------> computation: avg(cpu_util, 5min)  +--------> sub: avg(cpu_util, 5min)  |
|               |       |                                   |        |                           |
+---------------+       +-----------------------------------+        +---------------------------+

                                                                     +---------------------------+
                                                                     |                           |
                                                                     | sub: avg(cpu_util, 10sec) |
                                                                     |                           |
                                                                     +---------------------------+

5: a new worker is started, providing the required stream

monitored node           worker node                                  subscriber
+---------------+       +-----------------------------------+        +---------------------------+
|               |       |                                   |        |                           |
| pub: cpu_util +---+---> computation: avg(cpu_util, 5min)  +--------> sub: avg(cpu_util, 5min)  |
|               |   |   |                                   |        |                           |
+---------------+   |   +-----------------------------------+        +---------------------------+
                    |
                    |   +-----------------------------------+        +---------------------------+
                    |   |                                   |        |                           |
                    +---> computation: avg(cpu_util, 10sec) +--------> sub: avg(cpu_util, 10sec) |
                        |                                   |        |                           |
                        +-----------------------------------+        +---------------------------+

6: the second subscriber unsubscribes (say, the admin closes the dashboard)

monitored node           worker node                                  subscriber
+---------------+       +-----------------------------------+        +---------------------------+
|               |       |                                   |        |                           |
| pub: cpu_util +---+---> computation: avg(cpu_util, 5min)  +--------> sub: avg(cpu_util, 5min)  |
|               |   |   |                                   |        |                           |
+---------------+   |   +-----------------------------------+        +---------------------------+
                    |
                    |   +-----------------------------------+
                    |   |                                   |
                    +---> computation: avg(cpu_util, 10sec) |
                        |                                   |
                        +-----------------------------------+

7: the second worker is found to have no subscribers left, so it shuts down

monitored node           worker node                                  subscriber
+---------------+       +-----------------------------------+        +---------------------------+
|               |       |                                   |        |                           |
| pub: cpu_util +-------> computation: avg(cpu_util, 5min)  +--------> sub: avg(cpu_util, 5min)  |
|               |       |                                   |        |                           |
+---------------+       +-----------------------------------+        +---------------------------+
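
The computations in the diagrams, such as avg(cpu_util, 5min), are windowed aggregates. The README does not say whether the windows are tumbling or sliding. The sketch below assumes tumbling windows aligned to multiples of the window width, with timestamps in non-negative seconds. It also assumes samples arrive in timestamp order, which is what Dagger's watermark-based ordering is meant to provide. `WindowAvg` and `Sample` are hypothetical names.

```go
package main

import "fmt"

// Sample is one cpu_util reading; Timestamp is in seconds.
type Sample struct {
	Timestamp int64
	Value     float64
}

// WindowAvg averages samples over fixed, non-overlapping windows,
// e.g. width 300 for avg(cpu_util, 5min). Empty windows emit nothing.
type WindowAvg struct {
	width       int64
	windowStart int64
	sum         float64
	count       int
}

// Add folds in a sample; if the sample belongs to a later window, it first
// returns the average of the window that just closed.
func (w *WindowAvg) Add(s Sample) (avg float64, done bool) {
	start := s.Timestamp - s.Timestamp%w.width
	if w.count > 0 && start != w.windowStart {
		avg, done = w.sum/float64(w.count), true
		w.sum, w.count = 0, 0
	}
	w.windowStart = start
	w.sum += s.Value
	w.count++
	return avg, done
}

func main() {
	w := &WindowAvg{width: 300}
	samples := []Sample{{0, 10}, {120, 20}, {299, 30}, {300, 50}, {650, 70}}
	for _, s := range samples {
		if avg, ok := w.Add(s); ok {
			fmt.Println(avg) // prints 20, then 50
		}
	}
}
```

A 10-second computation like the one started in step 5 would just be another instance with `width: 10`. Each subscription-driven computation carries its own small state.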

Documentation

There is no documentation for this package.

Source Files

main.go

Directories

client
command
computations
  computation-alarm
  computation-avg
  computation-bar
  computation-count
  computation-foo
  computation-max
  computation-min
  computation-sum
dagger
producers
  producer-test