nomadctld

command module

v0.0.2 Latest Latest Go to latest Published: Sep 29, 2018 License: MIT Imports: 31 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/42wim/nomadctld

Links

Open Source Insights

README ¶

nomadctld

nomadctld is a ssh server which sits between your users and the nomad cluster.
This way you can give users limited access to your nomad cluster and allow them to attach, see logs, tail logs, stop, restart, exec containers they own on the cluster. The server you're running nomadctld on needs to have access to your nomad server (usually on port 4646) and the docker daemon on all nomad nodes. (TLS is not supported yet)

features

Support nomad ACL tokens
Authenticate users via ssh-keys
Authorize users on nomad job prefixes
Supports following commands
- ps (shows all/filtered nomad jobs) ( is optional)
- batch (shows batch nomad jobs) ( is optional)
- exec (executes in container (command default is /bin/sh)
- logs (shows last 100 stdout lines of )
- tail (shows last 100 stdout lines of and keeps following them (like tail -f))
- stop (stops a nomad job with ( is the jobname))
- status (shows status of a nomad job with ( is the jobname)
- inspect (inspects a nomad job with ( is the jobname))
- di (runs equivalent of docker inspect on container
- raw <production/quality> (executes a "raw" nomad command (basically talks to the nomad binary))
- restart (restart a job)
- pstree (shows all/filtered nomad jobs)
- info (shows information about all allocations of the job)

running

nomadctld looks for a nomadctld.toml in the current directory or in /etc/nomadctld

./nomadctld
2018/04/12 23:54:53 starting ssh server on  :2222 Using  /etc/ssh/ssh_host_rsa_key

You can now do a ssh -p2222 localhost ps to see all running jobs

There's also a nomadctl bash script included that wraps ssh, modify the first line nomadctld="/usr/bin/ssh -p2222 yourserver" to your own server and optionally specify your sshkey

example

Filter for a mattermost job

$ nomadctl ps mattermost
Exec ID    |Job/Task         |Node      |Uptime      |CPU |Mem(max)
           |q-lnx-mattermost |          |            |    |
f429aef310 |q-lnx-mattermost |q-node-10 |3 weeks ago |2   |34 MiB(79 MiB)
           |p-lnx-mattermost |          |            |    |
e7c3629baf |p-lnx-mattermost |p-node-2  |1 week ago  |4   |56 MiB(201 MiB)

Get a shell on our quality container

$ nomadctl exec f429aef310
Welcome Wim (wim, ssh-ed25519 AAAAC3NzaC1lZDI1NT)
You have access to [p- q- t- raw]
welcome to q-lnx-mattermost on q-node-10
sh-4.2# ps -ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0  11640  1344 ?        Ss   Mar21   0:00 /bin/bash /docker-entry.sh
root          8  0.0  0.2 455808 43488 ?        Sl   Mar21  22:40 ./platform --config /config_docker.json
root         31  1.0  0.0  11772  1672 ?        Ss   00:36   0:00 /bin/sh
root         38  0.0  0.0  47448  1664 ?        R+   00:36   0:00 ps -ux
sh-4.2# exit
$

Show batch jobs

$ nomadctl batch
Job ID                                           |Next        |                          |Config
p-es-log-curator                                 |5m24s       |2018-07-12T15:52:00+02:00 |52 6,15,18 * * * *
p-iss-frontend-sync                              |9m24s       |2018-07-12T15:56:00+02:00 |11/15 * * * * *

debugging examples

If you see a OK or a FAIL next to the job, this means the deployment is OK or has FAILed. A deployment can fail even when the containers are running, because of allocations that took too much time to become healthy

Extra column legend

F = allocation failed
P = allocation pending
T = allocation has been terminated
O = allocation has been terminated by OOM
NR = allocation will not restart
KV = allocation has been killed because of vault issues
U = allocation is unhealthy (based on consul checks)

With failures

Exec ID    |Job/Task                                     |Node      |Uptime        |CPU |Mem(max)          |Extra
076f1388af |p-lnx-helloworld2                            |p-node-5  |1 hour ago    |0   |0 B(0 B)          |F,KV,U
30a1e62b4e |p-es-iss-cluster1-frontend                   |p-node-9  |14 hours ago  |4   |1.9 GiB(2.0 GiB)  |O,O,O
b4eaf79c05 |q-fii-lamp                                   |q-node-11 |1 day ago     |0   |39 MiB(67 MiB)    |U,T
f3c6ee4200 |q-fii-monitoring-wmsmonitor                  |q-node-11 |17 hours ago  |0   |22 MiB(43 MiB)    |T

p-lnx-helloworld2 allocations have failed because of vault issues and the allocation is unhealthy
p-es-iss-cluster1-frontend is 3 times terminated because of a OOM issue (but is still running because no F)
q-fii-monitoring-wmsmonitor has been terminated once (but is still running because no F)
q-fii-lamp been terminated once and is now unhealthy (but is still running because no F)

Further debugging of those failures:

$ nomadctl info p-lnx-helloworld2

076f1388af |failed          |1 hour ago |p-lnx-helloworld2-main/p-lnx-helloworld2
           |p-node-5        |0 Mhz      |0 B(0 B)
           |1 hour ago      |Received   |Task received by client
           |1 hour ago      |Task Setup |Building Task Directory
           |1 hour ago      |Killing    |vault: server error deriving vault token: Error making API request.

URL: POST https://vault.service.consul:8200/v1/auth/token/create/nomad-cluster
Code: 400. Errors:

* token policies ([default p-lnx-ros]) must be subset of the role's allowed policies ([default p-es-ro p-lnx-ro])
           |1 hour ago      |Alloc Unhealthy |Unhealthy because of failed task

We can see that there is an issue with our vault policy

example configuration

[general]
hostkey="/etc/ssh/ssh_host_rsa_key"
#this is the port your docker daemon is listening on
dockerport="4243"
#specify the api, if you're using an older docker (1.24 is needed for centos7)
dockerapi="1.24"
nomadbinary="/usr/bin/nomad"
bind=":2222"

#create some aliases
[alias]
status="raw alles status"

[nomad]
[nomad.production]
url="http://nomad.service.consul:4646"
#the prefix that production jobs start with, eg p-team1-teamjob
prefix=["p-"]
#file containing the nomad acl token with admin privileges
token="/etc/nomadctld/token-p"

[nomad.quality]
url="http://nomad.service.qconsul:4646"
#the prefix that quality jobs start with, eg q-team1-teamjob
#we're running quality and test on the same cluster, so t- also runs here
prefix=["q-"]
token="/etc/nomadctld/token-q"

[nomad.test]
url="http://nomad.service.tconsul:4646"
#the prefix that test jobs start with, eg t-team1-teamjob
prefix=["t-"]
token="/etc/nomadctld/token-t"


#prefixes are used for acl
[prefix]
#create a lnx prefix that has access to jobs starting with "p-","q-","t-"
#and has access to the "raw" command
[prefix.lnx]
prefix=["p-","q-","t-",raw"]

#the team1 prefix has less privileges, it can only see there own jobs
[prefix.team1]
prefix=["p-team1-","q-team1-","t-team1-"]

#create some users
[users]
[users.wim]
#key is the public sshkey
key="ssh-ed25519 ABCEC4NzaC1lZDI2NTE5AADAIB3m1na7Lu74koUa5EYPvEk7xCXiwtToar73gl9w9MND"
#which prefix can this user access
prefix=["lnx"]
name="Wim"

[users.user2]
key="ssh-ed25519 ABCEC4NzaC1lZDI2NTE5AADAIB3m1na7Lu74koUa5EYPvEk7xCXiwtToar73gl9abcde"
prefix=["team2"]
name="user2"

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL