igor

command
v0.0.0-...-81ce8a9 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 15, 2022 License: GPL-3.0 Imports: 33 Imported by: 0

README

igor is a tool for managing reservations on a cluster.

Users make reservations with igor, requesting either a number of nodes
or a specific set of nodes. They also specify a kernel and initial
ramdisk which their nodes should boot. Reservations are deleted when
they run out of time, but users can add additional time to the
reservation. igor now only allows a reservation's owner to delete the
reservation; experience with using igor in a real setting has shown
that this is needed.

SETUP
----------------

You need either Cobbler or syslinux, specifically pxelinux,
installed.

For pxelinux, figure out where your TFTP root will be--whichever
directory contains pxelinux.cfg, ours is /tftpboot--and set up the
default PXE configuration in /tftpboot/pxelinux.cfg/default. This
setup is simple but is outside the scope of this document.

For Cobbler, make sure you have a usable default profile enabled,
and make sure user 'igor' is allowed to run the cobbler command.
You should also select a root directory for igor to use; /tftpboot
is not a bad choice.

Run "./setup.sh <path to root>", e.g. for pxelinux you'd run

	./setup.sh /tftpboot

This will copy necessary files to their destinations.

To configure igor, edit /etc/igor.conf, a JSON config file created by
setup.sh. The setup.sh script will copy over sampleconfig.json;
edit the values to match your configuration.

Our cluster nodes are named kn1 through kn520, so our
"prefix" is "kn", "start" is 1, and "end" is 520. Note that the
numbers are *not* in quotes.

Basic config
------------
* "tftproot" is the directory which contains pxelinux.cfg, or if
using Cobbler can be any directory.
* "prefix" is the prefix for your cluster hostnames on the management network, so nodes of the format "kn1" would have a prefix "kn"
* "start" is the first node's number
* "end" is the last node's number
* "padlen" should be set to the minimum number width for node names; if your nodes are named "kn1", set it to 0, but if nodes are named "kn001" set it to 3.
* "rackheight" and "rackwidth" define the physical dimensions of your
cluster hardware, for use with "igor show". Our cluster is composed of
13 shelves, each containing 5 shelves of 8 PCs each. When "igor show"
runs, part of the information it gives is a diagram of "racks"; one
"rack" from our cluster is shown below:

	---------------------------------
	|281|282|283|284|285|286|287|288|
	|289|290|291|292|293|294|295|296|
	|297|298|299|300|301|302|303|304|
	|305|306|307|308|309|310|311|312|
	|313|314|315|316|317|318|319|320|
	---------------------------------

If you are running a cluster of 4x 1U servers, and they are all in a
single rack, you would set rackheight = 4, and rackwidth = 1, to see
something like this:

	---
	|1|
	|2|
	|3|
	|4|
	---

If the physical layout of your cluster is strange, or if you'd just
prefer a big grid, you can set rackheight = sqrt(# nodes) and
rackwidth = sqrt(# nodes). This will just show one big grid of all
your nodes.

Cobbler
-------
* "usecobbler" is a boolean value and should be set to true if you wish to use Cobbler
* "cobblerdefaultprofile" is the name of the default Cobbler profile; unreserved nodes will be configured to use this.

Power
-----
* "poweroncommand" and "poweroffcommand" are commands that can be executed to turn nodes on and off. If "poweroncommand" is set to "powerbot on", for instance, igor will attempt to call "powerbot on kn1" to power on kn1.
* "autoreboot" determines if igor will attempt to power cycle nodes automatically when a reservation begins or ends.

Network segmentation
---------------------
* "network" specifies what kind of networking switch you're using. Currently, only Arista devices are supported; set this to "arista" to use it. If set to "", igor will not do segmentation.
* "vlan_min" and "vlan_max" specify a range of VLANs to use for automatically segmenting reservation traffic using 802.1ad (QinQ)
* "networkuser" and "networkpassword" specify login details for the switch
* "network_url" specifies the host to connect to for switch configuration; for Arista devices, this will take the form of <host>:<port>/command-api
* "node_map" is a mapping from cluster node names to port names on the core switch.

RUNNING
-----------------

Generally, to use igor you will check what nodes are reserved, make
your own reservation with some un-used nodes, and then delete the
reservation when you're done. When creating a reservation, you can
specify a duration (default 12 hours); after this expires, your
reservation is not automatically deleted, but it should be considered
"fair game" for deletion by anyone else.

To see what reservations exist:

	$ igor show

To make a reservation named "testing", using some kernel and initrd,
with nodes 1-10:

	$ igor sub -r testing -k /path/to/kernel -i /path/to/initrd -w kn[1-10]

To remove your reservation:

	$ igor del testing

If your reservation is about to run out of time, use the "addtime"
command to increase the reservation duration:

	$ igor addtime -r testing -t 2    # add 2 hours

You can type "igor help" to access the built-in help, which gives more
details on all the possible command line switches.

Documentation

Overview

Copyright 2019-2021 National Technology & Engineering Solutions of Sandia, LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

igor is a simple command line tool for managing reservations of nodes in a cluster. It also will configure the pxeboot environment for booting kernels and initrds for cluster nodes.

Copyright 2018-2021 National Technology & Engineering Solutions of Sandia, LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

Copyright 2019-2021 National Technology & Engineering Solutions of Sandia, LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL