nomad-ondemand-scaler

Version: v0.0.0-...-d3ca436. Published: Dec 21, 2023. License: MIT.

Repository

github.com/tantra35/nomad-ondemand-scaler

README

HashiCorp Nomad on-demand horizontal cluster autoscaler

Purpose

The nomad-ondemand-scaler automatically adjusts the cluster size when the cluster is too small to satisfy workload requirements.

The need for this project arose because the original nomad-autoscaler lacks this kind of scaling.

The scaler monitors blocked evaluations. When it detects one, it begins a scaling action: it selects the most suitable pool and calculates the number of nodes required to place the blocked workload.
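
The node-count step can be pictured with a minimal sketch. It assumes every blocked allocation asks for a CPU and memory amount and that all nodes of the chosen pool offer the same usable capacity. The names `Resources` and `nodesNeeded` are hypothetical and not taken from the scaler's source, which may use a different placement strategy than the first-fit packing shown here.

```go
package main

import "fmt"

// Resources is a hypothetical pair of CPU (MHz) and memory (MiB) amounts.
type Resources struct {
	CPU int
	Mem int
}

// nodesNeeded packs the blocked asks onto identical empty nodes using
// first-fit and returns how many nodes had to be opened. It returns -1
// when a single ask cannot fit on one node of this pool.
func nodesNeeded(asks []Resources, node Resources) int {
	var free []Resources // remaining capacity of each opened node
	for _, a := range asks {
		if a.CPU > node.CPU || a.Mem > node.Mem {
			return -1
		}
		placed := false
		for i := range free {
			if free[i].CPU >= a.CPU && free[i].Mem >= a.Mem {
				free[i].CPU -= a.CPU
				free[i].Mem -= a.Mem
				placed = true
				break
			}
		}
		if !placed {
			free = append(free, Resources{node.CPU - a.CPU, node.Mem - a.Mem})
		}
	}
	return len(free)
}

func main() {
	node := Resources{CPU: 900, Mem: 1000} // assumed usable capacity per node
	asks := []Resources{{500, 300}, {500, 300}, {300, 300}}
	fmt.Println(nodesNeeded(asks, node)) // two 500 MHz asks cannot share a node
}
```

First-fit is only one possible heuristic; it tends to overestimate slightly compared with an optimal packing, which is usually acceptable when the alternative is leaving workloads blocked.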

Like most autoscalers, this project uses the abstraction of node pools. A pool is a set of instances (nodes) grouped by one or more parameters (these can be attributes, resources, or devices available on the pool instances). Pools must be unique relative to each other and should not overlap.

Using pools lets you allocate resources for workloads more granularly. For example, it makes no sense to allocate GPU instances for workloads that do not require a GPU.
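
The GPU example can be sketched as a small matching routine. This is an illustration under stated assumptions, not the scaler's actual selection logic: the types `Pool` and `Request` and the function `selectPool` are hypothetical, and "most suitable" is interpreted here as "matches every requirement while offering the fewest device types the workload does not need".

```go
package main

import "fmt"

// Pool is a hypothetical, simplified view of a node pool.
type Pool struct {
	Name    string
	Attrs   map[string]string // e.g. "attr.cpu.arch": "x86"
	Devices map[string]bool   // device types offered, e.g. "gpu"
}

// Request is a hypothetical summary of what a blocked workload needs.
// Devices is assumed to list distinct device types.
type Request struct {
	Attrs   map[string]string
	Devices []string
}

// matches reports whether the pool satisfies every attribute and device
// requirement of the request.
func matches(p Pool, req Request) bool {
	for k, v := range req.Attrs {
		if p.Attrs[k] != v {
			return false
		}
	}
	for _, d := range req.Devices {
		if !p.Devices[d] {
			return false
		}
	}
	return true
}

// selectPool returns the name of the matching pool with the fewest
// unneeded device types, or "" when no pool matches.
func selectPool(pools []Pool, req Request) string {
	best, bestExtra := "", -1
	for _, p := range pools {
		if !matches(p, req) {
			continue
		}
		extra := len(p.Devices) - len(req.Devices)
		if bestExtra == -1 || extra < bestExtra {
			best, bestExtra = p.Name, extra
		}
	}
	return best
}

func main() {
	arch := map[string]string{"attr.cpu.arch": "x86"}
	pools := []Pool{
		{Name: "gpu", Attrs: arch, Devices: map[string]bool{"gpu": true}},
		{Name: "cpu", Attrs: arch, Devices: map[string]bool{}},
	}
	fmt.Println(selectPool(pools, Request{Attrs: arch}))                           // plain workload
	fmt.Println(selectPool(pools, Request{Attrs: arch, Devices: []string{"gpu"}})) // GPU workload
}
```

With these inputs the plain workload lands on the `cpu` pool even though the `gpu` pool also matches, which is the behavior the paragraph above argues for.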

Config

poolconfig="./pools.yml" # <-- Setup location of yaml file, that describes node pools

gc {
  cicles_to_gc = 3
  cicle_period = "1m"
  allowed_freexpr = "min(round(totalnodes * 0.1), 2)"
}

stalenomadapi {
  allow = true
  duration = "30ms"
}

telemetry {
  statsiteaddr = "statesitelocal.service.consul:8125"
  prefix = "telemetry stats prefix"
}

hungprevention {
  allow = true
  detect_period = "30m"
}

The config consists of 4 sections:

gc describes garbage collection:
- cicles_to_gc - how many GC cycles an instance must stay idle (without allocations) before it is garbage collected
- cicle_period - period of the GC cycle (must be given in a form that the ParseDuration function understands)
- allowed_freexpr - an expression in a form that exprtk understands. It defines the number of free nodes allowed in each pool (instances that will not be garbage collected), which is useful for organizing hot pools. Important: the following predefined variables can be used in the expression:
  - totalnodes - total nodes in pool
  - busynodes - busy nodes in pool
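
To make the gc settings concrete, the sketch below hard-codes the sample expression `min(round(totalnodes * 0.1), 2)` in Go instead of evaluating it through exprtk, and combines it with the idle-cycle threshold. How the scaler actually combines the two rules is an assumption here, and the function names `allowedFree` and `collectable` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// allowedFree hard-codes the sample expression
// min(round(totalnodes * 0.1), 2) from the config above.
func allowedFree(totalNodes int) int {
	return int(math.Min(math.Round(float64(totalNodes)*0.1), 2))
}

// collectable returns how many idle nodes may be removed in this cycle.
// idleCycles holds, for each idle node, the number of GC cycles it has
// spent without allocations. Assumption: a node qualifies only after
// cyclesToGC idle cycles, and allowedFree idle nodes are always kept.
func collectable(idleCycles []int, totalNodes, cyclesToGC int) int {
	ripe := 0
	for _, c := range idleCycles {
		if c >= cyclesToGC {
			ripe++
		}
	}
	spare := len(idleCycles) - allowedFree(totalNodes)
	if spare < 0 {
		spare = 0
	}
	if ripe < spare {
		return ripe
	}
	return spare
}

func main() {
	fmt.Println(allowedFree(4), allowedFree(5), allowedFree(30))
	// 30 nodes, 4 of them idle for 3, 4, 1 and 5 cycles, cicles_to_gc = 3.
	fmt.Println(collectable([]int{3, 4, 1, 5}, 30, 3))
}
```

For pools of 4, 5, and 30 nodes the expression yields 0, 1, and 2 free nodes respectively, so small pools shrink to zero idle nodes while larger pools keep at most two warm.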
stalenomadapi allows the use of inconsistent (stale) Nomad API reads:
- allow - whether to allow inconsistent Nomad API reads (true|false)
- duration - maximum allowed staleness within which a response from the Nomad API is considered valid (must be given in a form that the ParseDuration function understands); otherwise the request is repeated in fully consistent mode
telemetry - enables telemetry. Only statsite is supported for now, but because the https://github.com/hashicorp/go-metrics library is used, other collectors such as Prometheus or Datadog can be added without difficulty
hungprevention - describes parameters that prevent a scale action from hanging (a hang can be caused by errors in the scaler's own code as well as by external reasons, for example when the cloud provider cannot allocate the requested resources)
- allow - whether a global timeout is set for scale-up actions (default: false)
- detect_period - timeout for a scale-up action (must be given in a form that the ParseDuration function understands)

Pool configuration

The pool configuration is a YAML file that looks something like this:

- datacenter: <some datacenter>
  nodeclass: <node class> 
  cpu: 1000
  mem: 25Gib
  reserved:
    cpu: 100
    mem: 100Mib
  drivers:
    - docker
    - exec
  devices:
    - name: "NVIDIA A10G"
      type: gpu
      vendor: nvidia
      attr:
        memory: "23028 MiB"
  attr.cpu.arch: x86
  attr.kernel.name: linux
  provider:
    name: anynode

In this config, the provider field describes the node provider used to create pool instances. For now, 3 types of providers are supported:


Directories

nodeprovider
karpenterprovidergrpc