gpu-cloudwatch-reporting

Version: v0.2.1
Published: Oct 12, 2020 | License: MIT | Imports: 17 | Imported by: 0


This repository provides a tool that sends GPU utilization metrics on Amazon ECS to CloudWatch. The tool supports Linux only:

  • Ubuntu 16.04/18.04/20.04 LTS
  • Amazon Linux 2 (ECS GPU-optimized AMI)

Installation

Download binary

Download the archive from the releases page and extract the binary to /usr/local/bin.

$ curl -L -O https://github.com/ohsawa0515/gpu-cloudwatch-reporting/releases/download/<version>/gpu-cloudwatch-reporting_linux_amd64.tar.gz
$ tar zxf gpu-cloudwatch-reporting_linux_amd64.tar.gz
$ mv ./gpu-cloudwatch-reporting /usr/local/bin/
$ chmod +x /usr/local/bin/gpu-cloudwatch-reporting
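
The `<version>` placeholder in the download URL has to be replaced with a release tag. As a minimal sketch, the helper below fills in the tag and prints the resulting URL. The function name `release_url` is hypothetical, and the example assumes that release tags match the module version shown on this page (v0.2.1); check the releases page for the tags that actually exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): builds the download URL
# for a given release tag. The URL pattern is copied from the curl
# command above; only the tag is substituted.
release_url() {
    version="$1"
    printf 'https://github.com/ohsawa0515/gpu-cloudwatch-reporting/releases/download/%s/gpu-cloudwatch-reporting_linux_amd64.tar.gz\n' "$version"
}

# Assumption: a release tagged v0.2.1 exists. Substitute the tag you need.
release_url v0.2.1
```

The printed URL can be passed to curl, for example `curl -L -O "$(release_url v0.2.1)"`.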

go get

$ go get github.com/ohsawa0515/gpu-cloudwatch-reporting
$ mv $GOPATH/bin/gpu-cloudwatch-reporting /usr/local/bin/
$ chmod +x /usr/local/bin/gpu-cloudwatch-reporting

Run as a systemd service

$ cat <<-EOH > /lib/systemd/system/gpu-cloudwatch-reporting.service
[Unit]
Description=GPU Utilization Metric Reporting
[Service]
Type=simple
PIDFile=/run/gpu-cloudwatch-reporting.pid
ExecStart=/usr/local/bin/gpu-cloudwatch-reporting
User=root
Group=root
WorkingDirectory=/
Restart=always
[Install]
WantedBy=multi-user.target
EOH
$ systemctl daemon-reload
$ systemctl enable gpu-cloudwatch-reporting.service
$ systemctl start gpu-cloudwatch-reporting.service
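
After starting the unit, it can be useful to confirm that systemd considers it enabled and running. The following is a minimal sketch using standard `systemctl` and `journalctl` options; the function name `check_gpu_reporting_service` is hypothetical, and what the tool writes to its log is not documented here, so the log output is only shown, not interpreted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): reports whether the unit
# created above is enabled and active, then prints its recent log lines.
check_gpu_reporting_service() {
    unit="gpu-cloudwatch-reporting.service"
    systemctl is-enabled "$unit"   # prints "enabled" if the unit starts at boot
    systemctl is-active "$unit"    # prints "active" if the unit is running
    journalctl -u "$unit" -n 20 --no-pager   # last 20 journal lines for the unit
}
```

Run `check_gpu_reporting_service` as root (or with sufficient privileges to read the journal) on the host where the unit was installed.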

Run as a Docker container

The NVIDIA driver is required and must be installed on the host before the container is started.

$ docker pull ohsawa0515/gpu-cloudwatch-reporting:latest
$ docker run -d --gpus=all --rm \
      -e REGION=us-east-1 \
      -e NAMESPACE=GPUMonitor \
      -e SEND_INTERVAL_SECOND=60s \
      -e COLLECT_INTERVAL_SECOND=5s \
      ohsawa0515/gpu-cloudwatch-reporting:latest
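
The four settings passed with `-e` above can also be kept in a file and handed to Docker with its `--env-file` option. The sketch below writes such a file with the same values as the example; note that the interval values carry a unit suffix (`60s`, `5s`) even though the variable names end in `_SECOND`. The function name `write_env_file` and the file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): writes the settings from
# the docker run example above into an env file, one KEY=value per line.
write_env_file() {
    printf '%s\n' \
        'REGION=us-east-1' \
        'NAMESPACE=GPUMonitor' \
        'SEND_INTERVAL_SECOND=60s' \
        'COLLECT_INTERVAL_SECOND=5s' > "$1"
}

# Usage, on a host with Docker and the NVIDIA driver (file name is an assumption):
#   write_env_file ./gpu-cloudwatch-reporting.env
#   docker run -d --gpus=all --rm --env-file ./gpu-cloudwatch-reporting.env \
#       ohsawa0515/gpu-cloudwatch-reporting:latest
```

Keeping the values in a file makes it easier to reuse the same configuration across hosts without retyping the `-e` flags.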
