k8s-device-plugin

Version: v1.25.2 (latest) Published: Oct 19, 2022 License: Apache-2.0

Repository

github.com/RadeonOpenCompute/k8s-device-plugin

README

AMD GPU device plugin for Kubernetes

Introduction

This is a Kubernetes device plugin implementation that enables the registration of AMD GPUs in a container cluster for compute workloads. With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you can run jobs that require AMD GPUs.

More information about RadeonOpenCompute (ROCm)

Prerequisites

ROCm capable machines
kubeadm capable machines (if you are using kubeadm to deploy your k8s cluster)
ROCm kernel (Installation guide) or latest AMD GPU Linux driver (Installation guide)
A Kubernetes deployment
--allow-privileged=true for both kube-apiserver and kubelet (only needed if the device plugin is deployed via a DaemonSet, because the device plugin container requires a privileged security context to access /dev/kfd for the device health check)

Limitations

This plugin targets Kubernetes v1.18+.

Deployment

The device plugin needs to run on every node that is equipped with an AMD GPU. The simplest way to do so is to create a Kubernetes DaemonSet, which runs a copy of a pod on all (or some) nodes in the cluster. We have a pre-built Docker image on DockerHub that you can use with your DaemonSet. This repository also has a pre-defined YAML file named k8s-ds-amdgpu-dp.yaml. You can create a DaemonSet in your Kubernetes cluster by running this command:

$ kubectl create -f k8s-ds-amdgpu-dp.yaml

or directly pull from the web using

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

If you want to enable the experimental device health check, use k8s-ds-amdgpu-dp-health.yaml instead, after setting --allow-privileged=true for kube-apiserver and kubelet.
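
Once the DaemonSet is running, one way to check that the plugin has registered GPUs is to look at each node's allocatable resources. The sketch below assumes the plugin advertises the extended resource name amd.com/gpu, which is not stated in the text above; the helper name amd_gpu_nodes is hypothetical.

```shell
# Hypothetical helper: list each node with its allocatable AMD GPU count.
# Assumption: the plugin advertises the extended resource "amd.com/gpu".
# The dot in "amd.com" is escaped so kubectl treats it as part of the key.
amd_gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,AMD_GPU:.status.allocatable.amd\.com/gpu'
}
```

Run it with:

$ amd_gpu_nodes

Nodes on which the plugin has not registered any GPU should show no value in the AMD_GPU column.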

Example workload

You can restrict work to a node with a GPU by adding resources.limits to the pod definition. An example pod definition is provided in example/pod/alexnet-gpu.yaml. This pod runs the timing benchmark for AlexNet on an AMD GPU and then goes to sleep. You can create the pod by running:

$ kubectl create -f alexnet-gpu.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/example/pod/alexnet-gpu.yaml

and then check the pod status by running

$ kubectl describe pods

After the pod is created and running, you can see the benchmark result by running:

$ kubectl logs alexnet-tf-gpu-pod alexnet-tf-gpu-container

For comparison, an example pod definition of running the same benchmark with CPU is provided in example/pod/alexnet-cpu.yaml.
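
To illustrate the resources.limits restriction mentioned above, here is a minimal sketch of a pod definition that requests one GPU. It is not the AlexNet example from the repository: the file name, pod name, container name, image, and command are all placeholders, and the resource name amd.com/gpu is an assumption that the text above does not state.

```shell
# Write a minimal, hypothetical pod definition that requests one AMD GPU.
# All names and the image are placeholders, not taken from this repository.
cat > gpu-limits-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limits-demo
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: busybox
      command: ["sleep", "3600"]
      resources:
        limits:
          amd.com/gpu: 1   # assumed resource name; schedules the pod onto a GPU node
EOF
```

The resulting file can then be created like the other examples:

$ kubectl create -f gpu-limits-demo.yaml

Because the limit names an extended resource, the scheduler should only place the pod on a node that has at least one such GPU available.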

Labelling nodes with additional GPU properties

Please see AMD GPU Kubernetes Node Labeller for details. An example configuration is in k8s-ds-amdgpu-labeller.yaml:

$ kubectl create -f k8s-ds-amdgpu-labeller.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml

Notes

This plugin uses Go modules for dependency management.
Please consult the Dockerfile for how to build and use this plugin independently of a Docker image.

TODOs

Add a proper GPU health check (one that does not require /dev/kfd access).

Directories

Path	Synopsis
cmd/k8s-device-plugin	Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster
cmd/k8s-node-labeller
internal/pkg/amdgpu	Package amdgpu is a collection of utility functions to access various properties of AMD GPU via Linux kernel interfaces like sysfs and ioctl (using libdrm).
internal/pkg/hwloc	Package hwloc is a collection of utility functions to get NUMA membership of AMD GPU via the hwloc library