cuda

package
v0.0.0-...-598a827 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 28, 2021 License: Apache-2.0 Imports: 17 Imported by: 0

Documentation


Constants

This section is empty.

Variables

var (

	// UseGPU is used by specific types of tests to disable GPU testing when GPU
	// cards may be present but need to be disabled. This flag is not used
	// in production to change behavior in any way.
	UseGPU *bool

	// CudaInitErr records the result of the CUDA library initialization that would
	// impact ongoing operation
	CudaInitErr *kv.Error

	// CudaInitWarnings records warnings and errors that are deemed not to be fatal
	// to the ongoing CUDA library usage but are of importance
	CudaInitWarnings = []kv.Error{}

	// CudaInTest is used to check if the running process is a go test process. If so,
	// certain types of checking are disabled when using very limited GPU
	// hardware.
	CudaInTest = false
)

Functions

func GPUCount

func GPUCount() (cnt uint)

GPUCount returns the number of allocatable GPU resources

func GPUSlots

func GPUSlots() (cnt uint, freeCnt uint)

GPUSlots returns the total and free number of GPU capacity slots within the machine

func GetCUDAInfo

func GetCUDAInfo() (outDevs cudaDevices, err kv.Error)

func GetDevices

func GetDevices(slots uint) (devices []string, err kv.Error)

GetDevices will return a list of the possible devices that support a specified compute slot count. The returned order of cards is ascending going from the smaller capacity cards to the largest and most expensive. This function incorporates the AWS naming for cards when using the EC2 information functions to extract card details.
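The ordering described above can be illustrated with a minimal sketch. It does not use this package's real card table: `slotTable`, its card names, its slot values, and the helper `devicesForSlots` are all hypothetical, and the sketch assumes that "support" means a card offers at least the requested number of slots.

```go
package main

import (
	"fmt"
	"sort"
)

// slotTable is a hypothetical mapping of card names to compute slot counts.
// The names and values are placeholders, not the package's real table.
var slotTable = map[string]uint{
	"example-small":  2,
	"example-medium": 4,
	"example-large":  8,
}

// devicesForSlots returns the names of cards offering at least the requested
// number of slots, ordered from the smallest capacity card to the largest.
func devicesForSlots(table map[string]uint, slots uint) []string {
	names := make([]string, 0, len(table))
	for name, capacity := range table {
		if capacity >= slots {
			names = append(names, name)
		}
	}
	sort.Slice(names, func(i, j int) bool {
		if table[names[i]] != table[names[j]] {
			return table[names[i]] < table[names[j]]
		}
		return names[i] < names[j] // deterministic order for equal capacities
	})
	return names
}

func main() {
	fmt.Println(devicesForSlots(slotTable, 4)) // prints [example-medium example-large]
}
```

Returning the smallest adequate card first lets a caller pick the first entry when it wants the least expensive option that still fits.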

func GetSlots

func GetSlots(name string) (slots uint, err kv.Error)

GetSlots is used to retrieve the number of compute slots that the named card is capable of

func HasCUDA

func HasCUDA() bool

HasCUDA allows an external package to test for the presence of CUDA support in the Go code of this package

func LargestFreeGPUMem

func LargestFreeGPUMem() (freeMem uint64)

LargestFreeGPUMem will obtain the largest amount of free GPU memory on any of the individual cards accessible to the runner

func LargestFreeGPUSlots

func LargestFreeGPUSlots() (cnt uint)

LargestFreeGPUSlots gets the largest number of single device free GPU slots

func MonitorGPUs

func MonitorGPUs(ctx context.Context, statusC chan<- []string, errC chan<- kv.Error)

MonitorGPUs is intended to be started as a goroutine. Having initialized all of the devices in the tracking map, it checks the devices for ECC and other errors, marking failed GPUs.
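The channel-based signature suggests a consumer loop along the following lines. This is a sketch only: `fakeMonitor` is a hypothetical stand-in with the same channel shape as MonitorGPUs, the standard `error` type replaces `kv.Error` to avoid an external dependency, and the messages it sends are invented for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fakeMonitor is a hypothetical stand-in for MonitorGPUs. It sends one status
// report and one error, then blocks until the context is cancelled.
func fakeMonitor(ctx context.Context, statusC chan<- []string, errC chan<- error) {
	select {
	case statusC <- []string{"gpu-0 healthy"}:
	case <-ctx.Done():
		return
	}
	select {
	case errC <- errors.New("gpu-1 ECC failure"):
	case <-ctx.Done():
		return
	}
	<-ctx.Done() // keep running until the caller cancels
}

// watch starts the monitor as a goroutine and collects up to n messages,
// returning early if the timeout expires first.
func watch(n int, timeout time.Duration) (statuses []string, errs []error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // stops the monitor goroutine when watch returns

	statusC := make(chan []string)
	errC := make(chan error)
	go fakeMonitor(ctx, statusC, errC)

	for i := 0; i < n; i++ {
		select {
		case s := <-statusC:
			statuses = append(statuses, s...)
		case err := <-errC:
			errs = append(errs, err)
		case <-ctx.Done():
			return statuses, errs
		}
	}
	return statuses, errs
}

func main() {
	statuses, errs := watch(2, time.Second)
	fmt.Println(statuses, errs)
}
```

Cancelling the context is the only shutdown signal in this sketch, so every send in the monitor is paired with a `ctx.Done()` case to avoid leaking the goroutine.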

func ReturnGPU

func ReturnGPU(alloc *GPUAllocated) (err kv.Error)

ReturnGPU releases the GPU allocation passed in. It will validate some of the allocation details but otherwise operates on an honor system.
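One way to picture partially validated release is the sketch below. It is not the package's implementation: the `card` and `allocation` types, the `alloc` and `release` methods, and the use of an allocation ID as the validated detail are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// card is a simplified, hypothetical stand-in for a tracked GPU.
type card struct {
	FreeSlots uint
	Tracking  map[string]struct{} // IDs of outstanding allocations
}

// allocation is a hypothetical stand-in for GPUAllocated.
type allocation struct {
	ID    string
	Slots uint
}

// alloc reserves slots and records the allocation ID for later validation.
func (c *card) alloc(id string, slots uint) (*allocation, error) {
	if slots > c.FreeSlots {
		return nil, errors.New("insufficient free slots")
	}
	if _, dup := c.Tracking[id]; dup {
		return nil, errors.New("allocation ID already in use")
	}
	c.FreeSlots -= slots
	c.Tracking[id] = struct{}{}
	return &allocation{ID: id, Slots: slots}, nil
}

// release returns slots to the card. Only the ID is validated; the slot count
// is taken on trust from the caller, which is the honor-system part.
func (c *card) release(a *allocation) error {
	if a == nil {
		return errors.New("nil allocation")
	}
	if _, ok := c.Tracking[a.ID]; !ok {
		return errors.New("unknown or already released allocation")
	}
	delete(c.Tracking, a.ID)
	c.FreeSlots += a.Slots
	return nil
}

func main() {
	c := &card{FreeSlots: 4, Tracking: map[string]struct{}{}}
	a, _ := c.alloc("job-1", 3)
	fmt.Println(c.FreeSlots)  // prints 1
	fmt.Println(c.release(a)) // prints <nil>
	fmt.Println(c.release(a)) // prints unknown or already released allocation
	fmt.Println(c.FreeSlots)  // prints 4
}
```

The tracking set catches double releases and unknown allocations, but a caller that alters the slot count before releasing would go undetected in this sketch.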

func TotalFreeGPUSlots

func TotalFreeGPUSlots() (cnt uint)

TotalFreeGPUSlots gets the total number of free GPU slots across all devices

Types

type GPUAllocated

type GPUAllocated struct {
	Slots uint              // The number of GPU slots given from the allocation
	Mem   uint64            // The amount of memory given to the allocation
	Env   map[string]string // Any environment variables the device allocator wants the runner to use
	// contains filtered or unexported fields
}

GPUAllocated is used to record the allocation/reservation of a GPU resource on behalf of a caller

type GPUAllocations

type GPUAllocations []*GPUAllocated

GPUAllocations records the allocations that together are presented to a caller.

func AllocGPU

func AllocGPU(maxGPU uint, maxGPUMem uint64, unitsOfAllocation []int, cardCount int, live bool) (alloc GPUAllocations, err kv.Error)

AllocGPU will select the default allocation pool for GPUs and call the allocation for it.

type GPUTrack

type GPUTrack struct {
	UUID       string              // The UUID designation for the GPU being managed
	Slots      uint                // The number of logical slots the GPU has, based on its throughput/size
	Mem        uint64              // The amount of memory the GPU has
	Allocated  bool                // Indicates whether the card is currently allocated
	EccFailure *kv.Error           // Any ECC failure related error messages, nil if no errors encountered
	Tracking   map[string]struct{} // Used to validate allocations as they are released
}

GPUTrack is used to track usage of GPU cards and any errors generated by the cards at the hardware level

func GPUInventory

func GPUInventory() (gpus []GPUTrack, err kv.Error)

GPUInventory can be used to extract a copy of the current state of the GPU hardware seen within the runner
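A copy of the inventory could be summarized as in the sketch below. The local `GPUTrack` type reproduces a subset of the fields listed above, with the standard `error` type standing in for `*kv.Error`; the `summarize` helper, the sample UUIDs, and the rule that a card counts as free only when it is unallocated and has no ECC failure are assumptions, not documented behavior.

```go
package main

import "fmt"

// GPUTrack reproduces a subset of the exported fields of the package's type.
// The standard error type stands in for *kv.Error in this sketch.
type GPUTrack struct {
	UUID       string
	Slots      uint
	Mem        uint64
	Allocated  bool
	EccFailure error
}

// summarize reports the total free slots across all cards and the largest
// number of free slots on any single card. A card is treated as free when it
// is not allocated and has no recorded ECC failure (an assumption).
func summarize(gpus []GPUTrack) (total, largest uint) {
	for _, g := range gpus {
		if g.Allocated || g.EccFailure != nil {
			continue
		}
		total += g.Slots
		if g.Slots > largest {
			largest = g.Slots
		}
	}
	return total, largest
}

func main() {
	// Hypothetical inventory; UUIDs and sizes are placeholders.
	inventory := []GPUTrack{
		{UUID: "GPU-a", Slots: 2},
		{UUID: "GPU-b", Slots: 4},
		{UUID: "GPU-c", Slots: 8, Allocated: true},
		{UUID: "GPU-d", Slots: 8, EccFailure: fmt.Errorf("example ECC failure")},
	}
	total, largest := summarize(inventory)
	fmt.Println(total, largest) // prints 6 4
}
```

Working from a copy means the summary can be computed without holding any lock on the live tracking state, at the cost of the numbers being a snapshot.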
