README

cuda

This is a Go package for interacting with CUDA. See the GoDoc for detailed usage information.

License

This is licensed under a BSD 2-clause license. See LICENSE.

Documentation

Overview

Package cuda provides bindings to the CUDA library.

Building

To use this package, you must tell Go how to link with CUDA. On Mac OS X, this might look like:

export CUDA_PATH="/Developer/NVIDIA/CUDA-8.0"
export DYLD_LIBRARY_PATH="$CUDA_PATH/lib":$DYLD_LIBRARY_PATH
export CPATH="$CUDA_PATH/include/"
export CGO_LDFLAGS="/usr/local/cuda/lib/libcuda.dylib $CUDA_PATH/lib/libcudart.dylib $CUDA_PATH/lib/libcublas.dylib $CUDA_PATH/lib/libcurand.dylib"

On Linux, this might look like:

export CUDA_PATH=/usr/local/cuda
export CPATH="$CUDA_PATH/include/"
export CGO_LDFLAGS="$CUDA_PATH/lib64/libcublas.so $CUDA_PATH/lib64/libcudart.so $CUDA_PATH/lib64/stubs/libcuda.so $CUDA_PATH/lib64/libcurand.so"
export LD_LIBRARY_PATH=$CUDA_PATH/lib64/

Contexts

Virtually every CUDA API must be run from within a Context, which can be created like so:

devices, err := cuda.AllDevices()
if err != nil {
    // Handle error.
}
if len(devices) == 0 {
    // No devices found.
}
ctx, err := cuda.NewContext(devices[0], 10)
if err != nil {
    // Handle error.
}

To run code in a Context asynchronously, you can do the following:

ctx.Run(func() error {
    // My code here.
    return nil
})

To run code synchronously, simply read from the resulting channel:

<-ctx.Run(func() error {
    // My code here.
    return nil
})

You should never call ctx.Run() inside another call to ctx.Run(), for reasons that are documented on the Context.Run() method.

Memory Management

There are two ways to deal with memory: using Buffers, or using an Allocator directly with unsafe.Pointers. The Buffer API provides a high-level buffer interface with garbage collection and bounds checking. Most APIs use Buffers, including the APIs provided by sub-packages.

No matter what, you will need an Allocator if you want to allocate memory. You can create an Allocator directly on top of CUDA:

allocator := cuda.GCAllocator(cuda.NativeAllocator(ctx), 0)

Once you have an allocator, you can use it to allocate Buffer objects like so:

err := <-ctx.Run(func() error {
    // Allocate 16 bytes.
    buffer, err := cuda.AllocBuffer(allocator, 16)
    if err != nil {
        return err
    }
    // Use the buffer here...
    return nil
})

There are various functions to help you deal with buffers. The WriteBuffer() and ReadBuffer() functions allow you to copy Go slices to and from buffers. The Slice() function allows you to get a Buffer which points to a sub-region of a parent Buffer.
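For example, a minimal sketch (reusing the ctx and allocator variables from above) that writes a slice to the device, views part of the buffer, and reads it back:

err := <-ctx.Run(func() error {
    // Allocate room for four float32 values (16 bytes).
    buf, err := cuda.AllocBuffer(allocator, 16)
    if err != nil {
        return err
    }
    // Copy host data to the device.
    if err := cuda.WriteBuffer(buf, []float32{1, 2, 3, 4}); err != nil {
        return err
    }
    // View the last two values (bytes 8 through 16).
    tail := cuda.Slice(buf, 8, 16)
    // Copy them back to the host; out becomes {3, 4}.
    out := make([]float32, 2)
    return cuda.ReadBuffer(out, tail)
})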

Kernels

To run kernels, you will use a Module. You can pass various Go primitives, unsafe.Pointers, and Buffers as kernel arguments.

Sub-packages

The cublas and curand sub-packages provide basic linear algebra routines and random number generators, respectively.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ClearBuffer

func ClearBuffer(b Buffer) error

ClearBuffer writes zeros over the contents of a Buffer. It must be called from the correct Context.

func CopyBuffer

func CopyBuffer(dst, src Buffer) error

CopyBuffer copies as many bytes as possible from src into dst.

The two Buffers must not contain overlapping regions of memory.

func MemInfo

func MemInfo() (free, total uint64, err error)

MemInfo gets the free and total amount of memory available for allocation on the current device.

This must be called in a Context.
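For example, a short sketch (assuming the ctx variable from earlier) that logs how much device memory is free:

err := <-ctx.Run(func() error {
    free, total, err := cuda.MemInfo()
    if err != nil {
        return err
    }
    log.Printf("%d of %d bytes free", free, total)
    return nil
})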

func Overlap

func Overlap(b1, b2 Buffer) bool

Overlap checks if two buffers overlap in memory.

func ReadBuffer

func ReadBuffer(val interface{}, b Buffer) error

ReadBuffer reads the data from a Buffer into a slice. This must be called from the correct Context.

See WriteBuffer for details on supported slice types.

func Synchronize

func Synchronize() error

Synchronize waits for asynchronous operations to complete.

This should be called in a Context.

func WriteBuffer

func WriteBuffer(b Buffer, val interface{}) error

WriteBuffer writes the data from a slice into a Buffer. It must be called from the correct Context.

Supported slice types are:

[]byte
[]float64
[]float32
[]int32
[]uint32

Similar to the copy() built-in, the maximum possible amount of data will be copied.

Types

type Allocator

type Allocator interface {
	// Get the Context in which all calls to this Allocator
	// should be made.
	//
	// Unlike Alloc and Free, this needn't be called from the
	// allocator's Context.
	Context() *Context

	// Allocate a chunk of CUDA memory.
	//
	// This should only be called from the Context.
	Alloc(size uintptr) (unsafe.Pointer, error)

	// Free a chunk of CUDA memory.
	//
	// The size passed to Free must be the same size that was
	// passed to Alloc().
	//
	// This should only be called from the Context.
	Free(ptr unsafe.Pointer, size uintptr)
}

An Allocator allocates and frees CUDA memory.

In general, Allocators are bound to a Context, meaning that they should only be used from within that Context.

Usually, you should prefer to use the Buffer type over a direct memory allocation, since Buffers take care of garbage collection for you.

Allocators are not responsible for zeroing out returned memory.

func BFCAllocator

func BFCAllocator(ctx *Context, maxSize uintptr) (Allocator, error)

BFCAllocator creates an Allocator that uses memory coalescing and best-fitting to reduce memory fragmentation.

You should wrap the returned allocator with GCAllocator if you plan to use the Buffer API.

The maxSize argument specifies the maximum amount of memory to claim for the allocator. If it is 0, the allocator may claim nearly all of the available device memory.

If the CUDA_BFC_HEADROOM environment variable is set, it is used as the minimum number of bytes to leave free.

If the CUDA_BFC_MAX environment variable is set, it is used as an upper memory bound (in addition to maxSize).

This should be called from a Context.
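For example, a sketch (assuming the ctx variable from earlier) that builds a BFC allocator and wraps it for use with the Buffer API:

var allocator cuda.Allocator
err := <-ctx.Run(func() error {
    bfc, err := cuda.BFCAllocator(ctx, 0)
    if err != nil {
        return err
    }
    allocator = cuda.GCAllocator(bfc, 0)
    return nil
})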

func GCAllocator

func GCAllocator(a Allocator, frac float64) Allocator

GCAllocator wraps an Allocator in a new Allocator which automatically triggers garbage collections.

The frac argument behaves similarly to the GOGC environment variable, except that GOGC is a percentage whereas frac is a ratio. Thus, a frac of 1.0 is equivalent to GOGC=100. If frac is 0, the value for GOGC is used.

If you are implementing your own Allocator, you will likely want to wrap it with GCAllocator so that it works nicely with the Buffer API.

This need not be called in a Context.

func NativeAllocator

func NativeAllocator(ctx *Context) Allocator

NativeAllocator returns an Allocator that allocates directly from the CUDA APIs.

The resulting Allocator should be wrapped with GCAllocator if you plan to use it with the Buffer API.

This need not be called in a Context.

type Buffer

type Buffer interface {
	// Allocator is the Allocator from which the Buffer was
	// allocated.
	Allocator() Allocator

	// Size is the size of the Buffer.
	Size() uintptr

	// WithPtr runs f with the pointer contained inside the
	// Buffer.
	// During the call to f, it is guaranteed that the Buffer
	// will not be garbage collected.
	// However, nothing should store a reference to ptr after
	// f has completed.
	WithPtr(f func(ptr unsafe.Pointer))
}

A Buffer provides a high-level interface into an underlying CUDA buffer.
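For example, a sketch (assuming buf is a Buffer) of handing the underlying pointer to low-level code:

buf.WithPtr(func(ptr unsafe.Pointer) {
    // ptr is only guaranteed valid during this callback,
    // e.g. pass it to a cgo call here.
    // Do not store ptr anywhere.
})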

func AllocBuffer

func AllocBuffer(a Allocator, size uintptr) (Buffer, error)

AllocBuffer allocates a new Buffer.

This must be called in the Allocator's Context.

This does not zero out the returned memory. To do that, you should use ClearBuffer().
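For example, inside a ctx.Run() call (and assuming an allocator from earlier), you might allocate and zero a buffer like so:

buf, err := cuda.AllocBuffer(allocator, 512)
if err != nil {
    return err
}
if err := cuda.ClearBuffer(buf); err != nil {
    return err
}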

func Slice

func Slice(b Buffer, start, end uintptr) Buffer

Slice creates a Buffer which views some part of the contents of another Buffer. As with Go's slice expressions, the start index is inclusive and the end index is exclusive.

func WrapPointer

func WrapPointer(a Allocator, ptr unsafe.Pointer, size uintptr) Buffer

WrapPointer wraps a pointer in a Buffer. You must specify the Allocator from which the pointer originated and the size of the buffer.

After calling this, you should not use the pointer outside of the buffer. The Buffer will automatically free the pointer.

type Context

type Context struct {
	// contains filtered or unexported fields
}

A Context maintains a CUDA-dedicated thread. All CUDA code should be run by a Context.

func NewContext

func NewContext(d *Device, bufferSize int) (*Context, error)

NewContext creates a new Context on the Device.

The bufferSize is the maximum number of asynchronous calls that can be queued up at once. A larger buffer size means that Run() is less likely to block, all else equal.

If bufferSize is -1, then the CUDA_CTX_BUFFER environment variable is used. If bufferSize is -1 and CUDA_CTX_BUFFER is not set, a reasonable default is used.

func (*Context) Run

func (c *Context) Run(f func() error) <-chan error

Run runs f in the Context and returns a channel that will be sent the result of f when f completes.

This may block until some queued up functions have finished running on the Context.

If you are not interested in the result of f, you can simply ignore the returned channel.

While f is running, no other function can run on the Context. This means that, to avoid deadlock, f should not use the Context.

type DevAttr

type DevAttr int

DevAttr is a CUDA device attribute.

const (
	DevAttrMaxThreadsPerBlock DevAttr = iota
	DevAttrMaxBlockDimX
	DevAttrMaxBlockDimY
	DevAttrMaxBlockDimZ
	DevAttrMaxGridDimX
	DevAttrMaxGridDimY
	DevAttrMaxGridDimZ
	DevAttrMaxSharedMemoryPerBlock
	DevAttrSharedMemoryPerBlock
	DevAttrTotalConstantMemory
	DevAttrWarpSize
	DevAttrMaxPitch
	DevAttrMaxRegistersPerBlock
	DevAttrRegistersPerBlock
	DevAttrClockRate
	DevAttrTextureAlignment
	DevAttrGPUOverlap
	DevAttrMultiprocessorCount
	DevAttrKernelExecTimeout
	DevAttrIntegrated
	DevAttrCanMapHostMemory
	DevAttrComputeMode
	DevAttrMaximumTexture1DWidth
	DevAttrMaximumTexture2DWidth
	DevAttrMaximumTexture2DHeight
	DevAttrMaximumTexture3DWidth
	DevAttrMaximumTexture3DHeight
	DevAttrMaximumTexture3DDepth
	DevAttrMaximumTexture2DLayeredWidth
	DevAttrMaximumTexture2DLayeredHeight
	DevAttrMaximumTexture2DLayeredLayers
	DevAttrMaximumTexture2DArrayWidth
	DevAttrMaximumTexture2DArrayHeight
	DevAttrMaximumTexture2DArrayNumslices
	DevAttrSurfaceAlignment
	DevAttrConcurrentKernels
	DevAttrECCEnabled
	DevAttrPCIBusID
	DevAttrPCIDeviceID
	DevAttrTCCDriver
	DevAttrMemoryClockRate
	DevAttrGlobalMemoryBusWidth
	DevAttrL2CacheSize
	DevAttrMaxThreadsPerMultiprocessor
	DevAttrAsyncEngineCount
	DevAttrUnifiedAddressing
	DevAttrMaximumTexture1DLayeredWidth
	DevAttrMaximumTexture1DLayeredLayers
	DevAttrCanTex2DGather
	DevAttrMaximumTexture2DGatherWidth
	DevAttrMaximumTexture2DGatherHeight
	DevAttrMaximumTexture3DWidthAlternate
	DevAttrMaximumTexture3DHeightAlternate
	DevAttrMaximumTexture3DDepthAlternate
	DevAttrPCIDomainID
	DevAttrTexturePitchAlignment
	DevAttrMaximumTexturecubemapWidth
	DevAttrMaximumTexturecubemapLayeredWidth
	DevAttrMaximumTexturecubemapLayeredLayers
	DevAttrMaximumSurface1DWidth
	DevAttrMaximumSurface2DWidth
	DevAttrMaximumSurface2DHeight
	DevAttrMaximumSurface3DWidth
	DevAttrMaximumSurface3DHeight
	DevAttrMaximumSurface3DDepth
	DevAttrMaximumSurface1DLayeredWidth
	DevAttrMaximumSurface1DLayeredLayers
	DevAttrMaximumSurface2DLayeredWidth
	DevAttrMaximumSurface2DLayeredHeight
	DevAttrMaximumSurface2DLayeredLayers
	DevAttrMaximumSurfacecubemapWidth
	DevAttrMaximumSurfacecubemapLayeredWidth
	DevAttrMaximumSurfacecubemapLayeredLayers
	DevAttrMaximumTexture1DLinearWidth
	DevAttrMaximumTexture2DLinearWidth
	DevAttrMaximumTexture2DLinearHeight
	DevAttrMaximumTexture2DLinearPitch
	DevAttrMaximumTexture2DMipmappedWidth
	DevAttrMaximumTexture2DMipmappedHeight
	DevAttrComputeCapabilityMajor
	DevAttrComputeCapabilityMinor
	DevAttrMaximumTexture1DMipmappedWidth
	DevAttrStreamPrioritiesSupported
	DevAttrGlobalL1CacheSupported
	DevAttrLocalL1CacheSupported
	DevAttrMaxSharedMemoryPerMultiprocessor
	DevAttrMaxRegistersPerMultiprocessor
	DevAttrManagedMemory
	DevAttrMultiGPUBoard
	DevAttrMultiGPUBoardGroupID
	DevAttrHostNativeAtomicSupported
	DevAttrSingleToDoublePrecisionPerfRatio
	DevAttrPageableMemoryAccess
	DevAttrConcurrentManagedAccess
	DevAttrComputePreemptionSupported
	DevAttrCanUseHostPointerForRegisteredMem
)

All supported device attributes.

type Device

type Device struct {
	// contains filtered or unexported fields
}

Device contains a unique ID for a CUDA device.

func AllDevices

func AllDevices() ([]*Device, error)

AllDevices lists the available CUDA devices.

This needn't be called from a Context.

func (*Device) Attr

func (d *Device) Attr(attr DevAttr) (int, error)

Attr gets an attribute of the device.

This needn't be called from a Context.
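For example, a sketch (assuming dev is a *Device from AllDevices) that reads the device's compute capability:

major, err := dev.Attr(cuda.DevAttrComputeCapabilityMajor)
if err != nil {
    // Handle error.
}
minor, err := dev.Attr(cuda.DevAttrComputeCapabilityMinor)
if err != nil {
    // Handle error.
}
fmt.Printf("compute capability %d.%d\n", major, minor)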

func (*Device) Name

func (d *Device) Name() (string, error)

Name gets the device's identifier string.

This needn't be called from a Context.

func (*Device) TotalMem

func (d *Device) TotalMem() (uint64, error)

TotalMem gets the device's total memory.

This needn't be called from a Context.

type Error

type Error struct {
	// Context is typically a C function name.
	Context string

	// Name is the C constant name for the error,
	// such as "CURAND_STATUS_INTERNAL_ERROR".
	Name string

	// Message is the main error message.
	//
	// This may be human-readable, although it may often be
	// the same as Name.
	Message string
}

Error is a CUDA-related error.

func (*Error) Error

func (e *Error) Error() string

Error generates a message of the form "context: message".
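For example, a sketch of inspecting a failure. Whether a given call returns a *Error is not guaranteed, so a type assertion is used:

if err := cuda.Synchronize(); err != nil {
    if cudaErr, ok := err.(*cuda.Error); ok {
        log.Printf("CUDA failure in %s: %s", cudaErr.Context, cudaErr.Message)
    }
}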

type Module

type Module struct {
	// contains filtered or unexported fields
}

A Module manages a set of compiled kernels.

func NewModule

func NewModule(ctx *Context, ptx string) (*Module, error)

NewModule creates a Module by compiling a chunk of PTX code.

This should be called from within the Context.

You can build PTX code using the nvcc compiler like so:

nvcc --gpu-architecture=compute_30 --gpu-code=compute_30 --ptx kernels.cu

The above command compiles "kernels.cu" into a PTX file called "kernels.ptx".

The word size of the PTX should match the word size of the Go program. Depending on your use case, you may want to compile separate PTX files for 32-bit and 64-bit hosts.

func (*Module) Launch

func (m *Module) Launch(kernel string, gridX, gridY, gridZ, blockX, blockY, blockZ,
	sharedMem uint, stream *Stream, args ...interface{}) error

Launch launches a kernel (which is referenced by name).

This should be called from within the same Context that NewModule was called from.

Currently, the following types may be used as kernel arguments:

uint
int
float32
float64
unsafe.Pointer
Buffer

To wait for the launched kernel to complete, use Synchronize() or stream.Synchronize() if you specified a non-nil stream.
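For example, a sketch of compiling and launching a kernel on the default stream. The kernel name "scale", the ptxSource string, and the buf Buffer are hypothetical placeholders:

err := <-ctx.Run(func() error {
    module, err := cuda.NewModule(ctx, ptxSource)
    if err != nil {
        return err
    }
    // Launch scale on a 4x1x1 grid of 256x1x1 blocks, with no
    // shared memory, passing a Buffer and a float32 argument.
    if err := module.Launch("scale", 4, 1, 1, 256, 1, 1, 0, nil, buf, float32(2.0)); err != nil {
        return err
    }
    // Wait for the kernel to finish.
    return cuda.Synchronize()
})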

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

A Stream manages a pipeline of CUDA operations. Streams can be employed to achieve parallelism.
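For example, a sketch (assuming the ctx variable from earlier) that creates a non-blocking stream, uses it, and tears it down:

err := <-ctx.Run(func() error {
    stream, err := cuda.NewStream(true)
    if err != nil {
        return err
    }
    defer stream.Close()
    // Queue work on the stream here, e.g. by passing it to
    // Module.Launch(), then wait for the work to finish.
    return stream.Synchronize()
})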

func NewStream

func NewStream(nonBlocking bool) (*Stream, error)

NewStream creates a new Stream.

If nonBlocking is true, then this stream will be able to run concurrently with the default stream.

This should be called in a Context.

func NewStreamPriority

func NewStreamPriority(nonBlocking bool, priority int) (*Stream, error)

NewStreamPriority is like NewStream, but the resulting stream is assigned a certain priority.

This should be called in a Context.

func (*Stream) Close

func (s *Stream) Close() error

Close destroys the stream.

This will return immediately, even if the stream is still doing work.

A stream should not be used after it is closed.

This should be called in a Context.

func (*Stream) Pointer

func (s *Stream) Pointer() unsafe.Pointer

Pointer returns the raw pointer value of the underlying stream object.

If s is nil, then a NULL pointer is returned.

This should be called in a Context.

func (*Stream) Synchronize

func (s *Stream) Synchronize() error

Synchronize waits for the stream's tasks to complete.

Directories

Path	Synopsis
cublas	Package cublas provides bindings for the CUDA cuBLAS library.
curand	Package curand binds the CUDA cuRAND API to Go.
