cuda

v0.0.0-...-741e7c9 Published: Sep 9, 2017 License: BSD-2-Clause Imports: 8 Imported by: 4

Repository

github.com/unixpickle/cuda

README ¶

cuda

This is a Go package for interacting with CUDA. See the GoDoc for detailed usage information.

License

This is licensed under a BSD 2-clause license. See LICENSE.

Documentation ¶

Overview ¶

Package cuda provides bindings to the CUDA library.

Building ¶

To use this package, you must tell Go how to link with CUDA. On Mac OS X, this might look like:

export CUDA_PATH="/Developer/NVIDIA/CUDA-8.0"
export DYLD_LIBRARY_PATH="$CUDA_PATH/lib":$DYLD_LIBRARY_PATH
export CPATH="$CUDA_PATH/include/"
export CGO_LDFLAGS="/usr/local/cuda/lib/libcuda.dylib $CUDA_PATH/lib/libcudart.dylib $CUDA_PATH/lib/libcublas.dylib $CUDA_PATH/lib/libcurand.dylib"

On Linux, this might look like:

export CUDA_PATH=/usr/local/cuda
export CPATH="$CUDA_PATH/include/"
export CGO_LDFLAGS="$CUDA_PATH/lib64/libcublas.so $CUDA_PATH/lib64/libcudart.so $CUDA_PATH/lib64/stubs/libcuda.so $CUDA_PATH/lib64/libcurand.so"
export LD_LIBRARY_PATH=$CUDA_PATH/lib64/

Contexts ¶

Virtually every CUDA API must be run from within a Context, which can be created like so:

devices, err := cuda.AllDevices()
if err != nil {
    // Handle error.
}
if len(devices) == 0 {
    // No devices found.
}
ctx, err := cuda.NewContext(devices[0], 10)
if err != nil {
    // Handle error.
}

To run code in a Context asynchronously, you can do the following:

ctx.Run(func() error {
    // My code here.
    return nil
})

To run code synchronously, simply read from the resulting channel:

<-ctx.Run(func() error {
    // My code here.
    return nil
})

You should never call ctx.Run() inside another call to ctx.Run(), since doing so can deadlock; see the Context.Run() method for details.

Memory Management ¶

There are two ways to deal with memory: using Buffers, or using an Allocator directly with unsafe.Pointers. The Buffer API provides a high-level buffer interface with garbage collection and bounds checking. Most APIs use Buffers, including the APIs provided by sub-packages.

No matter what, you will need an Allocator if you want to allocate memory. You can create an Allocator directly on top of CUDA:

allocator := cuda.GCAllocator(cuda.NativeAllocator(ctx), 0)

Once you have an allocator, you can use it to allocate Buffer objects like so:

err := <-ctx.Run(func() error {
    // Allocate 16 bytes.
    buffer, err := cuda.AllocBuffer(allocator, 16)
    if err != nil {
        return err
    }
    // Use the buffer here; for example, zero it out.
    return cuda.ClearBuffer(buffer)
})

There are various functions to help you deal with buffers. The WriteBuffer() and ReadBuffer() functions allow you to copy Go slices to and from buffers. The Slice() function allows you to get a Buffer which points to a sub-region of a parent Buffer.
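Buffer sizes are given in bytes, so a buffer meant to hold a Go slice needs the element count multiplied by the element size. As a small sketch using only the standard library, encoding/binary can compute that figure for slices of fixed-size values; byteSize is a hypothetical helper name, not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// byteSize reports how many bytes a slice of fixed-size values occupies,
// which is the size to request for a buffer that should hold all of it.
// It returns -1 for element types without a fixed size, such as int.
func byteSize(slice interface{}) int {
	return binary.Size(slice)
}

func main() {
	fmt.Println(byteSize([]float32{1, 2, 3, 4})) // 4 elements of 4 bytes
	fmt.Println(byteSize([]float64{1, 2}))       // 2 elements of 8 bytes
	fmt.Println(byteSize([]byte("abc")))         // 3 elements of 1 byte
}
```

The result would be converted to uintptr before being passed as the size argument of AllocBuffer.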

Kernels ¶

To run kernels, you will use a Module. You can pass various Go primitives, unsafe.Pointers, and Buffers as kernel arguments.

Sub-packages ¶

The cublas and curand sub-packages provide basic linear algebra routines and random number generators, respectively.

Index ¶

func ClearBuffer(b Buffer) error
func CopyBuffer(dst, src Buffer) error
func MemInfo() (free, total uint64, err error)
func Overlap(b1, b2 Buffer) bool
func ReadBuffer(val interface{}, b Buffer) error
func Synchronize() error
func WriteBuffer(b Buffer, val interface{}) error
type Allocator
- func BFCAllocator(ctx *Context, maxSize uintptr) (Allocator, error)
- func GCAllocator(a Allocator, frac float64) Allocator
- func NativeAllocator(ctx *Context) Allocator
type Buffer
- func AllocBuffer(a Allocator, size uintptr) (Buffer, error)
- func Slice(b Buffer, start, end uintptr) Buffer
- func WrapPointer(a Allocator, ptr unsafe.Pointer, size uintptr) Buffer
type Context
- func NewContext(d *Device, bufferSize int) (*Context, error)
- func (c *Context) Run(f func() error) <-chan error
type DevAttr
type Device
- func AllDevices() ([]*Device, error)
- func (d *Device) Attr(attr DevAttr) (int, error)
- func (d *Device) Name() (string, error)
- func (d *Device) TotalMem() (uint64, error)
type Error
- func (e *Error) Error() string
type Module
- func NewModule(ctx *Context, ptx string) (*Module, error)
- func (m *Module) Launch(kernel string, gridX, gridY, gridZ, blockX, blockY, blockZ, sharedMem uint, ...) error
type Stream
- func NewStream(nonBlocking bool) (*Stream, error)
- func NewStreamPriority(nonBlocking bool, priority int) (*Stream, error)
- func (s *Stream) Close() error
- func (s *Stream) Pointer() unsafe.Pointer
- func (s *Stream) Synchronize() error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ClearBuffer ¶

func ClearBuffer(b Buffer) error

ClearBuffer writes zeros over the contents of a Buffer. It must be called from the correct Context.

func CopyBuffer ¶

func CopyBuffer(dst, src Buffer) error

CopyBuffer copies as many bytes as possible from src into dst.

The two Buffers must not contain overlapping regions of memory.

func MemInfo ¶

func MemInfo() (free, total uint64, err error)

MemInfo gets the free and total amount of memory available for allocation on the current device.

This must be called in a Context.

func Overlap ¶

func Overlap(b1, b2 Buffer) bool

Overlap checks if two buffers overlap in memory.

func ReadBuffer ¶

func ReadBuffer(val interface{}, b Buffer) error

ReadBuffer reads the data from a Buffer into a slice. This must be called from the correct Context.

See WriteBuffer for details on supported slice types.

func Synchronize ¶

func Synchronize() error

Synchronize waits for asynchronous operations to complete.

This should be called in a Context.

func WriteBuffer ¶

func WriteBuffer(b Buffer, val interface{}) error

WriteBuffer writes the data from a slice into a Buffer. It must be called from the correct Context.

Supported slice types are:

[]byte
[]float64
[]float32
[]int32
[]uint32

Similar to the copy() built-in, the maximum possible amount of data will be copied.
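For readers unfamiliar with the copy() semantics referenced above, the following sketch shows the built-in on ordinary Go slices. It does not touch CUDA; copyPrefix is a hypothetical wrapper that exists only to make the behavior easy to observe.

```go
package main

import "fmt"

// copyPrefix copies as many elements as fit from src into dst and
// reports how many were copied, exactly like the copy() built-in.
func copyPrefix(dst, src []float32) int {
	return copy(dst, src)
}

func main() {
	small := make([]float32, 2)
	n := copyPrefix(small, []float32{1, 2, 3})
	fmt.Println(n, small) // the third source element is dropped

	large := make([]float32, 4)
	n = copyPrefix(large, []float32{1, 2, 3})
	fmt.Println(n, large) // the last element of large is left untouched
}
```

By analogy, a mismatch between the slice length and the Buffer size is not reported as an error, so it is worth checking the sizes yourself when a full transfer is required.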

Types ¶

type Allocator ¶

type Allocator interface {
	// Get the Context in which all calls to this Allocator
	// should be made.
	//
	// Unlike Alloc and Free, this needn't be called from the
	// allocator's Context.
	Context() *Context

	// Allocate a chunk of CUDA memory.
	//
	// This should only be called from the Context.
	Alloc(size uintptr) (unsafe.Pointer, error)

	// Free a chunk of CUDA memory.
	//
	// The size passed to Free must be the same size that was
	// passed to Alloc().
	//
	// This should only be called from the Context.
	Free(ptr unsafe.Pointer, size uintptr)
}

An Allocator allocates and frees CUDA memory.

In general, Allocators are bound to a Context, meaning that they should only be used from within that Context.

Usually, you should prefer to use the Buffer type over a direct memory allocation, since Buffers take care of garbage collection for you.

Allocators are not responsible for zeroing out returned memory.

func BFCAllocator ¶

func BFCAllocator(ctx *Context, maxSize uintptr) (Allocator, error)

BFCAllocator creates an Allocator that uses memory coalescing and best-fitting to reduce memory fragmentation.

You should wrap the returned allocator with GCAllocator if you plan to use the Buffer API.

The maxSize argument specifies the maximum amount of memory to claim for the allocator. If it is 0, the allocator may claim nearly all of the available device memory.

If the CUDA_BFC_HEADROOM environment variable is set, it is used as the minimum number of bytes to leave free.

If the CUDA_BFC_MAX environment variable is set, it is used as an upper memory bound (in addition to maxSize).

This should be called from a Context.
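To illustrate what best-fitting and coalescing mean, here is a toy free-list allocator over plain offsets. It is a teaching sketch under simplifying assumptions (a linear scan, no alignment, no device memory) and is not how BFCAllocator is implemented; arena, block, alloc, and release are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a free region [off, off+size) of a hypothetical memory arena.
type block struct{ off, size uintptr }

// arena is a toy free list kept sorted by offset.
type arena struct{ free []block }

func newArena(size uintptr) *arena {
	return &arena{free: []block{{0, size}}}
}

// alloc picks the smallest free block that fits (best fit), carves size
// bytes off its front, and returns the offset. ok is false if nothing fits.
func (a *arena) alloc(size uintptr) (off uintptr, ok bool) {
	best := -1
	for i, b := range a.free {
		if b.size >= size && (best < 0 || b.size < a.free[best].size) {
			best = i
		}
	}
	if best < 0 {
		return 0, false
	}
	off = a.free[best].off
	if a.free[best].size == size {
		a.free = append(a.free[:best], a.free[best+1:]...)
	} else {
		a.free[best].off += size
		a.free[best].size -= size
	}
	return off, true
}

// release returns [off, off+size) to the free list and merges it with
// adjacent free neighbors (coalescing).
func (a *arena) release(off, size uintptr) {
	i := sort.Search(len(a.free), func(j int) bool { return a.free[j].off > off })
	a.free = append(a.free, block{})
	copy(a.free[i+1:], a.free[i:])
	a.free[i] = block{off, size}
	// Merge with the following block first so index i stays valid.
	if i+1 < len(a.free) && a.free[i].off+a.free[i].size == a.free[i+1].off {
		a.free[i].size += a.free[i+1].size
		a.free = append(a.free[:i+1], a.free[i+2:]...)
	}
	if i > 0 && a.free[i-1].off+a.free[i-1].size == a.free[i].off {
		a.free[i-1].size += a.free[i].size
		a.free = append(a.free[:i], a.free[i+1:]...)
	}
}

func main() {
	a := newArena(100)
	x, _ := a.alloc(30)
	y, _ := a.alloc(30)
	a.release(x, 30)
	a.release(y, 30)
	fmt.Println(len(a.free), a.free[0].size) // the pieces merge back into one block
}
```

Best fit keeps large free blocks intact for large requests, and coalescing undoes the splitting once neighbors are freed; together they limit fragmentation.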

func GCAllocator ¶

func GCAllocator(a Allocator, frac float64) Allocator

GCAllocator wraps an Allocator in a new Allocator which automatically triggers garbage collections.

The frac argument behaves similarly to the GOGC environment variable, except that GOGC is a percentage whereas frac is a ratio. Thus, a frac of 1.0 is equivalent to GOGC=100. If frac is 0, the value for GOGC is used.
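The percentage-to-ratio relationship can be written out as a short sketch. fracFromGOGC and nextTrigger are hypothetical helpers; nextTrigger shows the simplified GOGC growth rule from the Go runtime (collect once usage exceeds the live amount by the given ratio) and is only an assumption about how a similar threshold could be applied here, not a description of GCAllocator's internals.

```go
package main

import (
	"fmt"
	"strconv"
)

// fracFromGOGC converts a GOGC-style percentage string into the ratio
// form that frac uses, so "100" becomes 1.0 and "50" becomes 0.5.
// Non-numeric settings such as "off" are reported as an error.
func fracFromGOGC(gogc string) (float64, error) {
	percent, err := strconv.Atoi(gogc)
	if err != nil {
		return 0, err
	}
	return float64(percent) / 100, nil
}

// nextTrigger applies the simplified growth rule: the next collection is
// due once usage exceeds the live amount by the ratio frac.
func nextTrigger(liveBytes uint64, frac float64) uint64 {
	return liveBytes + uint64(float64(liveBytes)*frac)
}

func main() {
	frac, err := fracFromGOGC("100")
	if err != nil {
		fmt.Println("GOGC is not a number:", err)
		return
	}
	fmt.Println(frac, nextTrigger(1<<20, frac))
}
```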

If you are implementing your own Allocator, you will likely want to wrap it with GCAllocator so that it works nicely with the Buffer API.

This need not be called in a Context.

func NativeAllocator ¶

func NativeAllocator(ctx *Context) Allocator

NativeAllocator returns an Allocator that allocates directly from the CUDA APIs.

The resulting Allocator should be wrapped with GCAllocator if you plan to use it with the Buffer API.

This need not be called in a Context.

type Buffer ¶

type Buffer interface {
	// Allocator is the Allocator from which the Buffer was
	// allocated.
	Allocator() Allocator

	// Size is the size of the Buffer.
	Size() uintptr

	// WithPtr runs f with the pointer contained inside the
	// Buffer.
	// During the call to f, it is guaranteed that the Buffer
	// will not be garbage collected.
	// However, nothing should store a reference to ptr after
	// f has completed.
	WithPtr(f func(ptr unsafe.Pointer))
}

A Buffer provides a high-level interface into an underlying CUDA buffer.

func AllocBuffer ¶

func AllocBuffer(a Allocator, size uintptr) (Buffer, error)

AllocBuffer allocates a new Buffer.

This must be called in the Allocator's Context.

This does not zero out the returned memory. To do that, you should use ClearBuffer().

func Slice ¶

func Slice(b Buffer, start, end uintptr) Buffer

Slice creates a Buffer which views some part of the contents of another Buffer. The start and end indexes are inclusive and exclusive, respectively.
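The half-open convention matches Go's own slice expressions. The sketch below assumes that start and end are byte offsets, which this page does not state explicitly (it is inferred from sizes elsewhere being in bytes); elemRange is a hypothetical helper that converts an element range into the corresponding byte range.

```go
package main

import "fmt"

// elemRange converts a half-open element range [start, end) into the
// half-open byte range that covers it, given the element size in bytes.
func elemRange(start, end, elemSize uintptr) (byteStart, byteEnd uintptr) {
	return start * elemSize, end * elemSize
}

func main() {
	// Elements 2, 3, and 4 of a float32 array (4 bytes each).
	from, to := elemRange(2, 5, 4)
	fmt.Println(from, to, to-from) // start offset, end offset, length in bytes

	// The same convention on an ordinary Go slice.
	data := []float32{0, 1, 2, 3, 4, 5}
	fmt.Println(data[2:5], len(data[2:5]))
}
```

Because the end index is exclusive, the length of the view is simply end minus start.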

func WrapPointer ¶

func WrapPointer(a Allocator, ptr unsafe.Pointer, size uintptr) Buffer

WrapPointer wraps a pointer in a Buffer. You must specify the Allocator from which the pointer originated and the size of the buffer.

After calling this, you should not use the pointer outside of the buffer. The Buffer will automatically free the pointer.

type Context ¶

type Context struct {
	// contains filtered or unexported fields
}

A Context maintains a CUDA-dedicated thread. All CUDA code should be run by a Context.

func NewContext ¶

func NewContext(d *Device, bufferSize int) (*Context, error)

NewContext creates a new Context on the Device.

The bufferSize is the maximum number of asynchronous calls that can be queued up at once. A larger buffer size means that Run() is less likely to block, all else equal.

If bufferSize is -1, then the CUDA_CTX_BUFFER environment variable is used. If bufferSize is -1 and CUDA_CTX_BUFFER is not set, a reasonable default is used.

func (*Context) Run ¶

func (c *Context) Run(f func() error) <-chan error

Run runs f in the Context and returns a channel that will be sent the result of f when f completes.

This may block until some queued up functions have finished running on the Context.

If you are not interested in the result of f, you can simply ignore the returned channel.

While f is running, no other function can run on the Context. This means that, to avoid deadlock, f should not call Run() on the same Context.

type DevAttr ¶

type DevAttr int

DevAttr is a CUDA device attribute.

const (
	DevAttrMaxThreadsPerBlock DevAttr = iota
	DevAttrMaxBlockDimX
	DevAttrMaxBlockDimY
	DevAttrMaxBlockDimZ
	DevAttrMaxGridDimX
	DevAttrMaxGridDimY
	DevAttrMaxGridDimZ
	DevAttrMaxSharedMemoryPerBlock
	DevAttrSharedMemoryPerBlock
	DevAttrTotalConstantMemory
	DevAttrWarpSize
	DevAttrMaxPitch
	DevAttrMaxRegistersPerBlock
	DevAttrRegistersPerBlock
	DevAttrClockRate
	DevAttrTextureAlignment
	DevAttrGPUOverlap
	DevAttrMultiprocessorCount
	DevAttrKernelExecTimeout
	DevAttrIntegrated
	DevAttrCanMapHostMemory
	DevAttrComputeMode
	DevAttrMaximumTexture1DWidth
	DevAttrMaximumTexture2DWidth
	DevAttrMaximumTexture2DHeight
	DevAttrMaximumTexture3DWidth
	DevAttrMaximumTexture3DHeight
	DevAttrMaximumTexture3DDepth
	DevAttrMaximumTexture2DLayeredWidth
	DevAttrMaximumTexture2DLayeredHeight
	DevAttrMaximumTexture2DLayeredLayers
	DevAttrMaximumTexture2DArrayWidth
	DevAttrMaximumTexture2DArrayHeight
	DevAttrMaximumTexture2DArrayNumslices
	DevAttrSurfaceAlignment
	DevAttrConcurrentKernels
	DevAttrECCEnabled
	DevAttrPCIBusID
	DevAttrPCIDeviceID
	DevAttrTCCDriver
	DevAttrMemoryClockRate
	DevAttrGlobalMemoryBusWidth
	DevAttrL2CacheSize
	DevAttrMaxThreadsPerMultiprocessor
	DevAttrAsyncEngineCount
	DevAttrUnifiedAddressing
	DevAttrMaximumTexture1DLayeredWidth
	DevAttrMaximumTexture1DLayeredLayers
	DevAttrCanTex2DGather
	DevAttrMaximumTexture2DGatherWidth
	DevAttrMaximumTexture2DGatherHeight
	DevAttrMaximumTexture3DWidthAlternate
	DevAttrMaximumTexture3DHeightAlternate
	DevAttrMaximumTexture3DDepthAlternate
	DevAttrPCIDomainID
	DevAttrTexturePitchAlignment
	DevAttrMaximumTexturecubemapWidth
	DevAttrMaximumTexturecubemapLayeredWidth
	DevAttrMaximumTexturecubemapLayeredLayers
	DevAttrMaximumSurface1DWidth
	DevAttrMaximumSurface2DWidth
	DevAttrMaximumSurface2DHeight
	DevAttrMaximumSurface3DWidth
	DevAttrMaximumSurface3DHeight
	DevAttrMaximumSurface3DDepth
	DevAttrMaximumSurface1DLayeredWidth
	DevAttrMaximumSurface1DLayeredLayers
	DevAttrMaximumSurface2DLayeredWidth
	DevAttrMaximumSurface2DLayeredHeight
	DevAttrMaximumSurface2DLayeredLayers
	DevAttrMaximumSurfacecubemapWidth
	DevAttrMaximumSurfacecubemapLayeredWidth
	DevAttrMaximumSurfacecubemapLayeredLayers
	DevAttrMaximumTexture1DLinearWidth
	DevAttrMaximumTexture2DLinearWidth
	DevAttrMaximumTexture2DLinearHeight
	DevAttrMaximumTexture2DLinearPitch
	DevAttrMaximumTexture2DMipmappedWidth
	DevAttrMaximumTexture2DMipmappedHeight
	DevAttrComputeCapabilityMajor
	DevAttrComputeCapabilityMinor
	DevAttrMaximumTexture1DMipmappedWidth
	DevAttrStreamPrioritiesSupported
	DevAttrGlobalL1CacheSupported
	DevAttrLocalL1CacheSupported
	DevAttrMaxSharedMemoryPerMultiprocessor
	DevAttrMaxRegistersPerMultiprocessor
	DevAttrManagedMemory
	DevAttrMultiGPUBoard
	DevAttrMultiGPUBoardGroupID
	DevAttrHostNativeAtomicSupported
	DevAttrSingleToDoublePrecisionPerfRatio
	DevAttrPageableMemoryAccess
	DevAttrConcurrentManagedAccess
	DevAttrComputePreemptionSupported
	DevAttrCanUseHostPointerForRegisteredMem
)

All supported device attributes.

type Device ¶

type Device struct {
	// contains filtered or unexported fields
}

Device contains a unique ID for a CUDA device.

func AllDevices ¶

func AllDevices() ([]*Device, error)

AllDevices lists the available CUDA devices.

This needn't be called from a Context.

func (*Device) Attr ¶

func (d *Device) Attr(attr DevAttr) (int, error)

Attr gets an attribute of the device.

This needn't be called from a Context.

func (*Device) Name ¶

func (d *Device) Name() (string, error)

Name gets the device's identifier string.

This needn't be called from a Context.

func (*Device) TotalMem ¶

func (d *Device) TotalMem() (uint64, error)

TotalMem gets the device's total memory.

This needn't be called from a Context.

type Error ¶

type Error struct {
	// Context is typically a C function name.
	Context string

	// Name is the C constant name for the error,
	// such as "CURAND_STATUS_INTERNAL_ERROR".
	Name string

	// Message is the main error message.
	//
	// This may be human-readable, although it may often be
	// the same as Name.
	Message string
}

Error is a CUDA-related error.

func (*Error) Error ¶

func (e *Error) Error() string

Error returns a message of the form "context: message".

type Module ¶

type Module struct {
	// contains filtered or unexported fields
}

A Module manages a set of compiled kernels.

func NewModule ¶

func NewModule(ctx *Context, ptx string) (*Module, error)

NewModule creates a Module by compiling a chunk of PTX code.

This should be called from within the Context.

You can build PTX code using the nvcc compiler like so:

nvcc --gpu-architecture=compute_30 --gpu-code=compute_30 --ptx kernels.cu

The above command compiles "kernels.cu" into a PTX file called "kernels.ptx".

The word size of the PTX should match the word size of the Go program. Depending on your use case, you may want to compile separate PTX files for 32-bit and 64-bit hosts.
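One way to act on the word-size advice is to choose the PTX file at run time from the host's word size. This is a sketch only: ptxFileFor is a hypothetical helper and the file names "kernels32.ptx" and "kernels64.ptx" are placeholders, not files shipped with this package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ptxFileFor returns a placeholder PTX file name for the given host word
// size in bits, or an error for word sizes this sketch does not cover.
func ptxFileFor(wordBits int) (string, error) {
	switch wordBits {
	case 32:
		return "kernels32.ptx", nil
	case 64:
		return "kernels64.ptx", nil
	}
	return "", fmt.Errorf("unsupported word size: %d bits", wordBits)
}

func main() {
	// bits.UintSize is 32 or 64 depending on the platform the Go
	// program was compiled for.
	name, err := ptxFileFor(bits.UintSize)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("would load", name)
}
```

The contents of the chosen file would then be read into a string and passed to NewModule.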

func (*Module) Launch ¶

func (m *Module) Launch(kernel string, gridX, gridY, gridZ, blockX, blockY, blockZ,
	sharedMem uint, stream *Stream, args ...interface{}) error

Launch launches a kernel (which is referenced by name).

This should be called from within the same Context that NewModule was called from.

Currently, the following types may be used as kernel arguments:

uint
int
float32
float64
unsafe.Pointer
Buffer

To wait for the launched kernel to complete, use Synchronize() or stream.Synchronize() if you specified a non-nil stream.
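Choosing the grid dimensions usually comes down to a ceiling division: enough blocks to cover every element. The helper below is a generic sketch, with gridSize as a hypothetical name and 256 threads per block as an arbitrary example value.

```go
package main

import "fmt"

// gridSize returns the number of blocks of blockSize threads needed to
// cover n elements (ceiling division). blockSize must be nonzero.
func gridSize(n, blockSize uint) uint {
	blocks := n / blockSize
	if n%blockSize != 0 {
		blocks++
	}
	return blocks
}

func main() {
	const blockSize = 256
	for _, n := range []uint{1000, 1024, 1025} {
		fmt.Println(n, "elements ->", gridSize(n, blockSize), "blocks")
	}
}
```

For a one-dimensional kernel, the result would be passed as gridX with gridY and gridZ set to 1, and blockSize as blockX with blockY and blockZ set to 1. When n is not a multiple of the block size, the last block contains surplus threads, so the kernel itself should check its index against n.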

type Stream ¶

type Stream struct {
	// contains filtered or unexported fields
}

A Stream manages a pipeline of CUDA operations. Streams can be employed to achieve parallelism.

func NewStream ¶

func NewStream(nonBlocking bool) (*Stream, error)

NewStream creates a new Stream.

If nonBlocking is true, then this stream will be able to run concurrently with the default stream.

This should be called in a Context.

func NewStreamPriority ¶

func NewStreamPriority(nonBlocking bool, priority int) (*Stream, error)

NewStreamPriority is like NewStream, but the resulting stream is assigned a certain priority.

This should be called in a Context.

func (*Stream) Close ¶

func (s *Stream) Close() error

Close destroys the stream.

This will return immediately, even if the stream is still doing work.

A stream should not be used after it is closed.

This should be called in a Context.

func (*Stream) Pointer ¶

func (s *Stream) Pointer() unsafe.Pointer

Pointer returns the raw pointer value of the underlying stream object.

If s is nil, then a NULL pointer is returned.

This should be called in a Context.

func (*Stream) Synchronize ¶

func (s *Stream) Synchronize() error

Synchronize waits for the stream's tasks to complete.

Directories ¶

Path	Synopsis
cublas	Package cublas provides bindings for the CUDA cuBLAS library.
curand	Package curand binds the CUDA cuRAND API to Go.
