kernel

package
v0.0.0-...-4bf4b70

This package is not in the latest version of its module.

Published: Jan 24, 2021 License: Apache-2.0, MIT Imports: 82 Imported by: 0

Documentation

Overview

Package kernel provides an emulation of the Linux kernel.

See README.md for a detailed overview.

Lock order (outermost locks must be taken first):

Kernel.extMu

ThreadGroup.timerMu
  ktime.Timer.mu (for kernelCPUClockTicker and IntervalTimer)
    TaskSet.mu
      SignalHandlers.mu
        Task.mu
    runningTasksMu

Locking SignalHandlers.mu in multiple SignalHandlers requires locking TaskSet.mu exclusively first. Locking Task.mu in multiple Tasks at the same time requires locking all of their signal mutexes first.
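The ordering above can be illustrated with a minimal sketch. The types below are hypothetical stand-ins that model only three of the mutexes named in the lock order, not the package's real definitions, and the helper withTaskLocked is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins that model only the mutexes named in the lock order.
type taskSet struct{ mu sync.RWMutex }
type signalHandlers struct{ mu sync.Mutex }
type task struct {
	mu sync.Mutex
	ts *taskSet
	sh *signalHandlers
}

// withTaskLocked acquires TaskSet.mu, then SignalHandlers.mu, then Task.mu
// (outermost first), runs fn, and releases the locks in reverse order.
func withTaskLocked(t *task, fn func()) {
	t.ts.mu.RLock()
	defer t.ts.mu.RUnlock()
	t.sh.mu.Lock()
	defer t.sh.mu.Unlock()
	t.mu.Lock()
	defer t.mu.Unlock()
	fn()
}

func main() {
	t := &task{ts: &taskSet{}, sh: &signalHandlers{}}
	withTaskLocked(t, func() { fmt.Println("all three locks held") })
}
```

Taking the locks in one fixed order everywhere is what rules out lock-order inversion deadlocks between goroutines.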

Constants

const (
	// CtxCanTrace is a Context.Value key for a function with the same
	// signature and semantics as kernel.Task.CanTrace.
	CtxCanTrace contextID = iota

	// CtxKernel is a Context.Value key for a Kernel.
	CtxKernel

	// CtxPIDNamespace is a Context.Value key for a PIDNamespace.
	CtxPIDNamespace

	// CtxTask is a Context.Value key for a Task.
	CtxTask

	// CtxUTSNamespace is a Context.Value key for a UTSNamespace.
	CtxUTSNamespace

	// CtxIPCNamespace is a Context.Value key for a IPCNamespace.
	CtxIPCNamespace
)
const (
	// SupportUndocumented indicates the syscall is not documented yet.
	SupportUndocumented = iota

	// SupportUnimplemented indicates the syscall is unimplemented.
	SupportUnimplemented

	// SupportPartial indicates the syscall is partially supported.
	SupportPartial

	// SupportFull indicates the syscall is fully supported.
	SupportFull
)
const (

	// StraceEnableLog enables syscall log tracing.
	StraceEnableLog

	// StraceEnableEvent enables syscall event tracing.
	StraceEnableEvent

	// ExternalBeforeEnable enables the external hook before syscall execution.
	ExternalBeforeEnable

	// ExternalAfterEnable enables the external hook after syscall execution.
	ExternalAfterEnable
)

Possible flags for SyscallFlagsTable.enable.

const (
	// EventExit represents an exit notification generated for a child thread
	// group leader or a tracee under the conditions specified in the comment
	// above runExitNotify.
	EventExit waiter.EventMask = 1 << iota

	// EventChildGroupStop occurs when a child thread group completes a group
	// stop (i.e. all tasks in the child thread group have entered a stopped
	// state as a result of a group stop).
	EventChildGroupStop

	// EventTraceeStop occurs when a task that is ptraced by a task in the
	// notified thread group enters a ptrace stop (see ptrace(2)).
	EventTraceeStop

	// EventGroupContinue occurs when a child thread group, or a thread group
	// whose leader is ptraced by a task in the notified thread group, that had
	// initiated or completed a group stop leaves the group stop, due to the
	// child thread group or any task in the child thread group being sent
	// SIGCONT.
	EventGroupContinue
)

Task events that can be waited for.
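Because the events are declared with 1 << iota, each one occupies its own bit and several can be combined into one mask. The sketch below redeclares them against a local EventMask type, which is only a stand-in for waiter.EventMask (assumed here to be an unsigned integer bit mask); the has helper is hypothetical.

```go
package main

import "fmt"

// EventMask is a local stand-in for waiter.EventMask (assumed to be an
// unsigned integer bit mask).
type EventMask uint64

const (
	EventExit           EventMask = 1 << iota // 1
	EventChildGroupStop                       // 2
	EventTraceeStop                           // 4
	EventGroupContinue                        // 8
)

// has reports whether every bit in want is set in got.
func has(got, want EventMask) bool { return got&want == want }

func main() {
	interest := EventExit | EventTraceeStop
	fmt.Println(has(interest, EventExit), has(interest, EventGroupContinue))
}
```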

const FDTableenableLogging = false

enableLogging indicates whether reference-related events should be logged (with stack traces). This is false by default and should only be set to true for debugging purposes, as it can generate an extremely large amount of output and drastically degrade performance.

const FSContextenableLogging = false

enableLogging indicates whether reference-related events should be logged (with stack traces). This is false by default and should only be set to true for debugging purposes, as it can generate an extremely large amount of output and drastically degrade performance.

const IPCNamespaceenableLogging = false

enableLogging indicates whether reference-related events should be logged (with stack traces). This is false by default and should only be set to true for debugging purposes, as it can generate an extremely large amount of output and drastically degrade performance.

const ProcessGroupenableLogging = false

enableLogging indicates whether reference-related events should be logged (with stack traces). This is false by default and should only be set to true for debugging purposes, as it can generate an extremely large amount of output and drastically degrade performance.

const SessionenableLogging = false

enableLogging indicates whether reference-related events should be logged (with stack traces). This is false by default and should only be set to true for debugging purposes, as it can generate an extremely large amount of output and drastically degrade performance.

const SignalPanic = linux.SIGUSR2

SignalPanic is used to panic the running threads. It is a signal which cannot be used by the application: it must be caught and ignored by the runtime (in order to catch possible races).

const StraceEnableBits = StraceEnableLog | StraceEnableEvent

StraceEnableBits combines both strace log and event flags.

const TasksLimit = (1 << 16)

TasksLimit is the maximum number of threads for an untrusted application. Linux doesn't really limit this directly; rather, it is limited by total memory size, stacks allocated, and a global maximum. There's no real reason for us to limit it either (especially since threads are backed by goroutines), and we would expect to hit resource limits long before hitting this number. However, for correctness, we still check that the user doesn't exceed this number.

Note that because of the way futexes are implemented, there *are* in fact serious restrictions on valid thread IDs. They are limited to 2^30 - 1 (kernel/fork.c:MAX_THREADS).

Variables

var (
	// CtrlDoExit is returned by the implementations of the exit and exit_group
	// syscalls to enter the task exit path directly, skipping syscall exit
	// tracing.
	CtrlDoExit = &SyscallControl{next: (*runExit)(nil), ignoreReturn: true}
)
var ErrNoWaitableEvent = errors.New("non-blocking Wait found eligible threads but no waitable events")

ErrNoWaitableEvent is returned by non-blocking Task.Waits (e.g. waitpid(WNOHANG)) that find no waitable events, but determine that waitable events may exist in the future. (In contrast, if a non-blocking or blocking Wait determines that there are no tasks that can produce a waitable event, Task.Wait returns ECHILD.)

var FUSEEnabled = false

FUSEEnabled is set to true when FUSE is enabled. It is a global to allow easy access everywhere, and is to be removed once FUSE is completed.

var MAX_RW_COUNT = int(usermem.Addr(math.MaxInt32).RoundDown())

MAX_RW_COUNT is the maximum size in bytes of a single read or write. Reads and writes that exceed this size may be silently truncated. (Linux: include/linux/fs.h:MAX_RW_COUNT)
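The value can be worked out by hand: it is the largest 32-bit signed integer rounded down to a page boundary. The sketch below assumes 4 KiB pages; roundDown is a local helper standing in for usermem.Addr.RoundDown.

```go
package main

import (
	"fmt"
	"math"
)

// pageSize is an assumption: 4 KiB pages, as on typical x86-64 systems.
const pageSize = 4096

// roundDown rounds addr down to the nearest multiple of pageSize.
func roundDown(addr uint64) uint64 { return addr &^ (pageSize - 1) }

func main() {
	maxRW := int(roundDown(math.MaxInt32))
	fmt.Printf("%d (%#x)\n", maxRW, maxRW) // prints "2147479552 (0x7ffff000)"
}
```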

StopSignals is the set of signals whose default action is SignalActionStop.

UnblockableSignals contains the set of signals which cannot be blocked.

var VFS2Enabled = false

VFS2Enabled is set to true when VFS2 is enabled. It is a global to allow easy access everywhere, and is to be removed once VFS2 becomes the default.

Functions

func ContextCanTrace

func ContextCanTrace(ctx context.Context, t *Task, attach bool) bool

ContextCanTrace returns true if ctx is permitted to trace t, in the same sense as kernel.Task.CanTrace.

func ExtractErrno

func ExtractErrno(err error, sysno int) int

ExtractErrno extracts an integer error number from the error. The syscall number is purely for context in the error case. Use -1 if syscall number is unknown.

func RegisterSyscallTable

func RegisterSyscallTable(s *SyscallTable)

RegisterSyscallTable registers a new syscall table for use by a Kernel.

func SignalInfoNoInfo

func SignalInfoNoInfo(sig linux.Signal, sender, receiver *Task) *arch.SignalInfo

SignalInfoNoInfo returns a SignalInfo equivalent to Linux's SEND_SIG_NOINFO.

func SignalInfoPriv

func SignalInfoPriv(sig linux.Signal) *arch.SignalInfo

SignalInfoPriv returns a SignalInfo equivalent to Linux's SEND_SIG_PRIV.

Types

type AIOCallback

type AIOCallback func(context.Context)

AIOCallback is a function that does asynchronous I/O on behalf of a task.

type AbstractSocketNamespace

type AbstractSocketNamespace struct {
	// contains filtered or unexported fields
}

AbstractSocketNamespace is used to implement the Linux abstract socket functionality.

+stateify savable

func NewAbstractSocketNamespace

func NewAbstractSocketNamespace() *AbstractSocketNamespace

NewAbstractSocketNamespace returns a new AbstractSocketNamespace.

func (*AbstractSocketNamespace) Bind

Bind binds the given socket.

When the last reference managed by socket is dropped, ep may be removed from the namespace.

func (*AbstractSocketNamespace) BoundEndpoint

func (a *AbstractSocketNamespace) BoundEndpoint(name string) transport.BoundEndpoint

BoundEndpoint retrieves the endpoint bound to the given name. The return value is nil if no endpoint was bound.

func (*AbstractSocketNamespace) Remove

func (a *AbstractSocketNamespace) Remove(name string, socket refsvfs2.RefCounter)

Remove removes the specified socket at name from the abstract socket namespace, if it has not yet been replaced.

func (*AbstractSocketNamespace) StateFields

func (a *AbstractSocketNamespace) StateFields() []string

func (*AbstractSocketNamespace) StateLoad

func (a *AbstractSocketNamespace) StateLoad(stateSourceObject state.Source)

func (*AbstractSocketNamespace) StateSave

func (a *AbstractSocketNamespace) StateSave(stateSinkObject state.Sink)

func (*AbstractSocketNamespace) StateTypeName

func (a *AbstractSocketNamespace) StateTypeName() string

type Auxmap

type Auxmap map[string]interface{}

Auxmap contains miscellaneous data for the task.

type CloneOptions

type CloneOptions struct {
	// SharingOptions defines the set of resources that the new task will share
	// with its parent.
	SharingOptions

	// Stack is the initial stack pointer of the new task. If Stack is 0, the
	// new task will start with the same stack pointer as its parent.
	Stack usermem.Addr

	// If SetTLS is true, set the new task's TLS (thread-local storage)
	// descriptor to TLS. If SetTLS is false, TLS is ignored.
	SetTLS bool
	TLS    usermem.Addr

	// If ChildClearTID is true, when the child exits, 0 is written to the
	// address ChildTID in the child's memory, and if the write is successful a
	// futex wake on the same address is performed.
	//
	// If ChildSetTID is true, the child's thread ID (in the child's PID
	// namespace) is written to address ChildTID in the child's memory. (As in
	// Linux, failed writes are silently ignored.)
	ChildClearTID bool
	ChildSetTID   bool
	ChildTID      usermem.Addr

	// If ParentSetTID is true, the child's thread ID (in the parent's PID
	// namespace) is written to address ParentTID in the parent's memory. (As
	// in Linux, failed writes are silently ignored.)
	//
	// Older versions of the clone(2) man page state that CLONE_PARENT_SETTID
	// causes the child's thread ID to be written to ptid in both the parent
	// and child's memory, but this is a documentation error fixed by
	// 87ab04792ced ("clone.2: Fix description of CLONE_PARENT_SETTID").
	ParentSetTID bool
	ParentTID    usermem.Addr

	// If Vfork is true, place the parent in vforkStop until the cloned task
	// releases its TaskImage.
	Vfork bool

	// If Untraced is true, do not report PTRACE_EVENT_CLONE/FORK/VFORK for
	// this clone(), and do not ptrace-attach the caller's tracer to the new
	// task. (PTRACE_EVENT_VFORK_DONE will still be reported if appropriate).
	Untraced bool

	// If InheritTracer is true, ptrace-attach the caller's tracer to the new
	// task, even if no PTRACE_EVENT_CLONE/FORK/VFORK event would be reported
	// for it. If both Untraced and InheritTracer are true, no event will be
	// reported, but tracer inheritance will still occur.
	InheritTracer bool
}

CloneOptions controls the behavior of Task.Clone.
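Several of these fields correspond to Linux clone(2) flags. The following sketch shows one plausible mapping from a raw flags word to the boolean options; cloneOptions is a trimmed, hypothetical mirror of the struct above (addresses and SharingOptions are omitted), and optionsFromFlags is not part of the package. The flag values are those of the Linux UAPI headers.

```go
package main

import "fmt"

// Linux clone(2) flag values (include/uapi/linux/sched.h).
const (
	cloneVfork         = 0x00004000
	cloneSetTLS        = 0x00080000
	cloneParentSetTID  = 0x00100000
	cloneChildClearTID = 0x00200000
	cloneUntraced      = 0x00800000
	cloneChildSetTID   = 0x01000000
)

// cloneOptions is a trimmed, hypothetical mirror of the boolean fields of
// CloneOptions.
type cloneOptions struct {
	SetTLS, ChildClearTID, ChildSetTID, ParentSetTID, Vfork, Untraced bool
}

// optionsFromFlags translates a clone(2) flags word into cloneOptions.
func optionsFromFlags(flags uint64) cloneOptions {
	return cloneOptions{
		SetTLS:        flags&cloneSetTLS != 0,
		ChildClearTID: flags&cloneChildClearTID != 0,
		ChildSetTID:   flags&cloneChildSetTID != 0,
		ParentSetTID:  flags&cloneParentSetTID != 0,
		Vfork:         flags&cloneVfork != 0,
		Untraced:      flags&cloneUntraced != 0,
	}
}

func main() {
	// Flags similar to those a threading library might pass for a new thread.
	opts := optionsFromFlags(cloneSetTLS | cloneParentSetTID | cloneChildClearTID)
	fmt.Printf("%+v\n", opts)
}
```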

type CreateProcessArgs

type CreateProcessArgs struct {
	// Filename is the filename to load as the init binary.
	//
	// If this is provided as "", File will be checked, then the file will be
	// guessed via Argv[0].
	Filename string

	// File is a passed host FD pointing to a file to load as the init binary.
	//
	// This is checked if and only if Filename is "".
	File fsbridge.File

	// Argv is a list of arguments.
	Argv []string

	// Envv is a list of environment variables.
	Envv []string

	// WorkingDirectory is the initial working directory.
	//
	// This defaults to the root if empty.
	WorkingDirectory string

	// Credentials is the initial credentials.
	Credentials *auth.Credentials

	// FDTable is the initial set of file descriptors. If CreateProcess succeeds,
	// it takes a reference on FDTable.
	FDTable *FDTable

	// Umask is the initial umask.
	Umask uint

	// Limits is the initial resource limits.
	Limits *limits.LimitSet

	// MaxSymlinkTraversals is the maximum number of symlinks to follow
	// during resolution.
	MaxSymlinkTraversals uint

	// UTSNamespace is the initial UTS namespace.
	UTSNamespace *UTSNamespace

	// IPCNamespace is the initial IPC namespace.
	IPCNamespace *IPCNamespace

	// PIDNamespace is the initial PID Namespace.
	PIDNamespace *PIDNamespace

	// AbstractSocketNamespace is the initial Abstract Socket namespace.
	AbstractSocketNamespace *AbstractSocketNamespace

	// MountNamespace optionally contains the mount namespace for this
	// process. If nil, the init process's mount namespace is used.
	//
	// Anyone setting MountNamespace must donate a reference (i.e.
	// increment it).
	MountNamespace *fs.MountNamespace

	// MountNamespaceVFS2 optionally contains the mount namespace for this
	// process. If nil, the init process's mount namespace is used.
	//
	// Anyone setting MountNamespaceVFS2 must donate a reference (i.e.
	// increment it).
	MountNamespaceVFS2 *vfs.MountNamespace

	// ContainerID is the container that the process belongs to.
	ContainerID string
}

CreateProcessArgs holds arguments to kernel.CreateProcess.
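The comments on Filename and File describe a fallback order for locating the init binary. A minimal sketch of that order is shown below; the args type is a hypothetical, trimmed mirror of CreateProcessArgs in which HasFile stands in for a non-nil File, and binarySource is an illustrative helper rather than a function of this package.

```go
package main

import "fmt"

// args is a hypothetical, trimmed mirror of CreateProcessArgs. HasFile stands
// in for a non-nil File.
type args struct {
	Filename string
	HasFile  bool
	Argv     []string
}

// binarySource reports which field would name the init binary: Filename if
// set, otherwise File, otherwise a guess based on Argv[0].
func binarySource(a args) string {
	switch {
	case a.Filename != "":
		return "Filename"
	case a.HasFile:
		return "File"
	case len(a.Argv) > 0:
		return "Argv[0]"
	default:
		return "none"
	}
}

func main() {
	fmt.Println(binarySource(args{Filename: "/bin/sh", HasFile: true}))
	fmt.Println(binarySource(args{HasFile: true, Argv: []string{"sh"}}))
	fmt.Println(binarySource(args{Argv: []string{"sh"}}))
}
```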

func (*CreateProcessArgs) NewContext

func (args *CreateProcessArgs) NewContext(k *Kernel) *createProcessContext

NewContext returns a context.Context that represents the task that will be created from args by kernel.CreateProcess.

type ExitStatus

type ExitStatus struct {
	// Code is the numeric value passed to the call to exit or exit_group that
	// caused the exit. If the exit was not caused by such a call, Code is 0.
	Code int

	// Signo is the signal that caused the exit. If the exit was not caused by
	// a signal, Signo is 0.
	Signo int
}

An ExitStatus is a value communicated from an exiting task or thread group to the party that reaps it.

+stateify savable

func (ExitStatus) ShellExitCode

func (es ExitStatus) ShellExitCode() int

ShellExitCode returns the numeric exit code that Bash would return for an exit status of es.

func (ExitStatus) Signaled

func (es ExitStatus) Signaled() bool

Signaled returns true if the ExitStatus indicates that the exiting task or thread group was killed by a signal.

func (*ExitStatus) StateFields

func (es *ExitStatus) StateFields() []string

func (*ExitStatus) StateLoad

func (es *ExitStatus) StateLoad(stateSourceObject state.Source)

func (*ExitStatus) StateSave

func (es *ExitStatus) StateSave(stateSinkObject state.Sink)

func (*ExitStatus) StateTypeName

func (es *ExitStatus) StateTypeName() string

func (ExitStatus) Status

func (es ExitStatus) Status() uint32

Status returns the numeric representation of the ExitStatus returned by e.g. the wait4() system call.
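How an ExitStatus might map to these two representations can be sketched as follows. The encoding here is an assumption based on the conventional Linux wait status layout (exit code in bits 8 to 15, terminating signal in the low 7 bits, core-dump flag ignored) and on Bash's 128+N convention for signals; it is not taken from this package's source.

```go
package main

import "fmt"

// exitStatus is a local mirror of ExitStatus for illustration.
type exitStatus struct {
	Code  int
	Signo int
}

// signaled reports whether the exit was caused by a signal.
func (es exitStatus) signaled() bool { return es.Signo != 0 }

// shellExitCode follows the Bash convention of reporting 128+N for a process
// killed by signal N.
func (es exitStatus) shellExitCode() int {
	if es.signaled() {
		return 128 + es.Signo
	}
	return es.Code
}

// status packs the exit code into bits 8-15 and the signal number into the
// low 7 bits, following the traditional wait status layout.
func (es exitStatus) status() uint32 {
	return (uint32(es.Code)&0xff)<<8 | uint32(es.Signo)&0x7f
}

func main() {
	exited := exitStatus{Code: 3}
	killed := exitStatus{Signo: 9}
	fmt.Println(exited.shellExitCode(), exited.status()) // prints "3 768"
	fmt.Println(killed.shellExitCode(), killed.status()) // prints "137 9"
}
```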

type FDFlags

type FDFlags struct {
	// CloseOnExec indicates the descriptor should be closed on exec.
	CloseOnExec bool
}

FDFlags defines flags for an individual descriptor.

+stateify savable

func (*FDFlags) StateFields

func (f *FDFlags) StateFields() []string

func (*FDFlags) StateLoad

func (f *FDFlags) StateLoad(stateSourceObject state.Source)

func (*FDFlags) StateSave

func (f *FDFlags) StateSave(stateSinkObject state.Sink)

func (*FDFlags) StateTypeName

func (f *FDFlags) StateTypeName() string

func (FDFlags) ToLinuxFDFlags

func (f FDFlags) ToLinuxFDFlags() (mask uint)

ToLinuxFDFlags converts a kernel.FDFlags object to a Linux descriptor flags representation.

func (FDFlags) ToLinuxFileFlags

func (f FDFlags) ToLinuxFileFlags() (mask uint)

ToLinuxFileFlags converts a kernel.FDFlags object to a Linux file flags representation.
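The two conversions differ because Linux represents close-on-exec differently in descriptor flags (FD_CLOEXEC, used with fcntl) and in file open flags (O_CLOEXEC). The sketch below illustrates that difference with a local fdFlags type; the constant values are those of x86-64 and arm64 Linux, and the method bodies are an assumption about the behavior rather than the package's code.

```go
package main

import "fmt"

const (
	fdCloexec = 0x1     // FD_CLOEXEC, as used with fcntl(F_GETFD/F_SETFD)
	oCloexec  = 0x80000 // O_CLOEXEC (octal 02000000), as used with open(2)
)

// fdFlags is a local mirror of FDFlags for illustration.
type fdFlags struct{ CloseOnExec bool }

// toLinuxFDFlags returns the descriptor-flags representation.
func (f fdFlags) toLinuxFDFlags() (mask uint) {
	if f.CloseOnExec {
		mask |= fdCloexec
	}
	return mask
}

// toLinuxFileFlags returns the file-flags representation.
func (f fdFlags) toLinuxFileFlags() (mask uint) {
	if f.CloseOnExec {
		mask |= oCloexec
	}
	return mask
}

func main() {
	f := fdFlags{CloseOnExec: true}
	fmt.Printf("%#x %#x\n", f.toLinuxFDFlags(), f.toLinuxFileFlags()) // prints "0x1 0x80000"
}
```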

type FDTable

type FDTable struct {
	FDTableRefs
	// contains filtered or unexported fields
}

FDTable is used to manage File references and flags.

+stateify savable

var FDTableobj *FDTable

obj is used to customize logging. Note that we use a pointer to T so that we do not copy the entire object when passed as a format parameter.

func (*FDTable) CurrentMaxFDs

func (f *FDTable) CurrentMaxFDs() int

CurrentMaxFDs returns the number of file descriptors that may be stored in f without reallocation.

func (*FDTable) DecRef

func (f *FDTable) DecRef(ctx context.Context)

DecRef implements RefCounter.DecRef.

If f reaches zero references, all of its file descriptors are removed.

func (*FDTable) Fork

func (f *FDTable) Fork(ctx context.Context) *FDTable

Fork returns an independent FDTable.

func (*FDTable) Get

func (f *FDTable) Get(fd int32) (*fs.File, FDFlags)

Get returns a reference to the file and the flags for the FD or nil if no file is defined for the given fd.

N.B. Callers are required to use DecRef when they are done.
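The reference discipline described above usually takes the shape of a nil check followed by a deferred DecRef. The sketch below uses hypothetical, single-threaded stand-ins (file, fdTable, use) with a plain integer counter, so it shows only the calling pattern and not the real, concurrency-safe implementation.

```go
package main

import "fmt"

// file is a hypothetical reference-counted object (not safe for concurrent
// use; the counter is a plain int for clarity).
type file struct{ refs int }

func (f *file) IncRef() { f.refs++ }
func (f *file) DecRef() { f.refs-- }

// fdTable is a hypothetical table mapping descriptors to files.
type fdTable struct{ files map[int32]*file }

// get returns the file for fd with an extra reference taken, or nil if no
// file is defined for fd.
func (t *fdTable) get(fd int32) *file {
	f := t.files[fd]
	if f == nil {
		return nil
	}
	f.IncRef()
	return f
}

// use shows the caller's side: check for nil, then drop the reference when
// done.
func use(t *fdTable, fd int32) bool {
	f := t.get(fd)
	if f == nil {
		return false
	}
	defer f.DecRef()
	// ... operate on f while the reference is held ...
	return true
}

func main() {
	t := &fdTable{files: map[int32]*file{3: {refs: 1}}}
	fmt.Println(use(t, 3), t.files[3].refs) // prints "true 1"
	fmt.Println(use(t, 9))                  // prints "false"
}
```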

func (*FDTable) GetFDs

func (f *FDTable) GetFDs(ctx context.Context) []int32

GetFDs returns a sorted list of valid fds.

Precondition: The caller must be running on the task goroutine, or Task.mu must be locked.

func (*FDTable) GetVFS2

func (f *FDTable) GetVFS2(fd int32) (*vfs.FileDescription, FDFlags)

GetVFS2 returns a reference to the file and the flags for the FD or nil if no file is defined for the given fd.

N.B. Callers are required to use DecRef when they are done.

func (*FDTable) NewFDAt

func (f *FDTable) NewFDAt(ctx context.Context, fd int32, file *fs.File, flags FDFlags) error

NewFDAt sets the file reference for the given FD. If there is an active reference for that FD, the ref count for that existing reference is decremented.

func (*FDTable) NewFDAtVFS2

func (f *FDTable) NewFDAtVFS2(ctx context.Context, fd int32, file *vfs.FileDescription, flags FDFlags) error

NewFDAtVFS2 sets the file reference for the given FD. If there is an active reference for that FD, the ref count for that existing reference is decremented.

func (*FDTable) NewFDVFS2

func (f *FDTable) NewFDVFS2(ctx context.Context, minfd int32, file *vfs.FileDescription, flags FDFlags) (int32, error)

NewFDVFS2 allocates a file descriptor greater than or equal to minfd for the given file description. If it succeeds, it takes a reference on file.

func (*FDTable) NewFDs

func (f *FDTable) NewFDs(ctx context.Context, fd int32, files []*fs.File, flags FDFlags) (fds []int32, err error)

NewFDs allocates new FDs guaranteed to be the lowest number available greater than or equal to the fd parameter. All files will share the set flags. Success is guaranteed to be all or none.
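The "lowest available" and "all or none" guarantees can be illustrated with a small model. The fdTable below is a hypothetical stand-in that tracks only which numbers are in use, and its limit field is an assumed cap on descriptor numbers; the real method also stores the files and flags.

```go
package main

import "fmt"

// fdTable is a hypothetical model that tracks only which descriptors are used.
type fdTable struct {
	used  map[int32]bool
	limit int32 // assumed cap on descriptor numbers
}

// newFDs reserves n descriptors, each the lowest free number >= minFD.
// Nothing is reserved unless all n fit below limit.
func (t *fdTable) newFDs(minFD int32, n int) ([]int32, bool) {
	fds := make([]int32, 0, n)
	for fd := minFD; fd < t.limit && len(fds) < n; fd++ {
		if !t.used[fd] {
			fds = append(fds, fd)
		}
	}
	if len(fds) < n {
		return nil, false // all or none: leave the table untouched
	}
	for _, fd := range fds {
		t.used[fd] = true
	}
	return fds, true
}

func main() {
	t := &fdTable{used: map[int32]bool{0: true, 1: true, 2: true, 4: true}, limit: 8}
	fmt.Println(t.newFDs(3, 2))  // prints "[3 5] true"
	fmt.Println(t.newFDs(0, 10)) // prints "[] false"
}
```

Collecting candidates first and committing them only once enough have been found is what keeps a failed request from leaving a partial allocation behind.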

func (*FDTable) NewFDsVFS2

func (f *FDTable) NewFDsVFS2(ctx context.Context, fd int32, files []*vfs.FileDescription, flags FDFlags) (fds []int32, err error)

NewFDsVFS2 allocates new FDs guaranteed to be the lowest number available greater than or equal to the fd parameter. All files will share the set flags. Success is guaranteed to be all or none.

func (*FDTable) Remove

func (f *FDTable) Remove(ctx context.Context, fd int32) (*fs.File, *vfs.FileDescription)

Remove removes fd from f and returns a non-nil file iff successful.

N.B. Callers are required to use DecRef when they are done.

func (*FDTable) RemoveIf

func (f *FDTable) RemoveIf(ctx context.Context, cond func(*fs.File, *vfs.FileDescription, FDFlags) bool)

RemoveIf removes all FDs where cond is true.

func (*FDTable) SetFlags

func (f *FDTable) SetFlags(ctx context.Context, fd int32, flags FDFlags) error

SetFlags sets the flags for the given file descriptor.

A nil error is returned iff the flags were set.

func (*FDTable) SetFlagsVFS2

func (f *FDTable) SetFlagsVFS2(ctx context.Context, fd int32, flags FDFlags) error

SetFlagsVFS2 sets the flags for the given file descriptor.

A nil error is returned iff the flags were set.

func (*FDTable) StateFields

func (f *FDTable) StateFields() []string

func (*FDTable) StateLoad

func (f *FDTable) StateLoad(stateSourceObject state.Source)

func (*FDTable) StateSave

func (f *FDTable) StateSave(stateSinkObject state.Sink)

func (*FDTable) StateTypeName

func (f *FDTable) StateTypeName() string

func (*FDTable) String

func (f *FDTable) String() string

String is a stringer for FDTable.

type FDTableRefs

type FDTableRefs struct {
	// contains filtered or unexported fields
}

Refs implements refs.RefCounter. It keeps a reference count using atomic operations and calls the destructor when the count reaches zero.

+stateify savable

func (*FDTableRefs) DecRef

func (r *FDTableRefs) DecRef(destroy func())

DecRef implements refs.RefCounter.DecRef.

Note that speculative references are counted here. Since they were added prior to real references reaching zero, they will successfully convert to real references. In other words, we see speculative references only in the following case:

A: TryIncRef [speculative increase => sees non-negative references]
B: DecRef [real decrease]
A: TryIncRef [transform speculative to real]

func (*FDTableRefs) IncRef

func (r *FDTableRefs) IncRef()

IncRef implements refs.RefCounter.IncRef.

func (*FDTableRefs) InitRefs

func (r *FDTableRefs) InitRefs()

InitRefs initializes r with one reference and, if enabled, activates leak checking.

func (*FDTableRefs) LeakMessage

func (r *FDTableRefs) LeakMessage() string

LeakMessage implements refsvfs2.CheckedObject.LeakMessage.

func (*FDTableRefs) LogRefs

func (r *FDTableRefs) LogRefs() bool

LogRefs implements refsvfs2.CheckedObject.LogRefs.

func (*FDTableRefs) ReadRefs

func (r *FDTableRefs) ReadRefs() int64

ReadRefs returns the current number of references. The returned count is inherently racy and is unsafe to use without external synchronization.

func (*FDTableRefs) RefType

func (r *FDTableRefs) RefType() string

RefType implements refsvfs2.CheckedObject.RefType.

func (*FDTableRefs) StateFields

func (r *FDTableRefs) StateFields() []string

func (*FDTableRefs) StateLoad

func (r *FDTableRefs) StateLoad(stateSourceObject state.Source)

func (*FDTableRefs) StateSave

func (r *FDTableRefs) StateSave(stateSinkObject state.Sink)

func (*FDTableRefs) StateTypeName

func (r *FDTableRefs) StateTypeName() string

func (*FDTableRefs) TryIncRef

func (r *FDTableRefs) TryIncRef() bool

TryIncRef implements refs.RefCounter.TryIncRef.

To do this safely without a loop, a speculative reference is first acquired on the object. This allows multiple concurrent TryIncRef calls to distinguish other TryIncRef calls from genuine references held.
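A minimal sketch of the speculative-reference technique follows. It assumes a 64-bit counter whose low 32 bits hold real references and whose high 32 bits hold speculative ones, with the count starting at one after InitRefs; this layout is an assumption made for the sketch, and leak checking and panics on misuse are omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// speculativeRef is one speculative reference, kept in the high 32 bits of
// the counter so it cannot be mistaken for a real reference (assumed layout).
const speculativeRef = 1 << 32

// refs is a hypothetical reference counter.
type refs struct{ count int64 }

// InitRefs initializes r with one real reference.
func (r *refs) InitRefs() { atomic.StoreInt64(&r.count, 1) }

// TryIncRef takes a reference unless the real count has already hit zero.
func (r *refs) TryIncRef() bool {
	if v := atomic.AddInt64(&r.count, speculativeRef); int32(v) == 0 {
		// No real references remain: undo the speculative one.
		atomic.AddInt64(&r.count, -speculativeRef)
		return false
	}
	// Convert the speculative reference into a real one.
	atomic.AddInt64(&r.count, -speculativeRef+1)
	return true
}

// DecRef drops a real reference and calls destroy when none remain.
func (r *refs) DecRef(destroy func()) {
	if v := atomic.AddInt64(&r.count, -1); v == 0 && destroy != nil {
		destroy()
	}
}

func main() {
	var r refs
	r.InitRefs()
	fmt.Println(r.TryIncRef()) // true: one real reference existed
	r.DecRef(nil)
	r.DecRef(func() { fmt.Println("destroyed") })
	fmt.Println(r.TryIncRef()) // false: the object is already gone
}
```

If a DecRef lands between the two atomic additions in TryIncRef, the counter it observes still carries the speculative bit and is therefore non-zero, so the object is not destroyed and the speculative reference converts cleanly.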

type FSContext

type FSContext struct {
	FSContextRefs
	// contains filtered or unexported fields
}

FSContext contains filesystem context.

This includes umask and working directory.

+stateify savable

var FSContextobj *FSContext

obj is used to customize logging. Note that we use a pointer to T so that we do not copy the entire object when passed as a format parameter.

func NewFSContextVFS2

func NewFSContextVFS2(root, cwd vfs.VirtualDentry, umask uint) *FSContext

NewFSContextVFS2 returns a new filesystem context.

func (*FSContext) DecRef

func (f *FSContext) DecRef(ctx context.Context)

DecRef implements RefCounter.DecRef.

When f reaches zero references, DecRef will be called on both root and cwd Dirents.

Note that there may still be calls to WorkingDirectory() or RootDirectory() (that return nil). This is because valid references may still be held via proc files or other mechanisms.

func (*FSContext) Fork

func (f *FSContext) Fork() *FSContext

Fork forks this FSContext.

This is not a valid call after f is destroyed.

func (*FSContext) RootDirectory

func (f *FSContext) RootDirectory() *fs.Dirent

RootDirectory returns the current filesystem root.

This will return nil if called after f is destroyed, otherwise it will return a Dirent with a reference taken.

func (*FSContext) RootDirectoryVFS2

func (f *FSContext) RootDirectoryVFS2() vfs.VirtualDentry

RootDirectoryVFS2 returns the current filesystem root.

This will return an empty vfs.VirtualDentry if called after f is destroyed, otherwise it will return a VirtualDentry with a reference taken.

func (*FSContext) SetRootDirectory

func (f *FSContext) SetRootDirectory(ctx context.Context, d *fs.Dirent)

SetRootDirectory sets the root directory. This will take an extra reference on the Dirent.

This is not a valid call after f is destroyed.

func (*FSContext) SetRootDirectoryVFS2

func (f *FSContext) SetRootDirectoryVFS2(ctx context.Context, vd vfs.VirtualDentry)

SetRootDirectoryVFS2 sets the root directory. It takes a reference on vd.

This is not a valid call after f is destroyed.

func (*FSContext) SetWorkingDirectory

func (f *FSContext) SetWorkingDirectory(ctx context.Context, d *fs.Dirent)

SetWorkingDirectory sets the current working directory. This will take an extra reference on the Dirent.

This is not a valid call after f is destroyed.

func (*FSContext) SetWorkingDirectoryVFS2

func (f *FSContext) SetWorkingDirectoryVFS2(ctx context.Context, d vfs.VirtualDentry)

SetWorkingDirectoryVFS2 sets the current working directory. This will take an extra reference on the VirtualDentry.

This is not a valid call after f is destroyed.

func (*FSContext) StateFields

func (f *FSContext) StateFields() []string

func (*FSContext) StateLoad

func (f *FSContext) StateLoad(stateSourceObject state.Source)

func (*FSContext) StateSave

func (f *FSContext) StateSave(stateSinkObject state.Sink)

func (*FSContext) StateTypeName

func (f *FSContext) StateTypeName() string

func (*FSContext) SwapUmask

func (f *FSContext) SwapUmask(mask uint) uint

SwapUmask atomically sets the current umask and returns the old umask.
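One simple way to make such a swap atomic is to perform the read and the write under a single mutex, as in the sketch below. The fsContext type here is a hypothetical stand-in holding only the umask; whether the package uses a mutex or another mechanism is not stated in this documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// fsContext is a hypothetical stand-in that holds only the umask.
type fsContext struct {
	mu    sync.Mutex
	umask uint
}

// SwapUmask sets the umask and returns the previous value. Holding mu across
// both steps ensures no other swap can interleave between them.
func (f *fsContext) SwapUmask(mask uint) uint {
	f.mu.Lock()
	defer f.mu.Unlock()
	old := f.umask
	f.umask = mask
	return old
}

func main() {
	f := &fsContext{umask: 0o022}
	fmt.Printf("%#o\n", f.SwapUmask(0o077)) // prints "022"
	fmt.Printf("%#o\n", f.SwapUmask(0o022)) // prints "077"
}
```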

func (*FSContext) Umask

func (f *FSContext) Umask() uint

Umask returns the current umask.

func (*FSContext) WorkingDirectory

func (f *FSContext) WorkingDirectory() *fs.Dirent

WorkingDirectory returns the current working directory.

This will return nil if called after f is destroyed, otherwise it will return a Dirent with a reference taken.

func (*FSContext) WorkingDirectoryVFS2

func (f *FSContext) WorkingDirectoryVFS2() vfs.VirtualDentry

WorkingDirectoryVFS2 returns the current working directory.

This will return an empty vfs.VirtualDentry if called after f is destroyed, otherwise it will return a VirtualDentry with a reference taken.

type FSContextRefs

type FSContextRefs struct {
	// contains filtered or unexported fields
}

Refs implements refs.RefCounter. It keeps a reference count using atomic operations and calls the destructor when the count reaches zero.

+stateify savable

func (*FSContextRefs) DecRef

func (r *FSContextRefs) DecRef(destroy func())

DecRef implements refs.RefCounter.DecRef.

Note that speculative references are counted here. Since they were added prior to real references reaching zero, they will successfully convert to real references. In other words, we see speculative references only in the following case:

A: TryIncRef [speculative increase => sees non-negative references]
B: DecRef [real decrease]
A: TryIncRef [transform speculative to real]

func (*FSContextRefs) IncRef

func (r *FSContextRefs) IncRef()

IncRef implements refs.RefCounter.IncRef.

func (*FSContextRefs) InitRefs

func (r *FSContextRefs) InitRefs()

InitRefs initializes r with one reference and, if enabled, activates leak checking.

func (*FSContextRefs) LeakMessage

func (r *FSContextRefs) LeakMessage() string

LeakMessage implements refsvfs2.CheckedObject.LeakMessage.

func (*FSContextRefs) LogRefs

func (r *FSContextRefs) LogRefs() bool

LogRefs implements refsvfs2.CheckedObject.LogRefs.

func (*FSContextRefs) ReadRefs

func (r *FSContextRefs) ReadRefs() int64

ReadRefs returns the current number of references. The returned count is inherently racy and is unsafe to use without external synchronization.

func (*FSContextRefs) RefType

func (r *FSContextRefs) RefType() string

RefType implements refsvfs2.CheckedObject.RefType.

func (*FSContextRefs) StateFields

func (r *FSContextRefs) StateFields() []string

func (*FSContextRefs) StateLoad

func (r *FSContextRefs) StateLoad(stateSourceObject state.Source)

func (*FSContextRefs) StateSave

func (r *FSContextRefs) StateSave(stateSinkObject state.Sink)

func (*FSContextRefs) StateTypeName

func (r *FSContextRefs) StateTypeName() string

func (*FSContextRefs) TryIncRef

func (r *FSContextRefs) TryIncRef() bool

TryIncRef implements refs.RefCounter.TryIncRef.

To do this safely without a loop, a speculative reference is first acquired on the object. This allows multiple concurrent TryIncRef calls to distinguish other TryIncRef calls from genuine references held.

type IPCNamespace

type IPCNamespace struct {
	IPCNamespaceRefs
	// contains filtered or unexported fields
}

IPCNamespace represents an IPC namespace.

+stateify savable

var IPCNamespaceobj *IPCNamespace

obj is used to customize logging. Note that we use a pointer to T so that we do not copy the entire object when passed as a format parameter.

func IPCNamespaceFromContext

func IPCNamespaceFromContext(ctx context.Context) *IPCNamespace

IPCNamespaceFromContext returns the IPC namespace in which ctx is executing, or nil if there is no such IPC namespace. It takes a reference on the namespace.

func NewIPCNamespace

func NewIPCNamespace(userNS *auth.UserNamespace) *IPCNamespace

NewIPCNamespace creates a new IPC namespace.

func (*IPCNamespace) DecRef

func (i *IPCNamespace) DecRef(ctx context.Context)

DecRef implements refsvfs2.RefCounter.DecRef.

func (*IPCNamespace) SemaphoreRegistry

func (i *IPCNamespace) SemaphoreRegistry() *semaphore.Registry

SemaphoreRegistry returns the semaphore set registry for this namespace.

func (*IPCNamespace) ShmRegistry

func (i *IPCNamespace) ShmRegistry() *shm.Registry

ShmRegistry returns the shm segment registry for this namespace.

func (*IPCNamespace) StateFields

func (i *IPCNamespace) StateFields() []string

func (*IPCNamespace) StateLoad

func (i *IPCNamespace) StateLoad(stateSourceObject state.Source)

func (*IPCNamespace) StateSave

func (i *IPCNamespace) StateSave(stateSinkObject state.Sink)

func (*IPCNamespace) StateTypeName

func (i *IPCNamespace) StateTypeName() string

type IPCNamespaceRefs

type IPCNamespaceRefs struct {
	// contains filtered or unexported fields
}

Refs implements refs.RefCounter. It keeps a reference count using atomic operations and calls the destructor when the count reaches zero.

+stateify savable

func (*IPCNamespaceRefs) DecRef

func (r *IPCNamespaceRefs) DecRef(destroy func())

DecRef implements refs.RefCounter.DecRef.

Note that speculative references are counted here. Since they were added prior to real references reaching zero, they will successfully convert to real references. In other words, we see speculative references only in the following case:

A: TryIncRef [speculative increase => sees non-negative references]
B: DecRef [real decrease]
A: TryIncRef [transform speculative to real]

func (*IPCNamespaceRefs) IncRef

func (r *IPCNamespaceRefs) IncRef()

IncRef implements refs.RefCounter.IncRef.

func (*IPCNamespaceRefs) InitRefs

func (r *IPCNamespaceRefs) InitRefs()

InitRefs initializes r with one reference and, if enabled, activates leak checking.

func (*IPCNamespaceRefs) LeakMessage

func (r *IPCNamespaceRefs) LeakMessage() string

LeakMessage implements refsvfs2.CheckedObject.LeakMessage.

func (*IPCNamespaceRefs) LogRefs

func (r *IPCNamespaceRefs) LogRefs() bool

LogRefs implements refsvfs2.CheckedObject.LogRefs.

func (*IPCNamespaceRefs) ReadRefs

func (r *IPCNamespaceRefs) ReadRefs() int64

ReadRefs returns the current number of references. The returned count is inherently racy and is unsafe to use without external synchronization.

func (*IPCNamespaceRefs) RefType

func (r *IPCNamespaceRefs) RefType() string

RefType implements refsvfs2.CheckedObject.RefType.

func (*IPCNamespaceRefs) StateFields

func (r *IPCNamespaceRefs) StateFields() []string

func (*IPCNamespaceRefs) StateLoad

func (r *IPCNamespaceRefs) StateLoad(stateSourceObject state.Source)

func (*IPCNamespaceRefs) StateSave

func (r *IPCNamespaceRefs) StateSave(stateSinkObject state.Sink)

func (*IPCNamespaceRefs) StateTypeName

func (r *IPCNamespaceRefs) StateTypeName() string

func (*IPCNamespaceRefs) TryIncRef

func (r *IPCNamespaceRefs) TryIncRef() bool

TryIncRef implements refs.RefCounter.TryIncRef.

To do this safely without a loop, a speculative reference is first acquired on the object. This allows multiple concurrent TryIncRef calls to distinguish other TryIncRef calls from genuine references held.
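
The speculative-reference scheme described for DecRef and TryIncRef can be sketched as follows. This is a minimal illustration, not the generated code: the counter type, its method names, and the choice to keep speculative references in the upper 32 bits of a single int64 are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// speculativeRef is what tryIncRef adds before it knows whether the object is
// still alive. Keeping it in the upper 32 bits (an assumption of this sketch)
// leaves the lower 32 bits as the count of real references.
const speculativeRef = int64(1) << 32

// counter is a hypothetical stand-in for a generated *Refs type.
type counter struct {
	refs int64
}

// initRefs starts the counter with a single real reference.
func (c *counter) initRefs() { atomic.StoreInt64(&c.refs, 1) }

// tryIncRef adds a real reference only if at least one real reference exists.
func (c *counter) tryIncRef() bool {
	// Take a speculative reference, then look only at the real (low) half.
	if v := atomic.AddInt64(&c.refs, speculativeRef); int32(v) == 0 {
		// No real references remain: undo the speculative one and fail.
		atomic.AddInt64(&c.refs, -speculativeRef)
		return false
	}
	// Convert the speculative reference into a real one.
	atomic.AddInt64(&c.refs, -speculativeRef+1)
	return true
}

// decRef drops a real reference and runs destroy when the total reaches zero.
// A concurrent speculative reference keeps the total non-zero, so destroy is
// skipped in the A/B interleaving described in the DecRef documentation.
func (c *counter) decRef(destroy func()) {
	v := atomic.AddInt64(&c.refs, -1)
	if v < 0 {
		panic("decRef called on a counter with no references")
	}
	if v == 0 && destroy != nil {
		destroy()
	}
}

func main() {
	var c counter
	c.initRefs()
	fmt.Println("live object:", c.tryIncRef())

	destroyed := false
	c.decRef(nil)
	c.decRef(func() { destroyed = true })
	fmt.Println("destroyed:", destroyed, "revived:", c.tryIncRef())
}
```

Because the speculative half never touches the low 32 bits, no compare-and-swap loop is needed: each call is a fixed number of atomic additions.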

type InitKernelArgs

type InitKernelArgs struct {
	// FeatureSet is the emulated CPU feature set.
	FeatureSet *cpuid.FeatureSet

	// Timekeeper manages time for all tasks in the system.
	Timekeeper *Timekeeper

	// RootUserNamespace is the root user namespace.
	RootUserNamespace *auth.UserNamespace

	// RootNetworkNamespace is the root network namespace. If nil, no networking
	// will be available.
	RootNetworkNamespace *inet.Namespace

	// ApplicationCores is the number of logical CPUs visible to sandboxed
	// applications. The set of logical CPU IDs is [0, ApplicationCores); thus
	// ApplicationCores is analogous to Linux's nr_cpu_ids, the index of the
	// most significant bit in cpu_possible_mask + 1.
	ApplicationCores uint

	// If UseHostCores is true, Task.CPU() returns the task goroutine's CPU
	// instead of a virtualized CPU number, and Task.CopyToCPUMask() is a
	// no-op. If ApplicationCores is less than hostcpu.MaxPossibleCPU(), it
	// will be overridden.
	UseHostCores bool

	// ExtraAuxv contains additional auxiliary vector entries that are added to
	// each process by the ELF loader.
	ExtraAuxv []arch.AuxEntry

	// Vdso holds the VDSO and its parameter page.
	Vdso *loader.VDSO

	// RootUTSNamespace is the root UTS namespace.
	RootUTSNamespace *UTSNamespace

	// RootIPCNamespace is the root IPC namespace.
	RootIPCNamespace *IPCNamespace

	// RootAbstractSocketNamespace is the root Abstract Socket namespace.
	RootAbstractSocketNamespace *AbstractSocketNamespace

	// PIDNamespace is the root PID namespace.
	PIDNamespace *PIDNamespace
}

InitKernelArgs holds arguments to Init.
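
The ApplicationCores comment compares the field to nr_cpu_ids, the index of the most significant bit in cpu_possible_mask plus one. As a worked example of that arithmetic only, the sketch below computes the value from a 64-bit mask; the function name and the single-word mask are assumptions, since real CPU masks can be wider.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cpuIDs returns the number of logical CPU IDs implied by a possible-CPU
// bitmask: the index of the most significant set bit, plus one.
func cpuIDs(possibleMask uint64) uint {
	return uint(bits.Len64(possibleMask))
}

func main() {
	// CPUs 0 through 3 are possible.
	fmt.Println(cpuIDs(0xF))
	// Only CPUs 1 and 3 are possible, yet the ID space is still [0, 4).
	fmt.Println(cpuIDs(0xA))
}
```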

type IntervalTimer

type IntervalTimer struct {
	// contains filtered or unexported fields
}

IntervalTimer represents a POSIX interval timer as described by timer_create(2).

+stateify savable

func (*IntervalTimer) Destroy

func (it *IntervalTimer) Destroy()

Destroy implements ktime.TimerListener.Destroy. Users of Timer should call DestroyTimer instead.

func (*IntervalTimer) DestroyTimer

func (it *IntervalTimer) DestroyTimer()

DestroyTimer releases its resources.

func (*IntervalTimer) Notify

func (it *IntervalTimer) Notify(exp uint64, setting ktime.Setting) (ktime.Setting, bool)

Notify implements ktime.TimerListener.Notify.

func (*IntervalTimer) PauseTimer

func (it *IntervalTimer) PauseTimer()

PauseTimer pauses the associated Timer.

func (*IntervalTimer) ResumeTimer

func (it *IntervalTimer) ResumeTimer()

ResumeTimer resumes the associated Timer.

func (*IntervalTimer) StateFields

func (it *IntervalTimer) StateFields() []string

func (*IntervalTimer) StateLoad

func (it *IntervalTimer) StateLoad(stateSourceObject state.Source)

func (*IntervalTimer) StateSave

func (it *IntervalTimer) StateSave(stateSinkObject state.Sink)

func (*IntervalTimer) StateTypeName

func (it *IntervalTimer) StateTypeName() string

type Kcov

type Kcov struct {
	// contains filtered or unexported fields
}

Kcov provides kernel coverage data to userspace through a memory-mapped region, as kcov does in Linux.

To give the illusion that the data is always up to date, we update the shared memory every time before we return to userspace.

func (*Kcov) Clear

func (kcov *Kcov) Clear(ctx context.Context)

Clear resets the mode and clears the owning task and memory mapping for kcov. It is called when the fd corresponding to kcov is closed. Note that the mode needs to be set so that the next call to kcov.TaskWork() will exit early.

func (*Kcov) ConfigureMMap

func (kcov *Kcov) ConfigureMMap(ctx context.Context, opts *memmap.MMapOpts) error

ConfigureMMap is called by the vfs.FileDescription for this kcov instance to implement vfs.FileDescription.ConfigureMMap.

func (*Kcov) DisableTrace

func (kcov *Kcov) DisableTrace(ctx context.Context) error

DisableTrace performs the KCOV_DISABLE_TRACE ioctl.

func (*Kcov) EnableTrace

func (kcov *Kcov) EnableTrace(ctx context.Context, traceKind uint8) error

EnableTrace performs the KCOV_ENABLE_TRACE ioctl.

func (*Kcov) InitTrace

func (kcov *Kcov) InitTrace(size uint64) error

InitTrace performs the KCOV_INIT_TRACE ioctl.

func (*Kcov) OnTaskExit

func (kcov *Kcov) OnTaskExit()

OnTaskExit is called when the owning task exits. It is similar to kcov.Clear(), except the memory mapping is not cleared, so that the same mapping can be used in the future if kcov is enabled again by another task.

func (*Kcov) TaskWork

func (kcov *Kcov) TaskWork(t *Task)

TaskWork implements TaskWorker.TaskWork.

type Kernel

type Kernel struct {

	// Platform is the platform that is used to execute tasks in the created
	// Kernel. See comment on pgalloc.MemoryFileProvider for why Platform is
	// embedded anonymously (the same issue applies).
	platform.Platform `state:"nosave"`

	// DirentCacheLimiter controls the total number of dirent entries that can
	// be in caches. Not all caches use it; only the caches that use host
	// resources do. It may be nil if disabled.
	DirentCacheLimiter *fs.DirentCacheLimiter

	// SpecialOpts contains special kernel options.
	SpecialOpts

	// If set to true, report address space activation waits as if the task is in
	// external wait so that the watchdog doesn't report the task stuck.
	SleepForAddressSpaceActivation bool
	// contains filtered or unexported fields
}

Kernel represents an emulated Linux kernel. It must be initialized by calling Init() or LoadFrom().

+stateify savable

func KernelFromContext

func KernelFromContext(ctx context.Context) *Kernel

KernelFromContext returns the Kernel in which ctx is executing, or nil if there is no such Kernel.

func (*Kernel) AfterFunc

func (k *Kernel) AfterFunc(d time.Duration, f func()) tcpip.Timer

AfterFunc implements tcpip.Clock.AfterFunc.

func (*Kernel) ApplicationCores

func (k *Kernel) ApplicationCores() uint

ApplicationCores returns the number of CPUs visible to sandboxed applications.

func (*Kernel) CPUClockNow

func (k *Kernel) CPUClockNow() uint64

CPUClockNow returns the current value of k.cpuClock.

func (*Kernel) CreateProcess

func (k *Kernel) CreateProcess(args CreateProcessArgs) (*ThreadGroup, ThreadID, error)

CreateProcess creates a new task in a new thread group with the given options. The new task has no parent and is in the root PID namespace.

If k.Start() has already been called, then the created process must be started by calling kernel.StartProcess(tg).

If k.Start() has not yet been called, then the created task will begin running when k.Start() is called.

CreateProcess has no analogue in Linux; it is used to create the initial application task, as well as processes started by the control server.
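
The interaction between CreateProcess, Start, and StartProcess can be modelled with a toy state machine. Everything in this sketch (toyKernel, process, and their methods) is hypothetical and stands in for the real types; it shows only the rule that processes created before Start run when Start is called, while later ones need an explicit StartProcess.

```go
package main

import "fmt"

// process is a hypothetical stand-in for a thread group.
type process struct {
	name    string
	running bool
}

// toyKernel tracks whether start has been called and which processes exist.
type toyKernel struct {
	started bool
	procs   []*process
}

// createProcess registers a process but never runs it by itself.
func (k *toyKernel) createProcess(name string) *process {
	p := &process{name: name}
	k.procs = append(k.procs, p)
	return p
}

// start runs every process created so far. It may be called exactly once.
func (k *toyKernel) start() {
	if k.started {
		panic("start called more than once")
	}
	k.started = true
	for _, p := range k.procs {
		p.running = true
	}
}

// startProcess runs a process that was created after start.
func (k *toyKernel) startProcess(p *process) { p.running = true }

func main() {
	k := &toyKernel{}
	initProc := k.createProcess("init")
	k.start() // init begins running here

	execProc := k.createProcess("exec")
	fmt.Println(initProc.running, execProc.running)
	k.startProcess(execProc) // required because start already happened
	fmt.Println(execProc.running)
}
```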

func (*Kernel) DeleteSocketVFS2

func (k *Kernel) DeleteSocketVFS2(sock *vfs.FileDescription)

DeleteSocketVFS2 removes a VFS2 socket from the system-wide socket table.

func (*Kernel) EmitUnimplementedEvent

func (k *Kernel) EmitUnimplementedEvent(ctx context.Context)

EmitUnimplementedEvent emits an UnimplementedSyscall event via the event channel.

func (*Kernel) FeatureSet

func (k *Kernel) FeatureSet() *cpuid.FeatureSet

FeatureSet returns the FeatureSet.

func (*Kernel) GenerateInotifyCookie

func (k *Kernel) GenerateInotifyCookie() uint32

GenerateInotifyCookie generates a unique inotify event cookie.

Returned values may overlap with previously returned values if the value space is exhausted. 0 is not a valid cookie value; all other values representable in a uint32 are allowed.
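
One way to obtain cookies with these properties is an atomic counter that skips 0 on wrap-around. The sketch below assumes that approach; cookieSource and its field are hypothetical names, and the actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cookieSource is a hypothetical generator backed by a wrapping counter.
type cookieSource struct {
	next uint32
}

// generate returns the next cookie, skipping 0 when the counter wraps.
func (s *cookieSource) generate() uint32 {
	id := atomic.AddUint32(&s.next, 1)
	if id == 0 {
		// The counter wrapped around; 0 is reserved, so take the next value.
		id = atomic.AddUint32(&s.next, 1)
	}
	return id
}

func main() {
	// Start two steps before the wrap to show that 0 is never returned.
	s := &cookieSource{next: ^uint32(0) - 1}
	fmt.Println(s.generate(), s.generate(), s.generate())
}
```

After the wrap the sequence restarts at 1, which is why callers cannot rely on cookies being unique forever.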

func (*Kernel) GlobalInit

func (k *Kernel) GlobalInit() *ThreadGroup

GlobalInit returns the thread group with ID 1 in the root PID namespace, or nil if no such thread group exists. GlobalInit may return a thread group containing no tasks if the thread group has already exited.

func (*Kernel) HostMount

func (k *Kernel) HostMount() *vfs.Mount

HostMount returns the hostfs mount.

func (*Kernel) Init

func (k *Kernel) Init(args InitKernelArgs) error

Init initializes the Kernel with no tasks.

Callers must manually set Kernel.Platform and call Kernel.SetMemoryFile before calling Init.

func (*Kernel) Kill

func (k *Kernel) Kill(es ExitStatus)

Kill requests that all tasks in k immediately exit as if group exiting with status es. Kill does not wait for tasks to exit.

func (*Kernel) ListSockets

func (k *Kernel) ListSockets() []*SocketRecord

ListSockets returns a snapshot of all sockets.

Callers of ListSockets() in VFS2 should use SocketRecord.SockVFS2.TryIncRef() to get a reference on a socket in the table.

func (*Kernel) LoadFrom

func (k *Kernel) LoadFrom(ctx context.Context, r wire.Reader, net inet.Stack, clocks sentrytime.Clocks, vfsOpts *vfs.CompleteRestoreOptions) error

LoadFrom loads the saved state read from r into k.

func (*Kernel) LoadTaskImage

func (k *Kernel) LoadTaskImage(ctx context.Context, args loader.LoadArgs) (*TaskImage, *syserr.Error)

LoadTaskImage loads a specified file into a new TaskImage.

args.MemoryManager does not need to be set by the caller.

func (*Kernel) MemoryFile

func (k *Kernel) MemoryFile() *pgalloc.MemoryFile

MemoryFile implements pgalloc.MemoryFileProvider.MemoryFile.

func (*Kernel) MonotonicClock

func (k *Kernel) MonotonicClock() ktime.Clock

MonotonicClock returns the application CLOCK_MONOTONIC clock.

func (*Kernel) NetlinkPorts

func (k *Kernel) NetlinkPorts() *port.Manager

NetlinkPorts returns the netlink port manager.

func (*Kernel) NewFDTable

func (k *Kernel) NewFDTable() *FDTable

NewFDTable allocates a new FDTable that may be used by tasks in k.

func (*Kernel) NewKcov

func (k *Kernel) NewKcov() *Kcov

NewKcov creates and returns a Kcov instance.

func (*Kernel) NewThreadGroup

func (k *Kernel) NewThreadGroup(mntns *fs.MountNamespace, pidns *PIDNamespace, sh *SignalHandlers, terminationSignal linux.Signal, limits *limits.LimitSet) *ThreadGroup

NewThreadGroup returns a new, empty thread group in PID namespace pidns. The thread group leader will send its parent terminationSignal when it exits. The new thread group isn't visible to the system until a task has been created inside of it by a successful call to TaskSet.NewTask.

func (*Kernel) NowMonotonic

func (k *Kernel) NowMonotonic() int64

NowMonotonic implements tcpip.Clock.NowMonotonic.

func (*Kernel) NowNanoseconds

func (k *Kernel) NowNanoseconds() int64

NowNanoseconds implements tcpip.Clock.NowNanoseconds.

func (*Kernel) Pause

func (k *Kernel) Pause()

Pause requests that all tasks in k temporarily stop executing, and blocks until all tasks and asynchronous I/O operations in k have stopped. Multiple calls to Pause nest and require an equal number of calls to Unpause to resume execution.

func (*Kernel) PipeMount

func (k *Kernel) PipeMount() *vfs.Mount

PipeMount returns the pipefs mount.

func (*Kernel) RealtimeClock

func (k *Kernel) RealtimeClock() ktime.Clock

RealtimeClock returns the application CLOCK_REALTIME clock.

func (*Kernel) RebuildTraceContexts

func (k *Kernel) RebuildTraceContexts()

RebuildTraceContexts rebuilds the trace context for all tasks.

Unfortunately, if these are built while tracing is not enabled, then we will not have meaningful trace data. Rebuilding here ensures that we can do so after tracing has been enabled.

func (*Kernel) ReceiveTaskStates

func (k *Kernel) ReceiveTaskStates()

ReceiveTaskStates receives full states for all tasks.

func (*Kernel) RecordSocket

func (k *Kernel) RecordSocket(sock *fs.File)

RecordSocket adds a socket to the system-wide socket table for tracking.

Precondition: Caller must hold a reference to sock.

func (*Kernel) RecordSocketVFS2

func (k *Kernel) RecordSocketVFS2(sock *vfs.FileDescription)

RecordSocketVFS2 adds a VFS2 socket to the system-wide socket table for tracking.

Precondition: Caller must hold a reference to sock.

Note that the socket table will not hold a reference on the vfs.FileDescription.

func (*Kernel) Release

func (k *Kernel) Release()

Release releases resources owned by k.

Precondition: This should only be called after the kernel is fully initialized, e.g. after k.Start() has been called.

func (*Kernel) RootAbstractSocketNamespace

func (k *Kernel) RootAbstractSocketNamespace() *AbstractSocketNamespace

RootAbstractSocketNamespace returns the root AbstractSocketNamespace.

func (*Kernel) RootIPCNamespace

func (k *Kernel) RootIPCNamespace() *IPCNamespace

RootIPCNamespace takes a reference and returns the root IPCNamespace.

func (*Kernel) RootNetworkNamespace

func (k *Kernel) RootNetworkNamespace() *inet.Namespace

RootNetworkNamespace returns the root network namespace, always non-nil.

func (*Kernel) RootPIDNamespace

func (k *Kernel) RootPIDNamespace() *PIDNamespace

RootPIDNamespace returns the root PIDNamespace.

func (*Kernel) RootUTSNamespace

func (k *Kernel) RootUTSNamespace() *UTSNamespace

RootUTSNamespace returns the root UTSNamespace.

func (*Kernel) RootUserNamespace

func (k *Kernel) RootUserNamespace() *auth.UserNamespace

RootUserNamespace returns the root UserNamespace.

func (*Kernel) SaveStatus

func (k *Kernel) SaveStatus() (saved, autosaved bool, err error)

SaveStatus returns the sandbox save status. If it was saved successfully, autosaved indicates whether save was triggered by autosave. If it was not saved successfully, err indicates the sandbox error that caused the kernel to exit during save.

func (*Kernel) SaveTo

func (k *Kernel) SaveTo(ctx context.Context, w wire.Writer) error

SaveTo saves the state of k to w.

Preconditions: The kernel must be paused throughout the call to SaveTo.

func (*Kernel) SendContainerSignal

func (k *Kernel) SendContainerSignal(cid string, info *arch.SignalInfo) error

SendContainerSignal sends the given signal to all processes inside the namespace that match the given container ID.

func (*Kernel) SendExternalSignal

func (k *Kernel) SendExternalSignal(info *arch.SignalInfo, context string)

SendExternalSignal injects a signal into the kernel.

context is used only for debugging to describe how the signal was received.

Preconditions: Kernel must have an init process.

func (*Kernel) SendExternalSignalThreadGroup

func (k *Kernel) SendExternalSignalThreadGroup(tg *ThreadGroup, info *arch.SignalInfo) error

SendExternalSignalThreadGroup injects a signal into a specific ThreadGroup. This function doesn't skip signals like SendExternalSignal does.

func (*Kernel) SetHostMount

func (k *Kernel) SetHostMount(mnt *vfs.Mount)

SetHostMount sets the hostfs mount.

func (*Kernel) SetMemoryFile

func (k *Kernel) SetMemoryFile(mf *pgalloc.MemoryFile)

SetMemoryFile sets Kernel.mf. SetMemoryFile must be called before Init or LoadFrom.

func (*Kernel) SetSaveError

func (k *Kernel) SetSaveError(err error)

SetSaveError sets the sandbox error that caused the kernel to exit during save, if one is not already set.

func (*Kernel) SetSaveSuccess

func (k *Kernel) SetSaveSuccess(autosave bool)

SetSaveSuccess sets the flag indicating that save completed successfully, if no status was already set.

func (*Kernel) ShmMount

func (k *Kernel) ShmMount() *vfs.Mount

ShmMount returns the tmpfs mount.

func (*Kernel) SocketMount

func (k *Kernel) SocketMount() *vfs.Mount

SocketMount returns the sockfs mount.

func (*Kernel) Start

func (k *Kernel) Start() error

Start starts execution of all tasks in k.

Preconditions: Start may be called exactly once.

func (*Kernel) StartProcess

func (k *Kernel) StartProcess(tg *ThreadGroup)

StartProcess starts running a process that was created with CreateProcess.

func (*Kernel) StateFields

func (k *Kernel) StateFields() []string

func (*Kernel) StateLoad

func (k *Kernel) StateLoad(stateSourceObject state.Source)

func (*Kernel) StateSave

func (k *Kernel) StateSave(stateSinkObject state.Sink)

func (*Kernel) StateTypeName

func (k *Kernel) StateTypeName() string

func (*Kernel) SupervisorContext

func (k *Kernel) SupervisorContext() context.Context

SupervisorContext returns a Context with maximum privileges in k. It should only be used by goroutines outside the control of the emulated kernel defined by k.

Callers are responsible for ensuring that the returned Context is not used concurrently with changes to the Kernel.

func (*Kernel) Syslog

func (k *Kernel) Syslog() *syslog

Syslog returns the syslog.

func (*Kernel) TaskSet

func (k *Kernel) TaskSet() *TaskSet

TaskSet returns the TaskSet.

func (*Kernel) TestOnlySetGlobalInit

func (k *Kernel) TestOnlySetGlobalInit(tg *ThreadGroup)

TestOnlySetGlobalInit sets the thread group with ID 1 in the root PID namespace.

func (*Kernel) Timekeeper

func (k *Kernel) Timekeeper() *Timekeeper

Timekeeper returns the Timekeeper.

func (*Kernel) UniqueID

func (k *Kernel) UniqueID() uint64

UniqueID returns a unique identifier.

func (*Kernel) Unpause

func (k *Kernel) Unpause()

Unpause ends the effect of a previous call to Pause. If Unpause is called without a matching preceding call to Pause, Unpause may panic.
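
The nesting rule shared by Pause and Unpause amounts to a depth counter. The sketch below covers only that bookkeeping; the real Pause also blocks until tasks and asynchronous I/O have stopped, which is not modelled here. The pauseCounter type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// pauseCounter is a hypothetical model of nested pause requests.
type pauseCounter struct {
	mu    sync.Mutex
	depth int
}

// pause records one more outstanding pause request.
func (p *pauseCounter) pause() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.depth++
}

// unpause cancels one pause request and panics if none is outstanding.
func (p *pauseCounter) unpause() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.depth == 0 {
		panic("unpause without a matching pause")
	}
	p.depth--
}

// paused reports whether any pause request is still outstanding.
func (p *pauseCounter) paused() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.depth > 0
}

func main() {
	var p pauseCounter
	p.pause()
	p.pause()
	p.unpause()
	fmt.Println("after one unpause:", p.paused())
	p.unpause()
	fmt.Println("after two unpauses:", p.paused())
}
```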

func (*Kernel) VFS

func (k *Kernel) VFS() *vfs.VirtualFilesystem

VFS returns the virtual filesystem for the kernel.

func (*Kernel) WaitExited

func (k *Kernel) WaitExited()

WaitExited blocks until all tasks in k have exited.

type MissingFn

type MissingFn func(t *Task, sysno uintptr, args arch.SyscallArguments) (uintptr, error)

MissingFn is a syscall to be called when an implementation is missing.

type OldRSeqCriticalRegion

type OldRSeqCriticalRegion struct {
	// When a task in this thread group has its CPU preempted (as defined by
	// platform.ErrContextCPUPreempted) or has a signal delivered to an
	// application handler while its instruction pointer is in CriticalSection,
	// set the instruction pointer to Restart and application register r10 (on
	// amd64) to the former instruction pointer.
	CriticalSection usermem.AddrRange
	Restart         usermem.Addr
}

OldRSeqCriticalRegion describes an old rseq critical region.
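
The restart rule in the struct comment can be illustrated with plain integers. In this sketch the addr, addrRange, and registers types are hypothetical simplifications, and the range is assumed to be half-open, [start, end).

```go
package main

import "fmt"

// addr is a simplified application address.
type addr uint64

// addrRange is assumed to be half-open: [start, end).
type addrRange struct{ start, end addr }

func (r addrRange) contains(a addr) bool { return r.start <= a && a < r.end }

// criticalRegion mirrors the two documented fields.
type criticalRegion struct {
	criticalSection addrRange
	restart         addr
}

// registers holds only the two registers the rule touches (amd64 naming).
type registers struct {
	ip  addr
	r10 addr
}

// onInterrupt applies the rule on preemption or signal delivery: if ip is in
// the critical section, save it in r10 and jump to restart. It reports
// whether the registers were changed.
func (c criticalRegion) onInterrupt(regs *registers) bool {
	if !c.criticalSection.contains(regs.ip) {
		return false
	}
	regs.r10 = regs.ip
	regs.ip = c.restart
	return true
}

func main() {
	c := criticalRegion{
		criticalSection: addrRange{start: 0x1000, end: 0x1010},
		restart:         0x2000,
	}
	regs := registers{ip: 0x1008}
	changed := c.onInterrupt(&regs)
	fmt.Printf("changed=%v ip=%#x r10=%#x\n", changed, regs.ip, regs.r10)
}
```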

+stateify savable

func (*OldRSeqCriticalRegion) StateFields

func (o *OldRSeqCriticalRegion) StateFields() []string

func (*OldRSeqCriticalRegion) StateLoad

func (o *OldRSeqCriticalRegion) StateLoad(stateSourceObject state.Source)

func (*OldRSeqCriticalRegion) StateSave

func (o *OldRSeqCriticalRegion) StateSave(stateSinkObject state.Sink)

func (*OldRSeqCriticalRegion) StateTypeName

func (o *OldRSeqCriticalRegion) StateTypeName() string

type PIDNamespace

type PIDNamespace struct {
	// contains filtered or unexported fields
}

A PIDNamespace represents a PID namespace, a bimap between thread IDs and tasks. See the pid_namespaces(7) man page for further details.

N.B. A task is said to be visible in a PID namespace if the PID namespace contains a thread ID that maps to that task.

+stateify savable

func NewRootPIDNamespace

func NewRootPIDNamespace(userns *auth.UserNamespace) *PIDNamespace

NewRootPIDNamespace creates the root PID namespace. 'owner' is not yet available when the root namespace is created and must be set by the caller.

func PIDNamespaceFromContext

func PIDNamespaceFromContext(ctx context.Context) *PIDNamespace

PIDNamespaceFromContext returns the PID namespace in which ctx is executing, or nil if there is no such PID namespace.

func (*PIDNamespace) IDOfProcessGroup

func (ns *PIDNamespace) IDOfProcessGroup(pg *ProcessGroup) ProcessGroupID

IDOfProcessGroup returns the process group ID assigned to pg in PID namespace ns.

The same constraints apply as for IDOfSession.

func (*PIDNamespace) IDOfSession

func (ns *PIDNamespace) IDOfSession(s *Session) SessionID

IDOfSession returns the session ID assigned to s in PID namespace ns.

If the session isn't visible in this namespace, zero will be returned. It is the caller's responsibility to check that before using this function.

func (*PIDNamespace) IDOfTask

func (ns *PIDNamespace) IDOfTask(t *Task) ThreadID

IDOfTask returns the TID assigned to the given task in PID namespace ns. If the task is not visible in that namespace, IDOfTask returns 0. (This return value is significant in some cases, e.g. getppid() is documented as returning 0 if the caller's parent is in an ancestor namespace and consequently not visible to the caller.) If the task is nil, IDOfTask returns 0.
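
The bimap view of a PID namespace, and the meaning of a zero return value, can be shown with two maps. The types below are hypothetical and ignore locking and namespace hierarchy; the sketch relies only on Go returning the zero value for a missing map key.

```go
package main

import "fmt"

type threadID int32

// task is a hypothetical stand-in for a kernel task.
type task struct{ name string }

// pidNamespace is a toy bimap between thread IDs and tasks.
type pidNamespace struct {
	tasks map[threadID]*task
	tids  map[*task]threadID
}

func newPIDNamespace() *pidNamespace {
	return &pidNamespace{
		tasks: make(map[threadID]*task),
		tids:  make(map[*task]threadID),
	}
}

// add makes t visible in ns under tid.
func (ns *pidNamespace) add(tid threadID, t *task) {
	ns.tasks[tid] = t
	ns.tids[t] = tid
}

// idOfTask returns 0 when t is nil or not visible in ns.
func (ns *pidNamespace) idOfTask(t *task) threadID { return ns.tids[t] }

// taskWithID returns nil when no task in ns has the given thread ID.
func (ns *pidNamespace) taskWithID(tid threadID) *task { return ns.tasks[tid] }

func main() {
	root, child := newPIDNamespace(), newPIDNamespace()
	parent := &task{name: "parent"}
	initTask := &task{name: "init"}

	// The parent lives only in the root namespace. The child namespace's init
	// task is visible in both, under different thread IDs.
	root.add(100, parent)
	root.add(101, initTask)
	child.add(1, initTask)

	fmt.Println(root.idOfTask(initTask), child.idOfTask(initTask))
	// From inside the child namespace the parent is invisible, as with getppid().
	fmt.Println(child.idOfTask(parent), child.idOfTask(nil))
	fmt.Println(child.taskWithID(1).name, child.taskWithID(100) == nil)
}
```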

func (*PIDNamespace) IDOfThreadGroup

func (ns *PIDNamespace) IDOfThreadGroup(tg *ThreadGroup) ThreadID

IDOfThreadGroup returns the TID assigned to tg's leader in PID namespace ns. If the task is not visible in that namespace, IDOfThreadGroup returns 0.

func (*PIDNamespace) NewChild

func (ns *PIDNamespace) NewChild(userns *auth.UserNamespace) *PIDNamespace

NewChild returns a new, empty PID namespace that is a child of ns. Authority over the new PID namespace is controlled by userns.

func (*PIDNamespace) NumTasks

func (ns *PIDNamespace) NumTasks() int

NumTasks returns the number of tasks in ns.

func (*PIDNamespace) ProcessGroupWithID

func (ns *PIDNamespace) ProcessGroupWithID(id ProcessGroupID) *ProcessGroup

ProcessGroupWithID returns the ProcessGroup with the given ID in the PID namespace ns, or nil if that given ID is not defined in this namespace.

A reference is not taken on the process group.

func (*PIDNamespace) Root

func (ns *PIDNamespace) Root() *PIDNamespace

Root returns the root PID namespace of ns.

func (*PIDNamespace) SessionWithID

func (ns *PIDNamespace) SessionWithID(id SessionID) *Session

SessionWithID returns the Session with the given ID in the PID namespace ns, or nil if that given ID is not defined in this namespace.

A reference is not taken on the session.

func (*PIDNamespace) StateFields

func (ns *PIDNamespace) StateFields() []string

func (*PIDNamespace) StateLoad

func (ns *PIDNamespace) StateLoad(stateSourceObject state.Source)

func (*PIDNamespace) StateSave

func (ns *PIDNamespace) StateSave(stateSinkObject state.Sink)

func (*PIDNamespace) StateTypeName

func (ns *PIDNamespace) StateTypeName() string

func (*PIDNamespace) TaskWithID

func (ns *PIDNamespace) TaskWithID(tid ThreadID) *Task

TaskWithID returns the task with thread ID tid in PID namespace ns. If no task has that TID, TaskWithID returns nil.

func (*PIDNamespace) Tasks

func (ns *PIDNamespace) Tasks() []*Task

Tasks returns a snapshot of the tasks in ns.

func (*PIDNamespace) ThreadGroupWithID

func (ns *PIDNamespace) ThreadGroupWithID(tid ThreadID) *ThreadGroup

ThreadGroupWithID returns the thread group led by the task with thread ID tid in PID namespace ns. If no task has that TID, or if the task with that TID is not a thread group leader, ThreadGroupWithID returns nil.

func (*PIDNamespace) ThreadGroups

func (ns *PIDNamespace) ThreadGroups() []*ThreadGroup

ThreadGroups returns a snapshot of the thread groups in ns.

func (*PIDNamespace) ThreadGroupsAppend

func (ns *PIDNamespace) ThreadGroupsAppend(tgs []*ThreadGroup) []*ThreadGroup

ThreadGroupsAppend appends a snapshot of the thread groups in ns to tgs.

func (*PIDNamespace) UserNamespace

func (ns *PIDNamespace) UserNamespace() *auth.UserNamespace

UserNamespace returns the user namespace associated with PID namespace ns.

type ProcessGroup

type ProcessGroup struct {
	// contains filtered or unexported fields
}

ProcessGroup contains an originator threadgroup and a parent Session.

+stateify savable

var ProcessGroupobj *ProcessGroup

obj is used to customize logging. Note that we use a pointer to T so that we do not copy the entire object when passed as a format parameter.

func (*ProcessGroup) IsOrphan

func (pg *ProcessGroup) IsOrphan() bool

IsOrphan returns true if this process group is an orphan.

func (*ProcessGroup) Next

func (e *ProcessGroup) Next() *ProcessGroup

Next returns the entry that follows e in the list.

func (*ProcessGroup) Originator

func (pg *ProcessGroup) Originator() *ThreadGroup

Originator returns the originator of the process group.

func (*ProcessGroup) Prev

func (e *ProcessGroup) Prev() *ProcessGroup

Prev returns the entry that precedes e in the list.

func (*ProcessGroup) SendSignal

func (pg *ProcessGroup) SendSignal(info *arch.SignalInfo) error

SendSignal sends a signal to all processes inside the process group. It is analogous to kernel/signal.c:kill_pgrp.

func (*ProcessGroup) Session

func (pg *ProcessGroup) Session() *Session

Session returns the process group's session without taking a reference.

func (*ProcessGroup) SetNext

func (e *ProcessGroup) SetNext(elem *ProcessGroup)

SetNext assigns elem as the entry that follows e in the list.

func (*ProcessGroup) SetPrev

func (e *ProcessGroup) SetPrev(elem *ProcessGroup)

SetPrev assigns elem as the entry that precedes e in the list.

func (*ProcessGroup) StateFields

func (pg *ProcessGroup) StateFields() []string

func (*ProcessGroup) StateLoad

func (pg *ProcessGroup) StateLoad(stateSourceObject state.Source)

func (*ProcessGroup) StateSave

func (pg *ProcessGroup) StateSave(stateSinkObject state.Sink)

func (*ProcessGroup) StateTypeName

func (pg *ProcessGroup) StateTypeName() string

type ProcessGroupID

type ProcessGroupID ThreadID

ProcessGroupID is the public identifier.

type ProcessGroupRefs

type ProcessGroupRefs struct {
	// contains filtered or unexported fields
}

Refs implements refs.RefCounter. It keeps a reference count using atomic operations and calls the destructor when the count reaches zero.

+stateify savable

func (*ProcessGroupRefs) DecRef

func (r *ProcessGroupRefs) DecRef(destroy func())

DecRef implements refs.RefCounter.DecRef.

Note that speculative references are counted here. Since they were added prior to real references reaching zero, they will successfully convert to real references. In other words, we see speculative references only in the following case:

A: TryIncRef [speculative increase => sees non-negative references]
B: DecRef [real decrease]
A: TryIncRef [transform speculative to real]

func (*ProcessGroupRefs) IncRef

func (r *ProcessGroupRefs) IncRef()

IncRef implements refs.RefCounter.IncRef.

func (*ProcessGroupRefs) InitRefs

func (r *ProcessGroupRefs) InitRefs()

InitRefs initializes r with one reference and, if enabled, activates leak checking.

func (*ProcessGroupRefs) LeakMessage

func (r *ProcessGroupRefs) LeakMessage() string

LeakMessage implements refsvfs2.CheckedObject.LeakMessage.

func (*ProcessGroupRefs) LogRefs

func (r *ProcessGroupRefs) LogRefs() bool

LogRefs implements refsvfs2.CheckedObject.LogRefs.

func (*ProcessGroupRefs) ReadRefs

func (r *ProcessGroupRefs) ReadRefs() int64

ReadRefs returns the current number of references. The returned count is inherently racy and is unsafe to use without external synchronization.

func (*ProcessGroupRefs) RefType

func (r *ProcessGroupRefs) RefType() string

RefType implements refsvfs2.CheckedObject.RefType.

func (*ProcessGroupRefs) StateFields

func (r *ProcessGroupRefs) StateFields() []string

func (*ProcessGroupRefs) StateLoad

func (r *ProcessGroupRefs) StateLoad(stateSourceObject state.Source)

func (*ProcessGroupRefs) StateSave

func (r *ProcessGroupRefs) StateSave(stateSinkObject state.Sink)

func (*ProcessGroupRefs) StateTypeName

func (r *ProcessGroupRefs) StateTypeName() string

func (*ProcessGroupRefs) TryIncRef

func (r *ProcessGroupRefs) TryIncRef() bool

TryIncRef implements refs.RefCounter.TryIncRef.

To do this safely without a loop, a speculative reference is first acquired on the object. This allows multiple concurrent TryIncRef calls to distinguish other TryIncRef calls from genuine references held.

type Session

type Session struct {
	SessionRefs
	// contains filtered or unexported fields
}

Session contains a leader threadgroup and a list of ProcessGroups.

+stateify savable

var Sessionobj *Session

obj is used to customize logging. Note that we use a pointer to T so that we do not copy the entire object when passed as a format parameter.

func (*Session) DecRef

func (s *Session) DecRef()

DecRef drops a reference.

Precondition: callers must hold TaskSet.mu for writing.

func (*Session) Next

func (e *Session) Next() *Session

Next returns the entry that follows e in the list.

func (*Session) Prev

func (e *Session) Prev() *Session

Prev returns the entry that precedes e in the list.

func (*Session) SetNext

func (e *Session) SetNext(elem *Session)

SetNext assigns elem as the entry that follows e in the list.

func (*Session) SetPrev

func (e *Session) SetPrev(elem *Session)

SetPrev assigns elem as the entry that precedes e in the list.

func (*Session) StateFields

func (s *Session) StateFields() []string

func (*Session) StateLoad

func (s *Session) StateLoad(stateSourceObject state.Source)

func (*Session) StateSave

func (s *Session) StateSave(stateSinkObject state.Sink)

func (*Session) StateTypeName

func (s *Session) StateTypeName() string

type SessionID

type SessionID ThreadID

SessionID is the public identifier.

type SessionRefs

type SessionRefs struct {
	// contains filtered or unexported fields
}

Refs implements refs.RefCounter. It keeps a reference count using atomic operations and calls the destructor when the count reaches zero.

+stateify savable

func (*SessionRefs) DecRef

func (r *SessionRefs) DecRef(destroy func())

DecRef implements refs.RefCounter.DecRef.

Note that speculative references are counted here. Since they were added prior to real references reaching zero, they will successfully convert to real references. In other words, we see speculative references only in the following case:

A: TryIncRef [speculative increase => sees non-negative references]
B: DecRef [real decrease]
A: TryIncRef [transform speculative to real]

func (*SessionRefs) IncRef

func (r *SessionRefs) IncRef()

IncRef implements refs.RefCounter.IncRef.

func (*SessionRefs) InitRefs

func (r *SessionRefs) InitRefs()

InitRefs initializes r with one reference and, if enabled, activates leak checking.

func (*SessionRefs) LeakMessage

func (r *SessionRefs) LeakMessage() string

LeakMessage implements refsvfs2.CheckedObject.LeakMessage.

func (*SessionRefs) LogRefs

func (r *SessionRefs) LogRefs() bool

LogRefs implements refsvfs2.CheckedObject.LogRefs.

func (*SessionRefs) ReadRefs

func (r *SessionRefs) ReadRefs() int64

ReadRefs returns the current number of references. The returned count is inherently racy and is unsafe to use without external synchronization.

func (*SessionRefs) RefType

func (r *SessionRefs) RefType() string

RefType implements refsvfs2.CheckedObject.RefType.

func (*SessionRefs) StateFields

func (r *SessionRefs) StateFields() []string

func (*SessionRefs) StateLoad

func (r *SessionRefs) StateLoad(stateSourceObject state.Source)

func (*SessionRefs) StateSave

func (r *SessionRefs) StateSave(stateSinkObject state.Sink)

func (*SessionRefs) StateTypeName

func (r *SessionRefs) StateTypeName() string

func (*SessionRefs) TryIncRef

func (r *SessionRefs) TryIncRef() bool

TryIncRef implements refs.RefCounter.TryIncRef.

To do this safely without a loop, a speculative reference is first acquired on the object. This allows multiple concurrent TryIncRef calls to distinguish other TryIncRef calls from genuine references held.

type SharingOptions

type SharingOptions struct {
	// If NewAddressSpace is true, the task should have an independent virtual
	// address space.
	NewAddressSpace bool

	// If NewSignalHandlers is true, the task should use an independent set of
	// signal handlers.
	NewSignalHandlers bool

	// If NewThreadGroup is true, the task should be the leader of its own
	// thread group. TerminationSignal is the signal that the thread group
	// will send to its parent when it exits. If NewThreadGroup is false,
	// TerminationSignal is ignored.
	NewThreadGroup    bool
	TerminationSignal linux.Signal

	// If NewPIDNamespace is true:
	//
	// - In the context of Task.Clone, the new task should be the init task
	// (TID 1) in a new PID namespace.
	//
	// - In the context of Task.Unshare, the task should create a new PID
	// namespace, and all subsequent clones of the task should be members of
	// the new PID namespace.
	NewPIDNamespace bool

	// If NewUserNamespace is true, the task should have an independent user
	// namespace.
	NewUserNamespace bool

	// If NewNetworkNamespace is true, the task should have an independent
	// network namespace.
	NewNetworkNamespace bool

	// If NewFiles is true, the task should use an independent file descriptor
	// table.
	NewFiles bool

	// If NewFSContext is true, the task should have an independent FSContext.
	NewFSContext bool

	// If NewUTSNamespace is true, the task should have an independent UTS
	// namespace.
	NewUTSNamespace bool

	// If NewIPCNamespace is true, the task should have an independent IPC
	// namespace.
	NewIPCNamespace bool
}

SharingOptions controls what resources are shared by a new task created by Task.Clone, or an existing task affected by Task.Unshare.
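As an illustration of how these fields relate to clone(2), the sketch below derives a SharingOptions-like value from raw clone flags. The struct sharingOptions and the helper fromCloneFlags are hypothetical stand-ins written for this example; only the flag values are taken from Linux's include/uapi/linux/sched.h. How the real syscall layer fills in SharingOptions is not shown in this documentation.

```go
package main

import "fmt"

// Clone flag values from Linux's include/uapi/linux/sched.h.
const (
	csignal      = 0x000000ff
	cloneVM      = 0x00000100
	cloneFS      = 0x00000200
	cloneFiles   = 0x00000400
	cloneSighand = 0x00000800
	cloneThread  = 0x00010000
	cloneNewUTS  = 0x04000000
	cloneNewIPC  = 0x08000000
	cloneNewUser = 0x10000000
	cloneNewPID  = 0x20000000
	cloneNewNet  = 0x40000000
)

// sharingOptions is a local copy of the fields documented above, with
// TerminationSignal reduced to a plain int for this sketch.
type sharingOptions struct {
	NewAddressSpace     bool
	NewSignalHandlers   bool
	NewThreadGroup      bool
	TerminationSignal   int
	NewPIDNamespace     bool
	NewUserNamespace    bool
	NewNetworkNamespace bool
	NewFiles            bool
	NewFSContext        bool
	NewUTSNamespace     bool
	NewIPCNamespace     bool
}

// fromCloneFlags (hypothetical) maps clone(2) flags to sharing options.
// "Share" flags such as CLONE_VM invert into New* fields; namespace flags
// map directly.
func fromCloneFlags(flags uint64) sharingOptions {
	return sharingOptions{
		NewAddressSpace:     flags&cloneVM == 0,
		NewSignalHandlers:   flags&cloneSighand == 0,
		NewThreadGroup:      flags&cloneThread == 0,
		TerminationSignal:   int(flags & csignal),
		NewPIDNamespace:     flags&cloneNewPID != 0,
		NewUserNamespace:    flags&cloneNewUser != 0,
		NewNetworkNamespace: flags&cloneNewNet != 0,
		NewFiles:            flags&cloneFiles == 0,
		NewFSContext:        flags&cloneFS == 0,
		NewUTSNamespace:     flags&cloneNewUTS != 0,
		NewIPCNamespace:     flags&cloneNewIPC != 0,
	}
}

func main() {
	const sigchld = 17
	// fork(2)-like clone: nothing shared, SIGCHLD on exit.
	fmt.Printf("%+v\n", fromCloneFlags(sigchld))
	// pthread-like clone: everything shared, same thread group.
	fmt.Printf("%+v\n", fromCloneFlags(cloneVM|cloneFS|cloneFiles|cloneSighand|cloneThread))
}
```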

type SignalAction

type SignalAction int

SignalAction is an internal signal action.

const (
	SignalActionTerm SignalAction = iota
	SignalActionCore
	SignalActionStop
	SignalActionIgnore
	SignalActionHandler
)

Available signal actions. Note that although we refer to the complete set internally, the application is only capable of using the Default and Ignore actions from the system call interface.

type SignalHandlers

type SignalHandlers struct {
	// contains filtered or unexported fields
}

SignalHandlers holds information about signal actions.

+stateify savable

func NewSignalHandlers

func NewSignalHandlers() *SignalHandlers

NewSignalHandlers returns a new SignalHandlers specifying all default actions.

func (*SignalHandlers) CopyForExec

func (sh *SignalHandlers) CopyForExec() *SignalHandlers

CopyForExec returns a copy of sh for a thread group that is undergoing an execve. (See comments in Task.finishExec.)

func (*SignalHandlers) Fork

func (sh *SignalHandlers) Fork() *SignalHandlers

Fork returns a copy of sh for a new thread group.

func (*SignalHandlers) IsIgnored

func (sh *SignalHandlers) IsIgnored(sig linux.Signal) bool

IsIgnored returns true if the signal is ignored.

func (*SignalHandlers) StateFields

func (sh *SignalHandlers) StateFields() []string

func (*SignalHandlers) StateLoad

func (sh *SignalHandlers) StateLoad(stateSourceObject state.Source)

func (*SignalHandlers) StateSave

func (sh *SignalHandlers) StateSave(stateSinkObject state.Sink)

func (*SignalHandlers) StateTypeName

func (sh *SignalHandlers) StateTypeName() string

type SocketRecord

type SocketRecord struct {
	Sock     *refs.WeakRef        // TODO(gvisor.dev/issue/1624): Only used by VFS1.
	SockVFS2 *vfs.FileDescription // Only used by VFS2.
	ID       uint64               // Socket table entry number.
	// contains filtered or unexported fields
}

SocketRecord represents a socket recorded in Kernel.socketsVFS2.

+stateify savable

func (*SocketRecord) StateFields

func (s *SocketRecord) StateFields() []string

func (*SocketRecord) StateLoad

func (s *SocketRecord) StateLoad(stateSourceObject state.Source)

func (*SocketRecord) StateSave

func (s *SocketRecord) StateSave(stateSinkObject state.Sink)

func (*SocketRecord) StateTypeName

func (s *SocketRecord) StateTypeName() string

type SocketRecordVFS1

type SocketRecordVFS1 struct {
	SocketRecord
	// contains filtered or unexported fields
}

SocketRecordVFS1 represents a socket recorded in Kernel.sockets. It implements refs.WeakRefUser for sockets stored in the socket table.

+stateify savable

func (*SocketRecordVFS1) Next

func (e *SocketRecordVFS1) Next() *SocketRecordVFS1

Next returns the entry that follows e in the list.

func (*SocketRecordVFS1) Prev

func (e *SocketRecordVFS1) Prev() *SocketRecordVFS1

Prev returns the entry that precedes e in the list.

func (*SocketRecordVFS1) SetNext

func (e *SocketRecordVFS1) SetNext(elem *SocketRecordVFS1)

SetNext assigns elem as the entry that follows e in the list.

func (*SocketRecordVFS1) SetPrev

func (e *SocketRecordVFS1) SetPrev(elem *SocketRecordVFS1)

SetPrev assigns elem as the entry that precedes e in the list.

func (*SocketRecordVFS1) StateFields

func (s *SocketRecordVFS1) StateFields() []string

func (*SocketRecordVFS1) StateLoad

func (s *SocketRecordVFS1) StateLoad(stateSourceObject state.Source)

func (*SocketRecordVFS1) StateSave

func (s *SocketRecordVFS1) StateSave(stateSinkObject state.Sink)

func (*SocketRecordVFS1) StateTypeName

func (s *SocketRecordVFS1) StateTypeName() string

func (*SocketRecordVFS1) WeakRefGone

func (s *SocketRecordVFS1) WeakRefGone(context.Context)

WeakRefGone implements refs.WeakRefUser.WeakRefGone.

type SpecialOpts

type SpecialOpts struct{}

SpecialOpts contains non-standard options for the kernel.

+stateify savable

func (*SpecialOpts) StateFields

func (s *SpecialOpts) StateFields() []string

func (*SpecialOpts) StateLoad

func (s *SpecialOpts) StateLoad(stateSourceObject state.Source)

func (*SpecialOpts) StateSave

func (s *SpecialOpts) StateSave(stateSinkObject state.Sink)

func (*SpecialOpts) StateTypeName

func (s *SpecialOpts) StateTypeName() string

type Stracer

type Stracer interface {
	// SyscallEnter is called on syscall entry.
	//
	// The returned private data is passed to SyscallExit.
	SyscallEnter(t *Task, sysno uintptr, args arch.SyscallArguments, flags uint32) interface{}

	// SyscallExit is called on syscall exit.
	SyscallExit(context interface{}, t *Task, sysno, rval uintptr, err error)
}

Stracer traces syscall execution.
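The hand-off of private data between SyscallEnter and SyscallExit can be sketched as below. The types task, syscallArguments, and recordingTracer are simplified, hypothetical stand-ins for *Task, arch.SyscallArguments, and a real tracer; only the method shapes follow the interface above.

```go
package main

import (
	"errors"
	"fmt"
)

// task and syscallArguments are hypothetical stand-ins for this sketch.
type task struct{ tid int }
type syscallArguments [6]uintptr

// stracer mirrors the shape of the Stracer interface using local types.
type stracer interface {
	SyscallEnter(t *task, sysno uintptr, args syscallArguments, flags uint32) interface{}
	SyscallExit(context interface{}, t *task, sysno, rval uintptr, err error)
}

// enterState is the private data handed from SyscallEnter to SyscallExit.
type enterState struct{ arg0 uintptr }

// record is one completed syscall as seen by the tracer.
type record struct {
	tid   int
	sysno uintptr
	arg0  uintptr
	rval  uintptr
	err   error
}

// recordingTracer collects one record per traced syscall.
type recordingTracer struct{ records []record }

// SyscallEnter captures what must survive until exit.
func (r *recordingTracer) SyscallEnter(t *task, sysno uintptr, args syscallArguments, flags uint32) interface{} {
	return &enterState{arg0: args[0]}
}

// SyscallExit combines the entry state with the syscall result.
func (r *recordingTracer) SyscallExit(context interface{}, t *task, sysno, rval uintptr, err error) {
	st, ok := context.(*enterState)
	if !ok {
		return // not our private data; nothing to record
	}
	r.records = append(r.records, record{t.tid, sysno, st.arg0, rval, err})
}

func main() {
	tr := &recordingTracer{}
	var s stracer = tr
	t := &task{tid: 1}
	ctx := s.SyscallEnter(t, 3, syscallArguments{7}, 0)
	s.SyscallExit(ctx, t, 3, 0, errors.New("bad file descriptor"))
	fmt.Printf("%+v\n", tr.records)
}
```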

type Syscall

type Syscall struct {
	// Name is the syscall name.
	Name string
	// Fn is the implementation of the syscall.
	Fn SyscallFn
	// SupportLevel is the level of support implemented in gVisor.
	SupportLevel SyscallSupportLevel
	// Note describes the compatibility of the syscall.
	Note string
	// URLs is a set of URLs to any relevant bugs or issues.
	URLs []string
}

Syscall includes the syscall implementation and compatibility information.

type SyscallControl

type SyscallControl struct {
	// contains filtered or unexported fields
}

SyscallControl is returned by syscalls to control the behavior of Task.doSyscallInvoke.

type SyscallFlagsTable

type SyscallFlagsTable struct {
	// contains filtered or unexported fields
}

SyscallFlagsTable manages a set of enable/disable bit fields on a per-syscall basis.

func (*SyscallFlagsTable) Enable

func (e *SyscallFlagsTable) Enable(bit uint32, s map[uintptr]bool, missingEnable bool)

Enable sets enable bit bit for all syscalls based on s.

Syscalls missing from s are disabled.

Syscalls missing from the initial table passed to Init cannot be added as individual syscalls. If present in s they will be ignored.

Callers to Word may see either the old or new value while this function is executing.
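One way to obtain the behavior described above is a slice of atomically updated words plus a separate word for missing syscalls. The sketch below is an assumption about a plausible design, since the unexported fields are not shown here; flagsTable, newFlagsTable, presentBit, and traceBit are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	presentBit uint32 = 1 << iota // hypothetical: syscall exists in the table
	traceBit                      // hypothetical feature bit
)

// flagsTable is a hypothetical per-syscall bitfield table.
type flagsTable struct {
	mu            sync.Mutex // serializes writers only
	enable        []uint32   // one word per syscall number
	missingEnable uint32     // word reported for syscalls not in the table
}

// newFlagsTable marks the given syscall numbers as present.
func newFlagsTable(present []uintptr, max uintptr) *flagsTable {
	e := &flagsTable{enable: make([]uint32, max+1)}
	for _, sysno := range present {
		e.enable[sysno] = presentBit
	}
	return e
}

// Word reads a single word without taking the writer lock, so it may
// observe either the old or the new value during a concurrent Enable.
func (e *flagsTable) Word(sysno uintptr) uint32 {
	if sysno < uintptr(len(e.enable)) {
		return atomic.LoadUint32(&e.enable[sysno])
	}
	return atomic.LoadUint32(&e.missingEnable)
}

// Enable sets bit for present syscalls with s[sysno] == true and clears it
// for the rest. Missing syscalls follow missingEnable regardless of s.
func (e *flagsTable) Enable(bit uint32, s map[uintptr]bool, missingEnable bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	missing := atomic.LoadUint32(&e.missingEnable)
	if missingEnable {
		missing |= bit
	} else {
		missing &^= bit
	}
	atomic.StoreUint32(&e.missingEnable, missing)
	for sysno := range e.enable {
		val := atomic.LoadUint32(&e.enable[sysno])
		if val&presentBit == 0 {
			atomic.StoreUint32(&e.enable[sysno], missing)
			continue
		}
		if s[uintptr(sysno)] {
			val |= bit
		} else {
			val &^= bit
		}
		atomic.StoreUint32(&e.enable[sysno], val)
	}
}

func main() {
	e := newFlagsTable([]uintptr{0, 2}, 3)
	e.Enable(traceBit, map[uintptr]bool{0: true, 1: true}, false)
	fmt.Println(e.Word(0)&traceBit != 0, e.Word(1)&traceBit != 0, e.Word(2)&traceBit != 0)
}
```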

func (*SyscallFlagsTable) EnableAll

func (e *SyscallFlagsTable) EnableAll(bit uint32)

EnableAll sets enable bit bit for all syscalls, present and missing.

func (*SyscallFlagsTable) Word

func (e *SyscallFlagsTable) Word(sysno uintptr) uint32

Word returns the enable bitfield for sysno.

type SyscallFn

type SyscallFn func(t *Task, args arch.SyscallArguments) (uintptr, *SyscallControl, error)

SyscallFn is a syscall implementation.

type SyscallRestartBlock

type SyscallRestartBlock interface {
	Restart(t *Task) (uintptr, error)
}

SyscallRestartBlock represents the restart block for a syscall restartable with a custom function. It encapsulates the state required to restart a syscall across a save/restore (S/R).

type SyscallSupportLevel

type SyscallSupportLevel int

SyscallSupportLevel is a syscall support level.

func (SyscallSupportLevel) String

func (l SyscallSupportLevel) String() string

String returns a human-readable representation of the support level.

type SyscallTable

type SyscallTable struct {
	// OS is the operating system that this syscall table implements.
	OS abi.OS

	// Arch is the architecture that this syscall table targets.
	Arch arch.Arch

	// The OS version that this syscall table implements.
	Version Version

	// AuditNumber is a numeric constant that represents the syscall table. If
	// non-zero, AuditNumber must be one of the AUDIT_ARCH_* values defined by
	// linux/audit.h.
	AuditNumber uint32

	// Table is the collection of functions.
	Table map[uintptr]Syscall

	// Emulate is a collection of instruction addresses to emulate. The
	// keys are addresses, and the values are system call numbers.
	Emulate map[usermem.Addr]uintptr

	// The function to call in case of a missing system call.
	Missing MissingFn

	// Stracer traces this syscall table.
	Stracer Stracer

	// External is used to handle an external callback.
	External func(*Kernel)

	// ExternalFilterBefore is called before External is invoked ahead of
	// syscall execution. External is not called if it returns false.
	ExternalFilterBefore func(*Task, uintptr, arch.SyscallArguments) bool

	// ExternalFilterAfter is called before External is invoked following
	// syscall execution. External is not called if it returns false.
	ExternalFilterAfter func(*Task, uintptr, arch.SyscallArguments) bool

	// FeatureEnable stores the strace and one-shot enable bits.
	FeatureEnable SyscallFlagsTable
	// contains filtered or unexported fields
}

SyscallTable is a lookup table of system calls.

Note that a SyscallTable is not savable directly. Instead, they are saved as an OS/Arch pair and lookup happens again on restore.

func LookupSyscallTable

func LookupSyscallTable(os abi.OS, a arch.Arch) (*SyscallTable, bool)

LookupSyscallTable returns the SyscallTable for the OS/Arch combination.

func SyscallTables

func SyscallTables() []*SyscallTable

SyscallTables returns a read-only slice of registered SyscallTables.

func (*SyscallTable) Init

func (s *SyscallTable) Init()

Init initializes the system call table.

This should normally be called only during registration.

func (*SyscallTable) Lookup

func (s *SyscallTable) Lookup(sysno uintptr) SyscallFn

Lookup returns the syscall implementation, if one exists.

func (*SyscallTable) LookupEmulate

func (s *SyscallTable) LookupEmulate(addr usermem.Addr) (uintptr, bool)

LookupEmulate looks up an emulation syscall number.

func (*SyscallTable) LookupName

func (s *SyscallTable) LookupName(sysno uintptr) string

LookupName looks up a syscall name.

func (*SyscallTable) LookupNo

func (s *SyscallTable) LookupNo(name string) (uintptr, error)

LookupNo looks up a syscall number by name.

func (*SyscallTable) MaxSysno

func (s *SyscallTable) MaxSysno() (max uintptr)

MaxSysno returns the largest system call number.

type TTY

type TTY struct {
	// Index is the terminal index. It is immutable.
	Index uint32
	// contains filtered or unexported fields
}

TTY defines the relationship between a thread group and its controlling terminal.

+stateify savable

func (*TTY) StateFields

func (t *TTY) StateFields() []string

func (*TTY) StateLoad

func (t *TTY) StateLoad(stateSourceObject state.Source)

func (*TTY) StateSave

func (t *TTY) StateSave(stateSinkObject state.Sink)

func (*TTY) StateTypeName

func (t *TTY) StateTypeName() string

type Task

type Task struct {
	// contains filtered or unexported fields
}

Task represents a thread of execution in the untrusted app. It includes registers and any thread-specific state that you would normally expect.

Each task is associated with a goroutine, called the task goroutine, that executes code (application code, system calls, etc.) on behalf of that task. See Task.run (task_run.go).

All fields that are "owned by the task goroutine" can only be mutated by the task goroutine while it is running. The task goroutine does not require synchronization to read these fields, although it still requires synchronization as described for those fields to mutate them.

All fields that are "exclusive to the task goroutine" can only be accessed by the task goroutine while it is running. The task goroutine does not require synchronization to read or write these fields.

+stateify savable

func TaskFromContext

func TaskFromContext(ctx context.Context) *Task

TaskFromContext returns the Task associated with ctx, or nil if there is no such Task.
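The lookup pattern behind TaskFromContext, a typed key passed to a context's Value method, can be sketched as follows. The names valuer, mapContext, ctxTask, and task are hypothetical stand-ins; the real function operates on gVisor's own context.Context and on CtxTask.

```go
package main

import "fmt"

// contextID is a private key type, so keys cannot collide with other packages.
type contextID int

// ctxTask is a hypothetical key under which a *task is stored.
const ctxTask contextID = iota

// task is a hypothetical stand-in for *Task.
type task struct{ name string }

// valuer is the only part of a context this sketch relies on.
type valuer interface {
	Value(key interface{}) interface{}
}

// mapContext is a trivial valuer backed by a map.
type mapContext map[interface{}]interface{}

func (m mapContext) Value(key interface{}) interface{} { return m[key] }

// taskFromContext returns the task stored in ctx, or nil if there is none.
func taskFromContext(ctx valuer) *task {
	if v := ctx.Value(ctxTask); v != nil {
		return v.(*task)
	}
	return nil
}

func main() {
	t := &task{name: "init"}
	fmt.Println(taskFromContext(mapContext{ctxTask: t}).name)
	fmt.Println(taskFromContext(mapContext{}) == nil)
}
```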

func (*Task) AbstractSockets

func (t *Task) AbstractSockets() *AbstractSocketNamespace

AbstractSockets returns t's AbstractSocketNamespace.

func (*Task) Activate

func (t *Task) Activate()

Activate ensures that the task has an active address space.

func (*Task) AppendSyscallFilter

func (t *Task) AppendSyscallFilter(p bpf.Program, syncAll bool) error

AppendSyscallFilter adds BPF program p as a system call filter.

Preconditions: The caller must be running on the task goroutine.

func (*Task) Arch

func (t *Task) Arch() arch.Context

Arch returns t's arch.Context.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) AsyncContext

func (t *Task) AsyncContext() context.Context

AsyncContext returns a context.Context representing t. The returned context.Context is intended for use by goroutines other than t's task goroutine; for example, signal delivery to t will not interrupt goroutines that are blocking using the returned context.Context.

func (*Task) BeginExternalStop

func (t *Task) BeginExternalStop()

BeginExternalStop indicates the start of an external stop that applies to t. BeginExternalStop does not wait for t's task goroutine to stop.

func (*Task) Block

func (t *Task) Block(C <-chan struct{}) error

Block blocks t until an event is received from C or t is interrupted. It returns nil if an event is received from C and syserror.ErrInterrupted if t is interrupted.

Preconditions: The caller must be running on the task goroutine.

func (*Task) BlockWithDeadline

func (t *Task) BlockWithDeadline(C <-chan struct{}, haveDeadline bool, deadline ktime.Time) error

BlockWithDeadline blocks t until an event is received from C, the application monotonic clock indicates a time of deadline (only if haveDeadline is true), or t is interrupted. It returns nil if an event is received from C, ETIMEDOUT if the deadline expired, and syserror.ErrInterrupted if t is interrupted.

Preconditions: The caller must be running on the task goroutine.

func (*Task) BlockWithTimeout

func (t *Task) BlockWithTimeout(C chan struct{}, haveTimeout bool, timeout time.Duration) (time.Duration, error)

BlockWithTimeout blocks t until an event is received from C, the application monotonic clock indicates that timeout has elapsed (only if haveTimeout is true), or t is interrupted. It returns:

- The remaining timeout, which is guaranteed to be 0 if the timeout expired, and is unspecified if haveTimeout is false.

- An error which is nil if an event is received from C, ETIMEDOUT if the timeout expired, and syserror.ErrInterrupted if t is interrupted.

Preconditions: The caller must be running on the task goroutine.
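The return contract of BlockWithTimeout can be illustrated with a plain select over three channels. This sketch uses Go's time package instead of the application monotonic clock, takes the interrupt source as an explicit channel, and uses local sentinel errors in place of ETIMEDOUT and syserror.ErrInterrupted; all of these are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errTimedOut    = errors.New("timed out")   // stands in for ETIMEDOUT
	errInterrupted = errors.New("interrupted") // stands in for syserror.ErrInterrupted
)

// blockWithTimeout (hypothetical) waits for an event on c, an interrupt, or
// the timeout, and returns the remaining timeout together with the outcome.
func blockWithTimeout(c, interrupt <-chan struct{}, haveTimeout bool, timeout time.Duration) (time.Duration, error) {
	if !haveTimeout {
		// No deadline: the returned duration is unspecified.
		select {
		case <-c:
			return timeout, nil
		case <-interrupt:
			return timeout, errInterrupted
		}
	}
	start := time.Now()
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	var err error
	select {
	case <-c:
	case <-interrupt:
		err = errInterrupted
	case <-timer.C:
		// Expiry always reports exactly zero remaining.
		return 0, errTimedOut
	}
	remaining := timeout - time.Since(start)
	if remaining < 0 {
		remaining = 0
	}
	return remaining, err
}

func main() {
	event := make(chan struct{}, 1)
	event <- struct{}{}
	never := make(chan struct{})
	_, err := blockWithTimeout(event, never, true, time.Second)
	fmt.Println(err)
	rem, err := blockWithTimeout(never, never, true, 10*time.Millisecond)
	fmt.Println(rem, err)
}
```

Returning the remaining time lets a caller restart an interrupted wait with the unused portion of the timeout rather than the full original value.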

func (*Task) BlockWithTimer

func (t *Task) BlockWithTimer(C <-chan struct{}, tchan <-chan struct{}) error

BlockWithTimer blocks t until an event is received from C or tchan, or t is interrupted. It returns nil if an event is received from C, ETIMEDOUT if an event is received from tchan, and syserror.ErrInterrupted if t is interrupted.

Most clients should use BlockWithDeadline or BlockWithTimeout instead.

Preconditions: The caller must be running on the task goroutine.

func (*Task) CPU

func (t *Task) CPU() int32

CPU returns the CPU ID for a given task.

func (*Task) CPUClock

func (t *Task) CPUClock() ktime.Clock

CPUClock returns a clock measuring the CPU time the task has spent executing application and "kernel" code.

func (*Task) CPUMask

func (t *Task) CPUMask() sched.CPUSet

CPUMask returns a copy of t's allowed CPU mask.

func (*Task) CPUStats

func (t *Task) CPUStats() usage.CPUStats

CPUStats returns the CPU usage statistics of t.

func (*Task) CanTrace

func (t *Task) CanTrace(target *Task, attach bool) bool

CanTrace checks that t is permitted to access target's state, as defined by ptrace(2), subsection "Ptrace access mode checking". If attach is true, it checks for access mode PTRACE_MODE_ATTACH; otherwise, it checks for access mode PTRACE_MODE_READ.

NOTE(b/30815691): The result of CanTrace is immediately stale (e.g., a racing setuid(2) may change traceability). This may pose a risk when a task changes from traceable to not traceable. This is only problematic across execve, where privileges may increase.

We currently do not implement privileged executables (set-user/group-ID bits and file capabilities), so that case is not reachable.

func (*Task) ClearRSeq

func (t *Task) ClearRSeq(addr usermem.Addr, length, signature uint32) error

ClearRSeq unregisters addr as this thread's rseq structure.

Preconditions: The caller must be running on the task goroutine.

func (*Task) Clone

func (t *Task) Clone(opts *CloneOptions) (ThreadID, *SyscallControl, error)

Clone implements the clone(2) syscall and returns the thread ID of the new task in t's PID namespace. Clone may return both a non-zero thread ID and a non-nil error.

Preconditions: The caller must be running Task.doSyscallInvoke on the task goroutine.

func (*Task) CompareAndSwapUint32

func (t *Task) CompareAndSwapUint32(addr usermem.Addr, old, new uint32) (uint32, error)

CompareAndSwapUint32 implements futex.Target.CompareAndSwapUint32.

func (*Task) ContainerID

func (t *Task) ContainerID() string

ContainerID returns t's container ID.

func (*Task) CopyContext

func (t *Task) CopyContext(ctx context.Context, opts usermem.IOOpts) *taskCopyContext

CopyContext returns a marshal.CopyContext that copies to/from t's address space using opts.

func (*Task) CopyInBytes

func (t *Task) CopyInBytes(addr usermem.Addr, dst []byte) (int, error)

CopyInBytes is a fast version of CopyIn if the caller can serialize the data without reflection and pass in a byte slice.

This Task's AddressSpace must be active.

func (*Task) CopyInIovecs

func (t *Task) CopyInIovecs(addr usermem.Addr, numIovecs int) (usermem.AddrRangeSeq, error)

CopyInIovecs copies an array of numIovecs struct iovecs from the memory mapped at addr, converts them to usermem.AddrRanges, and returns them as a usermem.AddrRangeSeq.

CopyInIovecs shares the following properties with Linux's lib/iov_iter.c:import_iovec() => fs/read_write.c:rw_copy_check_uvector():

- If the length of any AddrRange would exceed the range of an ssize_t, CopyInIovecs returns EINVAL.

- If the length of any AddrRange would cause its end to overflow, CopyInIovecs returns EFAULT.

- If any AddrRange would include addresses outside the application address range, CopyInIovecs returns EFAULT.

- The combined length of all AddrRanges is limited to MAX_RW_COUNT. If the combined length of all AddrRanges would otherwise exceed this amount, ranges beyond MAX_RW_COUNT are silently truncated.

Preconditions: Same as usermem.IO.CopyIn, plus:

- The caller must be running on the task goroutine.

- t's AddressSpace must be active.
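The validation rules listed above can be sketched as a pure function over already-copied iovecs. The types iovec and addrRange, the sentinel errors, and the maxUserAddr bound are hypothetical; the MAX_RW_COUNT value assumes Linux with 4 KiB pages (INT_MAX & PAGE_MASK). The copy from task memory itself is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var (
	errInvalid = errors.New("EINVAL") // stand-in for the EINVAL errno
	errFault   = errors.New("EFAULT") // stand-in for the EFAULT errno
)

const (
	maxRWCount  uint64 = 0x7ffff000     // MAX_RW_COUNT with 4 KiB pages
	maxUserAddr uint64 = 0x7ffffffff000 // hypothetical end of application range
)

type iovec struct{ base, length uint64 }
type addrRange struct{ start, end uint64 }

// iovecsToRanges (hypothetical) applies the four rules in order.
func iovecsToRanges(iovs []iovec) ([]addrRange, error) {
	ranges := make([]addrRange, 0, len(iovs))
	var total uint64
	for _, iov := range iovs {
		// Rule 1: the length must fit in an ssize_t.
		if iov.length > math.MaxInt64 {
			return nil, errInvalid
		}
		// Rule 2: the end address must not wrap around.
		end := iov.base + iov.length
		if end < iov.base {
			return nil, errFault
		}
		// Rule 3: the range must stay inside the application range.
		if end > maxUserAddr {
			return nil, errFault
		}
		// Rule 4: silently truncate once MAX_RW_COUNT is reached.
		length := iov.length
		if rem := maxRWCount - total; length > rem {
			length = rem
		}
		total += length
		ranges = append(ranges, addrRange{iov.base, iov.base + length})
	}
	return ranges, nil
}

func main() {
	rs, err := iovecsToRanges([]iovec{{0x1000, 0x7ffff000}, {0x100000000, 0x10}})
	fmt.Printf("%x %v\n", rs, err)
}
```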

func (*Task) CopyInSignalAct

func (t *Task) CopyInSignalAct(addr usermem.Addr) (arch.SignalAct, error)

CopyInSignalAct copies an architecture-specific sigaction type from task memory and then converts it into a SignalAct.

func (*Task) CopyInSignalStack

func (t *Task) CopyInSignalStack(addr usermem.Addr) (arch.SignalStack, error)

CopyInSignalStack copies an architecture-specific stack_t from task memory and then converts it into a SignalStack.

func (*Task) CopyInString

func (t *Task) CopyInString(addr usermem.Addr, maxlen int) (string, error)

CopyInString copies a NUL-terminated string of length at most maxlen in from the task's memory. The copy will fail with syscall.EFAULT if it traverses user memory that is unmapped or not readable by the user.

This Task's AddressSpace must be active.

func (*Task) CopyInVector

func (t *Task) CopyInVector(addr usermem.Addr, maxElemSize, maxTotalSize int) ([]string, error)

CopyInVector copies a NULL-terminated vector of strings from the task's memory. The copy will fail with syscall.EFAULT if it traverses user memory that is unmapped or not readable by the user.

maxElemSize is the maximum size of each individual element.

maxTotalSize is the maximum total length of all elements plus the total number of elements. For example, the following strings correspond to the following set of sizes:

{ "a", "b", "c" } => 6 (3 for lengths, 3 for elements)
{ "abc" }         => 4 (3 for length, 1 for elements)

This Task's AddressSpace must be active.
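The size accounting in the examples above amounts to each element's length plus one per element. A minimal sketch, with vectorTotalSize as a hypothetical helper name:

```go
package main

import "fmt"

// vectorTotalSize (hypothetical) computes the quantity that maxTotalSize
// bounds: the length of every element plus one for each element.
func vectorTotalSize(elems []string) int {
	total := 0
	for _, e := range elems {
		total += len(e) + 1
	}
	return total
}

func main() {
	fmt.Println(vectorTotalSize([]string{"a", "b", "c"})) // 6
	fmt.Println(vectorTotalSize([]string{"abc"}))         // 4
}
```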

func (*Task) CopyOutBytes

func (t *Task) CopyOutBytes(addr usermem.Addr, src []byte) (int, error)

CopyOutBytes is a fast version of CopyOut if the caller can serialize the data without reflection and pass in a byte slice.

This Task's AddressSpace must be active.

func (*Task) CopyOutIovecs

func (t *Task) CopyOutIovecs(addr usermem.Addr, src usermem.AddrRangeSeq) error

CopyOutIovecs converts src to an array of struct iovecs and copies it to the memory mapped at addr.

Preconditions: Same as usermem.IO.CopyOut, plus:

- The caller must be running on the task goroutine.

- t's AddressSpace must be active.

func (*Task) CopyOutSignalAct

func (t *Task) CopyOutSignalAct(addr usermem.Addr, s *arch.SignalAct) error

CopyOutSignalAct converts the given SignalAct into an architecture-specific type and then copies it out to task memory.

func (*Task) CopyOutSignalStack

func (t *Task) CopyOutSignalStack(addr usermem.Addr, s *arch.SignalStack) error

CopyOutSignalStack converts the given SignalStack into an architecture-specific type and then copies it out to task memory.

func (*Task) CopyScratchBuffer

func (t *Task) CopyScratchBuffer(size int) []byte

CopyScratchBuffer returns a scratch buffer to be used in CopyIn/CopyOut functions. It must only be used within those functions and can only be used by the task goroutine; it exists to improve performance and thus intentionally lacks any synchronization.

Callers should pass a constant value as an argument if possible, which will allow the compiler to inline and optimize out the if statement below.
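The function body that the "if statement" remark refers to is not reproduced in this documentation. The sketch below shows one plausible shape, assuming a fixed per-task array with a heap fallback for oversized requests; the type task, the field scratch, and the length 144 are assumptions for illustration.

```go
package main

import "fmt"

// scratchLen is an assumed size for the per-task scratch array.
const scratchLen = 144

// task is a hypothetical stand-in holding the scratch storage.
type task struct {
	scratch [scratchLen]byte
}

// copyScratchBuffer returns a buffer of the requested size. Small requests
// reuse the task's array without locking; large ones allocate. With a
// constant size, the compiler can resolve the comparison at compile time.
func (t *task) copyScratchBuffer(size int) []byte {
	if size > scratchLen {
		return make([]byte, size)
	}
	return t.scratch[:size]
}

func main() {
	t := &task{}
	fmt.Println(len(t.copyScratchBuffer(16)), len(t.copyScratchBuffer(1024)))
}
```

Since the small-buffer path hands out the same backing array on every call, a buffer must not be retained across calls or shared with another goroutine.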

func (*Task) Credentials

func (t *Task) Credentials() *auth.Credentials

Credentials returns t's credentials.

This value must be considered immutable.

func (*Task) Deactivate

func (t *Task) Deactivate()

Deactivate relinquishes the task's active address space.

func (*Task) Deadline

func (t *Task) Deadline() (time.Time, bool)

Deadline implements context.Context.Deadline.

func (*Task) DebugDumpState

func (t *Task) DebugDumpState()

DebugDumpState logs task state at log level debug.

Preconditions: The caller must be running on the task goroutine.

func (*Task) Debugf

func (t *Task) Debugf(fmt string, v ...interface{})

Debugf creates a debug string that includes the task ID.

func (*Task) Done

func (t *Task) Done() <-chan struct{}

Done implements context.Context.Done.

func (*Task) DropBoundingCapability

func (t *Task) DropBoundingCapability(cp linux.Capability) error

DropBoundingCapability attempts to drop capability cp from t's capability bounding set.

func (*Task) EndExternalStop

func (t *Task) EndExternalStop()

EndExternalStop indicates the end of an external stop started by a previous call to Task.BeginExternalStop. EndExternalStop does not wait for t's task goroutine to resume.

func (*Task) Err

func (t *Task) Err() error

Err implements context.Context.Err.

func (*Task) Execve

func (t *Task) Execve(newImage *TaskImage) (*SyscallControl, error)

Execve implements the execve(2) syscall by killing all other tasks in its thread group and switching to newImage. Execve always takes ownership of newImage.

Preconditions: The caller must be running Task.doSyscallInvoke on the task goroutine.

func (*Task) ExitState

func (t *Task) ExitState() TaskExitState

ExitState returns t's current progress through the exit path.

func (*Task) ExitStatus

func (t *Task) ExitStatus() ExitStatus

ExitStatus returns t's exit status, which is only guaranteed to be meaningful if t.ExitState() != TaskExitNone.

func (*Task) FDTable

func (t *Task) FDTable() *FDTable

FDTable returns t's FDTable. FDTable does not take an additional reference on the returned FDTable.

Precondition: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) FSContext

func (t *Task) FSContext() *FSContext

FSContext returns t's FSContext. FSContext does not take an additional reference on the returned FSContext.

Precondition: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) Futex

func (t *Task) Futex() *futex.Manager

Futex returns t's futex manager.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) FutexWaiter

func (t *Task) FutexWaiter() *futex.Waiter

FutexWaiter returns the Task's futex.Waiter.

func (*Task) GID

func (t *Task) GID() uint32

GID returns t's gid. TODO(gvisor.dev/issue/170): This method is not namespaced yet.

func (*Task) GetFile

func (t *Task) GetFile(fd int32) *fs.File

GetFile is a convenience wrapper for t.FDTable().Get.

Precondition: same as FDTable.Get.

func (*Task) GetFileVFS2

func (t *Task) GetFileVFS2(fd int32) *vfs.FileDescription

GetFileVFS2 is a convenience wrapper for t.FDTable().GetVFS2.

Precondition: same as FDTable.Get.

func (*Task) GetRobustList

func (t *Task) GetRobustList() usermem.Addr

GetRobustList returns the robust futex list for the task.

func (*Task) GetSharedKey

func (t *Task) GetSharedKey(addr usermem.Addr) (futex.Key, error)

GetSharedKey implements futex.Target.GetSharedKey.

func (*Task) Getitimer

func (t *Task) Getitimer(id int32) (linux.ItimerVal, error)

Getitimer implements getitimer(2).

Preconditions: The caller must be running on the task goroutine.

func (*Task) GoroutineID

func (t *Task) GoroutineID() int64

GoroutineID returns the ID of t's task goroutine.

func (*Task) HasCapability

func (t *Task) HasCapability(cp linux.Capability) bool

HasCapability checks if the task has capability cp in its user namespace.

func (*Task) HasCapabilityIn

func (t *Task) HasCapabilityIn(cp linux.Capability, ns *auth.UserNamespace) bool

HasCapabilityIn checks if the task has capability cp in user namespace ns.

func (*Task) IOUsage

func (t *Task) IOUsage() *usage.IO

IOUsage returns the I/O usage of the thread.

func (*Task) IPCNamespace

func (t *Task) IPCNamespace() *IPCNamespace

IPCNamespace returns the task's IPC namespace.

func (*Task) Infof

func (t *Task) Infof(fmt string, v ...interface{})

Infof logs a formatted info message by calling log.Infof.

func (*Task) Interrupted

func (t *Task) Interrupted() bool

Interrupted implements context.ChannelSleeper.Interrupted.

func (*Task) IntervalTimerCreate

func (t *Task) IntervalTimerCreate(c ktime.Clock, sigev *linux.Sigevent) (linux.TimerID, error)

IntervalTimerCreate implements timer_create(2).

func (*Task) IntervalTimerDelete

func (t *Task) IntervalTimerDelete(id linux.TimerID) error

IntervalTimerDelete implements timer_delete(2).

func (*Task) IntervalTimerGetoverrun

func (t *Task) IntervalTimerGetoverrun(id linux.TimerID) (int32, error)

IntervalTimerGetoverrun implements timer_getoverrun(2).

Preconditions: The caller must be running on the task goroutine.

func (*Task) IntervalTimerGettime

func (t *Task) IntervalTimerGettime(id linux.TimerID) (linux.Itimerspec, error)

IntervalTimerGettime implements timer_gettime(2).

func (*Task) IntervalTimerSettime

func (t *Task) IntervalTimerSettime(id linux.TimerID, its linux.Itimerspec, abs bool) (linux.Itimerspec, error)

IntervalTimerSettime implements timer_settime(2).

func (*Task) IovecsIOSequence

func (t *Task) IovecsIOSequence(addr usermem.Addr, iovcnt int, opts usermem.IOOpts) (usermem.IOSequence, error)

IovecsIOSequence returns a usermem.IOSequence representing the array of iovcnt struct iovecs at addr in t's address space. opts applies to the returned IOSequence, not the reading of the struct iovec array.

IovecsIOSequence is analogous to Linux's lib/iov_iter.c:import_iovec().

Preconditions: Same as Task.CopyInIovecs.

func (*Task) IsChrooted

func (t *Task) IsChrooted() bool

IsChrooted returns true if the root directory of t's FSContext is not the root directory of t's MountNamespace.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) IsLogging

func (t *Task) IsLogging(level log.Level) bool

IsLogging returns true iff this level is being logged.

func (*Task) IsNetworkNamespaced

func (t *Task) IsNetworkNamespaced() bool

IsNetworkNamespaced returns true if t is in a non-root network namespace.

func (*Task) Kernel

func (t *Task) Kernel() *Kernel

Kernel returns the Kernel containing t.

func (*Task) Limits

func (t *Task) Limits() *limits.LimitSet

Limits implements context.Context.Limits.

func (*Task) LoadUint32

func (t *Task) LoadUint32(addr usermem.Addr) (uint32, error)

LoadUint32 implements futex.Target.LoadUint32.

func (*Task) MaxRSS

func (t *Task) MaxRSS(which int32) uint64

MaxRSS returns the maximum resident set size of the task in bytes. which should be one of RUSAGE_SELF, RUSAGE_CHILDREN, RUSAGE_THREAD, or RUSAGE_BOTH. See getrusage(2) for documentation on the behavior of these flags.

func (*Task) MemoryManager

func (t *Task) MemoryManager() *mm.MemoryManager

MemoryManager returns t's MemoryManager. MemoryManager does not take an additional reference on the returned MM.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) MountNamespace

func (t *Task) MountNamespace() *fs.MountNamespace

MountNamespace returns t's MountNamespace. MountNamespace does not take an additional reference on the returned MountNamespace.

func (*Task) MountNamespaceVFS2

func (t *Task) MountNamespaceVFS2() *vfs.MountNamespace

MountNamespaceVFS2 returns t's MountNamespace. A reference is taken on the returned mount namespace.

func (*Task) Name

func (t *Task) Name() string

Name returns t's name.

func (*Task) NetworkContext

func (t *Task) NetworkContext() inet.Stack

NetworkContext returns the network stack used by the task. NetworkContext may return nil if no network stack is available.

TODO(gvisor.dev/issue/1833): Migrate callers of this method to NetworkNamespace().

func (*Task) NetworkNamespace

func (t *Task) NetworkNamespace() *inet.Namespace

NetworkNamespace returns the network namespace observed by the task.

func (*Task) NewFDAt

func (t *Task) NewFDAt(fd int32, file *fs.File, flags FDFlags) error

NewFDAt is a convenience wrapper for t.FDTable().NewFDAt.

This automatically passes the task as the context.

Precondition: same as FDTable.

func (*Task) NewFDAtVFS2

func (t *Task) NewFDAtVFS2(fd int32, file *vfs.FileDescription, flags FDFlags) error

NewFDAtVFS2 is a convenience wrapper for t.FDTable().NewFDAtVFS2.

This automatically passes the task as the context.

Precondition: same as FDTable.

func (*Task) NewFDFrom

func (t *Task) NewFDFrom(fd int32, file *fs.File, flags FDFlags) (int32, error)

NewFDFrom is a convenience wrapper for t.FDTable().NewFDs with a single file.

This automatically passes the task as the context.

Precondition: same as FDTable.

func (*Task) NewFDFromVFS2

func (t *Task) NewFDFromVFS2(fd int32, file *vfs.FileDescription, flags FDFlags) (int32, error)

NewFDFromVFS2 is a convenience wrapper for t.FDTable().NewFDVFS2.

This automatically passes the task as the context.

Precondition: same as FDTable.Get.

func (*Task) NewFDs

func (t *Task) NewFDs(fd int32, files []*fs.File, flags FDFlags) ([]int32, error)

NewFDs is a convenience wrapper for t.FDTable().NewFDs.

This automatically passes the task as the context.

Precondition: same as FDTable.

func (*Task) NewFDsVFS2

func (t *Task) NewFDsVFS2(fd int32, files []*vfs.FileDescription, flags FDFlags) ([]int32, error)

NewFDsVFS2 is a convenience wrapper for t.FDTable().NewFDsVFS2.

This automatically passes the task as the context.

Precondition: same as FDTable.

func (*Task) Niceness

func (t *Task) Niceness() int

Niceness returns t's niceness.

func (*Task) NotifyRlimitCPUUpdated

func (t *Task) NotifyRlimitCPUUpdated()

NotifyRlimitCPUUpdated is called by setrlimit.

Preconditions: The caller must be running on the task goroutine.

func (*Task) NumaPolicy

func (t *Task) NumaPolicy() (policy linux.NumaPolicy, nodeMask uint64)

NumaPolicy returns t's current NUMA policy.

func (*Task) OOMScoreAdj

func (t *Task) OOMScoreAdj() int32

OOMScoreAdj gets the task's thread group's OOM score adjustment.

func (*Task) OldRSeqCPUAddr

func (t *Task) OldRSeqCPUAddr() usermem.Addr

OldRSeqCPUAddr returns the address that old rseq will keep updated with t's CPU number.

Preconditions: The caller must be running on the task goroutine.

func (*Task) OldRSeqCriticalRegion

func (t *Task) OldRSeqCriticalRegion() OldRSeqCriticalRegion

OldRSeqCriticalRegion returns a copy of t's thread group's current old restartable sequence.

func (*Task) OwnCopyContext

func (t *Task) OwnCopyContext(opts usermem.IOOpts) *ownTaskCopyContext

OwnCopyContext returns a marshal.CopyContext that copies to/from t's address space using opts. The returned CopyContext may only be used by t's task goroutine.

Since t already implements marshal.CopyContext, this is only needed to override the usermem.IOOpts used for the copy.

func (*Task) PIDNamespace

func (t *Task) PIDNamespace() *PIDNamespace

PIDNamespace returns the PID namespace containing t.

func (*Task) Parent

func (t *Task) Parent() *Task

Parent returns t's parent.

func (*Task) ParentDeathSignal

func (t *Task) ParentDeathSignal() linux.Signal

ParentDeathSignal returns t's parent death signal.

func (*Task) PendingSignals

func (t *Task) PendingSignals() linux.SignalSet

PendingSignals returns the set of pending signals.

func (*Task) PrepareExit

func (t *Task) PrepareExit(es ExitStatus)

PrepareExit indicates an exit with status es.

Preconditions: The caller must be running on the task goroutine.

func (*Task) PrepareGroupExit

func (t *Task) PrepareGroupExit(es ExitStatus)

PrepareGroupExit indicates a group exit with status es to t's thread group.

PrepareGroupExit is analogous to Linux's do_group_exit(), with two differences: it does not tail-call do_exit(), and it *does* set Task.exitStatus. (Linux does not do so until within do_exit(), since it reuses exit_code for ptrace.)

Preconditions: The caller must be running on the task goroutine.

func (*Task) Priority

func (t *Task) Priority() int

Priority returns t's priority.

func (*Task) Ptrace

func (t *Task) Ptrace(req int64, pid ThreadID, addr, data usermem.Addr) error

Ptrace implements the ptrace system call.

func (*Task) QueueAIO

func (t *Task) QueueAIO(cb AIOCallback)

QueueAIO queues an AIOCallback which will be run asynchronously.

func (*Task) RSeqAvailable

func (t *Task) RSeqAvailable() bool

RSeqAvailable returns true if t supports (old and new) restartable sequences.

func (*Task) RegisterWork

func (t *Task) RegisterWork(work TaskWorker)

RegisterWork can be used to register additional task work that will be performed prior to returning to user space. See TaskWorker.TaskWork for semantics regarding registration.

func (*Task) ResetKcov

func (t *Task) ResetKcov()

ResetKcov clears the kcov instance associated with t.

func (*Task) SeccompMode

func (t *Task) SeccompMode() int

SeccompMode returns a SECCOMP_MODE_* constant indicating the task's current seccomp syscall filtering mode, appropriate for both prctl(PR_GET_SECCOMP) and /proc/[pid]/status.

func (*Task) SendGroupSignal

func (t *Task) SendGroupSignal(info *arch.SignalInfo) error

SendGroupSignal sends the given signal to t's thread group.

func (*Task) SendSignal

func (t *Task) SendSignal(info *arch.SignalInfo) error

SendSignal sends the given signal to t.

The following errors may be returned:

syserror.ESRCH - The task has exited.
syserror.EINVAL - The signal is not valid.
syserror.EAGAIN - The signal is realtime, and cannot be queued.

func (*Task) SetCPUMask

func (t *Task) SetCPUMask(mask sched.CPUSet) error

SetCPUMask sets t's allowed CPU mask based on mask. It takes ownership of mask.

Preconditions: mask.Size() == sched.CPUSetSize(t.Kernel().ApplicationCores()).

func (*Task) SetCapabilitySets

func (t *Task) SetCapabilitySets(permitted, inheritable, effective auth.CapabilitySet) error

SetCapabilitySets attempts to change t's permitted, inheritable, and effective capability sets.

func (*Task) SetClearTID

func (t *Task) SetClearTID(addr usermem.Addr)

SetClearTID sets t's cleartid.

Preconditions: The caller must be running on the task goroutine.

func (*Task) SetExtraGIDs

func (t *Task) SetExtraGIDs(gids []auth.GID) error

SetExtraGIDs attempts to change t's supplemental groups. All IDs are interpreted as being in t's user namespace.

func (*Task) SetGID

func (t *Task) SetGID(gid auth.GID) error

SetGID implements the semantics of setgid(2).

func (*Task) SetKcov

func (t *Task) SetKcov(k *Kcov)

SetKcov sets the kcov instance associated with t.

func (*Task) SetKeepCaps

func (t *Task) SetKeepCaps(k bool)

SetKeepCaps will set the keep capabilities flag PR_SET_KEEPCAPS.

func (*Task) SetName

func (t *Task) SetName(name string)

SetName changes t's name.

func (*Task) SetNiceness

func (t *Task) SetNiceness(n int)

SetNiceness sets t's niceness to n.

func (*Task) SetNumaPolicy

func (t *Task) SetNumaPolicy(policy linux.NumaPolicy, nodeMask uint64)

SetNumaPolicy sets t's NUMA policy.

func (*Task) SetOOMScoreAdj

func (t *Task) SetOOMScoreAdj(adj int32) error

SetOOMScoreAdj sets the task's thread group's OOM score adjustment. The value should be between -1000 and 1000 inclusive.
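The range check can be illustrated with a minimal sketch. The `threadGroup` type, the constant names, and the choice to reject out-of-range values with an EINVAL-like error are all assumptions made for illustration; the documentation above only states the accepted range.

```go
package main

import (
	"errors"
	"fmt"
)

// Bounds taken from the documented range; the names are hypothetical.
const (
	oomScoreAdjMin = -1000
	oomScoreAdjMax = 1000
)

// errInvalid stands in for an EINVAL-style error (an assumption).
var errInvalid = errors.New("EINVAL")

// threadGroup is a hypothetical stand-in holding only the adjustment.
type threadGroup struct {
	oomScoreAdj int32
}

// setOOMScoreAdj stores adj if it lies in [-1000, 1000] and rejects it
// otherwise, leaving the previous value untouched.
func (tg *threadGroup) setOOMScoreAdj(adj int32) error {
	if adj < oomScoreAdjMin || adj > oomScoreAdjMax {
		return errInvalid
	}
	tg.oomScoreAdj = adj
	return nil
}

func main() {
	tg := &threadGroup{}
	fmt.Println(tg.setOOMScoreAdj(500), tg.oomScoreAdj)
	fmt.Println(tg.setOOMScoreAdj(1001), tg.oomScoreAdj)
}
```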

func (*Task) SetOldRSeqCPUAddr

func (t *Task) SetOldRSeqCPUAddr(addr usermem.Addr) error

SetOldRSeqCPUAddr replaces the address that old rseq will keep updated with t's CPU number.

Preconditions:
* t.RSeqAvailable() == true.
* The caller must be running on the task goroutine.
* t's AddressSpace must be active.

func (*Task) SetOldRSeqCriticalRegion

func (t *Task) SetOldRSeqCriticalRegion(r OldRSeqCriticalRegion) error

SetOldRSeqCriticalRegion replaces t's thread group's old restartable sequence.

Preconditions: t.RSeqAvailable() == true.

func (*Task) SetParentDeathSignal

func (t *Task) SetParentDeathSignal(sig linux.Signal)

SetParentDeathSignal sets t's parent death signal.

func (*Task) SetREGID

func (t *Task) SetREGID(r, e auth.GID) error

SetREGID implements the semantics of setregid(2).

func (*Task) SetRESGID

func (t *Task) SetRESGID(r, e, s auth.GID) error

SetRESGID implements the semantics of the setresgid(2) syscall.

func (*Task) SetRESUID

func (t *Task) SetRESUID(r, e, s auth.UID) error

SetRESUID implements the semantics of the setresuid(2) syscall.

func (*Task) SetREUID

func (t *Task) SetREUID(r, e auth.UID) error

SetREUID implements the semantics of setreuid(2).

func (*Task) SetRSeq

func (t *Task) SetRSeq(addr usermem.Addr, length, signature uint32) error

SetRSeq registers addr as this thread's rseq structure.

Preconditions: The caller must be running on the task goroutine.

func (*Task) SetRobustList

func (t *Task) SetRobustList(addr usermem.Addr)

SetRobustList sets the robust futex list for the task.

func (*Task) SetSavedSignalMask

func (t *Task) SetSavedSignalMask(mask linux.SignalSet)

SetSavedSignalMask sets the saved signal mask (see Task.savedSignalMask's comment).

Preconditions: The caller must be running on the task goroutine.

func (*Task) SetSignalMask

func (t *Task) SetSignalMask(mask linux.SignalSet)

SetSignalMask sets t's signal mask.

Preconditions:
* The caller must be running on the task goroutine.
* t.exitState < TaskExitZombie.

func (*Task) SetSignalStack

func (t *Task) SetSignalStack(alt arch.SignalStack) bool

SetSignalStack sets the task-private signal stack.

This value may not be changed if the task is currently executing on the signal stack, i.e. if t.onSignalStack returns true. In this case, this function will return false. Otherwise, true is returned.

func (*Task) SetSyscallRestartBlock

func (t *Task) SetSyscallRestartBlock(r SyscallRestartBlock)

SetSyscallRestartBlock sets the restart block for use in restart_syscall(2). After registering a restart block, a syscall should return ERESTART_RESTARTBLOCK to request a restart using the block.

Precondition: The caller must be running on the task goroutine.

func (*Task) SetUID

func (t *Task) SetUID(uid auth.UID) error

SetUID implements the semantics of setuid(2).

func (*Task) SetUserNamespace

func (t *Task) SetUserNamespace(ns *auth.UserNamespace) error

SetUserNamespace attempts to move t into ns.

func (*Task) Setitimer

func (t *Task) Setitimer(id int32, newitv linux.ItimerVal) (linux.ItimerVal, error)

Setitimer implements setitimer(2).

Preconditions: The caller must be running on the task goroutine.

func (*Task) SignalMask

func (t *Task) SignalMask() linux.SignalSet

SignalMask returns a copy of t's signal mask.

func (*Task) SignalRegister

func (t *Task) SignalRegister(e *waiter.Entry, mask waiter.EventMask)

SignalRegister registers a waiter for pending signals.

func (*Task) SignalReturn

func (t *Task) SignalReturn(rt bool) (*SyscallControl, error)

SignalReturn implements sigreturn(2) (if rt is false) or rt_sigreturn(2) (if rt is true).

func (*Task) SignalStack

func (t *Task) SignalStack() arch.SignalStack

SignalStack returns the task-private signal stack.

func (*Task) SignalUnregister

func (t *Task) SignalUnregister(e *waiter.Entry)

SignalUnregister unregisters a waiter for pending signals.

func (*Task) Sigtimedwait

func (t *Task) Sigtimedwait(set linux.SignalSet, timeout time.Duration) (*arch.SignalInfo, error)

Sigtimedwait implements the semantics of sigtimedwait(2).

Preconditions:
* The caller must be running on the task goroutine.
* t.exitState < TaskExitZombie.

func (*Task) SingleIOSequence

func (t *Task) SingleIOSequence(addr usermem.Addr, length int, opts usermem.IOOpts) (usermem.IOSequence, error)

SingleIOSequence returns a usermem.IOSequence representing [addr, addr+length) in t's address space. If this contains addresses outside the application address range, it returns EFAULT. If length exceeds MAX_RW_COUNT, the range is silently truncated.

SingleIOSequence is analogous to Linux's lib/iov_iter.c:import_single_range(). (Note that the non-vectorized read and write syscalls in Linux do not use import_single_range(). However they check access_ok() in fs/read_write.c:vfs_read/vfs_write, and overflowing address ranges are truncated to MAX_RW_COUNT by fs/read_write.c:rw_verify_area().)
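The truncate-and-validate behaviour described above can be sketched without any gVisor types. In this sketch `singleRange`, `appAddrLimit`, and `errFault` are hypothetical names, the address limit is an arbitrary illustrative value, and both the order of the two steps (truncate first, then validate) and the rejection of negative lengths are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRWCount matches Linux's MAX_RW_COUNT (INT_MAX & PAGE_MASK) for 4 KiB
// pages.
const maxRWCount = 0x7ffff000

// appAddrLimit is a hypothetical exclusive upper bound of the application
// address range; the real bound depends on platform and architecture.
const appAddrLimit uint64 = 0x7ffffffff000

// errFault stands in for EFAULT.
var errFault = errors.New("EFAULT")

// singleRange returns the half-open range [start, end) that a single-buffer
// I/O may touch, truncating overlong requests and rejecting ranges that
// overflow or leave the application address range.
func singleRange(addr uint64, length int) (start, end uint64, err error) {
	if length < 0 {
		// Assumption: negative lengths are rejected outright.
		return 0, 0, errFault
	}
	if length > maxRWCount {
		length = maxRWCount // silent truncation
	}
	end = addr + uint64(length)
	if end < addr || end > appAddrLimit {
		return 0, 0, errFault
	}
	return addr, end, nil
}

func main() {
	fmt.Println(singleRange(0x1000, 16))
	fmt.Println(singleRange(0x1000, maxRWCount+5))
	fmt.Println(singleRange(appAddrLimit-8, 16))
}
```

Truncating rather than failing lets a caller issue one oversized read or write and receive a short count, which is the behaviour applications already have to handle.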

func (*Task) SleepFinish

func (t *Task) SleepFinish(success bool)

SleepFinish implements context.ChannelSleeper.SleepFinish.

func (*Task) SleepStart

func (t *Task) SleepStart() <-chan struct{}

SleepStart implements context.ChannelSleeper.SleepStart.

func (*Task) Stack

func (t *Task) Stack() *arch.Stack

Stack returns the userspace stack.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) Start

func (t *Task) Start(tid ThreadID)

Start starts the task goroutine. Start must be called exactly once for each task returned by NewTask.

'tid' must be the task's TID in the root PID namespace. It is used for debugging purposes only: it is passed as a parameter to Task.run so that it is visible in stack dumps.

func (*Task) StartTime

func (t *Task) StartTime() ktime.Time

StartTime returns t's start time.

func (*Task) StateFields

func (t *Task) StateFields() []string

func (*Task) StateLoad

func (t *Task) StateLoad(stateSourceObject state.Source)

func (*Task) StateSave

func (t *Task) StateSave(stateSinkObject state.Sink)

func (*Task) StateStatus

func (t *Task) StateStatus() string

StateStatus returns a string representation of the task's current state, appropriate for /proc/[pid]/status.

func (*Task) StateTypeName

func (t *Task) StateTypeName() string

func (*Task) SwapUint32

func (t *Task) SwapUint32(addr usermem.Addr, new uint32) (uint32, error)

SwapUint32 implements futex.Target.SwapUint32.

func (*Task) SyscallRestartBlock

func (t *Task) SyscallRestartBlock() SyscallRestartBlock

SyscallRestartBlock returns the currently registered restart block for use in restart_syscall(2). This function is *not* idempotent and may be called at most once per syscall. This function must not be called if a restart block has not been registered for the current syscall.

Precondition: The caller must be running on the task goroutine.
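One way to picture the non-idempotent accessor is a getter that clears the stored block as it returns it. The sketch below uses hypothetical names (`task`, `restartBlock`, `sleepRestart`) and assumes that clearing is the reason a second call within the same syscall is not allowed; it is not taken from the package source.

```go
package main

import "fmt"

// restartBlock is a hypothetical stand-in for SyscallRestartBlock.
type restartBlock interface {
	Restart() (uintptr, error)
}

// task holds at most one registered restart block.
type task struct {
	block restartBlock
}

// setRestartBlock registers r for a later restart_syscall(2).
func (t *task) setRestartBlock(r restartBlock) { t.block = r }

// takeRestartBlock returns the registered block and clears it, so a later
// syscall cannot accidentally reuse a stale block.
func (t *task) takeRestartBlock() restartBlock {
	r := t.block
	t.block = nil
	return r
}

// sleepRestart resumes a hypothetical interrupted sleep.
type sleepRestart struct {
	remainingMs int
}

func (s *sleepRestart) Restart() (uintptr, error) {
	fmt.Printf("resuming sleep with %d ms left\n", s.remainingMs)
	return 0, nil
}

func main() {
	t := &task{}
	t.setRestartBlock(&sleepRestart{remainingMs: 250})
	if r := t.takeRestartBlock(); r != nil {
		r.Restart()
	}
	fmt.Println("block still registered:", t.takeRestartBlock() != nil)
}
```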

func (*Task) SyscallTable

func (t *Task) SyscallTable() *SyscallTable

SyscallTable returns t's syscall table.

Preconditions: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) TGIDInRoot

func (t *Task) TGIDInRoot() ThreadID

TGIDInRoot returns t's TGID in the root PID namespace.

func (*Task) TaskGoroutineSchedInfo

func (t *Task) TaskGoroutineSchedInfo() TaskGoroutineSchedInfo

TaskGoroutineSchedInfo returns a copy of t's task goroutine scheduling info. Most clients should use t.CPUStats() instead.

func (*Task) TaskImage

func (t *Task) TaskImage() *TaskImage

TaskImage returns t's TaskImage.

Precondition: The caller must be running on the task goroutine, or t.mu must be locked.

func (*Task) TaskSet

func (t *Task) TaskSet() *TaskSet

TaskSet returns the TaskSet containing t.

func (*Task) ThreadGroup

func (t *Task) ThreadGroup() *ThreadGroup

ThreadGroup returns the thread group containing t.

func (*Task) ThreadID

func (t *Task) ThreadID() ThreadID

ThreadID returns t's thread ID in its own PID namespace. If the task is dead, ThreadID returns 0.

func (*Task) Timekeeper

func (t *Task) Timekeeper() *Timekeeper

Timekeeper returns the system Timekeeper.

func (*Task) Tracer

func (t *Task) Tracer() *Task

Tracer returns t's ptrace Tracer.

func (*Task) UID

func (t *Task) UID() uint32

UID returns t's uid. TODO(gvisor.dev/issue/170): This method is not namespaced yet.

func (*Task) UTSNamespace

func (t *Task) UTSNamespace() *UTSNamespace

UTSNamespace returns the task's UTS namespace.

func (*Task) UninterruptibleSleepFinish

func (t *Task) UninterruptibleSleepFinish(activate bool)

UninterruptibleSleepFinish implements context.Context.UninterruptibleSleepFinish.

func (*Task) UninterruptibleSleepStart

func (t *Task) UninterruptibleSleepStart(deactivate bool)

UninterruptibleSleepStart implements context.Context.UninterruptibleSleepStart.

func (*Task) Unshare

func (t *Task) Unshare(opts *SharingOptions) error

Unshare changes the set of resources t shares with other tasks, as specified by opts.

Preconditions: The caller must be running on the task goroutine.

func (*Task) UserCPUClock

func (t *Task) UserCPUClock() ktime.Clock

UserCPUClock returns a clock measuring the CPU time the task has spent executing application code.

func (*Task) UserNamespace

func (t *Task) UserNamespace() *auth.UserNamespace

UserNamespace returns the user namespace associated with the task.

func (*Task) Value

func (t *Task) Value(key interface{}) interface{}

Value implements context.Context.Value.

Preconditions: The caller must be running on the task goroutine.
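The Ctx* constants listed at the top of this page are values of an unexported key type, which keeps them from colliding with keys defined by other packages. A minimal sketch of that lookup pattern follows; the `task` type, its field, and the two local keys are hypothetical stand-ins, not the package's implementation.

```go
package main

import "fmt"

// contextID is an unexported key type, so an int key from another package can
// never compare equal to one of these constants.
type contextID int

const (
	ctxKernel contextID = iota
	ctxTask
)

// task is a hypothetical stand-in exposing a Value method.
type task struct {
	kernelName string
}

// Value maps known keys to task-specific values and returns nil otherwise.
func (t *task) Value(key interface{}) interface{} {
	switch key {
	case ctxKernel:
		return t.kernelName
	case ctxTask:
		return t
	default:
		return nil
	}
}

// taskFromContext recovers the task from anything that exposes Value.
func taskFromContext(ctx interface{ Value(interface{}) interface{} }) *task {
	if v := ctx.Value(ctxTask); v != nil {
		return v.(*task)
	}
	return nil
}

func main() {
	t := &task{kernelName: "sentry"}
	fmt.Println(taskFromContext(t) == t)
	fmt.Println(t.Value(ctxKernel))
	fmt.Println(t.Value(0)) // a plain int key does not match contextID(0)
}
```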

func (*Task) Wait

func (t *Task) Wait(opts *WaitOptions) (*WaitResult, error)

Wait waits for an event from a thread group that is a child of t's thread group, or a task in such a thread group, or a task that is ptraced by t, subject to the options specified in opts.

func (*Task) Warningf

func (t *Task) Warningf(fmt string, v ...interface{})

Warningf logs a warning string by calling log.Warningf.

func (*Task) WithMuLocked

func (t *Task) WithMuLocked(f func(*Task))

WithMuLocked executes f with t.mu locked.
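A closure-taking helper of this shape can be sketched as follows. The `task` type and its `counter` field are hypothetical, and `incrementConcurrently` exists only to exercise the helper.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in with one mutex-protected field.
type task struct {
	mu      sync.Mutex
	counter int // protected by mu
}

// withMuLocked runs f while holding t.mu. f must not lock t.mu itself,
// because sync.Mutex is not reentrant.
func (t *task) withMuLocked(f func(*task)) {
	t.mu.Lock()
	defer t.mu.Unlock()
	f(t)
}

// incrementConcurrently bumps counter from n goroutines and returns the
// final value, read under the same lock.
func incrementConcurrently(t *task, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t.withMuLocked(func(t *task) { t.counter++ })
		}()
	}
	wg.Wait()
	var got int
	t.withMuLocked(func(t *task) { got = t.counter })
	return got
}

func main() {
	fmt.Println(incrementConcurrently(&task{}, 100))
}
```

Passing a closure keeps the lock and unlock paired in one place, so callers outside the package never handle the mutex directly.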

func (*Task) Yield

func (t *Task) Yield()

Yield yields the processor for the calling task.

type TaskConfig

type TaskConfig struct {
	// Kernel is the owning Kernel.
	Kernel *Kernel

	// Parent is the new task's parent. Parent may be nil.
	Parent *Task

	// If InheritParent is not nil, use InheritParent's parent as the new
	// task's parent.
	InheritParent *Task

	// ThreadGroup is the ThreadGroup the new task belongs to.
	ThreadGroup *ThreadGroup

	// SignalMask is the new task's initial signal mask.
	SignalMask linux.SignalSet

	// TaskImage is the TaskImage of the new task. Ownership of the
	// TaskImage is transferred to TaskSet.NewTask, whether or not it
	// succeeds.
	TaskImage *TaskImage

	// FSContext is the FSContext of the new task. A reference must be held on
	// FSContext, which is transferred to TaskSet.NewTask whether or not it
	// succeeds.
	FSContext *FSContext

	// FDTable is the FDTable of the new task. A reference must be held on
	// FDTable, which is transferred to TaskSet.NewTask whether or not it
	// succeeds.
	FDTable *FDTable

	// Credentials is the Credentials of the new task.
	Credentials *auth.Credentials

	// Niceness is the niceness of the new task.
	Niceness int

	// NetworkNamespace is the network namespace to be used for the new task.
	NetworkNamespace *inet.Namespace

	// AllowedCPUMask contains the cpus that this task can run on.
	AllowedCPUMask sched.CPUSet

	// UTSNamespace is the UTSNamespace of the new task.
	UTSNamespace *UTSNamespace

	// IPCNamespace is the IPCNamespace of the new task.
	IPCNamespace *IPCNamespace

	// AbstractSocketNamespace is the AbstractSocketNamespace of the new task.
	AbstractSocketNamespace *AbstractSocketNamespace

	// MountNamespaceVFS2 is the MountNamespace of the new task.
	MountNamespaceVFS2 *vfs.MountNamespace

	// RSeqAddr is a pointer to the userspace linux.RSeq structure.
	RSeqAddr usermem.Addr

	// RSeqSignature is the signature that the rseq abort IP must be signed
	// with.
	RSeqSignature uint32

	// ContainerID is the container the new task belongs to.
	ContainerID string
}

TaskConfig defines the configuration of a new Task (see below).

type TaskExitState

type TaskExitState int

TaskExitState represents a step in the task exit path.

"Exiting" and "exited" are often ambiguous; prefer to name specific states.

const (
	// TaskExitNone indicates that the task has not begun exiting.
	TaskExitNone TaskExitState = iota

	// TaskExitInitiated indicates that the task goroutine has entered the exit
	// path, and the task is no longer eligible to participate in group stops
	// or group signal handling. TaskExitInitiated is analogous to Linux's
	// PF_EXITING.
	TaskExitInitiated

	// TaskExitZombie indicates that the task has released its resources, and
	// the task no longer prevents a sibling thread from completing execve.
	TaskExitZombie

	// TaskExitDead indicates that the task's thread IDs have been released,
	// and the task no longer prevents its thread group leader from being
	// reaped. ("Reaping" refers to the transitioning of a task from
	// TaskExitZombie to TaskExitDead.)
	TaskExitDead
)

func (TaskExitState) String

func (t TaskExitState) String() string

String implements fmt.Stringer.
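Because the four states are declared in exit-path order with iota, preconditions of the form `t.exitState < TaskExitZombie` reduce to integer comparisons. The sketch below redeclares a local copy of the enum so that it compiles on its own; the lower-case names and the strings returned by `String` are hypothetical and may differ from the real TaskExitState.String.

```go
package main

import "fmt"

// exitState is a local stand-in for TaskExitState.
type exitState int

const (
	exitNone exitState = iota
	exitInitiated
	exitZombie
	exitDead
)

// String returns illustrative names only.
func (s exitState) String() string {
	switch s {
	case exitNone:
		return "none"
	case exitInitiated:
		return "initiated"
	case exitZombie:
		return "zombie"
	case exitDead:
		return "dead"
	default:
		return fmt.Sprintf("exitState(%d)", int(s))
	}
}

// beforeZombie mirrors a precondition of the form
// "t.exitState < TaskExitZombie".
func beforeZombie(s exitState) bool {
	return s < exitZombie
}

func main() {
	for s := exitNone; s <= exitDead; s++ {
		fmt.Printf("%v: before zombie = %t\n", s, beforeZombie(s))
	}
}
```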

type TaskGoroutineSchedInfo

type TaskGoroutineSchedInfo struct {
	// Timestamp was the value of Kernel.cpuClock when this
	// TaskGoroutineSchedInfo was last updated.
	Timestamp uint64

	// State is the current state of the task goroutine.
	State TaskGoroutineState

	// UserTicks is the amount of time the task goroutine has spent executing
	// its associated Task's application code, in units of linux.ClockTick.
	UserTicks uint64

	// SysTicks is the amount of time the task goroutine has spent executing in
	// the sentry, in units of linux.ClockTick.
	SysTicks uint64
}

TaskGoroutineSchedInfo contains task goroutine scheduling state which must be read and updated atomically.

+stateify savable

func SeqAtomicLoadTaskGoroutineSchedInfo

func SeqAtomicLoadTaskGoroutineSchedInfo(seq *sync.SeqCount, ptr *TaskGoroutineSchedInfo) TaskGoroutineSchedInfo

SeqAtomicLoadTaskGoroutineSchedInfo returns a copy of *ptr, ensuring that the read does not race with any writer critical sections in seq.

func SeqAtomicTryLoadTaskGoroutineSchedInfo

func SeqAtomicTryLoadTaskGoroutineSchedInfo(seq *sync.SeqCount, epoch sync.SeqCountEpoch, ptr *TaskGoroutineSchedInfo) (val TaskGoroutineSchedInfo, ok bool)

SeqAtomicTryLoadTaskGoroutineSchedInfo returns a copy of *ptr while in a reader critical section in seq initiated by a call to seq.BeginRead() that returned epoch. If the read would race with a writer critical section, SeqAtomicTryLoadTaskGoroutineSchedInfo returns (unspecified, false).
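The two functions above follow the sequence-counter pattern: a reader copies the data optimistically and retries if a writer was active during the copy. The sketch below is a simplified illustration built on sync/atomic, not the generated gVisor code. `schedInfo`, `seqCount`, `tryLoad`, `load`, and `store` are hypothetical names, and unlike the real try-load function, `tryLoad` begins the read section itself instead of receiving an epoch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// schedInfo is a hypothetical two-field stand-in for TaskGoroutineSchedInfo.
// Its fields are accessed with sync/atomic so the sketch stays race-free.
type schedInfo struct {
	userTicks uint64
	sysTicks  uint64
}

// seqCount is a minimal sequence counter. The epoch is odd while a writer is
// inside its critical section. Writers must be serialized by the caller.
type seqCount struct {
	epoch uint32
}

func (s *seqCount) beginWrite() { atomic.AddUint32(&s.epoch, 1) }
func (s *seqCount) endWrite()   { atomic.AddUint32(&s.epoch, 1) }

// tryLoad copies *ptr once and reports whether the copy is consistent, that
// is, whether no writer was active before or during the copy.
func tryLoad(seq *seqCount, ptr *schedInfo) (val schedInfo, ok bool) {
	start := atomic.LoadUint32(&seq.epoch)
	if start&1 != 0 {
		return schedInfo{}, false
	}
	val.userTicks = atomic.LoadUint64(&ptr.userTicks)
	val.sysTicks = atomic.LoadUint64(&ptr.sysTicks)
	return val, atomic.LoadUint32(&seq.epoch) == start
}

// load retries tryLoad until it observes a consistent copy.
func load(seq *seqCount, ptr *schedInfo) schedInfo {
	for {
		if val, ok := tryLoad(seq, ptr); ok {
			return val
		}
	}
}

// store publishes val inside a writer critical section.
func store(seq *seqCount, ptr *schedInfo, val schedInfo) {
	seq.beginWrite()
	atomic.StoreUint64(&ptr.userTicks, val.userTicks)
	atomic.StoreUint64(&ptr.sysTicks, val.sysTicks)
	seq.endWrite()
}

func main() {
	var seq seqCount
	var info schedInfo
	store(&seq, &info, schedInfo{userTicks: 3, sysTicks: 4})
	fmt.Printf("%+v\n", load(&seq, &info))
}
```

Readers never block writers in this scheme, which suits data that is written often by one goroutine and sampled occasionally by others.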

func (*TaskGoroutineSchedInfo) StateFields

func (ts *TaskGoroutineSchedInfo) StateFields() []string

func (*TaskGoroutineSchedInfo) StateLoad

func (ts *TaskGoroutineSchedInfo) StateLoad(stateSourceObject state.Source)

func (*TaskGoroutineSchedInfo) StateSave

func (ts *TaskGoroutineSchedInfo) StateSave(stateSinkObject state.Sink)

func (*TaskGoroutineSchedInfo) StateTypeName

func (ts *TaskGoroutineSchedInfo) StateTypeName() string

type TaskGoroutineState

type TaskGoroutineState int

TaskGoroutineState is a coarse representation of the current execution status of a kernel.Task goroutine.

const (
	// TaskGoroutineNonexistent indicates that the task goroutine has either
	// not yet been created by Task.Start() or has returned from Task.run().
	// This must be the zero value for TaskGoroutineState.
	TaskGoroutineNonexistent TaskGoroutineState = iota

	// TaskGoroutineRunningSys indicates that the task goroutine is executing
	// sentry code.
	TaskGoroutineRunningSys

	// TaskGoroutineRunningApp indicates that the task goroutine is executing
	// application code.
	TaskGoroutineRunningApp

	// TaskGoroutineBlockedInterruptible indicates that the task goroutine is
	// blocked in Task.block(), and hence may be woken by Task.interrupt()
	// (e.g. due to signal delivery).
	TaskGoroutineBlockedInterruptible

	// TaskGoroutineBlockedUninterruptible indicates that the task goroutine is
	// stopped outside of Task.block() and Task.doStop(), and hence cannot be
	// woken by Task.interrupt().
	TaskGoroutineBlockedUninterruptible

	// TaskGoroutineStopped indicates that the task goroutine is blocked in
	// Task.doStop(). TaskGoroutineStopped is similar to
	// TaskGoroutineBlockedUninterruptible, but is a separate state to make it
	// possible to determine when Task.stop is meaningful.
	TaskGoroutineStopped
)

type TaskImage

type TaskImage struct {
	// Name is the thread name set by the prctl(PR_SET_NAME) system call.
	Name string

	// Arch is the architecture-specific context (registers, etc.)
	Arch arch.Context

	// MemoryManager is the task's address space.
	MemoryManager *mm.MemoryManager
	// contains filtered or unexported fields
}

TaskImage is the subset of a task's data that is provided by the loader.

+stateify savable

func (*TaskImage) Fork

func (image *TaskImage) Fork(ctx context.Context, k *Kernel, shareAddressSpace bool) (*TaskImage, error)

Fork returns a duplicate of image. The copied TaskImage always has an independent arch.Context. If shareAddressSpace is true, the copied TaskImage shares an address space with the original; otherwise, the copied TaskImage has an independent address space that is initially a duplicate of the original's.

func (*TaskImage) StateFields

func (image *TaskImage) StateFields() []string

func (*TaskImage) StateLoad

func (image *TaskImage) StateLoad(stateSourceObject state.Source)

func (*TaskImage) StateSave

func (image *TaskImage) StateSave(stateSinkObject state.Sink)

func (*TaskImage) StateTypeName

func (image *TaskImage) StateTypeName() string

type TaskSet

type TaskSet struct {

	// Root is the root PID namespace, in which all tasks in the TaskSet are
	// visible. The Root pointer is immutable.
	Root *PIDNamespace
	// contains filtered or unexported fields
}

A TaskSet comprises all tasks in a system.

+stateify savable

func (*TaskSet) BeginExternalStop

func (ts *TaskSet) BeginExternalStop()

BeginExternalStop indicates the start of an external stop that applies to all current and future tasks in ts. BeginExternalStop does not wait for task goroutines to stop.

func (*TaskSet) EndExternalStop

func (ts *TaskSet) EndExternalStop()

EndExternalStop indicates the end of an external stop started by a previous call to TaskSet.BeginExternalStop. EndExternalStop does not wait for task goroutines to resume.

func (*TaskSet) Kill

func (ts *TaskSet) Kill(es ExitStatus)

Kill requests that all tasks in ts exit as if group exiting with status es. Kill does not wait for tasks to exit.

Kill has no analogue in Linux; it's provided for save/restore only.

func (*TaskSet) NewTask

func (ts *TaskSet) NewTask(ctx context.Context, cfg *TaskConfig) (*Task, error)

NewTask creates a new task defined by cfg.

NewTask does not start the returned task; the caller must call Task.Start.

If successful, NewTask transfers references held by cfg to the new task. Otherwise, NewTask releases them.
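The ownership rule means a caller never releases the configuration's references after calling NewTask, whatever the outcome. The sketch below illustrates the contract with a hypothetical reference-counted object and a hypothetical failure trigger; none of these names come from the package.

```go
package main

import (
	"errors"
	"fmt"
)

// refCounted is a hypothetical reference-counted resource.
type refCounted struct {
	refs int
}

func (r *refCounted) decRef() { r.refs-- }

// taskConfig carries one owned reference and a hypothetical failure switch.
type taskConfig struct {
	fdTable    *refCounted
	forceError bool
}

// task takes over the reference on success.
type task struct {
	fdTable *refCounted
	started bool
}

var errCreate = errors.New("task creation failed")

// newTask consumes cfg's reference: it hands the reference to the new task on
// success and drops it on failure, so the caller must not release it.
func newTask(cfg *taskConfig) (*task, error) {
	if cfg.forceError {
		cfg.fdTable.decRef()
		return nil, errCreate
	}
	return &task{fdTable: cfg.fdTable}, nil
}

// start marks the task as running; creation alone does not start it.
func (t *task) start() { t.started = true }

func main() {
	ok := &refCounted{refs: 1}
	t, err := newTask(&taskConfig{fdTable: ok})
	if err == nil {
		t.start()
	}
	fmt.Println(ok.refs, t.started)

	bad := &refCounted{refs: 1}
	_, err = newTask(&taskConfig{fdTable: bad, forceError: true})
	fmt.Println(bad.refs, err)
}
```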

func (*TaskSet) PullFullState

func (ts *TaskSet) PullFullState()

PullFullState receives full states for all tasks.

func (*TaskSet) StateFields

func (ts *TaskSet) StateFields() []string

func (*TaskSet) StateLoad

func (ts *TaskSet) StateLoad(stateSourceObject state.Source)

func (*TaskSet) StateSave

func (ts *TaskSet) StateSave(stateSinkObject state.Sink)

func (*TaskSet) StateTypeName

func (ts *TaskSet) StateTypeName() string

type TaskStop

type TaskStop interface {
	// Killable returns true if Task.Kill should end the stop prematurely.
	// Killable is analogous to Linux's TASK_WAKEKILL.
	Killable() bool
}

A TaskStop is a condition visible to the task control flow graph that prevents a task goroutine from running or exiting, i.e. an internal stop.

NOTE(b/30793614): Most TaskStops don't contain any data; they're distinguished by their type. The obvious way to implement such a TaskStop is:

	type groupStop struct{}
	func (groupStop) Killable() bool { return true }
	...
	t.beginInternalStop(groupStop{})

However, this doesn't work because the state package can't serialize values, only pointers. Furthermore, the correctness of save/restore depends on the ability to pass a TaskStop to endInternalStop that will compare equal to the TaskStop that was passed to beginInternalStop, even if a save/restore cycle occurred between the two. As a result, the current idiom is to always use a typed nil pointer for data-free TaskStops:

	type groupStop struct{}
	func (*groupStop) Killable() bool { return true }
	...
	t.beginInternalStop((*groupStop)(nil))

This is pretty gross, but the alternatives seem grosser.
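The property the idiom relies on can be checked in isolation: two interface values holding nil pointers of the same type compare equal, and a method with a pointer receiver that never dereferences it can still be called. In the sketch below `taskStop`, `otherStop`, and `sameStop` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// taskStop is a local stand-in for the TaskStop interface.
type taskStop interface {
	Killable() bool
}

type groupStop struct{}

// Killable never dereferences its receiver, so a nil *groupStop is usable.
func (*groupStop) Killable() bool { return true }

// otherStop is a second, hypothetical data-free stop type.
type otherStop struct{}

func (*otherStop) Killable() bool { return false }

// sameStop compares two stops the way an end-of-stop check might: by
// interface equality, which compares dynamic type and pointer value.
func sameStop(a, b taskStop) bool { return a == b }

func main() {
	begin := taskStop((*groupStop)(nil))
	end := taskStop((*groupStop)(nil))

	fmt.Println(sameStop(begin, end))                // same type, both nil pointers
	fmt.Println(sameStop(begin, (*otherStop)(nil))) // different dynamic types
	fmt.Println(begin.Killable())                    // nil receiver is fine here
	fmt.Println(begin != nil)                        // the interface itself is non-nil
}
```

A nil pointer is used because the Go specification leaves the equality of pointers to distinct zero-size variables unspecified, whereas two nil pointers of one type always compare equal.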

type TaskWorker

type TaskWorker interface {
	// TaskWork will be executed prior to returning to user space. Note that
	// TaskWork may call RegisterWork again, but this will not be executed until
	// the next return to user space, unlike in Linux. This effectively allows
	// registration of indefinite user return hooks, but not by default.
	TaskWork(t *Task)
}

TaskWorker is a deferred task.

This must be savable.
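The deferral rule in the TaskWork comment (work registered from inside TaskWork waits for the next return to user space) can be sketched by swapping the queue out before running it. All names below are hypothetical, and the sketch ignores synchronization, which the real implementation presumably needs.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for TaskWorker.
type worker interface {
	taskWork(t *task)
}

// task holds work queued for the next return to user space.
type task struct {
	pending []worker
}

func (t *task) registerWork(w worker) { t.pending = append(t.pending, w) }

// returnToUser runs only the work registered before it was called. Work
// registered by a running worker lands in a fresh queue and waits for the
// next call.
func (t *task) returnToUser() {
	batch := t.pending
	t.pending = nil
	for _, w := range batch {
		w.taskWork(t)
	}
}

// repeating re-registers itself until remaining reaches zero.
type repeating struct {
	remaining int
	runs      int
}

func (r *repeating) taskWork(t *task) {
	r.runs++
	if r.remaining > 0 {
		r.remaining--
		t.registerWork(r)
	}
}

func main() {
	t := &task{}
	r := &repeating{remaining: 2}
	t.registerWork(r)
	for i := 1; i <= 4; i++ {
		t.returnToUser()
		fmt.Printf("return %d: runs=%d pending=%d\n", i, r.runs, len(t.pending))
	}
}
```

A worker that always re-registers itself therefore behaves like a hook that fires on every return to user space, which is the "indefinite user return hook" the comment mentions.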

type ThreadGroup

type ThreadGroup struct {
	// contains filtered or unexported fields
}

A ThreadGroup is a logical grouping of tasks that has widespread significance to other kernel features (e.g. signal handling). ("Thread groups" are usually called "processes" in userspace documentation.)

ThreadGroup is a superset of Linux's struct signal_struct.

+stateify savable

func (*ThreadGroup) CPUClock

func (tg *ThreadGroup) CPUClock() ktime.Clock

CPUClock returns a ktime.Clock that measures the time that a thread group has spent executing, including sentry time.

func (*ThreadGroup) CPUStats

func (tg *ThreadGroup) CPUStats() usage.CPUStats

CPUStats returns the combined CPU usage statistics of all past and present threads in tg.

func (*ThreadGroup) Count

func (tg *ThreadGroup) Count() int

Count returns the number of non-exited threads in the group.

func (*ThreadGroup) CreateProcessGroup

func (tg *ThreadGroup) CreateProcessGroup() error

CreateProcessGroup creates a new process group.

An EPERM error will be returned if the ThreadGroup belongs to a different Session or is a Session leader, or if the group already exists.

func (*ThreadGroup) CreateSession

func (tg *ThreadGroup) CreateSession() error

CreateSession creates a new Session, with the ThreadGroup as the leader.

EPERM may be returned if either the given ThreadGroup is already a Session leader, or a ProcessGroup already exists for the ThreadGroup's ID.

func (*ThreadGroup) ExitStatus

func (tg *ThreadGroup) ExitStatus() ExitStatus

ExitStatus returns the exit status that would be returned by a consuming wait*() on tg.

func (*ThreadGroup) ForegroundProcessGroup

func (tg *ThreadGroup) ForegroundProcessGroup(tty *TTY) (int32, error)

ForegroundProcessGroup returns the process group ID of the foreground process group.

func (*ThreadGroup) ID

func (tg *ThreadGroup) ID() ThreadID

ID returns tg's leader's thread ID in its own PID namespace. If tg's leader is dead, ID returns 0.

func (*ThreadGroup) IOUsage

func (tg *ThreadGroup) IOUsage() *usage.IO

IOUsage returns the total I/O usage of all dead and live threads in the group.

func (*ThreadGroup) JoinProcessGroup

func (tg *ThreadGroup) JoinProcessGroup(pidns *PIDNamespace, pgid ProcessGroupID, checkExec bool) error

JoinProcessGroup joins an existing process group.

This function will return EACCES if an exec has been performed since fork by the given ThreadGroup, and EPERM if the Sessions are not the same or the group does not exist.

If checkExec is set, then the join is not permitted after the process has executed exec at least once.

func (*ThreadGroup) JoinedChildCPUStats

func (tg *ThreadGroup) JoinedChildCPUStats() usage.CPUStats

JoinedChildCPUStats implements the semantics of RUSAGE_CHILDREN: "Return resource usage statistics for all children of [tg] that have terminated and been waited for. These statistics will include the resources used by grandchildren, and further removed descendants, if all of the intervening descendants waited on their terminated children."

func (*ThreadGroup) Leader

func (tg *ThreadGroup) Leader() *Task

Leader returns tg's leader.

func (*ThreadGroup) Limits

func (tg *ThreadGroup) Limits() *limits.LimitSet

Limits returns tg's limits.

func (*ThreadGroup) MemberIDs

func (tg *ThreadGroup) MemberIDs(pidns *PIDNamespace) []ThreadID

MemberIDs returns a snapshot of the ThreadIDs (in PID namespace pidns) for all tasks in tg.

func (*ThreadGroup) PIDNamespace

func (tg *ThreadGroup) PIDNamespace() *PIDNamespace

PIDNamespace returns the PID namespace containing tg.

func (*ThreadGroup) ProcessGroup

func (tg *ThreadGroup) ProcessGroup() *ProcessGroup

ProcessGroup returns the ThreadGroup's ProcessGroup.

A reference is not taken on the process group.

func (*ThreadGroup) Release

func (tg *ThreadGroup) Release(ctx context.Context)

Release releases the thread group's resources.

func (*ThreadGroup) ReleaseControllingTTY

func (tg *ThreadGroup) ReleaseControllingTTY(tty *TTY) error

ReleaseControllingTTY gives up tty as the controlling tty of tg.

func (*ThreadGroup) SendSignal

func (tg *ThreadGroup) SendSignal(info *arch.SignalInfo) error

SendSignal sends the given signal to tg, using tg's leader to determine if the signal is blocked.

func (*ThreadGroup) Session

func (tg *ThreadGroup) Session() *Session

Session returns the ThreadGroup's Session.

A reference is not taken on the session.

func (*ThreadGroup) SetControllingTTY

func (tg *ThreadGroup) SetControllingTTY(tty *TTY, arg int32) error

SetControllingTTY sets tty as the controlling terminal of tg.

func (*ThreadGroup) SetForegroundProcessGroup

func (tg *ThreadGroup) SetForegroundProcessGroup(tty *TTY, pgid ProcessGroupID) (int32, error)

SetForegroundProcessGroup sets the foreground process group of tty to pgid.

func (*ThreadGroup) SetSignalAct

func (tg *ThreadGroup) SetSignalAct(sig linux.Signal, actptr *arch.SignalAct) (arch.SignalAct, error)

SetSignalAct atomically sets the thread group's signal action for signal sig to *actptr (if actptr is not nil) and returns the old signal action.
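
The "set if non-nil, always return the old value" contract can be sketched as follows. The types `signalAct` and `handlers` are hypothetical simplifications made up for this example; signal validation and the error return of the real method are deliberately omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// signalAct is a hypothetical, simplified signal action.
type signalAct struct {
	Handler uint64
	Flags   uint64
}

// handlers is a hypothetical table of per-signal actions guarded by a mutex.
type handlers struct {
	mu      sync.Mutex
	actions map[int]signalAct
}

// setSignalAct returns the current action for sig and, if act is non-nil,
// replaces it with *act. Both steps happen under one lock acquisition, so no
// other caller can observe or modify the action in between.
func (h *handlers) setSignalAct(sig int, act *signalAct) signalAct {
	h.mu.Lock()
	defer h.mu.Unlock()
	old := h.actions[sig] // reading a nil map yields the zero value
	if act != nil {
		if h.actions == nil {
			h.actions = make(map[int]signalAct)
		}
		h.actions[sig] = *act
	}
	return old
}

func main() {
	var h handlers
	h.setSignalAct(10, &signalAct{Handler: 0x1000})
	old := h.setSignalAct(10, &signalAct{Handler: 0x2000})
	cur := h.setSignalAct(10, nil) // query only
	fmt.Printf("old=%#x cur=%#x\n", old.Handler, cur.Handler)
}
```

Passing a nil pointer turns the call into a pure query, which mirrors how the documented method treats a nil actptr.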

func (*ThreadGroup) SignalHandlers

func (tg *ThreadGroup) SignalHandlers() *SignalHandlers

SignalHandlers returns the signal handlers used by tg.

Preconditions: The caller must provide the synchronization required to read tg.signalHandlers, as described in the field's comment.

func (*ThreadGroup) StateFields

func (tg *ThreadGroup) StateFields() []string

func (*ThreadGroup) StateLoad

func (tg *ThreadGroup) StateLoad(stateSourceObject state.Source)

func (*ThreadGroup) StateSave

func (tg *ThreadGroup) StateSave(stateSinkObject state.Sink)

func (*ThreadGroup) StateTypeName

func (tg *ThreadGroup) StateTypeName() string

func (*ThreadGroup) TTY

func (tg *ThreadGroup) TTY() *TTY

TTY returns the thread group's controlling terminal. A nil result means the thread group has no controlling terminal.

func (*ThreadGroup) TaskSet

func (tg *ThreadGroup) TaskSet() *TaskSet

TaskSet returns the TaskSet containing tg.

func (*ThreadGroup) TerminationSignal

func (tg *ThreadGroup) TerminationSignal() linux.Signal

TerminationSignal returns the thread group's termination signal.

func (*ThreadGroup) UserCPUClock

func (tg *ThreadGroup) UserCPUClock() ktime.Clock

UserCPUClock returns a ktime.Clock that measures the time that a thread group has spent executing.

func (*ThreadGroup) WaitExited

func (tg *ThreadGroup) WaitExited()

WaitExited blocks until all task goroutines in tg have exited.

WaitExited does not correspond to anything in Linux; it's provided so that external callers of Kernel.CreateProcess can wait for the created thread group to terminate.
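
One common way to get "block until every goroutine has exited" behavior in Go is a sync.WaitGroup. The sketch below shows that pattern under the assumption that each task runs on its own goroutine; `threadGroup`, `start`, and `waitExited` are hypothetical names, and nothing here is taken from the package's unexported implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// threadGroup is a hypothetical model that tracks its running task goroutines.
type threadGroup struct {
	liveGoroutines sync.WaitGroup
}

// start runs fn on a new goroutine that is tracked by the group.
func (tg *threadGroup) start(fn func()) {
	tg.liveGoroutines.Add(1)
	go func() {
		defer tg.liveGoroutines.Done()
		fn()
	}()
}

// waitExited blocks until every goroutine started via start has returned.
func (tg *threadGroup) waitExited() {
	tg.liveGoroutines.Wait()
}

func main() {
	var tg threadGroup
	results := make(chan int, 3)
	for i := 0; i < 3; i++ {
		i := i
		tg.start(func() { results <- i * i })
	}
	tg.waitExited()
	fmt.Println("tasks finished:", len(results))
}
```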

type ThreadID

type ThreadID int32

ThreadID is a generic thread identifier.

+marshal

const InitTID ThreadID = 1

InitTID is the TID given to the first task added to each PID namespace. The thread group led by InitTID is called the namespace's init process. The death of a PID namespace's init process causes all tasks visible in that namespace to be killed.

func (*ThreadID) CopyIn

func (tid *ThreadID) CopyIn(cc marshal.CopyContext, addr usermem.Addr) (int, error)

CopyIn implements marshal.Marshallable.CopyIn.

func (*ThreadID) CopyOut

func (tid *ThreadID) CopyOut(cc marshal.CopyContext, addr usermem.Addr) (int, error)

CopyOut implements marshal.Marshallable.CopyOut.

func (*ThreadID) CopyOutN

func (tid *ThreadID) CopyOutN(cc marshal.CopyContext, addr usermem.Addr, limit int) (int, error)

CopyOutN implements marshal.Marshallable.CopyOutN.

func (*ThreadID) MarshalBytes

func (tid *ThreadID) MarshalBytes(dst []byte)

MarshalBytes implements marshal.Marshallable.MarshalBytes.

func (*ThreadID) MarshalUnsafe

func (tid *ThreadID) MarshalUnsafe(dst []byte)

MarshalUnsafe implements marshal.Marshallable.MarshalUnsafe.

func (*ThreadID) Packed

func (tid *ThreadID) Packed() bool

Packed implements marshal.Marshallable.Packed.

func (*ThreadID) SizeBytes

func (tid *ThreadID) SizeBytes() int

SizeBytes implements marshal.Marshallable.SizeBytes.

func (ThreadID) String

func (tid ThreadID) String() string

String returns a decimal representation of the ThreadID.

func (*ThreadID) UnmarshalBytes

func (tid *ThreadID) UnmarshalBytes(src []byte)

UnmarshalBytes implements marshal.Marshallable.UnmarshalBytes.
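
As a rough illustration of what marshalling an int32-backed identifier involves, the sketch below round-trips a value through a 4-byte buffer. It assumes little-endian byte order and uses a hypothetical local type `threadID` with lowercase method names; it does not use the marshal package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// threadID is a hypothetical local stand-in for an int32-backed identifier.
type threadID int32

// sizeBytes reports the encoded size: an int32 occupies four bytes.
func (tid *threadID) sizeBytes() int { return 4 }

// marshalBytes writes tid into dst, assuming little-endian byte order.
func (tid *threadID) marshalBytes(dst []byte) {
	binary.LittleEndian.PutUint32(dst[:4], uint32(*tid))
}

// unmarshalBytes reads tid back from src using the same byte order.
func (tid *threadID) unmarshalBytes(src []byte) {
	*tid = threadID(int32(binary.LittleEndian.Uint32(src[:4])))
}

// String returns a decimal representation of the identifier.
func (tid threadID) String() string {
	return strconv.FormatInt(int64(tid), 10)
}

func main() {
	in := threadID(1)
	buf := make([]byte, in.sizeBytes())
	in.marshalBytes(buf)

	var out threadID
	out.unmarshalBytes(buf)
	fmt.Println(out, buf)
}
```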

func (*ThreadID) UnmarshalUnsafe

func (tid *ThreadID) UnmarshalUnsafe(src []byte)

UnmarshalUnsafe implements marshal.Marshallable.UnmarshalUnsafe.

func (*ThreadID) WriteTo

func (tid *ThreadID) WriteTo(w io.Writer) (int64, error)

WriteTo implements io.WriterTo.WriteTo.

type Timekeeper

type Timekeeper struct {
	// contains filtered or unexported fields
}

Timekeeper manages all of the kernel clocks.

+stateify savable

func NewTimekeeper

func NewTimekeeper(mfp pgalloc.MemoryFileProvider, paramPage memmap.FileRange) (*Timekeeper, error)

NewTimekeeper returns a Timekeeper that is automatically kept up-to-date. NewTimekeeper does not take ownership of paramPage.

SetClocks must be called on the returned Timekeeper before it is usable.

func (*Timekeeper) BootTime

func (t *Timekeeper) BootTime() ktime.Time

BootTime returns the system boot real time.

func (*Timekeeper) Destroy

func (t *Timekeeper) Destroy()

Destroy destroys the Timekeeper, freeing all associated resources.

func (*Timekeeper) GetTime

func (t *Timekeeper) GetTime(c sentrytime.ClockID) (int64, error)

GetTime returns the current time of clock c in nanoseconds.

func (*Timekeeper) PauseUpdates

func (t *Timekeeper) PauseUpdates()

PauseUpdates stops clock parameter updates. This should only be used when Tasks are not running and thus cannot access the clock.

func (*Timekeeper) ResumeUpdates

func (t *Timekeeper) ResumeUpdates()

ResumeUpdates restarts clock parameter updates stopped by PauseUpdates.

func (*Timekeeper) SetClocks

func (t *Timekeeper) SetClocks(c sentrytime.Clocks)

SetClocks sets the backing clock source.

SetClocks must be called before the Timekeeper is used, and it may not be called more than once, as changing the clock source without extra correction could cause time discontinuities.

It must also be called after Load.
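
The "set exactly once before use" requirement can be sketched with a small guard. Everything in this example is hypothetical (`clockSource`, `fixedClock`, `timekeeper`, and the two error values), and the sketch reports misuse through errors purely for illustration; it makes no claim about how the real Timekeeper reacts to a second call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// clockSource is a hypothetical stand-in for a set of backing clocks.
type clockSource interface {
	Now() int64
}

// fixedClock is a toy clock source that always reports the same time.
type fixedClock int64

func (c fixedClock) Now() int64 { return int64(c) }

var (
	errClocksAlreadySet = errors.New("clock source already set")
	errClocksNotSet     = errors.New("clock source not set")
)

// timekeeper is a hypothetical model that accepts a clock source only once.
type timekeeper struct {
	mu     sync.Mutex
	source clockSource
}

// setClocks installs the clock source, rejecting any later replacement so
// that reported time cannot jump between two unrelated sources.
func (t *timekeeper) setClocks(c clockSource) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.source != nil {
		return errClocksAlreadySet
	}
	t.source = c
	return nil
}

// now reads the time, failing if no clock source has been installed yet.
func (t *timekeeper) now() (int64, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.source == nil {
		return 0, errClocksNotSet
	}
	return t.source.Now(), nil
}

func main() {
	var tk timekeeper
	fmt.Println(tk.setClocks(fixedClock(100)))
	fmt.Println(tk.setClocks(fixedClock(200)))
	fmt.Println(tk.now())
}
```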

func (*Timekeeper) StateFields

func (t *Timekeeper) StateFields() []string

func (*Timekeeper) StateLoad

func (t *Timekeeper) StateLoad(stateSourceObject state.Source)

func (*Timekeeper) StateSave

func (t *Timekeeper) StateSave(stateSinkObject state.Sink)

func (*Timekeeper) StateTypeName

func (t *Timekeeper) StateTypeName() string

type UTSNamespace

type UTSNamespace struct {
	// contains filtered or unexported fields
}

UTSNamespace represents a UTS namespace, a holder of two system identifiers: the hostname and domain name.

+stateify savable

func NewUTSNamespace

func NewUTSNamespace(hostName, domainName string, userns *auth.UserNamespace) *UTSNamespace

NewUTSNamespace creates a new UTS namespace.

func UTSNamespaceFromContext

func UTSNamespaceFromContext(ctx context.Context) *UTSNamespace

UTSNamespaceFromContext returns the UTS namespace in which ctx is executing, or nil if there is no such UTS namespace.

func (*UTSNamespace) Clone

func (u *UTSNamespace) Clone(userns *auth.UserNamespace) *UTSNamespace

Clone makes a copy of this UTS namespace, associating the given user namespace.
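
The copy semantics can be illustrated with a simplified model in which a clone starts with the same names but changes independently afterwards. The type `utsNamespace` and its lowercase methods are hypothetical, and the user namespace association is left out of the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// utsNamespace is a hypothetical, simplified holder of the two identifiers.
type utsNamespace struct {
	mu     sync.Mutex
	host   string
	domain string
}

// clone returns a new namespace that starts with the same names.
func (u *utsNamespace) clone() *utsNamespace {
	u.mu.Lock()
	defer u.mu.Unlock()
	return &utsNamespace{host: u.host, domain: u.domain}
}

func (u *utsNamespace) hostName() string {
	u.mu.Lock()
	defer u.mu.Unlock()
	return u.host
}

func (u *utsNamespace) setHostName(host string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.host = host
}

func main() {
	parent := &utsNamespace{host: "sandbox", domain: "example.test"}
	child := parent.clone()
	child.setHostName("container")

	// The parent keeps its own host name; only the clone was changed.
	fmt.Println(parent.hostName(), child.hostName())
}
```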

func (*UTSNamespace) DomainName

func (u *UTSNamespace) DomainName() string

DomainName returns the domain name of this UTS namespace.

func (*UTSNamespace) HostName

func (u *UTSNamespace) HostName() string

HostName returns the host name of this UTS namespace.

func (*UTSNamespace) SetDomainName

func (u *UTSNamespace) SetDomainName(domain string)

SetDomainName sets the domain name of this UTS namespace.

func (*UTSNamespace) SetHostName

func (u *UTSNamespace) SetHostName(host string)

SetHostName sets the host name of this UTS namespace.

func (*UTSNamespace) StateFields

func (u *UTSNamespace) StateFields() []string

func (*UTSNamespace) StateLoad

func (u *UTSNamespace) StateLoad(stateSourceObject state.Source)

func (*UTSNamespace) StateSave

func (u *UTSNamespace) StateSave(stateSinkObject state.Sink)

func (*UTSNamespace) StateTypeName

func (u *UTSNamespace) StateTypeName() string

func (*UTSNamespace) UserNamespace

func (u *UTSNamespace) UserNamespace() *auth.UserNamespace

UserNamespace returns the user namespace associated with this UTS namespace.

type VDSOParamPage

type VDSOParamPage struct {
	// contains filtered or unexported fields
}

VDSOParamPage manages a VDSO parameter page.

Its memory layout looks like:

type page struct {
	// seq is a sequence counter that protects the fields below.
	seq uint64
	vdsoParams
}

Everything in the struct is 8 bytes for easy alignment.

It must be kept in sync with params in vdso/vdso_time.cc.

+stateify savable

func NewVDSOParamPage

func NewVDSOParamPage(mfp pgalloc.MemoryFileProvider, fr memmap.FileRange) *VDSOParamPage

NewVDSOParamPage returns a VDSOParamPage.

Preconditions:

  • fr is a single page allocated from mfp.MemoryFile(). VDSOParamPage does not take ownership of fr; it must remain allocated for the lifetime of the VDSOParamPage.
  • VDSOParamPage must be the only writer to fr.
  • mfp.MemoryFile().MapInternal(fr) must return a single safemem.Block.

func (*VDSOParamPage) StateFields

func (v *VDSOParamPage) StateFields() []string

func (*VDSOParamPage) StateLoad

func (v *VDSOParamPage) StateLoad(stateSourceObject state.Source)

func (*VDSOParamPage) StateSave

func (v *VDSOParamPage) StateSave(stateSinkObject state.Sink)

func (*VDSOParamPage) StateTypeName

func (v *VDSOParamPage) StateTypeName() string

func (*VDSOParamPage) Write

func (v *VDSOParamPage) Write(f func() vdsoParams) error

Write updates the VDSO parameters.

Write starts a write block, calls f to get the new parameters, writes out the new parameters, then ends the write block.
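
The write-block sequence described above resembles a sequence-counter (seqlock) protocol. The following is a minimal sketch of that general technique, assuming a single writer and two made-up 8-byte parameters `a` and `b`; the real vdsoParams fields are unexported and are not reproduced here, and the real method's error return is omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// page is a hypothetical parameter page: a sequence counter followed by the
// parameters it protects. All fields are 8 bytes wide.
type page struct {
	seq  uint64 // odd while a write is in progress
	a, b uint64 // made-up parameters for illustration
}

// write begins a write block, calls f for the new values, stores them, and
// ends the write block. It assumes there is only one writer.
func (p *page) write(f func() (a, b uint64)) {
	atomic.AddUint64(&p.seq, 1) // begin: seq becomes odd
	a, b := f()
	atomic.StoreUint64(&p.a, a)
	atomic.StoreUint64(&p.b, b)
	atomic.AddUint64(&p.seq, 1) // end: seq becomes even again
}

// read retries until it observes the same even sequence number before and
// after loading the parameters, which means no write overlapped the read.
func (p *page) read() (a, b uint64) {
	for {
		before := atomic.LoadUint64(&p.seq)
		if before&1 != 0 {
			continue // a write is in progress
		}
		a = atomic.LoadUint64(&p.a)
		b = atomic.LoadUint64(&p.b)
		if atomic.LoadUint64(&p.seq) == before {
			return a, b
		}
	}
}

func main() {
	p := &page{}
	p.write(func() (uint64, uint64) { return 10, 20 })
	a, b := p.read()
	fmt.Println(a, b)
}
```

Readers never block the writer in this scheme; they only retry, which is why the single-writer precondition matters.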

type Version

type Version struct {
	// Operating system name (e.g. "Linux").
	Sysname string

	// Operating system release (e.g. "4.4-amd64").
	Release string

	// Operating system version. On Linux this takes the shape
	// "#VERSION CONFIG_FLAGS TIMESTAMP"
	// where:
	// - VERSION is a sequence counter incremented on every successful build
	// - CONFIG_FLAGS is a space-separated list of major enabled kernel features
	//   (e.g. "SMP" and "PREEMPT")
	// - TIMESTAMP is the build timestamp as returned by `date`
	Version string
}

Version defines the application-visible system version.
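
To make the "#VERSION CONFIG_FLAGS TIMESTAMP" shape concrete, the sketch below assembles such a string and fills in a local mirror of the struct. The helper `versionString`, the local `version` type, and all values are illustrative assumptions, not defaults used by this package.

```go
package main

import (
	"fmt"
	"strings"
)

// version is a local mirror of the three documented fields.
type version struct {
	Sysname string
	Release string
	Version string
}

// versionString builds a "#VERSION CONFIG_FLAGS TIMESTAMP" string from a
// build counter, a list of feature flags, and a preformatted timestamp.
func versionString(build int, flags []string, timestamp string) string {
	parts := []string{fmt.Sprintf("#%d", build)}
	parts = append(parts, flags...)
	parts = append(parts, timestamp)
	return strings.Join(parts, " ")
}

func main() {
	v := version{
		Sysname: "Linux",
		Release: "4.4-amd64",
		Version: versionString(1, []string{"SMP", "PREEMPT"}, "Sun Jan 24 00:00:00 UTC 2021"),
	}
	fmt.Println(v.Sysname, v.Release, v.Version)
}
```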

type WaitOptions

type WaitOptions struct {
	// If SpecificTID is non-zero, only events from the task with thread ID
	// SpecificTID are eligible to be waited for. SpecificTID is resolved in
	// the PID namespace of the waiter (the method receiver of Task.Wait). If
	// no such task exists, or that task would not otherwise be eligible to be
	// waited for by the waiting task, then there are no waitable tasks and
	// Wait will return ECHILD.
	SpecificTID ThreadID

	// If SpecificPGID is non-zero, only events from ThreadGroups with a
	// matching ProcessGroupID are eligible to be waited for. (Same
	// constraints as SpecificTID apply.)
	SpecificPGID ProcessGroupID

	// If NonCloneTasks is true, events from non-clone tasks are eligible to be
	// waited for.
	NonCloneTasks bool

	// If CloneTasks is true, events from clone tasks are eligible to be waited
	// for.
	CloneTasks bool

	// If SiblingChildren is true, events from children tasks of any task
	// in the thread group of the waiter are eligible to be waited for.
	SiblingChildren bool

	// Events is a bitwise combination of the events defined above that specify
	// what events are of interest to the call to Wait.
	Events waiter.EventMask

	// If ConsumeEvent is true, the Wait should consume the event such that it
	// cannot be returned by a future Wait. Note that if a task exit is
	// consumed in this way, in most cases the task will be reaped.
	ConsumeEvent bool

	// If BlockInterruptErr is not nil, Wait will block until either an event
	// is available or there are no tasks that could produce a waitable event;
	// if that blocking is interrupted, Wait returns BlockInterruptErr. If
	// BlockInterruptErr is nil, Wait will not block.
	BlockInterruptErr error
}

WaitOptions controls the behavior of Task.Wait.
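
The sketch below shows how wait4-style flags might be translated into a few of these fields. It is an illustration only: `waitOptions` is a local type mirroring three of the documented fields, `errInterrupted` and `fromFlags` are hypothetical, and the mapping is an assumption rather than a description of the actual syscall layer. The flag values are the Linux WNOHANG and WNOWAIT constants.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	wNOHANG = 0x1        // Linux WNOHANG: do not block
	wNOWAIT = 0x01000000 // Linux WNOWAIT: leave the event waitable
)

// waitOptions mirrors a subset of the documented WaitOptions fields.
type waitOptions struct {
	SpecificTID       int32
	ConsumeEvent      bool
	BlockInterruptErr error
}

// errInterrupted is a hypothetical error returned when blocking is interrupted.
var errInterrupted = errors.New("wait interrupted")

// fromFlags translates a pid argument and flag bits into wait options.
func fromFlags(pid int32, flags uint32) waitOptions {
	opts := waitOptions{
		// Without WNOWAIT the event is consumed and cannot be returned again.
		ConsumeEvent: flags&wNOWAIT == 0,
	}
	if pid > 0 {
		opts.SpecificTID = pid
	}
	// A nil BlockInterruptErr means "do not block", so it is set only when
	// WNOHANG is absent.
	if flags&wNOHANG == 0 {
		opts.BlockInterruptErr = errInterrupted
	}
	return opts
}

func main() {
	fmt.Printf("%+v\n", fromFlags(42, 0))
	fmt.Printf("%+v\n", fromFlags(0, wNOHANG|wNOWAIT))
}
```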

type WaitResult

type WaitResult struct {
	// Task is the task that reported the event.
	Task *Task

	// TID is the thread ID of Task in the PID namespace of the task that
	// called Wait (that is, the method receiver of the call to Task.Wait). TID
	// is provided because consuming exit waits cause the thread ID to be
	// deallocated.
	TID ThreadID

	// UID is the real UID of Task in the user namespace of the task that
	// called Wait.
	UID auth.UID

	// Event is exactly one of the events defined above.
	Event waiter.EventMask

	// Status is the numeric status associated with the event.
	Status uint32
}

WaitResult contains information about a waited-for event.

Directories

Path       Synopsis
auth       Package auth implements an access control model that is a subset of Linux's.
epoll      Package epoll provides an implementation of Linux's IO event notification facility.
eventfd    Package eventfd provides an implementation of Linux's file-based event notification.
fasync     Package fasync provides FIOASYNC related functionality.
futex      Package futex provides an implementation of the futex interface as found in the Linux kernel.
memevent   Package memevent implements the memory usage events controller, which periodically emits events via the eventchannel.
pipe       Package pipe provides a pipe implementation.
sched      Package sched implements scheduler related features.
semaphore  Package semaphore implements System V semaphores.
shm        Package shm implements sysv shared memory segments.
signalfd   Package signalfd provides an implementation of signal file descriptors.
time       Package time defines the Timer type, which provides a periodic timer that works by sampling a user-provided clock.
