Documentation ¶
Constants ¶
const TrainingJobs = "TrainingJobs"
TrainingJobs is the string used for registration.
Variables ¶
This section is empty.
Functions ¶
func RegisterResource ¶
func RegisterResource(config *rest.Config, resourceType, resourceListType runtime.Object) *rest.Config
RegisterResource registers a resource type and the corresponding resource list type with the local Kubernetes runtime under group version "paddlepaddle.org", so that the runtime can encode and decode this Go type. It also changes config.GroupVersion to "paddlepaddle.org".
Types ¶
type MasterSpec ¶
type MasterSpec struct {
	EtcdEndpoint string                  `json:"etcd-endpoint"`
	Resources    v1.ResourceRequirements `json:"resources"`
}
MasterSpec definition +k8s:deepcopy-gen=true
func (*MasterSpec) DeepCopy ¶
func (in *MasterSpec) DeepCopy() *MasterSpec
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new MasterSpec.
func (*MasterSpec) DeepCopyInto ¶
func (in *MasterSpec) DeepCopyInto(out *MasterSpec)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
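To illustrate what a generated DeepCopy/DeepCopyInto pair does, here is a minimal sketch on a stand-in type. The `spec` type and its `Limits` map are hypothetical and are not part of this package; they only play the role of MasterSpec and the map-valued fields inside v1.ResourceRequirements. The point is that a plain struct assignment would share the map, while a deep copy does not.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for MasterSpec. Limits plays the role of
// the map-valued fields inside v1.ResourceRequirements.
type spec struct {
	EtcdEndpoint string
	Limits       map[string]string
}

// DeepCopyInto copies the receiver into out. in must be non-nil.
func (in *spec) DeepCopyInto(out *spec) {
	*out = *in // copies value fields; the map is still shared at this point
	if in.Limits != nil {
		out.Limits = make(map[string]string, len(in.Limits))
		for k, v := range in.Limits {
			out.Limits[k] = v
		}
	}
}

// DeepCopy allocates a new spec and copies the receiver into it.
func (in *spec) DeepCopy() *spec {
	if in == nil {
		return nil
	}
	out := new(spec)
	in.DeepCopyInto(out)
	return out
}

func main() {
	a := &spec{EtcdEndpoint: "etcd:2379", Limits: map[string]string{"cpu": "800m"}}
	b := a.DeepCopy()
	b.Limits["cpu"] = "1" // mutating the copy leaves the original untouched
	fmt.Println(a.Limits["cpu"], b.Limits["cpu"]) // prints "800m 1"
}
```

The same pattern applies to the other DeepCopy and DeepCopyInto methods documented on this page.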
type PserverSpec ¶
type PserverSpec struct {
	MinInstance int                     `json:"min-instance"`
	MaxInstance int                     `json:"max-instance"`
	Resources   v1.ResourceRequirements `json:"resources"`
}
PserverSpec definition +k8s:deepcopy-gen=true
func (*PserverSpec) DeepCopy ¶
func (in *PserverSpec) DeepCopy() *PserverSpec
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new PserverSpec.
func (*PserverSpec) DeepCopyInto ¶
func (in *PserverSpec) DeepCopyInto(out *PserverSpec)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type TrainerSpec ¶
type TrainerSpec struct {
	Entrypoint  string                  `json:"entrypoint"`
	Workspace   string                  `json:"workspace"`
	MinInstance int                     `json:"min-instance"`
	MaxInstance int                     `json:"max-instance"`
	Resources   v1.ResourceRequirements `json:"resources"`
}
TrainerSpec definition +k8s:deepcopy-gen=true
func (*TrainerSpec) DeepCopy ¶
func (in *TrainerSpec) DeepCopy() *TrainerSpec
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TrainerSpec.
func (*TrainerSpec) DeepCopyInto ¶
func (in *TrainerSpec) DeepCopyInto(out *TrainerSpec)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type TrainingJob ¶
type TrainingJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata"`
	Spec              TrainingJobSpec   `json:"spec"`
	Status            TrainingJobStatus `json:"status,omitempty"`
}
A TrainingJob is a Kubernetes resource that describes a PaddlePaddle training job. As a Kubernetes resource:
- Its content must follow the Kubernetes resource definition convention.
- It must be a Go struct with JSON tags.
- It must implement the deepcopy interface.
To start a PaddlePaddle training job:
(1) The user runs the paddlecloud command-line tool, which sends the command-line arguments to the paddlecloud HTTP server.
(2) The paddlecloud server converts the command-line arguments into a TrainingJob resource and sends it to the Kubernetes API server.
(3) The EDL controller, which monitors events about TrainingJob resources accepted by the Kubernetes API server, converts the TrainingJob resource into the following Kubernetes resources:
(3.1) a ReplicaSet of the master process
(3.2) a ReplicaSet of the parameter server processes
(3.3) a Job of trainer processes
(4) Default controllers provided by Kubernetes, which monitor events about ReplicaSets and Jobs, create and maintain the Pods.
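Step (3) can be pictured as a function from a job name to the three child resources. The sketch below is illustrative only: the `child` type, the `plan` function, and the name suffixes are hypothetical and do not reflect how the EDL controller actually names or builds its resources.

```go
package main

import "fmt"

// child describes one Kubernetes resource derived from a TrainingJob.
// The type and the name suffixes used below are hypothetical.
type child struct {
	Kind string
	Name string
}

// plan lists the resources that step (3) derives from a job with the given name:
// two ReplicaSets (master, parameter servers) and one Job (trainers).
func plan(jobName string) []child {
	return []child{
		{Kind: "ReplicaSet", Name: jobName + "-master"},
		{Kind: "ReplicaSet", Name: jobName + "-pserver"},
		{Kind: "Job", Name: jobName + "-trainer"},
	}
}

func main() {
	for _, c := range plan("job-1") {
		fmt.Printf("%s/%s\n", c.Kind, c.Name)
	}
}
```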
An example TrainingJob instance:
apiVersion: paddlepaddle.org/v1
kind: TrainingJob
metadata:
  name: job-1
spec:
  image: "paddlepaddle/paddlecloud-job"
  port: 7164
  ports_num: 1
  ports_num_for_sparse: 1
  fault_tolerant: true
  imagePullSecrets:
    - name: myregistrykey
  hostNetwork: true
  trainer:
    entrypoint: "python train.py"
    workspace: "/home/job-1/"
    min-instance: 3
    max-instance: 6
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
  pserver:
    min-instance: 3
    max-instance: 3
    resources:
      limits:
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
  master:
    resources:
      limits:
        cpu: "800m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "600Mi"
+k8s:deepcopy-gen=true +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
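Because the types in this package are plain Go structs with JSON tags, the JSON form of a job spec can be decoded with the standard library. The sketch below uses trimmed, hypothetical mirror types (`trainerSpec`, `jobSpec`) so that it runs without the Kubernetes dependencies; only the JSON tags are taken from the declarations on this page, and `decodeSpec` is an illustrative helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// trainerSpec and jobSpec are trimmed, hypothetical mirrors of TrainerSpec and
// TrainingJobSpec. They reuse the JSON tags documented for this package.
type trainerSpec struct {
	Entrypoint  string `json:"entrypoint"`
	Workspace   string `json:"workspace"`
	MinInstance int    `json:"min-instance"`
	MaxInstance int    `json:"max-instance"`
}

type jobSpec struct {
	Image         string      `json:"image,omitempty"`
	Port          int         `json:"port,omitempty"`
	FaultTolerant bool        `json:"fault_tolerant,omitempty"`
	Trainer       trainerSpec `json:"trainer"`
}

// decodeSpec parses the JSON form of a job spec.
func decodeSpec(data []byte) (jobSpec, error) {
	var s jobSpec
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	data := []byte(`{
		"image": "paddlepaddle/paddlecloud-job",
		"port": 7164,
		"fault_tolerant": true,
		"trainer": {
			"entrypoint": "python train.py",
			"workspace": "/home/job-1/",
			"min-instance": 3,
			"max-instance": 6
		}
	}`)
	s, err := decodeSpec(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Image, s.Trainer.MinInstance, s.Trainer.MaxInstance)
}
```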
func (*TrainingJob) DeepCopy ¶
func (in *TrainingJob) DeepCopy() *TrainingJob
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TrainingJob.
func (*TrainingJob) DeepCopyInto ¶
func (in *TrainingJob) DeepCopyInto(out *TrainingJob)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*TrainingJob) DeepCopyObject ¶
func (in *TrainingJob) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
func (*TrainingJob) Elastic ¶
func (s *TrainingJob) Elastic() bool
Elastic returns true if the job can scale to more workers.
func (*TrainingJob) GPU ¶
func (s *TrainingJob) GPU() int
GPU converts the resource limit Quantity to an int.
func (*TrainingJob) NeedGPU ¶
func (s *TrainingJob) NeedGPU() bool
NeedGPU returns true if the job needs GPU resources to run.
func (*TrainingJob) String ¶
func (s *TrainingJob) String() string
type TrainingJobList ¶
type TrainingJobList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`
	Items           []TrainingJob `json:"items"`
}
TrainingJobList definition +k8s:deepcopy-gen=true +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
func (*TrainingJobList) DeepCopy ¶
func (in *TrainingJobList) DeepCopy() *TrainingJobList
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TrainingJobList.
func (*TrainingJobList) DeepCopyInto ¶
func (in *TrainingJobList) DeepCopyInto(out *TrainingJobList)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*TrainingJobList) DeepCopyObject ¶
func (in *TrainingJobList) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
type TrainingJobSpec ¶
type TrainingJobSpec struct {
	// General job attributes.
	Image             string                    `json:"image,omitempty"`
	Port              int                       `json:"port,omitempty"`
	PortsNum          int                       `json:"ports_num,omitempty"`
	PortsNumForSparse int                       `json:"ports_num_for_sparse,omitempty"`
	FaultTolerant     bool                      `json:"fault_tolerant,omitempty"`
	Passes            int                       `json:"passes,omitempty"`
	Volumes           []v1.Volume               `json:"volumes"`
	VolumeMounts      []v1.VolumeMount          `json:"VolumeMounts"`
	ImagePullSecrets  []v1.LocalObjectReference `json:"imagePullSecrets,omitempty"`
	HostNetwork       bool                      `json:"hostNetwork,omitempty"`

	// Job components.
	Trainer TrainerSpec `json:"trainer"`
	Pserver PserverSpec `json:"pserver"`
	Master  MasterSpec  `json:"master,omitempty"`
}
TrainingJobSpec definition +k8s:deepcopy-gen=true
func (*TrainingJobSpec) DeepCopy ¶
func (in *TrainingJobSpec) DeepCopy() *TrainingJobSpec
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TrainingJobSpec.
func (*TrainingJobSpec) DeepCopyInto ¶
func (in *TrainingJobSpec) DeepCopyInto(out *TrainingJobSpec)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type TrainingJobState ¶
type TrainingJobState string
TrainingJobState definition
const (
	StateCreated TrainingJobState = "Created"
	StateRunning TrainingJobState = "Running"
	StateFailed  TrainingJobState = "Failed"
	StateSucceed TrainingJobState = "Succeed"
)
TrainingJobState consts
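A typical use of these constants is deciding whether a job still needs attention. The sketch below redeclares the type and constants locally so that it is self-contained; the `finished` helper is hypothetical, and treating Failed and Succeed as the only terminal states is an assumption rather than something this page states.

```go
package main

import "fmt"

// TrainingJobState and its constants are redeclared here from the
// declarations on this page so the example compiles on its own.
type TrainingJobState string

const (
	StateCreated TrainingJobState = "Created"
	StateRunning TrainingJobState = "Running"
	StateFailed  TrainingJobState = "Failed"
	StateSucceed TrainingJobState = "Succeed"
)

// finished is a hypothetical helper. It assumes Failed and Succeed are
// terminal states and that every other state means the job is still active.
func finished(s TrainingJobState) bool {
	switch s {
	case StateFailed, StateSucceed:
		return true
	}
	return false
}

func main() {
	for _, s := range []TrainingJobState{StateCreated, StateRunning, StateFailed, StateSucceed} {
		fmt.Printf("%s finished=%v\n", s, finished(s))
	}
}
```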
type TrainingJobStatus ¶
type TrainingJobStatus struct {
	State   TrainingJobState `json:"state,omitempty"`
	Message string           `json:"message,omitempty"`
}
TrainingJobStatus definition +k8s:deepcopy-gen=true
func (*TrainingJobStatus) DeepCopy ¶
func (in *TrainingJobStatus) DeepCopy() *TrainingJobStatus
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new TrainingJobStatus.
func (*TrainingJobStatus) DeepCopyInto ¶
func (in *TrainingJobStatus) DeepCopyInto(out *TrainingJobStatus)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.