super-scheduling

module
v0.0.0-...-3dc4fa0
Published: Sep 3, 2021 License: Apache-2.0

README

Super scheduling

Introduction

This project includes a topology-scheduler and a descheduler extended from descheduler.

topology-scheduler helps schedule pods across zones, regions, or clusters.

We would like this project to be merged upstream in the future, so the CRD and code use names of the form xxx.scheduling.sigs.k8s.io.

Why we need this

TopologySpreadConstraint helps schedule pods with a desired skew, but it cannot place a specific number of replicas in each zone, region, or cluster, e.g.

zoneA: 6 Pods
zoneB: 1 Pod
zoneC: 2 Pods
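
To make the difference from skew-based spreading concrete, the sketch below computes how many more pods each topology value still needs to reach a fixed target. It is only an illustration of the idea; the function name `remaining` and the map-based inputs are hypothetical and are not taken from this project's code.

```go
package main

import "fmt"

// remaining returns, for every topology value in desired, how many more
// pods are needed to reach the desired count. The result is never negative.
// (Hypothetical helper, for illustration only.)
func remaining(desired, current map[string]int) map[string]int {
	out := make(map[string]int, len(desired))
	for zone, want := range desired {
		need := want - current[zone]
		if need < 0 {
			need = 0
		}
		out[zone] = need
	}
	return out
}

func main() {
	desired := map[string]int{"zoneA": 6, "zoneB": 1, "zoneC": 2}
	current := map[string]int{"zoneA": 4, "zoneB": 1} // zoneC has no pods yet
	fmt.Println(remaining(desired, current))
}
```

With the counts above, zoneA still needs 2 pods, zoneB needs none, and zoneC needs 2. A skew constraint alone cannot express these per-zone targets.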

Install

kube-scheduler

1 Apply the CRD

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: topologyschedulingpolicies.scheduling.sigs.k8s.io
spec:
  conversion:
    strategy: None
  group: scheduling.sigs.k8s.io
  names:
    kind: TopologySchedulingPolicy
    listKind: TopologySchedulingPolicyList
    plural: topologyschedulingpolicies
    shortNames:
      - tsp
      - tsps
    singular: topologyschedulingpolicy
  scope: Namespaced
  version: v1alpha1
  versions:
    - name: v1alpha1
      served: true
      storage: true

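If your cluster only supports apiextensions.k8s.io/v1, replace the apiVersion above with apiextensions.k8s.io/v1. Note that the v1 API has no top-level spec.version field and requires a schema for each entry in spec.versions.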

2 Deploy the scheduler

Replace kube-scheduler with this one, and pass a config like the following when starting the scheduler.

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
clientConnection:
  kubeconfig: "REPLACE_ME_WITH_KUBE_CONFIG_PATH"
profiles:
  - schedulerName: default-multicluster
    plugins:
      preFilter:
        enabled:
          - name: TopologyScheduling
      filter:
        enabled:
          - name: TopologyScheduling
        disabled:
          - name: "*"
      score:
        enabled:
          - name: TopologyScheduling
        disabled:
          - name: "*"
      reserve:
        enabled:
          - name: TopologyScheduling
    pluginConfig:
      - name: TopologyScheduling
        args:
          kubeConfigPath: "REPLACE_ME_WITH_KUBE_CONFIG_PATH"
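
The config above enables TopologyScheduling at the filter stage. As a rough mental model of what such a filter could check, the sketch below accepts a node only if its topology value appears in the policy's deployPlacement list and that value has not yet reached its replica target. This is an assumption about the behaviour, not the plugin's actual implementation; the `placement` type and the `nodeAllowed` function are hypothetical, and the sketch does not use the scheduler framework API.

```go
package main

import "fmt"

// placement mirrors one deployPlacement entry (hypothetical type).
type placement struct {
	Name     string
	Replicas int
}

// nodeAllowed reports whether a node whose topology label has the value
// nodeValue may accept one more pod, given how many pods are already
// scheduled per topology value. Values not listed in placements are rejected.
func nodeAllowed(nodeValue string, placements []placement, scheduled map[string]int) bool {
	for _, p := range placements {
		if p.Name == nodeValue {
			return scheduled[nodeValue] < p.Replicas
		}
	}
	return false
}

func main() {
	placements := []placement{{Name: "sh-1", Replicas: 6}, {Name: "nj-2", Replicas: 3}}
	scheduled := map[string]int{"sh-1": 6, "nj-2": 1}

	fmt.Println(nodeAllowed("sh-1", placements, scheduled))  // sh-1 is already full
	fmt.Println(nodeAllowed("nj-2", placements, scheduled))  // nj-2 still has room
	fmt.Println(nodeAllowed("other", placements, scheduled)) // not in the policy
}
```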

To enable multi-cluster scheduling, enable the MultiClusterScheduling plugin in the config, as follows:

      filter:
        enabled:
          - name: MultiClusterScheduling

3 Deploy the descheduler

The descheduler should be deployed as a Deployment in the cluster.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      RemovePodsViolatingTopologySchedulingPolicy:
        enabled: true
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: descheduler-cluster-role
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "watch", "list", "delete", "patch"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: descheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: descheduler-cluster-role-binding
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: descheduler-cluster-role
subjects:
  - name: descheduler-sa
    kind: ServiceAccount
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: descheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: descheduler
  replicas: 1
  template:
    metadata:
      labels:
        app: descheduler
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      tolerations:
      - effect: NoSchedule
        key: role
        value: not-vk
        operator: Equal
      priorityClassName: system-cluster-critical
      containers:
      - name: descheduler
        image: ${your image}
        volumeMounts:
        - mountPath: /policy-dir
          name: policy-volume
        command:
        - "/bin/descheduler"
        args:
        - "--policy-config-file=/policy-dir/policy.yaml"
        - "--v=3"
      restartPolicy: "Always"
      serviceAccountName: descheduler-sa
      volumes:
      - name: policy-volume
        configMap:
          name: descheduler-policy-configmap
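
The policy above enables the RemovePodsViolatingTopologySchedulingPolicy strategy. One plausible reading of that name is that pods beyond a topology value's desired count become eviction candidates. The sketch below illustrates that reading only: the `pod` type, the `podsToEvict` function, and the first-come ordering are assumptions, and the real strategy may select pods differently.

```go
package main

import "fmt"

// pod is a minimal stand-in for a scheduled pod (hypothetical type).
type pod struct {
	Name string
	Zone string // value of the policy's topologyKey on the pod's node
}

// podsToEvict walks the pods in order and returns the names of those that
// exceed the desired count for their topology value. Pods in a topology
// value that the policy does not list have a desired count of zero.
func podsToEvict(pods []pod, desired map[string]int) []string {
	seen := make(map[string]int)
	var evict []string
	for _, p := range pods {
		seen[p.Zone]++
		if seen[p.Zone] > desired[p.Zone] {
			evict = append(evict, p.Name)
		}
	}
	return evict
}

func main() {
	desired := map[string]int{"sh-1": 2, "nj-2": 1}
	pods := []pod{
		{Name: "a", Zone: "sh-1"},
		{Name: "b", Zone: "sh-1"},
		{Name: "c", Zone: "sh-1"}, // third pod in sh-1, one too many
		{Name: "d", Zone: "nj-2"},
	}
	fmt.Println(podsToEvict(pods, desired))
}
```

Evicted pods are recreated by their controller and then pass through the scheduler again, which is why the descheduler and the scheduler plugin are deployed together.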

Use Case

multi zone

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: TopologySchedulingPolicy
metadata:
  name: policy-zone
spec:
  deployPlacement:
    - name: sh-1
      replicas: 6
    - name: nj-2
      replicas: 3
  labelSelector:
    matchLabels:
      cluster-test: "true"
  topologyKey: failure-domain.beta.kubernetes.io/zone
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: env
  name: test
  namespace: default
spec:
  replicas: 9
  selector:
    matchLabels:
      app: env
  template:
    metadata:
      labels:
        app: env
        cluster-test: "true"
        topology-scheduling-policy.scheduling.sigs.k8s.io: policy-zone
    spec:
      containers:
        - image: nginx:latest
          imagePullPolicy: Always
          name: nginx
          resources: { }
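
In the example above the placement replicas (6 + 3) add up to the Deployment's 9 replicas, and the pod template carries the label that the policy's labelSelector matches. The sketch below checks both properties with plain Go types. The struct names and fields are hypothetical mirrors of the YAML, not the project's real API types, and whether the project requires the totals to match is not stated here.

```go
package main

import "fmt"

// DeployPlacement and PolicySpec are hypothetical mirrors of the YAML
// fields shown above; the project's real Go types may differ.
type DeployPlacement struct {
	Name     string
	Replicas int32
}

type PolicySpec struct {
	TopologyKey     string
	DeployPlacement []DeployPlacement
	MatchLabels     map[string]string
}

// totalReplicas sums the replicas requested across all placements.
func totalReplicas(spec PolicySpec) int32 {
	var sum int32
	for _, p := range spec.DeployPlacement {
		sum += p.Replicas
	}
	return sum
}

// selects reports whether every matchLabels entry is present on the pod
// with the same value.
func selects(spec PolicySpec, podLabels map[string]string) bool {
	for k, v := range spec.MatchLabels {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	spec := PolicySpec{
		TopologyKey:     "failure-domain.beta.kubernetes.io/zone",
		DeployPlacement: []DeployPlacement{{"sh-1", 6}, {"nj-2", 3}},
		MatchLabels:     map[string]string{"cluster-test": "true"},
	}
	podLabels := map[string]string{"app": "env", "cluster-test": "true"}

	const deploymentReplicas = 9
	fmt.Println("placements cover the Deployment:", totalReplicas(spec) == deploymentReplicas)
	fmt.Println("policy selects the pod:", selects(spec, podLabels))
}
```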

multi region

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: TopologySchedulingPolicy
metadata:
  name: policy-region
spec:
  deployPlacement:
    - name: nj
      replicas: 6
    - name: sh
      replicas: 3
  labelSelector:
    matchLabels:
      cluster-test: "true"
  topologyKey: failure-domain.beta.kubernetes.io/region
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: env
  name: test
  namespace: default
spec:
  replicas: 9
  selector:
    matchLabels:
      app: env
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: env
        cluster-test: "true"
        topology-scheduling-policy.scheduling.sigs.k8s.io: policy-region
    spec:
      containers:
        - image: nginx:latest
          imagePullPolicy: Always
          name: nginx
          resources: { }

multi cluster

This project can also be used in multi-cluster scenarios by deploying tensile-kube, without deploying the descheduler that ships with tensile-kube.

For example, we add a label cluster-name..scheduling.sigs.k8s.io: cluster1 to a virtual node.

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: TopologySchedulingPolicy
metadata:
  name: policy-cluster
spec:
  deployPlacement:
    - name: cluster1
      replicas: 6
    - name: cluster2
      replicas: 3
  labelSelector:
    matchLabels:
      cluster-test: "true"
  topologyKey: cluster-name..scheduling.sigs.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: env
  name: test
  namespace: default
spec:
  replicas: 9
  selector:
    matchLabels:
      app: env
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: env
        cluster-test: "true"
        topology-scheduling-policy.scheduling.sigs.k8s.io: policy-cluster
    spec:
      containers:
        - image: nginx:latest
          imagePullPolicy: Always
          name: nginx
          resources: { }
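
In the multi-cluster case each virtual node stands for a member cluster, and the policy's topologyKey is the label that names that cluster. The sketch below groups nodes by the value of such a label, which is the grouping a policy like policy-cluster relies on. The `node` type, the `groupByTopology` function, and the node names are hypothetical; the label key is copied verbatim from the example above.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a minimal stand-in for a (virtual) node (hypothetical type).
type node struct {
	Name   string
	Labels map[string]string
}

// groupByTopology groups node names by the value of the given label key.
// Nodes that do not carry the label are skipped.
func groupByTopology(nodes []node, key string) map[string][]string {
	groups := make(map[string][]string)
	for _, n := range nodes {
		if v, ok := n.Labels[key]; ok {
			groups[v] = append(groups[v], n.Name)
		}
	}
	// Sort each group so the result does not depend on input order.
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	// Label key copied verbatim from the policy example above.
	const key = "cluster-name..scheduling.sigs.k8s.io"
	nodes := []node{
		{Name: "vk-b", Labels: map[string]string{key: "cluster2"}},
		{Name: "vk-a", Labels: map[string]string{key: "cluster1"}},
		{Name: "worker-1", Labels: map[string]string{}}, // ordinary node, no cluster label
	}
	for cluster, names := range groupByTopology(nodes, key) {
		fmt.Println(cluster, names)
	}
}
```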

Directories

Path    Synopsis
cmd
    descheduler/app    Package app implements a Server object for running the descheduler.
    descheduler/app/options    Package options provides the descheduler flags
pkg
    apis/config/v1beta1    Package v1beta1 is the v1beta1 version of the API.
    apis/scheduling/v1alpha1    Package v1alpha1 is the v1alpha1 version of the API.
    generated/clientset/versioned    This package has the automatically generated clientset.
    generated/clientset/versioned/fake    This package has the automatically generated fake clientset.
    generated/clientset/versioned/scheme    This package contains the scheme of the automatically generated clientset.
    generated/clientset/versioned/typed/scheduling/v1alpha1    This package has the automatically generated typed clients.
    generated/clientset/versioned/typed/scheduling/v1alpha1/fake    Package fake has the automatically generated clients.
