# Scheduling Best Practices

Best practices for Pod scheduling (Affinity, Topology Spread Constraints, etc.).
## What is Scheduling?

Scheduling in Kubernetes is the process of assigning Pods to Nodes. While the scheduler does this automatically based on resource availability, you can influence this decision using rules. Node Affinity attracts pods to a set of nodes (e.g., "run only on SSD nodes"). Pod Affinity/Anti-Affinity attracts or repels pods relative to other pods (e.g., "don't run two database replicas on the same node").
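As a sketch, the "run only on SSD nodes" rule above could be expressed as node affinity (the `disktype: ssd` node label is an illustrative example, not a built-in label):

```yaml
# Sketch: schedule the Pod only on nodes labeled disktype=ssd
# (assumes the nodes have been labeled accordingly, e.g. with
# `kubectl label nodes <node> disktype=ssd`)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd
```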
Official Kubernetes Scheduling Documentation
## NodeSelector / NodeAffinity (Node Autoscaler)

Node autoscaling (e.g. on GCP) requires a `nodeSelector` or `nodeAffinity` so the autoscaler knows which node pool to scale.
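On GKE, for example, a workload can be pinned to a node pool via the `cloud.google.com/gke-nodepool` label (the pool name `my-pool` is an assumed example):

```yaml
# Sketch: target a specific GKE node pool so the node autoscaler
# knows which pool to scale for this workload ("my-pool" is illustrative)
nodeSelector:
  cloud.google.com/gke-nodepool: my-pool
```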
## TopologySpreadConstraints

Pod Topology Spread Constraints should be used whenever possible, with the following configuration:

- `kubernetes.io/hostname`: `DoNotSchedule` (if we absolutely want to distribute workloads over multiple nodes, otherwise use `ScheduleAnyway`)
- `topology.kubernetes.io/zone`: `ScheduleAnyway` (spread pods over different power circuits)
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway # or DoNotSchedule
    nodeTaintsPolicy: Honor
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: metrics-server
    # for a StatefulSet, apply the topologySpreadConstraint only to pods
    # of the same statefulset revision:
    # matchLabelKeys:
    #   - controller-revision-hash
    # for a Deployment, apply the topologySpreadConstraint only to pods
    # of the same deployment revision:
    matchLabelKeys:
      - pod-template-hash
```

Note that if you don't define a `nodeSelector` or `nodeAffinity`, all nodes will be selected for the TopologySpreadConstraint, regardless of their taints. If you only want to include nodes on which the Pod can be scheduled (nodes with no taints and nodes with taints for which the Pod has tolerations), set `nodeTaintsPolicy: Honor`.
## Topology Pod Labels

The following labels can be used to spread pods over different nodes/zones:
| Label | Values | Description |
|---|---|---|
| topology.kubernetes.io/region | nts-north, nts-south | Spread pods over different regions |
| topology.kubernetes.io/zone | nts-north-1, nts-north-2, nts-south-1, nts-south-2 | Power circuits in rack; 1 & 2 are separated (not implemented yet) |
## PodAntiAffinity

Instead of topologySpreadConstraints you can also use podAntiAffinity to distribute replicas over different nodes.

Important: required anti-affinity only works when the number of replicas is at most the number of nodes, since each node can then hold only one matching pod; any additional replicas remain unschedulable.
```yaml
# preferred
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              gitops.natron.io/application: app1
```

```yaml
# required
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            gitops.natron.io/application: app1
```

## PriorityClass
Pod Priority and Preemption classes available:
| PriorityClassName | Value | GlobalDefault | Description |
|---|---|---|---|
| system-node-critical | 2000001000 | - | Kubernetes default |
| system-cluster-critical | 2000000000 | - | Kubernetes default (e.g. metrics-server, DNS, …) |
| operations-priority | 10000000 | - | Cluster operations which are not system critical |
| best-effort | 100000 | true | Best-effort priority |
| operations-best-effort | 10000 | - | Cluster operations which run as best-effort (below customer workload) |
Set `preemptionPolicy: Never` if a class should not preempt (evict) lower-priority Pods.
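A non-preempting PriorityClass could look like this sketch (the manifest mirrors the `operations-best-effort` row of the table above; the Pod and image names are illustrative):

```yaml
# Sketch of a non-preempting PriorityClass: with preemptionPolicy: Never,
# pods with this class wait in the scheduling queue instead of evicting
# lower-priority pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: operations-best-effort
value: 10000
globalDefault: false
preemptionPolicy: Never
description: "Cluster operations which run as best-effort"
---
# Pods reference the class via priorityClassName in their spec
apiVersion: v1
kind: Pod
metadata:
  name: example # illustrative name
spec:
  priorityClassName: operations-best-effort
  containers:
    - name: app
      image: nginx # illustrative image
```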
```yaml
nodeSelector:
  disktype: ssd
```

## Affinity
Advanced scheduling rules.
### Pod Anti-Affinity
Spread pods across nodes/zones for high availability.
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - services
          topologyKey: kubernetes.io/hostname
```

### Topology Spread Constraints
Ensure even distribution of pods.
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: services
```