# Scheduling Best Practices

Best practices for Pod scheduling (Affinity, Topology Spread Constraints, etc.).
## What is Scheduling?

Scheduling in Kubernetes is the process of assigning Pods to Nodes. While the scheduler does this automatically based on resource availability, you can influence this decision using rules. Node Affinity attracts pods to a set of nodes (e.g., "run only on SSD nodes"). Pod Affinity/Anti-Affinity attracts or repels pods relative to other pods (e.g., "don't run two database replicas on the same node").
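As a sketch, the "run only on SSD nodes" rule above could be expressed as node affinity (the `disktype: ssd` node label is an illustrative example, not a built-in label):

```yaml
# Sketch: schedule the Pod only on nodes labeled disktype=ssd
# (assumes the nodes have been labeled accordingly, e.g. with
# `kubectl label nodes <node> disktype=ssd`)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd
```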
Official Kubernetes Scheduling Documentation
## NodeSelector / NodeAffinity (Node Autoscaler)

Node autoscaling (e.g. on GCP) requires a `nodeSelector` or `nodeAffinity` so the autoscaler knows which node pool to scale.
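On GKE, for example, a workload can be pinned to a node pool via the `cloud.google.com/gke-nodepool` label (the pool name `my-pool` is an assumed example):

```yaml
# Sketch: target a specific GKE node pool so the node autoscaler
# knows which pool to scale for this workload ("my-pool" is illustrative)
nodeSelector:
  cloud.google.com/gke-nodepool: my-pool
```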
## TopologySpreadConstraints

Pod Topology Spread Constraints should be used whenever possible, with the following configuration:

- `kubernetes.io/hostname`: `DoNotSchedule` (if we absolutely want to distribute workloads over multiple nodes, otherwise use `ScheduleAnyway`)
- `topology.kubernetes.io/zone`: `ScheduleAnyway` (spread pods over different power circuits)
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway # or DoNotSchedule
    nodeTaintsPolicy: Honor
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: metrics-server
    # for a StatefulSet, apply the topologySpreadConstraint only to pods
    # of the same statefulset revision:
    # matchLabelKeys:
    #   - controller-revision-hash
    # for a Deployment, apply the topologySpreadConstraint only to pods
    # of the same deployment revision:
    matchLabelKeys:
      - pod-template-hash
```

Note that if you don't define a `nodeSelector` or `nodeAffinity`, all nodes will be selected for the TopologySpreadConstraint, regardless of their taints. If you only want to include nodes on which the Pod can be scheduled (nodes with no taints and nodes with taints for which the Pod has tolerations), set `nodeTaintsPolicy: Honor`.
## Topology Pod Labels

The following labels can be used to spread pods over different nodes/zones:
| Label | Values | Description |
|---|---|---|
| topology.kubernetes.io/region | nts-north, nts-south | Spread pods over different regions |
| topology.kubernetes.io/zone | nts-north-1, nts-north-2, nts-south-1, nts-south-2 | Power circuits in rack; 1 & 2 are separated (not implemented yet) |
## PodAntiAffinity

Instead of topologySpreadConstraints you can also use podAntiAffinity to distribute replicas over different nodes.

Important: required anti-affinity only works when the number of replicas is at most the number of nodes, since each node can then hold only one matching pod; any additional replicas remain unschedulable.
```yaml
# preferred
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              gitops.natron.io/application: app1
```

```yaml
# required
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            gitops.natron.io/application: app1
```

## PriorityClass
Pod Priority and Preemption classes available:
| PriorityClassName | Value | GlobalDefault | Description |
|---|---|---|---|
| system-node-critical | 2000001000 | - | Kubernetes default |
| system-cluster-critical | 2000000000 | - | Kubernetes default (e.g. metrics-server, DNS, …) |
| operations-priority | 10000000 | - | Cluster operations which are not system critical |
| best-effort | 100000 | true | Best-effort priority |
| operations-best-effort | 10000 | - | Cluster operations which run as best-effort (below customer workload) |
Set `preemptionPolicy: Never` if a class should not preempt (evict) lower-priority Pods.
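A non-preempting PriorityClass could look like this sketch (the manifest mirrors the `operations-best-effort` row of the table above; the Pod and image names are illustrative):

```yaml
# Sketch of a non-preempting PriorityClass: with preemptionPolicy: Never,
# pods with this class wait in the scheduling queue instead of evicting
# lower-priority pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: operations-best-effort
value: 10000
globalDefault: false
preemptionPolicy: Never
description: "Cluster operations which run as best-effort"
---
# Pods reference the class via priorityClassName in their spec
apiVersion: v1
kind: Pod
metadata:
  name: example # illustrative name
spec:
  priorityClassName: operations-best-effort
  containers:
    - name: app
      image: nginx # illustrative image
```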
```yaml
nodeSelector:
  disktype: ssd
```

## Affinity
Advanced scheduling rules.
### Pod Anti-Affinity
Spread pods across nodes/zones for high availability.
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - services
          topologyKey: kubernetes.io/hostname
```

### Topology Spread Constraints
Ensure even distribution of pods.
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: services
```