    Scheduling in Kubernetes



    Pods are the smallest deployable units in Kubernetes and are where our applications run. Scheduling is a core function of Kubernetes: it assigns each pod to a suitable, available node. If you want to understand why Pods are placed onto a particular Node, or learn about the different ways scheduling can be influenced, this chapter is for you!



    kube-scheduler

    kube-scheduler is the default scheduler for Kubernetes and runs as part of the control plane. kube-scheduler is designed so that, if you want and need to, you can write your own scheduling component and use that instead.
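
    If you do run your own scheduler alongside the default one, a Pod opts into it through the spec.schedulerName field. A minimal sketch (the scheduler name my-custom-scheduler is a placeholder, not a real component):

    apiVersion: v1
    kind: Pod
    metadata:
      name: custom-scheduled-pod
    spec:
      schedulerName: my-custom-scheduler  # when omitted, this defaults to "default-scheduler"
      containers:
      - name: nginx
        image: nginx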

    kube-scheduler selects an optimal node to run newly created or not-yet-scheduled (unscheduled) pods. Since containers in pods (and pods themselves) can have different requirements, the scheduler filters out any nodes that don’t meet a Pod’s specific scheduling needs. Alternatively, the API lets you specify a node for a Pod when you create it, but this is unusual and only done in special cases.

    In a cluster, Nodes that meet the scheduling requirements for a Pod are called feasible nodes. If none of the nodes are suitable, the pod remains unscheduled until the scheduler is able to place it.

    The scheduler finds feasible Nodes for a Pod and then runs a set of functions to score the feasible Nodes and picks a Node with the highest score among the feasible ones to run the Pod. The scheduler then notifies the API server about this decision in a process called binding.

    Factors that need to be taken into account for scheduling decisions include individual and collective resource requirements, hardware / software / policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and so on.
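
    As a small illustration of the resource-requirements factor (the pod name and request values below are arbitrary examples), the requests declared in a container spec are what the scheduler compares against each node’s unreserved capacity:

    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-requests-demo
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "500m"      # nodes with less than 0.5 CPU unreserved are filtered out
            memory: "256Mi"  # likewise for nodes with less than 256Mi of allocatable memory left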



    Overview of node selection in kube-scheduler

    kube-scheduler selects a node for the pod in a 2-step operation:

    1. Filtering
    2. Scoring

    The filtering step finds the set of Nodes where it’s feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod’s specific resource requests. After this step, the node list contains any suitable Nodes; often, there will be more than one. If the list is empty, that Pod isn’t (yet) schedulable.

    In the scoring step, the scheduler ranks the remaining nodes to choose the most suitable Pod placement. The scheduler assigns a score to each Node that survived filtering, basing this score on the active scoring rules.

    Finally, kube-scheduler assigns the Pod to the Node with the highest ranking. If there is more than one node with equal scores, kube-scheduler selects one of these at random.
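
    You can see the outcome of filtering, scoring, and binding in the Pod’s events. The exact messages vary between Kubernetes versions, so treat the annotated command below as illustrative only:

    kubectl describe pod <pod-name>
    # In the Events section, a successful binding is recorded by the default scheduler,
    # e.g. "Normal  Scheduled  default-scheduler  Successfully assigned default/<pod-name> to <node>".
    # If filtering leaves no feasible node, a "FailedScheduling" warning explains why each node was rejected.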



    Use nodeName to schedule the Pod

    A scheduler watches for newly created pods and finds the best node for their assignment. It chooses the optimal node based on Kubernetes’ scheduling principles and your configuration options.

    The simplest configuration option is setting the nodeName field in podspec directly as follows:

    root@AlexRampUpVM-01:~# kubectl get node
    NAME                                   STATUS                     ROLES   AGE   VERSION
    aks-nodepool1-14102961-vmss000002      Ready                      agent   25d   v1.26.6
    aks-usernodepool-33612472-vmss000003   Ready                      agent   25d   v1.26.6
    akswin1000002                          Ready,SchedulingDisabled   agent   25d   v1.26.6
    
    root@AlexRampUpVM-01:/tmp# cat schedulingtest.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: schedulingtest
    spec:
      containers:
      - name: nginx
        image: nginx
      nodeName: aks-usernodepool-33612472-vmss000003
     
    root@AlexRampUpVM-01:/tmp# kubectl apply -f schedulingtest.yaml
    pod/schedulingtest created
    
    root@AlexRampUpVM-01:/tmp# kubectl get pod -o wide
    NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
    schedulingtest                   1/1     Running   0          6s      10.243.0.21   aks-usernodepool-33612472-vmss000003              
    

    The schedulingtest pod above runs on aks-usernodepool-33612472-vmss000003 because nodeName bypasses the scheduler and binds the pod to that node directly. However, nodeName has limitations that can leave pods non-functional: the named node may no longer exist (for example after the cloud provider replaces it), it may be out of resources, or it may have intermittent network problems. For this reason, you should not use nodeName outside of testing or development.



    Use nodeSelector to schedule the Pod

    Labels and selectors are key concepts in Kubernetes that allow you to organize and categorize objects, such as pods, services, and nodes, and perform targeted operations on them. Labels are key-value pairs attached to Kubernetes objects, while selectors filter and select objects based on their labels. Together they are the standard way to group related objects.

    Labels

    • Labels are arbitrary key-value pairs attached to Kubernetes objects to identify and categorize them.
    • They are typically used to express metadata about objects, such as their purpose, environment, version, or any other relevant information.
    • Labels are defined within the metadata section of an object and can have multiple labels assigned to a single object.
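
    For example, you can attach a custom label to a node yourself (the disktype=ssd key/value pair below is just an illustrative choice):

    kubectl label nodes aks-usernodepool-33612472-vmss000003 disktype=ssd
    # a trailing dash removes the label again
    kubectl label nodes aks-usernodepool-33612472-vmss000003 disktype-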

    You can check the node labels with the following command:

    root@AlexRampUpVM-01:/tmp# kubectl get node --show-labels
    NAME                                   STATUS                     ROLES   AGE   VERSION   LABELS
    aks-nodepool1-14102961-vmss000002      Ready                      agent   25d   v1.26.6   agentpool=nodepool1,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_B2s,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=eastasia ....
    aks-usernodepool-33612472-vmss000003   Ready                      agent   25d   v1.26.6   agentpool=usernodepool,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_B2ms,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=eastasia ...
    akswin1000002                          Ready,SchedulingDisabled   agent   25d   v1.26.6    ....
    

    nodeSelector

    nodeSelector is the simplest recommended form of node selection constraint. You can add the nodeSelector field to your Pod specification and specify the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have each of the labels you specify.

    root@AlexRampUpVM-01:/tmp# cat scheduling_nodeselector.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: schedulingwithnodeselector
    spec:
      containers:
      - name: nginx
        image: nginx
      nodeSelector:
        agentpool: usernodepool
    
    root@AlexRampUpVM-01:/tmp# kubectl apply -f scheduling_nodeselector.yaml
    pod/schedulingwithnodeselector created
    
    root@AlexRampUpVM-01:/tmp# kubectl get pod -o wide
    NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
    schedulingtest                   1/1     Running   0          58m     10.243.0.21   aks-usernodepool-33612472-vmss000003              
    schedulingwithnodeselector       1/1     Running   0          17s     10.243.0.6    aks-usernodepool-33612472-vmss000003              
    

    For the schedulingwithnodeselector pod above, Kubernetes Scheduler will find a node with the agentpool: usernodepool label.

    nodeSelector is an efficient way to constrain pods to nodes with specific labels, but it can only express exact matches on label keys and values. Kubernetes offers two more expressive features for complicated scheduling requirements: node affinity, which is set on pods to attract them to a set of nodes; and taints and tolerations, which are set on nodes to repel a set of pods. These features are discussed below.



    Use nodeAffinity to schedule the Pod

    Node affinity is a set of constraints defined on pods that determine which nodes are eligible for scheduling. You can define hard and soft requirements for a pod’s node assignment using affinity rules. For instance, you can configure a pod to run only on nodes with GPUs, and preferably on one with an NVIDIA_TESLA_V100, for your deep learning workload. The scheduler evaluates the rules and tries to find a suitable node within the defined constraints. Like nodeSelector, node affinity rules work with node labels, but they are more expressive.

    Four affinity rule names follow this naming pattern (note that only the two IgnoredDuringExecution variants are currently implemented in Kubernetes; the RequiredDuringExecution variants are still planned):

    • requiredDuringSchedulingIgnoredDuringExecution
    • requiredDuringSchedulingRequiredDuringExecution
    • preferredDuringSchedulingIgnoredDuringExecution
    • preferredDuringSchedulingRequiredDuringExecution

    These names combine two criteria, required or preferred, with two stages, Scheduling and Execution. Rules starting with required describe hard requirements that must be met. Rules starting with preferred are soft requirements that the scheduler tries to honor but does not guarantee. The Scheduling stage refers to the initial assignment of the pod to a node. The Execution stage covers what happens when node labels change after that assignment.

    If a rule ends with IgnoredDuringExecution, the scheduler does not re-check it after the initial assignment: the pod keeps running even if the node’s labels change so that the rule no longer matches. A rule ending with RequiredDuringExecution would require the rule to keep holding while the pod runs, evicting it from a node that no longer satisfies the rule, but these variants are not yet implemented.

    Check out the following example to help you grasp these affinities:

    root@AlexRampUpVM-01:/tmp# kubectl get node --show-labels
    NAME                                   STATUS                     ROLES   AGE     VERSION   LABELS
    aks-nodepool1-14102961-vmss000002      Ready                      agent   25d     v1.26.6   ...topology.kubernetes.io/region=eastasia,topology.kubernetes.io/zone=0
    aks-usernodepool-33612472-vmss000003   Ready                      agent   25d     v1.26.6   ...topology.kubernetes.io/region=eastasia,topology.kubernetes.io/zone=0
    aks-usernodepool-33612472-vmss000004   Ready                      agent   8m25s   v1.26.6   ...topology.kubernetes.io/region=eastasia,topology.kubernetes.io/zone=0
    akswin1000002                          Ready,SchedulingDisabled   agent   25d     v1.26.6   ...topology.kubernetes.io/region=eastasia,topology.kubernetes.io/zone=0
    
    root@AlexRampUpVM-01:/tmp# cat scheduling_nodeaffinity.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: schedulingwithnodeaffinity
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values:
                - eastasia
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - "1"
                - "2"
      containers:
      - name: nginx
        image: nginx
    
    
    root@AlexRampUpVM-01:/tmp# kubectl apply -f scheduling_nodeaffinity.yaml
    pod/schedulingwithnodeaffinity created
    
    root@AlexRampUpVM-01:/tmp# kubectl get pod -o wide|grep scheduling
    schedulingtest                   1/1     Running   0          75m     10.243.0.21   aks-usernodepool-33612472-vmss000003              
    schedulingwithnodeaffinity       1/1     Running   0          25s     10.243.0.17   aks-usernodepool-33612472-vmss000004              
    schedulingwithnodeselector       1/1     Running   0          17m     10.243.0.6    aks-usernodepool-33612472-vmss000003              
    

    The schedulingwithnodeaffinity pod above has a required rule telling the scheduler to place the pod only on a node in the eastasia region, and a preferred rule saying that a node in zone 1 or zone 2 should be favored. Since every node in this cluster is in zone 0, the preference cannot be satisfied, but the pod is still scheduled because preferred rules are not mandatory.

    Using affinity rules, you can make Kubernetes scheduling decisions work for your custom requirements.
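
    The matchExpressions operator is not limited to In; NotIn, Exists, DoesNotExist, Gt, and Lt are also supported, so exclusions can be expressed as well. A minimal sketch reusing the agentpool label from this cluster (the rule itself is only illustrative):

    # fragment to place under spec: in a Pod manifest
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: agentpool
              operator: NotIn    # keep the pod off the system node pool
              values:
              - nodepool1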



    Use Taints and Tolerations to schedule the Pod

    Not all nodes in a Kubernetes cluster are the same. Some may have special hardware, such as GPUs, fast disks, or extra network capabilities, and you may want to dedicate certain nodes to testing, data protection, or particular user groups. Taints can be added to nodes to repel pods, as in the following example:

    root@AlexRampUpVM-01:/tmp# kubectl taint nodes aks-usernodepool-33612472-vmss000004 test-environment=true:NoSchedule
    node/aks-usernodepool-33612472-vmss000004 tainted
    

    With the taint test-environment=true:NoSchedule in place, the Kubernetes scheduler will not assign any pod to this node unless the pod has a matching toleration in its podspec:

    root@AlexRampUpVM-01:/tmp# cat schedulingwithtoleration.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: schedulingwithtoleration
    spec:
      containers:
      - name: nginx
        image: nginx
      tolerations:
      - key: "test-environment"
        operator: "Exists"
        effect: "NoSchedule"
    
    root@AlexRampUpVM-01:/tmp# kubectl apply -f schedulingwithtoleration.yaml
    pod/schedulingwithtoleration created
    
    root@AlexRampUpVM-01:/tmp# kubectl get pod -o wide
    NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
    schedulingwithtoleration         1/1     Running   0          5s      10.243.0.16   aks-usernodepool-33612472-vmss000004              
    

    Taints and tolerations work together to let the Kubernetes scheduler keep ordinary pods off certain nodes and admit only the pods that explicitly tolerate them.
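
    Note that a toleration only allows a pod onto the tainted node; it does not force the pod there. To fully dedicate a node, combine the taint with a nodeSelector or node affinity rule that targets the same node. The taint from the example above can be removed again by appending a dash to the effect:

    kubectl taint nodes aks-usernodepool-33612472-vmss000004 test-environment=true:NoSchedule-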



  • Original article: https://blog.csdn.net/mukouping82/article/details/133970875