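Affinity scheduling in k8s
For reasons such as efficient communication, it is sometimes necessary to place a group of Pods close to each other (on the same node, rack, zone, or region). A typical case is an application's Pods and the backend Pods that serve their data. Pods like these can be regarded as having an affinity relationship.
The ideal approach is to let the scheduler place the first Pod anywhere, and then have the other Pods that are affine or anti-affine to it positioned dynamically relative to it. This is what Pod affinity and anti-affinity scheduling provide. Inter-Pod affinity also distinguishes between required and preferred rules, which carry the same meaning as they do for node affinity.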
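Pod affinity
Pod affinity (podAffinity) answers the question of which Pods a Pod may share a topology domain with. A topology domain is defined by a node label and can be a single host or a group of hosts such as a cluster or a zone. Pod anti-affinity answers the opposite question: which Pods a Pod must not share a topology domain with. Both deal with the relationship between Pods and other Pods.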
- [root@k8s-01 ~]# kubectl explain deploy.spec.template.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
- KIND: Deployment
- VERSION: apps/v1
-
- RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>
-
- DESCRIPTION:
- If the affinity requirements specified by this field are not met at
- scheduling time, the pod will not be scheduled onto the node. If the
- affinity requirements specified by this field cease to be met at some point
- during pod execution (e.g. due to a pod label update), the system may or
- may not try to eventually evict the pod from its node. When there are
- multiple elements, the lists of nodes corresponding to each podAffinityTerm
- are intersected, i.e. all terms must be satisfied.
-
- Defines a set of pods (namely those matching the labelSelector relative to
- the given namespace(s)) that this pod should be co-located (affinity) or
- not co-located (anti-affinity) with, where co-located is defined as running
- on a node whose value of the label with key <topologyKey> matches that of
- any node on which a pod of the set of pods is running
-
- FIELDS:
- labelSelector <Object>
- A label query over a set of resources, in this case pods.
-
- namespaceSelector <Object>
- A label query over the set of namespaces that the term applies to. The term
- is applied to the union of the namespaces selected by this field and the
- ones listed in the namespaces field. null selector and null or empty
- namespaces list means "this pod's namespace". An empty selector ({})
- matches all namespaces. This field is beta-level and is only honored when
- PodAffinityNamespaceSelector feature is enabled.
-
- namespaces <[]string>
- namespaces specifies a static list of namespace names that the term applies
- to. The term is applied to the union of the namespaces listed in this field
- and the ones selected by namespaceSelector. null or empty namespaces list
- and null namespaceSelector means "this pod's namespace"
-
- topologyKey <string> -required-
- This pod should be co-located (affinity) or not co-located (anti-affinity)
- with the pods matching the labelSelector in the specified namespaces, where
- co-located is defined as running on a node whose value of the label with
- key topologyKey matches that of any node on which any of the selected pods
- is running. Empty topologyKey is not allowed.
-
- [root@k8s-01 ~]#
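Affinity between Pods is defined in the spec.affinity.podAffinity field, and anti-affinity in the spec.affinity.podAntiAffinity field. Each comes in a required and a preferred form, and both support the following key fields.
- topologyKey <string>: the topology key, a node label used to partition the cluster into topology domains; nodes that have the same value for this key belong to the same domain. Required.
- labelSelector <Object>: a label selector that picks the set of Pods the affinity or anti-affinity rule refers to.
- namespaces <[]string>: the namespaces in which labelSelector is evaluated; defaults to the namespace of the Pod being scheduled.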
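The three fields combine into a single affinity term roughly as sketched below. This fragment is only an illustration and was not part of the tests in this article: the label app: backend and the namespace data are hypothetical, while topology.kubernetes.io/zone is the well-known zone label that nodes usually carry on cloud providers.

```yaml
# Fragment of a Pod spec (spec.affinity). The label value and the
# namespace name are hypothetical and chosen only for illustration.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:            # which Pods to look for
        matchLabels:
          app: backend          # hypothetical label
      namespaces:               # where to look; defaults to this Pod's own namespace
      - data                    # hypothetical namespace
      topologyKey: topology.kubernetes.io/zone   # nodes sharing this label value form one domain
```

Read as a sentence: schedule this Pod into a zone that already runs at least one Pod labeled app=backend in the data namespace.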
Below is the YAML used for the test:
- apiVersion: apps/v1
- kind: Deployment
- metadata:
-   name: pod-affinity
-   labels:
-     app: pod-affinity
- spec:
-   replicas: 3
-   selector:
-     matchLabels:
-       app: pod-affinity
-   template:
-     metadata:
-       labels:
-         app: pod-affinity
-     spec:
-       containers:
-       - name: nginx
-         image: nginx
-         ports:
-         - containerPort: 80
-           name: nginxweb
-       affinity:
-         podAffinity:
-           requiredDuringSchedulingIgnoredDuringExecution:  # hard (required) rule
-           - labelSelector:
-               matchExpressions:
-               - key: logging
-                 operator: In
-                 values:
-                 - "true"  # quoted: label values must be strings, not YAML booleans
-             topologyKey: kubernetes.io/hostname
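Here topologyKey is kubernetes.io/hostname, so every node forms its own topology domain. The scheduler then selects only nodes that already run a Pod labeled logging=true.

Listing the Pods shows that all of them were placed on node k8s-03: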
- [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
- pod-affinity-64bc56d789-2bczb 1/1 Running 0 5m25s 10.244.165.213 k8s-03
- pod-affinity-64bc56d789-qgtkd 1/1 Running 0 5m25s 10.244.165.211 k8s-03
- pod-affinity-64bc56d789-w95dv 1/1 Running 0 5m25s 10.244.165.208 k8s-03
- [root@k8s-01 ~]#
Now modify the affinity term in the YAML as follows and change the replica count to 10:
-           - labelSelector:
-               matchExpressions:
-               - key: app
-                 operator: In
-                 values: ["nginx-readiness","nginx-test"]
-             topologyKey: disk
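After applying the YAML, the Pods are spread across two nodes, k8s-02 and k8s-04. Because the topology domain is now defined by the disk node label instead of the hostname, every node in a domain that already hosts a matching Pod is eligible.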
- [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
- pod-affinity-94b66f75b-2cxns 1/1 Running 0 107s 10.244.7.86 k8s-04
- pod-affinity-94b66f75b-6jfrv 1/1 Running 0 107s 10.244.7.87 k8s-04
- pod-affinity-94b66f75b-7bftn 1/1 Running 0 107s 10.244.179.15 k8s-02
- pod-affinity-94b66f75b-9tqgm 1/1 Running 0 107s 10.244.7.85 k8s-04
- pod-affinity-94b66f75b-dnph9 1/1 Running 0 107s 10.244.7.88 k8s-04
- pod-affinity-94b66f75b-fznzb 1/1 Running 0 107s 10.244.179.11 k8s-02
- pod-affinity-94b66f75b-q6lv2 1/1 Running 0 107s 10.244.179.13 k8s-02
- pod-affinity-94b66f75b-s7jj5 1/1 Running 0 107s 10.244.179.16 k8s-02
- pod-affinity-94b66f75b-tn4s4 1/1 Running 0 107s 10.244.179.10 k8s-02
- pod-affinity-94b66f75b-xpbnq 1/1 Running 0 107s 10.244.7.89 k8s-04
- [root@k8s-01 ~]#
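This shows that inter-Pod affinity scheduling can keep closely related or heavily communicating applications in the same location, reducing performance loss by lowering communication latency. Note that if labels change at runtime so that the affinity rule on a Pod is no longer satisfied, the Pod keeps running on its node and is not rescheduled. Also, by default labelSelector only matches Pods in the same namespace as the Pod being scheduled; other namespaces can be targeted by adding the namespaces field.
Pod affinity also supports soft (preferred) rules, which work the same way as for node affinity, so the test procedure is not repeated here.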
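For reference, a preferred rule might look like the following sketch. It reuses the logging=true selector from the pod-affinity Deployment above; the weight of 100 is an illustrative choice and was not tested in this article.

```yaml
# Fragment of spec.template.spec in the pod-affinity Deployment.
# The weight value is illustrative, not taken from the original tests.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:   # soft (preferred) rule
    - weight: 100               # 1-100; a higher weight means a stronger preference
      podAffinityTerm:          # same structure as a required term
        labelSelector:
          matchExpressions:
          - key: logging
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
```

Unlike the required form, each entry wraps the term in podAffinityTerm and adds a weight. If no node satisfies the term, the Pod is still scheduled somewhere else rather than left Pending.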
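Pod anti-affinity
Pod anti-affinity (podAntiAffinity) works the other way around: if a node already runs a certain Pod, the Pod created from our template should not be scheduled onto that node. To try it, we simply change podAffinity in the example above to podAntiAffinity.
Anti-affinity can achieve an effect similar to DaemonSet + nodeSelector, but it is more flexible. With a DaemonSet, if a node goes down there is one Pod fewer, and that Pod is only started again once the node comes back. With anti-affinity, a replacement Pod can be started on any other node that satisfies the topologyKey constraint. Anti-affinity scheduling is therefore typically used to spread Pods of the same application apart, and also to place Pods of different security levels in different regions, racks, or nodes.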
- apiVersion: apps/v1
- kind: Deployment
- metadata:
-   name: pod-antiaffinity
-   labels:
-     app: pod-antiaffinity
- spec:
-   replicas: 3
-   selector:
-     matchLabels:
-       app: pod-antiaffinity
-   template:
-     metadata:
-       labels:
-         app: pod-antiaffinity
-     spec:
-       containers:
-       - name: nginx
-         image: nginx
-         ports:
-         - containerPort: 80
-           name: nginxweb
-       affinity:
-         podAntiAffinity:
-           requiredDuringSchedulingIgnoredDuringExecution:  # hard (required) rule
-           - labelSelector:
-               matchExpressions:
-               - key: app
-                 operator: In
-                 values:
-                 - pod-antiaffinity
-             topologyKey: kubernetes.io/hostname
Each Pod ends up on a different node:
- [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
- pod-antiaffinity-86566d4dd5-bpspt 1/1 Running 0 23s 10.244.61.220 k8s-01
- pod-antiaffinity-86566d4dd5-ggbgc 1/1 Running 0 23s 10.244.179.2 k8s-02
- pod-antiaffinity-86566d4dd5-q5jl4 1/1 Running 0 23s 10.244.7.83 k8s-04
- [root@k8s-01 ~]#
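If the replica count is now changed to 5, one Pod stays in the Pending state: only four nodes (k8s-01 through k8s-04) are available, so no node is left that satisfies the hard anti-affinity rule for the fifth replica.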
- [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
- pod-antiaffinity-86566d4dd5-5h9h7 1/1 Running 0 59s 10.244.61.224 k8s-01
- pod-antiaffinity-86566d4dd5-fslqk 1/1 Running 0 59s 10.244.179.14 k8s-02
- pod-antiaffinity-86566d4dd5-n474x 1/1 Running 0 59s 10.244.165.222 k8s-03
- pod-antiaffinity-86566d4dd5-pcbhs 1/1 Running 0 59s 10.244.7.91 k8s-04
- pod-antiaffinity-86566d4dd5-vqvhv 0/1 Pending 0 59s
- [root@k8s-01 ~]#
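Similarly, Pod anti-affinity also supports soft constraints. The scheduler tries not to place mutually repelling Pods in the same location, but when the constraint cannot be satisfied it may break the rule and schedule the Pod anyway instead of leaving it Pending.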
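As a sketch of the soft variant, the affinity section of the pod-antiaffinity Deployment could be rewritten as below. The weight is an illustrative value and this variant was not part of the tests above.

```yaml
# Fragment of spec.template.spec in the pod-antiaffinity Deployment.
# The weight value is illustrative, not taken from the original tests.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:   # soft (preferred) rule
    - weight: 100               # 1-100; a higher weight means a stronger preference
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-antiaffinity
        topologyKey: kubernetes.io/hostname
```

With a rule like this, a fifth replica on a four-node cluster would be expected to share a node with an existing replica instead of staying Pending, assuming the nodes have enough free resources.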
