
Docker and K8S Study Notes (25): Pod Scheduling Strategies (Part 2)

1. Pod Affinity and Anti-Affinity Scheduling

In real-world applications we often run into special scheduling requirements. Some Pods depend on each other and call each other frequently, so they should be placed on the same node, network segment, rack, or region whenever possible; this is inter-Pod affinity. Conversely, to avoid resource contention or to improve fault tolerance, we sometimes need certain Pods to stay as far away from specific other Pods as possible; this is inter-Pod anti-affinity. Simply put, these rules decide whether two or more related kinds of Pods may coexist in the same topology domain or must be kept apart.

Topology domain: a topology domain is a set of Nodes that share the same physical location, such as the same rack, data center, or geographic area. In Kubernetes a zone usually denotes a smaller failure domain such as a single data center, while a region denotes a larger area, such as a geographic region containing several zones.
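The topology domain an affinity rule operates on is chosen through topologyKey, which names a node label. As a minimal sketch (assuming the nodes carry the standard topology.kubernetes.io/zone label), the anti-affinity term used in the examples below could spread replicas across zones instead of across individual hosts:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - store
      # topologyKey names the node label that defines the topology domain;
      # topology.kubernetes.io/zone groups nodes by zone rather than by host.
      topologyKey: "topology.kubernetes.io/zone"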

Pod affinity and anti-affinity are declared in the Pod definition through affinity terms, each of which carries a topologyKey. Let's walk through a concrete example. Suppose we deploy a web application and its corresponding redis cache. Normally we want the redis replicas spread across different machines, each web replica co-located on the same machine as a redis instance, and the web replicas themselves also spread across machines. How do we achieve this?

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two Pods that both carry app=store.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            # hostname is unique per node, so each node is its own topology domain.
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:latest

Above is the redis Deployment, with three replicas and the app=store label selector. The Deployment configures podAntiAffinity with topologyKey set to kubernetes.io/hostname, which ensures the scheduler never places two of the replicas on the same node.

[root@kubevm1 workspace]# kubectl create -f redis_deployment.yml
deployment.apps/redis-cache created
[root@kubevm1 workspace]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
redis-cache-5d67cf4885-4vj4q   1/1     Running   0          15m   10.244.0.7    kubevm1   <none>           <none>
redis-cache-5d67cf4885-bn7dt   1/1     Running   0          15m   10.244.2.18   kubevm3   <none>           <none>
redis-cache-5d67cf4885-rx2b9   1/1     Running   0          15m   10.244.1.35   kubevm2   <none>           <none>

Next is the web application's Deployment, which configures both podAntiAffinity and podAffinity: each replica must be placed on a node that already runs a Pod carrying the app=store label, and at the same time no two web-app replicas may be scheduled onto the same node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-app
  replicas: 3
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-app
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:latest

Let's look at the result:

[root@kubevm1 workspace]# kubectl get pods -o wide
NAME                           READY   STATUS             RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
redis-cache-5d67cf4885-4vj4q   1/1     Running            0          20m     10.244.0.7    kubevm1   <none>           <none>
redis-cache-5d67cf4885-bn7dt   1/1     Running            0          20m     10.244.2.18   kubevm3   <none>           <none>
redis-cache-5d67cf4885-rx2b9   1/1     Running            0          20m     10.244.1.35   kubevm2   <none>           <none>
web-server-f98798775-87fdn     1/1     Running            0          3m33s   10.244.1.36   kubevm2   <none>           <none>
web-server-f98798775-wd7vk     1/1     Running            0          3m33s   10.244.0.8    kubevm1   <none>           <none>
web-server-f98798775-whlbr     1/1     Running            0          3m33s   10.244.2.19   kubevm3   <none>           <none>

2. Taints and Tolerations

Node affinity lets Pods be attracted to certain Nodes; taints do the opposite, letting a Node repel Pods. For example, when a Node's disk is full and needs cleanup, or its memory or CPU usage is already high, we don't want new Pods scheduled onto it. A tainted Node is not necessarily broken or unusable, though: we can still schedule selected Pods onto it by giving them a matching toleration. By default, once a Node carries one or more taints, no Pod will be scheduled onto it unless the Pod explicitly declares that it tolerates those taints. Let's demonstrate with an example.

[root@kubevm1 ~]# kubectl taint nodes kubevm2 disk=full:NoSchedule
node/kubevm2 tainted
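We can confirm that the taint took effect with kubectl describe (a quick check; the exact layout of the output may vary slightly across kubectl versions):

[root@kubevm1 ~]# kubectl describe node kubevm2 | grep Taints
Taints:             disk=full:NoSchedule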

Here we tainted kubevm2 with key disk, value full, and effect NoSchedule: unless a Pod explicitly declares that it can tolerate this "disk full" taint, it will not be scheduled onto kubevm2. Using the redis Deployment from above, let's first see what happens when no toleration is set:

[root@kubevm1 workspace]# kubectl create -f redis_deployment.yml
[root@kubevm1 workspace]# kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
redis-cache-5d67cf4885-2xfdt   0/1     Pending   0          28s
redis-cache-5d67cf4885-j2zxh   1/1     Running   0          28s
redis-cache-5d67cf4885-mbk5g   1/1     Running   0          28s
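Describing the Pending Pod shows why it cannot be placed (a diagnostic sketch; the exact event message depends on the Kubernetes version):

[root@kubevm1 workspace]# kubectl describe pod redis-cache-5d67cf4885-2xfdt

The Events section at the end of the output will contain a FailedScheduling event explaining that the remaining node has a taint the Pod does not tolerate, while the other nodes conflict with the Pod's anti-affinity rules.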

Because the redis Deployment sets Pod anti-affinity, every replica must land on a different node, so once kubevm2 is tainted, one of the redis Pods stays in the Pending state and cannot be scheduled. OK, now let's declare a toleration and try again:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      tolerations:
      - key: "disk"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:latest

Let's create this Deployment and check the result:

[root@kubevm1 workspace]# kubectl get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
redis-cache-d67c79fd7-r2mw9   1/1     Running   0          44s   10.244.2.21   kubevm3   <none>           <none>
redis-cache-d67c79fd7-t2vsh   1/1     Running   0          44s   10.244.0.10   kubevm1   <none>           <none>
redis-cache-d67c79fd7-xppql   1/1     Running   0          44s   10.244.1.37   kubevm2   <none>           <none>

As we can see, with the toleration in place a redis Pod can once again be scheduled onto kubevm2.
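Once the node has been cleaned up, the taint can be removed by repeating the taint command with a trailing minus sign (standard kubectl syntax):

[root@kubevm1 ~]# kubectl taint nodes kubevm2 disk=full:NoSchedule-
node/kubevm2 untainted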

Note that in a Pod's toleration declaration, the key and effect must match the taint, and one of the following two conditions must also hold:

  • if operator is Exists, no value needs to be specified;

  • if operator is Equal, a value must be specified and it must equal the taint's value.

So we could also write the toleration like this:

tolerations:
- key: "disk"
  operator: "Equal"
  value: "full"
  effect: "NoSchedule"

Besides these, there are two special cases to note (see the sketch after this list):

  • a toleration with an empty key and operator Exists tolerates every taint;

  • a toleration with an empty effect matches every effect for its key.
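For example, the following toleration (a minimal sketch) tolerates every taint regardless of key, value, or effect; this pattern is sometimes used by DaemonSets that must run on every node:

tolerations:
- operator: "Exists"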

