4

亲和性调度

 2 years ago
source link: https://tangxusc.github.io/2019/05/%E4%BA%B2%E5%92%8C%E6%80%A7%E8%B0%83%E5%BA%A6/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

亲和性调度

May 22, 2019

in k8s

在kubernetes中对pod的调度由调度控制器负责,在调度的时候有很大的随机性.

但是我们很多时候需要容器更多的分散在不同的node上,有一些情况又需要尽量集中在一起,减少网络开销

将pod调度到特定的节点上,其实有以下三种方法:

  1. 直接在pod上设置spec.nodeName=节点名称
  2. 使用nodeSelector,在pod中设置spec.nodeSelector的map结构为目标node的label
  3. 使用亲和性调度(affinity)
  4. 自定义调度器

上面几种方法也都有优缺点:

  • nodeName定的太死了,node宕机后,pod就无法启动.
  • nodeSelector以label为基础元素提供了调度,但是还是无法做到更细致的调度(例如,两个pod放在不同的node上),并且nodeSelector官方明确表示最终会被弃用
  • 自定义调度虽然拥有最高的自由度,可复杂度也是最高的(多集群调度器mutilcluster-scheduler就是这么干的)
  • 亲和性调度会提高master的cpu使用率,因为会进行计算

看到了优缺点,显然亲和性调度更好.

亲和性调度在kubernetes中分为两个字段来标示:

requiredDuringSchedulingIgnoredDuringExecution: 可以理解为必须满足的条件.

preferredDuringSchedulingIgnoredDuringExecution: 可以理解为推荐满足的条件.

DuringSchedulingIgnoredDuringExecution表示在调度期间忽略节点/pod的变动条件

其中matchExpressions.operator的取值又有如下几个值:

//源代码来自:k8s.io/[email protected]/pkg/apis/meta/v1/types.go
// A label selector operator is the set of operators that can be used in a selector requirement.
type LabelSelectorOperator string

const (
	LabelSelectorOpIn           LabelSelectorOperator = "In"
	LabelSelectorOpNotIn        LabelSelectorOperator = "NotIn"
	LabelSelectorOpExists       LabelSelectorOperator = "Exists"
	LabelSelectorOpDoesNotExist LabelSelectorOperator = "DoesNotExist"
)

In: label在值列表中

NotIn: label不在值列表中

Exists: label存在(空字符串也算存在),并不关心值

DoesNotExist: label不存在

在调度的过程中,还存在一个权重(weight)的字段.该字段用于标示preferredDuringSchedulingIgnoredDuringExecution在调度的时候,各数组中的值的权重,范围为0-100

node亲和性调度

节点的亲和力调度全部spec.affinity.nodeAffinity字段下,实例如下

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
  	# 只有这一个字段,说白了,只有亲和性
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # 可配置多个match,满足任意一个即可调度
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      # 可配置为多个带权重的建议性调度,通过计算选择出最合适的节点进行调度
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

注意: node亲和性调度只存在亲和性,不存在反亲和性调度,在kubernetes的pod源码中也只找到了亲和性的字段.

pod亲和性和反亲和性

注意: pod亲和性需要master进行大量计算,对于特别多的pod的集群,性能堪忧.

再注意: pod的反亲和性需要集群中的节点有适量的标签作为topologyKey,如果缺少,则导致意外行为.

pod的亲和性和反亲和性由spec.affinity.podAffinityspec.affinity.podAntiAffinity标示.实例如下:

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    # 注意,pod是单独的亲和性
    podAffinity:
      # 亲和性一样提供了 "必须" 和  "建议" 规则,这里只写出了一个
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    # pod单独提供了反亲和性
    podAntiAffinity:
      # 反亲和性一样提供了 "必须" 和  "建议" 规则,这里只写出了一个
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

其中的topologyKey标示:按照node的failure-domain.beta.kubernetes.io/zone标签值分组,统计其中的pod的label,然后用来匹配亲和性和反亲和性.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine

在这个示例中,我们对于pod的调度如下:

  1. 以主机分组(topologyKey: “kubernetes.io/hostname”),必须(requiredDuring)反对(podAntiAffinity) pod具有 app=webstore的pod调度在一起
  2. 以主机分组(topologyKey: “kubernetes.io/hostname”),必须(requiredDuring)亲和(podAffinity) pod具有app= store标签

连接多集群的几个工具(估计要翻墙)

kubernetes亲和性官方文档


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK