亲和性调度
source link: https://tangxusc.github.io/2019/05/%E4%BA%B2%E5%92%8C%E6%80%A7%E8%B0%83%E5%BA%A6/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
亲和性调度
May 22, 2019
in k8s
在kubernetes中对pod的调度由调度控制器负责,在调度的时候有很大的随机性.
但是我们很多时候需要容器更多的分散在不同的node上,有一些情况又需要尽量集中在一起,减少网络开销
将pod调度到特定的节点上,其实有以下三种方法:
- 直接在pod上设置
spec.nodeName=节点名称
- 使用nodeSelector,在pod中设置
spec.nodeSelector
的map结构为目标node的label - 使用亲和性调度(affinity)
- 自定义调度器
上面几种方法也都有优缺点:
- nodeName定的太死了,node宕机后,pod就无法启动.
- nodeSelector以label为基础元素提供了调度,但是还是无法做到更细致的调度(例如,两个pod放在不同的node上),并且nodeSelector官方明确表示最终会被弃用
- 自定义调度虽然拥有最高的自由度,可复杂度也是最高的(多集群调度器mutilcluster-scheduler就是这么干的)
- 亲和性调度会提高master的cpu使用率,因为会进行计算
看到了优缺点,显然亲和性调度更好.
亲和性调度在kubernetes中分为两个字段来标示:
requiredDuringSchedulingIgnoredDuringExecution
: 可以理解为必须满足的条件.
preferredDuringSchedulingIgnoredDuringExecution
: 可以理解为推荐满足的条件.
DuringSchedulingIgnoredDuringExecution
表示在调度期间忽略节点/pod的变动条件
其中matchExpressions.operator
的取值又有如下几个值:
//源代码来自:k8s.io/[email protected]/pkg/apis/meta/v1/types.go
// A label selector operator is the set of operators that can be used in a selector requirement.
type LabelSelectorOperator string
const (
LabelSelectorOpIn LabelSelectorOperator = "In"
LabelSelectorOpNotIn LabelSelectorOperator = "NotIn"
LabelSelectorOpExists LabelSelectorOperator = "Exists"
LabelSelectorOpDoesNotExist LabelSelectorOperator = "DoesNotExist"
)
In
: label在值列表中
NotIn
: label不在值列表中
Exists
: label存在(空字符串也算存在),并不关心值
DoesNotExist
: label不存在
在调度的过程中,还存在一个权重(weight
)的字段.该字段用于标示preferredDuringSchedulingIgnoredDuringExecution
在调度的时候,各数组中的值的权重,范围为0-100
node亲和性调度
节点的亲和力调度全部在spec.affinity.nodeAffinity
字段下,实例如下
apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity
spec:
affinity:
# 只有这一个字段,说白了,只有亲和性
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
# 可配置多个match,满足任意一个即可调度
- matchExpressions:
- key: kubernetes.io/e2e-az-name
operator: In
values:
- e2e-az1
- e2e-az2
preferredDuringSchedulingIgnoredDuringExecution:
# 可配置为多个带权重的建议性调度,通过计算选择出最合适的节点进行调度
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
containers:
- name: with-node-affinity
image: k8s.gcr.io/pause:2.0
注意: node亲和性调度只存在亲和性,不存在反亲和性调度,在kubernetes的pod源码中也只找到了亲和性的字段.
pod亲和性和反亲和性
注意: pod亲和性需要master进行大量计算,对于特别多的pod的集群,性能堪忧.
再注意: pod的反亲和性需要集群中的节点有适量的标签作为
topologyKey
,如果缺少,则导致意外行为.
pod的亲和性和反亲和性由spec.affinity.podAffinity
和spec.affinity.podAntiAffinity
标示.实例如下:
apiVersion: v1
kind: Pod
metadata:
name: with-pod-affinity
spec:
affinity:
# 注意,pod是单独的亲和性
podAffinity:
# 亲和性一样提供了 "必须" 和 "建议" 规则,这里只写出了一个
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S1
topologyKey: failure-domain.beta.kubernetes.io/zone
# pod单独提供了反亲和性
podAntiAffinity:
# 反亲和性一样提供了 "必须" 和 "建议" 规则,这里只写出了一个
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S2
topologyKey: failure-domain.beta.kubernetes.io/zone
containers:
- name: with-pod-affinity
image: k8s.gcr.io/pause:2.0
其中的topologyKey
标示:按照node的failure-domain.beta.kubernetes.io/zone
标签值分组,统计其中的pod的label,然后用来匹配亲和性和反亲和性.
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: web-server
spec:
replicas: 3
template:
metadata:
labels:
app: web-store
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-store
topologyKey: "kubernetes.io/hostname"
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- store
topologyKey: "kubernetes.io/hostname"
containers:
- name: web-app
image: nginx:1.12-alpine
在这个示例中,我们对于pod的调度如下:
- 以主机分组(topologyKey: “kubernetes.io/hostname”),必须(requiredDuring)反对(podAntiAffinity) pod具有
app=webstore
的pod调度在一起 - 以主机分组(topologyKey: “kubernetes.io/hostname”),必须(requiredDuring)亲和(podAffinity) pod具有
app= store
标签
连接多集群的几个工具(估计要翻墙)
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK