
1.15 kubelet loops reclaiming ephemeral-storage despite ample nodefs capacity

source link: https://zhangguanzhang.github.io/2021/10/29/kubelet-ephemeral-storage-loop-evicted/

2021/10/29

On a customer's k8s nodes, many pods were being hard-evicted (status Evicted). On-site staff checked partition capacity and inodes, both were normal, yet kubelet kept trying to reclaim ephemeral-storage.

$ uname -a
Linux xxx-2 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
CentOS Linux release 7.4.1708 (Core)
$ kubectl version -o json
{
"clientVersion": {
"major": "1",
"minor": "15",
"gitVersion": "v1.15.5",
"gitCommit": "20c265fef0741dd71a66480e35bd69f18351daea",
"gitTreeState": "clean",
"buildDate": "2019-10-15T19:16:51Z",
"goVersion": "go1.12.10",
"compiler": "gc",
"platform": "linux/amd64"
},
"serverVersion": {
"major": "1",
"minor": "15",
"gitVersion": "v1.15.5",
"gitCommit": "20c265fef0741dd71a66480e35bd69f18351daea",
"gitTreeState": "clean",
"buildDate": "2019-10-15T19:07:57Z",
"goVersion": "go1.12.10",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ docker info
Containers: 5
Running: 4
Paused: 0
Stopped: 1
Images: 40
Server Version: 18.09.3
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 62.91GiB
Name: SCJY-2
ID: XZ33:PHUQ:U2CI:7PXH:SYFG:Y6LK:3K3U:XXM6:QJWP:U3B3:MW4M:XPJS
Docker Root Dir: /data/kube/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
reg.xxx.lan:5000
treg.yun.xxx.cn
127.0.0.0/8
Registry Mirrors:
https://registry.docker-cn.com/
https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
Product License: Community Engine

I remoted in via Sunflower (向日葵) and took a look: root partition capacity was fine, and so were the inodes. `uptime -s` showed the machine had been rebooted, and on-site staff confirmed the reboot had not helped. Restarting kubelet also left it continuously reclaiming ephemeral-storage.

$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rootvg-lvroot 30G 5.2G 25G 18% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 160K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sdb 600G 36G 565G 6% /data
/dev/sda1 1014M 160M 855M 16% /boot
/dev/mapper/rootvg-lvopt 10G 33M 10G 1% /opt
/dev/mapper/rootvg-lvhome 1014M 39M 976M 4% /home
/dev/mapper/rootvg-lvvar 2.0G 1.2G 888M 57% /var
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/788ee4620da0a3f76ef5f4b24755a68de0e66c8f2425d8332d5a792116d7659f/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/d2b5f08e9873f5c9365aaf57eeca492734631a3842ccb2f379aa89998b0c7304/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/c4793b6c3f774cc960ef23e18b61405040698be698306ee993d4d501bdcf485a/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/b5a0fc544935db77c92bd978db9c1c7018e5e09bba9d2bf53bd300e96c656cec/merged
shm 64M 0 64M 0% /data/kube/docker/containers/ad86ab9b01e1ce0d62e1f98249274d9bfe75eca6efd8ce0e8f1c591d5570d75f/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/e3ebeac9a82264869429f44ea6834bcbc94b79013621490c071ef002b4b8e90e/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/a917bd3b8006198a58900efb5c82c6e162cfc4e732c7e588eaadfb59294ea22b/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/aa52df1894ad495f4f269d77ddd90954fdc7bbd0fbf25d9d4aa0674a76ff6a6c/mounts/shm
tmpfs 6.3G 12K 6.3G 1% /run/user/42
tmpfs 6.3G 0 6.3G 0% /run/user/1003
tmpfs 6.3G 0 6.3G 0% /run/user/1000

$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/rootvg-lvroot 15726592 184878 15541714 2% /
devtmpfs 8242230 527 8241703 1% /dev
tmpfs 8246150 41 8246109 1% /dev/shm
tmpfs 8246150 735 8245415 1% /run
tmpfs 8246150 16 8246134 1% /sys/fs/cgroup
/dev/sdb 314572800 303816 314268984 1% /data
/dev/sda1 524288 327 523961 1% /boot
/dev/mapper/rootvg-lvopt 5242880 7 5242873 1% /opt
/dev/mapper/rootvg-lvhome 524288 397 523891 1% /home
/dev/mapper/rootvg-lvvar 1048576 10179 1038397 1% /var
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/788ee4620da0a3f76ef5f4b24755a68de0e66c8f2425d8332d5a792116d7659f/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/d2b5f08e9873f5c9365aaf57eeca492734631a3842ccb2f379aa89998b0c7304/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/c4793b6c3f774cc960ef23e18b61405040698be698306ee993d4d501bdcf485a/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/b5a0fc544935db77c92bd978db9c1c7018e5e09bba9d2bf53bd300e96c656cec/merged
shm 8246150 1 8246149 1% /data/kube/docker/containers/ad86ab9b01e1ce0d62e1f98249274d9bfe75eca6efd8ce0e8f1c591d5570d75f/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/e3ebeac9a82264869429f44ea6834bcbc94b79013621490c071ef002b4b8e90e/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/a917bd3b8006198a58900efb5c82c6e162cfc4e732c7e588eaadfb59294ea22b/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/aa52df1894ad495f4f269d77ddd90954fdc7bbd0fbf25d9d4aa0674a76ff6a6c/mounts/shm
tmpfs 8246150 9 8246141 1% /run/user/42
tmpfs 8246150 1 8246149 1% /run/user/1003
tmpfs 8246150 1 8246149 1% /run/user/1000

$ kubectl describe node xx.xx.112.135
...
Capacity:
cpu: 32
ephemeral-storage: 2038Mi
hugepages-2Mi: 0
memory: 65969200Ki
pods: 110
Allocatable:
cpu: 31800m
ephemeral-storage: 1014Mi
hugepages-2Mi: 0
memory: 65469200Ki
pods: 110
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning EvictionThresholdMet 3m57s (x1434 over 4h3m) kubelet, xx.xx.112.135 Attempting to reclaim ephemeral-storage
Normal Starting 37s kubelet, xx.xx.112.135 Starting kubelet.
Normal NodeHasSufficientMemory 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37s (x2 over 37s) kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasSufficientPID
Normal NodeNotReady 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeNotReady
Normal NodeAllocatableEnforced 37s kubelet, xx.xx.112.135 Updated Node Allocatable limit across pods
Normal NodeReady 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeReady
Normal NodeHasDiskPressure 27s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasDiskPressure
Warning EvictionThresholdMet 7s (x4 over 37s) kubelet, xx.xx.112.135 Attempting to reclaim ephemeral-storage

After staring at it for a while I noticed that the ephemeral-storage figures above were wrong: Capacity was only 2038Mi.
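
That 2038Mi matches the 2.0G `/var` LV in the df output above. kubelet derives nodefs (ephemeral-storage) capacity from the filesystem backing its root dir, which defaults to /var/lib/kubelet. A minimal sketch to check which mount backs that directory (the fallback path is only so the snippet runs on machines without kubelet; `df --output` needs GNU coreutils):

```shell
# kubelet's nodefs capacity comes from the filesystem backing its --root-dir
# (default /var/lib/kubelet); if /var is a small separate LV, that small size
# becomes the node's ephemeral-storage Capacity.
dir=/var/lib/kubelet
[ -d "$dir" ] || dir=/var            # fallback for machines without kubelet
df -h --output=size,target "$dir"
```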

Some exploration of the source code

I started kubelet in a local dev environment and stepped through it; some notes:

./build/run.sh make kubelet GOFLAGS="-v -tags=nokmem" GOGCFLAGS="all=-N -l"  KUBE_BUILD_PLATFORMS=linux/amd64

cp _output/dockerized/bin/linux/amd64/kubelet .

dlv exec --check-go-version=false ./kubelet -- --cgroup-driver=systemd

# Two breakpoints worth setting:
vendor/github.com/google/cadvisor/container/docker/handler.go:421

vendor/github.com/google/cadvisor/container/docker/handler.go:364

724: func (self *manager) GetFsInfo(label string) ([]v2.FsInfo, error) {
=> 725: var empty time.Time
726: // Get latest data from filesystems hanging off root container.
727: stats, err := self.memoryCache.RecentStats("/", empty, empty, 1)
728: if err != nil {
729: return nil, err
730: }
(dlv) so
> k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*manager).getFsInfoByDeviceName() _output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/manager.go:1311 (PC: 0x1fc7180)
Values returned:
~r1: []k8s.io/kubernetes/vendor/github.com/google/cadvisor/info/v2.FsInfo len: 2, cap: 2, [
{
Timestamp: (*time.Time)(0xc0002262d0),
Device: "/dev/sda1",
Mountpoint: "/",
Capacity: 75150372864,
Available: 36613033984,
Usage: 38537338880,
Labels: []string len: 2, cap: 2, [
"docker-images",
"root",
],
Inodes: *36699584,
InodesFree: *35850609,},
{
Timestamp: (*time.Time)(0xc000226348),
Device: "tmpfs",
Mountpoint: "/dev/shm",
Capacity: 1986203648,
Available: 1986203648,
Usage: 0,
Labels: []string len: 0, cap: 0, [],
Inodes: *484913,
InodesFree: *484912,},
]
~r2: error nil

For the capacity side, on site I set --feature-gates=LocalStorageCapacityIsolation=false, deleted the node object, and restarted; after that, describe no longer showed ephemeral-storage, but the problem remained. Reading the source, this capacity value is fetched from docker under vendor/github.com/google/cadvisor/container/docker, and with so many nested interfaces it is painful to trace. The machine had already been rebooted, and restarting docker and reading its logs turned up nothing useful either.
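
For reference, that workaround boils down to one extra flag on the kubelet command line. It is a config fragment only, and it merely hides the ephemeral-storage accounting rather than fixing the capacity problem:

```
# /etc/systemd/system/kubelet.service (workaround, not a fix)
ExecStart=/data/kube/bin/kubelet \
--feature-gates=LocalStorageCapacityIsolation=false \
...
```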

The ephemeral-storage limit is alpha in 1.15 and I did not feel like digging further for now, so I tried moving kubelet's root directory instead.

$ systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
...
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/data/kube/bin/kubelet \
...

The main changes are WorkingDirectory plus two extra kubelet flags, --root-dir and --docker-root. On site /data is a separate partition, so switch the root dir to /data/kube/kubelet; --docker-root should point at docker's data-root:

$ vi /etc/systemd/system/kubelet.service
...
WorkingDirectory=/data/kube/kubelet
ExecStart=/data/kube/bin/kubelet \
--root-dir=/data/kube/kubelet \
--docker-root=/data/kube/docker \
...
$ systemctl daemon-reload
$ systemctl restart kubelet
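
After the restart it is worth confirming that the new root dir really lands on the big /data filesystem; a small sketch (the fallback only keeps the snippet runnable off the node):

```shell
# The node's ephemeral-storage Capacity should now track /data (600G), not /var (2G).
root_dir=/data/kube/kubelet
[ -d "$root_dir" ] || root_dir=.     # fallback when not on the node
df -h --output=target,size "$root_dir"
```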

Problem solved. Only afterwards did we learn that /var being a separate partition was new: the customer had changed the partition layout. Originally /var was not separate; at some point they created an LV for it and wrote it into /etc/fstab without mounting it or rebooting. A week ago they rebooted the machine, the new small /var came online, and since several services write logs under /var/log it filled up quickly, which is what triggered this incident.
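
The numbers line up, too. The /var LV is 2.0G, i.e. 2048Mi raw; what cAdvisor reports is filesystem capacity, slightly smaller after metadata overhead, and that is the 2038Mi Capacity kubelet showed. A trivial sanity check (the 10Mi difference is presumably xfs metadata):

```shell
lv_mib=2048        # 2.0G /var logical volume, in Mi
reported_mib=2038  # ephemeral-storage Capacity kubelet reported
echo "fs overhead: $((lv_mib - reported_mib))Mi"   # prints: fs overhead: 10Mi
```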

