Container misbehaves after enabling shareProcessNamespace

source link: https://qingwave.github.io/cotainer-init/

Jul 28, 2020 · cloud

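Kubernetes currently does not support ordering container startup, so some workloads enable shareProcessNamespace to monitor the state of certain processes. After PID sharing was enabled, users reported that a container's main process had exited but the container was not restarted, and that exec into it hung. For the symptoms, see the related issue.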

  1. Create a deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      name: nginx
    spec:
      shareProcessNamespace: true
      containers:
        - image: nginx:alpine
          name: nginx
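  2. Inspect the processes. Because shareProcessNamespace is enabled, pause becomes PID 1. The nginx master process has PID 6, and its parent is containerd-shim (shown as PPID 0, because the parent lives outside the container's PID namespace).
# List the processes inside the container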
/ # ps -efo "pid,ppid,comm,args"
PID   PPID  COMMAND          COMMAND
    1     0 pause            /pause
    6     0 nginx            nginx: master process nginx -g daemon off;
   11     6 nginx            nginx: worker process
   12     6 nginx            nginx: worker process
   13     6 nginx            nginx: worker process
   14     6 nginx            nginx: worker process
   15     0 sh               sh
   47    15 ps               ps -efo pid,ppid,comm,args
  3. Kill the main process. The child processes are re-parented to PID 1, or sometimes to containerd-shim.
/ # kill -9 6
/ #
/ # ps -efo "pid,ppid,comm,args"
PID   PPID  COMMAND          COMMAND
    1     0 pause            /pause
   11     1 nginx            nginx: worker process
   12     1 nginx            nginx: worker process
   13     1 nginx            nginx: worker process
   14     1 nginx            nginx: worker process
   15     0 sh               sh
   48    15 ps               ps -efo pid,ppid,comm,args
  4. docker hang. At this point docker commands against this container (inspect, logs, exec) hang, and the same operations through kubectl time out.
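
One way to notice this state from a script is to run a command with a timeout and treat the timeout as a hang. The following is a minimal Python sketch. The helper name `responds_within` and the 5-second limit in the usage comment are illustrative assumptions, not part of docker or of the original write-up.

```python
import subprocess


def responds_within(cmd, seconds):
    """Return True if `cmd` finishes within `seconds`, False if it is still running.

    Any exit code counts as "responded"; only a timeout is treated as a hang.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return False
    return True


# Example with a placeholder container ID (fill in your own):
# responds_within(["docker", "inspect", "<container-id>"], 5)
```

A `False` result only says the command did not return in time. It does not by itself prove that the cause is the orphaned-process situation described here.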

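In a container without shareProcessNamespace, the main process is PID 1. When it exits, the PID namespace is destroyed and the kernel kills every process remaining in it. With shareProcessNamespace enabled, PID 1 is the pause process. When the container's main process exits, the other processes in the shared PID namespace do not exit and become orphans. If a docker API is then called on the container, docker first looks for the main process, which no longer exists, and this causes the failure (to be confirmed).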
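
The difference between the two cases can be illustrated with a toy model of a process table. This is only a sketch of the rules described above, not kernel code. The function name `kill_process` and the `{pid: (ppid, name)}` layout are made up for illustration, and the model ignores the case where orphans are adopted by containerd-shim instead of PID 1.

```python
def kill_process(procs, pid, init_pid=1):
    """Return the process table after `pid` is killed.

    `procs` maps pid -> (ppid, name). Toy model only.
    """
    if pid == init_pid:
        # PID 1 exiting tears down the PID namespace: nothing survives.
        return {}
    survivors = {}
    for p, (ppid, name) in procs.items():
        if p == pid:
            continue
        # Children of the killed process are re-parented to PID 1.
        survivors[p] = (init_pid if ppid == pid else ppid, name)
    return survivors


# Without sharing: the nginx master is PID 1, so killing it removes everything.
private_ns = {1: (0, "nginx master"), 7: (1, "nginx worker")}
print(kill_process(private_ns, 1))  # {}

# With sharing: pause is PID 1, so the workers survive as orphans.
shared_ns = {1: (0, "pause"), 6: (0, "nginx master"), 11: (6, "nginx worker")}
print(kill_process(shared_ns, 6))  # {1: (0, 'pause'), 11: (1, 'nginx worker')}
```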

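Once these orphan processes are cleaned up, the container exits normally. Killing the orphans, or killing the pause process, restores normal behavior.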
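
To find the processes to clean up, the `ps` output shown earlier can be parsed for entries whose parent is PID 1. The sketch below is a heuristic under that assumption, and `find_reparented` is a hypothetical helper name. It will miss orphans that were adopted by containerd-shim, since those show PPID 0 just like a shell started through exec.

```python
def find_reparented(ps_output, init_pid=1):
    """Return PIDs whose parent is `init_pid`.

    Expects output in the `ps -efo "pid,ppid,comm,args"` layout used above.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header row and anything that is not "<pid> <ppid> ...".
        if len(fields) < 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        if int(fields[1]) == init_pid:
            pids.append(int(fields[0]))
    return pids


SAMPLE = """PID   PPID  COMMAND          COMMAND
    1     0 pause            /pause
   11     1 nginx            nginx: worker process
   12     1 nginx            nginx: worker process
   13     1 nginx            nginx: worker process
   14     1 nginx            nginx: worker process
   15     0 sh               sh
   48    15 ps               ps -efo pid,ppid,comm,args
"""

print(find_reparented(SAMPLE))  # [11, 12, 13, 14]
```

The returned PIDs can then be killed from inside the shared PID namespace, for example with `kill -9` as in the steps above.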

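Is there a more graceful fix? If the child processes exited together with the main process, the behavior would match expectations. That calls for a process manager. On the host there are systemd and god. Containers have similar tools, namely init processes (which forward signals and reap child processes). Common ones are: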

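  1. docker init, the init process bundled with docker (which is tini)
  2. tini, which reaps orphan and zombie processes and can kill process groups
  3. dumb-init, which manages processes and can rewrite signals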

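In testing, tini could only clean up foreground programs and could do nothing about background ones (for example, programs started with nohup or &). dumb-init forwards a signal to the child processes when the main process exits, which matches expectations.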
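
The behavior wanted here, "when the main process exits, signal whatever it left behind", can be sketched in a few lines of Python. This is not how tini or dumb-init is implemented. It is a minimal illustration that assumes the leftover processes stay in the main process's process group, and it neither forwards incoming signals nor reaps adopted zombies as a real init does. The name `run_with_group_cleanup` is hypothetical.

```python
import os
import signal
import subprocess


def run_with_group_cleanup(argv):
    """Run argv as a session leader; after it exits, SIGTERM the rest of its group."""
    # start_new_session=True makes the child call setsid(), so its PID is also
    # the ID of a fresh process group that its descendants inherit by default.
    child = subprocess.Popen(argv, start_new_session=True)
    status = child.wait()

    try:
        # Signal whatever is still alive in the child's process group.
        os.killpg(child.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group is already empty

    # Popen reports death-by-signal as a negative number; map it to 128+N.
    return status if status >= 0 else 128 - status
```

A wrapper script could call `run_with_group_cleanup(["nginx", "-g", "daemon off;"])` and exit with the returned code. Processes that move themselves into another session or process group would escape this cleanup.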

A Dockerfile that enables dumb-init is shown below; tini is similar.

FROM nginx:alpine

# tini
# RUN apk add --no-cache tini
# ENTRYPOINT ["/sbin/tini", "-s", "-g", "--"]

# dumb-init
RUN wget -O /usr/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_amd64
RUN chmod +x /usr/bin/dumb-init
ENTRYPOINT ["/usr/bin/dumb-init", "-v", "--"]

CMD ["nginx", "-g", "daemon off;"]

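The init approach is only a stopgap for this problem; docker needs to fix this case at the root. Running a single process per container is recommended, but some situations require multiple processes. If you do not want to handle signal forwarding and process reaping yourself, an init process provides both without any code changes.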