
Fixing cri-dockerd failing to pull the pause image from a registry that requires authentication

source link: https://zhangguanzhang.github.io/2024/04/11/cri-docker-sandbox-image/
 2024/04/11  184  Share

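In an on-premises deployment, cri-dockerd logs "Pulling the image without credentials. Image: reg.xxx.lan:5000/xxx/pause:3.9".

In on-premises deployments, every environment ships with an internal image registry. One day the pods in a customer environment failed to start. It turned out that image GC had removed the pause image and cri-dockerd could no longer pull it, although pulling it manually worked fine.

I had run into this before but was too busy to dig in. Today I had time to take a look.

The cri-dockerd version does not matter. It is deployed with systemd, following the official documentation: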

$ systemctl cat --no-pager cri-dockerd
...
[Service]
Type=notify
ExecStart=/data/kube/bin/cri-dockerd \
--container-runtime-endpoint unix:///var/run/cri-dockerd.sock \
--network-plugin=cni \
--streaming-bind-addr=127.0.0.1 \
--cni-bin-dir=/data/kube/bin/ \
--pod-infra-container-image=reg.xxx.lan:5000/xxx/pause:3.9
...

The log output is:

$ journalctl -xe -u cri-dockerd
Apr 11 15:11:48 xxx cri-dockerd[5894]: level=info msg="Pulling the image without credentials. Image: reg.xxx.lan:5000/xxx/pause:3.9"
Apr 11 15:12:14 xxx cri-dockerd[5894]: level=info msg="Pulling the image without credentials. Image: reg.xxx.lan:5000/xxx/pause:3.9"
Apr 11 15:13:11 xxx cri-dockerd[5894]: level=info msg="Pulling the image without credentials. Image: reg.xxx.lan:5000/xxx/pause:3.9"

Reading the source

Searching for the log message leads to the following function:

// https://github.com/Mirantis/cri-dockerd/blob/b138f5226ae901b99ea34d40ab1eaed1c26445a4/core/sandbox_helpers.go#L408-L448
func ensureSandboxImageExists(client libdocker.DockerClientInterface, image string) error {
	_, err := client.InspectImageByRef(image)
	if err == nil {
		return nil
	}
	if !libdocker.IsImageNotFoundError(err) {
		return fmt.Errorf("failed to inspect sandbox image %q: %v", image, err)
	}

	repoToPull, _, _, err := utils.ParseImageName(image)
	if err != nil {
		return err
	}

	keyring := credentialprovider.NewDockerKeyring()
	creds, withCredentials := keyring.Lookup(repoToPull)
	if !withCredentials {
		logrus.Infof("Pulling the image without credentials. Image: %v", image)

		err := client.PullImage(image, dockerregistry.AuthConfig{}, dockertypes.ImagePullOptions{})
		if err != nil {
			return fmt.Errorf("failed pulling image %q: %v", image, err)
		}

		return nil
	}

	var pullErrs []error
	for _, currentCreds := range creds {
		authConfig := dockerregistry.AuthConfig(currentCreds)
		err := client.PullImage(image, authConfig, dockertypes.ImagePullOptions{})
		// If there was no error, return success
		if err == nil {
			return nil
		}

		pullErrs = append(pullErrs, err)
	}

	return errors.NewAggregate(pullErrs)
}

Following credentialprovider.NewDockerKeyring() downwards, the logic ends up under ./vendor/k8s.io/kubernetes/pkg/credentialprovider/:

// https://github.com/Mirantis/cri-dockerd/blob/b138f5226ae901b99ea34d40ab1eaed1c26445a4/vendor/k8s.io/kubernetes/pkg/credentialprovider/provider.go#L46-L52
func init() {
	RegisterCredentialProvider(".dockercfg",
		&CachingDockerConfigProvider{
			Provider: &defaultDockerConfigProvider{},
			Lifetime: 5 * time.Minute,
		})
}

The CachingDockerConfigProvider above is a provider that re-reads the file at a fixed interval. The file-reading logic itself is here:

// https://github.com/Mirantis/cri-dockerd/blob/b138f5226ae901b99ea34d40ab1eaed1c26445a4/vendor/k8s.io/kubernetes/pkg/credentialprovider/provider.go#L77C1-L85C2
func (d *defaultDockerConfigProvider) Provide(image string) DockerConfig {
	// Read the standard Docker credentials from .dockercfg
	if cfg, err := ReadDockerConfigFile(); err == nil {
		return cfg
	} else if !os.IsNotExist(err) {
		klog.V(2).Infof("Docker config file not found: %v", err)
	}
	return DockerConfig{}
}

Jumping through the calls leads to the file ./vendor/k8s.io/kubernetes/pkg/credentialprovider/config.go:


var (
	preferredPathLock sync.Mutex
	preferredPath     = ""
	workingDirPath    = ""
	homeDirPath, _    = os.UserHomeDir()
	rootDirPath       = "/"
	homeJSONDirPath   = filepath.Join(homeDirPath, ".docker")
	rootJSONDirPath   = filepath.Join(rootDirPath, ".docker")

	configFileName     = ".dockercfg"
	configJSONFileName = "config.json"
)

...

func DefaultDockercfgPaths() []string {
	return []string{GetPreferredDockercfgPath(), workingDirPath, homeDirPath, rootDirPath}
}

func ReadDockercfgFile(searchPaths []string) (cfg DockerConfig, err error) {
	if len(searchPaths) == 0 {
		searchPaths = DefaultDockercfgPaths()
	}

	for _, configPath := range searchPaths {
		...

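The directory lookup logic looks fine too. cri-dockerd runs as root, the credentials are present in /root/.docker/config.json, and there are no unusual permissions involved.

Since a manual pull works, the real question is why the process does not read /root/.docker/config.json. So I downloaded the source and debugged it with dlv: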

$ systemctl stop cri-dockerd kubelet
$ docker rmi -f reg.xxx.lan:5000/xxx/pause:3.9
$ reboot
# restore the state where the image has not been pulled yet, then debug
$ dlv debug main.go -- --container-runtime-endpoint unix:///var/run/cri-dockerd.sock \
--network-plugin=cni \
--streaming-bind-addr=127.0.0.1 \
--cni-bin-dir=/data/kube/bin/ \
--pod-infra-container-image=reg.xxx.lan:5000/xxx/pause:3.9

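In the end the code never entered the if !withCredentials { branch, which was strange. I then built my own binary, swapped it in, and found that the problem did reproduce when it was started by systemd, so I decided to inspect it with dlv attach: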

$ go build  -gcflags="all=-N -l"  -o cri-dockerd
$ systemctl stop cri-dockerd
$ \cp cri-dockerd /data/kube/bin/cri-dockerd
$ systemctl start cri-dockerd
$ dlv attach $(pgrep cri-dockerd)

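After setting three breakpoints and running continue, I found that the four default search paths (returned by DefaultDockerConfigJSONPaths() in the listing here) were wrong: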

(dlv) c
> k8s.io/kubernetes/pkg/credentialprovider.ReadDockerConfigJSONFile() ./vendor/k8s.io/kubernetes/pkg/credentialprovider/config.go:138 (hits goroutine(677):1 total:1) (PC: 0x2285942)
133: // if searchPaths is empty, the default paths are used.
134: func ReadDockerConfigJSONFile(searchPaths []string) (cfg DockerConfig, err error) {
135: if len(searchPaths) == 0 {
136: searchPaths = DefaultDockerConfigJSONPaths()
137: }
=> 138: for _, configPath := range searchPaths {
139: absDockerConfigFileLocation, err := filepath.Abs(filepath.Join(configPath, configJSONFileName))
140: if err != nil {
141: klog.Errorf("while trying to canonicalize %s: %v", configPath, err)
142: continue
143: }
(dlv) p searchPaths
[]string len: 4, cap: 4, [
"",
"",
".docker",
"/.docker",
]

Checking homeJSONDirPath showed that it was wrong as well:

(dlv) p homeJSONDirPath
".docker"

In the code, its value comes from:

homeDirPath, _    = os.UserHomeDir()
...
homeJSONDirPath = filepath.Join(homeDirPath, ".docker")

Calling os.UserHomeDir() to see what it returns:

(dlv) call os.UserHomeDir()
> k8s.io/kubernetes/pkg/credentialprovider.ReadDockerConfigJSONFile() ./vendor/k8s.io/kubernetes/pkg/credentialprovider/config.go:138 (PC: 0x2285942)
Values returned:
~r0: ""
~r1: error(*errors.errorString) *{
s: "$HOME is not defined",}
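
The effect can be reproduced outside cri-dockerd. The following is a minimal sketch, assuming Linux (where os.UserHomeDir() relies on $HOME); jsonSearchPaths is a hypothetical helper that only imitates the shape of the four default search paths seen in the debugger and is not the vendored function.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// jsonSearchPaths imitates the four default search paths: preferred path,
// working directory, <home>/.docker and /.docker. Hypothetical helper.
func jsonSearchPaths(home string) []string {
	return []string{
		"", // preferred path, unset
		"", // working directory
		filepath.Join(home, ".docker"), // collapses to ".docker" when home is empty
		filepath.Join("/", ".docker"),
	}
}

func main() {
	// Simulate the environment of the systemd service, which has no HOME.
	os.Unsetenv("HOME")

	home, err := os.UserHomeDir()
	fmt.Printf("home=%q err=%v\n", home, err)

	// The error was discarded in the package-level var block, so the empty
	// string flows straight into filepath.Join.
	fmt.Printf("%q\n", jsonSearchPaths(home))
}
```

On Linux this yields an empty home directory and the same relative ".docker" entry that dlv printed, since filepath.Join ignores empty elements.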

So there is no HOME variable at all. Checking the environment the process was started with via procfs:

$ xargs -0 -n1 < /proc/$(pgrep cri-dockerd)/environ
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
NOTIFY_SOCKET=/run/systemd/notify
LISTEN_PID=5894
LISTEN_FDS=1
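
The same check can be scripted. Below is a small sketch that parses the NUL-separated records of /proc/<pid>/environ and reports whether HOME is present; parseEnviron is a hypothetical helper, and the PID is assumed to be passed as the first argument (the program's own environment is used when it is omitted).

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// parseEnviron splits the NUL-separated KEY=VALUE records used by
// /proc/<pid>/environ into a map.
func parseEnviron(data []byte) map[string]string {
	env := make(map[string]string)
	for _, rec := range bytes.Split(data, []byte{0}) {
		if len(rec) == 0 {
			continue // trailing NUL produces an empty record
		}
		k, v, _ := bytes.Cut(rec, []byte("="))
		env[string(k)] = string(v)
	}
	return env
}

func main() {
	path := "/proc/self/environ"
	if len(os.Args) > 1 {
		path = "/proc/" + os.Args[1] + "/environ"
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if home, ok := parseEnviron(data)["HOME"]; ok {
		fmt.Printf("HOME=%s\n", home)
	} else {
		fmt.Println("HOME is not set for this process")
	}
}
```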

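So systemd does not set $HOME for this service. I then found that the HOME=/root environment variable only appears once User is set. This would also explain why the problem did not show up under dlv debug, which was started from a shell where HOME is normally set.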

Several possible fixes:

  • Set WorkingDirectory in the systemd unit file, then either:
    • set it directly to /root, or
    • copy a config.json produced by docker login into the process's WorkingDirectory, as config.json or .dockercfg
  • Set User=root
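
To see why these options help, the sketch below lists which config.json files would be probed for a given working directory and home directory. It is a simulation under stated assumptions: candidateFiles is a hypothetical helper, relative search paths are resolved against workDir to imitate filepath.Abs in a process running there, the preferred path and the .dockercfg variant are left out, and "/" is assumed as the working directory when WorkingDirectory is not set.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidateFiles returns the config.json locations that would be probed for
// the given working directory and home directory. Hypothetical helper.
func candidateFiles(workDir, home string) []string {
	searchPaths := []string{
		"",                             // the working directory itself
		filepath.Join(home, ".docker"), // ".docker" when home is empty
		"/.docker",
	}
	var files []string
	for _, p := range searchPaths {
		f := filepath.Join(p, "config.json")
		if !filepath.IsAbs(f) {
			// Imitates filepath.Abs for a process whose cwd is workDir.
			f = filepath.Join(workDir, f)
		}
		files = append(files, f)
	}
	return files
}

func main() {
	fmt.Println("no HOME, working directory /:")
	fmt.Println(" ", candidateFiles("/", ""))
	fmt.Println("no HOME, WorkingDirectory=/root:")
	fmt.Println(" ", candidateFiles("/root", ""))
	fmt.Println("HOME=/root, working directory /:")
	fmt.Println(" ", candidateFiles("/", "/root"))
}
```

In this simulation, /root/.docker/config.json only becomes a candidate when either the working directory is /root (through the relative ".docker" entry) or HOME is /root, which matches the options listed above.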

A PR with a fix has been submitted: Mirantis/cri-dockerd/pull/349

