Why strace doesn't work in Docker

While editing the capabilities page of the how containers work zine, I found myself trying to explain why strace doesn’t work in a Docker container.

The problem here is – if you run strace in a Docker container, this happens:

$ docker run  -it ubuntu:18.04 /bin/bash
$ # ... install strace ...
root@e27f594da870:/# strace ls
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted

strace works using the ptrace system call, so if ptrace isn’t allowed, it’s definitely not gonna work! This is pretty easy to fix – on my machine, this fixes it:

docker run --cap-add=SYS_PTRACE  -it ubuntu:18.04 /bin/bash

But I wasn’t interested in fixing it; I wanted to know why it happens. So why does strace not work, and why does --cap-add=SYS_PTRACE fix it?

hypothesis 1: container processes are missing the `CAP_SYS_PTRACE` capability

I always thought the reason was that Docker container processes by default didn’t have the CAP_SYS_PTRACE capability. This is consistent with it being fixed by --cap-add=SYS_PTRACE, right?

But this actually doesn’t make sense for 2 reasons.

Reason 1: Experimentally, as a regular user, I can strace any process run by my user. But if I check whether my current process has the CAP_SYS_PTRACE capability, it doesn’t:

$ getpcaps $$
Capabilities for `11589': =

Reason 2: man capabilities says this about CAP_SYS_PTRACE:

CAP_SYS_PTRACE
       * Trace arbitrary processes using ptrace(2);

So the point of CAP_SYS_PTRACE is to let you ptrace arbitrary processes owned by any user, the way that root usually can. You shouldn’t need it to just ptrace a regular process owned by your user.

And I tested this a third way – I ran a Docker container with docker run --cap-add=SYS_PTRACE -it ubuntu:18.04 /bin/bash, dropped the CAP_SYS_PTRACE capability, and I could still strace processes even though I didn’t have that capability anymore. What? Why?
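Here is a rough sketch of another way to check for the capability, by reading the bitmask on the CapEff line of /proc/<pid>/status instead of using getpcaps. CAP_SYS_PTRACE is capability number 19 in the kernel headers. The helper names has_cap and effective_caps are made up for this example, and the masks in the comments are illustrative values, not output from the experiments above.

```shell
# CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>.
CAP_SYS_PTRACE_BIT=19

# has_cap MASK BIT: succeed if bit BIT is set in the hex capability
# mask MASK (the format used by the CapEff line of /proc/<pid>/status).
# The body runs in a subshell so its variables do not leak out.
has_cap() (
    mask=$1
    bit=$2
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
)

# effective_caps PID: print the effective capability mask of a process.
effective_caps() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

# Illustrative masks (assumptions, not taken from the experiments above):
#   0000000000000000  no capabilities at all, like the getpcaps output "="
#   00000000a80c25fb  a mask that includes bit 19
```

To check the current shell, something like has_cap "$(effective_caps $$)" "$CAP_SYS_PTRACE_BIT" && echo yes should do it on a Linux machine.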

hypothesis 2: something about user namespaces???

My next (much less well-founded) hypothesis was something along the lines of “um, maybe the process is in a different user namespace and strace doesn’t work because of… reasons?” This isn’t really coherent but here’s what happened when I looked into it.

Is the container process in a different user namespace? Well, in the container:

root@e27f594da870:/# ls /proc/$$/ns/user -l
... /proc/1/ns/user -> 'user:[4026531837]'

On the host:

bork@kiwi:~$ ls /proc/$$/ns/user -l
... /proc/12177/ns/user -> 'user:[4026531837]'

Because the user namespace ID (4026531837) is the same, the root user in the container is the exact same user as the root user on the host. So there’s definitely no reason it shouldn’t be able to strace processes that it created!
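The comparison above can be scripted instead of eyeballed. This is a small sketch that pulls the numeric ID out of a namespace link target like user:[4026531837] and compares two of them. The function names ns_id, user_ns_of and same_ns are hypothetical helpers invented for this example.

```shell
# ns_id LINK: extract the numeric ID from a namespace link target
# such as "user:[4026531837]". Prints nothing if LINK does not match.
ns_id() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9][0-9]*\)]$/\1/p'
}

# user_ns_of PID: print the user namespace ID of a process.
user_ns_of() {
    ns_id "$(readlink "/proc/$1/ns/user")"
}

# same_ns A B: succeed if two link targets name the same namespace.
same_ns() {
    [ -n "$(ns_id "$1")" ] && [ "$(ns_id "$1")" = "$(ns_id "$2")" ]
}
```

Running user_ns_of $$ once in the container and once on the host gives two numbers to compare. If they match, both shells are in the same user namespace.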

This hypothesis doesn’t make much sense but I hadn’t realized that the root user in a Docker container is the same as the root user on the host, so I thought that was interesting.

hypothesis 3: the ptrace system call is being blocked by a seccomp-bpf rule

I also knew that Docker uses seccomp-bpf to stop container processes from running a lot of system calls. And ptrace is in the list of system calls blocked by Docker’s default seccomp profile! (Actually the profile is a whitelist of allowed system calls, so it’s just that ptrace is not in the default whitelist. But it comes out to the same thing.)

That easily explains why strace wouldn’t work in a Docker container – if the ptrace system call is totally blocked, then of course you can’t call it at all and strace would fail.
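One way to see whether a seccomp filter applies to a process at all is the Seccomp field in /proc/<pid>/status, which proc(5) documents as 0 (disabled), 1 (strict) or 2 (filter). Here is a minimal sketch that turns that number into a name. The function names seccomp_mode_name and seccomp_mode_of are made up for this example, and it only tells you that a filter is present, not which system calls it blocks.

```shell
# seccomp_mode_name N: translate the Seccomp field of /proc/<pid>/status.
seccomp_mode_name() {
    case "$1" in
        0) echo disabled ;;
        1) echo strict ;;
        2) echo filter ;;
        *) echo unknown ;;
    esac
}

# seccomp_mode_of PID: print the seccomp mode of a process by name.
# The pattern ends in a colon so it does not match other fields whose
# names merely start with "Seccomp".
seccomp_mode_of() {
    seccomp_mode_name "$(awk '/^Seccomp:/ { print $2 }' "/proc/$1/status")"
}
```

I would expect seccomp_mode_of $$ to print filter in a container started with the default profile and disabled in one started with seccomp=unconfined, but treat that as a prediction to check rather than recorded output.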

Let’s verify this hypothesis – if we disable all seccomp rules, can we strace in a Docker container?

$ docker run --security-opt seccomp=unconfined -it ubuntu:18.04  /bin/bash
$ strace ls
execve("/bin/ls", ["ls"], 0x7ffc69a65580 /* 8 vars */) = 0
... it works fine ...

Yes! It works! Great. Mystery solved, except…

why does `--cap-add=SYS_PTRACE` fix the problem?

What we still haven’t explained is: why does --cap-add=SYS_PTRACE fix the problem?

The man page for docker run explains the --cap-add argument
