
A Virtual Memory Troubleshooting Story

source link: https://blog.gotocoding.com/archives/1821?amp%3Butm_medium=rss&%3Butm_campaign=%25e4%25b8%2580%25e6%25ac%25a1%25e8%2599%259a%25e6%258b%259f%25e5%2586%2585%25e5%25ad%2598%25e6%258e%2592%25e6%259f%25a5%25e7%25bb%258f%25e5%258e%2586

Here is how it started: I was recently adding Prometheus support to silly.

During testing I noticed that, inside Docker, silly occupied nearly 110MiB of virtual memory right after startup.

Running the same code directly on the host, the virtual memory was only 48.32MiB.

Meanwhile, the metrics silly exposes to Prometheus showed that the application had allocated 3.7MiB of memory, while jemalloc had handed out 9.38MiB to the application in total.

I decided to first check whether that 48.32MiB figure was reasonable.

Using cat /proc/[pid]/smaps I got a rough picture of the virtual memory layout. Below are some of the larger anonymous segments.

7ffff587e000-7ffff607e000 rw-p 00000000 00:00 0
Size:               8192 kB
...
7ffff607f000-7ffff687f000 rw-p 00000000 00:00 0
Size:               8192 kB
...
7ffff6880000-7ffff7080000 rw-p 00000000 00:00 0
Size:               8192 kB
...
7ffff7080000-7ffff7c00000 rw-p 00000000 00:00 0
Size:              11776 kB

The three 8192KiB blocks are clearly the virtual memory reserved for three thread stacks, 24MiB in total.
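
8192KiB matches the default thread stack size on typical Linux setups (ulimit -s usually reports 8192). If the address space reserved for stacks ever matters, it can be capped when a thread is created. A minimal sketch of that knob, assuming a 256KiB stack is enough for the worker (the size and the thread function are illustrative, not what silly actually uses):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    /* ... thread work ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* Reserve 256KiB of virtual memory for this thread's stack
       instead of the default (commonly 8MiB). */
    pthread_attr_setstacksize(&attr, 256 * 1024);
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return 0;
}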

The 11776KiB block is very close to (slightly larger than) the amount jemalloc reported allocating to the application, so there is good reason to suspect this block is the virtual memory used by jemalloc itself; after all, jemalloc's own metadata also consumes some memory.

To confirm this suspicion, I ran strace -f -k -e mmap ./silly/silly server-src/config (in fact I got stuck on this step at first: I had left out the -f flag, so strace could not trace the system calls of all threads, which wasted a lot of time).

Below is an excerpt of the jemalloc-related mmap system calls; you can see that this block of memory is exactly what jemalloc used up.

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff7fcf000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_boot+0x1c7) [0xb0877]
 > /home/silly/silly(malloc_init_hard_a0_locked+0x12d) [0x4f20d]
 > /home/silly/silly(malloc_init_hard+0x79) [0x4f4d9]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7fcf000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_boot+0x2b7) [0xb0967]
 > /home/silly/silly(malloc_init_hard_a0_locked+0x12d) [0x4f20d]
 > /home/silly/silly(malloc_init_hard+0x79) [0x4f4d9]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7a49000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_map+0x4c) [0xb00cc]
 > /home/silly/silly(je_extent_alloc_mmap+0x14) [0xa56c4]
 > /home/silly/silly(base_block_alloc.isra.21+0x281) [0x63681]
 > /home/silly/silly(je_base_new+0x70) [0x64030]
 > /home/silly/silly(je_base_boot+0x17) [0x64b97]
 > /home/silly/silly(malloc_init_hard_a0_locked+0x138) [0x4f218]
 > /home/silly/silly(malloc_init_hard+0x79) [0x4f4d9]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 4190208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff784a000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_map+0x163) [0xb01e3]
 > /home/silly/silly(je_extent_alloc_mmap+0x14) [0xa56c4]
 > /home/silly/silly(base_block_alloc.isra.21+0x281) [0x63681]
 > /home/silly/silly(je_base_new+0x70) [0x64030]
 > /home/silly/silly(je_base_boot+0x17) [0x64b97]
 > /home/silly/silly(malloc_init_hard_a0_locked+0x138) [0x4f218]
 > /home/silly/silly(malloc_init_hard+0x79) [0x4f4d9]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7800000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_map+0x4c) [0xb00cc]
 > /home/silly/silly(je_extent_alloc_mmap+0x14) [0xa56c4]
 > /home/silly/silly(je_ehooks_default_alloc_impl+0xd3) [0x9de03]
 > /home/silly/silly(je_ecache_alloc_grow+0x9b9) [0xa5039]
 > /home/silly/silly(pac_alloc_real+0x93) [0xaf3e3]
 > /home/silly/silly(pac_alloc_impl+0xe5) [0xaf525]
 > /home/silly/silly(je_pa_alloc+0x57) [0xae547]
 > /home/silly/silly(je_arena_extent_alloc_large+0x9f) [0x5b56f]
 > /home/silly/silly(je_large_palloc+0xcb) [0xac53b]
 > /home/silly/silly(je_arena_palloc+0xd5) [0x5ede5]
 > /home/silly/silly(je_tsd_tcache_data_init+0xe2) [0xbc2a2]
 > /home/silly/silly(je_tsd_tcache_enabled_data_init+0x28) [0xbc948]
 > /home/silly/silly(je_tsd_fetch_slow+0x114) [0xbe3d4]
 > /home/silly/silly(malloc_init_hard+0x9b) [0x4f4fb]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7400000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_map+0x4c) [0xb00cc]
 > /home/silly/silly(je_extent_alloc_mmap+0x14) [0xa56c4]
 > /home/silly/silly(base_block_alloc.isra.21+0x281) [0x63681]
 > /home/silly/silly(je_base_alloc+0x269) [0x644d9]
 > /home/silly/silly(je_rtree_leaf_elm_lookup_hard+0x100) [0xb5280]
 > /home/silly/silly(je_emap_register_boundary+0x357) [0x9ea17]
 > /home/silly/silly(je_ecache_alloc_grow+0x60b) [0xa4c8b]
 > /home/silly/silly(pac_alloc_real+0x93) [0xaf3e3]
 > /home/silly/silly(pac_alloc_impl+0xe5) [0xaf525]
 > /home/silly/silly(je_pa_alloc+0x57) [0xae547]
 > /home/silly/silly(je_arena_extent_alloc_large+0x9f) [0x5b56f]
 > /home/silly/silly(je_large_palloc+0xcb) [0xac53b]
 > /home/silly/silly(je_arena_palloc+0xd5) [0x5ede5]
 > /home/silly/silly(je_tsd_tcache_data_init+0xe2) [0xbc2a2]
 > /home/silly/silly(je_tsd_tcache_enabled_data_init+0x28) [0xbc948]
 > /home/silly/silly(je_tsd_fetch_slow+0x114) [0xbe3d4]
 > /home/silly/silly(malloc_init_hard+0x9b) [0x4f4fb]
 > /home/silly/silly(__libc_csu_init+0x45) [0xc3895]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0x7a) [0x2402a]
 > /home/silly/silly(_start+0x2a) [0x1dcea]
mmap(NULL, 3670016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7080000
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /home/silly/silly(je_pages_map+0x4c) [0xb00cc]
 > /home/silly/silly(je_extent_alloc_mmap+0x14) [0xa56c4]
 > /home/silly/silly(je_ehooks_default_alloc_impl+0xd3) [0x9de03]
 > /home/silly/silly(je_ecache_alloc_grow+0x9b9) [0xa5039]
 > /home/silly/silly(pac_alloc_real+0x93) [0xaf3e3]
 > /home/silly/silly(pac_alloc_impl+0xe5) [0xaf525]
 > /home/silly/silly(je_pa_alloc+0x57) [0xae547]
 > /home/silly/silly(je_arena_extent_alloc_large+0x9f) [0x5b56f]
 > /home/silly/silly(je_large_palloc+0xcb) [0xac53b]
 > /home/silly/silly(je_arena_malloc_hard+0x50d) [0x5ec0d]
 > /home/silly/silly(je_malloc_default+0x5a1) [0x52371]
 > /home/silly/silly(silly_malloc+0x6) [0x23356]
 > /home/silly/silly(silly_socket_init+0x57) [0x215e7]
 > /home/silly/silly(silly_run+0x88) [0x22cb8]
 > /home/silly/silly(main+0x468) [0x1d888]
 > /usr/lib/x86_64-linux-gnu/libc-2.28.so(__libc_start_main+0xeb) [0x2409b]
 > /home/silly/silly(_start+0x2a) [0x1dcea]

With this experience from the host, tracking down the source of those 110MiB inside Docker was straightforward.

First I used docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it --entrypoint /bin/bash registry.cn-hangzhou.aliyuncs.com/findstr-vps/xxx to force the entrypoint to bash and add the capability strace needs; after that the analysis was the same as on the host.

cat /proc/[pid]/smaps revealed a large block of memory of which not a single byte was in use, which looked very suspicious:

7ffff0021000-7ffff4000000 ---p 00000000 00:00 0
Size:              65404 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    1
VmFlags: mr mw me nr sd

Running strace -f -k -e mmap ./silly/silly server-conf/config again, the corresponding call stack could be found:

[pid    34] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fffed87d000
 > /lib/x86_64-linux-gnu/libc-2.28.so(mmap64+0x43) [0xf3cd3]
 > /lib/x86_64-linux-gnu/libc-2.28.so(_IO_str_seekoff+0xd87) [0x80127]
 > /lib/x86_64-linux-gnu/libc-2.28.so(_IO_str_seekoff+0x157c) [0x8091c]
 > /lib/x86_64-linux-gnu/libc-2.28.so(_IO_str_seekoff+0x44ad) [0x8384d]
 > /lib/x86_64-linux-gnu/libc-2.28.so(__libc_malloc+0x116) [0x84626]
 > /lib/x86_64-linux-gnu/libc-2.28.so(fgets+0x1bb) [0x7022b]
 > /silly/silly(luaL_loadfilex+0x4d) [0x3a35d]
 > /silly/silly(silly_worker_start+0x19b) [0x2200b]
 > /silly/silly(thread_worker+0xb) [0x228fb]
 > /lib/x86_64-linux-gnu/libpthread-2.28.so(start_thread+0xf3) [0x7fa3]
 > /lib/x86_64-linux-gnu/libc-2.28.so(clone+0x3f) [0xf906f]

You can see it was triggered by the fgets function in libc. When I set a breakpoint on this function with gdb, the same system call then showed up from glibc functions called in other threads.

I suspected this was a glibc bug. I googled it just to see, and it turned out to be a feature: for multithreaded programs, glibc 2.10 and later preallocates a lot of virtual memory to improve performance.
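
The reservations in question are glibc malloc's per-thread arenas: on 64-bit systems each arena heap reserves 64MiB of address space up front, which lines up with the roughly 64MiB untouched region above. glibc exposes a knob to cap the number of arenas, either the MALLOC_ARENA_MAX environment variable or mallopt(M_ARENA_MAX, ...). A minimal sketch of the mallopt route, assuming you want to verify the effect by forcing a single arena (silly itself does not do this):

#include <malloc.h>   /* glibc-specific: mallopt(), M_ARENA_MAX */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Cap glibc malloc at one arena so extra threads no longer each
       reserve their own 64MiB heap.  Must run before the arenas are
       created; equivalent to starting the process with MALLOC_ARENA_MAX=1. */
    if (mallopt(M_ARENA_MAX, 1) != 1)
        fprintf(stderr, "mallopt(M_ARENA_MAX) failed\n");

    /* ... start threads and allocate as usual ... */
    void *p = malloc(1024);
    free(p);
    return 0;
}

Note that this only trims the virtual address space reservation; the pages are not resident until they are actually touched.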

Strangely though, the glibc version is 2.28 on both my host and in Docker, yet the host does not show this virtual memory problem.

ldd (Debian GLIBC 2.28-10+deb10u2) 2.28
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Addendum, July 30:

During the analysis I also noticed something odd.

The call mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7a49000 produced a large block of memory, yet smaps showed that this range contained the memory mapping of the file /home/silly/silly/luaclib/sys.so. An excerpt of smaps follows:

7ffff7080000-7ffff7c00000 rw-p 00000000 00:00 0
Size:              11776 kB
7ffff7c23000-7ffff7c27000 r--p 00000000 08:00 253822                     /home/silly/silly/luaclib/sys.so
Size:                 16 kB
7ffff7c27000-7ffff7c3d000 r-xp 00004000 08:00 253822                     /home/silly/silly/luaclib/sys.so
Size:                 88 kB
7ffff7c3d000-7ffff7c40000 r--p 0001a000 08:00 253822                     /home/silly/silly/luaclib/sys.so
Size:                 12 kB
7ffff7c40000-7ffff7c41000 r--p 0001c000 08:00 253822                     /home/silly/silly/luaclib/sys.so
Size:                  4 kB
7ffff7c41000-7ffff7c42000 rw-p 0001d000 08:00 253822                     /home/silly/silly/luaclib/sys.so
Size:                  4 kB
7ffff7c42000-7ffff7c4e000 rw-p 00000000 00:00 0
Size:                 48 kB

In theory this should be impossible. After thinking about it for a long time, I concluded there was only one possibility: right after the mmap, the program released this address range with munmap, and the OS then happened to reuse that range for the sys.so file mapping.

To verify this guess, I ran strace -f -e mmap,munmap ./silly/silly server-src/config, and indeed that was the case. Part of the strace output is below.

mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7ffff7a49000
munmap(0x7ffff7a49000, 2097152)         = 0

jemalloc mapped a block of memory with mmap and then immediately released it with munmap.
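
This back-to-back mmap/munmap pair is most likely jemalloc looking for an aligned extent: it first tries a plain mmap, and when the returned address is not aligned to the extent boundary it throws the mapping away and retries with an over-sized request, trimming off the excess, which would also explain the 4190208-byte (2 * 2MiB - 4KiB) mmap seen earlier in the trace. A minimal sketch of that over-allocate-and-trim technique, not jemalloc's actual code:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Map `size` bytes aligned to `align` (a power of two, e.g. 2MiB) by
   over-allocating and handing the unaligned edges straight back with
   munmap. */
static void *map_aligned(size_t size, size_t align)
{
    size_t over = size + align - 4096;           /* worst-case padding */
    char *p = mmap(NULL, over, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    uintptr_t aligned = ((uintptr_t)p + align - 1) & ~(align - 1);
    size_t lead  = aligned - (uintptr_t)p;       /* excess in front */
    size_t trail = over - lead - size;           /* excess behind   */

    if (lead)
        munmap(p, lead);
    if (trail)
        munmap((char *)aligned + size, trail);
    return (void *)aligned;
}

int main(void)
{
    void *p = map_aligned(2 * 1024 * 1024, 2 * 1024 * 1024);
    printf("2MiB-aligned mapping at %p\n", p);
    return 0;
}

Either way, address ranges handed back via munmap are immediately available to the kernel again, which is why the freed range could later be reused for the sys.so file mapping.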

This is also a reminder that when monitoring memory allocation, watching mmap alone is not enough; at the very least mmap and munmap need to be analyzed together, and if necessary the brk system call should be monitored as well.

P.S. Because Linux randomizes addresses (ASLR), you may end up with many extra anonymous memory segments, which makes investigation harder. Address space randomization can be turned off temporarily with echo 0 > /proc/sys/kernel/randomize_va_space.

