How to debug Kubernetes OOMKilled when the process is not using memory directly
source link: https://songrgg.github.io/operation/another-kubernetes-oom-kill-troubleshooting/
We investigated a memory increase problem some time ago and learned a lot about JVM metrics. Then it happened again: several Java applications deployed in Kubernetes showed memory usage climbing gradually until it reached the memory limit. Even after raising the limit several times, usage always crept above 90%, and sometimes the container was OOMKilled.
A normal process of investigating Java memory
We followed the same approach as last time to analyze the memory usage.
Some figures first: container’s memory limit (12 Gi); container’s memory usage (11 Gi)
- check the JVM memory usage
We checked the Java process memory usage (3 Gi) and it was far lower than the container's memory usage (11 Gi).
The Java process was the main process running in the container; no other processes were consuming significant memory.
- native memory tracking
We thought NMT could help us find a native memory leak, so we enabled Native Memory Tracking and checked the different regions; all looked normal.
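For reference, NMT has to be enabled at JVM startup before jcmd can report on it. A minimal sketch of how that might look in a deployment spec; the env-var route and the pod layout here are assumptions, not from the article:

```yaml
# Hypothetical pod spec fragment: enable JVM Native Memory Tracking
# via JAVA_TOOL_OPTIONS, then inspect it from outside with, e.g.:
#   kubectl exec <pod> -- jcmd 1 VM.native_memory summary
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:NativeMemoryTracking=summary"
```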
Emmm, what did we miss?
Start from the beginning
Which memory are we talking about
Kubernetes kills a container when it runs out of its memory limit. The metrics it uses are container_memory_working_set_bytes and container_memory_rss; the container is killed if either of them exceeds the limit.
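cadvisor derives the working set by subtracting inactive file pages from the cgroup's total usage. A minimal sketch of that computation, assuming the cgroup v1 layout (run it inside the container):

```shell
# Sketch: compute working_set_bytes the way cadvisor does, from
# cgroup v1 files. The argument defaults to the standard v1 mount.
working_set_bytes() {
  local dir="${1:-/sys/fs/cgroup/memory}"
  local usage inactive_file
  usage=$(cat "$dir/memory.usage_in_bytes")
  inactive_file=$(awk '/^total_inactive_file /{print $2}' "$dir/memory.stat")
  echo $((usage - inactive_file))
}
# working_set_bytes   # call with no argument inside the container
```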
What’s in it
According to the metric collector cadvisor, this data is fetched from the cgroup memory stats under each container's /sys/fs/cgroup/memory folder; the kernel cgroup documentation on lwn.net explains these entries well.
memory.usage_in_bytes # show current memory(RSS+Cache) usage.
memory.memsw.usage_in_bytes # show current memory+Swap usage
memory.limit_in_bytes # set/show limit of memory usage
memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
memory.failcnt # show the number of memory usage hits limits
memory.memsw.failcnt # show the number of memory+Swap hits limits
memory.max_usage_in_bytes # show max memory usage recorded
memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded
memory.soft_limit_in_bytes # set/show soft limit of memory usage
memory.stat # show various statistics
memory.use_hierarchy # set/show hierarchical account enabled
memory.force_empty # trigger forced move charge to parent
memory.swappiness # set/show swappiness parameter of vmscan
...
Based on this, working_set_bytes contains the page cache on top of memory_rss, so we went into the container and printed the memory stats.
bash-4.2$ cat /sys/fs/cgroup/memory/memory.stat
cache 8815085056 # of bytes of page cache memory.
rss 2360238080 # of bytes of anonymous and swap cache memory.
rss_huge 0
shmem 0
mapped_file 540672
dirty 0
writeback 2162688
swap 0
pgpgin 6545913
pgpgout 5526026
pgfault 1145124816
pgmajfault 0
inactive_anon 0
total_inactive_file 484167680
...
The page cache (cache) consumed almost 9 Gi of memory; after excluding total_inactive_file (~480 Mi), more than 8 GB of active cache remained.
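Plugging the numbers from the memory.stat dump above into that subtraction:

```shell
# cache minus total_inactive_file, using the values printed above
awk 'BEGIN { printf "%.2f GB active page cache\n", (8815085056 - 484167680) / 1e9 }'
# → 8.33 GB active page cache
```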
Page cache is allocated by the operating system to improve disk I/O performance. After some investigation, we found a big file the app kept writing to without rotation; by that point it had reached 100 Gi.
We truncated that file and the page cache dropped down to tens of megabytes.
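A hedged sketch of that remediation; the helpers and paths are placeholders, not from the article. Truncating in place (rather than deleting) keeps the writer's file descriptor valid, and the kernel can then reclaim the cached pages:

```shell
# List the largest files under a directory to find the culprit,
# then empty the offending file without removing it.
find_biggest() { du -ab "$1" 2>/dev/null | sort -rn | head -n 5; }
truncate_log() { : > "$1"; }   # truncates; the open fd stays usable
```

A lasting fix would add log rotation (e.g. logrotate with copytruncate) rather than truncating by hand.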
A thorough check routine
This is the complete memory layout we have now. Based on it, a thorough check routine is:
- Find a pod with the issue, get the metrics
- memory_usage_bytes
- working_set_bytes
- memory_rss_bytes
- Check if the file cache (working_set_bytes - memory_rss_bytes) is high
working_set_bytes - memory_rss_bytes is the active page cache size. If it's above hundreds of MBs or several GBs, I/O is quite heavy and the OS is compensating by caching files. Sometimes that is reasonable, but usually you should check whether it's what you expect.
- Check if the rss is equal to the memory usage
If so, check the application metrics instead, JVM metrics, Golang metrics etc.
Otherwise, things are interesting again …
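The file-cache check above can be sketched as a small helper; the 1 GiB threshold is an assumption for illustration, not a value from the article:

```shell
# Flag a container whose active page cache (working_set - rss) is large.
check_page_cache() {
  local working_set=$1 rss=$2
  local threshold=$((1024 * 1024 * 1024))   # 1 GiB, hypothetical cutoff
  local active_cache=$((working_set - rss))
  if [ "$active_cache" -gt "$threshold" ]; then
    echo "heavy file cache: $active_cache bytes"
  else
    echo "file cache ok: $active_cache bytes"
  fi
}

check_page_cache 11811160064 2360238080   # ~11 Gi working set, rss from memory.stat
# → heavy file cache: 9450921984 bytes
```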
Conclusion
Now we know the page cache can be an important contributor to memory growth, so we need to monitor its size.
In cadvisor, it's container_memory_working_set_bytes - container_memory_rss. When the application is I/O intensive, the page cache can be high because the OS tries to improve I/O efficiency; for CPU-intensive applications, watch out for unnecessary page cache.
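To monitor this continuously, one option is a Prometheus recording rule over the cadvisor metrics; the rule layout and the recorded metric name below are assumptions, not from the article:

```yaml
# Hypothetical Prometheus recording rule: track active page cache per container
groups:
  - name: page-cache
    rules:
      - record: container_memory_page_cache_bytes
        expr: container_memory_working_set_bytes - container_memory_rss
```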