1

Android OOM 问题探究 -- 从入门到放弃 - 二的次方

 1 year ago
source link: https://www.cnblogs.com/roger-yu/p/16599262.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

最近客户反馈了一些OOM的问题,很早之前自己也有简单了解过OOM的知识,但时间久远,很多东西都记不清了。

现在遇到这个OOM问题,也即趁此搜索了一些资料,对OOM问题做一些探究,把资料记录于此,一遍后续查阅。本文内容大量借鉴参考了网络上经典文章的内容,站在巨人的肩膀上登高望远!

注:以下分析基于 Android R source

二、OOM问题的可能原因

网络上可以搜索到很多的解释,都很详细,我在此也做一个简单的总结,当然可能不全面,仅供学习参考

979092-20220819194409002-282089665.png

Android系统中,OutOfMemoryError这个错误是怎么被系统抛出的?在代码进行搜索可以看到

979092-20220819120955567-1338459460.png

重点关注下面两点

✔️ 堆内存分配失败时的OOM  ==   /art/runtime/gc/heap.cc

✔️ 创建线程失败时的OOM     ==   /art/runtime/thread.cc

三、OOM -- 堆内存分配失败

在source code中我们可以看到,当堆内存分配失败时,会抛出一些典型的log,如下代码

void Heap::ThrowOutOfMemoryError(Thread* self, size_t byte_count, AllocatorType allocator_type) {
  ...
  std::ostringstream oss;
  size_t total_bytes_free = GetFreeMemory();
  oss << "Failed to allocate a " << byte_count << " byte allocation with " << total_bytes_free
      << " free bytes and " << PrettySize(GetFreeMemoryUntilOOME()) << " until OOM,"
      << " target footprint " << target_footprint_.load(std::memory_order_relaxed)
      << ", growth limit "
      << growth_limit_;
  // If the allocation failed due to fragmentation, print out the largest continuous allocation.
  ...
}

在出现OOM问题时,logcat中应该会看到类似下面的信息输出

08-19 11:34:53.860 28028 28028 E AndroidRuntime: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728

上面这段logcat的大概解释:想要去分配 20971536 bytes的heap memory,但时app剩余可用的free heap只有6147912 bytes,而且当前app最大可分配的heap是134217728 bytes

堆内存分配失败的原因可以分两种情况:1. 超过APP进程的heap内存上限 与 2. 没有足够大小的连续地址空间

3.1 超过APP进程的内存上限

Android设备上java虚拟机对单个应用的最大内存分配做了约束,超出这个值就会OOM。由Runtime.getRuntime.MaxMemory()可以得到Android中每个进程被系统分配的内存上限,当进程占用内存达到这个上限时就会发生OOM,这也是Android中最常见的OOM类型。


Android系统有如下约定:

  • /vendor/build.prop有定义属性值来对单个应用的最大内存分配做约束

dalvik.vm.heapgrowthlimit 常规app使用的参数

dalvik.vm.heapsize 应用在AndroidManifest.xml设置了android:largeHeap="true",将会变成大应用的设置

 979092-20220819132351137-750713830.png

  • 代码中也可以使用如下API来获取内存限制的信息

ActivityManager.getMemoryClass() 常规app最大可用的堆内存,对应 dalvik.vm.heapgrowthlimit;

ActivityManager.getLargeMemoryClass()应用在AndroidManifest.xml设置了android:largeHeap="true",将会变成大应用时最大可用的堆内存,对应dalvik.vm.heapsize;

Runtime.getRuntime().maxMemory()  可以得到Android中每个进程被系统分配的内存上限,等于上面的两个值之一;


如下是一段简单的代码来演示这种类型的OOM:

private void testOOMCreatHeap(Context context) {
    ActivityManager activityManager =(ActivityManager)context.getSystemService(Context.ACTIVITY_SERVICE);
    Log.d("OOM_TEST", "app maxMemory = " + activityManager.getMemoryClass() + "MB");
    Log.d("OOM_TEST", "large app maxMemory = " + activityManager.getLargeMemoryClass() + "MB");
    Log.d("OOM_TEST", "current app maxMemory = " + Runtime.getRuntime().maxMemory()/1024/1024 + "MB");
    List<byte[]> bytesList = new ArrayList<>();
    int count = 0;
    while (true) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Log.e("OOM-TEST", "allocate 20MB heap: " + count++ + ", total " + 20*count + "MB");
        // 每次申请20MB内存
        bytesList.add(new byte[1024 * 1024 * 20]);
    }
}

注:我的测试平台 dalvik.vm.heapgrowthlimit=128MB , dalvik.vm.heapsize=384MB

上面的测试代码中,我们每次分配20MB的内存

情况一:常规应用,不要在AndroidManifest.xml设置android:largeHeap="true",此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapgrowthlimit=128MB

看运行结果:

08-19 11:34:53.555 28028 28028 D OOM_TEST: app maxMemory = 128MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: large app maxMemory = 384MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: current app maxMemory = 128MB
08-19 11:34:53.576 28028 28028 E OOM-TEST: allocate 20MB heap: 0, total 20MB
08-19 11:34:53.596 28028 28028 E OOM-TEST: allocate 20MB heap: 1, total 40MB
08-19 11:34:53.617 28028 28028 E OOM-TEST: allocate 20MB heap: 2, total 60MB
08-19 11:34:53.637 28028 28028 E OOM-TEST: allocate 20MB heap: 3, total 80MB
08-19 11:34:53.658 28028 28028 E OOM-TEST: allocate 20MB heap: 4, total 100MB
08-19 11:34:53.678 28028 28028 E OOM-TEST: allocate 20MB heap: 5, total 120MB
08-19 11:34:53.699 28028 28028 E OOM-TEST: allocate 20MB heap: 6, total 140MB
08-19 11:34:53.699 28028 28028 I com.demo: Waiting for a blocking GC Alloc
08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.732 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 73us total 19.225ms
08-19 11:34:53.733 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.766 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.767 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 71us total 33.715ms
08-19 11:34:53.767 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.767 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.792 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.792 28028 28028 I com.demo: Alloc concurrent copying GC freed 1046(50KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 57us total 25.120ms
08-19 11:34:53.792 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971532 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264080 kB)
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.808 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 62us total 15.229ms
08-19 11:34:53.808 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.835 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.836 28028 28028 I com.demo: Alloc concurrent copying GC freed 3(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 59us total 27.042ms
08-19 11:34:53.836 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.836 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.857 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.857 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 50us total 21.249ms
08-19 11:34:53.857 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264016 kB)
08-19 11:34:53.858 28028 28028 E InputEventSender: Exception dispatching finished signal.
08-19 11:34:53.858 28028 28028 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

最后一次请求分配heap memory时,此时因为已经分配了120+MB的内存,如果继续分配20MB显然超过了限制的128MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

情况二:常规应用,在AndroidManifest.xml设置android:largeHeap="true",此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapsize=384MB

看运行结果:

08-19 11:32:22.660 27539 27539 D OOM_TEST: app maxMemory = 128MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: large app maxMemory = 384MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: current app maxMemory = 384MB
08-19 11:32:23.048 27539 27539 E OOM-TEST: allocate 20MB heap: 18, total 380MB
08-19 11:32:23.061 27539 27553 I com.mediacodec: Clamp target GC heap from 406MB to 384MB
08-19 11:32:23.069 27539 27539 E OOM-TEST: allocate 20MB heap: 19, total 400MB
08-19 11:32:23.069 27539 27539 I com.demo: Starting a blocking GC Alloc
08-19 11:32:23.226 27539 27539 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184" (VmSize 2053220 kB)
08-19 11:32:23.226 27539 27539 E InputEventSender: Exception dispatching finished signal.
08-19 11:32:23.226 27539 27539 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

最后一次请求分配heap memory时,此时因为已经分配了380+MB的内存,如果继续分配20MB显然超过了限制的384MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

3.2 没有足够大小的连续地址空间

这种情况一般是进程中存在大量的内存碎片导致的,其堆栈信息会比第一种OOM堆栈多出一段类似如下格式的信息
:failed due to fragmentation (required continguous free “<< required_bytes << “ bytes for a new buffer where largest contiguous free ” << largest_continuous_free_pages << “ bytes)”
相关的代码在art/runtime/gc/allocator/rosalloc.cc中,如下
void RosAlloc::LogFragmentationAllocFailure(std::ostream& os, size_t failed_alloc_bytes) {
  ...
  if (required_bytes > largest_continuous_free_pages) {
    os << "; failed due to fragmentation ("
       << "required contiguous free " << required_bytes << " bytes" << new_buffer_msg
       << ", largest contiguous free " << largest_continuous_free_pages << " bytes"
       << ", total free pages " << total_free << " bytes"
       << ", space footprint " << footprint_ << " bytes"
       << ", space max capacity " << max_capacity_ << " bytes"
       << ")" << std::endl;
  }
}
这种场景比较难模拟,这里就不做演示了。

四、OOM -- 创建线程失败

Android中线程(Thread)的创建及内存分配过程分析可以参见如下这篇文章:https://blog.csdn.net/u011578734/article/details/109331764

线程创建会消耗大量的系统资源(例如内存),创建过程涉及java层和native的处理。实质工作是在native层完成的,相关代码位于 /art/runtime/thread.cc

void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
    // 此处省略一万字
    
  {
    std::string msg(child_jni_env_ext.get() == nullptr ?
        StringPrintf("Could not allocate JNI Env: %s", error_msg.c_str()) :
        StringPrintf("pthread_create (%s stack) failed: %s",
                                 PrettySize(stack_size).c_str(), strerror(pthread_create_result)));
    ScopedObjectAccess soa(env);
    soa.Self()->ThrowOutOfMemoryError(msg.c_str());
  }
    
}

大概总结如下:下图借鉴了网络上的资料(偷懒了)

979092-20220819155703952-2060822379.jpg

4.1 创建JNI Env 失败

一般有两种原因

1. FD溢出导致JNIEnv创建失败了,一般logcat中可以看到信息 Too many open files ... Could not allocate JNI Env

当进程fd数(可以通过 ls /proc/pid/fd | wc -l 获得)突破 /proc/pid/limits中规定的Max open files时,产生OOM

E/art: ashmem_create_region failed for 'indirect ref table': Too many open files java.lang.OutOfMemoryError:Could not allocate JNI Env at java.lang.Thread.nativeCreate(Native Method) at java.lang.Thread.start(Thread.java:730)

2. 虚拟内存不足导致JNIEnv创建失败了,一般logcat中可以看到信息 Could not allocate JNI Env: Failed anonymous mmap

08-19 17:51:50.662  3533  3533 E OOM_TEST: create thread : 1104
08-19 17:51:50.663  3533  3533 W com.demo: Throwing OutOfMemoryError "Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log." (VmSize 2865432 kB)
08-19 17:51:50.663  3533  3533 E InputEventSender: Exception dispatching finished signal.
08-19 17:51:50.663  3533  3533 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.start(Thread.java:887)


08-19 17:51:50.671  3533  3533 E AndroidRuntime: FATAL EXCEPTION: main
08-19 17:51:50.671  3533  3533 E AndroidRuntime: Process: com.demo, PID: 3533
08-19 17:51:50.671  3533  3533 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

4.2 创建线程失败

一般有两种原因

1. 虚拟内存不足导致失败,一般logcat中可以看到信息 mapped space: Out of memory  ... pthread_create (1040KB stack) failed: Out of memory

native层通过FixStackSize设置线程栈大小,默认情况下,线程栈所需内存总大小 = 1M + 8k + 8k,即为1040k。

//  /art/runtime/thread.cc
static size_t FixStackSize(size_t stack_size) {
    // 这里面设置计算 stack_size,一般默认1040KB
}

发生OOM时的典型logcat如下:

W/libc: pthread_create failed: couldn't allocate 1073152-bytes mapped space: Out of memory
W/tch.crowdsourc: Throwing OutOfMemoryError with VmSize  4191668 kB "pthread_create (1040KB stack) failed: Try again"
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
        at java.lang.Thread.nativeCreate(Native Method)
        at java.lang.Thread.start(Thread.java:753)

2. 线程数量超过了限制导致失败,一般logcat中可以看到信息 pthread_create failed: clone failed: Try again

08-19 18:55:07.725 22139 22139 E OOM_TEST: create thread : 54
08-19 18:55:07.725 22139 22139 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.726 22139 22139 W com.demo: Throwing OutOfMemoryError "pthread_create (1040KB stack) failed: Try again" (VmSize 1715684 kB)
08-19 18:55:07.733 22139 22139 E InputEventSender: Exception dispatching finished signal.
08-19 18:55:07.733 22139 22139 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 18:55:07.734 22786 22786 W externalstorag: Using default instruction set features for ARM CPU variant (generic) using conservative defaults
08-19 18:55:07.734 22786 22786 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.735 22786 22786 F externalstorag: thread_pool.cc:66] pthread_create failed for new thread pool worker thread: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
   
    
08-19 18:55:07.739 22139 22139 E AndroidRuntime: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

4.3 debug技巧

  • 对于FD的限制

可以执行 cat /proc/pid/limits来查看Max open files 最大打开的文件数量

979092-20220819192450714-867456330.png

可以执行 ls /proc/pid/fd | wc -l 来查看进程打开的文件数量

979092-20220819192612855-196587532.png

  • 对于线程数量的限制

可以执行cat /proc/sys/kernel/threads-max 查看系统最多可以创建多少线程

可以执行echo 3000 > /proc/sys/kernel/threads-max修改这个值,做测试

查看系统当前的线程数 top -H

979092-20220819193001772-420615337.png

查看当前进程线程数量cat /proc/{pid}/status

979092-20220820135958452-980955298.png
  • 对于虚拟内存使用情况

可以执行 cat /proc/pid/status | grep Vm查看VmSize及VmPeak

979092-20220819193137222-1395377687.png

4.4 OOM演示

可以使用下面这段程序做简单演示,不过不同设备由于参数配置不同,可能会OOM error会有不同

private void testOOMCreatThread() {
    int count = 0;
    while (true) {
        Log.e("OOM_TEST", "create thread : " + ++count);
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }, "thread-" + count).start();
    }
}

五、参考及推荐阅读文章

✔️ 【性能优化】大厂OOM优化和监控方案

✔️ Android 创建线程源码与OOM分析

✔️ 不可思议的OOM

✔️ Android应用OutOfMemory -- 1.OOM机制了解 

✔️ 关于虚拟机参数的调整 --- heapgrowthlimit/heapsize的配置

✔️ Android中线程(Thread)的创建及内存分配过程分析


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK