• Android OOM 问题探究 -- 从入门到放弃


    一、前言

    最近客户反馈了一些OOM的问题,很早之前自己也有简单了解过OOM的知识,但时间久远,很多东西都记不清了。

    现在遇到这个OOM问题,也即趁此搜索了一些资料,对OOM问题做一些探究,把资料记录于此,一遍后续查阅。本文内容大量借鉴参考了网络上经典文章的内容,站在巨人的肩膀上登高望远!

    注:以下分析基于 Android R source

     

    二、OOM问题的可能原因

    网络上可以搜索到很多的解释,都很详细,我在此也做一个简单的总结,当然可能不全面,仅供学习参考

     

    Android系统中,OutOfMemoryError这个错误是怎么被系统抛出的?在代码进行搜索可以看到

     

    重点关注下面两点

    ✔️ 堆内存分配失败时的OOM  ==   /art/runtime/gc/heap.cc

    ✔️ 创建线程失败时的OOM     ==   /art/runtime/thread.cc

     

    三、OOM -- 堆内存分配失败

    在source code中我们可以看到,当堆内存分配失败时,会抛出一些典型的log,如下代码

    void Heap::ThrowOutOfMemoryError(Thread* self, size_t byte_count, AllocatorType allocator_type) {
      ...
      std::ostringstream oss;
      size_t total_bytes_free = GetFreeMemory();
      oss << "Failed to allocate a " << byte_count << " byte allocation with " << total_bytes_free
          << " free bytes and " << PrettySize(GetFreeMemoryUntilOOME()) << " until OOM,"
          << " target footprint " << target_footprint_.load(std::memory_order_relaxed)
          << ", growth limit "
          << growth_limit_;
      // If the allocation failed due to fragmentation, print out the largest continuous allocation.
      ...
    }

    在出现OOM问题时,logcat中应该会看到类似下面的信息输出

    08-19 11:34:53.860 28028 28028 E AndroidRuntime: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728

    上面这段logcat的大概解释:想要去分配 20971536 bytes的heap memory,但时app剩余可用的free heap只有6147912 bytes,而且当前app最大可分配的heap是134217728 bytes

     

    堆内存分配失败的原因可以分两种情况:1. 超过APP进程的heap内存上限 与 2. 没有足够大小的连续地址空间

     

    3.1 超过APP进程的内存上限

    Android设备上java虚拟机对单个应用的最大内存分配做了约束,超出这个值就会OOM。由Runtime.getRuntime.MaxMemory()可以得到Android中每个进程被系统分配的内存上限,当进程占用内存达到这个上限时就会发生OOM,这也是Android中最常见的OOM类型。

     


    Android系统有如下约定:

    • /vendor/build.prop有定义属性值来对单个应用的最大内存分配做约束

    dalvik.vm.heapgrowthlimit 常规app使用的参数

    dalvik.vm.heapsize 应用在AndroidManifest.xml设置了android:largeHeap="true",将会变成大应用的设置

     

    • 代码中也可以使用如下API来获取内存限制的信息

    ActivityManager.getMemoryClass() 常规app最大可用的堆内存,对应 dalvik.vm.heapgrowthlimit;

    ActivityManager.getLargeMemoryClass()应用在AndroidManifest.xml设置了android:largeHeap="true",将会变成大应用时最大可用的堆内存,对应dalvik.vm.heapsize;

    Runtime.getRuntime().maxMemory()  可以得到Android中每个进程被系统分配的内存上限,等于上面的两个值之一;


     

    如下是一段简单的代码来演示这种类型的OOM:

    private void testOOMCreatHeap(Context context) {
        ActivityManager activityManager =(ActivityManager)context.getSystemService(Context.ACTIVITY_SERVICE);
        Log.d("OOM_TEST", "app maxMemory = " + activityManager.getMemoryClass() + "MB");
        Log.d("OOM_TEST", "large app maxMemory = " + activityManager.getLargeMemoryClass() + "MB");
        Log.d("OOM_TEST", "current app maxMemory = " + Runtime.getRuntime().maxMemory()/1024/1024 + "MB");
        List<byte[]> bytesList = new ArrayList<>();
        int count = 0;
        while (true) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.e("OOM-TEST", "allocate 20MB heap: " + count++ + ", total " + 20*count + "MB");
            // 每次申请20MB内存
            bytesList.add(new byte[1024 * 1024 * 20]);
        }
    }

     

    注:我的测试平台 dalvik.vm.heapgrowthlimit=128MB , dalvik.vm.heapsize=384MB

    上面的测试代码中,我们每次分配20MB的内存

     

    情况一:常规应用,不要在AndroidManifest.xml设置android:largeHeap="true",此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapgrowthlimit=128MB

    看运行结果:

    08-19 11:34:53.555 28028 28028 D OOM_TEST: app maxMemory = 128MB
    08-19 11:34:53.556 28028 28028 D OOM_TEST: large app maxMemory = 384MB
    08-19 11:34:53.556 28028 28028 D OOM_TEST: current app maxMemory = 128MB
    08-19 11:34:53.576 28028 28028 E OOM-TEST: allocate 20MB heap: 0, total 20MB
    08-19 11:34:53.596 28028 28028 E OOM-TEST: allocate 20MB heap: 1, total 40MB
    08-19 11:34:53.617 28028 28028 E OOM-TEST: allocate 20MB heap: 2, total 60MB
    08-19 11:34:53.637 28028 28028 E OOM-TEST: allocate 20MB heap: 3, total 80MB
    08-19 11:34:53.658 28028 28028 E OOM-TEST: allocate 20MB heap: 4, total 100MB
    08-19 11:34:53.678 28028 28028 E OOM-TEST: allocate 20MB heap: 5, total 120MB
    08-19 11:34:53.699 28028 28028 E OOM-TEST: allocate 20MB heap: 6, total 140MB
    08-19 11:34:53.699 28028 28028 I com.demo: Waiting for a blocking GC Alloc
    08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
    08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
    08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.732 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 73us total 19.225ms
    08-19 11:34:53.733 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.766 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.767 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 71us total 33.715ms
    08-19 11:34:53.767 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
    08-19 11:34:53.767 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.792 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.792 28028 28028 I com.demo: Alloc concurrent copying GC freed 1046(50KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 57us total 25.120ms
    08-19 11:34:53.792 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971532 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264080 kB)
    08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.808 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 62us total 15.229ms
    08-19 11:34:53.808 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.835 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.836 28028 28028 I com.demo: Alloc concurrent copying GC freed 3(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 59us total 27.042ms
    08-19 11:34:53.836 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
    08-19 11:34:53.836 28028 28028 I com.demo: Starting a blocking GC Alloc
    08-19 11:34:53.857 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
    08-19 11:34:53.857 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 50us total 21.249ms
    08-19 11:34:53.857 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264016 kB)
    08-19 11:34:53.858 28028 28028 E InputEventSender: Exception dispatching finished signal.
    08-19 11:34:53.858 28028 28028 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
    08-19 11:34:53.859 28028 28028 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728
    08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
    08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

    解释:

    最后一次请求分配heap memory时,此时因为已经分配了120+MB的内存,如果继续分配20MB显然超过了限制的128MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

     

    情况二:常规应用,在AndroidManifest.xml设置android:largeHeap="true",此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapsize=384MB

    看运行结果:

    08-19 11:32:22.660 27539 27539 D OOM_TEST: app maxMemory = 128MB
    08-19 11:32:22.660 27539 27539 D OOM_TEST: large app maxMemory = 384MB
    08-19 11:32:22.660 27539 27539 D OOM_TEST: current app maxMemory = 384MB
    08-19 11:32:23.048 27539 27539 E OOM-TEST: allocate 20MB heap: 18, total 380MB
    08-19 11:32:23.061 27539 27553 I com.mediacodec: Clamp target GC heap from 406MB to 384MB
    08-19 11:32:23.069 27539 27539 E OOM-TEST: allocate 20MB heap: 19, total 400MB
    08-19 11:32:23.069 27539 27539 I com.demo: Starting a blocking GC Alloc
    08-19 11:32:23.226 27539 27539 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184" (VmSize 2053220 kB)
    08-19 11:32:23.226 27539 27539 E InputEventSender: Exception dispatching finished signal.
    08-19 11:32:23.226 27539 27539 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
    08-19 11:32:23.227 27539 27539 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184
    08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
    08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

    解释:

    最后一次请求分配heap memory时,此时因为已经分配了380+MB的内存,如果继续分配20MB显然超过了限制的384MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

     

    3.2 没有足够大小的连续地址空间

    这种情况一般是进程中存在大量的内存碎片导致的,其堆栈信息会比第一种OOM堆栈多出一段类似如下格式的信息
    :failed due to fragmentation (required continguous free “<< required_bytes << “ bytes for a new buffer where largest contiguous free ” << largest_continuous_free_pages << “ bytes)
    相关的代码在art/runtime/gc/allocator/rosalloc.cc中,如下
    void RosAlloc::LogFragmentationAllocFailure(std::ostream& os, size_t failed_alloc_bytes) {
      ...
      if (required_bytes > largest_continuous_free_pages) {
        os << "; failed due to fragmentation ("
           << "required contiguous free " << required_bytes << " bytes" << new_buffer_msg
           << ", largest contiguous free " << largest_continuous_free_pages << " bytes"
           << ", total free pages " << total_free << " bytes"
           << ", space footprint " << footprint_ << " bytes"
           << ", space max capacity " << max_capacity_ << " bytes"
           << ")" << std::endl;
      }
    }
     
    这种场景比较难模拟,这里就不做演示了。
     
     

    四、OOM -- 创建线程失败

    Android中线程(Thread)的创建及内存分配过程分析可以参见如下这篇文章:https://blog.csdn.net/u011578734/article/details/109331764

    线程创建会消耗大量的系统资源(例如内存),创建过程涉及java层和native的处理。实质工作是在native层完成的,相关代码位于 /art/runtime/thread.cc

    void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
        // 此处省略一万字
        
      {
        std::string msg(child_jni_env_ext.get() == nullptr ?
            StringPrintf("Could not allocate JNI Env: %s", error_msg.c_str()) :
            StringPrintf("pthread_create (%s stack) failed: %s",
                                     PrettySize(stack_size).c_str(), strerror(pthread_create_result)));
        ScopedObjectAccess soa(env);
        soa.Self()->ThrowOutOfMemoryError(msg.c_str());
      }
        
    }

    大概总结如下:下图借鉴了网络上的资料(偷懒了)

    4.1 创建JNI Env 失败

    一般有两种原因

    1. FD溢出导致JNIEnv创建失败了,一般logcat中可以看到信息 Too many open files ... Could not allocate JNI Env

    当进程fd数(可以通过 ls /proc/pid/fd | wc -l 获得)突破 /proc/pid/limits中规定的Max open files时,产生OOM

    E/art: ashmem_create_region failed for 'indirect ref table': Too many open files java.lang.OutOfMemoryError:Could not allocate JNI Env at java.lang.Thread.nativeCreate(Native Method) at java.lang.Thread.start(Thread.java:730)

    2. 虚拟内存不足导致JNIEnv创建失败了,一般logcat中可以看到信息 Could not allocate JNI Env: Failed anonymous mmap

    08-19 17:51:50.662  3533  3533 E OOM_TEST: create thread : 1104
    08-19 17:51:50.663  3533  3533 W com.demo: Throwing OutOfMemoryError "Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log." (VmSize 2865432 kB)
    08-19 17:51:50.663  3533  3533 E InputEventSender: Exception dispatching finished signal.
    08-19 17:51:50.663  3533  3533 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
    08-19 17:51:50.668  3533  3533 E MessageQueue-JNI: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
    08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
    08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.start(Thread.java:887)
    
    
    08-19 17:51:50.671  3533  3533 E AndroidRuntime: FATAL EXCEPTION: main
    08-19 17:51:50.671  3533  3533 E AndroidRuntime: Process: com.demo, PID: 3533
    08-19 17:51:50.671  3533  3533 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
    08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
    08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

     

    4.2 创建线程失败

    一般有两种原因

    1. 虚拟内存不足导致失败,一般logcat中可以看到信息 mapped space: Out of memory  ... pthread_create (1040KB stack) failed: Out of memory

    native层通过FixStackSize设置线程栈大小,默认情况下,线程栈所需内存总大小 = 1M + 8k + 8k,即为1040k。

    //  /art/runtime/thread.cc
    static size_t FixStackSize(size_t stack_size) {
        // 这里面设置计算 stack_size,一般默认1040KB
    }

    发生OOM时的典型logcat如下:

    W/libc: pthread_create failed: couldn't allocate 1073152-bytes mapped space: Out of memory
    W/tch.crowdsourc: Throwing OutOfMemoryError with VmSize  4191668 kB "pthread_create (1040KB stack) failed: Try again"
    java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
            at java.lang.Thread.nativeCreate(Native Method)
            at java.lang.Thread.start(Thread.java:753)

    2. 线程数量超过了限制导致失败,一般logcat中可以看到信息 pthread_create failed: clone failed: Try again

    08-19 18:55:07.725 22139 22139 E OOM_TEST: create thread : 54
    08-19 18:55:07.725 22139 22139 W libc    : pthread_create failed: clone failed: Try again
    08-19 18:55:07.726 22139 22139 W com.demo: Throwing OutOfMemoryError "pthread_create (1040KB stack) failed: Try again" (VmSize 1715684 kB)
    08-19 18:55:07.733 22139 22139 E InputEventSender: Exception dispatching finished signal.
    08-19 18:55:07.733 22139 22139 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
    08-19 18:55:07.734 22786 22786 W externalstorag: Using default instruction set features for ARM CPU variant (generic) using conservative defaults
    08-19 18:55:07.734 22786 22786 W libc    : pthread_create failed: clone failed: Try again
    08-19 18:55:07.735 22786 22786 F externalstorag: thread_pool.cc:66] pthread_create failed for new thread pool worker thread: Try again
    08-19 18:55:07.737 22139 22139 E MessageQueue-JNI: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
    08-19 18:55:07.737 22139 22139 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
       
        
    08-19 18:55:07.739 22139 22139 E AndroidRuntime: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
    08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
    08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

     

    4.3 debug技巧

    • 对于FD的限制

    可以执行 cat /proc/pid/limits来查看Max open files 最大打开的文件数量

    可以执行 ls /proc/pid/fd | wc -l 来查看进程打开的文件数量

     

    • 对于线程数量的限制

    可以执行cat /proc/sys/kernel/threads-max 查看系统最多可以创建多少线程

    可以执行echo 3000 > /proc/sys/kernel/threads-max修改这个值,做测试

    查看系统当前的线程数 top -H

    查看当前进程线程数量cat /proc/{pid}/status

     

    • 对于虚拟内存使用情况

    可以执行 cat /proc/pid/status | grep Vm查看VmSize及VmPeak

    4.4 OOM演示

    可以使用下面这段程序做简单演示,不过不同设备由于参数配置不同,可能会OOM error会有不同

    private void testOOMCreatThread() {
        int count = 0;
        while (true) {
            Log.e("OOM_TEST", "create thread : " + ++count);
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(10000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }, "thread-" + count).start();
        }
    }

     

    --

    五、参考及推荐阅读文章

    ✔️ 【性能优化】大厂OOM优化和监控方案

    ✔️ Android 创建线程源码与OOM分析

    ✔️ 不可思议的OOM

    ✔️ Android应用OutOfMemory -- 1.OOM机制了解 

    ✔️ 关于虚拟机参数的调整 --- heapgrowthlimit/heapsize的配置

    ✔️ Android中线程(Thread)的创建及内存分配过程分析

     

    --

  • 相关阅读:
    day01_matplotlib_demo
    合宙AIR32F103CBT6开发板上手报告
    《Effective C++》条款15
    Git出现“Filename too long”
    SQL注入漏洞 | 数字型
    Golang 本地缓存选型对比及原理总结
    Redis主从复制基础概念
    国内可以免费使用的GPT
    应对反爬虫策略分享
    Shell(5)数组
  • 原文地址:https://www.cnblogs.com/roger-yu/p/16599262.html