The Road to Fixing a Hard-to-Diagnose Crash on Android 12

source link: https://www.51cto.com/article/700901.html
Author: 林作健, 2022-02-09 14:13:18
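The UC kernel (UC's browser engine) hit a fatal crash on Android 12. About 10% of users ran into it on cold start, which seriously held up the release of the UC kernel. Its call stack looks like this: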

10-12 19:03:21.461  1038  2723 I id.AlipayGphon: Rejecting re-init on previously-failed class java.lang.Class<com.uc.webkit.impl.WebViewChromiumFactoryProvider>: java.lang.VerifyError: Verifier rejected class com.uc.webkit.impl.WebViewChromiumFactoryProvider: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g() failed to verify: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g(): [0x15]  can't resolve returned type 'Unresolved Reference: com.uc.webkit.an' or 'Unresolved Reference: com.uc.webkit.impl.ak' (declaration of 'com.uc.webkit.impl.WebViewChromiumFactoryProvider' appears in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.28.21092218119_64/so/core.jar)
10-12 19:03:21.461  1038  2723 I id.AlipayGphon: (Throwable with empty stack trace)
10-12 19:03:21.464  1038  2723 E WebViewEntry: init error and prepare native crash
10-12 19:03:21.464  1038  2723 E WebViewEntry: java.lang.NoClassDefFoundError: com.uc.webkit.impl.WebViewChromiumFactoryProvider
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at com.uc.webkit.impl.WebViewChromiumFactoryProvider.i(Unknown Source:0)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at com.uc.webkit.WebViewEntry.p(U4Source:193)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at com.uc.webkit.bg.run(Unknown Source:0)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at android.os.Handler.handleCallback(Handler.java:938)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at android.os.Handler.dispatchMessage(Handler.java:99)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at android.os.Looper.loopOnce(Looper.java:201)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at android.os.Looper.loop(Looper.java:288)
10-12 19:03:21.464  1038  2723 E WebViewEntry:     at android.os.HandlerThread.run(HandlerThread.java:67)
10-12 19:03:21.464  1038  2723 E WebViewEntry: Caused by: java.lang.VerifyError: Verifier rejected class com.uc.webkit.impl.WebViewChromiumFactoryProvider: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g() failed to verify: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g(): [0x15]  can't resolve returned type 'Unresolved Reference: com.uc.webkit.an' or 'Unresolved Reference: com.uc.webkit.impl.ak' (declaration of 'com.uc.webkit.impl.WebViewChromiumFactoryProvider' appears in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.28.21092218119_64/so/core.jar)

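If we could not solve this problem, our kernel might not be usable on Android 12 at all, so once again this was a matter of life and death for the kernel. The problem cannot be reproduced through normal use; it only shows up occasionally when monkey hammers the app with cold starts.

One more piece of background: the problem only appeared after UC Browser raised its SDK level to 30.

Call stack analysis

From the call stack we can see that the top-level error is a NoClassDefFoundError, but it is caused by the VerifyError below it. The stack itself shows an ordinary startup sequence in progress.

The message "Rejecting re-init on previously-failed class" shows that com.uc.webkit.impl.WebViewChromiumFactoryProvider had already gone through verification once and failed. Normally an earlier VerifyError should have been thrown at that point, but across many crash logs we never found where that first VerifyError was thrown.

In addition, the "Caused by: java.lang.VerifyError" entry should be followed by the call stack of that first verification, yet the log only shows (Throwable with empty stack trace).

Black magic analysis, technique one

With all these open questions, the data we had was not enough for analysis. We needed more verification-related information before we could make progress.

Android's ART virtual machine ships with verbose logging. It is organized by module and normally switched off; it has to be enabled by passing arguments when ART starts.

We tried the wrapper technique: put a file named wrapper.sh in the lib directory, and the system launches the virtual machine through wrapper.sh instead of through Zygote. Unfortunately this had no effect. After reading the source in AndroidRuntime.cpp, we found that the VM arguments passed in through the wrapper are filtered out and ignored entirely.

So we had to go outside the official channels.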

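[Figure: structure of ART's verbose log]

The figure above shows the structure of the verbose log. A global variable, gLogVerbosity, controls its switches. Could we turn on verbose logging by modifying gLogVerbosity directly?

The UC kernel has a strong collection of black magic tools. The one that fits this need is the symbol_resolver module. It parses /proc/self/maps to find where a named .so is mapped, then parses the ELF file to collect all of its symbols, so that we can look up the address of any symbol we want in the resulting key-value table.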
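The first half of that lookup, finding where a library is mapped, can be sketched as follows. This is a minimal illustration and not the symbol_resolver module itself: the names FindLibraryBase and FindLibraryBaseInSelf are hypothetical, and the ELF symbol-table parsing step is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: scans text in /proc/<pid>/maps format and returns the
// lowest start address of any mapping whose path ends with `lib_suffix`,
// or 0 if nothing matches.
std::uintptr_t FindLibraryBase(const std::string& maps_text,
                               const std::string& lib_suffix) {
  assert(!lib_suffix.empty());
  std::istringstream maps(maps_text);
  std::string line;
  std::uintptr_t base = 0;
  while (std::getline(maps, line)) {
    // Each line reads: start-end perms offset dev inode path
    if (line.size() < lib_suffix.size() ||
        line.compare(line.size() - lib_suffix.size(), lib_suffix.size(),
                     lib_suffix) != 0) {
      continue;
    }
    // strtoull stops at the '-' that separates the start and end addresses.
    auto start = static_cast<std::uintptr_t>(
        std::strtoull(line.c_str(), nullptr, 16));
    if (base == 0 || start < base) base = start;
  }
  return base;
}

// Convenience wrapper that reads the current process's own mappings.
std::uintptr_t FindLibraryBaseInSelf(const std::string& lib_suffix) {
  std::ifstream file("/proc/self/maps");
  std::stringstream buffer;
  buffer << file.rdbuf();
  return FindLibraryBase(buffer.str(), lib_suffix);
}
```

Calling FindLibraryBaseInSelf("/libart.so") would give the library's load base. A symbol's runtime address is then typically that base plus the symbol's offset from the ELF symbol table, adjusted for the load bias.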

With this technique we quickly located gLogVerbosity inside libart.so, treated it as an array of bool, and set the verifier and verifier_debug entries to true. That gave us a new log:
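Treating the structure as a bool array amounts to writing one byte per flag. The sketch below shows the idea on a stand-in struct. FakeLogVerbosity and SetFlagAt are hypothetical, and the field list is an assumption made for illustration only: the real layout of ART's structure depends on the Android release and has to be checked against that release's source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ART's verbosity structure. The real field list
// and order are NOT reproduced here and vary between Android releases.
struct FakeLogVerbosity {
  bool class_linker;
  bool gc;
  bool jit;
  bool verifier;
  bool verifier_debug;
};

// Sets one flag by treating the structure as a plain array of one-byte bools.
// `index` is the byte offset of the flag inside the structure.
void SetFlagAt(void* verbosity_base, std::size_t index, bool value) {
  assert(verbosity_base != nullptr);
  auto* flags = static_cast<unsigned char*>(verbosity_base);
  flags[index] = value ? 1 : 0;
}
```

In the real case, verbosity_base would be the resolved address of gLogVerbosity, and the indexes would come from the structure definition of the exact ART version running on the device.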

Verification failed on class org.chromium.ui.base.WindowAndroid in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.31.10191532_64/so/core.jar because: Verifier rejected class org.chromium.ui.base.WindowAndroid: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken() failed to verify: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken(): [0x10]  can't resolve returned type 'Unresolved Reference: android.os.IBinder' or 'Reference: android.os.IBinder'
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x0] : Processing const/4 v1, #+0
0:[Undefined],1:[Undefined],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1] : Processing iget-object v0, v2, Ljava/lang/ref/WeakReference; org.chromium.ui.base.WindowAndroid.e // field@7982
0:[Undefined],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x3] : Processing invoke-virtual {v0}, java.lang.Object java.lang.ref.WeakReference.get() // method@7347
0:[Reference: java.lang.ref.WeakReference],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x6] : Processing move-result-object v0
0:[Reference: java.lang.ref.WeakReference],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x7] : Processing check-cast v0, android.content.Context // type@TypeIndex[61]
0:[Reference: java.lang.Object],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x9] : Processing invoke-static {v0}, android.app.Activity org.chromium.ui.base.WindowAndroid.a(android.content.Context) // method@17017
0:[Reference: android.content.Context],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xc] : Processing move-result-object v0
0:[Reference: android.content.Context],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xd] : Processing if-nez v0, +4
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xf] : Processing move-object v0, v1
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x10] : Processing return-object v0
0:[Zero/null],1:[Conflict],2:[Conflict],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x11] : Processing invoke-virtual {v0}, android.view.Window android.app.Activity.getWindow() // method@26
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x14] : Processing move-result-object v0
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x15] : Processing if-nez v0, +4
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x17] : Processing move-object v0, v1
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x18] : Processing goto -8
0:[Zero/null],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x19] : Processing invoke-virtual {v0}, android.view.View android.view.Window.peekDecorView() // method@1459
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1c] : Processing move-result-object v0
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1d] : Processing if-nez v0, +4
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1f] : Processing move-object v0, v1
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x20] : Processing goto -16
0:[Zero/null],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x21] : Processing invoke-virtual {v0}, android.os.IBinder android.view.View.getWindowToken() // method@1318
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x24] : Processing move-result-object v0
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x25] : Processing goto -21
0:[Reference: android.os.IBinder],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x25] : Merging at [0x25] to [0x10]: 
0:[Zero/null],1:[Conflict],2:[Conflict],  MERGE
0:[Reference: android.os.IBinder],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],  ==
0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x10] : Processing return-object v0
0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
Rejecting opcode return-object v0
Register Types:
  0: Undefined
  1: Conflict
  2: null
  3: Boolean
  4: Byte
  5: Short
  6: Char
  7: Integer
  8: Long (Low Half)
  9: Long (High Half)
  10: Float
  11: Double (Low Half)
  12: Double (High Half)
  13: Precise Constant: -1
  14: Zero/null
  15: Precise Constant: 1
  16: Precise Constant: 2
  17: Precise Constant: 3
  18: Precise Constant: 4
  19: Reference: org.chromium.ui.base.WindowAndroid
  20: Reference: java.lang.Object
  21: Reference: java.lang.ref.WeakReference
  22: Reference: java.lang.ref.Reference
  23: Reference: android.content.Context
  24: Reference: android.app.Activity
  25: Unresolved Reference: android.os.IBinder
  26: Reference: android.view.Window
  27: Reference: android.view.View
  28: Reference: android.os.IBinder
Dumping instructions and register lines:
  0:[Undefined],1:[Undefined],2:[Reference: org.chromium.ui.base.WindowAndroid],
  0x0000: V-O-B-- const/4 v1, #+0
  0x0001: V-O---- iget-object v0, v2, Ljava/lang/ref/WeakReference; org.chromium.ui.base.WindowAndroid.e // field@7982
  0x0003: V-O---- invoke-virtual {v0}, java.lang.Object java.lang.ref.WeakReference.get() // method@7347
  0x0006: V-O---- move-result-object v0
  0x0007: V-O--G- check-cast v0, android.content.Context // type@TypeIndex[61]
  0x0009: V-O---- invoke-static {v0}, android.app.Activity org.chromium.ui.base.WindowAndroid.a(android.content.Context) // method@17017
  0x000c: V-O---- move-result-object v0
  0x000d: V-O---- if-nez v0, +4
  0x000f: V-O---- move-object v0, v1
  0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
  0x0010: VCO-B-R return-object v0
  0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
  0x0011: V-O-B-- invoke-virtual {v0}, android.view.Window android.app.Activity.getWindow() // method@26
  0x0014: V-O---- move-result-object v0
  0x0015: V-O---- if-nez v0, +4
  0x0017: V-O---- move-object v0, v1
  0x0018: V-O---- goto -8
  0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
  0x0019: V-O-B-- invoke-virtual {v0}, android.view.View android.view.Window.peekDecorView() // method@1459
  0x001c: V-O---- move-result-object v0
  0x001d: V-O---- if-nez v0, +4
  0x001f: V-O---- move-object v0, v1
  0x0020: V-O---- goto -16
  0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
  0x0021: V-O-B-- invoke-virtual {v0}, android.os.IBinder android.view.View.getWindowToken() // method@1318
  0x0024: V-O---- move-result-object v0
  0x0025: V-O---- goto -21
Setting org.chromium.ui.base.WindowAndroid to erroneous.

Two things in this log deserve the most attention:

1. The message: [0x10] can't resolve returned type 'Unresolved Reference: android.os.IBinder' or 'Reference: android.os.IBinder'

[Image: the verifier code that prints this message]

From the code that prints this message, we can see that return_type is the operand reported as 'Unresolved Reference: android.os.IBinder'.

But return_type comes from:

[Image: where return_type is obtained]

And GetMethodReturnType:

[Image: GetMethodReturnType]

calls FromDescriptor:

[Image: FromDescriptor]

which calls ResolveClass. ResolveClass calls ClassLinker::FindClass, and FindClass has one obvious precondition for failure:

[Image: the check in ClassLinker::FindClass]

That is, when the current thread is a runtime thread, FindClass is refused. The lookup could push the class into initialization, which would run the class initializer in its static block. A runtime thread lacks the environment needed to run Java functions, so it cannot be allowed to do this.

Is the current thread a runtime thread, then? If so, which one? Could it be the GC thread?

2. We analyzed the verification activity before and after this log. Threads that verify successfully all emit load-class log lines, but the failing thread emitted none, and it later failed to verify several more classes for the same reason. This made us more confident that the failing thread is a runtime thread. The missing call stack on the VerifyError noted earlier points the same way: a runtime thread has no Java environment and cannot call Java functions, so nothing was recorded. We still needed to identify the thread, and for that we brought out a second piece of black magic.

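Black magic analysis, technique two

Reading the code, we found that every VerifyError is thrown through the same function:

[Image: the function that throws VerifyError]

We can also find its global symbol, so all we need to do is plant code at that symbol's address that crashes immediately, then let monkey trigger the problem so that we can catch it.

There is one catch: for security reasons, Android forbids us from making the code segment writable.

So how do we safely modify the code segment? We used the /proc/self/mem technique: open the file /proc/self/mem, then use the pwrite API to write always-crash code at the symbol's address.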
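A minimal sketch of that write path is shown below, under stated assumptions: PokeSelfMemory is a hypothetical name, and the bytes that make up the "always-crash code" depend on the CPU architecture and are not shown. Whether the write succeeds also depends on the kernel and on the platform's security policy, so the return value should always be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes `size` bytes from `data` to `address` in the
// current process by going through /proc/self/mem instead of a direct store.
// Returns true only if every byte was written.
bool PokeSelfMemory(std::uintptr_t address, const void* data,
                    std::size_t size) {
  assert(data != nullptr && size > 0);
  int fd = open("/proc/self/mem", O_RDWR | O_CLOEXEC);
  if (fd < 0) return false;
  // The file offset in /proc/self/mem is the virtual address to write to.
  ssize_t written = pwrite(fd, data, size, static_cast<off_t>(address));
  close(fd);
  return written == static_cast<ssize_t>(size);
}
```

In the scenario described above, address would be the resolved location of the VerifyError-throwing function and data would hold an instruction sequence that faults immediately.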

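This is how we found the thread on which verification failed:

[Images: the thread name and the call stack that started the thread]

Root cause analysis

We got the thread name, Verification th, and the call stack that started the thread. It is started from a ThreadPool, and every thread in a ThreadPool is a runtime thread, which confirms our earlier guess. The task the thread runs is BackgroundVerificationTask. Its launch site is easy to find:

[Image: where BackgroundVerificationTask is launched]

Digging further, this is the commit that introduced the problem: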

commit 0d5f6402ff925ac1385ccb349f8a2798a4816458
Author: Nicolas Geoffray
Date: Tue Apr 13 13:05:36 2021 +0100

Only run background verification when dexPathList is set.

Otherwise, the runtime will not be able to find the classes.

Test: 692-vdex-secondary-loader
Bug: 185088679
Change-Id: Idd39eabe00faa017aa5254f7188e7adbcaa23c74

diff --git a/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java b/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
index 710a88cc6d0..afbc9ec9de7 100644
--- a/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
+++ b/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
@@ -128,6 +128,9 @@ public class BaseDexClassLoader extends ClassLoader {
                 : Arrays.copyOf(sharedLibraryLoaders, sharedLibraryLoaders.length);
         this.pathList = new DexPathList(this, dexPath, librarySearchPath, null, isTrusted);
 
+        // Run background verification after having set 'pathList'.
+        this.pathList.maybeRunBackgroundVerification(this);
+
         reportClassLoaderChain();
     }
 
@@ -186,6 +189,8 @@ public class BaseDexClassLoader extends ClassLoader {
         this.sharedLibraryLoaders = null;
         this.pathList = new DexPathList(this, librarySearchPath);
         this.pathList.initByteBufferDexPath(dexFiles);
+        // Run background verification after having set 'pathList'.
+        this.pathList.maybeRunBackgroundVerification(this);
     }
 
     @Override

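Using git tag --contains, we confirmed that the change first shipped with the Android 12 beta.

Besides reporting the issue to Google and grumbling a bit, we still had to find a fix ourselves. Google said the December update of Android 12 would fix it, but many older devices never update, so we could not count on that.

We had to find, inside OatFileManager::RunBackgroundVerification, a way to force it not to start the background verification thread. Our eyes soon landed on this code:

[Image: code in OatFileManager::RunBackgroundVerification]

It caught our attention because the file name is something we do control. The logic before it also checks the SDK level: the thread is not started when the SDK level is <= 29, but UC Browser had already raised its SDK level to 30 (which matches the background noted earlier, that the problem only appeared after the level was raised to 30).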

Looking at the function DexLocationToOdexFilename, we found a few lines that help a lot:

  // Get the base part of the file without the extension.
  std::string file = location.substr(pos+1);
  pos = file.rfind('.');
  if (pos == std::string::npos) {
    *error_msg = "Dex location " + location + " has no extension.";
    return false;
  }

As long as it cannot find the suffix separator ".", it is forced to return early.
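The effect of that check can be tried out with a simplified stand-in. HasDexLocationExtension below is a hypothetical function that mirrors only the quoted lines; it is not the ART function and skips everything else DexLocationToOdexFilename does.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the extension check quoted above.
// Returns false and fills *error_msg when the file name has no '.'.
bool HasDexLocationExtension(const std::string& location,
                             std::string* error_msg) {
  assert(error_msg != nullptr);
  // Keep only the file name. When there is no '/', rfind returns npos and
  // npos + 1 wraps to 0, so the whole string is treated as the file name.
  std::string file = location.substr(location.rfind('/') + 1);
  if (file.rfind('.') == std::string::npos) {
    *error_msg = "Dex location " + location + " has no extension.";
    return false;
  }
  return true;
}
```

Only the file name matters here: dots in a directory name such as a version string do not count, so a path ending in core.jar passes the check while the same path ending in corejar fails it.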

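On Android 12 we exposed core.jar through a symlink named corejar, and the problem disappeared. The monster threatening the UC kernel was defeated, and the world was at peace once more.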

