
A Brief Analysis of the App Process Crash Mechanism

 4 years ago
source link: http://mp.weixin.qq.com/s?__biz=MzIxNzU1Nzk3OQ%3D%3D&%3Bmid=2247490234&%3Bidx=1&%3Bsn=68ac2714ac67d1b05ef73047585d6bd0


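Author: 杰杰_88

Link: https://www.jianshu.com/p/ecd52cd90a4b

Note: This article is published with authorization from 杰杰_88. Please contact the original author for permission to repost.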

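Conclusion: an App process "crash" is not a process crash in the true sense (compare a crash in native code). Java code threw an exception that nobody handled, and the App then killed itself.

At work I ran into a background Service that died (the "has stopped" dialog popped up) and then was not restarted for a long time. The log showed that the process was not killed after it threw FATAL EXCEPTION; it was killed and restarted only much later. Puzzled, I decided to look at what actually happens when an App dies.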

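What It Looks Like

When an Android App process throws an exception for whatever reason and nothing catches it, the user sees a dialog saying "XX has stopped". I used to assume that the app process was already dead at that point.

In Reality

Whenever I saw "XX has stopped", I assumed the process had ended at the same moment, and I never analyzed the mechanism closely. In fact, when the dialog pops up the process has not fully exited yet. It truly exits only when it kills itself. What follows records the whole path from an App throwing an uncaught exception to the process actually disappearing.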

Creation of the App Process

To analyze how an app process goes away, first look at how it comes into being.

Key Code

App process creation flow:

[Figure: App process startup flow]

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

startResult = Process.start(entryPoint,
        app.processName, uid, uid, gids, debugFlags, mountExternal,
        app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
        app.info.dataDir, invokeWith, entryPointArgs);

frameworks/base/core/java/android/os/ZygoteProcess.java

    // ZygoteState maintains the socket connection to the Zygote process
    private ZygoteState openZygoteSocketIfNeeded(String abi) throws ZygoteStartFailedEx {
        Preconditions.checkState(Thread.holdsLock(mLock), "ZygoteProcess lock not held");

        if (primaryZygoteState == null || primaryZygoteState.isClosed()) {
            try {
                primaryZygoteState = ZygoteState.connect(mSocket);
            } catch (IOException ioe) {
                throw new ZygoteStartFailedEx("Error connecting to primary zygote", ioe);
            }
        }

        if (primaryZygoteState.matches(abi)) {
            return primaryZygoteState;
        }

        // The primary zygote didn't match. Try the secondary.
        if (secondaryZygoteState == null || secondaryZygoteState.isClosed()) {
            try {
                secondaryZygoteState = ZygoteState.connect(mSecondarySocket);
            } catch (IOException ioe) {
                throw new ZygoteStartFailedEx("Error connecting to secondary zygote", ioe);
            }
        }

        if (secondaryZygoteState.matches(abi)) {
            return secondaryZygoteState;
        }

        throw new ZygoteStartFailedEx("Unsupported zygote ABI: " + abi);
    }


    private static Process.ProcessStartResult zygoteSendArgsAndGetResult(
            ZygoteState zygoteState, ArrayList<String> args)
            throws ZygoteStartFailedEx {
        try {
            // Throw early if any of the arguments are malformed. This means we can
            // avoid writing a partial response to the zygote.
            int sz = args.size();
            for (int i = 0; i < sz; i++) {
                if (args.get(i).indexOf('\n') >= 0) {
                    throw new ZygoteStartFailedEx("embedded newlines not allowed");
                }
            }

            /**
             * See com.android.internal.os.SystemZygoteInit.readArgumentList()
             * Presently the wire format to the zygote process is:
             * a) a count of arguments (argc, in essence)
             * b) a number of newline-separated argument strings equal to count
             *
             * After the zygote process reads these it will write the pid of
             * the child or -1 on failure, followed by boolean to
             * indicate whether a wrapper process was used.
             */
            final BufferedWriter writer = zygoteState.writer;
            final DataInputStream inputStream = zygoteState.inputStream;

            writer.write(Integer.toString(args.size()));
            writer.newLine();

            for (int i = 0; i < sz; i++) {
                String arg = args.get(i);
                writer.write(arg);
                writer.newLine();
            }

            writer.flush();

            // Should there be a timeout on this?
            Process.ProcessStartResult result = new Process.ProcessStartResult();

            // Always read the entire result from the input stream to avoid leaving
            // bytes in the stream for future process starts to accidentally stumble
            // upon.
            result.pid = inputStream.readInt();
            result.usingWrapper = inputStream.readBoolean();

            if (result.pid < 0) {
                throw new ZygoteStartFailedEx("fork() failed");
            }
            return result;
        } catch (IOException ex) {
            zygoteState.close();
            throw new ZygoteStartFailedEx(ex);
        }
    }
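The request half of the wire format described in the comment above (a count line followed by that many newline-terminated arguments) can be tried out in plain Java. This is only a minimal sketch under stated assumptions: the class ZygoteWireFormat and its encode/decode helpers are hypothetical names invented here, the data goes through an in-memory string rather than a LocalSocket, and the sample arguments are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the argument wire format; this is not AOSP code.
final class ZygoteWireFormat {

    // Encodes the arguments as a count line followed by one argument per line.
    static String encode(List<String> args) {
        for (String arg : args) {
            // Reject bad input before anything is written, so no partial request exists.
            if (arg.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("embedded newlines not allowed");
            }
        }
        StringBuilder sb = new StringBuilder();
        sb.append(args.size()).append('\n');
        for (String arg : args) {
            sb.append(arg).append('\n');
        }
        return sb.toString();
    }

    // Decodes what encode() produced: reads the count, then exactly that many lines.
    static List<String> decode(String wire) {
        try (BufferedReader reader = new BufferedReader(new StringReader(wire))) {
            int count = Integer.parseInt(reader.readLine());
            List<String> args = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                String line = reader.readLine();
                if (line == null) {
                    throw new IllegalArgumentException("truncated argument list");
                }
                args.add(line);
            }
            return args;
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] unused) {
        // Sample arguments are illustrative only.
        List<String> args = Arrays.asList("--setuid=10001", "--nice-name=com.example.app");
        String wire = encode(args);
        System.out.print(wire);
        System.out.println(decode(wire).equals(args));
    }
}
```

Because the count comes first, the receiver knows exactly how many lines to read, which is why an argument containing a newline has to be rejected before anything is sent.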

The command that zygoteSendArgsAndGetResult sends over the LocalSocket is received by Zygote:

frameworks/base/core/java/com/android/internal/os/ZygoteConnection.java

pid = Zygote.forkAndSpecialize(parsedArgs.uid, parsedArgs.gid, parsedArgs.gids,
        parsedArgs.debugFlags, rlimits, parsedArgs.mountExternal, parsedArgs.seInfo,
        parsedArgs.niceName, fdsToClose, fdsToIgnore, parsedArgs.instructionSet,
        parsedArgs.appDataDir);

This is where the real app process is forked. The forked child process then executes the command:

ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs,
        null /* classLoader */);

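The command executed:

Execution eventually enters the main function of ActivityThread.java, where the App's lifecycle begins.

RuntimeInit.commonInit()

In the flow above, after the App process has been forked, it executes this function: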

RuntimeInit.commonInit()

Inside it:

    Thread.setUncaughtExceptionPreHandler(new LoggingHandler());
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());

    // In java.lang.Thread:
    /**
     * Dispatch an uncaught exception to the handler. This method is
     * intended to be called only by the runtime and by tests.
     *
     * @hide
     */
    // @VisibleForTesting (would be private if not for tests)
    public final void dispatchUncaughtException(Throwable e) {
        Thread.UncaughtExceptionHandler initialUeh =
                Thread.getUncaughtExceptionPreHandler();
        if (initialUeh != null) {
            try {
                initialUeh.uncaughtException(this, e);
            } catch (RuntimeException | Error ignored) {
                // Throwables thrown by the initial handler are ignored
            }
        }
        getUncaughtExceptionHandler().uncaughtException(this, e);
    }

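setUncaughtExceptionPreHandler installs LoggingHandler as the "uncaught exception pre-handler", and setDefaultUncaughtExceptionHandler installs KillApplicationHandler as the actual "default uncaught exception handler". Going by the names and by dispatchUncaughtException, when an exception goes uncaught, LoggingHandler handles it first and KillApplicationHandler handles it next. (When a thread has no handler of its own, getUncaughtExceptionHandler() returns the thread's ThreadGroup, which in turn delegates to the default handler.) LoggingHandler is what prints FATAL EXCEPTION and the stack trace: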

E AndroidRuntime: FATAL EXCEPTION: main
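The two-stage dispatch can be imitated on a desktop JVM. Standard Java has no setUncaughtExceptionPreHandler, so the sketch below assumes a hand-written dispatch helper that mirrors the shape of dispatchUncaughtException. The class name TwoStageDispatch, the event strings, and the deliberately failing logging handler are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical desktop-JVM imitation of the pre-handler / default-handler order.
final class TwoStageDispatch {
    // Records the order in which the handlers ran.
    static final List<String> EVENTS = Collections.synchronizedList(new ArrayList<>());

    // Pre-handler first (its own failures are swallowed), then the final handler.
    static void dispatch(Thread t, Throwable e,
            Thread.UncaughtExceptionHandler preHandler,
            Thread.UncaughtExceptionHandler finalHandler) {
        if (preHandler != null) {
            try {
                preHandler.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // A failing pre-handler must not stop the final handler from running.
            }
        }
        finalHandler.uncaughtException(t, e);
    }

    static List<String> run() {
        EVENTS.clear();
        Thread.UncaughtExceptionHandler logging = (t, e) -> {
            EVENTS.add("log: FATAL EXCEPTION: " + t.getName());
            throw new IllegalStateException("logging failed"); // simulate a buggy logger
        };
        Thread.UncaughtExceptionHandler kill =
                (t, e) -> EVENTS.add("kill: " + e.getMessage());

        Thread worker = new Thread(() -> {
            throw new RuntimeException("boom");
        }, "main");
        worker.setUncaughtExceptionHandler((t, e) -> dispatch(t, e, logging, kill));
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread, before join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The logging stage runs first, and even when it throws, the second stage still runs, which matches the "Throwables thrown by the initial handler are ignored" comment in the excerpt.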

KillApplicationHandler:

    /**
     * Handle application death from an uncaught exception.  The framework
     * catches these for the main threads, so this should only matter for
     * threads created by applications.  Before this method runs,
     * {@link LoggingHandler} will already have logged details.
     */
    private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
        public void uncaughtException(Thread t, Throwable e) {
            try {
                // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
                if (mCrashing) return;
                mCrashing = true;

                // Try to end profiling. If a profiler is running at this point, and we kill the
                // process (below), the in-memory buffer will be lost. So try to stop, which will
                // flush the buffer. (This makes method trace profiling useful to debug crashes.)
                if (ActivityThread.currentActivityThread() != null) {
                    ActivityThread.currentActivityThread().stopProfiling();
                }

                final String processName = ActivityThread.currentProcessName();
                if (processName != null) {
                    if (Build.IS_USERDEBUG && processName.equals(SystemProperties.get("persist.debug.process")))  {
                        Log.w(TAG, "process: " + processName + " crash message is skip");
                        return;
                    }
                }

                // Bring up crash dialog, wait for it to be dismissed
                ActivityManager.getService().handleApplicationCrash(
                        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
            } catch (Throwable t2) {
                if (t2 instanceof DeadObjectException) {
                    // System process is dead; ignore
                } else {
                    try {
                        Clog_e(TAG, "Error reporting crash", t2);
                    } catch (Throwable t3) {
                        // Even Clog_e() fails!  Oh well.
                    }
                }
            } finally {
                // Try everything to make sure this process goes away.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        }
    }

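The following code is where the handler talks to ActivityManagerService to bring up the "has stopped" dialog. Note the comment: execution continues past this call only after the dialog has been dismissed.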

// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
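The control flow of KillApplicationHandler (a re-entrancy guard, a blocking report, and termination in finally) can be reduced to a small runnable sketch. Everything named here is an assumption made for illustration: ReportThenDieHandler, CrashReporter, and the terminator callback are hypothetical stand-ins for handleApplicationCrash() and for killProcess() plus System.exit(), so the example can run without actually exiting the JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for KillApplicationHandler; none of these names are AOSP APIs.
final class ReportThenDieHandler implements Thread.UncaughtExceptionHandler {

    // Stand-in for the blocking handleApplicationCrash() Binder call.
    interface CrashReporter {
        void report(Throwable e) throws Exception;
    }

    private final CrashReporter reporter;
    private final Runnable terminator; // stand-in for killProcess() + System.exit()
    private volatile boolean crashing;

    ReportThenDieHandler(CrashReporter reporter, Runnable terminator) {
        this.reporter = reporter;
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Do not report twice; the finally block below still runs on this return.
            if (crashing) return;
            crashing = true;
            reporter.report(e); // may block for a long time, or throw
        } catch (Throwable reportFailure) {
            // Reporting failed; there is nothing useful left to do except terminate.
        } finally {
            terminator.run();
        }
    }

    // Runs the handler once with a failing reporter and returns how often the
    // terminator was invoked.
    static int terminationsWhenReportingFails() {
        AtomicInteger terminations = new AtomicInteger();
        ReportThenDieHandler handler = new ReportThenDieHandler(
                e -> { throw new IllegalStateException("system server unreachable"); },
                terminations::incrementAndGet);
        handler.uncaughtException(Thread.currentThread(), new RuntimeException("boom"));
        return terminations.get();
    }

    public static void main(String[] args) {
        System.out.println(terminationsWhenReportingFails());
    }
}
```

Because termination sits in finally, it runs whether the report succeeds, throws, or is skipped by an early return. The flip side is that termination cannot start until the reporting call returns, which is the delay this article is about.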

In ActivityManagerService, execution eventually blocks at the following code:

AppErrors.java crashApplicationInner():
        synchronized (mService) {
            /**
             * If crash is handled by instance of {@link android.app.IActivityController},
             * finish now and don't show the app error dialog.
             */
            if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                    timeMillis, callingPid, callingUid)) {
                return;
            }

            /**
             * If this process was running instrumentation, finish now - it will be handled in
             * {@link ActivityManagerService#handleAppDiedLocked}.
             */
            if (r != null && r.instr != null) {
                return;
            }

            // Log crash in battery stats.
            if (r != null) {
                mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
            }

            AppErrorDialog.Data data = new AppErrorDialog.Data();
            data.result = result;
            data.proc = r;

            // If we can't identify the process or it's already exceeded its crash quota,
            // quit right away without showing a crash dialog.
            if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
                return;
            }

            final Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

            task = data.task;
            msg.obj = data;
            mService.mUiHandler.sendMessage(msg);
        }

        int res = result.get();

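result is of type AppErrorResult. result.get() calls wait(), which blocks the current Binder call until the matching notify arrives. The code before it brings up the "has stopped" dialog, AppErrorDialog. result is passed into AppErrorDialog along with data, and when the dialog is dismissed it calls result.set(), which wakes the Binder thread that was waiting: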

AppErrorResult
final class AppErrorResult {
    public void set(int res) {
        synchronized (this) {
            mHasResult = true;
            mResult = res;
            notifyAll();
        }
    }

    public int get() {
        synchronized (this) {
            while (!mHasResult) {
                try {
                    wait();
                } catch (InterruptedException e) {
                }
            }
        }
        return mResult;
    }

    boolean mHasResult = false;

    int mResult;
}
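To see the blocking behavior in isolation, here is a minimal sketch that uses the same wait/notifyAll shape. The names BlockingResultDemo, Result, and demo are hypothetical. One thread plays the dialog and calls set() after a delay, while the caller plays the Binder thread stuck in get().

```java
// Hypothetical demo of the wait/notifyAll hand-off used by AppErrorResult.
final class BlockingResultDemo {

    // Same shape as AppErrorResult: get() blocks until set() has been called.
    static final class Result {
        private boolean hasResult;
        private int value;

        synchronized void set(int v) {
            hasResult = true;
            value = v;
            notifyAll(); // wake every thread parked in get()
        }

        synchronized int get() {
            // Loop guards against spurious wakeups and swallowed interrupts.
            while (!hasResult) {
                try {
                    wait();
                } catch (InterruptedException ignored) {
                    // Keep waiting, as the original does.
                }
            }
            return value;
        }
    }

    // Blocks the calling thread until a second thread delivers the answer.
    static int demo(int answer, long delayMs) {
        Result result = new Result();
        Thread dialog = new Thread(() -> {
            try {
                Thread.sleep(delayMs); // the "user" takes a while to dismiss the dialog
            } catch (InterruptedException ignored) {
                // Deliver the answer anyway.
            }
            result.set(answer);
        });
        dialog.start();
        return result.get(); // the "Binder thread" is parked here in the meantime
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int res = demo(7, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("res=" + res + " after about " + elapsedMs + " ms");
    }
}
```

The caller cannot make progress until set() runs, so whatever delays set() delays the caller by the same amount.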

The remaining handling then proceeds. Only after the Binder call returns does the App process finally kill itself:

finally {
    // Try everything to make sure this process goes away.
    Process.killProcess(Process.myPid());
    System.exit(10);
}

Note that the AppErrorDialog constructor contains:

// After the timeout, pretend the user clicked the quit button
mHandler.sendMessageDelayed(
        mHandler.obtainMessage(TIMEOUT),
        DISMISS_TIMEOUT);

If the user never responds, the call returns after 5 minutes. Watch for a log line like this one:

Slog.w(TAG, "handleApplicationStrictModeViolation; res=" + res);
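The "dismiss on user action, or pretend the user quit after a timeout" pattern can be sketched outside Android with a scheduler and a future. This is an illustration under stated assumptions, not the AppErrorDialog implementation: the class DialogTimeoutSketch, the result codes USER_QUIT and TIMED_OUT, and the millisecond values are all invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: whichever happens first, user action or timeout, unblocks the waiter.
final class DialogTimeoutSketch {
    static final int USER_QUIT = 0; // arbitrary illustrative result codes
    static final int TIMED_OUT = 1;

    // userDelayMs < 0 means the user never responds.
    static int waitForDismiss(long userDelayMs, long timeoutMs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CompletableFuture<Integer> result = new CompletableFuture<>();
        try {
            if (userDelayMs >= 0) {
                scheduler.schedule(() -> {
                    result.complete(USER_QUIT);
                }, userDelayMs, TimeUnit.MILLISECONDS);
            }
            // After the timeout, behave as if the dialog had been dismissed.
            scheduler.schedule(() -> {
                result.complete(TIMED_OUT);
            }, timeoutMs, TimeUnit.MILLISECONDS);
            return result.join(); // blocks like result.get() in crashApplicationInner()
        } finally {
            scheduler.shutdownNow(); // cancel whichever task did not fire
        }
    }

    public static void main(String[] args) {
        System.out.println(waitForDismiss(10, 5_000)); // user responds quickly
        System.out.println(waitForDismiss(-1, 100));   // nobody responds
    }
}
```

When nobody responds, the waiter is released only by the timeout task, so the length of the timeout is a lower bound on how long the blocked caller stays blocked.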

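Because the call returns only after the timeout, the app process can linger in the crashed state for as long as 5 minutes. Apart from the thread that threw the exception, the other threads keep working, so odd things may happen. And a process that should have died and been restarted is not restarted before the timeout either: since it has not been killed, ActivityManagerService never receives the binderDied notification.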
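That the rest of the process keeps running while the crashing thread sits in its handler is easy to reproduce on a desktop JVM. In this sketch the names ZombieProcessDemo and ticksWhileCrashed are hypothetical, a latch stands in for the blocked Binder call, and the handler simply returns where the real one would kill the process.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: a worker thread keeps making progress while another thread
// is parked inside its uncaught-exception handler.
final class ZombieProcessDemo {

    // Returns how many ticks the worker completed while the crash handler was blocked.
    static int ticksWhileCrashed(int targetTicks) {
        AtomicInteger ticks = new AtomicInteger();
        CountDownLatch crashed = new CountDownLatch(1);
        CountDownLatch dismissed = new CountDownLatch(1);

        Thread crashing = new Thread(() -> {
            throw new IllegalStateException("boom");
        });
        crashing.setUncaughtExceptionHandler((t, e) -> {
            crashed.countDown();
            try {
                dismissed.await(); // stands in for the blocked handleApplicationCrash() call
            } catch (InterruptedException ignored) {
                // Fall through and finish.
            }
            // A real handler would kill the process here.
        });

        Thread worker = new Thread(() -> {
            try {
                crashed.await(); // only count work done after the crash
                while (dismissed.getCount() > 0) {
                    ticks.incrementAndGet();
                    Thread.sleep(1);
                }
            } catch (InterruptedException ignored) {
                // Stop working.
            }
        });

        worker.start();
        crashing.start();
        try {
            // "Dismiss the dialog" only after the worker has visibly made progress.
            while (ticks.get() < targetTicks) {
                Thread.sleep(1);
            }
            dismissed.countDown();
            crashing.join();
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return ticks.get();
    }

    public static void main(String[] args) {
        System.out.println("worker ticks while crashed: " + ticksWhileCrashed(20));
    }
}
```

Every tick counted here happened after the exception was thrown and before the handler finished, which is the window in which a crashed app process can still do work.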

