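This analysis is based on Android 13 (T).
An exception is a signal that a program did not run according to its intended logic.
In Java, exception output usually consists of a message and the call stack at the point where the exception occurred. In most cases this information is direct and clear. But if an exception is caught, wrapped, and rethrown, or if it occurs during inter-process communication, the stack trace becomes more complex and can even mislead us about the real cause. What follows walks through several forms of exception stack trace in detail.
The following is log output with the timestamp, pid, tid, and tag stripped.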
*** FATAL EXCEPTION IN SYSTEM PROCESS: main
java.lang.RuntimeException: Error receiving broadcast Intent { act=android.intent.action.NEW_OUTGOING_CALL flg=0x11000010 (has extras) } in com.android.server.location.injector.SystemEmergencyHelper$1@42d2813
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1800)
at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at com.android.server.SystemServer.run(SystemServer.java:966)
at com.android.server.SystemServer.main(SystemServer.java:651)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)
Caused by: java.lang.IllegalStateException: telephony service is null.
at android.telephony.TelephonyManager.isEmergencyNumber(TelephonyManager.java:14136)
at com.android.server.location.injector.SystemEmergencyHelper$1.onReceive(SystemEmergencyHelper.java:70)
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1790)
... 10 more
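The output contains two separate stack traces, divided by the "Caused by" marker. Combined with the code below, we can work out how the exception was transformed:
Broadcast handling enters receiver.onReceive, where an IllegalStateException with the message "telephony service is null" is thrown. The exception propagates upward until it is caught at line 1791 of the code below. At line 1800 the caught exception is wrapped: the original exception e is passed as the second argument to the RuntimeException constructor, which assigns it to the cause field. The RuntimeException is therefore the direct cause of the process exit, while the original IllegalStateException is the root cause.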
1781 try {
1782 ClassLoader cl = mReceiver.getClass().getClassLoader();
1783 intent.setExtrasClassLoader(cl);
1784 // TODO: determine at registration time if caller is
1785 // protecting themselves with signature permission
1786 intent.prepareToEnterProcess(ActivityThread.isProtectedBroadcast(intent),
1787 mContext.getAttributionSource());
1788 setExtrasClassLoader(cl);
1789 receiver.setPendingResult(this);
1790 receiver.onReceive(mContext, intent);
1791 } catch (Exception e) {
1792 if (mRegistered && ordered) {
1793 if (ActivityThread.DEBUG_BROADCAST) Slog.i(ActivityThread.TAG,
1794 "Finishing failed broadcast to " + mReceiver);
1795 sendFinished(mgr);
1796 }
1797 if (mInstrumentation == null ||
1798 !mInstrumentation.onException(mReceiver, e)) {
1799 Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
1800 throw new RuntimeException(
1801 "Error receiving broadcast " + intent
1802 + " in " + mReceiver, e);
1803 }
1804 }
public Throwable(String message, Throwable cause) { // the second argument is assigned to the cause field
    fillInStackTrace();
    detailMessage = message;
    this.cause = cause;
}
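When the stack trace is printed, the exception that directly caused the crash is printed first, followed recursively by the stack trace of its cause (since a cause may itself have a cause).
Also note the "... 10 more" at the bottom of the IllegalStateException's stack trace. It stands for the last 10 frames of the RuntimeException's stack trace, that is, everything except its topmost frame. Because the exception was caught on its way up, the frames belonging to the callers of the catch site are identical in both traces. Filling in those 10 frames gives the complete stack trace of the IllegalStateException, shown below.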
Caused by: java.lang.IllegalStateException: telephony service is null.
at android.telephony.TelephonyManager.isEmergencyNumber(TelephonyManager.java:14136)
at com.android.server.location.injector.SystemEmergencyHelper$1.onReceive(SystemEmergencyHelper.java:70)
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1790)
at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at com.android.server.SystemServer.run(SystemServer.java:966)
at com.android.server.SystemServer.main(SystemServer.java:651)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:920)
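The wrap-and-rethrow pattern and the "... n more" count can be reproduced in plain Java. The following is a minimal sketch, not framework code: the class CauseChainDemo and its methods are hypothetical stand-ins for the receiver and the dispatcher, and framesInCommon simply compares the two traces from the bottom up, which mirrors how the shared-frame count is derived when a cause is printed.

```java
class CauseChainDemo {
    // Hypothetical stand-in for receiver.onReceive(): throws the root cause.
    static void onReceive() {
        throw new IllegalStateException("telephony service is null.");
    }

    // Hypothetical stand-in for the dispatcher: catches, wraps, and rethrows.
    static void dispatch() {
        try {
            onReceive();
        } catch (Exception e) {
            // The original exception becomes the cause of the new one.
            throw new RuntimeException("Error receiving broadcast", e);
        }
    }

    // Counts the frames that the cause shares with its wrapper,
    // comparing both stack traces starting from the bottom.
    static int framesInCommon(Throwable wrapper) {
        StackTraceElement[] outer = wrapper.getStackTrace();
        StackTraceElement[] inner = wrapper.getCause().getStackTrace();
        int o = outer.length - 1;
        int i = inner.length - 1;
        while (o >= 0 && i >= 0 && outer[o].equals(inner[i])) {
            o--;
            i--;
        }
        return inner.length - 1 - i;
    }

    // Runs dispatch() and hands back the wrapper exception for inspection.
    static RuntimeException capture() {
        try {
            dispatch();
        } catch (RuntimeException e) {
            return e;
        }
        throw new AssertionError("dispatch() should have thrown");
    }

    public static void main(String[] args) {
        RuntimeException e = capture();
        e.printStackTrace(); // wrapper first, then "Caused by: ..." with "... n more"
        System.out.println("frames in common: " + framesInCommon(e));
    }
}
```

In this sketch the shared count is always one less than the wrapper's frame count, because the two dispatch frames differ only in line number (the call site versus the throw site), just like lines 1790 and 1800 in the LoadedApk example.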
The following is log output with the timestamp, pid, tid, and tag stripped.
FATAL EXCEPTION: main
PID: 3264
java.lang.NullPointerException: Attempt to invoke virtual method 'int java.lang.String.hashCode()' on a null object reference
at android.os.Parcel.createExceptionOrNull(Parcel.java:3017)
at android.os.Parcel.createException(Parcel.java:2995)
at android.os.Parcel.readException(Parcel.java:2978)
at android.os.Parcel.readException(Parcel.java:2920)
at android.app.IActivityManager$Stub$Proxy.attachApplication(IActivityManager.java:5148)
at android.app.ActivityThread.attach(ActivityThread.java:7644)
at android.app.ActivityThread.main(ActivityThread.java:7943)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)
Caused by: android.os.RemoteException: Remote stack trace:
at com.android.server.am.HostingRecord.getHostingTypeIdStatsd(HostingRecord.java:234)
at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:5102)
at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:5115)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2339)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2655)
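This kind of stack trace contains Parcel.readException frames, and its "Caused by" is followed by "Remote stack trace". It usually results from an exception thrown in the peer process during a synchronous Binder call. The exception raised in the peer process is split into three parts, an exception code (code), a message (msg), and the remote stack trace (remoteStackTrace), which are serialized and sent back to the local process.
In the local process these three parts are assembled into two exception objects. One is built from code and msg, at line 2978 below, and is the direct cause of the process exit; the other is built from remoteStackTrace, at line 2981 below, and is the root cause (line 2983 sets it as the cause of e).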
2972 public final void readException(int code, String msg) {
2973 String remoteStackTrace = null;
2974 final int remoteStackPayloadSize = readInt();
2975 if (remoteStackPayloadSize > 0) {
2976 remoteStackTrace = readString();
2977 }
2978 Exception e = createException(code, msg);
2979 // Attach remote stack trace if availalble
2980 if (remoteStackTrace != null) {
2981 RemoteException cause = new RemoteException(
2982 "Remote stack trace:\n" + remoteStackTrace, null, false, false);
2983 ExceptionUtils.appendCause(e, cause);
2984 }
2985 SneakyThrow.sneakyThrow(e);
2986 }
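The reassembly step can be imitated in plain Java to see why the message and the stack trace end up on two different exceptions. This is a rough sketch under stated assumptions: RemoteTraceDemo, its Code enum, and the RemoteTrace class are hypothetical stand-ins (for the EX_* codes and android.os.RemoteException respectively), and the loop in rebuild only approximates what ExceptionUtils.appendCause does.

```java
class RemoteTraceDemo {
    // Hypothetical stand-in for the EX_* codes; only two kinds are modeled.
    enum Code { NULL_POINTER, ILLEGAL_STATE }

    // Hypothetical stand-in for android.os.RemoteException. The last two
    // constructor flags disable suppression and the local stack trace,
    // so only the message (the remote trace text) is carried.
    static class RemoteTrace extends Exception {
        RemoteTrace(String message) {
            super(message, null, false, false);
        }
    }

    // Rebuilds an exception of the matching type from code and msg alone.
    static RuntimeException createException(Code code, String msg) {
        switch (code) {
            case NULL_POINTER:
                return new NullPointerException(msg);
            case ILLEGAL_STATE:
                return new IllegalStateException(msg);
            default:
                return new RuntimeException("Unknown exception code: " + code + " msg " + msg);
        }
    }

    // Combines the three transported parts into two linked exception objects.
    static RuntimeException rebuild(Code code, String msg, String remoteStackTrace) {
        RuntimeException e = createException(code, msg); // local stack trace, remote message
        if (remoteStackTrace != null) {
            Throwable tail = e;
            while (tail.getCause() != null) {
                tail = tail.getCause();
            }
            // The remote frames travel as plain text inside the cause's message.
            tail.initCause(new RemoteTrace("Remote stack trace:\n" + remoteStackTrace));
        }
        return e;
    }

    public static void main(String[] args) {
        rebuild(Code.NULL_POINTER, "message from the remote side",
                "\tat com.example.Remote.call(Remote.java:1)\n").printStackTrace();
    }
}
```

The exception created from code and msg captures its stack trace where it is constructed, that is, in the calling process, which is why the top half of such a log shows Parcel frames rather than the frames where the failure really happened.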
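Returning to the attachApplication crash above, what it really means is:
The app process tried to communicate with the system_server process through the attachApplication interface, but system_server hit a NullPointerException while handling the request. system_server sent this exception back to the app process, which ultimately caused the app process to exit.
In my view the current stack trace output is flawed. It splits the msg and the stack trace, which belong to the same exception, across two different exceptions, and that confuses developers. Read correctly, the trace above would be clearer if it were displayed in the following format.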
android.os.RemoteException: Binder transaction failed
at android.os.Parcel.createExceptionOrNull(Parcel.java:3017)
at android.os.Parcel.createException(Parcel.java:2995)
at android.os.Parcel.readException(Parcel.java:2978)
at android.os.Parcel.readException(Parcel.java:2920)
at android.app.IActivityManager$Stub$Proxy.attachApplication(IActivityManager.java:5148)
at android.app.ActivityThread.attach(ActivityThread.java:7644)
at android.app.ActivityThread.main(ActivityThread.java:7943)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)
Caused by: java.lang.NullPointerException in remote process: Attempt to invoke virtual method 'int java.lang.String.hashCode()' on a null object reference
at com.android.server.am.HostingRecord.getHostingTypeIdStatsd(HostingRecord.java:234)
at com.android.server.am.ActivityManagerService.attachApplicationLocked(ActivityManagerService.java:5102)
at com.android.server.am.ActivityManagerService.attachApplication(ActivityManagerService.java:5115)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2339)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2655)
Note, however, that not every exception thrown while the peer process handles a Binder call can be sent back; only the following nine kinds can.
Exception | Code |
---|---|
Parcelable Exceptions in BootClassLoader | EX_PARCELABLE |
SecurityException | EX_SECURITY |
BadParcelableException | EX_BAD_PARCELABLE |
IllegalArgumentException | EX_ILLEGAL_ARGUMENT |
NullPointerException | EX_NULL_POINTER |
IllegalStateException | EX_ILLEGAL_STATE |
NetworkOnMainThreadException | EX_NETWORK_MAIN_THREAD |
UnsupportedOperationException | EX_UNSUPPORTED_OPERATION |
ServiceSpecificException | EX_SERVICE_SPECIFIC |
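Once the peer process has sent the exception back, it carries on running normally. On reflection this design makes sense. A server process controls neither when it runs nor what it runs; both are driven by the client process. An exception thrown there is therefore essentially the client's concern, and letting one client's behavior bring down the server process would clearly be unreasonable. In addition, a server process may be serving many clients, and one client's faulty behavior should not affect other clients that could otherwise be served normally.
Any exception outside these nine kinds is handled by JavaBBinder::onTransact in the peer process, which ultimately logs it through LOGE. Note that if the Throwable is an Exception, the process carries on after logging it, whereas an Error causes the process to exit.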
jboolean res = env->CallBooleanMethod(mObject, gBinderOffsets.mExecTransact,
        code, reinterpret_cast<jlong>(&data), reinterpret_cast<jlong>(reply), flags);

if (env->ExceptionCheck()) {
    ScopedLocalRef<jthrowable> excep(env, env->ExceptionOccurred());
    binder_report_exception(env, excep.get(),
            "*** Uncaught remote exception! "
            "(Exceptions are not yet supported across processes.)");
    res = JNI_FALSE;
}
*** Uncaught remote exception! (Exceptions are not yet supported across processes.)
java.lang.OutOfMemoryError: Failed to allocate a 280361534 byte allocation with 25165820 free bytes and 258MB until OOM, target footprint 291421224, growth limit 536870912
at java.util.Arrays.copyOf(Arrays.java:3136)
at java.util.Arrays.copyOf(Arrays.java:3106)
at java.util.ArrayList.grow(ArrayList.java:275)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:249)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:241)
at java.util.ArrayList.add(ArrayList.java:467)
at android.os.Parcel.readStringList(Parcel.java:3093)
at android.content.IntentFilter.<init>(IntentFilter.java:2377)
at android.content.IntentFilter$1.createFromParcel(IntentFilter.java:2269)
at android.content.IntentFilter$1.createFromParcel(IntentFilter.java:2267)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2241)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2669)
at android.os.Binder.execTransactInternal(Binder.java:1221)
at android.os.Binder.execTransact(Binder.java:1163)
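The three possible outcomes for a Throwable raised while serving a synchronous call can be summarized as a small decision function. This is only a simplified model of the behavior described above, not the framework's code: TransactOutcomeDemo and its Outcome enum are hypothetical names, and only the JDK exception types from the table are checked, since the Android-specific ones (BadParcelableException, NetworkOnMainThreadException, ServiceSpecificException, and Parcelable exceptions) are not available in plain Java.

```java
class TransactOutcomeDemo {
    enum Outcome {
        SENT_TO_CLIENT,          // serialized into the reply; the server carries on
        LOGGED_SERVER_CONTINUES, // "Uncaught remote exception" is logged; the server carries on
        SERVER_PROCESS_EXITS     // an Error takes the server process down
    }

    // Simplified classification for a synchronous Binder call.
    static Outcome classify(Throwable t) {
        if (t instanceof Error) {
            return Outcome.SERVER_PROCESS_EXITS;
        }
        if (t instanceof SecurityException
                || t instanceof IllegalArgumentException
                || t instanceof NullPointerException
                || t instanceof IllegalStateException
                || t instanceof UnsupportedOperationException) {
            return Outcome.SENT_TO_CLIENT;
        }
        return Outcome.LOGGED_SERVER_CONTINUES;
    }

    public static void main(String[] args) {
        System.out.println(classify(new NullPointerException()));
        System.out.println(classify(new ArithmeticException()));
        System.out.println(classify(new OutOfMemoryError()));
    }
}
```

Under this model the OutOfMemoryError in the log above falls into the last category, which matches the rule that an Error causes the process to exit.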
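For ordinary asynchronous (oneway) calls, the client process does not wait after sending the request, so exceptions thrown while the server handles it are not sent back. All of them end up in JavaBBinder::onTransact, and the same rule applies: for an Exception the process carries on after logging it, while an Error causes the process to exit.
There is, however, one kind of exception in asynchronous Binder communication that is very obscure and almost impossible to understand without knowing the internals. Here is an example.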
FATAL EXCEPTION: Thread-3
Process: com.android.systemui, PID: 31695
java.lang.RuntimeException: Error receiving broadcast Intent { act=android.bluetooth.device.action.BOND_STATE_CHANGED flg=0x10 (has extras) } in com.android.bluetooth.BluetoothManager$BluetoothBroadcastReceiver@c5b7352
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0$android-app-LoadedApk$ReceiverDispatcher$Args(LoadedApk.java:1920)
at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:240)
at android.os.Looper.loop(Looper.java:351)
at android.os.HandlerThread.run(HandlerThread.java:67)
Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'android.os.Looper android.os.HandlerThread.getLooper()' on a null object reference
at com.android.bluetooth.a2dp.A2dpService.getOrCreateStateMachine(A2dpService.java:2101)
at com.android.bluetooth.a2dp.A2dpService.connect(A2dpService.java:515)
at com.android.bluetooth.btservice.AdapterService.connectEnabledProfiles(AdapterService.java:1548)
at com.android.bluetooth.btservice.AdapterService.connectAllEnabledProfiles(AdapterService.java:4962)
at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3076)
at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3057)
at android.bluetooth.IBluetooth$Stub.onTransact(IBluetooth.java:1750)
at android.os.Binder.execTransactInternal(Binder.java:1331)
at android.os.Binder.execTransact(Binder.java:1268)
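In fact, synchronous communication can be implemented not only with Binder's synchronous mode but also with two asynchronous Binder calls, and that is exactly how the stack trace above came about.
When the systemui process receives the broadcast, it runs the corresponding onReceive method, which in this case tries to connect to a Bluetooth device. The key to the exception lies in the following code.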
final SynchronousResultReceiver recv = SynchronousResultReceiver.get();
service.connectAllEnabledProfiles(this, mAttributionSource, recv);
return recv.awaitResultNoInterrupt(getSyncTimeout()).getValue(defaultValue);
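connectAllEnabledProfiles is an asynchronous Binder request whose peer is the com.android.bluetooth process. After sending the asynchronous connectAllEnabledProfiles request, systemui continues executing, but awaitResultNoInterrupt then suspends the thread to wait for the peer's reply. That reply is itself an asynchronous call, so the program uses SynchronousResultReceiver and two asynchronous calls to imitate a synchronous one.
When the com.android.bluetooth process receives the asynchronous request, it runs the code below. If nothing goes wrong, receiver.send delivers the return value to the systemui process. If a RuntimeException is thrown, receiver.propagateException sends the exception back to systemui instead.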
public void connectAllEnabledProfiles(BluetoothDevice device,
        AttributionSource source, SynchronousResultReceiver receiver) {
    try {
        receiver.send(connectAllEnabledProfiles(device, source));
    } catch (RuntimeException e) {
        receiver.propagateException(e);
    }
}
Where does the exception sent back to systemui finally get thrown? In the getValue method called in the systemui code above.
public T getValue(T defaultValue) {
    if (mException != null) {
        throw mException;
    }
    if (mObject == null) {
        return defaultValue;
    }
    return mObject;
}
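We now know that the "Caused by" part of the stack trace above (excerpted below) is actually an exception that occurred in the peer process (com.android.bluetooth) of an asynchronous Binder call. Yet nothing in the entire trace says "remote", so it is very easy to assume the exception occurred in the systemui process itself. Be careful whenever you run into a stack trace like this.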
Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'android.os.Looper android.os.HandlerThread.getLooper()' on a null object reference
at com.android.bluetooth.a2dp.A2dpService.getOrCreateStateMachine(A2dpService.java:2101)
at com.android.bluetooth.a2dp.A2dpService.connect(A2dpService.java:515)
at com.android.bluetooth.btservice.AdapterService.connectEnabledProfiles(AdapterService.java:1548)
at com.android.bluetooth.btservice.AdapterService.connectAllEnabledProfiles(AdapterService.java:4962)
at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3076)
at com.android.bluetooth.btservice.AdapterService$AdapterServiceBinder.connectAllEnabledProfiles(AdapterService.java:3057)
at android.bluetooth.IBluetooth$Stub.onTransact(IBluetooth.java:1750)
at android.os.Binder.execTransactInternal(Binder.java:1331)
at android.os.Binder.execTransact(Binder.java:1268)
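The same effect can be reproduced without Binder by using two threads in place of two processes. The sketch below is an illustration under assumptions, not the Bluetooth implementation: SyncOverAsyncDemo, its ResultReceiver class, and the failingWork and serve methods are all hypothetical, and a CountDownLatch stands in for the waiting that SynchronousResultReceiver performs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SyncOverAsyncDemo {
    // Hypothetical, simplified stand-in for SynchronousResultReceiver.
    static class ResultReceiver<T> {
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile T value;
        private volatile RuntimeException exception;

        void send(T v) {
            value = v;
            done.countDown();
        }

        void propagateException(RuntimeException e) {
            exception = e;
            done.countDown();
        }

        // Blocks the caller until the other side answers or the timeout expires.
        T await(long timeoutMs, T defaultValue) {
            try {
                if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return defaultValue;
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return defaultValue;
            }
            if (exception != null) {
                throw exception; // rethrown here, but its frames belong to the other thread
            }
            return value != null ? value : defaultValue;
        }
    }

    // Hypothetical "remote" work that fails.
    static boolean failingWork() {
        throw new NullPointerException("failure on the serving side");
    }

    // Serving side: either sends the result or propagates the exception.
    static void serve(ResultReceiver<Boolean> receiver) {
        try {
            receiver.send(failingWork());
        } catch (RuntimeException e) {
            receiver.propagateException(e);
        }
    }

    // Calling side: fire the request, then wait for the reply.
    static boolean connect() {
        ResultReceiver<Boolean> recv = new ResultReceiver<>();
        new Thread(() -> serve(recv), "fake-remote").start();
        return recv.await(5000, false);
    }

    public static void main(String[] args) {
        try {
            connect();
        } catch (NullPointerException e) {
            e.printStackTrace(); // shows failingWork/serve frames, no connect frame
        }
    }
}
```

The exception is thrown on the calling thread, yet its stack trace contains only frames from the serving thread, because a stack trace is captured when the exception object is created and is not refreshed when the object is rethrown elsewhere.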
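This article covers a very small topic. But even the smallest topic deserves to be dug into. Only by digging deep, again and again, can we build a solid technical foundation.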