Skip to content

Android ART Native Crash sometimes occurs when changing a threads name #2234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
adecker89 opened this issue Sep 9, 2020 · 25 comments
Open
Assignees

Comments

@adecker89
Copy link


*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> com.- <<<

backtrace:
  #00  pc 00000000000830f0  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #00  pc 00000000004b977c  /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2280)
  #00  pc 000000000000b458  /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+580)
  #00  pc 00000000005108a0  /apex/com.android.runtime/lib64/libart.so (art::ThreadSuspendByPeerWarning(art::Thread*, android::base::LogSeverity, char const*, _jobject*)+656)
  #00  pc 000000000050fa80  /apex/com.android.runtime/lib64/libart.so (art::ThreadList::SuspendThreadByPeer(_jobject*, bool, art::SuspendReason, bool*)+1120)
  #00  pc 000000000042fa88  /apex/com.android.runtime/lib64/libart.so (art::Thread_setNativeName(_JNIEnv*, _jobject*, _jstring*)+1076)
  #00  pc 00000000000ba7e8  /system/framework/arm64/boot.oat (art_jni_trampoline+152)
  #00  pc 00000000001a5468  /system/framework/arm64/boot.oat (java.lang.Thread.setName+120)
  #00  pc 00000000002ff26c  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.setIndexInArray+284)
  #00  pc 0000000000137334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 0000000000145fec  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244)
  #00  pc 00000000002e3948  /apex/com.android.runtime/lib64/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+384)
  #00  pc 00000000002deba8  /apex/com.android.runtime/lib64/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+892)
  #00  pc 00000000005a728c  /apex/com.android.runtime/lib64/libart.so (MterpInvokeVirtualQuick+616)
  #00  pc 0000000000135594  /apex/com.android.runtime/lib64/libart.so (mterp_op_invoke_virtual_quick+20)
  #00  pc 00000000017d7a5e  /data/app/com.-==/oat/arm64/base.vdex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.tryTerminateWorker+174)
  #00  pc 00000000002b4c5c  /apex/com.android.runtime/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.llvm.4579747295037710581)+240)
  #00  pc 0000000000592688  /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+1032)
  #00  pc 0000000000140468  /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88)
  #00  pc 00000000002fcd94  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.park+180)
  #00  pc 00000000002fe314  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.tryPark+628)
  #00  pc 00000000002fd984  /data/app/com-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker+228)
  #00  pc 00000000002ff128  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run+40)
  #00  pc 0000000000137334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 0000000000145fec  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244)
  #00  pc 00000000004b103c  /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
  #00  pc 00000000004b2150  /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416)
  #00  pc 00000000004f2b18  /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1176)
  #00  pc 00000000000e6890  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #00  pc 0000000000084b6c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Looking at the the art source it seems like this timeout when trying to set the thread name may be the culprit
https://android.googlesource.com/platform/art/+/master/runtime/thread_list.cc#970

It's not really clear to me whether this timeout would be triggered based on a specific state of the Thread or if it's just a race condition that is sometimes triggered.

It seems like the thread name changing is mostly cosmetic? I wonder if an option to disable or reduce thread name changes might be a quick solution--as much as I would hate giving that up for debug purposes :(

@qwwdfsad
Copy link
Collaborator

Hi, this is clearly an Android bug that should be reported to Google bug tracker, we cannot do much on our side here.

I wonder if an option to disable or reduce thread name changes might be a quick solution

Yes, you can disable debug mode to workaround that problem. If this problem becomes too frequent and/or widespread, we'll consider adding a separate flag to disable thread renaming only.

@adecker89
Copy link
Author

I do agree that it's ultimately an android issue and plan to file a bug there as well. But, since such a fix would take quite some time to propagat through the ecosystem, I think that a solution on the library side is sorta needed just due to practicality.

I'll take a look at disabling debug mode and see if it resolves our problem (I've only seen this crash reported in the wild). Whether a separate flag is worth the effort may be a matter of how much pain we feel when debugging other production crash reports with the debug flag off.

Thanks for your help!

@qwwdfsad
Copy link
Collaborator

Thanks!

Please keep us informed about this problem. Let's leave it open for a while to see if there are other similar reports

@larssn
Copy link

larssn commented Dec 15, 2020

Dunno if related, I also get a hard crash, but only from an app deployed to the app store. If the app is built locally it works fine (both in debug/release mode). My code uses coroutines.

backtrace:
  #00  pc 00000000000603ca  /apex/com.android.runtime/lib/bionic/libc.so (abort+166)
  #00  pc 0000000000155cb7  /data/app/com.nuvopoint.tillty-Qdr1nI2XUQzzDA_hin4rlg==/split_config.armeabi_v7a.apk!lib/armeabi-v7a/libflutter.so (offset 0x1401000)
  #00  pc 00000000002d1c1f  /apex/com.android.runtime/lib/libart.so (art::JNI::SetBooleanArrayRegion(_JNIEnv*, _jbooleanArray*, int, int, unsigned char const*)+1650)

@dan15h
Copy link

dan15h commented Dec 15, 2020

I think it's same crash
backtrace:
#00 pc 0000000000083860 /apex/com.android.runtime/lib64/bionic/libc.so (abort+176)
#00 pc 00000000004b997c /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2172)
#00 pc 000000000000c650 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+608)
#00 pc 00000000004475e8 /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280)
#00 pc 000000000044eed8 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40)
#00 pc 000000000045a578 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+376)
#00 pc 00000000004c70a8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+104)
#00 pc 00000000004db2e4 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
#00 pc 00000000004da334 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252)
#00 pc 00000000000e5614 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
#00 pc 0000000000085458 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

@larssn
Copy link

larssn commented Dec 15, 2020

Mine is related to the R8 minifier (three shaking code it shouldnt), so I think my crash is unrelated to this.
EDIT: It was: some custom proguard rules fixed it.

@viviant1224
Copy link

Mine is related to the R8 minifier (three shaking code it shouldnt), so I think my crash is unrelated to this.
EDIT: It was: some custom proguard rules fixed it.

Did you solve the problem?How did you confirm that the problem was introduced by R8?

@nimeacuerdo
Copy link

nimeacuerdo commented Jan 11, 2021

I think it's same crash
backtrace:
#00 pc 0000000000083860 /apex/com.android.runtime/lib64/bionic/libc.so (abort+176)
#00 pc 00000000004b997c /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2172)
#00 pc 000000000000c650 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+608)
#00 pc 00000000004475e8 /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280)
#00 pc 000000000044eed8 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40)
#00 pc 000000000045a578 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+376)
#00 pc 00000000004c70a8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+104)
#00 pc 00000000004db2e4 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
#00 pc 00000000004da334 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252)
#00 pc 00000000000e5614 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
#00 pc 0000000000085458 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

We have the same stack trace, without using coroutines and only for Android 10 devices:

backtrace:
  #00  pc 000000000006f06c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #00  pc 00000000004d08d0  /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2512)
  #00  pc 000000000000c600  /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+684)
  #00  pc 000000000045c978  /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280)
  #00  pc 00000000004643ec  /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40)
  #00  pc 00000000004725c8  /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+376)
  #00  pc 00000000004ddd34  /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+104)
  #00  pc 00000000004f1c4c  /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
  #00  pc 00000000004f0ce0  /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252)
  #00  pc 00000000000ce1b0  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #00  pc 0000000000070ba8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

In fact, it was reported months ago here but according to this other issue comment, it looks like something that was solved for Android 11 onwards.

@Turalllb
Copy link

Exactly the same crash as nimeacuerdo. Android 10 - 99.8 % Android 11 - 0.2

@Aravindh1505
Copy link

I am also facing the same not able to identify the issue. It occurs in production.

@qwwdfsad
Copy link
Collaborator

Thanks for pointing it out, it seems it's worth adding a feature flag to disable thread name change.

As a workaround, it's possible to disable the debug mode at all in the production builds to avoid that crash

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Aug 4, 2022

One more harmful evidence about this feature enabled by default, the following code is purely dominated by setName calls:

@Test
  fun test() {
    (0..100).map {
      runBlocking(Dispatchers.Default) {
        val n = 100_000
        measureTime {
          val (s1: SendChannel<Int>, r1: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          val (s2: SendChannel<Int>, r2: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          val (s3: SendChannel<Int>, r3: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          launch {
            for (i in 0..n) {
              s1.send(i)
            }
          }.invokeOnCompletion {
            s1.close(it)
          }
          launch {
            r1.consumeEach {
              s2.send(it)
            }
          }.invokeOnCompletion {
            s2.close(it)
          }
          launch {
            r2.consumeEach {
              s3.send(it)
            }
          }.invokeOnCompletion {
            s3.close(it)
          }
          assertEquals(n, r3.consumeAsFlow().last())
        }
      }
    }.toList().let {
      print("TIMES: $it")
    }
  }
Flamegraph image

@NB-Fred
Copy link

NB-Fred commented Sep 2, 2022

Only occurs on android 10.

Check failed: key_value != nullptr compiler-filter not found in oat header:
#00 pc 0x00000000000830f0 /apex/com.android.runtime/lib64/bionic/libc.so (abort)
#00 pc 0x00000000004b8358 /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*))
#00 pc 0x000000000000b458 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage())
#00 pc 0x000000000044528c /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const)
#00 pc 0x000000000044cbb4 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const)
#00 pc 0x00000000004582b8 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&))
#00 pc 0x00000000004c5718 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&))
#00 pc 0x00000000004d9bb0 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit())
#00 pc 0x00000000004d8c5c /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*))
#00 pc 0x00000000000e6ce0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*))
#00 pc 0x0000000000084b6c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread)

@mtwalli
Copy link

mtwalli commented Oct 9, 2022

Thread suspension timed out: 0x<sanitized>:DefaultDispatcher-worker-4

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
backtrace:
  #00  pc 0x000000000008be20  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168)
  #01  pc 0x00000000006d29ec  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+704)
  #02  pc 0x0000000000016ea8  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_3::__invoke(char const*)+80)
  #03  pc 0x0000000000016450  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+352)
  #04  pc 0x00000000006fe104  /apex/com.android.art/lib64/libart.so (art::ThreadSuspendByPeerWarning(art::Thread*, android::base::LogSeverity, char const*, _jobject*) (.__uniq.215660552210357940630679712151551015321)+552)
  #05  pc 0x0000000000374048  /apex/com.android.art/lib64/libart.so (art::ThreadList::SuspendThreadByPeer(_jobject*, art::SuspendReason, bool*)+4016)
  #06  pc 0x000000000060a90c  /apex/com.android.art/lib64/libart.so (art::Thread_setNativeName(_JNIEnv*, _jobject*, _jstring*) (.__uniq.300150332875289415499171563183413458937)+492)
  #07  pc 0x00000000003ff1c0  /data/misc/apexdata/com.android.art/dalvik-cache/arm64/boot.oat (art_jni_trampoline+128)

Happens on many Android 12 and 12L devices

@qwwdfsad
Copy link
Collaborator

We'll provide either a workaround or will change the default behaviour in the next major version, thanks everyone for the reports!

@douglasjunior
Copy link

Thanks for pointing it out, it seems it's worth adding a feature flag to disable thread name change.

As a workaround, it's possible to disable the debug mode at all in the production builds to avoid that crash

@qwwdfsad how to disable this debug mode in prodution build?

@lirongwei
Copy link

@qwwdfsad is this fixed ?

@arenaq
Copy link

arenaq commented Mar 13, 2023

Any updates?
Happens only on Android 10, various devices, 96% occurences when app in background. Only visible via Google Play console (~2k occurences per day).

backtrace:
  #00  pc 0x00000000005bb9f0  /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+880)
  #01  pc 0x0000000000150468  /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88)
  #02  pc 0x00000000003d8d08  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor.runWorker+984)
  #03  pc 0x00000000003d5880  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor$Worker.run+64)
  #04  pc 0x0000000000217fc8  /system/framework/arm64/boot.oat (java.lang.Thread.run+72)
  #05  pc 0x0000000000147334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #06  pc 0x00000000001561b4  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+252)
  #07  pc 0x00000000004d8208  /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
  #08  pc 0x00000000004d929c  /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416)
  #09  pc 0x000000000051c470  /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1232)
  #10  pc 0x00000000000cf7c0  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #11  pc 0x00000000000721a8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Note: Also, this is not the only backtrace. Lines #02 and #03 are changing based on what thread is running inside - we use facebook sdk as well so a lot of times these 2 lines look like this (the rest remains the same):

  #02  pc 0x0000000000afd354  /data/app/obfuckated_package_name-9JniyMgIWMHr25wQXKTrVw==/oat/arm64/base.odex (com.facebook.ads.internal.dynamicloading.DynamicLoaderFactory.doCallInitialize+372)
  #03  pc 0x0000000000afd004  /data/app/obfuckated_package_name-9JniyMgIWMHr25wQXKTrVw==/oat/arm64/base.odex (com.facebook.ads.internal.dynamicloading.DynamicLoaderFactory$1.run+980)

@qwwdfsad qwwdfsad self-assigned this Mar 13, 2023
@arenaq
Copy link

arenaq commented Mar 13, 2023

Hi, this is clearly an Android bug that should be reported to Google bug tracker, we cannot do much on our side here.

I wonder if an option to disable or reduce thread name changes might be a quick solution

Yes, you can disable debug mode to workaround that problem. If this problem becomes too frequent and/or widespread, we'll consider adding a separate flag to disable thread renaming only.

Going trough the link provided. It seems like you use a ProGuard rules to remove some debug modules. Is there a way to disable the debug mode without using ProGuard?

@londhep
Copy link

londhep commented Aug 14, 2023

Hello @qwwdfsad, #2234 (comment) is this issue fixed?

@qwwdfsad qwwdfsad added the debug label Apr 9, 2024
@HOTFIGHTER
Copy link

is it solved

@huangShan-gitHub
Copy link

这个问题解决了吗?

@DFedonnikov
Copy link

is it solved

No, it is not

@DFedonnikov
Copy link

Thanks for pointing it out, it seems it's worth adding a feature flag to disable thread name change.

As a workaround, it's possible to disable the debug mode at all in the production builds to avoid that crash

How to disable it in production build?

@dkhalanskyjb
Copy link
Collaborator

Here's the instruction on how to enable or disable the debug mode: https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/topics/debugging.md#debug-mode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests