Skip to content

NoSuchFieldException with AtomicReferenceFieldUpdater on Samsung Android 5.0.x devices #490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
asfdfdfd opened this issue Aug 13, 2018 · 52 comments

Comments

@asfdfdfd
Copy link

asfdfdfd commented Aug 13, 2018

This is known error.

Some libraries have not done anything (google/gson#924) other ones have implemented workaround (ReactiveX/RxJava#3459).

Couple of stacktraces from my project:
com.mirrorai.app_issue_crash_5D5E4CFA0353000119B5FCFF34F93B11_DNE_0_v2.txt
com.mirrorai.app_issue_crash_5D61DBF600A40001134DD9BDD9BD8B29_DNE_0_v2.txt

I'm not convincing you to implement workaround like RxJava did but i think that at least it will be convenient to keep this issue to prevent questions from other users of coroutines library.

Proguard is turned off.

Coroutines version is 0.24.0.
Kotlin version is 1.2.60.

@elizarov
Copy link
Contributor

elizarov commented Aug 13, 2018

Let's discuss how we might approach working around this problem. We don't directly use Atomic*FieldUpdater classes. We rely on https://github.com/kotlin/kotlinx.atomicfu to "include" them via bytecode transformation. It is relatively straightforward for us to create a new transformation variant that changes atomic fields to a regular Atomic* classes. But how do we deliver the corresponding binaries? I see two options:

  1. Deliver a single multi-release jar that have "Samsung-friendly" classes (using only Atomic*) in the main code and VarHandle version in META-INF/version/9 for server-side folks running on JDK 9+.

Pro: Everything works for everybody by default.
Con: Anyone running on JDK 8 or anyone targeting non-affected Android >5.0 will get slightly worse performance than they have now(sic!), because of additional dereference in Atomic* classes versus optimized Atomic*FieldUpdater code that we have now.

  1. Deliver a single multi-release jar that have Atomic*FieldUpdater classes in the main code and VarHandle version in META-INF/version/9 and a separate "Samsung-fiendly" jar file with a different classifier.

Pro: By default if you depend on kotlinx-coroutines-core:<version> you get the best possible performance for your platform.
Con: If you target Android <= 5.0 with affected devices, you'll have to specify kotlinx-coroutines-core:<version>:samsung-workaround in your dependencies (might need a better name for a classifier).

What do you think?

@elizarov
Copy link
Contributor

elizarov commented Aug 13, 2018

Additional question: Anyone knows more details on the conditions under which Samsung devices crash with Atomic*FieldUpdater? What if we make those fields public, for example? Would it help? Are there any details on what exactly is broken on those Samsung Android 5.x devices?

@LouisCAD
Copy link
Contributor

LouisCAD commented Aug 13, 2018

It seems it happens on Samsung Android 5.0.x, not 5.x. If I'm not mistaken, the issue title should be edited accordingly.

Regardless, I think @digitalbuddha may have some additional information (or not) about this issue, he said he reached out to Samsung.

@digitalbuddha
Copy link
Contributor

I was never able to get any info from Samsung as to why this issue occurred. Once rxjava applied their fix we never saw issue again

@LouisCAD
Copy link
Contributor

I strongly dislike the idea of making kotlinx.coroutines performance worse on all devices and JDK < 9 projects just because Samsung f*****d up.

Android app bundles may, probably with Google's collaboration, support different compiled code for different android versions. If this can be done in some way, then apps published with app bundles may use a version free of problematic code for API 21 devices, and use the regular version for all other versions (API 20 and lower, and API 22 and up).

This would work with the second option @elizarov mentioned.

I know this is more complex than the RxJava approach, but it's less compromising.

Another solution is to compile a version of the app with the samsung workaround with a minSdk and a maxSdk of 21, and publish two other versions, one for maxSdk 19 (20 being Android Wear…), and one for minSdk 22. But here again, I think it should be automated using app bundles (which can probably be extended since they use protobuf internally), so we don't have to manage three different version codes.

Where I'm starting to wonder how a samsung api 21 workaround version would work though, is when it comes to libraries. I don't want to have to publish multiple versions of my android libraries that use kotlinx.coroutines because of Samsung, and I would even less want to do it on a non Android library. Would there be a way to avoid transitive dependencies hell when using the samsung workaround?

@fvasco
Copy link
Contributor

fvasco commented Aug 13, 2018

In the case 2, the kotlinx-coroutines-core:<version>:samsung-workaroundis the only viable option for an Android developer

@JakeWharton
Copy link
Contributor

JakeWharton commented Aug 13, 2018

I continue to vote for doing nothing. This is not a library's problem.

If someone wants to fix this, write an Android Gradle plugin transform that bytecode-rewrites these into the slower format such that you fix all libraries at once and can opt-in and out based on things like minSdkVersion, splits, and bundles.

@LouisCAD
Copy link
Contributor

@JakeWharton I think you're right. I just submitted an issue on Android's issue tracker for this fix to be integrated into Android Gradle plugin: https://issuetracker.google.com/issues/112526256

@JakeWharton
Copy link
Contributor

That's not quite what I said... AGP already has a pluggable bytecode transformation model so this doesn't need to (and likely won't) be implemented by Google. Anyone can do this who is willing. Samsung should do it, actually.

@LouisCAD
Copy link
Contributor

Google certified these devices, and Samsung didn't even respond to Mike when he was working for New York Times.

So "anyone" willing can do it (as long as you are fluent in bytcode among other things), yes.
However, I think at the moment, only Google can integrate such a solution targeting only certain devices with app bundles, so other devices (or other API levels at least) don't pay the performance hit.

@asfdfdfd asfdfdfd changed the title NoSuchFieldException with AtomicReferenceFieldUpdater on Samsung Android 5.x devices NoSuchFieldException with AtomicReferenceFieldUpdater on Samsung Android 5.0.x devices Aug 13, 2018
@JakeWharton
Copy link
Contributor

Transforms have access to variant information so again I disagree and hope the Android Gradle team doesn't do it. Variants have the API information which is the best you'll get in terms of targeting. It has nothing to do with app bundles other than that makes it easier to deal with the API splits.

@LouisCAD
Copy link
Contributor

hope the Android Gradle team doesn't do it

Why so?

@LouisCAD
Copy link
Contributor

How would you make such a bytecode transform targeting only API 21 Samsung devices work in practice without app bundles? I mean, without the burden of having to setup the build manually to produce multiple apks.

And if using an app bundle to separate the dex files having the fix for Samsung API 21 and the ones not needing it, how would it work in practice? I know zero documentation that explains this, and I don't even know if current app bundle spec supports it.

@JakeWharton
Copy link
Contributor

It doesn't, which is why I said it wasn't related to bundles. If the variant has a minSdk lower than 23 you fix the bug. That's it. All of this can be done with variants and transforms already. the AGP team has far more impactful work to do. This would have already been fixed in AGP years ago if the problem was pervasive enough. It isn't, and I'm actually helping your case by not fixing libraries which further proves the rarity of occurrence and justifies their ignorance of the problem.

@LouisCAD
Copy link
Contributor

It's minSdk lower than 22 actually, this issue affects only Android 5.0.x (API 21) Samsung devices. I added a comment to the issue on Android's issue tracker to request ability to split per API levels for bundle built apps, so anyone can make a gradle plugin fixing this, without making the performance worse for non affected API levels.

@elizarov
Copy link
Contributor

The problem is that it is hard (and in general impossible) to write a universal bytecode transformer that takes code using Atomic*FieldUpdater classes and rewrites it into Atomic* classes. However, it is not that hard for us as a part of atomicfu project, because we control what kind of code patterns are allowed around atomics, so we can define a new transformation variant and produce a separate version of the jar for those users of kotlinx.coroutines that need to target buggy Samsung 5.0.x devices. Doing nothing is not a great option for us, since we don't want to let our Android users down, so reading this feedback I'm inclining to go with option two.

We can even call the alternative classifier android and setup up dependencies in such a way that when you depend on kotlinx-coroutines-android:<version>, then it transitively brings kotlinx-coroutines-core:<version>:android with it.

Yes, yes, I know that Android did nothing wrong and it is all Samsung's fault, but I don't think that a bit of performance degradation due to replacement of Atomic*FieldUpdater with a separate Atomic* instances would be really noticeable in a typical Android coroutine usage scenarios.

@LouisCAD
Copy link
Contributor

We can even call the alternative classifier android and setup up dependencies in such a way that when you depend on kotlinx-coroutines-android:<version>, then it transitively brings kotlinx-coroutines-core:<version>:android with it.

I have trouble understanding that part.

Do you mean that we'll be able to choose whether we use the version with the fix or not (depending on whether we support the devices in the build), and it'll automatically edit the transitive dependencies of libraries relying on kotlinx.coroutines?

@elizarov
Copy link
Contributor

@LouisCAD Editing dependencies of other libraries is tricky. You'll have to manually exclude the dependency this library has and include the dependency you want. That is the downside of "alternative classifier" version. Maybe we can automate this part by writing a small custom Gradle plugin that does this replacement or maybe we can use Gradle metadata feature to do it. We'll need to study it better.

@LouisCAD
Copy link
Contributor

LouisCAD commented Aug 14, 2018 via email

@LouisCAD
Copy link
Contributor

So, I just got someone trying CameraCoroutines which uses kotlinx.coroutines 0.24.0 on a Samsung Galaxy S5 (SM-G900F) running Android 5.0, and it didn't crash with the error from this report, so it seems it doesn't affect all Samsung devices running Android 5.0.x.

Therefore, we need additional info to know which device models have been seen affected by this bug. @asfdfdfd and @digitalbuddha, could you give affected device models?

@elizarov
Copy link
Contributor

Same story here. We've found an in house Samsung Android 5.0.x device and we had not managed to make it crash so far.

@digitalbuddha
Copy link
Contributor

This was 3+ years ago. Only one I have reference to is SM-T805Y

@asfdfdfd
Copy link
Author

I have no device that i could use to reproduce this crash but i've received it via Crashlytics from these devices:
Galaxy S4
Galaxy Note3
Galaxy S5
Galaxy Alpha
Galaxy Tab4 10.0

@LouisCAD
Copy link
Contributor

@asfdfdfd You said you received it on Galaxy S5 devices, but we couldn't reproduce on one running Android 5.0 (model SM-G900F).
Since Samsung makes multiple variants of their devices, could you share the exact model numbers?

Also, if you have any clue about which calls to kotlinx.coroutines cause the issue, please tell us.

@LouisCAD
Copy link
Contributor

So, I got someone else test the same CameraCoroutines project using kotlinx.coroutines 0.24.0 on a Samsung Galaxy S4 (GT-I9505), running Android 5.0.1, and there was no issue on it, so I'm starting to wonder if all of kotlinx.coroutines is affected, or if only some part of it is, or if it's that Samsung was not consistent in its bug distribution.

If anyone has a clue about what call leads to the failure (whether on the Atomic*FieldUpdater or kotlinx.coroutines library), please tell so a reproducer can be setup.

@asfdfdfd
Copy link
Author

@LouisCAD

could you share the exact model numbers

Crashlytics has not any additional info about devices, but i was able to find something in Google Play Console:

Samsung Galaxy S4 (ks01ltelgt), Android 5.0
Samsung Galaxy S5 (klte), Android 5.0
Samsung Galaxy Note3 (ha3g), Android 5.0
Samsung Galaxy S4 (jflte), Android 5.0

if you have any clue about which calls to kotlinx.coroutines cause the issue

I have absolutely no idea. I've looked at all these places one more time and found nothing suspicious. One of crashes comes from something like:

class MyClass {
    init {
        launch { loadData() }
    }

   private fun loadData {
      // Non-coroutine code here.
   }
}

@Hazer
Copy link

Hazer commented Aug 19, 2018

I do really think build variants are the way to go as JakeWharton said. There are other issues related to Samsung that I am already using build variants to fix. It's so common in Android to have per API variants, that it is in the docs.

It may be solved using some plugin to do transformation or maybe just multiple jars, excluding transitive dependency with Gradle and choosing one jar for each variant, but the thing is, Android already has tools to deal with it.

App bundles don't look like it has anything to do with transforming and compiling itself, also it's quite new and most people I know don't use it yet.

Also, anyone has any statistics on, probably, how many people is affected right now? Maybe, I hope, it isn't worth the effort anymore.

@Tolriq
Copy link

Tolriq commented Dec 4, 2018

Bumping this to see if there's still something in the work?

This is now my first crash sources it's pretty rare but got dozens of variations of crashes on those devices.

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Dec 4, 2018

There is nothing in the work.
We don't mind to workaround this, if it is possible, but we need a stable repro and we failed to find a device which crashes with AFU.

Note that we need a reproducer in order to make workaround with AtomicReferenceFieldUpdater (e.g. by making target fields public), we are not going to get rid of A*FU

@Tolriq
Copy link

Tolriq commented Dec 4, 2018

@qwwdfsad I can't reproduce and the issue is certainly a very rare one got about 50 crashes in 10 different stack traces on a week timeline. Can't give all numbers but I'm still at more than 99,99% crash free sessions.

I have a tons of Samsung users but unfortunately I do not have the way to have cross linked stats about phone model and OS version :( And most of those devices also runs 4.x and 5.1. But I'm pretty confident that this is not a 100% repro thing.

There's many different crashes in many different places. Don't know if gather all the crashes could help.

I still min SDK 16 as a large user base with old phones as remotes, so can't move to min 22, but I totally can have a special build for SDK 16 -> 21 that is slower via an external library, and a nominal one for API 22.

Don't know if can be related but see my comment here: #830 (comment) despite using both workaround there was still some race, with hope the issues are somehow related.

@asfdfdfd
Copy link
Author

asfdfdfd commented Dec 4, 2018

@LouisCAD i was not able to reproduce this issue in Samsung Remote Test Lab. Probably i'll have to buy Samsung device. 🤔

@LouisCAD
Copy link
Contributor

LouisCAD commented Dec 4, 2018 via email

@asfdfdfd
Copy link
Author

asfdfdfd commented Dec 5, 2018

Yeah, but i have more strange Samsung bugs (not coroutines related) so it will be useful anyway.

@Tolriq
Copy link

Tolriq commented Jan 25, 2019

@elizarov Is there something special about coroutine Mutex and atomicFu?

I have introduced a Mutex as a temporary solution and only use simples mutex.lock() mutex.unlock().

On some samsung devices Android 5 it seems that this does not trigger the normal coroutine random crash we know about but just wait for ever on the first lock. Is there a way to have the crash happen?

@Tolriq
Copy link

Tolriq commented Jan 26, 2019

So after a few more hours in prod, Mutex don't work on some (or all hard to tell) Samsung Android 5 at 100%, not random like this issue, 100% reproductible on those devices (that unfortunately I don't own).

@qwwdfsad
Copy link
Collaborator

There is nothing special about Mutex and AFU; Mutex implementation uses AFU under the hood

@fondesa
Copy link

fondesa commented Mar 28, 2019

We have the same issue on around 150/200 of our users on some Samsung devices in the latest week.
This decreases a bit our QoS.
The ratio between users and crash is 1:~1

We can't reproduce the problem locally and it happens when the first class which references AFU is instantiated (in our case a Channel).

We'd like to continue to use the coroutines features in our code but we'd like to find a solution to this problem.
Considering Samsung probably won't ever release a fix for these Atomic*FieldUpdater, have you planned to provide a separate dependency which replaces Atomic*FieldUpdater with Atomic*?

If not, do you think it's a viable solution to use DexClassLoader to load the Atomic*FieldUpdater classes written by us with the same methods if the Atomic*FieldUpdater classes are not found when the app starts?
All the calls to java.misc.Unsafe inside Atomic*FieldUpdater classes can be done through reflection.

@elizarov
Copy link
Contributor

@fondesa We have not find time yet to implement Atomic*FieldUpdater -> Atomic* transformation in our atomicfu project: https://github.com/Kotlin/kotlinx.atomicfu

@vizsatiz
Copy link

Has anyone found any workarounds or patches that I can apply?

@mikezliu
Copy link

Workaround - use Rx. Lol.

As a side note - it does not seem a 100% crash rate for affected users (seems to just happen once time per user), so you could just ignore the issue.

@fondesa
Copy link

fondesa commented Aug 28, 2019

I confirm it happens only once per user.

@asfdfdfd
Copy link
Author

asfdfdfd commented Aug 28, 2019

Today i've noticed Samsung users with Android 7 that has java.lang.NoSuchMethodError

com.mirrorai.app_issue_crash_5D64FE1501DE000129234B4EA7EF93F5_DNE_0_v2.txt

@LouisCAD
Copy link
Contributor

@asfdfdfd Samsung Galaxy A3 2016 on Android 7 more specifically. Do you have this device at hand?

Here's what I can tell from my users:
Doesn't happen on A5 2016 on Android 7.
Doesn't happen on A3 2017 on Android 8.

@asfdfdfd
Copy link
Author

@LouisCAD

Do you have this device at hand?

No.

@MichaelRocks
Copy link

We built a version of kotlinx.coroutines that doesn't rely on AFU by disabling bytecode transformation in atomicfu plugin. And then we replaced all coroutines dependencies with non-AFU ones using the following code:

configurations.all {
  resolutionStrategy {
    dependencySubstitution {
      all { DependencySubstitution dependency ->
        if (dependency.requested instanceof ModuleComponentSelector) {
          if (dependency.requested.group == 'org.jetbrains.kotlinx' && dependency.requested.module.startsWith('kotlinx-coroutines-')) {
            dependency.useTarget "$dependency.requested.group:$dependency.requested.module:$kotlinxCoroutinesVersion-noafu"
          }
        }
      }
    }
  }

The artifacts we built aren't published anywhere but you can easily build them yourself.

@digitalbuddha
Copy link
Contributor

And its happened to me now (2nd time, first with rxjava 3 years ago). Dropbox is crashing when using ConflatedBroadcastChannel on samsung 5.0, 5.01, 5.02 devices. Would appreciate a fix (pretty please)

@jiurchuk
Copy link

Any updates/workaround on this?

ParaskP7 added a commit to wordpress-mobile/WordPress-Android that referenced this issue Sep 27, 2022
Warning Messages: "@synchronized annotation is not applicable to suspend
functions and lambdas"

It turns out that this '@synchronized' solution were already replacing
the previous (d5b9948) 'Mutex'
solution, and this, due to the fact that the existing (back then)
'Mutex' solution were causing crashes on Samsung devices running
Android 5 (API Level 21). However, this repo, and thus both, the
WordPress and Jetpack apps, are no longer supporting these devices as
the current 'minSdkVersion' is '24' (Android 7). Thus, this
'@synchronized' annotation can be now safely removed and the
'Mutex' solution reverting back.

For more info see:
- Fix crash in UploadStarter on Samsung devices with Android 5 #10877
(Introduced In): https://github.com/wordpress-mobile/WordPress-Android/
pull/10877/
- ExceptionInInitializerError #10827 (Crash Report):
#10827
- NoSuchFieldException with AtomicReferenceFieldUpdater on Samsung
Android 5.0.x devices #490 (Coroutines Issue):
Kotlin/kotlinx.coroutines#490
@cpeterso
Copy link

I haven't reproduced this crash myself, but Mozilla receives hundreds of crash reports daily from Firefox Android users for variations of this NoSuchFieldException crash on Samsung devices running Android 5.0.

An example crash report: https://crash-stats.mozilla.org/report/index/84c74d64-d667-4f96-9c08-dacb30231228

Bug reports:

https://bugzilla.mozilla.org/show_bug.cgi?id=1804115
https://bugzilla.mozilla.org/show_bug.cgi?id=1844964
https://bugzilla.mozilla.org/show_bug.cgi?id=1851704
https://bugzilla.mozilla.org/show_bug.cgi?id=1872192

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests