Skip to content

Firelog sdk is using JobScheduler which is not recommended now #6147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrober opened this issue Aug 1, 2024 · 34 comments
Closed

Firelog sdk is using JobScheduler which is not recommended now #6147

mrober opened this issue Aug 1, 2024 · 34 comments

Comments

@mrober
Copy link
Contributor

mrober commented Aug 1, 2024

From @ankit92:

com.google.android.datatransport/transport-runtime library is using JobScheduler which is not recommended now as mentioned in android documentation: https://developer.android.com/about/versions/14/behavior-changes-14

_JobScheduler reinforces callback and network behavior
Since its introduction, JobScheduler expects your app to return from onStartJob or onStopJob within a few seconds. Prior to Android 14, if a job runs too long, it stops and fails silently. If your app targets Android 14 (API level 34) or higher and exceeds the granted time on the main thread, the app triggers an ANR with the error message "No response to onStartJob" or "No response to onStopJob". Consider migrating to WorkManager, which provides support for asynchronous processing or migrating any heavy work into a background thread._

All ANR's which spiked in our app are on Android 14 , as mentioned in above chat threads, and mostly related JobScheduler.

Strangely 75% of events are on OnePlus devices and rest 25% are on OPPO devices

Here are the logs:

art::ConditionVariable::WaitHoldingLocks
ANR triggered by thread waiting for a binder transaction

main (native)
tid=1 systid=30996
Triggered ANR
Root blocking
0
libc.so
syscall + 28
1
libart.so
art::ConditionVariable::WaitHoldingLocks + 140
2
libart.so
artJniMethodEnd + 336
3
libart.so
art_jni_method_end + 12
android.os.BinderProxy.transactNative (Native method)
This Binder call may be taking too long, causing the main thread to wait, and triggering the ANR.
android.os.BinderProxy.transact (BinderProxy.java:685)
android.app.job.IJobCallback$Stub$Proxy.acknowledgeStartMessage (IJobCallback.java:434)
android.app.job.JobServiceEngine$JobHandler.ackStartMessage (JobServiceEngine.java:384)
android.app.job.JobServiceEngine$JobHandler.handleMessage (JobServiceEngine.java:196)
android.os.Handler.dispatchMessage (Handler.java:106)
android.os.Looper.loopOnce (Looper.java:257)
android.os.Looper.loop (Looper.java:368)
android.app.ActivityThread.main (ActivityThread.java:8839)
java.lang.reflect.Method.invoke (Native method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:572)
com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1049)

This is becoming more critical now and shaken all our app's performance metrics in terms of ANR's

Raised this to help you so that you can get more insights and request you to please provide us with the fix as soon as possible .
Thanks

Originally posted by @ankit92 in #4345 (comment)

@davidmotson
Copy link
Collaborator

This is being worked on along with other improvements for API level 34 (android 14), and it should be rolled out across all Firebase Android SDKs soon.

@stepushchik
Copy link

The same:

         main (native):tid=1 systid=20068 
#00 pc 0x99ccc libc.so (syscall + 28) (BuildId: a87e89fc2c0a5753053f536add4d7ae1)
#01 pc 0x232670 libart.so (art::ConditionVariable::WaitHoldingLocks + 140) (BuildId: 2452917c4ff69cbb6e75e5512260946b)
#02 pc 0x45a5e4 libart.so (artJniMethodEnd + 336) (BuildId: 2452917c4ff69cbb6e75e5512260946b)
#03 pc 0x5be67c libart.so (art_jni_method_end + 12) (BuildId: 2452917c4ff69cbb6e75e5512260946b)
       at android.os.BinderProxy.transactNative(Native method)
       at android.os.BinderProxy.transact(BinderProxy.java:685)
       at android.app.job.IJobCallback$Stub$Proxy.acknowledgeStartMessage(IJobCallback.java:434)
       at android.app.job.JobServiceEngine$JobHandler.ackStartMessage(JobServiceEngine.java:384)
       at android.app.job.JobServiceEngine$JobHandler.handleMessage(JobServiceEngine.java:196)
       at android.os.Handler.dispatchMessage(Handler.java:106)
       at android.os.Looper.loopOnce(Looper.java:257)
       at android.os.Looper.loop(Looper.java:368)
Screenshot 2024-08-20 at 13 50 31

@ankit92
Copy link

ankit92 commented Aug 21, 2024

This is being worked on along with other improvements for API level 34 (android 14), and it should be rolled out across all Firebase Android SDKs soon.

@davidmotson Is there any schedule or planned date for next Firebase Android BOM release which includes these fixes? These ANR's are becoming critical now for our app.

@ankit92
Copy link

ankit92 commented Aug 26, 2024

@davidmotson @mrober Does this issue got fixed in BOM version 33.2.0?

@mandeep1999
Copy link

Is there any timeline when the fix for this will be released?

@RelappsStudio
Copy link

RelappsStudio commented Sep 2, 2024

Same issue here:

art::ConditionVariable::WaitHoldingLocks
ANR triggered by thread waiting for a binder transaction

          main (native):tid=1 systid=17264 
#00 pc 0x99ccc libc.so (syscall + 28) (BuildId: a87e89fc2c0a5753053f536add4d7ae1)
#01 pc 0x23247c libart.so (art::ConditionVariable::WaitHoldingLocks + 140) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#02 pc 0x45b218 libart.so (artJniMethodEnd + 336) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#03 pc 0x5bef7c libart.so (art_jni_method_end + 12) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
       at android.os.BinderProxy.transactNative(Native method)
       at android.os.BinderProxy.transact(BinderProxy.java:685)
       at android.app.job.IJobCallback$Stub$Proxy.acknowledgeStartMessage(IJobCallback.java:434)
       at android.app.job.JobServiceEngine$JobHandler.ackStartMessage(JobServiceEngine.java:384)
       at android.app.job.JobServiceEngine$JobHandler.handleMessage(JobServiceEngine.java:196)
       at android.os.Handler.dispatchMessage(Handler.java:106)
       at android.os.Looper.loopOnce(Looper.java:257)
       at android.os.Looper.loop(Looper.java:368)
       at android.app.ActivityThread.main(ActivityThread.java:8815)
       at java.lang.reflect.Method.invoke(Native method)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:572)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1049)

Surprisingly only on android 14, only ColorOS supported devices
image

Any bypass suggestions before fix? Anyone managed to find some workaround?

@stepushchik
Copy link

Relates to
https://developer.android.com/jetpack/androidx/releases/work#2.10.0-alpha03

  • Fix foreground workers of type ‘short service’ and ‘data sync’ timing out and causing an ANR when WorkManager didn’t call stopSelf(). This fix only applies to devices with API 34 and 35 where foreground service types were introduced. (ca06b2, b/364508145)

@Krishnasony
Copy link

Relates to https://developer.android.com/jetpack/androidx/releases/work#2.10.0-alpha03

  • Fix foreground workers of type ‘short service’ and ‘data sync’ timing out and causing an ANR when WorkManager didn’t call stopSelf(). This fix only applies to devices with API 34 and 35 where foreground service types were introduced. (ca06b2, b/364508145)

Is this Fixed The background ANR coming on Android 14?

@davidmotson
Copy link
Collaborator

davidmotson commented Sep 11, 2024

@stepushchik @RelappsStudio @mandeep1999

Can you provide more details on the error you're experiencing? For example:

Are there other, similar ANRs being reported for your app? (same error, binder transaction, but a different stack trace)
Have you been able to reproduce these personally? If so, under what conditions?
Are you also seeing this mainly/exclusively on oppo/one+ devices?

@Gowthamr2130
Copy link

Hi @davidmotson Any Update on this: #6264

And let us know if there are any temporary solutions for this.

@Deepakkast
Copy link

any update on the above
Screenshot 2024-09-19 at 3 18 41 PM

@lehcar09
Copy link
Contributor

Hi folks! sharing this update here from the original issue (#4345).

From mrober:

Hey everyone,

Sorry for the delay. We have been working on this, and have landed large changes to the SDK.

Here is a beta version of Crashlytics SDK 19.2.0 which resolves several of the issues that have been causing ANRs. This version is different from 19.1.0 which will be released in the next day or so. 19.1.0 only includes some of these changes, but has gone through more validation.

If you want to try the 19.2.0 beta, follow these steps:

Download the artifact m2repository.zip and extract it to a local repo directory.

Add something like the following to your app's gradle build files:

// settings.gradle.kts
dependencyResolutionManagement {
  repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
  repositories {
    maven(uri("/path/to/local/repo/"))
    google()
    mavenCentral()
  }
}
// app/build.gradle.kts
dependencies {
  implementation("com.google.firebase:firebase-crashlytics:19.2.0-beta01")
  // ...
}

The CHANGELOG is:

* [feature] Added the `isCrashlyticsCollectionEnabled` API to check if Crashlytics collection is enabled.
* [fixed] Ensure that on-demand fatal events are never processed on the main thread.
* [fixed] Improved data consistency for rapid user actions.
* [changed] Internal changes to improve startup time.
* [changed] Internal changes to the way session IDs are generated.
* [changed] Internal changes to the way background tasks are scheduled.
* [changed] Migrated SDK to use standard Firebase executors.

If you don't want to try the beta, you can wait for the stable 19.1.0 release in the next few days, which includes some of these changes. The full 19.2.0 release is undergoing further internal validation.

@JonathanKranz
Copy link

WX20240924-101502@2x
WX20240924-101522@2x
WX20240924-101536@2x

@Deepakkast
Copy link

Any update on the above

@dhruvkaushal11
Copy link

Hi Team, we are facing this same issue, ie on Android 14 & numbers are in millions.
Can someone share the expected ETA for this?

@hakanbagci
Copy link

Hi Team,

This issue has become our top ANR and it is mostly happening for Android 14 Oppo Devices.

Screenshot 2024-10-04 at 09 41 52 Screenshot 2024-10-04 at 09 42 09

@mrober Could you please share an expected ETA about the fix if possible?

@wangjianpeng
Copy link

when ANR will be solved ?

@stepushchik
Copy link

Relates to https://developer.android.com/jetpack/androidx/releases/work#2.10.0-alpha03

  • Fix foreground workers of type ‘short service’ and ‘data sync’ timing out and causing an ANR when WorkManager didn’t call stopSelf(). This fix only applies to devices with API 34 and 35 where foreground service types were introduced. (ca06b2, b/364508145)

Is this Fixed The background ANR coming on Android 14?

Unfortunately, ANRs still present after updating to
androidx.work:work-runtime:2.10.0-alpha04
and
com.google.firebase:firebase-crashlytics:19.2.0

Please check in your project if you can.

@khaykov
Copy link

khaykov commented Oct 10, 2024

@stepushchik Fix for Android 14 specific ANR's is not a part of 19.2.0, according to this, so we have to wait.

@ashishkrishnan
Copy link

ashishkrishnan commented Oct 15, 2024

Folks do we have ETA for the fix? Will this be fixed on the Firebase libraries?

@davidmotson
Copy link
Collaborator

davidmotson commented Oct 16, 2024

After some investigation here are the results:

  1. Previous to Android 14, these app closures still occurred, but where not reported as ANRs, so while it may be worrying to see this ANR spike in Android 14, your user's behavior is not degraded.
  2. This ANR is also not caused by hitting the timeout behavior change in Android 14, Firelog's use of JobScheduler does very little on the main thread, and in a test we ran swapping to WorkManager didn't improve the situation as WorkManager is implemented on top of JobScheduler, however performance was heavier weight, so we decided to not ship that change.
  3. The vast majority (>99%) of these ANRs are background ANRs. It is important to note that background ANRs are not user visible. The user experience of a background ANR is that the app takes slightly longer to turn on the next time they launch it. Also, background ANRs are relatively common, because a backgrounded app will be running with very limited access to memory and CPU time.

@khaykov
Copy link

khaykov commented Oct 16, 2024

@davidmotson While the user experience might not be visibly affected, Google Play will penalize the apps with excessive crash rates, which this issue significantly contributes to. I believe it should not be closed, but fixed instead.

@davidmotson
Copy link
Collaborator

davidmotson commented Oct 16, 2024

@davidmotson While the user experience might not be visibly affected, Google Play will penalize the apps with excessive crash rates, which this issue significantly contributes to. I believe it should not be closed, but fixed instead.

According to this only user perceived ANRs should penalize your app.

@Deepakkast
Copy link

this should be fix otherwise it will create performance issue on play console

@hakanbagci
Copy link

hakanbagci commented Oct 18, 2024

@davidmotson While the user experience might not be visibly affected, Google Play will penalize the apps with excessive crash rates, which this issue significantly contributes to. I believe it should not be closed, but fixed instead.

According to this only user perceived ANRs should penalize your app.

Hi @davidmotson,

Unfortunately a significant number of this issue is user perceived ANR. Please check the screenshot from Google Play Console below. This screenshot clearly indicates that there are significant number of user perceived ANRs from this issue.

Having this issue happening before Android 14 but not being reported should not be a reason for closing the issue. I think our take should be the opposite. Since the issue is identified and has clear affect on the users, we should plan to fix it.

There are 14 down votes to the message for the decision of not fixing the issue. This clearly shows that there is a clear frustration from this ANR for several products/teams. For our 9 applications on Google Play with millions of users, this ANR is always in top 3 in Google Play user perceived ANRs.

Could you please reassess the issue and open it again?

Screenshot 2024-10-18 at 23 27 59

@ashishkrishnan
Copy link

ashishkrishnan commented Oct 19, 2024

It is important to note that background ANRs are not user visible. The user experience of a background ANR is that the app takes slightly longer to turn on the next time they launch it.

@davidmotson Finding this rationale very odd - given the experience of the users will get impacted. Cold start/warm starts will be high and is not a good experience. I am not just worried about the metric, penalty and indirect cost in loss of user base - I am worried about the experience it has on end users.

Open to brainstorm ideas for a fix. Could you give more details on the challenges?

@JonathanKranz
Copy link

Hi @davidmotson

I carefully read the information provided by @hakanbagci, I also encountered this problem, and it is TOP 2
Firebase's vision is to help developers make better applications, but now it seems that this is not the case; this problem has obviously affected the quality and exposure of the App. The Google Player market will punish these applications, but now it is so careless, which goes against the original intention of the Firebase team; if this is the case, developers have to choose other alternatives.
WX20241021-104553@2x

@urbeastfriend
Copy link

Same stuff

@dhruvkaushal11
Copy link

9.04K ANR and User Perceived.

image

@davidmotson
Copy link
Collaborator

davidmotson commented Oct 21, 2024

@hakanbagci
@JonathanKranz
@dhruvkaushal11
@urbeastfriend

This would definitely be new information. Can you provide more details?
I notice in your screenshot the commonality of this ANR is in ART, is there a significant presence of datatransport in reporting stacktraces as well?

@JonathanKranz
@hakanbagci
In your previous screenshots I see some references to JobScheduler, but nothing firebase related, is there more information you can provide?

@ankit92
Copy link

ankit92 commented Oct 22, 2024

Hi folks! sharing this update here from the original issue (#4345).

From mrober:

Hey everyone,
Sorry for the delay. We have been working on this, and have landed large changes to the SDK.
Here is a beta version of Crashlytics SDK 19.2.0 which resolves several of the issues that have been causing ANRs. This version is different from 19.1.0 which will be released in the next day or so. 19.1.0 only includes some of these changes, but has gone through more validation.
If you want to try the 19.2.0 beta, follow these steps:
Download the artifact m2repository.zip and extract it to a local repo directory.
Add something like the following to your app's gradle build files:

// settings.gradle.kts
dependencyResolutionManagement {
  repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
  repositories {
    maven(uri("/path/to/local/repo/"))
    google()
    mavenCentral()
  }
}
// app/build.gradle.kts
dependencies {
  implementation("com.google.firebase:firebase-crashlytics:19.2.0-beta01")
  // ...
}

The CHANGELOG is:

* [feature] Added the `isCrashlyticsCollectionEnabled` API to check if Crashlytics collection is enabled.
* [fixed] Ensure that on-demand fatal events are never processed on the main thread.
* [fixed] Improved data consistency for rapid user actions.
* [changed] Internal changes to improve startup time.
* [changed] Internal changes to the way session IDs are generated.
* [changed] Internal changes to the way background tasks are scheduled.
* [changed] Migrated SDK to use standard Firebase executors.

If you don't want to try the beta, you can wait for the stable 19.1.0 release in the next few days, which includes some of these changes. The full 19.2.0 release is undergoing further internal validation.

@davidmotson @lehcar09 @mrober , yesterday Firebase BOM - 33.5.0 is released that also includes crashlytics NDK version 19.2.1. Does this release contains fix for Android 14 ANR's as mentioned in above comment ?

@hakanbagci
Copy link

hakanbagci commented Oct 22, 2024

@hakanbagci @JonathanKranz @dhruvkaushal11 @urbeastfriend

This would definitely be new information. Can you provide more details? I notice in your screenshot the commonality of this ANR is in ART, is there a significant presence of datatransport in reporting stacktraces as well?

@JonathanKranz @hakanbagci In your previous screenshots I see some references to JobScheduler, but nothing firebase related, is there more information you can provide?

Hi @davidmotson,

After some more investigation I found out that the main thread is blocked because QueuedWork.processPendingWork is forcing SharedPrefencesImpl to process waiting preferences persistence task to be done in sync. Here is an example from Google Play stack trace for that kind of ANR.

Hypothesis:

Large data operations or complex object serialization within Shared Preferences can lead to significant processing time on the main thread, especially when apply() is used incorrectly or excessively. apply() asynchronously writes to disk, but QueuedWork.processPendingWork might force these pending writes to be completed synchronously (waiting them to complete with a lock) when activity is paused etc, leading to the observed ANRs.

Following this hypothesis maybe we can ask the question: Does Firelog or Firebase Crashlytics rely heavily on Shared Preferences for internal operations, such as caching log data or crash reports? Could their internal usage patterns be contributing to the ANR occurrences?

Screenshot 2024-10-22 at 14 12 21

@davidmotson
Copy link
Collaborator

@hakanbagci Thank you for you response. I don't think anything in this strack trace points to something in firebase being an issue. Firelog and Crashlytics don't use SharedPreferences in a heavyweight way. Our code is also open sourced, so if you want to use that to verify assumptions or otherwise assist your debugging process definitely feel free to do so. Regardless, it seems to have nothing to do with JobScheduler, so if you find more info on this please open another issue.

@ankit92
Please see this comment for an update.

@hakanbagci
Copy link

hakanbagci commented Oct 23, 2024

@hakanbagci Thank you for you response. I don't think anything in this strack trace points to something in firebase being an issue. Firelog and Crashlytics don't use SharedPreferences in a heavyweight way. Our code is also open sourced, so if you want to use that to verify assumptions or otherwise assist your debugging process definitely feel free to do so. Regardless, it seems to have nothing to do with JobScheduler, so if you find more info on this please open another issue.

@ankit92 Please see this comment for an update.

Opened a new issue for firebase performance. It looks like this issue (at least the issue that I shared the stack trace) is most probably related to trace function from firebase performance where each trace function call is causing shared preference to write to a file. This should work on IO in theory, but QueuedWork forces it to work on main thread in some scenarios like activity is stopped and there are shared preferences waiting to be committed. In our app, firebase performance was excessively calling shared preferences apply function (In 5 seconds almost ~450 times, our app start to interactive flow). Then the pending apply tasks where being executed in serial when activity is stopped. So the trace function itself is not performant and has a side affect on performance.

Please check if this is the case for you as well.
#6407

@mikelhm
Copy link

mikelhm commented Nov 8, 2024

this is perceptible ANR in GP,

there are two cases that use JobScheduler:

1)com.google.android.datatransport.runtime.scheduling.jobscheduling.JobInfoScheduler
2)com.google.firebase:firebase-analytics -> com.google.android.gms:play-services-measurement->zzks.class AppMeasurementJobService->JobScheduler"

Firebase should do something instead of close this issue @davidmotson

@firebase firebase locked and limited conversation to collaborators Nov 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests