Implemented exponential backoff and max retry with resumable uploads #4087
// With the 5 second keepalive, after 5 seconds, the thread will get killed and eventually a new
// one will be created.
// Therefore causing many of the tests to fail
StorageTaskScheduler.setCallbackQueueKeepAlive(90, TimeUnit.SECONDS);
To add additional context:
We call attachListeners (https://github.com/firebase/firebase-android-sdk/blob/ed10eb5ebe998dc23ead46419a2b4dcbb5e2482c/firebase-storage/src/testUtil/java/com/google/firebase/storage/TestUploadHelper.java#L49) to set up the listeners. Those listeners then call ControllableSchedulerHelper to verify that the callback thread used for the first event is the same one used for subsequent events (line 63 in ed10eb5):
public void verifyCallbackThread() {
However, with long backoffs this no longer holds, because the thread pool terminates any thread that stays idle for longer than the keepalive time, which is 5 seconds.
Our alternatives are to raise the keepAlive or to prevent the threads from being killed. A third option would be to remove this verification check.
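To illustrate the keepalive behaviour described above, here is a minimal standalone sketch using a plain java.util.concurrent.ThreadPoolExecutor. It is not the StorageTaskScheduler code: the class name KeepAliveDemo, the method sameThreadAfterIdle, and the pool configuration (one thread, core thread timeout enabled) are assumptions made for the example. It runs two tasks separated by an idle gap and reports whether the same thread ran both.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of firebase-storage.
class KeepAliveDemo {

  /** Runs two tasks separated by idleMillis and reports whether one thread ran both. */
  static boolean sameThreadAfterIdle(long keepAliveMillis, long idleMillis) {
    // Assumed configuration: a single worker that may time out while idle.
    ThreadPoolExecutor executor =
        new ThreadPoolExecutor(
            1, 1, keepAliveMillis, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    executor.allowCoreThreadTimeOut(true);
    Callable<Thread> whoAmI = Thread::currentThread;
    try {
      Thread first = executor.submit(whoAmI).get();
      // Idle gap, standing in for a long backoff between upload events.
      Thread.sleep(idleMillis);
      Thread second = executor.submit(whoAmI).get();
      return first == second;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

With a keepalive shorter than the idle gap, the worker is expected to be replaced, which is the situation a same-thread check would trip over. With a keepalive longer than the gap, the worker is reused.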
Do we know why this check is necessary? I'm not familiar with the tradeoffs of keeping threads around for an extended period of time.
I think it is there in case the wrong executor gets called by UploadTask.
Here's a quick StackOverflow answer that talks about the tradeoffs:
https://stackoverflow.com/a/18225863
firebase-storage/src/main/java/com/google/firebase/storage/UploadTask.java
small comments, overall looks good!
/*package*/ static Sleeper sleeper = new SleeperImpl();
/*package*/ static Clock clock = DefaultClock.getInstance();
private int sleepTime =
    0; // TODO(mtewani): Make it so that the send is 0,1,2,4,8,... and start at 0
remove comment?
Done
/*package*/ static Clock clock = DefaultClock.getInstance();
private int sleepTime =
    0; // TODO(mtewani): Make it so that the send is 0,1,2,4,8,... and start at 0
private final int sleepInterval = 1000;
So this is essentially a minimum sleep time for retries right?
Yes
consider renaming to minimumSleepInterval?
Done
firebase-storage/src/main/java/com/google/firebase/storage/UploadTask.java
@@ -32,6 +32,12 @@
public class TestUtil {

  static FirebaseApp createApp() {
    // Many tests require you to call the callback on the same thread that was initially
    // instantiated.
comment formatting
firebase-storage/src/test/java/com/google/firebase/storage/TestUtil.java
@@ -130,7 +160,7 @@ public void cantUploadToRoot() throws Exception {
});

// TODO(mrschmidt): Lower the timeout
TestUtil.await(task, 300, TimeUnit.SECONDS);
TestUtil.await(task, 7, TimeUnit.MINUTES);
This seems at odds with the line comment above 😅 Why is this necessary?
Ended up changing this to 1 minute. The 7 was a precaution :-)
}
return false;
sleepTime = Math.max(sleepTime * 2, sleepTime + (minimumSleepInterval * 2));
You probably want Math.max(sleepTime * 2, minimumSleepInterval).
The way this is written, the first sleep time == 2000 (0 + minimumSleepInterval * 2, with minimumSleepInterval = 1000) instead of 1000.
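The difference between the two formulas can be checked with a small worked example. This is a sketch only: the class BackoffFormulas and the helper firstDelays are hypothetical names introduced here, and the starting value 0 and interval 1000 are taken from the snippets quoted in this thread.

```java
// Hypothetical comparison of the two update rules discussed in this thread.
class BackoffFormulas {

  // Update rule as originally written in the PR.
  static int originalNext(int sleepTime, int minimumSleepInterval) {
    return Math.max(sleepTime * 2, sleepTime + (minimumSleepInterval * 2));
  }

  // Update rule suggested in review.
  static int suggestedNext(int sleepTime, int minimumSleepInterval) {
    return Math.max(sleepTime * 2, minimumSleepInterval);
  }

  // Returns the first `count` computed sleep times, starting from sleepTime = 0.
  static int[] firstDelays(boolean useSuggested, int count, int minimumSleepInterval) {
    int[] delays = new int[count];
    int sleepTime = 0;
    for (int i = 0; i < count; i++) {
      sleepTime =
          useSuggested
              ? suggestedNext(sleepTime, minimumSleepInterval)
              : originalNext(sleepTime, minimumSleepInterval);
      delays[i] = sleepTime;
    }
    return delays;
  }
}
```

With minimumSleepInterval = 1000, the original rule yields 2000, 4000, 8000, 16000, while the suggested rule yields 1000, 2000, 4000, 8000. The suggested rule therefore starts at the minimum interval and doubles from there.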
Oh wow. Nice catch! Will change
* Deflake firebase_common HeartBeat tests. (#4083) The tests relied on `TestOnCompleteListener` that was not safe to call more than once since it was based on a count down latch. So reusing it multiple times would cause await() to return immediately. This change makes it so that a new latch is created for every await() call, making all await() calls work. Fixes: http://b/245956774 * Add Javadoc support to the DackkaPlugin (#4082) * Add util method for copying directories * Add javadoc support to our dackka plugin * Remove the extension check on fromDirectory * Add a note about cache compliance and the javadoc task * Add reference to kotlin stdlib package list (#4093) * Add appcheck-ktx to bom config (#4081) * Removing getRunningAppProcesses since the process_name isn't used (#4057) * Fix Documentation classpath (#4099) b/241795594 * Add projectSpecificSources back to the DackkaPlugin (#4110) * Added extra method for TaskProviders * Added specificSources method back * Revert to dependsOn for docstubs dep * Revamp test harness for macrobenchmark tests (#4071) * Fix dependabot security alerts (#4123) * Make firesite transform cacheable (#4124) * add coroutines-play-services as a transitive dep to firebase-common-ktx (#4044) * add kotlinx-coroutines-play-services as a transitive dep to firebase-common-ktx * Update to Coroutines 1.6.4 * database-ktx: add callbackFlow for eventlisteners (#4012) * add callbackFlow for RTDB's ValueEventListener * add callbackFlow for RTDB's ChildEventListener * delegate trySendBlocking to DefaultRunLoop * add group to ktx.gradle * update api.txt file * Update released versions (#4135) * Upgrade dackkaConfig (#4141) * Add names to all Firebase components (#4117) * Add appcheck's ktx artifact back to package list file (#4142) * Add strict mode testing in firebase-messaging (#4095) * Add gralde property to instrument Fireperf E2E test (#4144) The perf gradle PR is #334 in the gradle repo. 
b/246802885 * Resolve StrictMode violation in App Check. (#4085) * Resolve StrictMode violation in App Check. * Attempt to fix some tests. * Fix unit tests. * Make `retrieveStoredAppCheckTokenInBackground` private instead of package-private. * Move listener invocations back to the main thread while keeping disk write on background thread. * Refactor to use lambda syntax. * Implemented exponential backoff and max retry with resumable uploads (#4087) * storage-ktx: add callbackFlow for upload/download progress (#4139) * add kotlin flows to storage * update api.txt file * add group to storage/ktx.gradle * Make a best effort attempt to flush reports at crash time (#4112) This should allow us to upload reports for start-up crashes. * Public Count (#4130) * Public Count * Disable prod testing * Long to long * Api.txt * Backfill changelog * Add PR * Fix assertEquals error * Re-write API javadocs for COUNT API (#4143) Co-authored-by: Denver Coneybeare <[email protected]> * [Fireperf][AASA] send `_experiment_app_start_ttid` trace, controlled by RC flag (#4114) * log _experiment_as_ttid * send event and RRC mitigation * add RC wip * modified save to cache when RC fetches * dev-app manifest override * unit test for RCc cache saving * better name and comments * better formatting remoteconfigmanagertest * better comments and added local RC lookup back * Specify unique ref tags in Dackka output (#4149) * Add util methods for gradle projects * Disable Javadoc generation on empty projects * Fix ref path generation in Dackka output * Add documentation for util methods * Update the DackkaPlugin docs * Reduced path to relative from tenant * Reduced ref head path even more * Fixed ref tag path to working solution * Disabled publishJavadoc by default for tests It should be enabled explicitly when being tested anyhow, and causes issues otherwise. Instead of disabling it for the tests that don't need it- this is much quicker and easier to manager. 
* Add strict mode tests to inappmessaging and inappmessaging-display (#4136) * Fix strict mode violations for appcheck (#4148) * Fix strict mode violations for appcheck * Formatting * Add copyright header * Populate SDKs changelog files (#4070) * first try seeding changelogs * Added unreleased section to CHANGELOG * Fix empty lines between sections * Add missing entries for abt * Update data to include latest releases * Update CHANGELOG.md * Add missing line in unreleased section for perf. * Enable CHANGELOG check globally (#4084) * Enable CHANGELOG check globally * Simplify conditional. * Enable COUNT integration tests, now that backend support has rolled out (#4163) * Remove separation of kotlin directories in dackka (#4166) * Deprecate App Check SafetyNet SDK (#4187) * Add `@Deprecated` annotations to Firebase App Check SafetyNet SDK public API. * Add `@deprecated` tag in the Javadoc as well. * Remove stale entries from Unreleased section. (#4185) * Assign ConfigContainer Builder return values. (#4194) * update bom (#4155) * update bom * update * update * feat(perf-ktx): add trace(name, block) extension function (#4180) * Remove smoke test for app indexing (#4219) App Indexing is deprecated starting BoM 31.0.0 . https://firebase.google.com/support/release-notes/android#bom_v31-0-0 * Bump Robolectric to 4.9 (#4161) * Add plexus-utils for firebase-database tests Looks like firebase-database tests use plexus-utils dependency of Robolectric directly. But this dependency was removed by Robolectric. So this CL adds plexus-utils explictly for firebase-database tests. Signed-off-by: utzcoz <[email protected]> * Bump Robolectric to 4.9 1. Use legacy LooperMode for tests explicitly, because recent Robolectric releases switch to use PAUSED mode default. Before these tests migrate to PAUSED mode, they use LEGACY mode to pass tests. 2. Migrate Assert.assertThat to Truth.assertThat to avoid using removed APIs. 3. 
All build.gradle use the same robolectricVersion except transport-backend-cct because Robolectric 4.8+ has compatibility problem for TelephonyManager with low compile/targetSdkversion. To keep httpclient compatibility, transport-backend-ccts continues to use Robolectric 4.3.1. 4. Remove unused exclude protobuf-java from Robolectric. 5. Add necessary protobuf-lite dependency on classpath for some ktx modules' tests. Signed-off-by: utzcoz <[email protected]> Signed-off-by: utzcoz <[email protected]> * return exception if modelname is empty (#4226) * Add "create release PR" github action (#4236) This implementation: - Creates the base branch (name is based in user input) - Creates the release branch (name is based in user input) - Creates the release.cfg file in the release branch without adding any SDK (module) to it. It can create the branches based on any existing branch of the repo. * Sync spec tests from web SDK to Android SDK (#4230) * Update versions (#4238) * Update versions * Exclude .github dir from `firebaseContinuousIntegration` paths (#4239) * Performing IN expansion (#4221) * WIP: `in` expansion. * Add composite filter in-expansion test. * Fix formatting. * Run in-expansion as part of DNF computation and add tests. * Add test with nested IN filters with CSI. * Add tests for other cases. 
* typo fix (#4237) * Firestore: Add test that verifies count query error message when missing index (#4232) * refactor(functions): update firebase-iid to 21.1.0 (#4225) * refactor(functions): update firebase-iid to 21.1.0 * Update CHANGELOG.md * Update CHANGELOG.md * bump firebase-iid-interop to 17.1.0 * exclude firebase-components from firebase-iid dependency Signed-off-by: utzcoz <[email protected]> Co-authored-by: Vladimir Kryachko <[email protected]> Co-authored-by: Daymon <[email protected]> Co-authored-by: Raymond Lam <[email protected]> Co-authored-by: Yifan Yang <[email protected]> Co-authored-by: Rosário Pereira Fernandes <[email protected]> Co-authored-by: emilypgoogle <[email protected]> Co-authored-by: Jeremy Jiang <[email protected]> Co-authored-by: Rosalyn Tan <[email protected]> Co-authored-by: Maneesh Tewani <[email protected]> Co-authored-by: Matthew Robertson <[email protected]> Co-authored-by: wu-hui <[email protected]> Co-authored-by: Denver Coneybeare <[email protected]> Co-authored-by: Leo Zhan <[email protected]> Co-authored-by: Rodrigo Lazo <[email protected]> Co-authored-by: Dana Silver <[email protected]> Co-authored-by: Vinay Guthal <[email protected]> Co-authored-by: utzcoz <[email protected]> Co-authored-by: argzdev <[email protected]> Co-authored-by: Mila <[email protected]> Co-authored-by: Ehsan <[email protected]> Co-authored-by: cherylEnkidu <[email protected]>
We have exponential backoff implemented, but not for uploadChunk. If uploadChunk fails, we check whether we are in a recoverable state, then request the status of the upload, and then retry the upload immediately if that request was successful.
Instead of retrying the upload immediately, we should wait the appropriate number of seconds and then retry, and we should only do so a maximum number of times. This PR implements both: exponential backoff and max retries.
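The behaviour described above could look roughly like the following sketch. It is not the UploadTask implementation: ChunkRetrySketch, uploadWithBackoff, and the attempt and sleeper parameters are hypothetical, the recoverable-state and status checks are omitted, and the update rule is the Math.max(sleepTime * 2, minimumSleepInterval) form suggested in review. The sleeper is injected so that tests can record delays instead of actually waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch of "exponential backoff plus max retries"; not the PR's code.
class ChunkRetrySketch {

  /**
   * Calls attempt until it succeeds or maxRetries retries have been used.
   * Between attempts it sleeps for a delay that starts at minimumSleepIntervalMs
   * and doubles on every retry.
   */
  static boolean uploadWithBackoff(
      BooleanSupplier attempt, LongConsumer sleeper, int maxRetries, long minimumSleepIntervalMs) {
    long sleepTimeMs = 0;
    for (int retries = 0; ; retries++) {
      if (attempt.getAsBoolean()) {
        return true; // The chunk went through.
      }
      if (retries >= maxRetries) {
        return false; // Give up instead of retrying forever.
      }
      sleepTimeMs = Math.max(sleepTimeMs * 2, minimumSleepIntervalMs);
      sleeper.accept(sleepTimeMs); // Wait before the next attempt.
    }
  }
}
```

For example, an attempt that fails twice and then succeeds would sleep for 1000 and then 2000 before returning true, given a minimum interval of 1000. An attempt that always fails with maxRetries = 3 would be tried four times in total and return false.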