-
Notifications
You must be signed in to change notification settings - Fork 616
Fix Android Connectivity Monitor (v2) #1045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…dk into bc/reconnect
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some (mostly high-level) feedback / questions...
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/util/AsyncQueue.java
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/AbstractStream.java
Outdated
Show resolved
Hide resolved
@@ -174,6 +174,9 @@ public void run() { | |||
/** The time a stream stays open after it is marked idle. */ | |||
private static final long IDLE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(1); | |||
|
|||
/** Maximum backoff time when reconnecting. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"when reconnecting" doesn't really do anything to distinguish this from the existing BACKOFF_MAX_DELAY_MS which is also used when reconnecting.
I think the intention is that this max backoff delay is used when we're 100% sure that the connection failed on the client-side and didn't even reach the server. In practice, I think the only case we can be sure of that is DNS failures (see my other comment about ConnectException). If that's the case, then maybe we can call this DNS_FAILURE_BACKOFF_MAX_DELAY_MS or something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about CLIENT_NETWORK_FAILURE_BACKOFF_MAX_DELAY_MS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that BACKOFF should maybe be first (to group this with the other backoff-related constants). So: BACKOFF_CLIENT_NETWORK_FAILURE_MAX_DELAY_MS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW- I don't feel strongly. Just trying to figure out how to make the name parse more sensibly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to Michael's suggestion.
Did another pass through, moved logic exclusively to gRPC side and removed the markChannelIdle() code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, I think I'm tentatively a fan!
I'd recommend doing a pass over my feedback (if there's anything that seems time consuming, you can defer for now) and then run this by Gil to see what he thinks generally. I don't 100% know if he'll be onboard with recreating the channel. If we can get timing information on that to prove to ourselves that it isn't too expensive, that would be helpful.
We might finally want to run it by the gRPC team to see if we can get their blessing on this approach (or maybe they can suggest something better). If nothing else, it may be useful feedback to them regarding what our needs are, to better inform their solution for grpc/grpc-java#1943.
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/util/AsyncQueue.java
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/OnlineStateTracker.java
Outdated
Show resolved
Hide resolved
|
||
private void initChannelTask() { | ||
// We execute network initialization on a separate thread to not block operations that depend on | ||
// the AsyncQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential concern with recreating the channel is it could be expensive. We're already doing it on a background thread for performance reasons, so it's possible this adds some meaningful delay.
As part of your logging, I'd recommend logging the start/finish of this so that we can get timing data and see how long this takes each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's taking around 15-40ms to reset the channel from what I've seen based on the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. That is longer than I hoped. I don't know if it's a problem or not. I suggest getting input from Gil and gRPC folks (and perhaps point out this delay to gRPC and see if they have other suggestions for implementing a connection timeout).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we're spending an extra 40 ms to bring the reaction time on failed reconnects down from two minutes to 15 seconds. While 40 ms isn't exactly cheap it creates a big enough win that it seems worthwhile.
Also note that this only kicks in when Android's own network transition logic isn't kicking in. That we're not being inundated with requests for this feature suggests that it's going to be fairly rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
private void initChannelTask() { | ||
// We execute network initialization on a separate thread to not block operations that depend on | ||
// the AsyncQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we're spending an extra 40 ms to bring the reaction time on failed reconnects down from two minutes to 15 seconds. While 40 ms isn't exactly cheap it creates a big enough win that it seems worthwhile.
Also note that this only kicks in when Android's own network transition logic isn't kicking in. That we're not being inundated with requests for this feature suggests that it's going to be fairly rare.
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/AbstractStream.java
Outdated
Show resolved
Hide resolved
@@ -174,6 +174,9 @@ public void run() { | |||
/** The time a stream stays open after it is marked idle. */ | |||
private static final long IDLE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(1); | |||
|
|||
/** Maximum backoff time when reconnecting. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about CLIENT_NETWORK_FAILURE_BACKOFF_MAX_DELAY_MS
?
@@ -53,6 +55,11 @@ | |||
private static final String X_GOOG_API_CLIENT_VALUE = | |||
"gl-java/ fire/" + BuildConfig.VERSION_NAME + " grpc/"; | |||
|
|||
// This timeout is used when attempting to establish a connection. If a connection attempt | |||
// does not succeed in time, we close the stream and restart the connection, rather than having |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer callback does not seem to "close the stream" in any way that I can tell. Based on comments in markChannelIdle, the first stream continues? Does this actually close the stream, or is this a side effect?
From what I can tell, the connectivityAttemptTimer merely starts a new clientCall (through a ~recursive invocation of runBidiStreamingRpc). What prevents the observer from getting callbacks from the old stream once that fails ~2 minutes later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're looking at an old version of the PR? 😬 This code is gone in the latest version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the outdated version. The new one can be found here.
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/OnlineStateTracker.java
Outdated
Show resolved
Hide resolved
@@ -151,6 +161,9 @@ public void backoffAndRun(Runnable task) { | |||
} else if (currentBaseMs > maxDelayMs) { | |||
currentBaseMs = maxDelayMs; | |||
} | |||
|
|||
// Reset max delay to the default. | |||
maxDelayMs = DEFAULT_BACKOFF_MAX_DELAY_MS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignored the comments I made on the last version of this PR.
The max delay was configured in the constructor and may not actually be DEFAULT_BACKOFF_MAX_DELAY_MS
. You can't read this constant here.
Instead, save two max delay fields in the constructor. One should be final, as maxDelayMs is today and it should be used here as the value to which to reset. The second value should be the one you actually bounce around based on which kind of error is in effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (attempted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a final suggestion.
|
||
/** | ||
* The maximum backoff time used when calculating the next backoff. This value can be changed for | ||
* a single backoffAndRun call, after which it resets to maxDelayMs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one remaining concern.
I still think it may be worth running this by the gRPC folks (on the existing email thread) before we submit it so that they have a chance to sanity-check our approach and so that they're aware that we're going to these lengths to work around gRPC's current behavior. Thanks!
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/GrpcCallProvider.java
Outdated
Show resolved
Hide resolved
|
||
private void initChannelTask() { | ||
// We execute network initialization on a separate thread to not block operations that depend on | ||
// the AsyncQueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Continuation of #937 but instead of binding connectivity attempts to the WatchStream, this version bases connectivity attempts on grpc's ConnectivityState.
Note that this implementation also depends on two ManagedChannel APIs that are labeled as
@ExperimentalApi
and require a minimum grpc versionv1.6.1.
:ManagedChannel.getState()
andManagedChannel.notifyWhenStateChanged()
.