Skip to content

Add timeout to OnlineState tracking. #412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 3, 2018
6 changes: 3 additions & 3 deletions packages/firestore/src/core/event_manager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -248,9 +248,9 @@ export class QueryListener {
return true;
}

// NOTE: We consider OnlineState.Unknown as online (it should become Failed
// NOTE: We consider OnlineState.Unknown as online (it should become Offline
// or Online if we wait long enough).
const maybeOnline = onlineState !== OnlineState.Failed;
const maybeOnline = onlineState !== OnlineState.Offline;
// Don't raise the event if we're online, aren't synced yet (checked
// above) and are waiting for a sync.
if (this.options.waitForSyncWhenOnline && maybeOnline) {
Expand All @@ -262,7 +262,7 @@ export class QueryListener {
}

// Raise data from cache if we have any documents or we are offline
return !snap.docs.isEmpty() || onlineState === OnlineState.Failed;
return !snap.docs.isEmpty() || onlineState === OnlineState.Offline;
}

private shouldRaiseEvent(snap: ViewSnapshot): boolean {
Expand Down
1 change: 1 addition & 0 deletions packages/firestore/src/core/firestore_client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -288,6 +288,7 @@ export class FirestoreClient {
this.remoteStore = new RemoteStore(
this.localStore,
datastore,
this.asyncQueue,
onlineStateChangedHandler
);

Expand Down
4 changes: 2 additions & 2 deletions packages/firestore/src/core/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -51,12 +51,12 @@ export enum OnlineState {
* reached after a successful connection and there has been at least one
* successful message received from the backends.
*/
Healthy,
Online,

/**
* The client is either trying to establish a connection but failing, or it
* has been explicitly marked offline via a call to disableNetwork().
* Higher-level components should operate in offline mode.
*/
Failed
Offline
}
2 changes: 1 addition & 1 deletion packages/firestore/src/core/view.ts
Original file line number Diff line number Diff line change
Expand Up @@ -276,7 +276,7 @@ export class View {
* ViewChange if the view's syncState changes as a result.
*/
applyOnlineStateChange(onlineState: OnlineState): ViewChange {
if (this.current && onlineState === OnlineState.Failed) {
if (this.current && onlineState === OnlineState.Offline) {
// If we're offline, set `current` to false and then call applyChanges()
// to refresh our syncState and generate a ViewChange as appropriate. We
// are guaranteed to get a new TargetChange that sets `current` back to
Expand Down
166 changes: 166 additions & 0 deletions packages/firestore/src/remote/online_state_tracker.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
/**
* Copyright 2018 Google Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import { OnlineState } from '../core/types';
import * as log from '../util/log';
import { assert } from '../util/assert';
import { AsyncQueue, TimerId } from '../util/async_queue';
import { CancelablePromise } from '../util/promise';

const LOG_TAG = 'OnlineStateTracker';

// To deal with transient failures, we allow multiple stream attempts before
// giving up and transitioning to Offline.
const MAX_WATCH_STREAM_FAILURES = 2;

// To deal with stream attempts that don't succeed or fail in a timely manner,
// we have a maximum timeout we'll wait for the stream to either succeed or fail
// MAX_WATCH_STREAM_FAILURES times, else we revert to OnlineState.Offline.
const MAX_WATCH_STREAM_TIMEOUT_MS = 10 * 1000;

/**
* A component used by the RemoteStore to track the OnlineState (that is,
* whether or not the client as a whole should be considered to be online or
* offline), implementing the appropriate heuristics.
*
* In particular, when the client is trying to connect to the backend, we
* allow up to MAX_WATCH_STREAM_FAILURES within MAX_WATCH_STREAM_TIMEOUT_MS for
* a connection to succeed. If we have too many failures or the timeout elapses,
* then we set the OnlineState to Offline, and the client will behave as if
* it is offline (get()s will return cached data, etc.).
*/
export class OnlineStateTracker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any way it would happen but just in case: are there any interactions between this class and the idle timeout timers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be since this timer goes away once you're connected. And the idle timers start once you're connected... And with the improved testing, if both timers were active at once, runDelayedOperationsEarly() would run both so we'd be exercising them together, at least in theory...

/** The current OnlineState. */
private state = OnlineState.Unknown;

/**
* A count of consecutive failures to open the stream. If it reaches the
* maximum defined by MAX_WATCH_STREAM_FAILURES, we'll set the OnlineState to
* Offline.
*/
private watchStreamFailures = 0;

/**
* A timer that elapses after MAX_WATCH_STREAM_TIMEOUT_MS, at which point we
* revert to OnlineState.Offline without waiting for the stream to actually
* fail (MAX_WATCH_STREAM_FAILURES times).
*/
private watchStreamTimerPromise: CancelablePromise<void> | null = null;

/**
* Whether the client should log a warning message if it fails to connect to
* the backend (initially true, cleared after a successful stream, or if we've
* logged the message already).
*/
private shouldWarnClientIsOffline = true;

constructor(
private asyncQueue: AsyncQueue,
private onlineStateHandler: (onlineState: OnlineState) => void
) {}

/**
* Called by RemoteStore when a watch stream is started.
*
* It sets the OnlineState to Unknown and starts a MAX_WATCH_STREAM_TIMEOUT_MS
* timer if necessary.
*/
handleWatchStreamStart(): void {
this.setAndBroadcast(OnlineState.Unknown);

if (this.watchStreamTimerPromise === null) {
this.watchStreamTimerPromise = this.asyncQueue.enqueueAfterDelay(
TimerId.OnlineStateTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this cluster of names confusingly dissimlar. Why is this an "OnlineStateTimeout", the promise is a "watchStreamTimerPromise", and the timeout a "MAX_WATCH_STREAM_TIMEOUT_MS"?

These all seem related to the handling the initial connection but:

  • OnlineStateTimeout seems like it might be applicable to other transitions
  • watchStreamTimer seems wholly unrelated to that purpose
  • MAX_WATCH_STREAM_TIMEOUT_MS seems just as likely to apply to an idle timeout. Also, why "MAX"? Isn't this just the timeout duration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points.

I renamed:

  • MAX_WATCH_STREAM_TIMEOUT_MS => ONLINE_STATE_TIMEOUT_MS
  • watchStreamTimerPromise => onlineStateTimer

And I clarified some comments. I agree that OnlineStateTimeout is a somewhat imprecise name (could be applicable to other transitions). We could do "OnlineStateUnknownTimeout", but I don't think that's much clearer (and makes all of these names longer) and so I'm inclined to just lean on comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a marked improvement and LGTM as-is. Thanks. Agree that leaning on comments can be the way to go in cases like these.

I'm still torn about this because the connection timer seems very much related to what we're doing here: namely if we're ever connecting our state starts at unknown. If we logically fail to connect (i.e. fail to transition out of unknown) then this timer fires. It seems like, for all intents and purposes, OnlineStateUnknownTimeout is the real connection timeout. Could that other thing we have be some lower-level concept?

Or maybe the name of OnlineStateUnknownTimeout itself is caught in my craw. Our other timeouts are named in terms of the end state, not the starting state. IOW: how long until some state, not how long in some state. How long until idle, how long until connected, etc.

This comment isn't super actionable but I'm still uneasy, and if history has shown anything: this code is fiddly, keeps changing, and we keep misunderstanding it over time, so if you have the energy I'd like to try to find a way to make it more immediately obvious and less fuzzy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that our timers have been fiddly / confusing and getting this right is worthwhile. :-)

So right now our timers are: ListenStreamIdle / ListenStreamConnection, WriteStreamIdle / WriteStreamConnection, OnlineStateTimeout

The Connection timers are used to schedule delayed stream attempts subject to exponential backoff... Not all stream attempts are delayed in which case the "Connection" timer doesn't come into play (e.g the first attempt once a stream becomes necessary, or the first attempt after an established stream encounters an error, etc.).

But the OnlineStateTimeout is still relevant when the stream attempt isn't delayed (e.g. it'll fire if the first stream attempt doesn't succeed / fail within 10 seconds, triggering us to OnlineState.Offline while we wait).

And the Connection timers will continue to be used even once the OnlineStateTimeout has triggered (because we'll continue scheduling stream attempts with backoff even after we've entered OnlineState.Offline).

So I think they're not actually that related, and we probably have a naming problem that makes them seem related. We could try to be more verbose, e.g. ListenStreamDelayedRestart instead of ListenStreamConnection.

In naming these, it's worth thinking about their usage from tests, e.g.: runDelayedOperationsUntil(TimerId.ListenStreamDelayedRestart) to make a test skip the exponential backoff delay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK could the connection timer ids really be backoff timer ids? This is essentially where we use them:

    this.backoff = new ExponentialBackoff(
      queue,
      connectionTimerId,
      BACKOFF_INITIAL_DELAY_MS,
      BACKOFF_FACTOR,
      BACKOFF_MAX_DELAY_MS
    );

TimerId.ListenStreamBackoff => runDelayedOperationsUntil(TimerId.ListenStreamBackoff)?

I don't think the fact that the online state timeout is for succeeding or failing much matters. Connection timeouts pretty much always work this way: they introduce a client-generated error if the connection hasn't already succeeded or failed on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what they are, except "backoff" describes an implementation detail about how the timers are scheduled rather than what they actually are for. You could hypothetically imagine that the stream could have multiple timers, for different purposes, that happen to use exponential backoff scheduling, at which point "ListenStreamBackoff" would be meaningless (as opposed to "ListenStreamDelayedRestart" or whatever).

That said, we probably conflate "backoff" with the act of reconnecting already, and so I'd be okay with ListenStreamBackoff if you think that's clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I still do think there's a meaningful distinction between our "online state timeout" and a "connection timeout" because we explicitly don't cancel the existing stream attempt and start a new one... Instead we're relying on grpc's internal timeout / retry connection logic.

MAX_WATCH_STREAM_TIMEOUT_MS,
() => {
this.watchStreamTimerPromise = null;
assert(
this.state === OnlineState.Unknown,
'Timer should be canceled if we transitioned to a different state.'
);
log.debug(
LOG_TAG,
`Watch stream didn't reach online or offline within ` +
`${MAX_WATCH_STREAM_TIMEOUT_MS}ms. Considering client offline.`
);
this.logClientOfflineWarningIfNecessary();
this.setAndBroadcast(OnlineState.Offline);
return Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this failure to connect in time bump the number of watch stream failures? If not could you please add comments why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

}
);
}
}

/**
* Updates our OnlineState as appropriate after the watch stream reports a
* failure. The first failure moves us to the 'Unknown' state. We then may
* allow multiple failures (based on MAX_WATCH_STREAM_FAILURES) before we
* actually transition to the 'Offline' state.
*/
handleWatchStreamFailure(): void {
if (this.state === OnlineState.Online) {
this.setAndBroadcast(OnlineState.Unknown);
} else {
this.watchStreamFailures++;
if (this.watchStreamFailures >= MAX_WATCH_STREAM_FAILURES) {
this.clearWatchStreamTimer();
this.logClientOfflineWarningIfNecessary();
this.setAndBroadcast(OnlineState.Offline);
}
}
}

/**
* Explicitly sets the OnlineState to the specified state.
*
* Note that this resets our timers / failure counters, etc. used by our
* Offline heuristics, so must not be used in place of
* handleWatchStreamStart() and handleWatchStreamFailure().
*/
set(newState: OnlineState): void {
this.clearWatchStreamTimer();
this.watchStreamFailures = 0;

if (newState === OnlineState.Online) {
// We've connected to watch at least once. Don't warn the developer
// about being offline going forward.
this.shouldWarnClientIsOffline = false;
}

this.setAndBroadcast(newState);
}

private setAndBroadcast(newState: OnlineState): void {
if (newState !== this.state) {
this.state = newState;
this.onlineStateHandler(newState);
}
}

private logClientOfflineWarningIfNecessary(): void {
if (this.shouldWarnClientIsOffline) {
log.error('Could not reach Firestore backend.');
this.shouldWarnClientIsOffline = false;
}
}

private clearWatchStreamTimer(): void {
if (this.watchStreamTimerPromise !== null) {
this.watchStreamTimerPromise.cancel();
this.watchStreamTimerPromise = null;
}
}
}
11 changes: 0 additions & 11 deletions packages/firestore/src/remote/persistent_stream.ts
Original file line number Diff line number Diff line change
Expand Up @@ -265,17 +265,6 @@ export abstract class PersistentStream<
IDLE_TIMEOUT_MS,
() => this.handleIdleCloseTimer()
);

this.inactivityTimerPromise.catch((err: FirestoreError) => {
// When the AsyncQueue gets drained during testing, pending Promises
// (including these idle checks) will get rejected. We special-case
// these cancelled idle checks to make sure that these specific Promise
// rejections are not considered unhandled.
assert(
err.code === Code.CANCELLED,
`Received unexpected error in idle timeout closure. Expected CANCELLED, but was: ${err}`
);
});
}
}

Expand Down
Loading