Skip to content

[Multi-Tab] Recover from unexpected primary lease loss #984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 25, 2018
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 109 additions & 65 deletions packages/firestore/src/core/sync_engine.ts
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ import { RemoteEvent, TargetChange } from '../remote/remote_event';
import { RemoteStore } from '../remote/remote_store';
import { RemoteSyncer } from '../remote/remote_syncer';
import { assert, fail } from '../util/assert';
import { Code, FirestoreError } from '../util/error';
import { FirestoreError } from '../util/error';
import * as log from '../util/log';
import { AnyJs, primitiveComparator } from '../util/misc';
import { ObjectMap } from '../util/obj_map';
Expand All @@ -58,6 +58,7 @@ import {
LimboDocumentChange,
RemovedLimboDocument,
View,
ViewChange,
ViewDocumentChanges
} from './view';
import { ViewSnapshot } from './view_snapshot';
Expand All @@ -68,6 +69,7 @@ import {
import { ClientId, SharedClientState } from '../local/shared_client_state';
import { SortedSet } from '../util/sorted_set';
import * as objUtils from '../util/obj';
import { isPrimaryLeaseLostError } from '../local/indexeddb_persistence';

const LOG_TAG = 'SyncEngine';

Expand Down Expand Up @@ -208,7 +210,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
queryData.targetId
);
targetId = queryData.targetId;
viewSnapshot = await this.initializeViewAndComputeInitialSnapshot(
viewSnapshot = await this.initializeViewAndComputeSnapshot(
queryData,
status === 'current'
);
Expand All @@ -221,7 +223,11 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
return targetId;
}

private initializeViewAndComputeInitialSnapshot(
/**
* Registers a view for a previously unknown query and computes its initial
* snapshot.
*/
private initializeViewAndComputeSnapshot(
queryData: QueryData,
current: boolean
): Promise<ViewSnapshot> {
Expand Down Expand Up @@ -265,18 +271,33 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
}

/**
* Reconcile the list of synced documents in the local views with those from
* persistence.
* Reconcile the list of synced documents in an existing view with those
* from persistence.
*/
// PORTING NOTE: Multi-tab only.
private async synchronizeLocalView(targetId: TargetId): Promise<void> {
return this.localStore
.remoteDocumentKeys(targetId)
.then(async remoteKeys => {
const queryView = this.queryViewsByTarget[targetId];
assert(!!queryView, 'Expected queryView to be defined');
queryView.view.synchronizeWithRemoteKeys(remoteKeys);
});
private synchronizeViewAndComputeSnapshot(
query: Query,
targetId: TargetId
): Promise<ViewChange> {
return this.localStore.executeQuery(query).then(async docs => {
return this.localStore
.remoteDocumentKeys(targetId)
.then(async remoteKeys => {
const queryView = this.queryViewsByTarget[targetId];
assert(!!queryView, 'Cannot reconcile missing view');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think it'd be pretty straightforward to just pass in the queryView (as the only argument since it contains the query, targetId, and view) and then we don't need this assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This also helped me realize that I am doing one extra IndexedDb lookup for the query when I already have the QueryView (in synchronizeQueryViewsAndRaiseSnapshots).

const viewChange = queryView.view.synchronizeWithPersistedState(
docs,
remoteKeys,
/* resetLimboDocuments= */ this.isPrimary,
/* resetCurrent= */ this.isPrimary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding what these flags do. I wonder if "reset" should be "recompute" or "update" (reset makes it sound like they'll just be cleared, which I don't think is the case)... And if we pass false does that mean current and limboDocuments will be left 100% as-is? I guess ultimately I'm wondering why we can't just unconditionally recompute these. I also wonder if we could sensibly combine these into a single argument. Since current depends on limboDocuments, it probably never makes sense to recompute current without computing limbo documents?

Anyway, at minimum it will probably take some comments to appease me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified this a little to unconditionally re-compute the list of limbo documents. In clients that are in-sync this is a no-op, and in clients that are out of sync, this will bring them back into sync. The only bit that I still need to provide is whether to set current to false. I added some documentation to clear this up a little.

Notice that I also added a TODO to mark the queries as non-current in SharedClientState. That's probably a three line change but I wanted to keep this PR from exploding even further.

);
await this.updateTrackedLimbos(
queryView.targetId,
viewChange.limboChanges
);
return viewChange;
});
});
}

/** Stops listening to the query. */
Expand Down Expand Up @@ -489,7 +510,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
.releaseQuery(queryView.query, /* keepPersistedQueryData */ false)
.then(() => this.removeAndCleanupQuery(queryView))
.catch(err => this.tryRecoverFromPrimaryLeaseLoss(err));
this.errorHandler!(queryView.query, err);
this.syncEngineListener!.onWatchError(queryView.query, err);
}
}

Expand Down Expand Up @@ -520,17 +541,6 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
// connection is disabled.
await this.remoteStore.fillWritePipeline();
} else if (batchState === 'acknowledged' || batchState === 'rejected') {
if (this.isPrimary) {
// If we receive a notification of an `acknowledged` or `rejected` batch
// via Web Storage, we are either already secondary or another tab has
// taken the primary lease.
log.debug(
LOG_TAG,
'Unexpectedly received mutation batch notification when already primary. Releasing primary lease.'
);
await this.applyPrimaryState(false);
}

// NOTE: Both these methods are no-ops for batches that originated from
// other clients.
this.sharedClientState.removeLocalPendingMutation(batchId);
Expand Down Expand Up @@ -776,12 +786,11 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
private async tryRecoverFromPrimaryLeaseLoss(
err: FirestoreError
): Promise<void> {
if (err.code === Code.FAILED_PRECONDITION) {
if (isPrimaryLeaseLostError(err)) {
log.debug(
LOG_TAG,
'Unexpectedly lost primary lease, reverting to secondary'
'Unexpectedly lost primary lease, attempting to recover'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We log "attempting to recover" and then do nothing... Seems like maybe this method should now be ignorePrimaryLeaseLoss() and the log message should say "ignoring"

... alternatively, as-is will there be a long (up to 4s) delay before we officially notice we're not primary? Maybe this should schedule a lease refresh or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed "attempting to recover" altogether and didn't replace it with anything.

Note that the persistence layer schedules a lease refresh when it throws this error, so the delay should be fairly minimal.

);
return this.applyPrimaryState(false);
} else {
throw err;
}
Expand Down Expand Up @@ -815,7 +824,6 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
async applyPrimaryState(isPrimary: boolean): Promise<void> {
if (isPrimary === true && this.isPrimary !== true) {
this.isPrimary = true;

await this.remoteStore.applyPrimaryState(true);

// Secondary tabs only maintain Views for their local listeners and the
Expand All @@ -824,41 +832,80 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
// server considers to be in the target). So when a secondary becomes
// primary, we need to need to make sure that all views for all targets
// match the state on disk.
let p = Promise.resolve();
const activeTargets = this.sharedClientState.getAllActiveQueryTargets();
activeTargets.forEach(targetId => {
p = p.then(async () => {
let queryData;
const query = await this.localStore.getQueryForTarget(targetId);
if (this.queryViewsByTarget[targetId] === undefined) {
// For queries that never executed on this client, we need to
// allocate the query in LocalStore and initialize a new View.
queryData = await this.localStore.allocateQuery(query);
await this.initializeViewAndComputeInitialSnapshot(
queryData,
false
);
} else {
// For queries that have a local View, we need to update their state
// in LocalStore (as the resume token and the snapshot version
// might have changed) and reconcile their views with the persisted
// state (the list of syncedDocuments may have gotten out of sync).
await this.localStore.releaseQuery(query, true);
queryData = await this.localStore.allocateQuery(query);
await this.synchronizeLocalView(targetId);
}
this.remoteStore.listen(queryData);
});
});
await p;
const activeQueries = await this.synchronizeQueryViewsAndRaiseSnapshots(
activeTargets.toArray()
);
for (const queryData of activeQueries) {
this.remoteStore.listen(queryData);
}
} else if (isPrimary === false && this.isPrimary !== false) {
this.isPrimary = false;
await this.remoteStore.disableNetwork();
objUtils.forEachNumber(this.queryViewsByTarget, targetId => {
const activeQueries = await this.synchronizeQueryViewsAndRaiseSnapshots(
objUtils.indices(this.queryViewsByTarget)
);
this.resetLimboDocuments();
for (const queryData of activeQueries) {
// TODO(multitab): Remove query views for non-local queries.
this.remoteStore.unlisten(targetId);
this.remoteStore.unlisten(queryData.targetId);
}
await this.remoteStore.applyPrimaryState(false);
}
}

// PORTING NOTE: Multi-tab only.
private resetLimboDocuments(): void {
objUtils.forEachNumber(this.limboKeysByTarget, targetId => {
this.remoteStore.unlisten(targetId);
});
this.limboKeysByTarget = [];
this.limboTargetsByKey = new SortedMap<DocumentKey, TargetId>(
DocumentKey.comparator
);
}

/**
* Reconcile the local query views with those the state from persistence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those the?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesssss. Fixed.

* Raises snapshots for any changes that affect the current client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and "Returns the set of queries for which we have local views." or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method actually returns the query data for the targets that are passed in. I updated the comment to:

  * Reconcile the query views of the provided query targets with the state from
  * persistence. Raises snapshots for any changes that affect the current
  * client and returns the updated state of all target's query data.

*/
// PORTING NOTE: Multi-tab only.
private synchronizeQueryViewsAndRaiseSnapshots(
targets: TargetId[]
): Promise<QueryData[]> {
let p = Promise.resolve();
const activeQueries: QueryData[] = [];
for (const targetId of targets) {
p = p.then(async () => {
let queryData;
const query = await this.localStore.getQueryForTarget(targetId);
if (this.queryViewsByTarget[targetId] === undefined) {
assert(
this.isPrimary,
'A secondary tab should never have an active query without an active view.'
);
// For queries that never executed on this client, we need to
// allocate the query in LocalStore and initialize a new View.
queryData = await this.localStore.allocateQuery(query);
await this.initializeViewAndComputeSnapshot(queryData, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/current=/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
// For queries that have a local View, we need to update their state
// in LocalStore (as the resume token and the snapshot version
// might have changed) and reconcile their views with the persisted
// state (the list of syncedDocuments may have gotten out of sync).
await this.localStore.releaseQuery(query, true);
queryData = await this.localStore.allocateQuery(query);
const viewChange = await this.synchronizeViewAndComputeSnapshot(
query,
queryData.targetId
);
if (viewChange.snapshot) {
this.syncEngineListener!.onWatchChange([viewChange.snapshot]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we batch up all the snapshots and call onWatchChange() only once?

In practice I'm not sure how much it matters right now, but since all of the snapshots were the result of a single event, I think we should send them to onWatchChange() as a single batch. In the future we might expose an API like:

firebase.firestore().onGlobalSnapshot(() => { /* I know that all listeners are in a consistent state. */ });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was doing this at an earlier stage of this PR, but then ultimately decided to remove it to save one line of code. I put it back for now. I am wondering though if there are ever non-current snapshots for active queries that are not globally consistent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your question, but I'm hoping it's a response to my poor job of explaining the onGlobalSnapshot() thing.

The basic idea is that if you had lots of listeners that were related (e.g. for whose turn it is in a turn-based game and for the board state), you could delay your handling of updates until an onGlobalSnapshot() callback [e.g. so you don't try to take your turn until you know it's your turn and you have the updated board state]. Said differently: if somebody does multiple writes atomically in a batch write, you could use onGlobalSnapshot() to make sure all of your listeners have seen all of the writes.

So that's the motivation for onWatchChange() accepting a list of snapshots rather than just a single one... I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it. It would allow us to know that we have received all events from the last consistent global snapshot.

}
}
activeQueries.push(queryData);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... When we become primary we're going to add queryViews for targets we don't have local listeners for.
Here, when we lose primary, we'll stop listening to those queries, but we'll keep the queryViews.

So when you transition non-primary=>primary=>non-primary, you'll end up with a bunch of extra queryViews. While perhaps not harmful, I'd prefer if this were more deterministic. So should we remove no-longer-needed queryViews here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's unfortunately no API yet that allows me to determine whether a view is "local". SharedClientState does have this information. But since this will require some plumbing, I turned this part of the cleanup into a TODO.

}
return p.then(() => activeQueries);
}

// PORTING NOTE: Multi-tab only
Expand All @@ -873,13 +920,10 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
error?: FirestoreError
): Promise<void> {
if (this.isPrimary) {
// If we receive a target state notification via Web Storage, we are
// If we receive a target state notification via WebStorage, we are
// either already secondary or another tab has taken the primary lease.
log.debug(
LOG_TAG,
'Unexpectedly received query state notification when already primary. Releasing primary lease.'
);
await this.applyPrimaryState(false);
log.debug(LOG_TAG, 'Ignoring unexpected query state notification.');
return;
}

if (this.queryViewsByTarget[targetId]) {
Expand Down Expand Up @@ -929,7 +973,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer {
const query = await this.localStore.getQueryForTarget(targetId);
assert(!!query, `Query data for active target ${targetId} not found`);
const queryData = await this.localStore.allocateQuery(query);
await this.initializeViewAndComputeInitialSnapshot(
await this.initializeViewAndComputeSnapshot(
queryData,
/*current=*/ false
);
Expand Down
17 changes: 16 additions & 1 deletion packages/firestore/src/core/view.ts
Original file line number Diff line number Diff line change
Expand Up @@ -384,8 +384,23 @@ export class View {
}

// PORTING NOTE: Multi-tab only.
synchronizeWithRemoteKeys(remoteKeys: DocumentKeySet): void {
synchronizeWithPersistedState(
localDocs: MaybeDocumentMap,
remoteKeys: DocumentKeySet,
resetLimboDocuments: boolean,
resetCurrent: boolean
): ViewChange {
if (resetLimboDocuments) {
this.limboDocuments = documentKeySet();
}

if (resetCurrent) {
this.current = false;
}

this._syncedDocuments = remoteKeys;
const docChanges = this.computeDocChanges(localDocs);
return this.applyChanges(docChanges, resetLimboDocuments);
}

/**
Expand Down
28 changes: 19 additions & 9 deletions packages/firestore/src/local/indexeddb_persistence.ts
Original file line number Diff line number Diff line change
Expand Up @@ -241,16 +241,18 @@ export class IndexedDbPersistence implements Persistence {
)
.next(() => this.canActAsPrimary(txn))
.next(canActAsPrimary => {
const wasPrimary = this.isPrimary;
this.isPrimary = canActAsPrimary;
// Always call the primary state listener, since SyncEngine may have
// changed the primary state to 'false'.
this.queue.enqueue(async () => {
// Verify that `shutdown()` hasn't been called yet by the time
// we invoke the `primaryStateListener`.
if (this.started) {
return this.primaryStateListener(this.isPrimary);
}
});

if (wasPrimary !== this.isPrimary) {
this.queue.enqueue(async () => {
// Verify that `shutdown()` hasn't been called yet by the time
// we invoke the `primaryStateListener`.
if (this.started) {
return this.primaryStateListener(this.isPrimary);
}
});
}

if (wasPrimary && !this.isPrimary) {
return this.releasePrimaryLeaseIfHeld(txn);
Expand Down Expand Up @@ -476,6 +478,8 @@ export class IndexedDbPersistence implements Persistence {
log.error(
`Failed to obtain primary lease for action '${action}'.`
);
this.isPrimary = false;
this.queue.enqueue(() => this.primaryStateListener(false));
throw new FirestoreError(
Code.FAILED_PRECONDITION,
PRIMARY_LEASE_LOST_ERROR_MSG
Expand Down Expand Up @@ -732,6 +736,12 @@ export class IndexedDbPersistence implements Persistence {
}
}

export function isPrimaryLeaseLostError(err: FirestoreError): boolean {
return (
err.code === Code.FAILED_PRECONDITION &&
err.message === PRIMARY_LEASE_LOST_ERROR_MSG
);
}
/**
* Helper to get a typed SimpleDbStore for the owner object store.
*/
Expand Down
6 changes: 3 additions & 3 deletions packages/firestore/src/local/persistence.ts
Original file line number Diff line number Diff line change
Expand Up @@ -100,9 +100,9 @@ export interface Persistence {
shutdown(deleteData?: boolean): Promise<void>;

/**
* Registers a listener that gets called immediately wih the current primary
* state and then periodically as the lease is verified (likely with the same
* state).
* Registers a listener that gets called when the primary state of the
* instance changes. Upon registering, this listener is invoked immediately
* with the current primary state.
*
* PORTING NOTE: This is only used for Web multi-tab.
*/
Expand Down
Loading