-
Notifications
You must be signed in to change notification settings - Fork 940
[Multi-Tab] Recover from unexpected primary lease loss #984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
c40ff09
e51265b
a7c8497
f9c7872
34d52ae
692b789
3de1af4
c243b26
ee94d57
3da6855
cbd58e4
72aa4a9
7f685f9
b1c7e8d
6d3628d
794e9f7
e203796
5c56479
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,7 +33,7 @@ import { RemoteEvent, TargetChange } from '../remote/remote_event'; | |
import { RemoteStore } from '../remote/remote_store'; | ||
import { RemoteSyncer } from '../remote/remote_syncer'; | ||
import { assert, fail } from '../util/assert'; | ||
import { Code, FirestoreError } from '../util/error'; | ||
import { FirestoreError } from '../util/error'; | ||
import * as log from '../util/log'; | ||
import { AnyJs, primitiveComparator } from '../util/misc'; | ||
import { ObjectMap } from '../util/obj_map'; | ||
|
@@ -58,6 +58,7 @@ import { | |
LimboDocumentChange, | ||
RemovedLimboDocument, | ||
View, | ||
ViewChange, | ||
ViewDocumentChanges | ||
} from './view'; | ||
import { ViewSnapshot } from './view_snapshot'; | ||
|
@@ -68,6 +69,7 @@ import { | |
import { ClientId, SharedClientState } from '../local/shared_client_state'; | ||
import { SortedSet } from '../util/sorted_set'; | ||
import * as objUtils from '../util/obj'; | ||
import { isPrimaryLeaseLostError } from '../local/indexeddb_persistence'; | ||
|
||
const LOG_TAG = 'SyncEngine'; | ||
|
||
|
@@ -208,7 +210,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
queryData.targetId | ||
); | ||
targetId = queryData.targetId; | ||
viewSnapshot = await this.initializeViewAndComputeInitialSnapshot( | ||
viewSnapshot = await this.initializeViewAndComputeSnapshot( | ||
queryData, | ||
status === 'current' | ||
); | ||
|
@@ -221,7 +223,11 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
return targetId; | ||
} | ||
|
||
private initializeViewAndComputeInitialSnapshot( | ||
/** | ||
* Registers a view for a previously unknown query and computes its initial | ||
* snapshot. | ||
*/ | ||
private initializeViewAndComputeSnapshot( | ||
queryData: QueryData, | ||
current: boolean | ||
): Promise<ViewSnapshot> { | ||
|
@@ -265,18 +271,33 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
} | ||
|
||
/** | ||
* Reconcile the list of synced documents in the local views with those from | ||
* persistence. | ||
* Reconcile the list of synced documents in an existing view with those | ||
* from persistence. | ||
*/ | ||
// PORTING NOTE: Multi-tab only. | ||
private async synchronizeLocalView(targetId: TargetId): Promise<void> { | ||
return this.localStore | ||
.remoteDocumentKeys(targetId) | ||
.then(async remoteKeys => { | ||
const queryView = this.queryViewsByTarget[targetId]; | ||
assert(!!queryView, 'Expected queryView to be defined'); | ||
queryView.view.synchronizeWithRemoteKeys(remoteKeys); | ||
}); | ||
private synchronizeViewAndComputeSnapshot( | ||
query: Query, | ||
targetId: TargetId | ||
): Promise<ViewChange> { | ||
return this.localStore.executeQuery(query).then(async docs => { | ||
return this.localStore | ||
.remoteDocumentKeys(targetId) | ||
.then(async remoteKeys => { | ||
const queryView = this.queryViewsByTarget[targetId]; | ||
assert(!!queryView, 'Cannot reconcile missing view'); | ||
const viewChange = queryView.view.synchronizeWithPersistedState( | ||
docs, | ||
remoteKeys, | ||
/* resetLimboDocuments= */ this.isPrimary, | ||
/* resetCurrent= */ this.isPrimary | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm having trouble understanding what these flags do. I wonder if "reset" should be "recompute" or "update" (reset makes it sound like they'll just be cleared, which I don't think is the case)... And if we pass false does that mean current and limboDocuments will be left 100% as-is? I guess ultimately I'm wondering why we can't just unconditionally recompute these. I also wonder if we could sensibly combine these into a single argument. Since current depends on limboDocuments, it probably never makes sense to recompute current without computing limbo documents? Anyway, at minimum it will probably take some comments to appease me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I simplified this a little to unconditionally re-compute the list of limbo documents. In clients that are in-sync this is a no-op, and in clients that are out of sync, this will bring them back into sync. The only bit that I still need to provide is whether to set Notice that I also added a TODO to mark the queries as non-current in SharedClientState. That's probably a three line change but I wanted to keep this PR from exploding even further. |
||
); | ||
await this.updateTrackedLimbos( | ||
queryView.targetId, | ||
viewChange.limboChanges | ||
); | ||
return viewChange; | ||
}); | ||
}); | ||
} | ||
|
||
/** Stops listening to the query. */ | ||
|
@@ -489,7 +510,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
.releaseQuery(queryView.query, /* keepPersistedQueryData */ false) | ||
.then(() => this.removeAndCleanupQuery(queryView)) | ||
.catch(err => this.tryRecoverFromPrimaryLeaseLoss(err)); | ||
this.errorHandler!(queryView.query, err); | ||
this.syncEngineListener!.onWatchError(queryView.query, err); | ||
} | ||
} | ||
|
||
|
@@ -520,17 +541,6 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
// connection is disabled. | ||
await this.remoteStore.fillWritePipeline(); | ||
} else if (batchState === 'acknowledged' || batchState === 'rejected') { | ||
if (this.isPrimary) { | ||
// If we receive a notification of an `acknowledged` or `rejected` batch | ||
// via Web Storage, we are either already secondary or another tab has | ||
// taken the primary lease. | ||
log.debug( | ||
LOG_TAG, | ||
'Unexpectedly received mutation batch notification when already primary. Releasing primary lease.' | ||
); | ||
await this.applyPrimaryState(false); | ||
} | ||
|
||
// NOTE: Both these methods are no-ops for batches that originated from | ||
// other clients. | ||
this.sharedClientState.removeLocalPendingMutation(batchId); | ||
|
@@ -776,12 +786,11 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
private async tryRecoverFromPrimaryLeaseLoss( | ||
err: FirestoreError | ||
): Promise<void> { | ||
if (err.code === Code.FAILED_PRECONDITION) { | ||
if (isPrimaryLeaseLostError(err)) { | ||
log.debug( | ||
LOG_TAG, | ||
'Unexpectedly lost primary lease, reverting to secondary' | ||
'Unexpectedly lost primary lease, attempting to recover' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We log "attempting to recover" and then do nothing... Seems like maybe this method should now be ignorePrimaryLeaseLoss() and the log message should say "ignoring" ... alternatively, as-is will there be a long (up to 4s) delay before we officially notice we're not primary? Maybe this should schedule a lease refresh or something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I removed "attempting to recover" altogether and didn't replace it with anything. Note that the persistence layer schedules a lease refresh when it throws this error, so the delay should be fairly minimal. |
||
); | ||
return this.applyPrimaryState(false); | ||
} else { | ||
throw err; | ||
} | ||
|
@@ -815,7 +824,6 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
async applyPrimaryState(isPrimary: boolean): Promise<void> { | ||
if (isPrimary === true && this.isPrimary !== true) { | ||
this.isPrimary = true; | ||
|
||
await this.remoteStore.applyPrimaryState(true); | ||
|
||
// Secondary tabs only maintain Views for their local listeners and the | ||
|
@@ -824,41 +832,80 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
// server considers to be in the target). So when a secondary becomes | ||
// primary, we need to need to make sure that all views for all targets | ||
// match the state on disk. | ||
let p = Promise.resolve(); | ||
const activeTargets = this.sharedClientState.getAllActiveQueryTargets(); | ||
activeTargets.forEach(targetId => { | ||
p = p.then(async () => { | ||
let queryData; | ||
const query = await this.localStore.getQueryForTarget(targetId); | ||
if (this.queryViewsByTarget[targetId] === undefined) { | ||
// For queries that never executed on this client, we need to | ||
// allocate the query in LocalStore and initialize a new View. | ||
queryData = await this.localStore.allocateQuery(query); | ||
await this.initializeViewAndComputeInitialSnapshot( | ||
queryData, | ||
false | ||
); | ||
} else { | ||
// For queries that have a local View, we need to update their state | ||
// in LocalStore (as the resume token and the snapshot version | ||
// might have changed) and reconcile their views with the persisted | ||
// state (the list of syncedDocuments may have gotten out of sync). | ||
await this.localStore.releaseQuery(query, true); | ||
queryData = await this.localStore.allocateQuery(query); | ||
await this.synchronizeLocalView(targetId); | ||
} | ||
this.remoteStore.listen(queryData); | ||
}); | ||
}); | ||
await p; | ||
const activeQueries = await this.synchronizeQueryViewsAndRaiseSnapshots( | ||
activeTargets.toArray() | ||
); | ||
for (const queryData of activeQueries) { | ||
this.remoteStore.listen(queryData); | ||
} | ||
} else if (isPrimary === false && this.isPrimary !== false) { | ||
this.isPrimary = false; | ||
await this.remoteStore.disableNetwork(); | ||
objUtils.forEachNumber(this.queryViewsByTarget, targetId => { | ||
const activeQueries = await this.synchronizeQueryViewsAndRaiseSnapshots( | ||
objUtils.indices(this.queryViewsByTarget) | ||
); | ||
this.resetLimboDocuments(); | ||
for (const queryData of activeQueries) { | ||
// TODO(multitab): Remove query views for non-local queries. | ||
this.remoteStore.unlisten(targetId); | ||
this.remoteStore.unlisten(queryData.targetId); | ||
} | ||
await this.remoteStore.applyPrimaryState(false); | ||
} | ||
} | ||
|
||
// PORTING NOTE: Multi-tab only. | ||
private resetLimboDocuments(): void { | ||
objUtils.forEachNumber(this.limboKeysByTarget, targetId => { | ||
this.remoteStore.unlisten(targetId); | ||
}); | ||
this.limboKeysByTarget = []; | ||
this.limboTargetsByKey = new SortedMap<DocumentKey, TargetId>( | ||
DocumentKey.comparator | ||
); | ||
} | ||
|
||
/** | ||
* Reconcile the local query views with those the state from persistence. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. those the? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yesssss. Fixed. |
||
* Raises snapshots for any changes that affect the current client. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and "Returns the set of queries for which we have local views." or something? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method actually returns the query data for the targets that are passed in. I updated the comment to:
|
||
*/ | ||
// PORTING NOTE: Multi-tab only. | ||
private synchronizeQueryViewsAndRaiseSnapshots( | ||
targets: TargetId[] | ||
): Promise<QueryData[]> { | ||
let p = Promise.resolve(); | ||
const activeQueries: QueryData[] = []; | ||
for (const targetId of targets) { | ||
p = p.then(async () => { | ||
let queryData; | ||
const query = await this.localStore.getQueryForTarget(targetId); | ||
if (this.queryViewsByTarget[targetId] === undefined) { | ||
assert( | ||
this.isPrimary, | ||
'A secondary tab should never have an active query without an active view.' | ||
); | ||
// For queries that never executed on this client, we need to | ||
// allocate the query in LocalStore and initialize a new View. | ||
queryData = await this.localStore.allocateQuery(query); | ||
await this.initializeViewAndComputeSnapshot(queryData, false); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. /current=/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
} else { | ||
// For queries that have a local View, we need to update their state | ||
// in LocalStore (as the resume token and the snapshot version | ||
// might have changed) and reconcile their views with the persisted | ||
// state (the list of syncedDocuments may have gotten out of sync). | ||
await this.localStore.releaseQuery(query, true); | ||
queryData = await this.localStore.allocateQuery(query); | ||
const viewChange = await this.synchronizeViewAndComputeSnapshot( | ||
query, | ||
queryData.targetId | ||
); | ||
if (viewChange.snapshot) { | ||
this.syncEngineListener!.onWatchChange([viewChange.snapshot]); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we batch up all the snapshots and call onWatchChange() only once? In practice I'm not sure how much it matters right now, but since all of the snapshots were the result of a single event, I think we should send them to onWatchChange() as a single batch. In the future we might expose an API like:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was doing this at an earlier stage of this PR, but then ultimately decided to remove it to save one line of code. I put it back for now. I am wondering though if there are ever non-current snapshots for active queries that are not globally consistent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I understand your question, but I'm hoping it's a response to my poor job of explaining the onGlobalSnapshot() thing. The basic idea is that if you had lots of listeners that were related (e.g. for whose turn it is in a turn-based game and for the board state), you could delay your handling of updates until an onGlobalSnapshot() callback [e.g. so you don't try to take your turn until you know it's your turn and you have the updated board state]. Said differently: if somebody does multiple writes atomically in a batch write, you could use onGlobalSnapshot() to make sure all of your listeners have seen all of the writes. So that's the motivation for onWatchChange() accepting a list of snapshots rather than just a single one... I think. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, got it. It would allow us to know that we have received all events from the last consistent global snapshot. |
||
} | ||
} | ||
activeQueries.push(queryData); | ||
}); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hrm... When we become primary we're going to add queryViews for targets we don't have local listeners for. So when you transition non-primary=>primary=>non-primary, you'll end up with a bunch of extra queryViews. While perhaps not harmful, I'd prefer if this were more deterministic. So should we remove no-longer-needed queryViews here too? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's unfortunately no API yet that allows me to determine whether a view is "local". SharedClientState does have this information. But since this will require some plumbing, I turned this part of the cleanup into a TODO. |
||
} | ||
return p.then(() => activeQueries); | ||
} | ||
|
||
// PORTING NOTE: Multi-tab only | ||
|
@@ -873,13 +920,10 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
error?: FirestoreError | ||
): Promise<void> { | ||
if (this.isPrimary) { | ||
// If we receive a target state notification via Web Storage, we are | ||
// If we receive a target state notification via WebStorage, we are | ||
// either already secondary or another tab has taken the primary lease. | ||
log.debug( | ||
LOG_TAG, | ||
'Unexpectedly received query state notification when already primary. Releasing primary lease.' | ||
); | ||
await this.applyPrimaryState(false); | ||
log.debug(LOG_TAG, 'Ignoring unexpected query state notification.'); | ||
return; | ||
} | ||
|
||
if (this.queryViewsByTarget[targetId]) { | ||
|
@@ -929,7 +973,7 @@ export class SyncEngine implements RemoteSyncer, SharedClientStateSyncer { | |
const query = await this.localStore.getQueryForTarget(targetId); | ||
assert(!!query, `Query data for active target ${targetId} not found`); | ||
const queryData = await this.localStore.allocateQuery(query); | ||
await this.initializeViewAndComputeInitialSnapshot( | ||
await this.initializeViewAndComputeSnapshot( | ||
queryData, | ||
/*current=*/ false | ||
); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think it'd be pretty straightforward to just pass in the queryView (as the only argument since it contains the query, targetId, and view) and then we don't need this assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This also helped me realize that I am doing one extra IndexedDb lookup for the query when I already have the QueryView (in
synchronizeQueryViewsAndRaiseSnapshots
).