Skip to content

Implement limbo resolution listen throttling #5310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 15, 2020
Merged

Conversation

dconeybe
Copy link
Contributor

@dconeybe dconeybe commented Apr 8, 2020

This is a port of firebase/firebase-js-sdk#2790 and firebase/firebase-android-sdk#1374 to iOS.

The following description was copied from the first of those two PRs and pasted here for your convenience:

This change was motivated by the following bug report: firebase/firebase-js-sdk#2683. In summary, the issue was that when a large number of documents went into limbo (in this case 15,000 documents) then this would cause 15,000 document listens, which would result in "resource-exhausted" errors. The client would then back off for a bit but would try the listens again, which would again result in "resource-exhausted" errors. And the cycle would continue. Worst of all, the customer was being billed for all of these reads that had no beneficial effects for the clients.

In order to avoid this situation, this PR modifies the limbo resolution logic to "throttle" the limbo resolutions. It does this by allowing at most 100 concurrent limbo resolutions. Limbo resolutions in excess of this limit are queued up and not started until another limbo resolution completes. This fix should avoid the "resource-exhausted" errors and the infinite read loop.

@dconeybe dconeybe self-assigned this Apr 8, 2020
@dconeybe dconeybe marked this pull request as ready for review April 8, 2020 13:33
@dconeybe dconeybe requested a review from wu-hui April 8, 2020 13:33
@dconeybe dconeybe assigned wu-hui and unassigned dconeybe Apr 8, 2020
@@ -231,6 +240,16 @@ class SyncEngine : public remote::RemoteStoreCallback, public QueryEventSource {

void TrackLimboChange(const LimboDocumentChange& limbo_change);

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultranit: I understand this is copied from other platforms. I'd still suggest to mention max_concurrent_limbo_resolutions_ in the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

limbo_targets_by_key_[key] = limbo_target_id;
active_limbo_resolutions_by_target_.emplace(limbo_target_id,
LimboResolution{key});
active_limbo_targets_by_key_[key] = limbo_target_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there special reason why [] is used here as opposed to emplace you used in the above line?

If not, we should be consistent here and use emplace in both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that particular line of code is not new but only changed due to the renaming of the variable limbo_targets_by_key_ to active_limbo_targets_by_key_. If I understand emplace() correctly, there is no benefit to using it over operator[] because limbo_target_id is just an int and therefore does not benefit from move semantics. I was also trying to minimize the unrelated changes so if you'd still prefer that I change it to emplace() then I will do that in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I am OK with a separate PR to make it more consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, emplace on a map is constructing the std::pair representing the key-value pair, so even though the limbo_target_id is trivially copyable, the pair is not because the key is a DocumentKey.

This is something of an academic concern though: we've structured our code so that vocabulary types like DocumentKey are cheaply copyable. They're immutable values that hide a std::shared_ptr, making a copy just a reference count bump. In this case emplace vs operator[] without moving both make a copy so there's essentially no difference (emplace with a moved key does save the copy, but that doesn't work here because the key is used below).

In any case, making these consistent is still worthwhile, and feel free to just inline that into this PR. A separate PR just for that small change is more effort than this change is worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It turns out that this was the only place where [] needed to be changed to emplace.

Copy link
Contributor

@wu-hui wu-hui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just one place over the usage of [] and emplace. The other nit is optional.

@wu-hui wu-hui assigned dconeybe and unassigned wu-hui Apr 8, 2020
…EnqueuedLimboResolutions(), as requested in code review.
Copy link
Contributor Author

@dconeybe dconeybe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @wu-hui!

@@ -231,6 +240,16 @@ class SyncEngine : public remote::RemoteStoreCallback, public QueryEventSource {

void TrackLimboChange(const LimboDocumentChange& limbo_change);

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

limbo_targets_by_key_[key] = limbo_target_id;
active_limbo_resolutions_by_target_.emplace(limbo_target_id,
LimboResolution{key});
active_limbo_targets_by_key_[key] = limbo_target_id;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that particular line of code is not new but only changed due to the renaming of the variable limbo_targets_by_key_ to active_limbo_targets_by_key_. If I understand emplace() correctly, there is no benefit to using it over operator[] because limbo_target_id is just an int and therefore does not benefit from move semantics. I was also trying to minimize the unrelated changes so if you'd still prefer that I change it to emplace() then I will do that in a follow-up PR.

@dconeybe dconeybe assigned wu-hui and unassigned dconeybe Apr 8, 2020
Copy link
Contributor

@wu-hui wu-hui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wu-hui wu-hui assigned dconeybe and unassigned wu-hui Apr 8, 2020
@@ -90,7 +92,8 @@ class SyncEngine : public remote::RemoteStoreCallback, public QueryEventSource {
public:
SyncEngine(local::LocalStore* local_store,
remote::RemoteStore* remote_store,
const auth::User& initial_user);
const auth::User& initial_user,
size_t maxConcurrentLimboResolutions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snake case for variable names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed. Good catch.

@@ -1,4 +1,6 @@
# Unreleased
- [changed] Firestore now limits the number of concurrent document lookups it
will perform when resolving inconsistencies in the local cache (#5310).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really should link to the issue rather than the PR--it doesn't matter that the issue was in another repository. See below for a what a link looks like to an issue in firebase-js-sdk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed the PR number to the link to the issue firebase/firebase-js-sdk#449.

limbo_targets_by_key_[key] = limbo_target_id;
active_limbo_resolutions_by_target_.emplace(limbo_target_id,
LimboResolution{key});
active_limbo_targets_by_key_[key] = limbo_target_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, emplace on a map is constructing the std::pair representing the key-value pair, so even though the limbo_target_id is trivially copyable, the pair is not because the key is a DocumentKey.

This is something of an academic concern though: we've structured our code so that vocabulary types like DocumentKey are cheaply copyable. They're immutable values that hide a std::shared_ptr, making a copy just a reference count bump. In this case emplace vs operator[] without moving both make a copy so there's essentially no difference (emplace with a moved key does save the copy, but that doesn't work here because the key is used below).

In any case, making these consistent is still worthwhile, and feel free to just inline that into this PR. A separate PR just for that small change is more effort than this change is worth.

@dconeybe
Copy link
Contributor Author

Note that the ASAN check failure is a known issue with the build system: #5316 (comment). I will ignore that failure and go ahead and merge.

@dconeybe dconeybe merged commit cd6d0f1 into master Apr 15, 2020
@dconeybe dconeybe deleted the dconeybe/LimboThrottling branch April 15, 2020 16:01
@firebase firebase locked and limited conversation to collaborators May 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants