Skip to content

Add Index-Free query engine #697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

schmidt-sebastian
Copy link
Contributor

This is the last PR with index-free logic. It, however, doesn't turn it on yet (and is still against the indexfree-master branch).

There are a fair bit of comments in the code to explain how it all works. The only other thing worth pointing out is that I updated the LocalStoreTests to run parameterized with different query engine, which changed some of the plumbing during the test setup.

I verified that the Spec tests pass with the IndexFreeQueryEngine, but also left those running against the SimpleQueryEngine to match the code in production.

@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/indexfree-query-engine branch from 36891f7 to db9a9a7 Compare August 9, 2019 22:25
@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/indexfree-query-engine branch from 98a562f to d65656b Compare August 9, 2019 22:29
@schmidt-sebastian
Copy link
Contributor Author

PTAL :)

* Returns true if this query does not specify any query constraints that could remove results.
*/
public boolean matchesAllDocuments() {
return filters.isEmpty() && limit == NO_LIMIT && startAt == null && endAt == null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order bys also remove documents because they implicitly restrict the result set to those documents containing the field on which to order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my often suggested workaround for "not null" queries. Good catch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix isn't exactly right because users can explicitly orderBy("__name__") which is the one case where we don't lose any entries. I think it's:

FieldPath firstField = getFirstOrderByField();
firstField == null || (firstField.isKeyField() && getExplicitOrderBy().size() == 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't feel like there was much to gain from special-casing this, but since this is now a generalized API in Query itself, it probably makes sense to address this case.

// Limit queries are not eligible for index-free query execution if any part of the result was
// modified after we received the last query snapshot. This makes sure that we re-populate the
// view with older documents that may sort before the modified document.
if (query.hasLimit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tragic if there's nothing we can do about it. Users use limits all the time and refine views by applying additional filters. If two different queries run against the same data each will invalidate the other's ability to run index free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably make this less bad if we change this line

|| doc.getVersion().compareTo(existingDoc.getVersion()) >= 0) {
to not update documents that haven't changed (provided they have no local commits).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems pretty reasonable.

public void testUsesTargetMappingToExecuteQueries() {
// This test verifies that once a target mapping has been written, only documents that match
// the query are read from the RemoteDocumentCache.
assumeTrue(queryEngine instanceof IndexFreeQueryEngine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: none of the other platforms have an equivalent to assumeTrue. Is this really better than just if/return?

It seems like !garbageCollectorIsEager() is a similar kind of assumption. Handle them the same way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaScript allows us to conditionally invoke it.skip() (https://github.com/firebase/firebase-js-sdk/blob/80d0846598d5a155390701c3b9a93ac160ee12b0/packages/firestore/test/integration/api/database.test.ts#L126). I prefer using the test runners this way because it doesn't give us the illusion that we ran a large number of tests when in reality we didn't

I know that we handle garbageCollectorIsEager differently, but I would much prefer to also use assume for this. Either way, I don't feel strongly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference either but it seems like we should at least handle query engine and GC the same way. assumeFalse(garbageCollectorIsEager) or assumeTrue(garbageCollectorNonEager())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only a few tests that skip test execution completely based on garbageCollectorIsEager, so I updated them to use Assume.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Aug 15, 2019
Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed most of the feedback, and left some questions unanswered for follow-up.

* Returns true if this query does not specify any query constraints that could remove results.
*/
public boolean matchesAllDocuments() {
return filters.isEmpty() && limit == NO_LIMIT && startAt == null && endAt == null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my often suggested workaround for "not null" queries. Good catch.

…e/firebase-android-sdk into mrschmidt/indexfree-query-engine
@schmidt-sebastian schmidt-sebastian removed their assignment Aug 16, 2019
* Returns true if this query does not specify any query constraints that could remove results.
*/
public boolean matchesAllDocuments() {
return filters.isEmpty() && limit == NO_LIMIT && startAt == null && endAt == null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix isn't exactly right because users can explicitly orderBy("__name__") which is the one case where we don't lose any entries. I think it's:

FieldPath firstField = getFirstOrderByField();
firstField == null || (firstField.isKeyField() && getExplicitOrderBy().size() == 1);


/** Add a document to local cache but not the remote key mapping. */
private void addDocumentToLocalResult(Document doc) {
remoteDocumentCache.add(doc, doc.getVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in a transaction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for consistency. Note that I wrapped addDocumentToRemoteResult in a transaction since the LRU accounting throws a NPE without it.

addDocumentToRemoteResult(MATCHING_DOC_A);
addDocumentToRemoteResult(MATCHING_DOC_B);

DocumentSet docs = runQuery(query, queryData, /* expectIndexFree= */ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're already spelling out /* expectIndexFree= */ at each call site and just assigning to a member here you might consider doing this separately, like so:

IndexFreeQueryEngineTest expectIndexFree() {
  expectIndexFreeExecution = true;
  return this;
}

Then your callers can be

DocumentSet docs = expectIndexFree().runQuery(query, runData);

This is a totally optional minor thing, but eliminates passing a boolean literal, which always seems suspect in my book.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to use Lambdas. It now looks like

expectIndexFreeQuery(() -> runQuery(query, queryData));

or

expectFullCollectionQuery(() -> runQuery(query, queryData));

QueryData originalQueryData = queryData(query, /* hasLimboFreeSnapshot= */ true);

addDocumentToRemoteResult(MATCHING_DOC_A);
addDocumentToRemoteResult(NON_MATCHING_DOC_B);
Copy link
Contributor

@wilhuff wilhuff Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my earlier question: I don't understand how this happens. Is this shorthand for the query result initially matching B, but then an update changed B to not match?

Also, all versions of document B are at version 2 and changing UPDATED_MATCHING_DOC_B doesn't change that. I guess this happens to work because queryData forces the initial query to version 1, but that means that we never find B except because it's been updated.

Is this the premise of the test? If so it'd be worth spelling this out in comments.

Also: "remote" and "local" don't really capture the difference between these operations that well since both cases are updating the remote document cache. One case is setting which documents match the query and the other is performing an out-of-band update I'd suggest renaming these initiallyMatched (or matchesQuery or addToQueryResult) vs otherQueryUpdates (or updateSeparately, updateByOtherQuery, updateOutOfBand, or something similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test case, I can just use two matching documents in the beginning and overwrite one to be non-matching. There is a similar test case where that does quite achieve the same test coverage, I added a comment here (doesNotUseInitialResultsForLimitQueryWithDocumentRemoval()).

Furthermore, I split the addDocumentToRemoteResult() into two calls. We now have addDocument() and persistQueryMapping() which should hopefully make it much clearer what is going on.

public void testUsesTargetMappingToExecuteQueries() {
// This test verifies that once a target mapping has been written, only documents that match
// the query are read from the RemoteDocumentCache.
assumeTrue(queryEngine instanceof IndexFreeQueryEngine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference either but it seems like we should at least handle query engine and GC the same way. assumeFalse(garbageCollectorIsEager) or assumeTrue(garbageCollectorNonEager())?

// Limit queries are not eligible for index-free query execution if any part of the result was
// modified after we received the last query snapshot. This makes sure that we re-populate the
// view with older documents that may sort before the modified document.
if (query.hasLimit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems pretty reasonable.

@schmidt-sebastian
Copy link
Contributor Author

/test check-changed

1 similar comment
@schmidt-sebastian
Copy link
Contributor Author

/test check-changed

@google-oss-bot
Copy link
Contributor

google-oss-bot commented Aug 18, 2019

@schmidt-sebastian: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
device-check-changed 10f0c8e link /test device-check-changed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Aug 21, 2019
@schmidt-sebastian
Copy link
Contributor Author

/retest device-check-changed

@schmidt-sebastian
Copy link
Contributor Author

Test failure is for a RTDB test (persistedServerValuesAreResolved).

@schmidt-sebastian schmidt-sebastian merged commit 715f3f0 into mrschmidt/indexfree-master Aug 21, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/indexfree-query-engine branch August 27, 2019 23:45
@firebase firebase locked and limited conversation to collaborators Oct 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants