Performance: Don't deserialize full document Proto for Query execution #561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

schmidt-sebastian merged 10 commits into master from mrschmidt/querywithoutproto

Jun 27, 2019

Contributor

schmidt-sebastian commented Jun 21, 2019 •

edited

Loading

The PR adds a lazy-deserialization option to Documents. This allows Query filter evaluation without parsing the entire document Proto.

Speedup of a Query that returns 2 documents out of a collection with 1700 documents:

With master: 3644ms, 3591ms, 3577ms, 3616ms, 3603ms (avg 3606ms)
With PR: 3012ms, 2881ms, 2894ms, 2929ms, 2888ms (avg 2920ms)

These numbers, as always, are from my trusted Nexus 5X.


          Don't deserialize full document Proto for Query execution

48dcf7e

googlebot added the cla: yes label

google-oss-bot added the size/L label

schmidt-sebastian mentioned this pull request

Don't store document with 0 version in Spec Test firebase/firebase-js-sdk#1903

Merged

Contributor Author

schmidt-sebastian commented Jun 21, 2019

Note: This pulls in firebase/firebase-js-sdk#1903 to fix the unit tests since we now always assert that no Document proto has a zero-version, regardless of source.

schmidt-sebastian force-pushed the mrschmidt/querywithoutproto branch from c677659 to fd00f7b Compare

June 21, 2019 23:42


          Fix unit tests

8d31c26

schmidt-sebastian force-pushed the mrschmidt/querywithoutproto branch from fd00f7b to 8d31c26 Compare

June 21, 2019 23:43

schmidt-sebastian requested a review from wilhuff

June 21, 2019 23:43

schmidt-sebastian assigned wilhuff

schmidt-sebastian added 2 commits

June 21, 2019 16:58


          Fix Kotlin

9f3e2e7


          Update Spec tests

c76f65a

schmidt-sebastian changed the title ~~Don't deserialize full document Proto for Query execution~~ Performance: Don't deserialize full document Proto for Query execution

wilhuff reviewed

View reviewed changes

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated Show resolved Hide resolved

firebase-firestore/src/main/java/com/google/firebase/firestore/model/mutation/SetMutation.java Outdated Show resolved Hide resolved

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated

+                          }
+                          value = value.getMapValue().getFieldsMap().get(path.getSegment(i));
+                        }
+                        return value != null ? serializer.decodeValue(value) : null;

Contributor

wilhuff Jun 24, 2019

Our OrderBy implementation does not implement a Schwartzian transform so we'll be performing these decodes for each comparison. For a large collection this could potentially eat into the gains you're observing here. I wonder if a cache of FieldPath to FieldValue would hurt here?

Contributor Author

schmidt-sebastian Jun 25, 2019

I added a cache - but this has to be a ConcurrentHashMap since this function can be called from user land. Please do shout if that is not actually an issue :)

Contributor

wilhuff Jun 25, 2019

Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.

Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?

Contributor Author

schmidt-sebastian Jun 26, 2019

I think the impact of this caching will be hard to quantify. I thought about adding this initially, but was only won over by your comments earlier. We do re-run getField() excessively during the computation of our Views.

I changed the code to lazy-init the field value cache. I do want to change the comparator to compare on the Protos directly, at which point we can remove the cache again.

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated Show resolved Hide resolved

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated Show resolved Hide resolved

wilhuff assigned schmidt-sebastian and unassigned wilhuff

schmidt-sebastian added the api: firestore label

schmidt-sebastian added 2 commits

June 24, 2019 21:30


          Merge branch 'master' into mrschmidt/querywithoutproto

d60bb85


          Use only one Document class

4f793f0

schmidt-sebastian force-pushed the mrschmidt/querywithoutproto branch from 7c548c1 to 4f793f0 Compare

June 25, 2019 21:34

schmidt-sebastian assigned wilhuff and unassigned schmidt-sebastian

wilhuff reviewed

View reviewed changes

Contributor

wilhuff left a comment

I like this version a LOT more, but I'm concerned about the differences in equality/hashCode, especially because this ends up having an effect on external users.

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated

+                          }
+                          value = value.getMapValue().getFieldsMap().get(path.getSegment(i));
+                        }
+                        return value != null ? serializer.decodeValue(value) : null;

Contributor

wilhuff Jun 25, 2019

Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.

Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?

firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java Outdated

-                }
+                private @Nullable final com.google.firestore.v1.Document proto;
+                private @Nullable final Function<Value, FieldValue> converter;
+                private ObjectValue objectValue;

Contributor

wilhuff Jun 25, 2019

@Nullable, right? (in the case of the second constructor it stays null initially).

Contributor Author

schmidt-sebastian Jun 26, 2019

Yep. Done.

firebase-firestore/src/test/java/com/google/firebase/firestore/DocumentSnapshotTest.java Outdated

@@ @@ -45,12 +45,13 @@ public void testEquals() { @@
                   assertNotEquals(base, differentData);
                   assertNotEquals(base, fromCache);
+                  // Note: `base` and `differentData` have the same hash code since we no longer take document

Contributor

wilhuff Jun 25, 2019

This can't be right. equals and hashCode have to be consistent or all kinds of hell breaks loose when you use these in hash sets or as keys in a map. If two objects are equal for equality purposes, the contract for hashCode requires that the hashCodes be equal too.

We can't just ignore data in equality. There are a few cases to consider:

When version=SnapshotVersion.MIN the documents are just made-up, so data must play a roll
When DocumentState isn't synced, the documents are potentially dirty even if they have a version

However if we have a non-MIN version and the DocumentState is SYNCED then we could ignore the data for equality and hashing purposes. This would preserve what you're after here which is avoiding rehydrating the data in the common case of loading a value from the remote document cache.

What do you think of implementing both equality and hashCode according to the principle above? I think that would give most (if not all) the benefit without compromising the contract of equals/hashCode.

Contributor Author

schmidt-sebastian Jun 25, 2019

Hrm. This is a good comment, and I 100% agree with you, but I don't think I broke the contract here in the way that you stated. I removed data from hash code (I am still using document key and version). This might cause more hash collisions, but the invariant that two equal objects have the same hash code should remain intact.

Contributor

wilhuff Jun 27, 2019

Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.

An alternative phrasing that might be less confusing is:

Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular base and differentData have a hash collision because we don't use data in the hashCode.

You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode(). That way someone won't see that the value is unused in the future and then accidentally "fix" it.

Contributor Author

schmidt-sebastian Jun 27, 2019

Good idea about the additional comment. I added it to Document.hashCode().

wilhuff assigned schmidt-sebastian and unassigned wilhuff


          Address feedback

f559f72

schmidt-sebastian assigned wilhuff and unassigned schmidt-sebastian


          Fix Kotlin

a7de38f

wilhuff approved these changes

View reviewed changes

Contributor

wilhuff left a comment

LGTM

firebase-firestore/src/test/java/com/google/firebase/firestore/DocumentSnapshotTest.java Outdated

@@ @@ -45,12 +45,13 @@ public void testEquals() { @@
                   assertNotEquals(base, differentData);
                   assertNotEquals(base, fromCache);
+                  // Note: `base` and `differentData` have the same hash code since we no longer take document

Contributor

wilhuff Jun 27, 2019

Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.

An alternative phrasing that might be less confusing is:

Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular base and differentData have a hash collision because we don't use data in the hashCode.

You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode(). That way someone won't see that the value is unused in the future and then accidentally "fix" it.

wilhuff assigned schmidt-sebastian and unassigned wilhuff

schmidt-sebastian added 2 commits

June 27, 2019 09:11


          Merge branch 'master' into mrschmidt/querywithoutproto

22b2c25


          Comment

f50e19a

schmidt-sebastian merged commit c11a7a4 into master

schmidt-sebastian mentioned this pull request

Firestore queries performance problem #565

Closed

schmidt-sebastian deleted the mrschmidt/querywithoutproto branch

July 3, 2019 15:46

schmidt-sebastian mentioned this pull request

Don't deserialize full Document proto for Query execution firebase/firebase-js-sdk#2115

Merged

firebase locked and limited conversation to collaborators

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

api: firestore cla: yes size/L