-
Notifications
You must be signed in to change notification settings - Fork 616
Performance: Don't deserialize full document Proto for Query execution #561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note: This pulls in firebase/firebase-js-sdk#1903 to fix the unit tests since we now always assert that no Document proto has a zero-version, regardless of source. |
c677659
to
fd00f7b
Compare
fd00f7b
to
8d31c26
Compare
firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/model/mutation/SetMutation.java
Outdated
Show resolved
Hide resolved
} | ||
value = value.getMapValue().getFieldsMap().get(path.getSegment(i)); | ||
} | ||
return value != null ? serializer.decodeValue(value) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our OrderBy
implementation does not implement a Schwartzian transform so we'll be performing these decodes for each comparison. For a large collection this could potentially eat into the gains you're observing here. I wonder if a cache of FieldPath
to FieldValue
would hurt here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a cache - but this has to be a ConcurrentHashMap since this function can be called from user land. Please do shout if that is not actually an issue :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.
Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the impact of this caching will be hard to quantify. I thought about adding this initially, but was only won over by your comments earlier. We do re-run getField() excessively during the computation of our Views.
I changed the code to lazy-init the field value cache. I do want to change the comparator to compare on the Protos directly, at which point we can remove the cache again.
firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java
Outdated
Show resolved
Hide resolved
firebase-firestore/src/main/java/com/google/firebase/firestore/model/Document.java
Outdated
Show resolved
Hide resolved
7c548c1
to
4f793f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this version a LOT more, but I'm concerned about the differences in equality/hashCode, especially because this ends up having an effect on external users.
} | ||
value = value.getMapValue().getFieldsMap().get(path.getSegment(i)); | ||
} | ||
return value != null ? serializer.decodeValue(value) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.
Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?
} | ||
private @Nullable final com.google.firestore.v1.Document proto; | ||
private @Nullable final Function<Value, FieldValue> converter; | ||
private ObjectValue objectValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nullable
, right? (in the case of the second constructor it stays null initially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Done.
@@ -45,12 +45,13 @@ public void testEquals() { | |||
assertNotEquals(base, differentData); | |||
assertNotEquals(base, fromCache); | |||
|
|||
// Note: `base` and `differentData` have the same hash code since we no longer take document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be right. equals
and hashCode
have to be consistent or all kinds of hell breaks loose when you use these in hash sets or as keys in a map. If two objects are equal for equality purposes, the contract for hashCode requires that the hashCodes be equal too.
We can't just ignore data
in equality. There are a few cases to consider:
- When version=
SnapshotVersion.MIN
the documents are just made-up, so data must play a roll - When
DocumentState
isn't synced, the documents are potentially dirty even if they have a version
However if we have a non-MIN
version and the DocumentState
is SYNCED
then we could ignore the data for equality and hashing purposes. This would preserve what you're after here which is avoiding rehydrating the data in the common case of loading a value from the remote document cache.
What do you think of implementing both equality and hashCode according to the principle above? I think that would give most (if not all) the benefit without compromising the contract of equals/hashCode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. This is a good comment, and I 100% agree with you, but I don't think I broke the contract here in the way that you stated. I removed data
from hash code (I am still using document key and version). This might cause more hash collisions, but the invariant that two equal objects have the same hash code should remain intact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.
An alternative phrasing that might be less confusing is:
Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular
base
anddifferentData
have a hash collision because we don't use data in the hashCode.
You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode()
. That way someone won't see that the value is unused in the future and then accidentally "fix" it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea about the additional comment. I added it to Document.hashCode()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -45,12 +45,13 @@ public void testEquals() { | |||
assertNotEquals(base, differentData); | |||
assertNotEquals(base, fromCache); | |||
|
|||
// Note: `base` and `differentData` have the same hash code since we no longer take document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.
An alternative phrasing that might be less confusing is:
Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular
base
anddifferentData
have a hash collision because we don't use data in the hashCode.
You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode()
. That way someone won't see that the value is unused in the future and then accidentally "fix" it.
The PR adds a lazy-deserialization option to Documents. This allows Query filter evaluation without parsing the entire document Proto.
Speedup of a Query that returns 2 documents out of a collection with 1700 documents:
These numbers, as always, are from my trusted Nexus 5X.