Skip to content

Don't deserialize full Document proto for Query execution #2115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 29, 2019

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Aug 26, 2019

The PR adds a lazy-deserialization option to Documents. This allows Query filter evaluation without parsing the entire document Proto.

This is a port of firebase/firebase-android-sdk#561 with two additions:

  • Document.field() also caches cache misses.
  • Document.data is now Document.data() since we perform an expensive conversion.

Stats from Chrome:

  • 10 documents out of a 1000, 10 iterations: Average with PR 317ms, without 356ms.
  • 980 documents out of a 1000, 10 iterations: Average with PR 376ms, without 472ms.

@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/dontdeserialize branch from 531f1c1 to 8a276f2 Compare August 27, 2019 03:08
Copy link
Contributor

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do some simple benchmarks?

  • similar to the Android PR, one where only a few documents are returned from a large collection;
  • one where all or almost all documents are returned from a large collection (to make sure there's no performance regression).

if (protoValue.mapValue) {
protoValue = protoValue.mapValue.fields[path.get(i)];
} else {
protoValue = undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loop would be clearer if it used break:

for (let i = 1; i < path.length; ++i) {
  if (!protoValue || !protoValue.mapValue) {
    break;
  }
  protoValue = protoValue.mapValue.fields[path.get(i)];
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you don't like my hidden second loop condition. I moved this to a helper function, which allows me to return undefined instead of breaking (which is important since we don't want to use protoValue if it is not of map type).

@@ -718,15 +723,13 @@ export class JsonProtoSerializer {
const entityChange = change.documentChange!;
const key = this.fromName(entityChange.document!.name!);
const version = this.fromVersion(entityChange.document!.updateTime!);
const fields = this.fromFields(entityChange.document!.fields || {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so this removes an existing optimization which, IIUC, is for an unrelated use case. What's the context here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization this comment eludes to is passing in both fields and entityChange.document, specifically the fact that we pass in the Proto since we need it again later. The change in this PR is about trying to always pass the Proto, and constructing fields only as we need it.

* serialization). Might be undefined.
*/
readonly proto?: api.Document
private objectValue?: ObjectValue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever valid to pass both the objectValue and the proto/converter pair? If not, would it be worthwhile to assert that either only objectValue or else both proto and converter are given? (perhaps that could supersede the assertions inside field and data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It theoretically allows you to memoize both the proto and the objectValue, which helps when we write an object back to persistence. I changed the asserts in this PR to assert in the constructor that proto and converter are set if objectValue is not defined.

let fieldValue = this.fieldValueCache.get(canonicalPath);

if (fieldValue === undefined) {
// Instead of deserializing the full Document proto, we only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is pretty long now. Perhaps split this part into a helper function?

if (fieldValue === undefined) {
  return parseFieldValue(path, canonicalPath);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the code that extract the Proto value from the protobuf into a separate function, which also helps with the control flow of the loop.

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stats from Chrome:

10 documents out of a 1000, 10 iterations: Average with PR 317ms, without 356ms.
980 documents out of a 1000, 10 iterations: Average with PR 376ms, without 472ms.

if (protoValue.mapValue) {
protoValue = protoValue.mapValue.fields[path.get(i)];
} else {
protoValue = undefined;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you don't like my hidden second loop condition. I moved this to a helper function, which allows me to return undefined instead of breaking (which is important since we don't want to use protoValue if it is not of map type).

* serialization). Might be undefined.
*/
readonly proto?: api.Document
private objectValue?: ObjectValue,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It theoretically allows you to memoize both the proto and the objectValue, which helps when we write an object back to persistence. I changed the asserts in this PR to assert in the constructor that proto and converter are set if objectValue is not defined.

let fieldValue = this.fieldValueCache.get(canonicalPath);

if (fieldValue === undefined) {
// Instead of deserializing the full Document proto, we only
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the code that extract the Proto value from the protobuf into a separate function, which also helps with the control flow of the loop.

@@ -718,15 +723,13 @@ export class JsonProtoSerializer {
const entityChange = change.documentChange!;
const key = this.fromName(entityChange.document!.name!);
const version = this.fromVersion(entityChange.document!.updateTime!);
const fields = this.fromFields(entityChange.document!.fields || {});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization this comment eludes to is passing in both fields and entityChange.document, specifically the fact that we pass in the Proto since we need it again later. The change in this PR is about trying to always pass the Proto, and constructing fields only as we need it.

}

return fieldValue!;
}
}

fieldValue(path: FieldPath): unknown {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this since it was basically unused. I didn't want to many small function here.

Copy link
Contributor

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running the numbers! Can you add them to this PR's description?

@@ -180,6 +160,27 @@ export class Document extends MaybeDocument {
return this.hasLocalMutations || this.hasCommittedMutations;
}

/**
* Returns a the nested Protobuf value for 'path`. Can only be called if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/a the/the/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this.proto !== undefined,
"Can only call getProtoField() when proto is defined"
);
let protoValue: api.Value | undefined = this.proto!.fields[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add an empty line after assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return undefined;
}
protoValue = protoValue.mapValue.fields[path.get(i)];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add an empty line right after the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@schmidt-sebastian schmidt-sebastian merged commit c7e5012 into master Aug 29, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/dontdeserialize branch August 29, 2019 04:02
schmidt-sebastian added a commit that referenced this pull request Sep 11, 2019
This broke when data became data() in #2115
schmidt-sebastian added a commit that referenced this pull request Sep 12, 2019
This broke when data became data() in #2115
@firebase firebase locked and limited conversation to collaborators Oct 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants