Port performance optimizations to speed up reading large collections from Android #1433

var-const · 2018-12-18T02:39:33Z

Straightforward port of firebase/firebase-android-sdk#123.

var-const · 2018-12-18T02:40:46Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

+      documentKeys.first()!.path.toArray(),
+      documentKeys.last()!.path.toArray()
+    );
+    let key = documentKeys.first();


This is built upon the idea that control.skip() makes the algorithm more efficient. Will be happy with any feedback.

Makes sense to me. I might name it nextKey though?
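For readers following along, here is a minimal sketch of the skip idea, assuming an in-memory sorted array in place of the IndexedDb store. `StoreEntry`, `lookupSorted`, and `lowerBound` are hypothetical names for illustration only; the PR itself uses `iterate()` with `control.skip()`, which this does not reproduce.

```typescript
// Hypothetical in-memory stand-in for a key-sorted object store.
type StoreEntry = { key: string; value: string };

// Returns the index of the first entry at or after `from` whose key is >= `key`.
function lowerBound(store: StoreEntry[], key: string, from: number): number {
  let lo = from;
  let hi = store.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (store[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Looks up sorted, unique keys in a sorted store, jumping straight to each
// requested key instead of visiting every entry in between.
function lookupSorted(
  store: StoreEntry[],
  wanted: string[]
): { results: Map<string, string | null>; visited: number } {
  const results = new Map<string, string | null>();
  let visited = 0;
  let cursor = 0;
  for (const nextKey of wanted) {
    // The "skip": move the cursor to the first entry with key >= nextKey.
    cursor = lowerBound(store, nextKey, cursor);
    if (cursor < store.length) {
      visited++;
    }
    const hit = cursor < store.length && store[cursor].key === nextKey;
    results.set(nextKey, hit ? store[cursor].value : null);
  }
  return { results, visited };
}
```

Because both the store and the requested keys are sorted, the cursor only ever moves forward, so the number of entries inspected is bounded by the number of requested keys rather than by the size of the key range.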

var-const · 2018-12-18T02:52:10Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

+   *     found, the key will be mapped to null) and a map of sizes indexed by
+   *     key (zero if the key cannot be found).
+   */
+  getSizedEntries(


Another approach would be for getSizedEntries to return a SortedMap<DocumentKey, DocumentSizeEntry>. I decided in favor of returning two maps because it makes it easier to avoid code duplication between getEntries and getSizedEntries.

Consider taking as an argument a function that processes each key into the type of the result, for example fn: (key: DocumentKey, doc: DbRemoteDoc | null) => T. Then, your return type can be PersistencePromise<SortedMap<DocumentKey, T>>. You can avoid code duplication and avoid doing extra work for sizes that way.

I tried it out, but I'm not sure I prefer it. The problem is that getEntries in RemoteDocumentChangeBuffer won't be able to return a map of documents directly (due to type difference) and instead would have to build a new map. If extra work for calculating/storing sizes is a concern, it's easy (though ugly) to solve with a flag (or perhaps, more similar to this approach, by having a fn that either updates the sizeMap or is a no-op).

Something like that makes sense, but since it deals with DbRemoteDoc, I'd keep it internal and have getEntries() and getSizedEntries() functions that wrap it.

I think what Greg is recommending is basically a mapDbEntries() function, but it might be a little simpler to instead have it be a forEachDbEntry(transaction, documentKeys, callback) function that iterates the matching documents and just calls the callback with each raw dbRemoteDocument. That may mean a little bit of redundant code for getEntries() and getSizedEntries() to build up their respective maps, but it seems simpler to me (and perhaps more generically useful, if we had a case where we don't necessarily want to build up a map).

Done, please take a look.
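A rough sketch of the callback-based shape discussed in this thread, assuming a plain `Map` in place of the IndexedDb store and strings in place of documents. The `DbRemoteDoc` shape, the string keys, and the size calculation are illustrative assumptions, not the code that landed.

```typescript
// Hypothetical stand-in for the stored document shape; not the real DbRemoteDocument.
type DbRemoteDoc = { body: string };

// Iterates the requested keys and hands each raw entry (or null) to the callback.
function forEachDbEntry(
  store: ReadonlyMap<string, DbRemoteDoc>,
  keys: readonly string[],
  callback: (key: string, doc: DbRemoteDoc | null) => void
): void {
  for (const key of keys) {
    const doc = store.get(key);
    callback(key, doc === undefined ? null : doc);
  }
}

// Builds only the document map; no size bookkeeping is done here.
function getEntries(
  store: ReadonlyMap<string, DbRemoteDoc>,
  keys: readonly string[]
): Map<string, string | null> {
  const results = new Map<string, string | null>();
  forEachDbEntry(store, keys, (key, doc) => {
    results.set(key, doc === null ? null : doc.body);
  });
  return results;
}

// Builds the document map and a parallel size map (zero for missing keys).
function getSizedEntries(
  store: ReadonlyMap<string, DbRemoteDoc>,
  keys: readonly string[]
): { docs: Map<string, string | null>; sizes: Map<string, number> } {
  const docs = new Map<string, string | null>();
  const sizes = new Map<string, number>();
  forEachDbEntry(store, keys, (key, doc) => {
    docs.set(key, doc === null ? null : doc.body);
    sizes.set(key, doc === null ? 0 : doc.body.length);
  });
  return { docs, sizes };
}
```

The two wrappers repeat a line or two of map building, which is the small redundancy mentioned above, but `getEntries` no longer pays for sizes it does not need.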

var-const · 2018-12-18T02:52:46Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

+              key!,
+              this.serializer.fromDbRemoteDocument(dbRemoteDoc)
+            );
+            sizeMap = sizeMap.insert(key!, dbDocumentSize(dbRemoteDoc));


Whether to update the sizeMap could be controlled by a flag to avoid doing unnecessary work when this function is invoked via getEntries, but it might be overkill.

var-const · 2018-12-18T02:56:24Z

packages/firestore/src/local/remote_document_change_buffer.ts

+  ): PersistencePromise<NullableMaybeDocumentMap> {
+    const changes = this.assertChanges();
+
+    // Record the size of everything we load from the cache so we can compute a delta later.


Question: I could add code to look up the keys in the buffer. I decided to postpone it because it will necessitate some logic to merge the buffer lookup results with results from cache, so I'd like to check beforehand if you think this optimization is worthwhile.

var-const · 2018-12-18T02:57:58Z

Michael, I'll port the tests and measure performance numbers tomorrow, but I think it should be in good enough shape for review. Thanks!

gsoltis · 2018-12-18T17:20:18Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

+    return remoteDocumentsStore(transaction)
+      .iterate({ range }, (potentialKeyRaw, dbRemoteDoc, control) => {
+        const potentialKey = DocumentKey.fromSegments(potentialKeyRaw);
+        while (DocumentKey.comparator(key!, potentialKey) != 1) {


I would use < 1 instead. Even though this comparison function might only return {-1, 0, 1}, some comparison functions return { < 0, 0, > 0 }, and a != 1 check will break here for any value greater than 1.

Done, thanks.

+1 to this except I'd use <= 0 😁 (technically the comparator could return 0.5 or something).
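To make the difference between the three checks concrete, here is a small sketch with a hypothetical comparator (`compareByLength`, not from the codebase) that returns a difference rather than exactly -1, 0, or 1.

```typescript
// Hypothetical comparator that returns a difference instead of exactly -1, 0, or 1.
function compareByLength(a: string, b: string): number {
  return a.length - b.length;
}

// "left is not greater than right", written three ways.
const notGreaterFragile = (cmp: number): boolean => cmp !== 1; // wrong for 2, 3, ...
const notGreaterBetter = (cmp: number): boolean => cmp < 1; // wrong for 0.5
const notGreaterRobust = (cmp: number): boolean => cmp <= 0; // matches the comparator contract

// "abcd" is longer than "a", so the comparator returns 3 (greater than).
const cmpResult = compareByLength("abcd", "a");
```

With `cmpResult` equal to 3, only the fragile check still claims "not greater"; a fractional result such as 0.5 would also defeat the `< 1` form.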

gsoltis · 2018-12-18T17:32:36Z

packages/firestore/src/local/local_documents_view.ts

+    batches: MutationBatch[]
+  ): PersistencePromise<NullableMaybeDocumentMap> {
+    let results = nullableMaybeDocumentMap();
+    return new PersistencePromise<NullableMaybeDocumentMap>(resolve => {


I don't think you have to do this in a closure. Consider:

let results = nullableMaybeDocumentMap();
docs.forEach((key, localView) => {
  for (const batch of batches) {
    localView = batch.applyToLocalView(key, localView);
  }
  results = results.insert(key, localView);
});
return PersistencePromise.resolve(results);

Hmm, I should have mentioned this -- this function doesn't have to be async at all. I wrapped the computation in a promise because I presumed it's time-consuming enough to warrant this. Do you think it should just return results directly?

Going further, I don't think this function needs to take a transaction or return a PersistencePromise. It can just be synchronous.

Note that in some cases we do have functions that are "needlessly" asynchronous and return a PersistencePromise when they could be synchronous. But we typically do this for public functions where we want to reserve the ability to make them asynchronous in the future, and so we want the consuming component to deal with them as an asynchronous function.

But since this is a private function, I wouldn't worry about future-proofing.

@var-const In general, JavaScript is completely single-threaded (there's no way to block other than by spinning the CPU) and so wrapping a computation in a promise doesn't really help (in particular it won't enable any parallelism).

And PersistencePromise is extra weird: it's a specially designed Promise-like construct that tries to be as synchronous as possible. IndexedDb has weird semantics where, in the completion for one operation, you must synchronously start the next operation or else your transaction will auto-close. So even using PersistencePromise, this code is actually 100% synchronous, and you can go ahead and just yank PersistencePromise out.

> JavaScript is completely single-threaded

Right. Thanks, made this function return the result directly.
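A small illustration of the point above, using the standard `Promise` rather than `PersistencePromise` (whose internals are not shown here). The `order` array and `compute` function are made up for the example: the executor passed to `new Promise` runs synchronously, so wrapping a computation does not move it off the current call stack.

```typescript
const order: string[] = [];

// Stand-in for a CPU-bound computation.
function compute(): number {
  order.push("compute");
  return 42;
}

order.push("before");
// The executor runs immediately, inside the constructor call.
const wrapped = new Promise<number>(resolve => resolve(compute()));
order.push("after");

// Only the continuation is deferred; it runs after the current script finishes.
wrapped.then(value => {
  order.push("then:" + value);
});

// At this point order is ["before", "compute", "after"].
```

The work happens before `"after"` is recorded, so the caller was blocked for the full duration of `compute()` either way.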

gsoltis · 2018-12-18T17:36:56Z

packages/firestore/src/local/local_documents_view.ts

+    transaction: PersistenceTransaction,
+    baseDocs: NullableMaybeDocumentMap
+  ): PersistencePromise<MaybeDocumentMap> {
+    let allKeys = documentKeySet();


If you don't want to construct a new set just to pass the keys, you could change the signature for getAllMutationBatchesAffectingDocumentKeys to accept a map of DocumentKey to {}. That indicates that you don't care about the values in the map, as you won't have their type, but you will still be able to iterate through the keys. Alternatively, I would consider defining a .keySet() method for SortedMap, since iterating through and simultaneously building up a new immutable set is not very efficient.

Done (I had to use any, though, compiler complained about MaybeDocument | null not being assignable to {}).

Can you try using AnyJs instead of any?

any is dangerous in that it basically opts out of typechecking. So if you type the values as any, TypeScript will let you do anything with the values (call nonexistent methods, etc.) which is the opposite of what we want (we don't want to be able to do anything with the values).

AnyJs is a type we introduced essentially meant to be a supertype of any JS type (similar to Object in Java). So TypeScript won't let you do anything with it without doing typeof / instanceof checks to narrow it down.

TypeScript 3.x actually introduced unknown which is similar to AnyJs and I'm hoping we can migrate to it soon.

Sorry, missed this comment. Done now.
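A sketch of the distinction, using the built-in `unknown` type mentioned above in place of `AnyJs` (whose definition is not shown in this thread). `collectKeys` and `summarize` are hypothetical helpers for illustration.

```typescript
// Accepts a map with any value type; the values cannot be used without narrowing.
function collectKeys(map: ReadonlyMap<string, unknown>): string[] {
  const keys: string[] = [];
  map.forEach((_value, key) => {
    keys.push(key);
  });
  return keys;
}

// With `unknown`, each use of the value must be preceded by a type check.
function summarize(value: unknown): string {
  if (typeof value === "string") {
    return value.toUpperCase(); // allowed: narrowed to string
  }
  if (value === null) {
    return "null";
  }
  // value.toUpperCase() here would be a compile error; with `any` it would
  // compile and fail at runtime instead.
  return "other";
}
```

A `Map<string, number | null>` can be passed to `collectKeys` as-is, which is the "I only care about the keys" signature the review asks for.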

gsoltis · 2018-12-18T17:42:02Z

packages/firestore/src/local/local_serializer.ts

@@ -71,7 +71,12 @@ export class LocalSerializer {
  /** Encodes a document for storage locally. */
  toDbRemoteDocument(maybeDoc: MaybeDocument): DbRemoteDocument {
    if (maybeDoc instanceof Document) {
-      const doc = this.remoteSerializer.toDocument(maybeDoc);
+      let doc: api.Document;


Up to you, but this bit could be shortened to:

const doc = maybeDoc.proto ? maybeDoc.proto : this.remoteSerializer.toDocument(maybeDoc);

or even:

const doc = maybeDoc.proto || this.remoteSerializer.toDocument(maybeDoc);

Done. (I like the Lua-style second version, but chose the first one because it's more similar to other platforms)
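For comparison, a self-contained sketch of both forms, assuming simplified `Doc` and `Proto` shapes and a counting `serializeDoc` helper; all three are hypothetical stand-ins, not the real `Document`, `api.Document`, or remote serializer.

```typescript
// Hypothetical simplified shapes for illustration only.
interface Proto {
  name: string;
}

class Doc {
  constructor(readonly name: string, readonly proto?: Proto) {}
}

let serializations = 0;

// Stand-in for the expensive serialization step.
function serializeDoc(doc: Doc): Proto {
  serializations++;
  return { name: doc.name };
}

// Conditional form: reuse the memoized proto when present.
function toProto(doc: Doc): Proto {
  return doc.proto ? doc.proto : serializeDoc(doc);
}

// Short-circuit form: same behavior, since an object is always truthy.
function toProtoShort(doc: Doc): Proto {
  return doc.proto || serializeDoc(doc);
}
```

Both forms skip serialization for a document that carries a memoized proto and fall back to `serializeDoc` otherwise, so the choice is purely stylistic.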

gsoltis · 2018-12-18T17:47:20Z

packages/firestore/src/model/document.ts

+     * Memoized serialized form of the document for optimization purposes (avoids repeated
+     * serialization). Might be undefined.
+     */
+    readonly proto?: api.Document


Is there a concern with increased memory usage? Is there a way we can be more certain that we aren't keeping the serialized form around indefinitely in, say, a view?

Hmm, I'm not sure about the full life cycle of the Document, but some points:

the only place where the proto is memoized is in remote serializer. When documents are created by local serializer or from mutations, no memoization is performed;

this is a memory-speed tradeoff. I haven't seen complaints about memory consumption on other platforms (other than running out of memory on iOS due to a bug), is this an issue in Web client?

perhaps local serializer should reset the memoized proto after writing it to storage?

FWIW, if we are, I don't think it's the end of the world. So I wouldn't go terribly far out of our way to guarantee this unless it shows up as a problem.

gsoltis · 2018-12-18T17:51:17Z

packages/firestore/src/util/sorted_set.ts

+    if (!iter.hasNext()) {
+      return null;
+    }
+    if (this.comparator(iter.peek()!.key, elem) == 0) {


gsoltis · 2018-12-18T17:55:20Z

packages/firestore/src/util/sorted_set.ts

+    if (!iter.hasNext()) {
+      return null;
+    }
+    if (this.comparator(iter.peek()!.key, elem) == 0) {


Also, I think maybe you can do this without .peek() ?

const next = iter.getNext();
if (this.comparator(next.key, elem) !== 0) {
  return next.key;
} else if (iter.hasNext()) {
  return iter.getNext().key;
} else {
  return null;
}

Done, thanks.
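A standalone sketch of the peek-free lookup, assuming a sorted number array and a tiny hypothetical iterator class instead of the real `SortedSet` iterator. The function name `firstAfter` is a guess at the intent (return the first element strictly after `elem`), based only on the hunk above.

```typescript
// Hypothetical minimal iterator with only hasNext() and getNext().
class ArrayIterator<T> {
  private index: number;

  constructor(private readonly items: readonly T[], start: number) {
    this.index = start;
  }

  hasNext(): boolean {
    return this.index < this.items.length;
  }

  getNext(): T {
    return this.items[this.index++];
  }
}

// Returns the first element strictly greater than `elem`, or null if none.
function firstAfter(sorted: readonly number[], elem: number): number | null {
  // Position the iterator at the first element >= elem.
  let start = 0;
  while (start < sorted.length && sorted[start] < elem) {
    start++;
  }
  const iter = new ArrayIterator(sorted, start);
  if (!iter.hasNext()) {
    return null;
  }
  const next = iter.getNext();
  if (next !== elem) {
    return next; // already past elem, so this is the answer
  }
  // next equals elem, so the answer is the following element if there is one.
  return iter.hasNext() ? iter.getNext() : null;
}
```

Consuming the element with `getNext()` and comparing afterwards gives the same result as peeking first, with one fewer iterator method to support.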

gsoltis · 2018-12-18T18:00:30Z

packages/firestore/src/model/collections.ts

+  return maybeDocumentMap();
+}
+
+export type DocumentSizeEntries = {


I am nervous about this type. I know currently that the keys are kept in sync, but there is nothing here that guarantees it.

That's true. Do you think it would be better to avoid adding a new type and just have the relevant functions return two values?

var-const

@gsoltis Greg, thanks for the review. Addressed your comments, will address Michael's comments tomorrow.

var-const · 2018-12-21T01:37:51Z

Tried to do some performance measuring. I had a lot of trouble with Chrome apparently caching the Firestore code being used, which was only resolved by restarting the browser. After that, the behavior seemed consistent, though I'm still slightly unsure.

As on other platforms, I tried reading a collection of 1K somewhat large documents. Numbers fluctuate quite a bit:

master -- 1.8s - 3.1s (raw numbers: 3104ms 2433ms 2863ms 2447ms 2545ms 1916ms 1894ms 1810ms 1865ms);
branch -- 1.6s - 2s (raw numbers: 1559ms 1635ms 1915ms 2012ms 1734ms 1702ms 1733ms 1736ms 1843ms). The median seems to be around 1.7s
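
For anyone who wants to double-check the medians, a quick sketch using the raw numbers quoted above; the `median` helper is written for this example and is not part of the SDK.

```typescript
// Median of a list of numbers (average of the two middle values for even lengths).
function median(values: readonly number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Raw timings in milliseconds, copied from the comment above.
const masterMs = [3104, 2433, 2863, 2447, 2545, 1916, 1894, 1810, 1865];
const branchMs = [1559, 1635, 1915, 2012, 1734, 1702, 1733, 1736, 1843];

const masterMedian = median(masterMs); // 2433
const branchMedian = median(branchMs); // 1734
```

That puts the branch median at roughly 1.7s, matching the estimate above, against roughly 2.4s on master for this small sample.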

mikelehen

LGTM with a couple nits. Thanks!

mikelehen · 2018-12-21T20:48:32Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

@@ -206,46 +218,72 @@ export class IndexedDbRemoteDocumentCache implements RemoteDocumentCache {
  ): PersistencePromise<DocumentSizeEntries> {
    let results = nullableMaybeDocumentMap();
    let sizeMap = new SortedMap<DocumentKey, number>(DocumentKey.comparator);


Sorry, now that getEntries() is no longer a wrapper around getSizedEntries(), can we drop sizeMap and have this be new SortedMap<DocumentKey, DocumentSizeEntry|null> ?

First, I don't feel strongly about this. The reason I set it up that way is so that getEntries in RemoteDocumentChangeBuffer can return the MaybeDocumentMap directly. If this were a sorted map of DocumentSizeEntrys, then getEntries would have to create a new MaybeDocumentMap and fill it with just MaybeDocuments.

It's probably not a big deal, so if you think code clarity is more important here, I'll do the change.

Oh, sorry! I was confused. I thought it was just so the old getEntries() implementation could return the MaybeDocumentMap directly. But I see now that RemoteDocumentChangeBuffer.getEntries() ends up calling getSizedEntries() and using the sizes and also passing the MaybeDocumentMap straight through. So it needs both, and the way it's structured right now makes sense.

So never mind. Please keep it the way it is.

mikelehen · 2018-12-21T20:48:41Z

packages/firestore/src/local/indexeddb_remote_document_cache.ts

+    });
+  }
+
+  forEachDbEntry(


private please

mikelehen · 2018-12-21T20:53:09Z

packages/firestore/src/local/remote_document_change_buffer.ts

@@ -108,8 +108,7 @@ export abstract class RemoteDocumentChangeBuffer {
  }

  /**
-   * Looks up several entries in the cache.
-   * checked, and if no buffered change applies, this will forward to
+   * Looks up several entries in the cache, orwarding to


forwarding :)

mikelehen

Still LGTM. Thanks!

var-const added 15 commits December 13, 2018 16:54

1

3da020e

Compiles

9717fef

Compiles, pt 2

204443e

Compiles, pt 3m

c56820b

[AUTOMATED]: Prettier Code Styling

f3174ca

Some fixes

36c53d3

Very hacky version works

942b0fa

Most unit tests pass

7767b60

[AUTOMATED]: Prettier Code Styling

a1ad6d2

Undo temp/accidental changes

cc29731

applyRemoteEvent, sized entries

307e0b3

Fix failing tests

62c627b

Serializer

8f293eb

[AUTOMATED]: Prettier Code Styling

4e23f3b

small cleanup

fb751dd

var-const requested review from gsoltis, mikelehen, rsgowman, schmidt-sebastian and wilhuff as code owners December 18, 2018 02:39

google-oss-bot added the needs-triage label Dec 18, 2018

var-const requested a review from zxu123 as a code owner December 18, 2018 02:39

var-const added 2 commits December 17, 2018 21:47

Fix accidental

431f618

Comment

1681f3d

var-const commented Dec 18, 2018

var-const assigned mikelehen Dec 18, 2018

var-const added api: firestore and removed needs-triage labels Dec 18, 2018

gsoltis suggested changes Dec 18, 2018

var-const added 2 commits December 18, 2018 22:54

Review feedback 1, test

55c7ff3

[AUTOMATED]: Prettier Code Styling

5afa305

var-const commented Dec 19, 2018

mikelehen assigned var-const and unassigned mikelehen Dec 19, 2018

var-const added 4 commits December 20, 2018 16:55

Review feedback 1

56eefbc

Review feedback 2

5e506fa

[AUTOMATED]: Prettier Code Styling

7b11cec

Review feedback 3

1a32b22

var-const assigned mikelehen and gsoltis and unassigned var-const Dec 21, 2018

mikelehen approved these changes Dec 21, 2018

mikelehen removed their assignment Dec 21, 2018

Review feedback

17a69a6

var-const assigned mikelehen and gsoltis and unassigned gsoltis Dec 22, 2018

var-const added 4 commits December 21, 2018 21:01

Appease linter

4c270c7

Fix node tests

512667e

Appease linter 2

6cb4bfb

[AUTOMATED]: Prettier Code Styling

77bf92e

mikelehen approved these changes Dec 22, 2018

mikelehen removed their assignment Dec 22, 2018

var-const added 2 commits December 21, 2018 22:58

Comment

e7b8c8e

Merge branch 'master' into varconst/port-android-1000-reads-3

a0d25f5

gsoltis approved these changes Dec 22, 2018

var-const merged commit 1888bd7 into master Dec 22, 2018

firebase locked and limited conversation to collaborators Oct 14, 2019
