Skip to content

Port performance optimizations to speed up reading large collections from Android #1433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Dec 22, 2018

Conversation

var-const
Copy link
Contributor

Straightforward port of firebase/firebase-android-sdk#123.

documentKeys.first()!.path.toArray(),
documentKeys.last()!.path.toArray()
);
let key = documentKeys.first();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is built upon the idea that control.skip() makes the algorithm more efficient. Will be happy with any feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I might name it nextKey though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* found, the key will be mapped to null) and a map of sizes indexed by
* key (zero if the key cannot be found).
*/
getSizedEntries(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach would be for getSizedEntries to return a SortedMap<DocumentKey, DocumentSizeEntry>. I decided in favor of returning two maps because it makes it easier to avoid code duplication between getEntries and getSizedEntries.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider taking as an argument a function that processes each key into the type of the result, for example fn: (key: DocumentKey, doc: DbRemoteDoc | null) => T. Then, your return type can be PersistencePromise<SortedMap<DocumentKey, T>>. You can avoid code duplication and avoid doing extra work for sizes that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it out, but I'm not sure I prefer it. The problem is that getEntries in RemoteDocumentChangeBuffer won't be able to return a map of documents directly (due to type difference) and instead would have to build a new map. If extra work for calculating/storing sizes is a concern, it's easy (though ugly) to solve with a flag (or perhaps, more similar to this approach, by having a fn that either updates the sizeMap or is a no-op).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that makes sense, but since it deals with DbRemoteDoc, I'd keep it internal and have getEntries() and getSizedEntries() functions that wrap it.

I think what Greg is recommending is basically a mapDbEntries() function, but it might be a little simpler to instead have it be a forEachDbEntry(transaction, documentKeys, callback) function that iterates the matching documents and just calls the callback with each raw dbRemoteDocument. That may mean a little bit of redundant code for getEntries() and getSizedEntries() to build up their respective maps, but it seems simpler to me (and perhaps more generically useful, if we had a case where we don't necessarily want to build up a map).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please take a look.

key!,
this.serializer.fromDbRemoteDocument(dbRemoteDoc)
);
sizeMap = sizeMap.insert(key!, dbDocumentSize(dbRemoteDoc));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether to update the sizeMap could be controlled by a flag to avoid doing unnecessary work when this function is invoked via getEntries, but it might be overkill.

): PersistencePromise<NullableMaybeDocumentMap> {
const changes = this.assertChanges();

// Record the size of everything we load from the cache so we can compute a delta later.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I could add code to look up the keys in the buffer. I decided to postpone it because it will necessitate some logic to merge the buffer lookup results with results from cache, so I'd like to check beforehand if you think this optimization is worthwhile.

@var-const
Copy link
Contributor Author

Michael, I'll port the tests and measure performance numbers tomorrow, but I think it should be in good enough shape for review. Thanks!

return remoteDocumentsStore(transaction)
.iterate({ range }, (potentialKeyRaw, dbRemoteDoc, control) => {
const potentialKey = DocumentKey.fromSegments(potentialKeyRaw);
while (DocumentKey.comparator(key!, potentialKey) != 1) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use < 1 instead. Even though this comparison function might return {-1, 0, 1}, some comparison functions return { < 0, 0, > 0}, which will break here for values > 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this except I'd use <= 0 😁 (technically the comparator could return 0.5 or something).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* found, the key will be mapped to null) and a map of sizes indexed by
* key (zero if the key cannot be found).
*/
getSizedEntries(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider taking as an argument a function that processes each key into the type of the result, for example fn: (key: DocumentKey, doc: DbRemoteDoc | null) => T. Then, your return type can be PersistencePromise<SortedMap<DocumentKey, T>>. You can avoid code duplication and avoid doing extra work for sizes that way.

batches: MutationBatch[]
): PersistencePromise<NullableMaybeDocumentMap> {
let results = nullableMaybeDocumentMap();
return new PersistencePromise<NullableMaybeDocumentMap>(resolve => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you have to do this in a closure. Consider:

let results = nullableMaybeDocumentMap():
docs.forEach((key, localView) => {
  for (const batch of batches) {
     localView = batch.applyToLocalView(key, localView);
  }
  results = results.insert(key, localView);
});
return PersistencePromise.resolve(results);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I should have mentioned this -- this function doesn't have to be async at all. I wrapped the computation in a promise because I presumed it's time-consuming enough to warrant this. Do you think it should just return results directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going further, I don't think this function needs to take a transaction or return a PersistencePromise. It can just be synchronous.

Note that in some cases we do have functions that are "needlessly" asynchronous and return a PersistencePromise when they could be synchronous. But we typically do this for public functions where we want to reserve the ability to make them asynchronous in the future, and so we want the consuming component to deal with them as an asynchronous function.

But since this is a private function, I wouldn't worry about future-proofing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@var-const In general, JavaScript is completely single-threaded (there's no way to block other than by spinning the CPU) and so wrapping a computation in a promise doesn't really help (in particular it won't enable any parallelism).

And PersistencePromise is extra weird because it's a specially-designed Promise-like construct that tries to be as synchronous as possible because IndexedDb has weird semantics where in the completion for one operation you must synchronously start the next operation or else your transaction will auto-close. So even using PersistencePromise, this code is actually 100% synchronous. So you can go ahead and just yank PersistencePromise out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaScript is completely single-threaded

Right. Thanks, made this function return the result directly.

transaction: PersistenceTransaction,
baseDocs: NullableMaybeDocumentMap
): PersistencePromise<MaybeDocumentMap> {
let allKeys = documentKeySet();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to construct a new set just to pass the keys, you could change the signature for getAllMutationBatchesAffectingDocumentKeys to accept a map of DocumentKey to {}. That indicates that you don't care about the values in the map, as you won't have their type, but you will still be able to iterate through the keys. Alternatively, I would consider defining a .keySet() method for SortedMap, since iterating through and simultaneously building up a new immutable set is not very efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I had to use any, though, compiler complained about MaybeDocument | null not being assignable to {}).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try using AnyJs instead of any?

any is dangerous in that it basically opts out of typechecking. So if you type the values as any, TypeScript will let you do anything with the values (call nonexistent methods, etc.) which is the opposite of what we want (we don't want to be able to do anything with the values).

AnyJs is a type we introduced essentially meant to be a supertype of any JS type (similar to Object in java). So TypeScript won't let you do anything with it without doing typeof / instanceof checks to narrow it down.

TypeScript 3.x actually introduced unknown which is similar to AnyJs and I'm hoping we can migrate to it soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this comment. Done now.

@@ -71,7 +71,12 @@ export class LocalSerializer {
/** Encodes a document for storage locally. */
toDbRemoteDocument(maybeDoc: MaybeDocument): DbRemoteDocument {
if (maybeDoc instanceof Document) {
const doc = this.remoteSerializer.toDocument(maybeDoc);
let doc: api.Document;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, but this bit could be shortened to:

const doc = maybeDoc.proto ? maybeDoc.proto : this.remoteSerializer.toDocument(maybeDoc);

or even:

const doc = maybeDoc.proto || this.remoteSerializer.toDocument(maybeDoc);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (I like the Lua-style second version, but chose the first one because it's more similar to other platforms)

* Memoized serialized form of the document for optimization purposes (avoids repeated
* serialization). Might be undefined.
*/
readonly proto?: api.Document
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concern with increased memory usage? Is there a way we can be more certain that we aren't keeping the serialized form around indefinitely in, say, a view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about the full life cycle of the Document, but some points:

  • the only place where the proto is memoized is in remote serializer. When documents are created by local serializer or from mutations, no memoization is performed;
  • this is a memory-speed tradeoff. I haven't seen complaints about memory consumption on other platforms (other than running out of memory on iOS due to a bug), is this an issue in Web client?
  • perhaps local serializer should reset the memoized proto after writing it to storage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW- If we are, I don't think it's the end of the world. So I wouldn't go terribly far out-of-our-way to guarantee this unless it shows up as a problem.

if (!iter.hasNext()) {
return null;
}
if (this.comparator(iter.peek()!.key, elem) == 0) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

===

if (!iter.hasNext()) {
return null;
}
if (this.comparator(iter.peek()!.key, elem) == 0) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think maybe you can do this without .peek() ?

const next = iter.getNext();
if (this.comparator(next.key, elem) !== 0) {
  return key;
} else if (this.hasNext()) {
  return this.getNext().key;
} else {
  return null;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

return maybeDocumentMap();
}

export type DocumentSizeEntries = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am nervous about this type. I know currently that the keys are kept in sync, but there is nothing here that guarantees it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Do you think it would be better to avoid adding a new type and just have the relevant functions return two values?

Copy link
Contributor Author

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsoltis Greg, thanks for the review. Addressed your comments, will address Michael's comments tomorrow.

return remoteDocumentsStore(transaction)
.iterate({ range }, (potentialKeyRaw, dbRemoteDoc, control) => {
const potentialKey = DocumentKey.fromSegments(potentialKeyRaw);
while (DocumentKey.comparator(key!, potentialKey) != 1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

batches: MutationBatch[]
): PersistencePromise<NullableMaybeDocumentMap> {
let results = nullableMaybeDocumentMap();
return new PersistencePromise<NullableMaybeDocumentMap>(resolve => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I should have mentioned this -- this function doesn't have to be async at all. I wrapped the computation in a promise because I presumed it's time-consuming enough to warrant this. Do you think it should just return results directly?

@@ -71,7 +71,12 @@ export class LocalSerializer {
/** Encodes a document for storage locally. */
toDbRemoteDocument(maybeDoc: MaybeDocument): DbRemoteDocument {
if (maybeDoc instanceof Document) {
const doc = this.remoteSerializer.toDocument(maybeDoc);
let doc: api.Document;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (I like the Lua-style second version, but chose the first one because it's more similar to other platforms)

if (!iter.hasNext()) {
return null;
}
if (this.comparator(iter.peek()!.key, elem) == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

* found, the key will be mapped to null) and a map of sizes indexed by
* key (zero if the key cannot be found).
*/
getSizedEntries(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it out, but I'm not sure I prefer it. The problem is that getEntries in RemoteDocumentChangeBuffer won't be able to return a map of documents directly (due to type difference) and instead would have to build a new map. If extra work for calculating/storing sizes is a concern, it's easy (though ugly) to solve with a flag (or perhaps, more similar to this approach, by having a fn that either updates the sizeMap or is a no-op).

transaction: PersistenceTransaction,
baseDocs: NullableMaybeDocumentMap
): PersistencePromise<MaybeDocumentMap> {
let allKeys = documentKeySet();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I had to use any, though, compiler complained about MaybeDocument | null not being assignable to {}).

* Memoized serialized form of the document for optimization purposes (avoids repeated
* serialization). Might be undefined.
*/
readonly proto?: api.Document
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about the full life cycle of the Document, but some points:

  • the only place where the proto is memoized is in remote serializer. When documents are created by local serializer or from mutations, no memoization is performed;
  • this is a memory-speed tradeoff. I haven't seen complaints about memory consumption on other platforms (other than running out of memory on iOS due to a bug), is this an issue in Web client?
  • perhaps local serializer should reset the memoized proto after writing it to storage?

return maybeDocumentMap();
}

export type DocumentSizeEntries = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Do you think it would be better to avoid adding a new type and just have the relevant functions return two values?

@mikelehen mikelehen assigned var-const and unassigned mikelehen Dec 19, 2018
@var-const
Copy link
Contributor Author

Tried to do some performance measuring (I had a lot of trouble with Chrome apparently somehow caching the Firestore code being used, which only got resolved by restarting the browser. After that, the behavior seemed to be consistent, though I'm still slightly unsure).

Like other platforms, I tried reading a collection consisting of 1K somewhat large documents. Numbers fluctuate quite a bit:

  • master -- 1.8s - 3.1s (raw numbers: 3104ms 2433ms 2863ms 2447ms 2545ms 1916ms 1894ms 1810ms 1865ms);
  • branch -- 1.6s - 2s (raw numbers: 1559ms 1635ms 1915ms 2012ms 1734ms 1702ms 1733ms 1736ms 1843ms). The median seems to be around 1.7s

@var-const var-const assigned mikelehen and gsoltis and unassigned var-const Dec 21, 2018
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple nits. Thanks!

@@ -206,46 +218,72 @@ export class IndexedDbRemoteDocumentCache implements RemoteDocumentCache {
): PersistencePromise<DocumentSizeEntries> {
let results = nullableMaybeDocumentMap();
let sizeMap = new SortedMap<DocumentKey, number>(DocumentKey.comparator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, now that getEntries() is no longer a wrapper around getSizedEntries(), can we drop sizeMap and have this be new SortedMap<DocumentKey, DocumentSizeEntry|null> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I don't feel strongly about this. The reason I set it up that way is so that getEntries in RemoteDocumentChangeBuffer can return the MaybeDocumentMap directly. If this were a sorted map of DocumentSizeEntrys, then getEntries would have to create a new MaybeDocumentMap and fill it with just MaybeDocuments.

It's probably not a big deal, so if you think code clarity is more important here, I'll do the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry! I was confused. I thought it was just so the old getEntries() implementation could return the MaybeDocumentMap directly. But I see now that RemoteDocumentChangeBuffer.getEntries() ends up calling getSizedEntries() and using the sizes and also passing the MaybeDocumentMap straight through. So it needs both, and the way it's structured right now makes sense.

So nevermind. Please keep it the way it is.

});
}

forEachDbEntry(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -108,8 +108,7 @@ export abstract class RemoteDocumentChangeBuffer {
}

/**
* Looks up several entries in the cache.
* checked, and if no buffered change applies, this will forward to
* Looks up several entries in the cache, orwarding to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forwarding :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mikelehen mikelehen removed their assignment Dec 21, 2018
@var-const var-const assigned mikelehen and gsoltis and unassigned gsoltis Dec 22, 2018
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. THanks!

@mikelehen mikelehen removed their assignment Dec 22, 2018
@var-const var-const merged commit 1888bd7 into master Dec 22, 2018
@firebase firebase locked and limited conversation to collaborators Oct 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants