-
Notifications
You must be signed in to change notification settings - Fork 938
Port performance optimizations to speed up reading large collections from Android #1433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
documentKeys.first()!.path.toArray(), | ||
documentKeys.last()!.path.toArray() | ||
); | ||
let key = documentKeys.first(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is built upon the idea that control.skip()
makes the algorithm more efficient. Will be happy with any feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. I might name it nextKey
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* found, the key will be mapped to null) and a map of sizes indexed by | ||
* key (zero if the key cannot be found). | ||
*/ | ||
getSizedEntries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach would be for getSizedEntries
to return a SortedMap<DocumentKey, DocumentSizeEntry>
. I decided in favor of returning two maps because it makes it easier to avoid code duplication between getEntries
and getSizedEntries
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider taking as an argument a function that processes each key into the type of the result, for example fn: (key: DocumentKey, doc: DbRemoteDoc | null) => T
. Then, your return type can be PersistencePromise<SortedMap<DocumentKey, T>>
. You can avoid code duplication and avoid doing extra work for sizes that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it out, but I'm not sure I prefer it. The problem is that getEntries
in RemoteDocumentChangeBuffer
won't be able to return a map of documents directly (due to type difference) and instead would have to build a new map. If extra work for calculating/storing sizes is a concern, it's easy (though ugly) to solve with a flag (or perhaps, more similar to this approach, by having a fn
that either updates the sizeMap
or is a no-op).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that makes sense, but since it deals with DbRemoteDoc, I'd keep it internal and have getEntries() and getSizedEntries() functions that wrap it.
I think what Greg is recommending is basically a mapDbEntries() function, but it might be a little simpler to instead have it be a forEachDbEntry(transaction, documentKeys, callback) function that iterates the matching documents and just calls the callback with each raw dbRemoteDocument. That may mean a little bit of redundant code for getEntries() and getSizedEntries() to build up their respective maps, but it seems simpler to me (and perhaps more generically useful, if we had a case where we don't necessarily want to build up a map).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, please take a look.
key!, | ||
this.serializer.fromDbRemoteDocument(dbRemoteDoc) | ||
); | ||
sizeMap = sizeMap.insert(key!, dbDocumentSize(dbRemoteDoc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether to update the sizeMap
could be controlled by a flag to avoid doing unnecessary work when this function is invoked via getEntries
, but it might be overkill.
): PersistencePromise<NullableMaybeDocumentMap> { | ||
const changes = this.assertChanges(); | ||
|
||
// Record the size of everything we load from the cache so we can compute a delta later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I could add code to look up the keys in the buffer. I decided to postpone it because it will necessitate some logic to merge the buffer lookup results with results from cache, so I'd like to check beforehand if you think this optimization is worthwhile.
Michael, I'll port the tests and measure performance numbers tomorrow, but I think it should be in good enough shape for review. Thanks! |
return remoteDocumentsStore(transaction) | ||
.iterate({ range }, (potentialKeyRaw, dbRemoteDoc, control) => { | ||
const potentialKey = DocumentKey.fromSegments(potentialKeyRaw); | ||
while (DocumentKey.comparator(key!, potentialKey) != 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use < 1
instead. Even though this comparison function might return {-1, 0, 1}
, some comparison functions return { < 0, 0, > 0}
, which will break here for values > 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this except I'd use <= 0
😁 (technically the comparator could return 0.5 or something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* found, the key will be mapped to null) and a map of sizes indexed by | ||
* key (zero if the key cannot be found). | ||
*/ | ||
getSizedEntries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider taking as an argument a function that processes each key into the type of the result, for example fn: (key: DocumentKey, doc: DbRemoteDoc | null) => T
. Then, your return type can be PersistencePromise<SortedMap<DocumentKey, T>>
. You can avoid code duplication and avoid doing extra work for sizes that way.
batches: MutationBatch[] | ||
): PersistencePromise<NullableMaybeDocumentMap> { | ||
let results = nullableMaybeDocumentMap(); | ||
return new PersistencePromise<NullableMaybeDocumentMap>(resolve => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you have to do this in a closure. Consider:
let results = nullableMaybeDocumentMap():
docs.forEach((key, localView) => {
for (const batch of batches) {
localView = batch.applyToLocalView(key, localView);
}
results = results.insert(key, localView);
});
return PersistencePromise.resolve(results);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I should have mentioned this -- this function doesn't have to be async at all. I wrapped the computation in a promise because I presumed it's time-consuming enough to warrant this. Do you think it should just return results
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going further, I don't think this function needs to take a transaction or return a PersistencePromise. It can just be synchronous.
Note that in some cases we do have functions that are "needlessly" asynchronous and return a PersistencePromise when they could be synchronous. But we typically do this for public functions where we want to reserve the ability to make them asynchronous in the future, and so we want the consuming component to deal with them as an asynchronous function.
But since this is a private function, I wouldn't worry about future-proofing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@var-const In general, JavaScript is completely single-threaded (there's no way to block other than by spinning the CPU) and so wrapping a computation in a promise doesn't really help (in particular it won't enable any parallelism).
And PersistencePromise is extra weird because it's a specially-designed Promise-like construct that tries to be as synchronous as possible because IndexedDb has weird semantics where in the completion for one operation you must synchronously start the next operation or else your transaction will auto-close. So even using PersistencePromise, this code is actually 100% synchronous. So you can go ahead and just yank PersistencePromise out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaScript is completely single-threaded
Right. Thanks, made this function return the result directly.
transaction: PersistenceTransaction, | ||
baseDocs: NullableMaybeDocumentMap | ||
): PersistencePromise<MaybeDocumentMap> { | ||
let allKeys = documentKeySet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't want to construct a new set just to pass the keys, you could change the signature for getAllMutationBatchesAffectingDocumentKeys
to accept a map of DocumentKey
to {}
. That indicates that you don't care about the values in the map, as you won't have their type, but you will still be able to iterate through the keys. Alternatively, I would consider defining a .keySet()
method for SortedMap
, since iterating through and simultaneously building up a new immutable set is not very efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (I had to use any
, though, compiler complained about MaybeDocument | null
not being assignable to {}
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try using AnyJs
instead of any
?
any is dangerous in that it basically opts out of typechecking. So if you type the values as any
, TypeScript will let you do anything with the values (call nonexistent methods, etc.) which is the opposite of what we want (we don't want to be able to do anything with the values).
AnyJs
is a type we introduced essentially meant to be a supertype of any JS type (similar to Object
in java). So TypeScript won't let you do anything with it without doing typeof / instanceof checks to narrow it down.
TypeScript 3.x actually introduced unknown
which is similar to AnyJs and I'm hoping we can migrate to it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed this comment. Done now.
@@ -71,7 +71,12 @@ export class LocalSerializer { | |||
/** Encodes a document for storage locally. */ | |||
toDbRemoteDocument(maybeDoc: MaybeDocument): DbRemoteDocument { | |||
if (maybeDoc instanceof Document) { | |||
const doc = this.remoteSerializer.toDocument(maybeDoc); | |||
let doc: api.Document; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you, but this bit could be shortened to:
const doc = maybeDoc.proto ? maybeDoc.proto : this.remoteSerializer.toDocument(maybeDoc);
or even:
const doc = maybeDoc.proto || this.remoteSerializer.toDocument(maybeDoc);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. (I like the Lua-style second version, but chose the first one because it's more similar to other platforms)
* Memoized serialized form of the document for optimization purposes (avoids repeated | ||
* serialization). Might be undefined. | ||
*/ | ||
readonly proto?: api.Document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a concern with increased memory usage? Is there a way we can be more certain that we aren't keeping the serialized form around indefinitely in, say, a view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure about the full life cycle of the Document
, but some points:
- the only place where the proto is memoized is in remote serializer. When documents are created by local serializer or from mutations, no memoization is performed;
- this is a memory-speed tradeoff. I haven't seen complaints about memory consumption on other platforms (other than running out of memory on iOS due to a bug), is this an issue in Web client?
- perhaps local serializer should reset the memoized proto after writing it to storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW- If we are, I don't think it's the end of the world. So I wouldn't go terribly far out-of-our-way to guarantee this unless it shows up as a problem.
if (!iter.hasNext()) { | ||
return null; | ||
} | ||
if (this.comparator(iter.peek()!.key, elem) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
===
if (!iter.hasNext()) { | ||
return null; | ||
} | ||
if (this.comparator(iter.peek()!.key, elem) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think maybe you can do this without .peek()
?
const next = iter.getNext();
if (this.comparator(next.key, elem) !== 0) {
return key;
} else if (this.hasNext()) {
return this.getNext().key;
} else {
return null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
return maybeDocumentMap(); | ||
} | ||
|
||
export type DocumentSizeEntries = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am nervous about this type. I know currently that the keys are kept in sync, but there is nothing here that guarantees it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Do you think it would be better to avoid adding a new type and just have the relevant functions return two values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsoltis Greg, thanks for the review. Addressed your comments, will address Michael's comments tomorrow.
return remoteDocumentsStore(transaction) | ||
.iterate({ range }, (potentialKeyRaw, dbRemoteDoc, control) => { | ||
const potentialKey = DocumentKey.fromSegments(potentialKeyRaw); | ||
while (DocumentKey.comparator(key!, potentialKey) != 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
batches: MutationBatch[] | ||
): PersistencePromise<NullableMaybeDocumentMap> { | ||
let results = nullableMaybeDocumentMap(); | ||
return new PersistencePromise<NullableMaybeDocumentMap>(resolve => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I should have mentioned this -- this function doesn't have to be async at all. I wrapped the computation in a promise because I presumed it's time-consuming enough to warrant this. Do you think it should just return results
directly?
@@ -71,7 +71,12 @@ export class LocalSerializer { | |||
/** Encodes a document for storage locally. */ | |||
toDbRemoteDocument(maybeDoc: MaybeDocument): DbRemoteDocument { | |||
if (maybeDoc instanceof Document) { | |||
const doc = this.remoteSerializer.toDocument(maybeDoc); | |||
let doc: api.Document; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. (I like the Lua-style second version, but chose the first one because it's more similar to other platforms)
if (!iter.hasNext()) { | ||
return null; | ||
} | ||
if (this.comparator(iter.peek()!.key, elem) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
* found, the key will be mapped to null) and a map of sizes indexed by | ||
* key (zero if the key cannot be found). | ||
*/ | ||
getSizedEntries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it out, but I'm not sure I prefer it. The problem is that getEntries
in RemoteDocumentChangeBuffer
won't be able to return a map of documents directly (due to type difference) and instead would have to build a new map. If extra work for calculating/storing sizes is a concern, it's easy (though ugly) to solve with a flag (or perhaps, more similar to this approach, by having a fn
that either updates the sizeMap
or is a no-op).
transaction: PersistenceTransaction, | ||
baseDocs: NullableMaybeDocumentMap | ||
): PersistencePromise<MaybeDocumentMap> { | ||
let allKeys = documentKeySet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (I had to use any
, though, compiler complained about MaybeDocument | null
not being assignable to {}
).
* Memoized serialized form of the document for optimization purposes (avoids repeated | ||
* serialization). Might be undefined. | ||
*/ | ||
readonly proto?: api.Document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure about the full life cycle of the Document
, but some points:
- the only place where the proto is memoized is in remote serializer. When documents are created by local serializer or from mutations, no memoization is performed;
- this is a memory-speed tradeoff. I haven't seen complaints about memory consumption on other platforms (other than running out of memory on iOS due to a bug), is this an issue in Web client?
- perhaps local serializer should reset the memoized proto after writing it to storage?
return maybeDocumentMap(); | ||
} | ||
|
||
export type DocumentSizeEntries = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Do you think it would be better to avoid adding a new type and just have the relevant functions return two values?
Tried to do some performance measuring (I had a lot of trouble with Chrome apparently somehow caching the Firestore code being used, which only got resolved by restarting the browser. After that, the behavior seemed to be consistent, though I'm still slightly unsure). Like other platforms, I tried reading a collection consisting of 1K somewhat large documents. Numbers fluctuate quite a bit:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple nits. Thanks!
@@ -206,46 +218,72 @@ export class IndexedDbRemoteDocumentCache implements RemoteDocumentCache { | |||
): PersistencePromise<DocumentSizeEntries> { | |||
let results = nullableMaybeDocumentMap(); | |||
let sizeMap = new SortedMap<DocumentKey, number>(DocumentKey.comparator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, now that getEntries() is no longer a wrapper around getSizedEntries(), can we drop sizeMap and have this be new SortedMap<DocumentKey, DocumentSizeEntry|null> ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I don't feel strongly about this. The reason I set it up that way is so that getEntries
in RemoteDocumentChangeBuffer
can return the MaybeDocumentMap
directly. If this were a sorted map of DocumentSizeEntry
s, then getEntries
would have to create a new MaybeDocumentMap
and fill it with just MaybeDocument
s.
It's probably not a big deal, so if you think code clarity is more important here, I'll do the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry! I was confused. I thought it was just so the old getEntries() implementation could return the MaybeDocumentMap directly. But I see now that RemoteDocumentChangeBuffer.getEntries() ends up calling getSizedEntries() and using the sizes and also passing the MaybeDocumentMap straight through. So it needs both, and the way it's structured right now makes sense.
So nevermind. Please keep it the way it is.
}); | ||
} | ||
|
||
forEachDbEntry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -108,8 +108,7 @@ export abstract class RemoteDocumentChangeBuffer { | |||
} | |||
|
|||
/** | |||
* Looks up several entries in the cache. | |||
* checked, and if no buffered change applies, this will forward to | |||
* Looks up several entries in the cache, orwarding to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forwarding :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM. THanks!
Straightforward port of firebase/firebase-android-sdk#123.