Skip to content

Performance optimizations to speed up reading large collections #123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Nov 28, 2018
Merged
2 changes: 1 addition & 1 deletion fiamui-app/fiamui-app.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -74,4 +74,4 @@ dependencies {

androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'
androidTestImplementation 'com.android.support.test:rules:1.0.2'
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@
import com.google.firebase.firestore.model.SnapshotVersion;
import com.google.firebase.firestore.model.mutation.Mutation;
import com.google.firebase.firestore.model.mutation.MutationBatch;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import javax.annotation.Nullable;
Expand Down Expand Up @@ -69,6 +71,28 @@ private MaybeDocument getDocument(DocumentKey key, List<MutationBatch> inBatches
return document;
}

@Nullable
private List<MaybeDocument> getDocumentsInternal(Iterable<DocumentKey> keys, List<MutationBatch> batches) {
List<MaybeDocument> documents = remoteDocumentCache.getAll(keys);
// TODO(varconst): uncomment and fix.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't bother with batches, because there are no pending write batches in the case being tested.

// for (MutationBatch batch : batches) {
// document = batch.applyToLocalView(key, document);
// }

return documents;
}

@Nullable
private List<MaybeDocument> getDocumentsInternal(Collection<MaybeDocument> docs, List<MutationBatch> batches) {
List<MaybeDocument> result = new ArrayList<>(docs);
// TODO(varconst): uncomment and fix.
// for (MutationBatch batch : batches) {
// document = batch.applyToLocalView(key, document);
// }

return result;
}

/**
* Gets the local view of the documents identified by {@code keys}.
*
Expand All @@ -79,15 +103,28 @@ ImmutableSortedMap<DocumentKey, MaybeDocument> getDocuments(Iterable<DocumentKey
ImmutableSortedMap<DocumentKey, MaybeDocument> results = emptyMaybeDocumentMap();

List<MutationBatch> batches = mutationQueue.getAllMutationBatchesAffectingDocumentKeys(keys);
for (DocumentKey key : keys) {
// TODO: PERF: Consider fetching all remote documents at once rather than
// one-by-one.
MaybeDocument maybeDoc = getDocument(key, batches);
List<MaybeDocument> docs = getDocumentsInternal(keys, batches);
for (MaybeDocument maybeDoc : docs) {
// TODO: Don't conflate missing / deleted.
if (maybeDoc == null) {
maybeDoc = new NoDocument(maybeDoc.getKey(), SnapshotVersion.NONE, /*hasCommittedMutations=*/ false);
}
results = results.insert(maybeDoc.getKey(), maybeDoc);
}
return results;
}

ImmutableSortedMap<DocumentKey, MaybeDocument> getDocuments(Map<DocumentKey, MaybeDocument> docsByKey) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids accessing the local database and just applies pending batches to the given documents.

ImmutableSortedMap<DocumentKey, MaybeDocument> results = emptyMaybeDocumentMap();

List<MutationBatch> batches = mutationQueue.getAllMutationBatchesAffectingDocumentKeys(docsByKey.keySet());
List<MaybeDocument> docs = getDocumentsInternal(docsByKey.values(), batches);
for (MaybeDocument maybeDoc : docs) {
// TODO: Don't conflate missing / deleted.
if (maybeDoc == null) {
maybeDoc = new NoDocument(key, SnapshotVersion.NONE, /*hasCommittedMutations=*/ false);
maybeDoc = new NoDocument(maybeDoc.getKey(), SnapshotVersion.NONE, /*hasCommittedMutations=*/ false);
}
results = results.insert(key, maybeDoc);
results = results.insert(maybeDoc.getKey(), maybeDoc);
}
return results;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,13 +48,18 @@ public LocalSerializer(RemoteSerializer rpcSerializer) {
com.google.firebase.firestore.proto.MaybeDocument encodeMaybeDocument(MaybeDocument document) {
com.google.firebase.firestore.proto.MaybeDocument.Builder builder =
com.google.firebase.firestore.proto.MaybeDocument.newBuilder();

if (document instanceof NoDocument) {
NoDocument noDocument = (NoDocument) document;
builder.setNoDocument(encodeNoDocument(noDocument));
builder.setHasCommittedMutations(noDocument.hasCommittedMutations());
} else if (document instanceof Document) {
Document existingDocument = (Document) document;
builder.setDocument(encodeDocument(existingDocument));
if (existingDocument.getProto() != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization 1 ("Avoid encode"): avoid serializing a Document to a proto again; instead, keep the serialized version around and reuse it. Storing the proto right in the document object is a strawman.

builder.setDocument(existingDocument.getProto());
} else {
builder.setDocument(encodeDocument(existingDocument));
}
builder.setHasCommittedMutations(existingDocument.hasCommittedMutations());
} else if (document instanceof UnknownDocument) {
builder.setUnknownDocument(encodeUnknownDocument((UnknownDocument) document));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
import com.google.firebase.firestore.remote.TargetChange;
import com.google.firebase.firestore.util.Logger;
import com.google.protobuf.ByteString;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
Expand Down Expand Up @@ -329,14 +330,27 @@ public ImmutableSortedMap<DocumentKey, MaybeDocument> applyRemoteEvent(RemoteEve
}
}

Set<DocumentKey> changedDocKeys = new HashSet<>();
Map<DocumentKey, MaybeDocument> changedDocs = new HashMap<>();
Map<DocumentKey, MaybeDocument> documentUpdates = remoteEvent.getDocumentUpdates();
Set<DocumentKey> limboDocuments = remoteEvent.getResolvedLimboDocuments();

Set<DocumentKey> keys = new HashSet<>();
for (Entry<DocumentKey, MaybeDocument> entry : documentUpdates.entrySet()) {
keys.add(entry.getKey());
}
List<MaybeDocument> existingDocs = remoteDocuments.getAll(keys);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization 2 ("No get by one"): get all the documents from the local database in a single query.

IIUC, keys are not repeated, so retrieving all the documents before going into the main for loop is okay, because no iteration of the loop may affect subsequent iterations.


for (Entry<DocumentKey, MaybeDocument> entry : documentUpdates.entrySet()) {
DocumentKey key = entry.getKey();
MaybeDocument doc = entry.getValue();
changedDocKeys.add(key);
MaybeDocument existingDoc = remoteDocuments.get(key);
MaybeDocument existingDoc = null;
for (MaybeDocument aDoc : existingDocs) {
if (aDoc.getKey().equals(key)) {
existingDoc = aDoc;
break;
}
}

// If a document update isn't authoritative, make sure we don't
// apply an old document version to the remote cache. We make an
// exception for SnapshotVersion.MIN which can happen for
Expand All @@ -347,6 +361,7 @@ public ImmutableSortedMap<DocumentKey, MaybeDocument> applyRemoteEvent(RemoteEve
|| (authoritativeUpdates.contains(doc.getKey()) && !existingDoc.hasPendingWrites())
|| doc.getVersion().compareTo(existingDoc.getVersion()) >= 0) {
remoteDocuments.add(doc);
changedDocs.put(key, doc);
} else {
Logger.debug(
"LocalStore",
Expand All @@ -355,6 +370,7 @@ public ImmutableSortedMap<DocumentKey, MaybeDocument> applyRemoteEvent(RemoteEve
key,
existingDoc.getVersion(),
doc.getVersion());
changedDocs.put(key, existingDoc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this probably isn't necessary (nothing changed). WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the previous behavior was to add the key to changedDocKeys unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, logically this seems unnecessary. However, the previous behavior added the key to changedDocKeys unconditionally (line 338) and then used that key set to retrieve the documents. From a glance at the code, I cannot determine whether this change would be effectively a no-op or not.

The tests pass either way, but I'm not sure we cover the situation when the server sends outdated docs. All in all, I'm a little wary about this change... What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does https://github.com/firebase/firebase-js-sdk/blob/master/packages/firestore/test/unit/specs/listen_spec.test.ts#L228 cover this?

In general the best way to see if we have a test covering a behavior is to set a breakpoint and then check if we hit it under the debugger.

I applaud your skeptical approach to changing existing behavior, but it's also worth considering that a lot of this code was written in a hurry or has evolved organically over time. For example, we only learned that Watch even had this kind of behavior after we observed it in a bug bash. The code to defend against this was added later and it's likely we just didn't adjust the initial change computation.

In any case, our approach in the LocalStore has always been to err on the side of over-notifying because the view code is ultimately responsible for computing what has changed. This likely has no visible effect precisely because the view is discarding updates that don't net any changes, but that doesn't invalidate the logic behind changing this. We just need to avoid under-notifying--that's something for which the view can't compensate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's safe. Since we're not updating the cached doc, it logically makes sense not to include it in the changed docs. The result of this method is used to update our Views, and again, since we kept the existing doc, no update should be needed. And this code is exercised by the "Listens: Individual documents cannot revert" spec test.

So I'd feel comfortable removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if (limboDocuments.contains(key)) {
Expand All @@ -376,7 +392,7 @@ public ImmutableSortedMap<DocumentKey, MaybeDocument> applyRemoteEvent(RemoteEve
queryCache.setLastRemoteSnapshotVersion(remoteVersion);
}

return localDocuments.getDocuments(changedDocKeys);
return localDocuments.getDocuments(changedDocs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization 3 ("No double get"): avoid retrieving the documents from local database again, we already have them in this function; just apply the pending write batches to them.

});
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@
import com.google.firebase.firestore.model.DocumentKey;
import com.google.firebase.firestore.model.MaybeDocument;
import com.google.firebase.firestore.model.ResourcePath;
import java.util.ArrayList;
import java.util.List;
import java.util.Iterator;
import java.util.Map;
import javax.annotation.Nullable;
Expand Down Expand Up @@ -53,6 +55,12 @@ public MaybeDocument get(DocumentKey key) {
return docs.get(key);
}

@Override
public List<MaybeDocument> getAll(Iterable<DocumentKey> keys) {
// FIXME(varconst) placeholder
return new ArrayList<>();
}

@Override
public ImmutableSortedMap<DocumentKey, Document> getAllDocumentsMatchingQuery(Query query) {
ImmutableSortedMap<DocumentKey, Document> result = emptyDocumentMap();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import com.google.firebase.firestore.model.Document;
import com.google.firebase.firestore.model.DocumentKey;
import com.google.firebase.firestore.model.MaybeDocument;
import java.util.List;
import javax.annotation.Nullable;

/**
Expand Down Expand Up @@ -51,6 +52,8 @@ interface RemoteDocumentCache {
@Nullable
MaybeDocument get(DocumentKey documentKey);

List<MaybeDocument> getAll(Iterable<DocumentKey> documentKeys);

/**
* Executes a query against the cached Document entries
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,11 @@
import com.google.firebase.firestore.model.ResourcePath;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.MessageLite;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.annotation.Nullable;

Expand Down Expand Up @@ -66,6 +70,62 @@ public MaybeDocument get(DocumentKey documentKey) {
.firstValue(row -> decodeMaybeDocument(row.getBlob(0)));
}

@Nullable
@Override
public List<MaybeDocument> getAll(Iterable<DocumentKey> documentKeys) {
List<MaybeDocument> result = new ArrayList<>();
if (!documentKeys.iterator().hasNext()) {
return result;
}

// SQLite limits maximum number of host parameters to 999 (see
// https://www.sqlite.org/limits.html). To work around this, split the given keys into several
// smaller sets and issue a separate query for each.
int limit = 900;
Iterator<DocumentKey> keyIter = documentKeys.iterator();
int queriesPerformed = 0;
while (keyIter.hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use for (DocumentKey key: keys) here too I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks, I didn't realize it can be done on Iterable.

++queriesPerformed;
StringBuilder placeholdersBuilder = new StringBuilder();
List<String> args = new ArrayList<>();

for (int i = 0; keyIter.hasNext() && i < limit; i++) {
DocumentKey key = keyIter.next();

if (i > 0) {
placeholdersBuilder.append(", ");
}
placeholdersBuilder.append("?");

args.add(EncodedPath.encode(key.getPath()));
}
String placeholders = placeholdersBuilder.toString();

db.query(
"SELECT contents FROM remote_documents "
+ "WHERE path IN ("
+ placeholders
+ ") "
+ "ORDER BY path")
.binding(args.toArray())
.forEach(
row -> {
result.add(decodeMaybeDocument(row.getBlob(0)));
});
}

// If more than one query was issued, batches might be in an unsorted order (batches are ordered
// within one query's results, but not across queries). It's likely to be rare, so don't impose
// performance penalty on the normal case.
if (queriesPerformed > 1) {
Collections.sort(
result,
(MaybeDocument lhs, MaybeDocument rhs) ->
lhs.getKey().compareTo(rhs.getKey()));
}
return result;
}

@Override
public ImmutableSortedMap<DocumentKey, Document> getAllDocumentsMatchingQuery(Query query) {
// Use the query path as a prefix for testing if a document matches the query.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,16 @@ public static Comparator<Document> keyComparator() {

private final DocumentState documentState;

private com.google.firestore.v1beta1.Document proto;

public void setProto(com.google.firestore.v1beta1.Document protoDoc) {
proto = protoDoc;
}

public com.google.firestore.v1beta1.Document getProto() {
return proto;
}

public Document(
DocumentKey key, SnapshotVersion version, ObjectValue data, DocumentState documentState) {
super(key, version);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,11 +109,13 @@ public void onHeaders(Metadata headers) {
public void onNext(RespT response) {
dispatcher.run(
() -> {
Logger.debug(
AbstractStream.this.getClass().getSimpleName(),
"(%x) Stream received: %s",
System.identityHashCode(AbstractStream.this),
response);
if (Logger.isDebugEnabled()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with Proguard, logging adds about a second to 2-3 seconds the network usually takes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm! I wonder why... Since we're not building up the log string or anything, it seems like this should just be a plain cheap method call (which immediately no-ops on the receiving end), unless getClass().getSimpleName() or System.identityHashCode() is expensive (my bet would be on the former).

Any interest in digging a little deeper to see why it's slow? We may learn something that helps us preemptively improve other parts of the code [e.g. maybe we should cache this.getClass().getSimpleName()].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profiler shows String.format as the top culprit; probably nothing we can do about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still struggling to see how String.format() would be getting called by this code. Maybe we can chat during standup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing me to dig into this. Looks like I got confused here. In the app, logging is on, which is why String.format is called here. When I tested the numbers, looks like I conflated no Proguard with logging, and Proguard with no logging (I used several repos to run tests, one of which had logging turned off).

To hopefully untangle this, I reran the numbers, using three "variables":

  • SDK version;
  • Proguard/no Proguard in the app;
  • logging/no logging.

Results (three runs in each case, Release mode, same device as before):

  • SDK 17.1.1, no Proguard, no logging: 4.4s - 4.8s (raw numbers: 4384ms / 4466ms / 4759ms);
  • SDK 17.1.1, no Proguard, with logging: 4.5s - 4.8s (raw numbers: 4789ms / 4525ms / 4836ms);
  • SDK 17.1.1, with Proguard, no logging: 4.4s - 4.6s (raw numbers: 4565ms / 4362ms / 4836ms);
  • SDK 17.1.1, with Proguard, with logging: 4.5s - 4.6s (raw numbers: 4478ms / 4602ms / 4602ms (not a typo)).

So it seems that in 17.1.1 (IIUC, the last SDK version that was proguarded), the difference between logging enabled and disabled is negligible, and whether the app itself is proguarded doesn't really matter.

  • SDK 17.1.2, no Proguard, no logging: 4.7s - 5.1s (raw numbers: 5064ms / 4992ms / 4717ms);
  • SDK 17.1.2, no Proguard, with logging: 6.3s - 6.5s (raw numbers: 6303ms / 6351ms / 6503ms);
  • SDK 17.1.2, with Proguard, no logging: 4.1s - 4.5s (raw numbers: 4361ms / 4094ms / 4490ms);
  • SDK 17.1.2, with Proguard, with logging: 4.5s - 4.7s (raw numbers: 4536ms / 4503ms / 4682ms).

In 17.1.2, an app pays significant penalty if it enables logging but isn't proguarded. The rest of the numbers are probably within error margin.

  • this branch, no Proguard, no logging: 2.7s - 2.9s (raw numbers: 2774ms / 2851ms / 2690ms);
  • this branch, no Proguard, with logging: 4.8s - 4.9s (raw numbers: 4992ms / 4807ms / 4939ms);
  • this branch, with Proguard, no logging: 2.7s - 2.8s (raw numbers: 2802ms / 2783ms / 2710ms);
  • this branch, with Proguard, with logging: 2.8s - 2.9s (raw numbers: 2881ms / 2861ms / 2795ms).

Surprisingly, for this branch, the penalty of no Proguard/logging seems even higher than 17.1.2 (but perhaps within fluctuation). If Proguard is enabled, logging doesn't add any significant difference.

Logger.debug(
AbstractStream.this.getClass().getSimpleName(),
"(%x) Stream received: %s",
System.identityHashCode(AbstractStream.this),
response);
}
AbstractStream.this.onNext(response);
});
}
Expand Down Expand Up @@ -203,6 +205,7 @@ public void run() {
this.idleTimerId = idleTimerId;
this.listener = listener;
this.idleTimeoutRunnable = new IdleTimeoutRunnable();

backoff =
new ExponentialBackoff(
workerQueue,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1015,6 +1015,7 @@ public WatchChange decodeWatchChange(ListenResponse protoChange) {
!version.equals(SnapshotVersion.NONE), "Got a document change without an update time");
ObjectValue data = decodeFields(docChange.getDocument().getFieldsMap());
Document document = new Document(key, version, data, Document.DocumentState.SYNCED);
document.setProto(docChange.getDocument());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "Avoid encode" optimization: keep the original proto around.

watchChange = new WatchChange.DocumentChange(added, removed, document.getKey(), document);
break;
case DOCUMENT_DELETE:
Expand Down