Performance: Decode documents in background thread #559


Merged
merged 34 commits into master from mrschmidt/parallel
Jul 2, 2019

Conversation

schmidt-sebastian
Contributor

@schmidt-sebastian schmidt-sebastian commented Jun 21, 2019

This PR reduces the time it takes to execute getAllDocumentsMatchingQuery by moving the proto deserialization into Android's background queue.

Note that the only real complication here is that Android's THREAD_POOL_EXECUTOR uses a LinkedBlockingQueue with a fixed size of 120. We can't enqueue more than 120 elements without triggering RejectedExecutionExceptions. Instead, this PR uses its own queue and up to 4 runners (the core pool size of the THREAD_POOL_EXECUTOR).
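The approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: a plain `Executor` stands in for `AsyncTask.THREAD_POOL_EXECUTOR`, and the names (`QueuedExecutor`, `maxRunners`) are made up. Tasks go into an unbounded queue of our own, and at most `maxRunners` drain loops are ever submitted to the delegate, so the delegate's 120-slot queue is never overrun:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

class QueuedExecutor implements Executor {
  private final Executor delegate;
  private final int maxRunners;
  private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
  private final AtomicInteger activeRunners = new AtomicInteger();

  QueuedExecutor(Executor delegate, int maxRunners) {
    this.delegate = delegate;
    this.maxRunners = maxRunners;
  }

  @Override
  public void execute(Runnable command) {
    tasks.add(command);
    maybeStartRunner();
  }

  private void maybeStartRunner() {
    while (!tasks.isEmpty()) {
      int runners = activeRunners.get();
      if (runners >= maxRunners) {
        return; // an already-active runner will pick the task up
      }
      if (activeRunners.compareAndSet(runners, runners + 1)) {
        delegate.execute(this::drainQueue);
        return;
      }
    }
  }

  private void drainQueue() {
    Runnable task;
    while ((task = tasks.poll()) != null) {
      task.run();
    }
    activeRunners.decrementAndGet();
    // Re-check: a task enqueued between our last poll() and the decrement
    // would otherwise be stranded until the next execute() call.
    maybeStartRunner();
  }
}
```

The final re-check in `drainQueue` closes the race where a producer enqueues a task just as the last runner is shutting down.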

On a Nexus 5X, reading 1700 documents from a single collection takes:

  • without PR: 3621ms, 3600ms, 3637ms, 3550ms, 3593ms (avg 3605 ms)
  • with this PR: 1498ms, 1438ms, 1502ms, 1496ms, 1433ms (avg 1473 ms)

Reading a collection with 10 documents 100 times:

  • without PR: 2295 ms, 2280 ms, 2319 ms, 2296ms, 2315ms (avg 2301 ms)
  • with PR: 1380ms, 1646 ms, 1503 ms, 1191ms, 1451 ms (avg 1416 ms)

Reading a collection with 2 documents 1000 times:

  • without PR: 8902ms, 8520ms, 7472ms, 7953ms, 10907ms (avg 8751 ms)
  • with PR: 7901ms, 8458ms, 9109ms, 8730ms, 7781ms (avg 8395 ms)

Reading a collection with 1 document 1000 times:

  • without PR: 780ms, 763ms, 724ms, 774ms, 727ms (avg 753 ms)
  • with PR: 796ms, 752ms, 724ms, 786ms, 722ms (avg 756 ms)

Note that without the special handling of single-document collections, the per-read time for a one-document collection roughly doubled (from ~0.7 ms to ~1.5 ms).

@googlebot googlebot added cla: yes Override cla labels Jun 21, 2019
@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/parallel branch 2 times, most recently from 84bea0f to 8e5872e on June 21, 2019 22:29
@schmidt-sebastian
Contributor Author

/retest

@@ -1,4 +1,6 @@
# Unreleased
- [changed] Reduced execution time of queries with large result sets by up to
Contributor

Using "reduced" as the verb here seems like the wrong pairing. You reduce resource consumption but improve (or increase) performance; that seems more customary.

How about "Improved the performance of queries with large result sets by up to 60%"?

Contributor Author

Done.

}

results.put(doc.getKey(), doc);
byte[] rawContents = row.getBlob(1);
Contributor

This one method is now huge, tightly intermingling threading implementation details with the query processing. Most of this could be generic. There are other circumstances where we potentially process large numbers of rows (e.g. the mutation queue) and this could be extended to those places. Probably even more importantly, if we pull this out into something separate we could test it directly :-).

One thing that's preventing this from being generic is that the intermediate results are stored in a concurrent map, which ties the results to the query structure. However, we don't need this to be a map of key to document until we're assembling the result. We could just as easily publish each document into a deque.

Additional notes:

  • This implementation leaves time on the table by not starting query matching until all the protos are decoded. The Firestore worker thread that was processing query results can't start assembling the result before the worker threads are done.
  • Also, we're performing query matching serially on the Firestore worker, when we could be performing it in the parallel block.
  • This is a very low number of threads that are all doing significant work per item. We could probably get away with just a simple lock wrapping our output without building an intermediate ConcurrentHashMap.

If we were targeting Java 8, I'd suggest that you've essentially reimplemented a subset of java.util.stream. There are ports of that to Android API < 24, but that's a lot of code for something that's pretty small.

At a high level this could be phrased as map/reduce, where the map phase is parsing and filtering (first by instanceof Document, second by query.matches). The reduce phase is the assembly of the matchingDocuments map.

I suggest a few refactoring experiments:

  • See if pulling query matching into the worker threads helps
  • See if directly assembling the ImmutableSortedMap while holding a lock works
    • If not, see if a regular Deque of matching docs while holding a lock is OK
    • Alternatively try a ConcurrentLinkedQueue
  • Then try to extract a component that does the parallel processing in a way that's more self contained

WDYT?
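The map/reduce shape suggested above could be sketched as follows, under the assumption that decoding and query matching are both safe off the worker thread (an assumption the thread below goes on to debate). All names here are illustrative; `decode` and `matches` stand in for proto parsing and `query.matches`:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;

class ParallelMapper {
  /** Decodes each raw row in parallel, keeps the rows matching the predicate,
   *  and collects results into a plain deque guarded by a simple lock. */
  static <R, D> Deque<D> mapAndFilter(
      List<R> rows, Function<R, D> decode, Predicate<D> matches, int threads)
      throws InterruptedException {
    Deque<D> results = new ArrayDeque<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (R row : rows) {
      pool.execute(() -> {
        D doc = decode.apply(row);   // "map": parse the proto off the worker thread
        if (matches.test(doc)) {     // filter: e.g. instanceof Document, query.matches
          synchronized (results) {   // a plain lock instead of a ConcurrentHashMap
            results.add(doc);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return results;                  // "reduce": caller assembles the sorted map
  }
}
```

With few threads each doing significant work per item, the contention on the single lock should be negligible, which is the point of the third bullet above.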

Contributor Author

@schmidt-sebastian schmidt-sebastian Jun 24, 2019

I'll try to make this more generic, but I didn't want this to turn into a 500 line change with multiple new classes. I'll give it another go though.

Re: Query processing. Query matching is not thread safe (because of

) and if we do want to perform this concurrently, then I would rather tackle it in a follow-up PR.

Contributor

Does this turn into a problem in practice? What if we access the orderBy in this thread before enqueuing any work?

Contributor Author

@mikelehen mostly convinced me that query matching is not a problem in practice. I also believe that query matching itself is not a huge bottleneck, but it doesn't hurt to also execute it in the background, especially given that we are already running a task there.

I rewrote the code in this function to use the Tasks API. By that I mean I added APIs to Util that mirror the equivalent methods of the Tasks API, but with some significant differences:

  • My version of await can be called on the main thread to support unit testing of the RemoteDocumentStore.
  • My version of whenAllComplete doesn't execute the continuation on the main queue, which blocks our SpecTests on Robolectric.

If you like this approach, I can write dedicated tests for these helpers. We might also want to move them to a dedicated place (such as TaskUtil).

@schmidt-sebastian
Contributor Author

Taking this one back for now, since the Tasks change likely makes single document fetches much slower. It always queues an execution on a background thread.

@schmidt-sebastian
Contributor Author

schmidt-sebastian commented Jun 25, 2019

I updated the PR to add a new queue for background tasks, which doesn't rely on Android's Task API to do its management (since that would move all state keeping to a background thread). The queue is not super general-purpose, but it works for the use case in this PR and should also work if we choose to do something similar in the mutation queue.

Contributor

@wilhuff wilhuff left a comment

Have only reviewed the executor here.

* handle an unbounded number of pending tasks.
*/
public static final Executor BACKGROUND_EXECUTOR =
new Executor() {
Contributor

Please don't make major components we're going to reason about like this anonymous classes. Make it a private static final class with a name. This will make stack traces much, much saner.

* An executor that runs tasks in parallel on Android's AsyncTask.THREAD_POOL_EXECUTOR.
*
* <p>Unlike the main THREAD_POOL_EXECUTOR, this executor manages its own queue of tasks and can
* handle an unbounded number of pending tasks.
Contributor

Having an unbounded number of pending tasks seems problematic from a memory usage point of view.

Previously we'd unpack one blob, parse it, compare it, and then allow it to be GCed.

Now, if we get significantly ahead of the background threads' ability to clear this queue each queued document will be resident. Previously, ignoring constant factors around the current document we're looking at, our memory usage was essentially O(matching documents), but this increases it to potentially be O(all documents in a collection). This could be a huge difference. We should block admission into the queue if the background threads aren't cleaning up.

A straightforward way to bound this is to just use a semaphore with a fixed number of permits to guard admission. This will have the side effect of fixing the thread-safety issue you've documented below, which makes me uncomfortable.

// While undesired, this would merely queue another task on THREAD_POOL_EXECUTOR,
// and we are unlikely to hit the 120 pending task limit.
activeRunnerCount.incrementAndGet();
AsyncTask.THREAD_POOL_EXECUTOR.execute(
Contributor

We still need to handle the fact that AsyncTask.THREAD_POOL_EXECUTOR.execute can reject adding this task.

We should implement a caller runs policy, rather than trying to block on the executor becoming available. i.e.

semaphore.acquire();
Runnable wrappedCommand = () -> {
  try {
    command.run();
  } finally {
    semaphore.release();
  }
};
try {
  AsyncTask.THREAD_POOL_EXECUTOR.execute(wrappedCommand);
} catch (RejectedExecutionException ignored) {
  wrappedCommand.run();
}

Contributor

Actually it occurs to me that there's essentially no reason to block under any circumstance. If we choose to implement a semaphore to limit the number of tasks we admit into the underlying executor, we could tryAcquire, and if we don't get a ticket just run the (unwrapped) command directly rather than blocking.

We probably still want a semaphore so that we don't abuse the AsyncTask.THREAD_POOL_EXECUTOR. If not for the fact that other participants are unlikely to be well behaved, we could just rely on the rejected exceptions + caller runs to implement pushback.
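The non-blocking variant suggested in this comment could look roughly like the following sketch. A plain `Executor` stands in for `AsyncTask.THREAD_POOL_EXECUTOR`, and the class name echoes the ThrottledForwardingExecutor the PR eventually adds, but the details here are illustrative only:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

class ThrottledForwardingExecutorSketch implements Executor {
  private final Executor delegate;
  private final Semaphore permits;

  ThrottledForwardingExecutorSketch(Executor delegate, int maxConcurrency) {
    this.delegate = delegate;
    this.permits = new Semaphore(maxConcurrency);
  }

  @Override
  public void execute(Runnable command) {
    if (!permits.tryAcquire()) {
      command.run(); // no permit: caller runs, providing natural pushback
      return;
    }
    Runnable wrapped = () -> {
      try {
        command.run();
      } finally {
        permits.release();
      }
    };
    try {
      delegate.execute(wrapped);
    } catch (RejectedExecutionException e) {
      wrapped.run(); // delegate full despite the permit: caller runs
    }
  }
}
```

Running on the caller's thread bounds both memory (at most `maxConcurrency` tasks are in flight) and the load placed on the shared delegate pool, without ever blocking the producer.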

Contributor

@wilhuff wilhuff left a comment

Completed the review.

The shape of this is now pretty great, but I have a bunch of questions/notes about the details.


import java.util.concurrent.Semaphore;

/**
* A queue for parallel execution of independent tasks.
Contributor

This comment should describe more about what's going on here.

First: why does this thing exist? There are plenty of mechanisms for parallel execution of independent tasks out there. In six months we're going to be wondering why this didn't just use Android Tasks or otherwise use some of the existing mechanisms you tried and discarded.

Second: what's the intended use case? Normally people think of task queues as things that live for a long time. This thing is meant to represent a stream of small work items that make up some larger unit of work, where that larger unit of work needs to be something that can be completed. Instances of the class should essentially only live for the duration of that larger unit of work.

Your thread safety comment sort of hints at this, but doesn't spell it out. The intent is that there's some single thread producing work items to insert here. This can't be used in circumstances where there are multiple producer threads.

Contributor Author

I'll defer addressing this comment in case we end up removing this class again.

*
* <p>This class is not thread-safe. All public methods must be called from the same thread.
*/
public class TaskQueue<T> {
Contributor

This name is pretty generic, and especially when compared against AsyncQueue which already exists I start to wonder when I might use one over the other. Also, AsyncQueue executes serially so it could be helpful to name this in a way that explicitly shows itself to be non-serial.

Consider renaming this to something that more closely suggests its purpose: ParallelMapper? ParallelRunner? BackgroundTaskRunner?

Contributor Author

I'll defer addressing this comment in case we end up removing this class again.

}

@Override
public int compareTo(TaskResult<T> other) {
Contributor

Does execution order matter? In the one case we have, it doesn't, because each result is just fed into a map that will order by key.

I suppose maybe this matters in the case of processing the mutation queue? If so please adjust the comment above about the rationale for this. "Allowing for sorting" doesn't intrinsically justify why we're doing this.

Contributor Author

I feel this class is a lot more useful if it returns an ordered set of results. It's true that this is not needed here, but if we simplify it to what's actually needed we just end up with a task counter and a semaphore (we also never throw checked exceptions). We might also not want to store the intermediate map that often mostly contains null values.
All that brings me back to the original version of this PR, which was simpler and a bit more explicit about what it did, at the cost of increasing method size.

TaskResult<T> currentResult = taskResults.take();

if (currentResult.exception != null) {
throw new ExecutionException("Unhandled exception in task", currentResult.exception);
Contributor

Assuming task result ordering doesn't matter, since your policy here is just to throw on the first exception we encountered you could save all this translation code (and the TaskResult type declaration even) if you made the completedTasks buffer just contain T (and then stored the first exception separately).
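The simplification proposed here could be sketched as follows. `ResultBuffer`, `onSuccess`, and `onFailure` are illustrative names, not the PR's code: results are buffered as plain `T` values and only the first exception is retained, replacing the per-item `TaskResult` wrapper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class ResultBuffer<T> {
  private final BlockingQueue<Optional<T>> completed = new LinkedBlockingQueue<>();
  private final AtomicReference<Exception> firstError = new AtomicReference<>();

  void onSuccess(T result) {
    completed.add(Optional.of(result));
  }

  void onFailure(Exception e) {
    firstError.compareAndSet(null, e); // remember only the first failure
    completed.add(Optional.empty());   // still counts toward completion
  }

  /** Blocks until expectedCount tasks have finished; throws the first failure. */
  List<T> await(int expectedCount) throws Exception {
    List<T> results = new ArrayList<>(expectedCount);
    for (int i = 0; i < expectedCount; i++) {
      completed.take().ifPresent(results::add);
    }
    Exception e = firstError.get();
    if (e != null) {
      throw e;
    }
    return results;
  }
}
```

The empty `Optional` on failure keeps `await` from blocking forever when a task throws instead of producing a result.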

}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
Contributor

I worry slightly that this will return partial allResults if this method is interrupted. If the caller performs no further I/O or blocking operations, they may not see another exception and may proceed as if the contents of allResults are all the actual results. It seems risky to rely on something else to observe the interrupted status to conclude that the results are actually incorrect.

go/java-practices/interruptedexception recommends throwing an alternative exception (after re-interrupting). Should we just propagate InterruptedException here and let the caller figure it out? Alternatively, should we throw the appropriate FirebaseFirestoreException right here? It's going to be hard to describe what we were trying to do, so it seems like propagating is the right thing.
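One way to follow that recommendation is sketched below: re-interrupt, then fail loudly rather than returning partial results. The helper name and the choice of an unchecked `IllegalStateException` are illustrative stand-ins (the thread debates `InterruptedException`, `FirebaseFirestoreException`, and an assertion failure as alternatives):

```java
import java.util.concurrent.BlockingQueue;

class AwaitHelper {
  static <T> T takeOrFail(BlockingQueue<T> queue) {
    try {
      return queue.take();
    } catch (InterruptedException e) {
      // Restore the interrupted status so code further up still observes it...
      Thread.currentThread().interrupt();
      // ...but also surface a distinct failure so the caller cannot silently
      // proceed with an incomplete set of results.
      throw new IllegalStateException("Interrupted while awaiting task results", e);
    }
  }
}
```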

Contributor Author

The page you linked to mostly deals with cancelled RPCs, in which case an interrupt might be an expected event that needs to be treated explicitly. If we follow this advice, then awaitResults needs to re-throw some checked exception (either InterruptedException or FirebaseFirestoreException). This places all the burden on the callsite and again makes me question the usefulness of this class.

If we do see an InterruptedException here, we can probably just assume that the client will no longer be in a usable state afterwards (`acquire()` and `take()` always run on our AsyncQueue). I'm trying to convince myself that throwing an AssertionFailure is the right thing to do here.

schmidt-sebastian and others added 12 commits June 27, 2019 14:26
* Migrate ExperimentPayload to AB Testing

* Add issue link.

* Export proto definition.
Inapp Messaging does not have an interface to trigger updates from the
device. As a result, this test simply verifies initialization and build
compatibility.

However, the latest release of Inapp Messaging is build incompatible
with the head state of AB Testing. As such, this test is disabled for
now.
This change adds the necessary proguard files and a configuration block
for release builds. However, this change does not yet enable testing
against the release build.
* Move DocumentId to public space

* add nest object testing

* add changelog
@schmidt-sebastian
Contributor Author

Updated the PR based on offline discussion:

  • Added tests/comments for ThrottledForwardingExecutor.
  • Made the "TaskQueue" much less generic and only capable of handling the needs of SQLiteRemoteDocumentCache.

@schmidt-sebastian
Contributor Author

@wilhuff When you have time, can you take another look?

Contributor

@wilhuff wilhuff left a comment

LGTM with a few final nits.

* <p>This class is not thread-safe. In particular, `execute()` and `drain()` should not be called
* from parallel threads.
*/
static class BackgroundQueue implements Executor {
Contributor

nit: The class in which you've placed this class is huge. I'd love to see us move things out of here if possible.

Why not promote this to top-level? It's also totally generic, so you could also dump it in util, but leaving it local seems fine too.

Contributor Author

Moved to a top-level class in Util. For now, it doesn't have tests though.
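Based on the snippet above, a top-level BackgroundQueue might look roughly like this sketch (assumed details, not the PR's actual code): an `Executor` that counts submitted tasks and lets the single producer thread `drain()` until all of them have completed, using a semaphore that workers release as they finish:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

class BackgroundQueue implements Executor {
  private final Executor delegate;
  private final Semaphore completedTasks = new Semaphore(0);
  private int submittedTasks = 0;

  BackgroundQueue(Executor delegate) {
    this.delegate = delegate;
  }

  // Not thread-safe: execute() and drain() must be called from one thread.
  @Override
  public void execute(Runnable command) {
    submittedTasks++;
    delegate.execute(() -> {
      try {
        command.run();
      } finally {
        completedTasks.release(); // signal one task done
      }
    });
  }

  /** Blocks until every task submitted so far has run. */
  void drain() throws InterruptedException {
    completedTasks.acquire(submittedTasks);
    submittedTasks = 0;
  }
}
```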

// finished.
int numTasks = maximumConcurrency + 1;
CountDownLatch schedulingLatch = new CountDownLatch(1);
for (int i = 0; i < numTasks; ++i) {
Contributor

It doesn't seem like there's any difference between currentTask and i. Maybe just make the loop use currentTask?

Similarly, numTasks isn't used except as the loop bound, maybe just use maximumConcurrency + 1 as the loop condition? (I bet if you shortened it to maxConcurrency it might even all still fit in one line.)

Contributor Author

currentTask is the "effectively final" version of i and is used in the inner class created below.

Furthermore, numTasks is actually used twice: once in the loop and then again below in line 60. For that reason, I left it as is.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jul 2, 2019
@schmidt-sebastian schmidt-sebastian merged commit 276f279 into master Jul 2, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/parallel branch July 3, 2019 15:46
@firebase firebase locked and limited conversation to collaborators Oct 9, 2019
Labels
cla: yes Override cla size/L

7 participants