Pulling in batches for async and blocking API with BOLT V4. #637

zhenlineo · 2019-10-10T12:02:37Z

This provides a simple implementation for pulling records in batches.
When a result starts streaming, the driver prefetches one batch of the default fetch size (1000 records). Whenever the driver receives SUCCESS {has_more=true}, it automatically requests the next batch from the server.

A driver user can cancel streaming at any time by calling StatementResult#summary or StatementRunner#close.
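The batching loop described above can be sketched as follows. This is only a toy model, not the driver's implementation: `FakeServer`, `pullAll` and `pullRequests` are hypothetical names introduced for illustration, and the boolean returned by `pull` stands in for the `has_more` field of the SUCCESS message.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batched pulling; none of these classes exist in the driver.
final class BatchPullSketch {

    /** Hypothetical stand-in for the server side of a PULL {n} exchange. */
    static final class FakeServer {
        private final int total;
        private int sent = 0;
        int pullRequests = 0;

        FakeServer(int total) {
            this.total = total;
        }

        /** Streams up to n records into sink and returns the has_more flag. */
        boolean pull(int n, List<Integer> sink) {
            pullRequests++;
            int end = Math.min(total, sent + n);
            while (sent < end) {
                sink.add(sent++);
            }
            return sent < total;
        }
    }

    /** Requests batches of fetchSize until the server reports has_more=false. */
    static List<Integer> pullAll(FakeServer server, int fetchSize) {
        List<Integer> records = new ArrayList<>();
        boolean hasMore = true;
        while (hasMore) {
            hasMore = server.pull(fetchSize, records);
        }
        return records;
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer(2500);
        List<Integer> records = pullAll(server, 1000);
        System.out.println(records.size() + " records in " + server.pullRequests + " PULL requests");
    }
}
```

With 2500 records and a fetch size of 1000, the sketch issues three PULL requests: two full batches and one partial batch.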

Changes to Driver API

StatementResult#summary now behaves like StatementResult#consume: it cancels streaming if streaming has not finished.
StatementResult#consume is removed from the public API.
Unconsumed records in StatementResult cannot be accessed once StatementRunner#close is called.

Nested queries with StatementRunner#run are still supported.

…pull messages. When creating the async result, we write a RUN message, followed by a PULL message. The RUN and PULL messages shall be flushed together. If RUN and PULL are flushed separately, the following scenario may happen:

    C: RUN "RETURN Wrong" {} {mode="r"}
    S: FAILURE Neo.ClientError.Statement.SyntaxError "Variable `Wrong` not defined (line 1, column 8 (offset: 7))"
    C: RESET
    C: PULL {n=1000}
    S: SUCCESS {}
    S: FAILURE Neo.ClientError.Request.Invalid "Message 'PULL Map{n -> Long(1000)}' cannot be handled by a session in the READY state."
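A minimal sketch of the "write both, flush once" idea from the commit message above. `FakeChannel` and `runAndPull` are hypothetical names, not driver or Netty classes; the sketch assumes only that writing queues a message and flushing puts every queued message on the wire as one unit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy outbound channel; illustrates queuing RUN and PULL and flushing them together.
final class FlushTogetherSketch {

    /** Hypothetical channel: write() only queues, flush() sends the queued messages. */
    static final class FakeChannel {
        private final List<String> pending = new ArrayList<>();
        final List<List<String>> flushes = new ArrayList<>();

        void write(String message) {
            pending.add(message);
        }

        void flush() {
            if (!pending.isEmpty()) {
                flushes.add(new ArrayList<>(pending));
                pending.clear();
            }
        }
    }

    /** Queues RUN and PULL, then flushes once so both reach the server together. */
    static void runAndPull(FakeChannel channel, String query, int fetchSize) {
        channel.write("RUN \"" + query + "\"");
        channel.write("PULL {n=" + fetchSize + "}");
        channel.flush();
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel();
        runAndPull(channel, "RETURN 1", 1000);
        System.out.println(channel.flushes);
    }
}
```

Because PULL is already on the wire when the server answers RUN, a failing RUN cannot be followed by a RESET that arrives before the PULL, which is the ordering that produced the second FAILURE in the scenario above.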

michael-simons

General remark: I like the removal of consume. It will break things for people, but will eventually be much clearer than the duality of summary() and consume().

Offering a higher level method like the following

    @Override
    public ResultSummary run() {
        try (AutoCloseableStatementRunner statementRunner = getStatementRunner(this.targetDatabase)) {
            StatementResult result = runnableStatement.runWith(statementRunner);
            return result.consume();
        }
    }

will no longer be possible, right?

It would instead need to look like this

    @Override
    public ResultSummary run() {
        try (AutoCloseableStatementRunner statementRunner = getStatementRunner(this.targetDatabase)) {
            StatementResult result = runnableStatement.runWith(statementRunner);
            while (result.hasNext()) {
                result.next();
            }
            return result.summary();
        }
    }

or do it with .list() and drop the returned values.

Having said that, I would like an overloaded summary that takes a boolean indicating whether to exhaust not only the current batch, but everything.
But then, why remove consume()?

michael-simons · 2019-10-14T12:04:03Z

driver/src/main/java/org/neo4j/driver/StatementResult.java

    /**
     * Return the result summary.
     *
-     * If the records in the result is not fully consumed, then calling this method will force to pull all remaining
-     * records into buffer to yield the summary.
+     * If the records in the result is not fully consumed, then calling this method will exhausts the result.


Maybe clarify that exhausting means exhausting the current batch, not the whole possible result set / size.

No, it will exhaust the whole result. It is equivalent to Subscription#cancel.

We currently cannot discard N, we can only discard ALL.

From a driver user's point of view, the following two snippets are equivalent:

    // Code with 1.7 driver
    public ResultSummary run() {
        try (AutoCloseableStatementRunner statementRunner = getStatementRunner(this.targetDatabase)) {
            StatementResult result = runnableStatement.runWith(statementRunner);
            return result.consume();
        }
    }

    // Code with 4.0 driver
    public ResultSummary run() {
        try (AutoCloseableStatementRunner statementRunner = getStatementRunner(this.targetDatabase)) {
            StatementResult result = runnableStatement.runWith(statementRunner);
            return result.summary();
        }
    }

michael-simons · 2019-10-14T12:04:38Z

driver/src/main/java/org/neo4j/driver/internal/InternalDriver.java

@@ -164,10 +164,10 @@ private static RuntimeException driverCloseException()
        return new IllegalStateException( "This driver instance has already been closed" );
    }

-    public NetworkSession newSession( SessionConfig parameters )
+    public NetworkSession newSession( SessionConfig config )


Unrelated, but good catch 👍

michael-simons · 2019-10-14T12:07:42Z

driver/src/main/java/org/neo4j/driver/internal/FailableCursor.java

@@ -22,5 +22,6 @@

 public interface FailableCursor
 {
+    CompletionStage<Throwable> consumeAsync();
    CompletionStage<Throwable> failureAsync();


Is this still needed? At least from what I read, you are now using consumeAsync where failureAsync on the failable cursor was used before.

I found that the failable cursor is used as an interface for the RxStatementResultCursor as well… Hmm, it feels like that area could use some unification too.

The difference between these two methods:
consumeAsync: Discards all unconsumed records and returns the error, if any, from the whole execution and streaming. Used in StatementRunner#close; past this boundary all records are discarded.
failureAsync: Buffers all unconsumed records into memory and returns the error, if any, from the whole execution and streaming. Used for nested queries between session#run calls, so that the second run does not discard the unconsumed records of the previous run.
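The contrast between the two methods can be illustrated with a toy cursor. This is a sketch under assumptions, not the driver's cursor: `ToyCursor`, its constructor and `buffered()` are hypothetical, and only the method names `consumeAsync` and `failureAsync` with their `CompletionStage<Throwable>` return type come from the diff.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Toy cursor showing "discard" versus "buffer" semantics; not the driver's classes.
final class CursorSketch {

    static final class ToyCursor {
        private final Iterator<String> remote;              // records not yet pulled
        private final Deque<String> buffer = new ArrayDeque<>();
        private final Throwable failure;                    // null when the run succeeded

        ToyCursor(Iterator<String> remote, Throwable failure) {
            this.remote = remote;
            this.failure = failure;
        }

        /** Discards every unconsumed record, then reports the failure (or null). */
        CompletionStage<Throwable> consumeAsync() {
            buffer.clear();
            while (remote.hasNext()) {
                remote.next();
            }
            return CompletableFuture.completedFuture(failure);
        }

        /** Buffers every unconsumed record so it stays readable, then reports the failure (or null). */
        CompletionStage<Throwable> failureAsync() {
            while (remote.hasNext()) {
                buffer.add(remote.next());
            }
            return CompletableFuture.completedFuture(failure);
        }

        int buffered() {
            return buffer.size();
        }
    }

    public static void main(String[] args) {
        ToyCursor nested = new ToyCursor(Arrays.asList("a", "b", "c").iterator(), null);
        nested.failureAsync();
        ToyCursor closing = new ToyCursor(Arrays.asList("a", "b", "c").iterator(), null);
        closing.consumeAsync();
        System.out.println(nested.buffered() + " buffered vs " + closing.buffered() + " buffered");
    }
}
```

In this model a nested run would call `failureAsync` on the previous cursor and keep its three records readable, while closing the runner would call `consumeAsync` and leave nothing behind.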

michael-simons · 2019-10-14T12:16:44Z

driver/src/main/java/org/neo4j/driver/internal/async/NetworkSession.java

@@ -194,7 +196,7 @@ public boolean isOpen()
                if ( cursor != null )
                {
                    // there exists a cursor with potentially unconsumed error, try to extract and propagate it
-                    return cursor.failureAsync();
+                    return cursor.consumeAsync();


This is one of the places I mentioned above…
Wouldn't that change be applicable to other places where failureAsync is used?

michael-simons · 2019-10-14T12:16:58Z

driver/src/main/java/org/neo4j/driver/internal/async/ResultCursorsHolder.java

@@ -74,6 +74,6 @@ private static Throwable findFirstFailure( CompletableFuture<Throwable>[] comple
    {
        return cursorStage
                .exceptionally( cursor -> null )
-                .thenCompose( cursor -> cursor == null ? completedWithNull() : cursor.failureAsync() );
+                .thenCompose( cursor -> cursor == null ? completedWithNull() : cursor.consumeAsync() );


michael-simons · 2019-10-14T12:17:23Z

driver/src/main/java/org/neo4j/driver/internal/cursor/AsyncStatementResultCursorImpl.java

 import org.neo4j.driver.summary.ResultSummary;

-public class AsyncStatementResultCursor implements InternalStatementResultCursor
+public class AsyncStatementResultCursorImpl implements AsyncStatementResultCursor


michael-simons · 2019-10-14T12:27:11Z

driver/src/main/java/org/neo4j/driver/internal/handlers/pulln/BasicPullResponseHandler.java

+ * | onRecord           | X    | X      | yield record ->STREAMING       | X                  | ->CANCELED     |
+ * | onFailure          | X    | X      | ->FAILED                       | X                  | ->FAILED       |
+ *
+ * Currently the error state (marked with X on the table above) might not be enforced.


DONE is an error state?

If the handler is in the DONE state, there is no further transition out of it, so every action is marked as an error.

The top row of the table defines the state you are currently in; the first column defines the action you can perform.

I know this is a bit shaky. I will add a card to improve this class with a proper state machine as you pointed out.
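As a rough idea of what an enforced state machine could look like, here is a sketch covering only the two table rows visible in the hunk (onRecord and onFailure). The column headers are not part of the quoted diff, so the state names and their assignment to columns are assumptions: STREAMING, CANCELED and FAILED appear as transition targets, DONE is mentioned in the discussion, and READY is a guess for the remaining column. Throwing on an X cell is also a choice of this sketch; the Javadoc says the error states might not be enforced today.

```java
// Hypothetical state machine for the two quoted table rows; names are assumptions.
final class PullStateSketch {

    enum State { READY, STREAMING, CANCELED, DONE, FAILED }

    /** onRecord row: only STREAMING and CANCELED accept a record. */
    static State onRecord(State current) {
        switch (current) {
            case STREAMING:
                return State.STREAMING; // yield the record and keep streaming
            case CANCELED:
                return State.CANCELED;  // the record is dropped, state is unchanged
            default:
                throw new IllegalStateException("onRecord is not allowed in state " + current);
        }
    }

    /** onFailure row: a failure while STREAMING or CANCELED moves to FAILED. */
    static State onFailure(State current) {
        switch (current) {
            case STREAMING:
            case CANCELED:
                return State.FAILED;
            default:
                throw new IllegalStateException("onFailure is not allowed in state " + current);
        }
    }

    public static void main(String[] args) {
        State state = State.STREAMING;
        state = onRecord(state);
        state = onFailure(state);
        System.out.println(state);
    }
}
```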

michael-simons · 2019-10-14T12:29:23Z

driver/src/main/java/org/neo4j/driver/internal/handlers/pulln/PullResponseHandler.java

+     */
+    void installSummaryConsumer( BiConsumer<ResultSummary,Throwable> summaryConsumer );
+
+    enum Status


It would be useful to have the state transitions documented here.

Zhen Li added 3 commits October 8, 2019 12:18

Back pressure support for async and blocking API with BOLT V4.

d100fca

Made Statement#summary to cancel streaming if streaming is still on…

47f709d

…going. As a result `StatementResult#consume` is removed as it is the same as `StatementResult#summary` Added tests back.

Ensure StatementResult#summary will discard all local or remote rec…

91e0bb4

…ords. Feature left: Nested session runs should buffer all unconsumed records into memory. AutoPull handler does not support auto read depending on local record buffer.

zhenlineo force-pushed the 4.0-aync-pull-n branch from 7c2baed to fa0f6cc on October 10, 2019 12:26

Changed session.run not discard previous run result, but buffer all i…

da8e744

…nto memory. This enables nested session runs.

zhenlineo force-pushed the 4.0-aync-pull-n branch from fa0f6cc to da8e744 on October 10, 2019 13:13

Adding fetchSize at driver config and session config.

36eb748

zhenlineo force-pushed the 4.0-aync-pull-n branch 3 times, most recently from 6e21f34 to d637500 on October 11, 2019 12:26

zhenlineo force-pushed the 4.0-aync-pull-n branch from d637500 to 9014d06 on October 11, 2019 12:29

zhenlineo requested review from michael-simons and meistermeier October 11, 2019 13:27

michael-simons reviewed Oct 14, 2019

zhenlineo merged commit 8d5c196 into neo4j:4.0 Oct 15, 2019

zhenlineo deleted the 4.0-aync-pull-n branch October 15, 2019 11:31

zhenlineo changed the title ~~Back pressure support for async and blocking API with BOLT V4.~~ Pulling in batches for async and blocking API with BOLT V4. Nov 1, 2019
