
Query never completes fix #426


Closed
wants to merge 1 commit

Conversation

@Squiry Squiry commented Jul 8, 2021

The problem: when our receive buffer fills, conversation demand drops to zero, and the last item is `CommandComplete`, we never request more frames and will never get `ReadyForQuery`.

To fix this, we check whether the conversation can be completed with the message and send `onComplete` even if demand is zero.

Should fix #401
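
As an illustration of the idea only (not the actual code from this PR; the class and frame names below are made up), a conversation that recognizes a terminating frame can complete its downstream even while pending demand is zero, which the Reactive Streams specification permits:

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

// Illustrative sketch only; "ConversationSketch" and "Frame" are hypothetical names,
// not the types touched by this pull request.
final class ConversationSketch {

    enum Frame { ROW, COMMAND_COMPLETE, READY_FOR_QUERY }

    private final Sinks.Many<Frame> sink = Sinks.many().unicast().onBackpressureBuffer();

    Flux<Frame> frames() {
        return sink.asFlux();
    }

    void onFrame(Frame frame) {
        if (frame == Frame.READY_FOR_QUERY) {
            // Complete even when downstream demand is currently zero instead of waiting
            // for a request(n) that will never arrive; Reactive Streams allows onComplete
            // without an outstanding request.
            sink.tryEmitComplete();
            return;
        }
        // A real conversation would track demand and buffer frames; elided here.
        sink.tryEmitNext(frame);
    }
}
```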

@Squiry Squiry requested a review from mp911de July 8, 2021 20:52

mp911de commented Jul 9, 2021

Thanks a lot. I currently do not understand why demand drops to zero. According to the test, we never cancel or issue a `request(n)` ourselves that would cap demand; instead, `StepVerifier` requests `Long.MAX_VALUE` and `limitRate` translates that into multiple `request(1)` calls.

Could it be that we have a bug in our accumulation of `request(n)` calls, so that we do not properly translate `request(1)` into our Netty subscription?

Squiry commented Jul 9, 2021

Actually, it looks more like a `windowUntil` bug. Let me try to explain what's happening.
Normally, `request(1)` calls are propagated back to the `Conversation` after each frame in `FluxWindowPredicate#671`. So we consume our `CommandComplete`, ask for more frames, and close the current window. But when a lot of buffers are involved, the `FluxWindowPredicate#671` part happens after `FluxWindowPredicate#626` has been called, and the requests are not propagated. As a result, we never get our `ReadyForQuery` to complete the `Flux` of `Result`. Request propagation in windows is kind of hard and, to be honest, I have no idea when exactly it breaks.

mp911de commented Jul 9, 2021

Yikes, `windowUntil` immediately reminds me of an earlier issue. I assumed that we don't use `windowUntil` anymore because of that. Back then, `windowUntil` drained the final demand to zero after a few database commands. Paging @simonbasle FYI.

Let me have a look at the code to see how we can fix it by removing the `windowUntil` usage, unless Project Reactor can suggest an alternative.

@mp911de mp911de added labels for: team-attention (An issue we need to discuss as a team to make progress) and type: bug (A general bug) on Jul 9, 2021

Squiry commented Jul 9, 2021

Oh! I got it! In `window`, onNext#234 goes into a guarded drain loop. But onComplete is called anyway, even if a drain is happening on another thread at that time.

@simonbasle

I don't have a total grasp of the issue, but maybe this can help: it is perfectly fine for a `Subscription` to call `Subscriber#onComplete` despite having zero pending demand, so I have a feeling this change is acceptable.

Let's take the example of a publisher that can emit 1, 2, 3. If the subscriber calls `request(2)`, it MUST of course refrain from emitting 3. But if the subscriber calls `request(3)`, then it can and should emit 1, 2, 3 and `onComplete`.
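
As a standalone illustration of that rule (not code from this PR), a small `StepVerifier` test: the subscriber requests exactly three elements, demand drains back to zero once they arrive, and the publisher may still signal `onComplete`.

```java
import org.junit.jupiter.api.Test;
import reactor.core.publisher.Flux;
import reactor.test.StepVerifier;

class CompletionWithZeroDemandTest {

    @Test
    void onCompleteMayArriveWithZeroPendingDemand() {
        // Initial request of 0, then an explicit request(3): once the three elements
        // are delivered there is no pending demand left, yet onComplete is still valid.
        StepVerifier.create(Flux.just(1, 2, 3), 0)
                .thenRequest(3)
                .expectNext(1, 2, 3)
                .verifyComplete();
    }
}
```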

@simonbasle

Oh! I got it! In `window`, onNext#234 goes into a guarded drain loop. But onComplete is called anyway, even if a drain is happening on another thread at that time.

ah, so there's a bug in reactor then? 🕵️

Squiry commented Jul 9, 2021

it is perfectly fine for a `Subscription` to call `Subscriber#onComplete` despite having zero pending demand

It is, but I'm afraid that some of the demand management will go wrong after such demand hacks.

ah, so there's a bug in reactor then? 🕵️

I guess someone with better Reactor knowledge should confirm that; I could have understood the code wrong.

Squiry commented Jul 9, 2021

I assumed that we don't use `windowUntil` anymore because of that.

Nah, we had some problems in the Extended Flow because it's more complex, and we had to write it in another way AFAIK. The Simple Flow still uses windows.

@simonbasle

I guess someone with better Reactor knowledge should confirm that; I could have understood the code wrong.

after looking at the code, I'm not sure about that part. I guess if you find a way to demonstrate the issue in a reactor-core test case, that would greatly help.

Squiry commented Jul 9, 2021

@simonbasle try this one. The window should be larger than the `windowUntil` buffer (256), plus a little bit more to make sure that `COMMAND_COMPLETE` arrives after.

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;
import reactor.test.StepVerifier;

import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReactorTest {
    @Test
    @Timeout(10)
    void windowUntilTest() {
        final int windowSize = 315;
        AtomicInteger produced = new AtomicInteger(0);
        // Emit (windowSize - 1) ROW frames, then COMMAND_COMPLETE as frame number windowSize,
        // then READY_FOR_QUERY and complete.
        Flux<Frames> p = Flux.generate(sink -> {
            int currentFrame = produced.incrementAndGet();
            if (currentFrame > windowSize) {
                System.out.println("READY_FOR_QUERY");
                sink.next(Frames.READY_FOR_QUERY);
                sink.complete();
                return;
            }
            if (currentFrame == windowSize) {
                System.out.println("COMMAND_COMPLETE");
                sink.next(Frames.COMMAND_COMPLETE);
                return;
            }
            System.out.println("ROW");
            sink.next(Frames.ROW);
        });


        // windowUntil closes a window on COMMAND_COMPLETE; limitRate(1, 1) ensures the
        // upstream only ever sees request(1) calls, one frame at a time.
        Flux.from(p)
            .windowUntil(e -> e == Frames.COMMAND_COMPLETE)
            .concatMap(f -> f)
            .limitRate(1, 1)
            .delayElements(Duration.of(10, ChronoUnit.MILLIS), Schedulers.single())
            .limitRate(1, 1)
            .as(StepVerifier::create)
            .expectNextCount(windowSize + 1) // (windowSize - 1) ROWs + COMMAND_COMPLETE + READY_FOR_QUERY
            .verifyComplete();

    }

    enum Frames {
        ROW, COMMAND_COMPLETE, READY_FOR_QUERY
    }
}
```

Oh! I got it! In `window`, onNext#234 goes into a guarded drain loop. But onComplete is called anyway, even if a drain is happening on another thread at that time.

And it's not on another thread: the drain happens on the same thread, just higher up the stack.

Squiry commented Jul 14, 2021

@mp911de are we waiting for @simonbasle's response, or should I reimplement the simple flow without windows?

mp911de commented Jul 14, 2021

Let's move forward with removing windowUntil for the time being and see what will break.
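
Not the actual change from the later commits, just a rough sketch of one possible direction under the assumption that a simple-query conversation ends with a `ReadyForQuery` frame: demarcate the frame stream with `takeUntil` instead of `windowUntil`, so no window operator sits between the consumer and the upstream subscription and `request(n)` propagates straight through.

```java
import reactor.core.publisher.Flux;

// Rough sketch with a made-up Frame enum; not the r2dbc-postgresql implementation.
public class NoWindowSketch {

    enum Frame { ROW, COMMAND_COMPLETE, READY_FOR_QUERY }

    public static void main(String[] args) {
        // Hypothetical frame stream standing in for the backend message flux.
        Flux<Frame> frames = Flux.just(
                Frame.ROW, Frame.ROW, Frame.COMMAND_COMPLETE, Frame.READY_FOR_QUERY);

        frames
                // Stop after the terminating frame; downstream demand flows directly
                // upstream because there is no window operator in between.
                .takeUntil(frame -> frame == Frame.READY_FOR_QUERY)
                .filter(frame -> frame == Frame.ROW)
                .subscribe(System.out::println);
    }
}
```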

mp911de pushed a commit that referenced this pull request Jul 30, 2021
The problem: when our receive buffer fills, conversation demand drops to zero, and the last item is `CommandComplete`, we never request more frames and will never get `ReadyForQuery`.

Signed-off-by: Mark Paluch <[email protected]>

[closes #401][#426]
mp911de added a commit that referenced this pull request Jul 30, 2021
Remove windowUntil(…) to avoid miscalculation of demand in long-running streams. Refine discardOnCancel operator calls to properly guard simple queries.

[#401][#426]

Signed-off-by: Mark Paluch <[email protected]>

mp911de commented Jul 30, 2021

Thank you for your contribution. That's merged, polished, and backported now. I removed windowUntil and disabled the compound statement TCK test. In return, we now correctly propagate the demand.

@mp911de mp911de closed this Jul 30, 2021
@mp911de mp911de added this to the 0.8.9.RELEASE milestone Jul 30, 2021
mp911de added a commit that referenced this pull request Jul 30, 2021
[#401][#426]

Signed-off-by: Mark Paluch <[email protected]>
Labels
for: team-attention (An issue we need to discuss as a team to make progress)
type: bug (A general bug)

Successfully merging this pull request may close these issues: Fetch rows never completes

3 participants