Skip to content

fix: retry stuck requests/responses #4187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 30, 2018

Conversation

TsvetanMilanov
Copy link
Contributor

Detect stuck http requests/responses and retry them once. Why only once? Because if they get stuck more than once, the problem is not temporary and we just need to inform the user that there is a problem.

In our cloud builds integration tests we reproduced stuck request and we got a network packets dump. It showed that, for some reason, after we send to the server Client Hello, and it returns ACK, it doesn't send us Server Hello. This causes the request to get stuck on waiting the Server Hello packet. After 15 minutes, we try fo send [FYN, ACK], but the server is not responding and we try to send [FYN, ACK] again and again.

While investigating the issue, we found this blog post -> https://www.snellman.net/blog/archive/2017-07-20-s3-mystery/ for stuck responses. That's why we have logic to check for response packets every 10 seconds.

Since the bug is sporadic, we need to simulate it. The easiest way is to build some big Angular project in the cloud using the nativescript-cloud extension on Linux operating system.

Those are the cases which need to be tested and the commands I used to simulate the network issues:

Stuck requests:
- The CLI should fail after 2 minutes if the request is stuck.
- sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -A INPUT -p tcp -s livesync.ly -j DROP - Stop the incoming Server Hello packets from livesync.ly
- tns build cloud android --accountId 1 - The CLI will print this warning The request can't receive any response. Retrying request to ... and 1 minute after that, it will fail with The request can't receive any response.
- The CLI should download the build result if there are no problems during the retry.
- sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -A INPUT -p tcp -s livesync.ly -j DROP - Stop the incoming Server Hello packets from livesync.ly
- tns build cloud android --accountId 1 - The CLI will print this warning The request can't receive any response. Retrying request to ...
- sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -D INPUT -p tcp -s livesync.ly -j DROP - Delete the drop server hello packets rule
- The CLI should download the build result.

Stuck responses:
- The CLI should fail if there are no response body packets received from the server for 20 seconds.
- tns build cloud android --accountId 1 - When the CLI prints Finished cloud build of '', platform: 'Android', configuration: 'Debug', buildId: successfully. Downloading result..., we need to drop all packets from the server.
- sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP - Drop all packets from the server.
- The CLI will print Can't receive all parts of the response. Retrying request to ...
- sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP - Allow the incoming traffic from the server to send the request
- After 1 second execute sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP - Drop all packets from the server. After 10 seconds the CLI should fail with Can't receive all parts of the response.
- The CLI should download the build result if there are no problems during the retry.
- tns build cloud android --accountId 1 - When the CLI prints Finished cloud build of '', platform: 'Android', configuration: 'Debug', buildId: successfully. Downloading result..., we need to drop all packets from the server.
- sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP - Drop all packets from the server.
- The CLI will print Can't receive all parts of the response. Retrying request to ...
- sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP - Allow the incoming traffic from the server to send the request
- The CLI should download the build result.

The CLI should not retry requests with big responses if there is at least one packet sent from the server each 10 seconds.
- sudo iptables -m u32 --u32 "0&0xffff=0x2388:0xffffffff" -A INPUT -p tcp -j DROP - Hacky way to slow down the download speed.
- tns build cloud android --accountId 1 - The download of the build result should be more than 1 minute (because of the hack in the step above). The CLI should not print any warnings and should download the build result.

If you test this PR, don't forget to remove all the iptables rules:
sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -D INPUT -p tcp -s livesync.ly -j DROP
sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
sudo iptables -m u32 --u32 "0&0xffff=0x2388:0xffffffff" -D INPUT -p tcp -j DROP

PR Checklist

What is the current behavior?

Currently when some http request gets stuck while sending or when some download is stuck, the CLI is just waiting.

What is the new behavior?

The CLI detects stuck requests/responses, aborts them and retries them once.

Fixes #4186.

@TsvetanMilanov TsvetanMilanov self-assigned this Nov 29, 2018
@TsvetanMilanov TsvetanMilanov force-pushed the milanov/retry-stuck-http-reqs branch from 21fbfb0 to a4c2f0b Compare November 29, 2018 15:49
Sometimes we have sporadically stuck http requests/responses and in those cases the CLI
is just waiting and not doing anything. If we retry the requests, they
will pass. That's why we need to add a retry logic in the http client.
@TsvetanMilanov TsvetanMilanov force-pushed the milanov/retry-stuck-http-reqs branch from a4c2f0b to 6e280c4 Compare November 29, 2018 16:57
@TsvetanMilanov TsvetanMilanov merged commit 66862e1 into release Nov 30, 2018
@TsvetanMilanov TsvetanMilanov deleted the milanov/retry-stuck-http-reqs branch November 30, 2018 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants