fix: retry stuck requests/responses #4187
Merged
Detect stuck HTTP requests/responses and retry them once. Why only once? Because if they get stuck a second time, the problem is not temporary, and we just need to inform the user that there is a problem.
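The retry-once behavior can be sketched in TypeScript as follows. This is a minimal sketch under our own assumptions, not the CLI's actual code: `retryOnceIfStuck` is a hypothetical helper, an attempt counts as stuck when it has not settled within `timeoutMs`, and each attempt receives an `AbortSignal` so it can tear down its request (both `fetch` and Node's `http.request` accept a `signal` option). Errors other than a timeout are not retried.

```typescript
// Hypothetical helper (not the CLI's code): abort an attempt that does not
// settle within timeoutMs and retry it exactly once.
const STUCK_MESSAGE = "The request can't receive any response.";

async function retryOnceIfStuck<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  onRetry: () => void = () => console.warn(`${STUCK_MESSAGE} Retrying request...`),
): Promise<T> {
  for (let attemptNo = 1; attemptNo <= 2; attemptNo++) {
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Rejects once this attempt has been pending for timeoutMs ("stuck").
    const stuck = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        controller.abort(); // ask the attempt to tear down its request/socket
        reject(new Error(STUCK_MESSAGE));
      }, timeoutMs);
    });
    try {
      // Even if the attempt ignores the signal, the race lets us move on.
      return await Promise.race([attempt(controller.signal), stuck]);
    } catch (err) {
      if (!controller.signal.aborted) throw err; // real errors are not retried
      if (attemptNo === 2) throw new Error(STUCK_MESSAGE); // stuck twice: give up
      onRetry(); // stuck once: warn and try again
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("unreachable");
}
```

With a 60000 ms timeout per attempt, a request that stays stuck is reported after about 2 minutes, in line with the timing described in the stuck-request test cases.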
In our cloud builds integration tests, we reproduced a stuck request and captured a network packet dump. It showed that, for some reason, after we send the Client Hello to the server and it responds with an ACK, it never sends us the Server Hello. As a result, the request gets stuck waiting for the Server Hello packet. After 15 minutes, we try to send [FIN, ACK], but the server does not respond, and we keep sending [FIN, ACK] again and again.
While investigating the issue, we also found this blog post about stuck responses: https://www.snellman.net/blog/archive/2017-07-20-s3-mystery/ That's why we check for incoming response packets every 10 seconds.
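The periodic check for response packets can be sketched as a small watchdog. This is a hypothetical TypeScript helper, not the CLI's implementation: every `intervalMs` it checks whether at least one chunk arrived since the previous check and calls `onStall` if none did. With this design, a stall is detected between one and two intervals after the last chunk (10 to 20 seconds for a 10-second interval), while a slow download that keeps delivering chunks is never interrupted.

```typescript
// Hypothetical stall watchdog (not the CLI's code): every intervalMs it checks
// whether at least one response chunk arrived since the previous check.
function watchForStall(intervalMs: number, onStall: () => void) {
  let chunkSinceLastCheck = false;
  const timer = setInterval(() => {
    if (!chunkSinceLastCheck) {
      clearInterval(timer);
      onStall(); // e.g. destroy the response and retry the request once
      return;
    }
    chunkSinceLastCheck = false; // start a new observation window
  }, intervalMs);
  return {
    // Call for every received chunk of the response body.
    onChunk(): void {
      chunkSinceLastCheck = true;
    },
    // Call when the response ends or fails, so the watchdog stops.
    stop(): void {
      clearInterval(timer);
    },
  };
}
```

In Node.js, `onChunk` would be wired to the response's "data" event and `stop` to its "end" and "error" events; `onStall` would destroy the response and trigger the single retry.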
Since the bug is sporadic, we need to simulate it. The easiest way is to build a big Angular project in the cloud with the nativescript-cloud extension on Linux.
These are the cases that need to be tested and the commands I used to simulate the network issues:
Stuck requests:
- The CLI should fail after 2 minutes if the request is stuck.
  1. Stop the incoming Server Hello packets from livesync.ly:
     sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -A INPUT -p tcp -s livesync.ly -j DROP
  2. Run the cloud build:
     tns build cloud android --accountId 1
     The CLI will print the warning "The request can't receive any response. Retrying request to ..." and, 1 minute after that, it will fail with "The request can't receive any response."
- The CLI should download the build result if there are no problems during the retry.
  1. Stop the incoming Server Hello packets from livesync.ly:
     sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -A INPUT -p tcp -s livesync.ly -j DROP
  2. Run the cloud build:
     tns build cloud android --accountId 1
     The CLI will print the warning "The request can't receive any response. Retrying request to ...".
  3. Delete the rule that drops the Server Hello packets:
     sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -D INPUT -p tcp -s livesync.ly -j DROP
     The CLI should download the build result.
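The Server Hello rule relies on fixed byte offsets. An iptables u32 test such as "49&0xFF=0x16" reads 4 bytes starting at byte 49 of the IP packet (big-endian), masks them and compares the result, so it effectively checks byte 52; likewise "54&0xFF=0x02" checks byte 57. Assuming a 20-byte IPv4 header and a 32-byte TCP header (the usual size when the timestamps option is present), byte 52 is the TLS record content type (0x16, handshake) and byte 57 is the handshake message type (0x02, ServerHello). The TypeScript sketch below replays that check on a synthetic packet; `u32Test` is a hypothetical helper, not part of iptables or the CLI, and packets with other header sizes would not match the rule.

```typescript
// Hypothetical helper mimicking one iptables u32 test "offset&mask=value":
// read 4 bytes big-endian starting at `offset`, AND them with `mask`, compare.
function u32Test(packet: Uint8Array, offset: number, mask: number, value: number): boolean {
  if (offset + 4 > packet.length) return false; // treat reads past the end as no match
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return ((view.getUint32(offset) & mask) >>> 0) === value; // getUint32 is big-endian by default
}

// Synthetic first segment of a server's TLS flight (assumed header sizes):
// 20-byte IPv4 header + 32-byte TCP header.
const IP_HEADER_LEN = 20;
const TCP_HEADER_LEN = 32;
const TLS_START = IP_HEADER_LEN + TCP_HEADER_LEN; // byte 52

const packet = new Uint8Array(TLS_START + 6);
packet[0] = 0x45;                    // IPv4, header length 5 words = 20 bytes
packet[IP_HEADER_LEN + 12] = 8 << 4; // TCP data offset 8 words = 32 bytes
packet[TLS_START] = 0x16;            // TLS record content type: handshake
packet[TLS_START + 1] = 0x03;        // record version 0x0303
packet[TLS_START + 2] = 0x03;
packet[TLS_START + 5] = 0x02;        // handshake message type: ServerHello (byte 57)

// The two tests from the rule: bytes 49..52 and bytes 54..57, low byte only.
const isServerHello = u32Test(packet, 49, 0xff, 0x16) && u32Test(packet, 54, 0xff, 0x02);
console.log(isServerHello); // true
```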
Stuck responses:
- The CLI should fail if no response body packets are received from the server for 20 seconds.
  1. Run the cloud build:
     tns build cloud android --accountId 1
  2. When the CLI prints "Finished cloud build of '', platform: 'Android', configuration: 'Debug', buildId: successfully. Downloading result...", drop all packets from the server:
     sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
     The CLI will print "Can't receive all parts of the response. Retrying request to ..."
  3. Allow the incoming traffic from the server so that the retried request can go through:
     sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
  4. After 1 second, drop all packets from the server again:
     sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
     After 10 seconds, the CLI should fail with "Can't receive all parts of the response."
- The CLI should download the build result if there are no problems during the retry.
  1. Run the cloud build:
     tns build cloud android --accountId 1
  2. When the CLI prints "Finished cloud build of '', platform: 'Android', configuration: 'Debug', buildId: successfully. Downloading result...", drop all packets from the server:
     sudo iptables -A INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
     The CLI will print "Can't receive all parts of the response. Retrying request to ..."
  3. Allow the incoming traffic from the server so that the retried request can go through:
     sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
     The CLI should download the build result.
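The stuck-response scenarios drop all TCP traffic with source port 443 from 52.0.0.0/6. A /6 prefix fixes only the first 6 bits of the address, so the block spans 2^26 addresses, from 52.0.0.0 to 55.255.255.255, and the rule also blocks any other HTTPS traffic from that range while it is active. We assume the host the build result is downloaded from lies in this block; the PR does not state it explicitly. The TypeScript sketch below computes the range; `cidrRange` is a hypothetical helper.

```typescript
// Hypothetical helper: first and last IPv4 address covered by a CIDR block.
// Plain arithmetic is used because JS bitwise operators work on signed 32-bit ints.
function cidrRange(cidr: string): [string, string] {
  const [ip, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  const addr = ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
  const blockSize = 2 ** (32 - prefix); // a /6 block holds 2^26 addresses
  const first = Math.floor(addr / blockSize) * blockSize;
  const last = first + blockSize - 1;
  const toDotted = (n: number) =>
    [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
  return [toDotted(first), toDotted(last)];
}

const [first, last] = cidrRange("52.0.0.0/6");
console.log(`${first} - ${last}`); // 52.0.0.0 - 55.255.255.255
```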
The CLI should not retry requests with big responses if the server sends at least one packet every 10 seconds:
1. Slow down the download speed (a hacky approach):
   sudo iptables -m u32 --u32 "0&0xffff=0x2388:0xffffffff" -A INPUT -p tcp -j DROP
2. Run the cloud build:
   tns build cloud android --accountId 1
   Downloading the build result should take more than 1 minute (because of the rule from the previous step). The CLI should not print any warnings and should download the build result.

If you test this PR, don't forget to remove all the iptables rules:
sudo iptables -m u32 --u32 "49&0xFF=0x16 && 54&0xFF=0x02" -D INPUT -p tcp -s livesync.ly -j DROP
sudo iptables -D INPUT -p tcp --sport 443 -s 52.0.0.0/6 -j DROP
sudo iptables -m u32 --u32 "0&0xffff=0x2388:0xffffffff" -D INPUT -p tcp -j DROP
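The slow-down rule "0&0xffff=0x2388:0xffffffff" reads the first 4 bytes of the IPv4 header, and the 0xffff mask keeps only the Total Length field (bytes 2-3), so it drops every incoming TCP packet whose total length is at least 0x2388 (9096) bytes. On a standard 1500-byte MTU link, individual packets never get that large, so we assume the rule mainly hits segments the kernel has already coalesced (for example, with GRO) while data arrives quickly; dropping them forces retransmissions and throttles the download. That mechanism is our assumption; the PR only calls the rule a hacky way to slow down the download. The TypeScript sketch below replays the length check; `matchesSlowDownRule` and `ipHeaderWithLength` are hypothetical helpers.

```typescript
// Sketch of the u32 test "0&0xffff=0x2388:0xffffffff": read the first 4 bytes of
// the IPv4 header big-endian and keep the low 16 bits (the Total Length field).
function matchesSlowDownRule(ipPacket: Uint8Array): boolean {
  const view = new DataView(ipPacket.buffer, ipPacket.byteOffset, ipPacket.byteLength);
  const totalLength = view.getUint32(0) & 0xffff;
  // The upper bound 0xffffffff can never be exceeded after masking with 0xffff.
  return totalLength >= 0x2388;
}

// Hypothetical helper: a bare 20-byte IPv4 header carrying only a Total Length.
function ipHeaderWithLength(totalLength: number): Uint8Array {
  const header = new Uint8Array(20);
  header[0] = 0x45; // IPv4, 20-byte header
  header[2] = totalLength >> 8; // Total Length, high byte
  header[3] = totalLength & 0xff; // Total Length, low byte
  return header;
}

console.log(0x2388); // 9096
console.log(matchesSlowDownRule(ipHeaderWithLength(1500))); // false
console.log(matchesSlowDownRule(ipHeaderWithLength(9096))); // true
```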
What is the current behavior?
Currently, when an HTTP request gets stuck while being sent or a download gets stuck, the CLI just keeps waiting.
What is the new behavior?
The CLI detects stuck requests/responses, aborts them and retries them once.
Fixes #4186.