Skip to content

Find a way to kill blocking handler callbacks #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
fhessel opened this issue Dec 19, 2017 · 8 comments
Open

Find a way to kill blocking handler callbacks #3

fhessel opened this issue Dec 19, 2017 · 8 comments
Assignees
Labels

Comments

@fhessel
Copy link
Owner

fhessel commented Dec 19, 2017

Currently, when the HTTPSConnection passes control to the request handler function, this will pass control of the server's task/thread to this handler function. If it does not return (e.g. due to a misbehaving loop running infinitely), the server gets stuck there, does not accept any new connections and also does not handle other ongoing connections anymore.
There should be some possibility to revoke control from those handler tasks, or at least a way for them to pass control back to the HTTPSConnection, without finishing the request. Maybe one could introduce a return parameter to the callback function that can be set to true to state that the very same callback has to be called again later, e.g. when more data becomes available on the underlying SSL socket. It would be nice to find a comprehensive approach that handles this problem and websockets as well (see #2).

@fhessel fhessel self-assigned this Dec 19, 2017
@fhessel fhessel added the bug label Mar 21, 2018
@ksere
Copy link

ksere commented Jun 5, 2018

Hi,

I am getting some requests stuck when the server tries to close the connection here as SSL_Shutdown never returns.

You think this is related to this issue or should I look at something else?

Cheers!

@fhessel
Copy link
Owner Author

fhessel commented Jun 5, 2018

Hi, thank you for the feedback!

I created this specific issue mainly to improve handling more than one connection in parallel. As every request is delegated to a user-written handler function and the server passes control to this function, the server may be blocked as all connections are dealt with consecutively. If you imagine a client sending a big request body, which is split over several TCP packets, and one of those packets does not arrive, the handler function will wait for it indefinitely and never return. This issue is about solving this by allowing the handler function to return with a flag or something that says: "This request hasn't been processed completely, call me later again".

Your problems seems to be similar, but not related to actual request handling but to tearing down the SSL connection. This part of connection handling should also not block the whole server task, obviously. I did not experience this issue by myself, but I suppose it might be related to the close_notify flags used by TLS. If the server initiates the shutdown, it should send a close_notify to the client, which then should reply with a record including close_notify, too. After that, the TCP connection can safely be teared down. Explicitly closing the TLS context is important for HTTPS to prevent truncation attacks.
If you experience this problem only for some requests, the response from the client could be missing and maybe that's what SSL_shutdown is waiting for. In this case (eg. when the client is just gone), a timeout (and raising some kind of exception when it's triggered) should be a viable solution. But I admit that I need to have an in-depth look at the implementation to determine if that really is the cause here.

But we could cover this in this issue too, as it's definitively a way to block the server during request handling (maybe even from client side), with an impact on all connections to the server.

To make some progress on it: Could you already determine whether the problem really is the SSL_shutdown function not returning, or rather the loop around it?

If the loop is causing trouble and SSL_shutdown just returns 0 all the time, a possible solution would be to introduce a STATE_CLOSING to the HTTPSConnection and, while being in this state, trying SSL_shutdown once and then proceeding with other connections. That should stop blocking the server completely.

@ksere
Copy link

ksere commented Jun 6, 2018

Warning! My level of experience with C/C++ is very low.

Could you already determine whether the problem really is the SSL_shutdown function not returning, or rather the loop around it?

From what i can tell, when this happens SSL_shutdown(_ssl) never returns true. Execution never proceeds to the next line.

This happens when the request is originating from a native Java android app i 've put together from bits and pieces here and there (Also not a Java developer :) ). If the request comes from a browser or Postman, it works fine and the SSL teardown completes. So it is for sure that the client does something wrong and i should fix that, but as you agree as well, the server should handle that case.

Edit: I just discovered, the SSL teardown completes and execution proceeds after the amount of time the client has set to connection KeepAlive. It was set to 15minutes and i never saw it proceed before.

If the loop is causing trouble and SSL_shutdown just returns 0 all the time, a possible solution would be to introduce a STATE_CLOSING to the HTTPSConnection and, while being in this state, trying SSL_shutdown once and then proceeding with other connections. That should stop blocking the server completely.

As a workaround for my case i was thinking of implenting a 'watchdog' like method which would either completely restart the device, or perform a check in that specific closeConnection() method.

While googling i came upon this (which i assume is the same or related openssl library with ESP32's) which states among others:

SSL_shutdown() tries to send the "close notify" shutdown alert to the peer. Whether the operation succeeds or not, the SSL_SENT_SHUTDOWN flag is set and a currently open session is considered closed and good and will be kept in the session cache for further reuse.

Does that imply that you shouldn't wait for the SSL_Shutdown return value? Not sure, further below the contradict that.

Thanks for the quick reply & thanks for creating the only working & complete (as far as i know) https esp32 server library!

@fhessel
Copy link
Owner Author

fhessel commented Jun 6, 2018

While googling i came upon this (which i assume is the same or related openssl library with ESP32's) which states among others:
[...]
Does that imply that you shouldn't wait for the SSL_Shutdown return value? Not sure, further below the contradict that.

You are right that the server should not wait for it indefinitely, but have a look at the description for a return value of 0:

The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred.

That's the reason for the loop I used in the code (the line you mentioned above):
while(SSL_shutdown(_ssl) == 0) delay(1);

This code was only a first attempt, and it assumes the client to eventually send the close notify flag – an assumption that will indeed not hold in real world scenarios. And while it may help to detect truncation attacks, it opens an easy way to run a denial of service attack against the server. So I agree with you that this should be changed.

I can have a look into this and try to provide a fix, but it may take me some days due to other activities.

In the mean time, you should be fine by changing line 121 to SSL_shutdown(_ssl); (without the loop around it) – at least according to the specs it should return after sending the closing notification on first call – or going for a watchdog approach.

Another important thing regarding keep-alive is that it may be bound to the TCP socket rather than the TLS connection on client side. This would mean that SSL_shutdown will not force the client side to close the connection. In this case, this would only happen later when close(_socket) is called. That could be an explanation of the behavior you observed (resuming after 15 minutes).

@ksere
Copy link

ksere commented Jun 7, 2018

Removing the while loop and running SSL_shutdown only once seem to work fine for me and i couldn't notice any other side-effects.

In the meantime although I couldn't find what was wrong with my client not respecting the close notify request, i discovered that there was another issue with the keep-alive and the client not re-using the active connection. Thus opening new connections, reaching the server's max connections and then getting stuck till the oldest connection was successfully closed.

Also consider making the server's keep-alive timeout configurable if there aren't any other issues with that i may not be aware of.

Thanks for looking into this. There is no hurry for me for a fix, it is working fine as it is.

Cheers!

@fhessel
Copy link
Owner Author

fhessel commented Jun 7, 2018

Also consider making the server's keep-alive timeout configurable if there aren't any other issues with that i may not be aware of.

It's not available via API yet, but you might be looking for this constant.

In the meantime although I couldn't find what was wrong with my client not respecting the close notify request, i discovered that there was another issue with the keep-alive and the client not re-using the active connection. Thus opening new connections, reaching the server's max connections and then getting stuck till the oldest connection was successfully closed.

Clients not reusing a connection or using a connection pool is an issue, but due to the limited resources of the ESP32, it's hard to allow for more parallel connections. Even for only two or three parallel connections, the SSL handshake might fail (the server will than refuse the connection attempt). However, you are able to configure the parallel connections by modifying the third (and optional) parameter of the HTTPSServer constructor and see what works best for your use case.

fhessel added a commit that referenced this issue Jun 7, 2018
…endless loop (see discussion in #3 for details)
@fhessel
Copy link
Owner Author

fhessel commented Jun 7, 2018

I tried to implement the STATE_CLOSING solution I mentioned above, it's on master now. I did just a short test so check I did not break any major functionality, but as I said before, I did not experience your shutdown problem by myself so I was not able to verify if this is a solution to it.

SSL shutdown timeout can be configured here, after that period of time the server will continue to tear down the SSL connection without the client having answered.

@ksere
Copy link

ksere commented Jun 12, 2018

I have updated and so far it looks like its working as it should.
I 'll let you know if i run into anything strange.

Thanks again!

Kostas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants