Bodyparser branch rewrite #72
Conversation
Parsers for both multipart and form-encoded POST bodies are implemented. The API is geared towards handling file uploads and other large data items, and requires the fields to be handled sequentially. An example program that allows uploading, downloading and editing (for text files) is supplied.
Sorry for the delay. I made some minor style-related changes (mostly using only spaces for indentation and explicit if-blocks). Currently, I'm having some trouble with the URL-encoded parser during a POST to edit a file: it stops parsing after a few hundred characters, the file is truncated, and there is an error in the log.
I'll have a look at it and try to fix it. Other than that, the parsers seem to work quite well.
The problem is this block. Adding a log statement makes it visible:

```cpp
size_t readchars = _request->readChars(bodyBuffer, bodyLength);
HTTPS_LOGI("Got %u chars, wanted %u chars.", readchars, bodyLength);
```

As the internal buffer of the HTTPRequest is limited to 512 bytes, you won't get more than that in a single call without reading again. So if the url-encoded content of the text field is bigger than that, the remaining body will never be parsed.
You're right (of course:-). I have the feeling that worrying about chunked content may be a bit much (I cannot imagine any implementation in its right mind using chunked encoding together with a url-encoded form... Or am I oversimplifying the world, and might this happen because lower layers would select chunked encoding in some implementation?). So a simple while loop around the readChars should do the trick. I can add and test it, but it may be a week or two before I get around to it, so feel free to do it yourself if you think it's a good idea.
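A minimal sketch of that loop, continuing from the snippet quoted above. It assumes readChars() returns the number of bytes actually read and 0 once no more data is available; that behavior is an assumption about the request API, not confirmed here:

```cpp
// Loop until the whole declared body has been consumed: a single
// readChars() call returns at most one internal buffer's worth (512 bytes).
size_t remaining = bodyLength;
char *writePtr = bodyBuffer;
while (remaining > 0) {
  size_t readchars = _request->readChars(writePtr, remaining);
  if (readchars == 0) {
    break;  // connection closed or no more data; don't spin forever
  }
  writePtr += readchars;
  remaining -= readchars;
}
```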
I wouldn't expect desktop browsers to use chunked encoding either, but other clients may. Imagine using a Python script to send data that you generate on the fly; the HTTP client lib would then have to buffer everything or switch to chunked transfer encoding. But chunked transfer isn't currently implemented anyway. My main concern is only not to rely on the Content-Length header always being present.
Tried that already, but then you get to the problem that there is no 1:1 representation between input and output bytes, and you might read incomplete chunks. I started by writing a second variant for that. And you're right with your comment about buffering; that indeed gets quite excessive. But I think that's a problem of the lower layers (HTTPConnection etc.) and has to be resolved there. I wouldn't put that into this PR. However, there are some very low-hanging fruits for performance improvement.
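To illustrate the length mismatch: a percent-escape ("%20") collapses three input bytes into one output byte, and an escape can be split across two reads. A minimal sketch of an incremental decoder that holds back a trailing partial escape until the next chunk arrives (illustrative names, not the PR's actual code):

```cpp
#include <cstddef>

// Decode as much of in[0..inLen) as possible into out; return the number
// of input bytes consumed. An incomplete trailing escape stays unconsumed
// so the caller can prepend it to the next chunk.
size_t decodeChunk(const char *in, size_t inLen, char *out, size_t *outLen) {
  size_t i = 0, o = 0;
  while (i < inLen) {
    if (in[i] == '%') {
      if (i + 2 >= inLen) {
        break;  // escape not complete yet: hold it back
      }
      auto hex = [](char c) { return (c <= '9') ? c - '0' : (c | 0x20) - 'a' + 10; };
      out[o++] = (char)((hex(in[i + 1]) << 4) | hex(in[i + 2]));
      i += 3;
    } else {
      out[o++] = (in[i] == '+') ? ' ' : in[i];  // '+' means space in form data
      i++;
    }
  }
  *outLen = o;
  return i;
}
```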
I had a similar issue with the boundary detection in the multipart reader (you may have CR LF at the end of the buffer, but you shouldn't read them until you're sure they're not part of a boundary marker). There I solved it (I hope:-) by never consuming to the end of the buffer, at least not until I know that all data has been read. A similar scheme could be used here, by having the consumer only read at most the buffered data minus the maximum boundary length.
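A sketch of that hold-back rule, assuming the boundary length is known up front (names are illustrative):

```cpp
#include <cstddef>

// Return how many buffered bytes can safely be handed to the consumer:
// the last boundaryLen + 4 bytes (CR LF + "--" + boundary) are held back
// until more data arrives or the stream has ended, so a boundary marker
// straddling a buffer edge is never mistaken for payload.
size_t safeConsumable(size_t buffered, size_t boundaryLen, bool streamEnded) {
  if (streamEnded) {
    return buffered;  // nothing more can arrive; all remaining bytes are payload
  }
  size_t holdBack = boundaryLen + 4;  // CR LF plus the leading "--"
  return (buffered > holdBack) ? (buffered - holdBack) : 0;
}
```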
I've implemented reading the whole body into a malloc()'ed buffer (either knowing the length or not knowing it). This isn't the most efficient implementation, but I didn't want to go all-out and implement the buffering scheme. And the inefficiency is only an issue if someone builds a website/webserver that transmits large volumes of data through a url-encoded POST, and you shouldn't do that:-)
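A minimal sketch of the unknown-length case. It assumes requestComplete() reports the end of the body and readChars() returns the number of bytes actually read; both are assumptions about the request API, and error handling is reduced to the essentials:

```cpp
#include <cstddef>
#include <cstdlib>

// Read an entire request body of unknown length into a heap buffer,
// doubling the capacity with realloc() whenever it fills up.
// Returns nullptr on allocation failure; *outLen receives the body size.
char *readWholeBody(HTTPRequest *req, size_t *outLen) {
  size_t capacity = 512;
  size_t used = 0;
  char *buf = (char *)malloc(capacity);
  if (buf == nullptr) {
    return nullptr;
  }
  while (!req->requestComplete()) {
    if (used == capacity) {
      char *grown = (char *)realloc(buf, capacity * 2);
      if (grown == nullptr) {
        free(buf);  // realloc failure keeps the old block; release it
        return nullptr;
      }
      buf = grown;
      capacity *= 2;
    }
    size_t n = req->readChars(buf + used, capacity - used);
    if (n == 0) {
      break;  // no further data arrived; stop rather than spin
    }
    used += n;
  }
  *outLen = used;
  return buf;
}
```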
Sorry again for the delay, I'll give it a review asap. I just wanted to have a rudimentary test script in place first, as it eases checking PRs a lot.
Finally got to it. Files are not truncated any more, and from testing with the example, I'd say it works.
Buffering has to be reworked again, but that's neither a problem specific to the body parser nor to this PR. Also, I'm not very happy with the use of malloc, but I see that it makes it easier to expand the buffers this way.
From checking the code, I only found some issues around memory allocation, which might be a problem if the ESP is low on memory.
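The typical pattern behind such issues, as a hedged sketch (illustrative names, not the PR's exact code): every allocation should be checked before use, since malloc() can genuinely return NULL on a memory-constrained ESP32:

```cpp
// Fail the request gracefully instead of dereferencing a null pointer
// when the heap is exhausted.
char *bodyBuffer = (char *)malloc(bodyLength);
if (bodyBuffer == nullptr) {
  // Illustrative error path; the real handling depends on the parser API.
  return false;
}
```

A std::vector<char> would sidestep the manual checks and frees, at the cost of less control over reallocation.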
Just a note: if you accept the suggestions or add a commit, the new checking actions will be triggered and will then most likely fail, as the scripts are missing in your PR. If you don't want to rebase, you can ignore that. Merging should be possible either way.
Co-Authored-By: Frank Hessel <[email protected]>
If I didn't miss anything, only the comment regarding handling a failed allocation is left.
Overlooked that one, sorry. Fixed.