
Bodyparser branch rewrite #72


Merged
fhessel merged 11 commits into fhessel:master from jackjansen:bodyparser on Mar 24, 2020

Conversation

@jackjansen (Contributor)

Parsers for both multipart and form-encoded POST handlers are implemented.

The API is geared towards handling file uploads and other large data items, and requires the fields to be handled sequentially.

An example program that allows uploading, downloading, and (for text files) editing is supplied.

fhessel and others added 3 commits December 6, 2018 20:05
@fhessel fhessel self-requested a review February 21, 2020 14:40
@fhessel (Owner) commented Feb 28, 2020

Sorry for the delay.

I made some minor style-related changes (mostly using only spaces for indentation and explicit if-blocks).

Currently, I'm having some trouble with the URL-encoded parser during a POST that edits a file. It stops parsing after a few hundred characters, the file is truncated, and I get this in the log:

[HTTPS:I:     87460] New connection. Socket FID=59
[HTTPS:I:     89271] Request: POST /edit?filename=test.txt (FID=59)
[HTTPS:W:     89298] Callback function did not parse full request body

I'll have a look at it and try to fix it. Other than that, the parsers seem to work quite well.

@fhessel (Owner) commented Feb 28, 2020

The problem is this block. Adding a log statement makes it visible:

size_t readchars = _request->readChars(bodyBuffer, bodyLength);
HTTPS_LOGI("Got %u chars, wanted %u chars.", readchars, bodyLength);

[HTTPS:I:     59535] Request: POST /edit?filename=test.txt (FID=57)
[HTTPS:I:     59541] Got 512 chars, wanted 7612 chars.
[HTTPS:W:     59561] Callback function did not parse full request body

As the internal buffer of the HTTPRequest is limited to 512 bytes, you won't get more there without reading again. So if the URL-encoded content of the text field is bigger than that, the remaining body will never be parsed.
Furthermore, relying on knowing the size will break if you don't have the final request length, e.g., if Transfer-Encoding: chunked is used.

@jackjansen (Contributor, Author)

You're right (of course:-).

I have the feeling that worrying about chunked content may be a bit much (I cannot imagine any implementation in its right mind using chunked encoding together with a URL-encoded form... or am I oversimplifying the world, and might this happen because lower layers would select chunked encoding in some implementations?).

So a simple while loop around the readChars should do the trick.
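
Roughly like this (an untested sketch, still assuming the body length is known as in the snippet above; the chunk size is arbitrary and the hand-off to the parser is only indicated as a comment):

// Sketch: drain the body in small chunks instead of one oversized read.
char chunk[512];
size_t remaining = bodyLength;
while (remaining > 0) {
  size_t toRead = remaining < sizeof(chunk) ? remaining : sizeof(chunk);
  size_t readchars = _request->readChars(chunk, toRead);
  if (readchars == 0) {
    break;  // connection closed or nothing more to read
  }
  // ... hand 'readchars' bytes of 'chunk' to the form parser here ...
  remaining -= readchars;
}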

I can add and test, but it may be a week or two before I get around to it, so feel free to do it yourself if you think it's a good idea.

@fhessel (Owner) commented Feb 29, 2020

I wouldn't expect desktop browsers to use chunked encoding either, but other clients may. Imagine using a Python script to send data that you generate on the fly; then the HTTP client lib would have to buffer everything or switch to chunked transfer encoding.

But chunked transfer isn't currently implemented anyway. My main concern is only not to rely on Content-Length being present, as that cannot be guaranteed.

> So a simple while loop around the readChars should do the trick.

Tried that already, but then you run into the problem that there is no 1:1 mapping between input and output bytes: a chunk boundary can split an escape sequence, so you might read %2 and 0 instead of %20, and urlDecode() isn't able to cope with that.

I started by writing a second variant of urlDecode which a) works on byte buffers and b) can tell the caller that not all bytes have been processed, quite similar to what I've suggested in the issue linked above. It seems we don't get around that. I think from there it's not much more work, so I can do that.
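
Roughly along these lines (just a sketch to illustrate the idea; the name, signature, and std::string output are illustrative and not necessarily what ends up in the code):

#include <string>
#include <cstdlib>

// Sketch: decode as much of the input as possible and report how many bytes
// were consumed, so the caller can carry a trailing partial escape (e.g. "%2")
// over into the next chunk.
size_t urlDecodePartial(const char *in, size_t inLen, std::string &out) {
  size_t i = 0;
  while (i < inLen) {
    if (in[i] == '%') {
      if (i + 2 >= inLen) break;            // incomplete %XX escape, wait for more data
      char hex[3] = {in[i + 1], in[i + 2], 0};
      out += (char) strtol(hex, nullptr, 16);
      i += 3;
    } else if (in[i] == '+') {
      out += ' ';
      i += 1;
    } else {
      out += in[i];
      i += 1;
    }
  }
  return i;  // bytes actually processed; inLen - i bytes must be kept for later
}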

And you're right with your comment about buffering; that indeed gets quite excessive. But I think that's a problem of the lower layers (HTTPConnection etc.) and has to be resolved there.

I wouldn't put that into this PR; however, there are some very low-hanging fruits for performance improvement (e.g., this not being at least a memcpy). I'm currently trying to combine hardware-based CI with GitHub Actions. Once that's ready, it'll be much easier to work on the code without breaking things.

@jackjansen (Contributor, Author)

> Tried that already, but then you run into the problem that there is no 1:1 mapping between input and output bytes: a chunk boundary can split an escape sequence, so you might read %2 and 0 instead of %20, and urlDecode() isn't able to cope with that.

I had a similar issue with the boundary detection in the multipart reader (you may have a CR LF at the end of the buffer, but you shouldn't consume them until you're sure they're not part of a boundary marker). There I solved it (I hope:-) by never consuming up to the end of the buffer, at least not until I know that all data has been read. A similar scheme could be used here, by having the consumer read at most len()-3 bytes from the buffer until we know that end-of-buffer coincides with end-of-data.
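
For the URL-encoded case, that would look roughly like this (sketch only; bodyComplete, bufferFill, decoded, and the urlDecodePartial helper from the sketch above are illustrative names):

// Sketch: never give the decoder the last two bytes of the buffer while more
// data may still follow, so a split "%XX" escape can never be half-consumed.
const size_t HOLD_BACK = 2;              // longest incomplete escape: "%X"
size_t usable;
if (bodyComplete) {
  usable = bufferFill;                   // end of data: decode everything that's left
} else if (bufferFill > HOLD_BACK) {
  usable = bufferFill - HOLD_BACK;
} else {
  usable = 0;                            // too little data, read more first
}
size_t consumed = urlDecodePartial(buffer, usable, decoded);
// move the (bufferFill - consumed) leftover bytes to the front of the buffer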

@jackjansen (Contributor, Author)

I've implemented reading the whole body into a malloc()ed buffer (either knowing the length or not knowing it). This isn't the most efficient implementation, but I didn't want to go all-out and implement the full buffering scheme. And the inefficiency is only an issue if someone builds a website/webserver that transmits large volumes of data through a URL-encoded POST, and you shouldn't do that:-)
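
In outline it works roughly like this (a simplified sketch; getContentLength() and requestComplete() are assumed to behave as in the library's other examples, and error handling is left out):

// Sketch: read the whole URL-encoded body into one heap buffer, growing it
// while the total length is unknown.
size_t capacity = _request->getContentLength();
if (capacity == 0) {
  capacity = 512;                        // length unknown: start small and grow
}
char *body = (char *) malloc(capacity);
size_t used = 0;
while (!_request->requestComplete()) {
  if (used == capacity) {
    capacity *= 2;
    body = (char *) realloc(body, capacity);   // note: the realloc result needs a NULL check
  }
  size_t got = _request->readChars(body + used, capacity - used);
  if (got == 0) {
    break;                               // connection closed
  }
  used += got;
}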

@fhessel (Owner) commented Mar 19, 2020

Sorry again for the delay; I'll give it a review ASAP. I just wanted to have a rudimentary test script in place first, as it makes checking PRs a lot easier.

@fhessel (Owner) left a comment

Finally got to it. Files are not truncated any more, and from testing with the example, I'd say it works.

Buffering has to be reworked again, but that's a problem of neither the body parser alone nor this PR specifically. Also, I'm not very happy with the use of malloc, but I see that it's easier to expand the buffers this way.

From checking the code, I only found some issues around memory allocation, which might be a problem if the ESP is low on memory.

Just a note: if you accept the suggestions or add a commit, the new checking actions will be triggered and will then most likely fail, as the scripts are missing in your PR. If you don't want to rebase, you can ignore that; merging should be possible either way.

@fhessel (Owner) commented Mar 23, 2020

If I didn't miss anything, only the comment regarding handling a NULL return value from malloc and realloc in HTTPMultipartBodyParser.cpp remains. Once that is fixed, the PR is ready to merge.
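
For reference, the usual pattern looks roughly like this (a generic sketch, not the exact change in the PR; the log macro and return value are only indicative):

// Sketch: keep the old pointer until realloc succeeds; otherwise free it,
// log the failure, and abort instead of dereferencing NULL.
char *grown = (char *) realloc(buffer, newCapacity);
if (grown == NULL) {
  free(buffer);
  buffer = NULL;
  HTTPS_LOGE("Out of memory while growing the body buffer");
  return false;   // or otherwise abort handling the request
}
buffer = grown;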

@jackjansen (Contributor, Author)

Overlooked that one, sorry. Fixed.

@fhessel fhessel merged commit be34405 into fhessel:master Mar 24, 2020
@jackjansen jackjansen deleted the bodyparser branch August 4, 2024 13:04