-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Misaligned multi-byte formats not supported in pgm_read_* #5628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Downloaded and installed "latest" esp8266 a few nights ago... but seems it wasn't the latest. |
checked and found the latest version and it still has the float issue. Another issue is uint32_t and float should use the aligned code still. Reason being I was accessing a float and uint32_t variables inside a blob and it kept crashing. I made uint32_t and float use the aligned read but it still crashed. Reason being 1st byte was in 1 4 byte block and the next 3 were in the next. This can also happen for uint16_t inside blobs. Ended up writing a quick and nasty replacement function that takes into account the bytes being on different 4 byte boundaries. [code]
[/code] |
Updated title to better reflect the question. The underlying reason is that misaligned word and half-word reads are not supported on the CPU in the 8266, and result in a hardware exception. Many RISC CPUs have this same requirement (but ARMv7 supports it while earlier v5/etc. don't...so it "just works" on Arduino-brand ARM SAMD boards) The current code is focused on maximum speed and minimum size. Aligned 32-bit operands can be read directly from flash with a single instruction, like reading from any other part of memory. Aligned 16-bit reads read the 32-bit word in 1 swoop and pick the upper or lower half with a couple instructions. Adding in misaligned support will decrease read performance in the common, aligned case, by 2-3x and increase each instance of the inline pgm_read_* by probably 4-5x, but it needs to be written in assembly first. The question up for debate is whether to:
I'd be interested to see if others have any strong feelings one way or the other. I'm inclined to (3) or (4) since the others mean a massive bloat or lots of work on SSL and core libs. |
To me this is simple. If the API is present in Arduino, it should be compatible with no changes to library or sketch. If performance is an issue, then enhancing the API to expose the support is what you should do. So #2 above. When I wrote the original this was the goal. I guess the goal had gotten changed over time by other contributors. |
I hear you, compatibility is a good thing, but taken to extremes you end up with Windows 10. Plus, we're running a little shy of the MSFT staffing levels. :( I've only been able to locate 2 filed issues of this sort in the (3?) years the repo's been around. It may be worthy of doing, but at a low prio compared to other things on the long laundry list. PRs are always welcome, of course. :) |
So, actually, @Makuna, there's only a few spots in the core where pgm_read_word or pgm_read_dword are called. I don't even bother w/pgm_read_dword in BearSSL.a (just read the 32-bit), but there are ~30 spots w/uint16_t's being read out that would need patching. libc doesn't have it, it only does byte-wise pmem access. LWIP, we'd need to @d-a-v to check. |
I'd like to point out here that slowing this down in the name of compatibility could be a breaking change, if the slowdown is enough to cause a side effect in some app or lib, e.g.: trigger the WDT, increase power consumption due to additional wake time, etc. How about a mix of points 2 and 3?
Users who need one or the other can change the definition to the one they need. Or they can directly use the version they need. Compatibility is good, but so is performance. The trade-off can be needed in either way depending on the user's need. Whatever else, I think the choice to go one way or the other should be available in some form, and having a kind of master switch for the behavior is easier to manage on our side, and easier to handle/migrate/whatever on the user side. |
I was worried like you were, too, @devyte. But after looking through things in the core and libs, I'm not so worried about performance. The majority of PMEM accesses are bytewise (strings) which have no change or 32-bit direct w/o pgm_* macros (memcpy_P, strcpy_P, etc. which do byte until things are aligned, then go int32-wise). 3rd party libs, I would hazard, are going to have a similar profile. |
--edit-- |
As far as I know, lwIP has nothing in flash (no hardcoded table). |
sorry about that, I wanted to ingore what I was typing and I hit the close button |
#5692 will track the PR for implementing this. Any assembly optimizations you can come up with for the misaligned versions of the macros would be appreciated, because there's pretty severe inflation even at -O3 which produces ASM which doesn't look bad at all to me... |
ESP8266 cores 2.5.1 and 2.5.2 have an alignment issue causing incorrect reading of floats in PROGMEM See esp8266/Arduino#5628 and esp8266/Arduino#5692 By default the 2.5.1 and 2.5.2 cores use pgm_read_float_unaligned() but then the PROGMEM floats are not properly read. The read frequencies were way to high resulting in crackling noise or high pitched sound. Added a conditional redefine for ESP8266 environment which fixes properly reading the float values.
the lines
was/is
Notice both were missing the pointer part on the cast.
The text was updated successfully, but these errors were encountered: