Misaligned multi-byte formats not supported in pgm_read_* #5628

Hoek67 · 2019-01-18T02:48:38Z

the lines

#define pgm_read_float(addr)            (*reinterpret_cast<const float*>(addr))			 //dw fixed
#define pgm_read_float(addr)            (*(const float*)(addr))				// dw fixed

was/is

#define pgm_read_float(addr)            (*reinterpret_cast<const float>(addr))	// error		 
#define pgm_read_float(addr)            (*(const float)(addr))				// error

Notice both were missing the pointer part on the cast.

The text was updated successfully, but these errors were encountered:

Hoek67 · 2019-01-18T02:57:16Z

Downloaded and installed "latest" esp8266 a few nights ago... but seems it wasn't the latest.

Hoek67 · 2019-01-19T22:34:54Z

checked and found the latest version and it still has the float issue. Another issue is uint32_t and float should use the aligned code still. Reason being I was accessing a float and uint32_t variables inside a blob and it kept crashing. I made uint32_t and float use the aligned read but it still crashed. Reason being 1st byte was in 1 4 byte block and the next 3 were in the next. This can also happen for uint16_t inside blobs. Ended up writing a quick and nasty replacement function that takes into account the bytes being on different 4 byte boundaries.

[code]
static inline uint32_t pgm_read_with_offset(const void *addr)
{
uint32_t addr32 = (uint32_t)addr;
uint32_t offset = addr32 & 3; // 0 = aligned ... 1 = 1 out etc

	if (!offset) // aligned
	{
		return *(uint32_t *)addr;
	}

	uint32_t dword1 = *(uint32_t *)(addr32 & 0xFFFFFFFC);  // align to 4 bytes
	uint32_t dword2 = *(uint32_t *)((addr32 + 4) & 0xFFFFFFFC); // align to 4 bytes then add 4 to go to next boundary

	offset *= 8; // bytes to bits

	return (dword1 >> (offset)) | (dword2 << (32 - offset));
}

[/code]

earlephilhower · 2019-01-23T23:54:32Z

Updated title to better reflect the question.

The underlying reason is that misaligned word and half-word reads are not supported on the CPU in the 8266, and result in a hardware exception. Many RISC CPUs have this same requirement (but ARMv7 supports it while earlier v5/etc. don't...so it "just works" on Arduino-brand ARM SAMD boards)

The current code is focused on maximum speed and minimum size. Aligned 32-bit operands can be read directly from flash with a single instruction, like reading from any other part of memory. Aligned 16-bit reads read the 32-bit word in 1 swoop and pick the upper or lower half with a couple instructions.

Adding in misaligned support will decrease read performance in the common, aligned case, by 2-3x and increase each instance of the inline pgm_read_* by probably 4-5x, but it needs to be written in assembly first.

The question up for debate is whether to:

Support full Arduino semantics (based off of an 8-bit CPU which couldn't even read a 16- or 32-bit value, so all word/halfwords are SW emulated) with respective code bloat and slowdown
Same as above, but add in a pgm_read_aligned macro which would be the existing ones. All code in the core and libraries would need to be patched to use this to regain space/performance, but Arduino 8-bit stuff would just work
Inverse of above. Support it with a pgm_read_*_unaligned macro that can be selectively used by people with the need (i.e. from packed structures coming from 8-bit-land)
Not support it at all because the HW can't do it natively

I'd be interested to see if others have any strong feelings one way or the other. I'm inclined to (3) or (4) since the others mean a massive bloat or lots of work on SSL and core libs.

Makuna · 2019-01-24T01:44:14Z

To me this is simple. If the API is present in Arduino, it should be compatible with no changes to library or sketch. If performance is an issue, then enhancing the API to expose the support is what you should do. So #2 above. When I wrote the original this was the goal. I guess the goal had gotten changed over time by other contributors.

earlephilhower · 2019-01-24T03:26:04Z

I hear you, compatibility is a good thing, but taken to extremes you end up with Windows 10. Plus, we're running a little shy of the MSFT staffing levels. :(

I've only been able to locate 2 filed issues of this sort in the (3?) years the repo's been around. It may be worthy of doing, but at a low prio compared to other things on the long laundry list.

PRs are always welcome, of course. :)

earlephilhower · 2019-01-24T03:53:22Z

So, actually, @Makuna, there's only a few spots in the core where pgm_read_word or pgm_read_dword are called.

I don't even bother w/pgm_read_dword in BearSSL.a (just read the 32-bit), but there are ~30 spots w/uint16_t's being read out that would need patching.

libc doesn't have it, it only does byte-wise pmem access.

LWIP, we'd need to @d-a-v to check.

devyte · 2019-01-24T04:43:33Z

I'd like to point out here that slowing this down in the name of compatibility could be a breaking change, if the slowdown is enough to cause a side effect in some app or lib, e.g.: trigger the WDT, increase power consumption due to additional wake time, etc.

How about a mix of points 2 and 3?

Implement pgm_*_aligned() with the current code (fast, smaller bin, but crashes with unaligned addresses)
Implement pgm_*_unaligned() with the proposed code (compatible, but several X slower, and bigger bin)
Implement pgm_*() as a simple forwarder to one of the two above. This would work as a selector.

Users who need one or the other can change the definition to the one they need. Or they can directly use the version they need.

Compatibility is good, but so is performance. The trade-off can be needed in either way depending on the user's need. Whatever else, I think the choice to go one way or the other should be available in some form, and having a kind of master switch for the behavior is easier to manage on our side, and easier to handle/migrate/whatever on the user side.

earlephilhower · 2019-01-24T14:59:16Z

I was worried like you were, too, @devyte. But after looking through things in the core and libs, I'm not so worried about performance. The majority of PMEM accesses are bytewise (strings) which have no change or 32-bit direct w/o pgm_* macros (memcpy_P, strcpy_P, etc. which do byte until things are aligned, then go int32-wise).

3rd party libs, I would hazard, are going to have a similar profile.

earlephilhower · 2019-01-24T15:23:06Z

--edit--
After actually testing the different misaligned code I found a couple bugs. The real, working versions of the unaligned macros expand to ~15x size (insns and space) of the 1-insn aligned copies, but you do need the short-circuit if (aligned) return direct-read. So they're only ~4x the runtime of the aligned-only versions when reading aligned data.

d-a-v · 2019-01-24T17:30:12Z

LWIP, we'd need to @d-a-v to check.

As far as I know, lwIP has nothing in flash (no hardcoded table).

Makuna · 2019-01-24T22:40:07Z

sorry about that, I wanted to ingore what I was typing and I hit the close button

earlephilhower · 2019-01-29T16:40:07Z

#5692 will track the PR for implementing this. Any assembly optimizations you can come up with for the misaligned versions of the macros would be appreciated, because there's pretty severe inflation even at -O3 which produces ASM which doesn't look bad at all to me...

ESP8266 cores 2.5.1 and 2.5.2 have an alignment issue causing incorrect reading of floats in PROGMEM See esp8266/Arduino#5628 and esp8266/Arduino#5692 By default the 2.5.1 and 2.5.2 cores use pgm_read_float_unaligned() but then the PROGMEM floats are not properly read. The read frequencies were way to high resulting in crackling noise or high pitched sound. Added a conditional redefine for ESP8266 environment which fixes properly reading the float values.

Hoek67 closed this as completed Jan 18, 2019

Hoek67 reopened this Jan 19, 2019

devyte assigned earlephilhower Jan 22, 2019

earlephilhower changed the title ~~bug in pgmspace.h for "float" in the if ... and else parts~~ Misaligned multi-byte formats not supported in pgm_read_* Jan 23, 2019

earlephilhower added this to the 2.6.0 milestone Jan 24, 2019

earlephilhower mentioned this issue Jan 24, 2019

Fix pgm_read_float() macro definitions #5666

Merged

Makuna closed this as completed Jan 24, 2019

Makuna reopened this Jan 24, 2019

earlephilhower mentioned this issue Jan 29, 2019

Unaligned access support for pgm_read_word/dword #5692

Merged

earlephilhower closed this as completed in #5692 Apr 11, 2019

maxint-rd mentioned this issue Oct 3, 2019

pgm_read_float alignment issue when reading from an array of floats #6590

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Misaligned multi-byte formats not supported in pgm_read_* #5628

Misaligned multi-byte formats not supported in pgm_read_* #5628

Hoek67 commented Jan 18, 2019

Hoek67 commented Jan 18, 2019

Hoek67 commented Jan 19, 2019

earlephilhower commented Jan 23, 2019

Makuna commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

devyte commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

earlephilhower commented Jan 24, 2019 •

edited

Loading

d-a-v commented Jan 24, 2019

Makuna commented Jan 24, 2019 •

edited

Loading

earlephilhower commented Jan 29, 2019

Misaligned multi-byte formats not supported in pgm_read_* #5628

Misaligned multi-byte formats not supported in pgm_read_* #5628

Comments

Hoek67 commented Jan 18, 2019

Hoek67 commented Jan 18, 2019

Hoek67 commented Jan 19, 2019

earlephilhower commented Jan 23, 2019

Makuna commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

devyte commented Jan 24, 2019

earlephilhower commented Jan 24, 2019

earlephilhower commented Jan 24, 2019 • edited Loading

d-a-v commented Jan 24, 2019

Makuna commented Jan 24, 2019 • edited Loading

earlephilhower commented Jan 29, 2019

earlephilhower commented Jan 24, 2019 •

edited

Loading

Makuna commented Jan 24, 2019 •

edited

Loading