Skip to content

Misaligned multi-byte formats not supported in pgm_read_* #5628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Hoek67 opened this issue Jan 18, 2019 · 12 comments · Fixed by #5692
Closed

Misaligned multi-byte formats not supported in pgm_read_* #5628

Hoek67 opened this issue Jan 18, 2019 · 12 comments · Fixed by #5692
Assignees
Milestone

Comments

@Hoek67
Copy link

Hoek67 commented Jan 18, 2019

the lines

#define pgm_read_float(addr)            (*reinterpret_cast<const float*>(addr))			 //dw fixed
#define pgm_read_float(addr)            (*(const float*)(addr))				// dw fixed

was/is

#define pgm_read_float(addr)            (*reinterpret_cast<const float>(addr))	// error		 
#define pgm_read_float(addr)            (*(const float)(addr))				// error

Notice both were missing the pointer part on the cast.

@Hoek67 Hoek67 closed this as completed Jan 18, 2019
@Hoek67
Copy link
Author

Hoek67 commented Jan 18, 2019

Downloaded and installed "latest" esp8266 a few nights ago... but seems it wasn't the latest.

@Hoek67 Hoek67 reopened this Jan 19, 2019
@Hoek67
Copy link
Author

Hoek67 commented Jan 19, 2019

checked and found the latest version and it still has the float issue. Another issue is uint32_t and float should use the aligned code still. Reason being I was accessing a float and uint32_t variables inside a blob and it kept crashing. I made uint32_t and float use the aligned read but it still crashed. Reason being 1st byte was in 1 4 byte block and the next 3 were in the next. This can also happen for uint16_t inside blobs. Ended up writing a quick and nasty replacement function that takes into account the bytes being on different 4 byte boundaries.

[code]
static inline uint32_t pgm_read_with_offset(const void *addr)
{
uint32_t addr32 = (uint32_t)addr;
uint32_t offset = addr32 & 3; // 0 = aligned ... 1 = 1 out etc

	if (!offset) // aligned
	{
		return *(uint32_t *)addr;
	}

	uint32_t dword1 = *(uint32_t *)(addr32 & 0xFFFFFFFC);  // align to 4 bytes
	uint32_t dword2 = *(uint32_t *)((addr32 + 4) & 0xFFFFFFFC); // align to 4 bytes then add 4 to go to next boundary

	offset *= 8; // bytes to bits

	return (dword1 >> (offset)) | (dword2 << (32 - offset));
}		

[/code]

@earlephilhower earlephilhower changed the title bug in pgmspace.h for "float" in the if ... and else parts Misaligned multi-byte formats not supported in pgm_read_* Jan 23, 2019
@earlephilhower
Copy link
Collaborator

Updated title to better reflect the question.

The underlying reason is that misaligned word and half-word reads are not supported on the CPU in the 8266, and result in a hardware exception. Many RISC CPUs have this same requirement (but ARMv7 supports it while earlier v5/etc. don't...so it "just works" on Arduino-brand ARM SAMD boards)

The current code is focused on maximum speed and minimum size. Aligned 32-bit operands can be read directly from flash with a single instruction, like reading from any other part of memory. Aligned 16-bit reads read the 32-bit word in 1 swoop and pick the upper or lower half with a couple instructions.

Adding in misaligned support will decrease read performance in the common, aligned case, by 2-3x and increase each instance of the inline pgm_read_* by probably 4-5x, but it needs to be written in assembly first.

The question up for debate is whether to:

  1. Support full Arduino semantics (based off of an 8-bit CPU which couldn't even read a 16- or 32-bit value, so all word/halfwords are SW emulated) with respective code bloat and slowdown
  2. Same as above, but add in a pgm_read_aligned macro which would be the existing ones. All code in the core and libraries would need to be patched to use this to regain space/performance, but Arduino 8-bit stuff would just work
  3. Inverse of above. Support it with a pgm_read_*_unaligned macro that can be selectively used by people with the need (i.e. from packed structures coming from 8-bit-land)
  4. Not support it at all because the HW can't do it natively

I'd be interested to see if others have any strong feelings one way or the other. I'm inclined to (3) or (4) since the others mean a massive bloat or lots of work on SSL and core libs.

@Makuna
Copy link
Collaborator

Makuna commented Jan 24, 2019

To me this is simple. If the API is present in Arduino, it should be compatible with no changes to library or sketch. If performance is an issue, then enhancing the API to expose the support is what you should do. So #2 above. When I wrote the original this was the goal. I guess the goal had gotten changed over time by other contributors.

@earlephilhower
Copy link
Collaborator

I hear you, compatibility is a good thing, but taken to extremes you end up with Windows 10. Plus, we're running a little shy of the MSFT staffing levels. :(

I've only been able to locate 2 filed issues of this sort in the (3?) years the repo's been around. It may be worthy of doing, but at a low prio compared to other things on the long laundry list.

PRs are always welcome, of course. :)

@earlephilhower
Copy link
Collaborator

So, actually, @Makuna, there's only a few spots in the core where pgm_read_word or pgm_read_dword are called.

I don't even bother w/pgm_read_dword in BearSSL.a (just read the 32-bit), but there are ~30 spots w/uint16_t's being read out that would need patching.

libc doesn't have it, it only does byte-wise pmem access.

LWIP, we'd need to @d-a-v to check.

@earlephilhower earlephilhower added this to the 2.6.0 milestone Jan 24, 2019
@devyte
Copy link
Collaborator

devyte commented Jan 24, 2019

I'd like to point out here that slowing this down in the name of compatibility could be a breaking change, if the slowdown is enough to cause a side effect in some app or lib, e.g.: trigger the WDT, increase power consumption due to additional wake time, etc.

How about a mix of points 2 and 3?

  • Implement pgm_*_aligned() with the current code (fast, smaller bin, but crashes with unaligned addresses)
  • Implement pgm_*_unaligned() with the proposed code (compatible, but several X slower, and bigger bin)
  • Implement pgm_*() as a simple forwarder to one of the two above. This would work as a selector.

Users who need one or the other can change the definition to the one they need. Or they can directly use the version they need.

Compatibility is good, but so is performance. The trade-off can be needed in either way depending on the user's need. Whatever else, I think the choice to go one way or the other should be available in some form, and having a kind of master switch for the behavior is easier to manage on our side, and easier to handle/migrate/whatever on the user side.

@earlephilhower
Copy link
Collaborator

I was worried like you were, too, @devyte. But after looking through things in the core and libs, I'm not so worried about performance. The majority of PMEM accesses are bytewise (strings) which have no change or 32-bit direct w/o pgm_* macros (memcpy_P, strcpy_P, etc. which do byte until things are aligned, then go int32-wise).

3rd party libs, I would hazard, are going to have a similar profile.

@earlephilhower
Copy link
Collaborator

earlephilhower commented Jan 24, 2019

--edit--
After actually testing the different misaligned code I found a couple bugs. The real, working versions of the unaligned macros expand to ~15x size (insns and space) of the 1-insn aligned copies, but you do need the short-circuit if (aligned) return direct-read. So they're only ~4x the runtime of the aligned-only versions when reading aligned data.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 24, 2019

LWIP, we'd need to @d-a-v to check.

As far as I know, lwIP has nothing in flash (no hardcoded table).

@Makuna
Copy link
Collaborator

Makuna commented Jan 24, 2019

sorry about that, I wanted to ingore what I was typing and I hit the close button

@earlephilhower
Copy link
Collaborator

#5692 will track the PR for implementing this. Any assembly optimizations you can come up with for the misaligned versions of the macros would be appreciated, because there's pretty severe inflation even at -O3 which produces ASM which doesn't look bad at all to me...

maxint-rd added a commit to maxint-rd/MmlMusic that referenced this issue Oct 2, 2019
ESP8266 cores 2.5.1 and 2.5.2 have an alignment issue causing incorrect reading of floats in PROGMEM
See esp8266/Arduino#5628 and esp8266/Arduino#5692
By default the 2.5.1 and 2.5.2 cores use pgm_read_float_unaligned() but then the PROGMEM floats are not properly read.
The read frequencies were way to high resulting in crackling noise or high pitched sound.
Added a conditional redefine for ESP8266 environment which fixes properly reading the float values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants