Separate flash write API into raw and quirk-handling versions #7644

devyte · 2020-10-07T22:38:49Z

Basic Infos

This issue complies with the issue POLICY doc.
I have read the documentation at readthedocs and the issue is not addressed there.
I have tested that the issue is present in current master branch (aka latest git).
I have searched the issue tracker for a similar issue.
If there is a stack dump, I have decoded it.
I have filled out all fields below.

Platform

Hardware: [ESP-12|ESP-01|ESP-07|ESP8285 device|other]
Core Version: [latest git hash or date]
Development Env: [Arduino IDE|Platformio|Make|other]
Operating System: [Windows|Ubuntu|MacOS]

Settings in IDE

Module: [Generic ESP8266 Module|Wemos D1 mini r2|Nodemcu|other]
Flash Mode: [qio|dio|other]
Flash Size: [4MB/1MB]
lwip Variant: [v1.4|v2 Lower Memory|Higher Bandwidth]
Reset Method: [ck|nodemcu]
Flash Frequency: [40Mhz]
CPU Frequency: [80Mhz|160MHz]
Upload Using: [OTA|SERIAL]
Upload Speed: [115200|other] (serial upload only)

Problem Description

CC @drzony @earlephilhower @TD-er

To better understand #7514, I had some (lengthy) discussions with @drzony. Per his explanation (please bear with me here, I'm not an expert here and my understanding may be flawed):

99% of flash brands allow writing multiple times to the same address, as long as bits always flip 1->0 and no bits flip 0->1
This is an unsupported feature exploit. Datasheets explicitly state that writing must be done only to an area that has been previously erased.
Erasing is done on an entire block (4KB in our case). Writes can be done in bulks of 1 to pageSize bytes (256 bytes in our case). Multiple writes can be done within a previously erase block, as long as subsequent writes are always done in offsets that haven't been previously written to.
SPIFFS seems to exploit point 1 above as a design choice. We extended this exploit to our core.
In the case of PUYA chips the exploit doesn't work, so we had to implement the infamous PUYA patch to allow SPIFFS to work.
For normal flash operations within our core, i.e. EEPROM, Updater, eboot, etc, the PUYA patch isn't necessary. It's only SPIFFS that needs it.
For flash operations done elsewhere, i.e. 3rd party libs, the PUYA patch may be needed.

It seems that SPIFFS assumes NOR type flash, while other code like LittleFS assumes NAND type (@earlephilhower), and the PUYA chips don't like the SPIFFS assumption.

The proposal here is:

implement a formal raw flash read/write api. Our core would use this api for all of its internals, except SPIFFS. Behavior could be brand-specific, i.e.: quirks wouldn't be handled, whether in-spec (XMC) or exploit (99% of chip brands).
implement a formal quirk-handling flash read/write api on top of the previous. Special cases such as XMC and PUYA would be handled here. From the user's pov, all flash brands would behave the same, although there could be some cases with performance impacts (PUYA as it's being used now)
Up for discussion: SPIFFS apparently has its own handling for the PUYA case. It could be enabled and its behavior become isolated, or the underlying flash hal could be replaced with the quirk-handling api above for a generalized solution.

The above would mean that the current performance penalty for the PUYA case could be reduced or even eliminated for the non-SPIFFS operations.

TD-er · 2020-10-07T22:54:40Z

Isn't it simply a matter of reserving 1 byte, initialize it with "0x55" (or some other pattern) and try to set one of the 0's to 1.
If the byte you read is anything other than the initial pattern, you know you're dealing with "Puya-like" chips.
If you still see the initial pattern after 1 write attempt, then it is "normal" flash and just set one of the 1's to 0 to mark you completed the test.

This way you know what type of flash it is every time at boot without the need to check it using a write.
I think it may be a bit tricky to assume some expected behavior from a library, so use the existing patch strategy by default, unless you know for sure the library can work with "Puya" like flash and then explicitly mark it so.

devyte · 2020-10-08T02:09:35Z

That's an auto detection approach, which I believe has been discussed elsewhere. It may or may not be viable, similar to the flash size detection approach mentioned somewhere, but the goal here is a different one. Let's not hijack the thread.

drzony · 2020-10-08T06:54:28Z

@devyte I don't think that SPIFFS has a workaround for PUYA, according to pellepl/spiffs#172 it only fixed relying on 0->1 changes to be ignored by flash, but it still does 1->0 changes on the same offset.

drzony · 2020-10-08T07:06:07Z

As for the change in API, I don't think we need "no-quirk" API. It would mean that everyone who wants to use it would need to copy-paste the code of "with-quirks" API to his own code.

My proposal with regards to PUYA/non-PUYA mess is to create Esp::flashWriteNoErase function that can be used by people that want to use the exploit mentioned in point 1. We could then remove PUYA workaround from normal flashWrite.
We would then have always-works-datasheet-correct flashWrite and maybe-works-non-datasheet-corect flashWriteNoErase.

drzony · 2020-10-08T07:16:05Z

@TD-er I think that the first line of your comment is incorrect
I think you are confusing pellepl/spiffs#172 with PUYA.
Write without erase cases:

Changing 0->1 and/or 1->0 on PUYA with the same offset, will result in garbage
Changing 0->1 on some TI flashes will cause an error (they have an interrupt and a special register for that INVDRIS - Invalid Data Raw Interrupt Status), on other flashes it's ignored (write succeeds, but 0 remains 0 on flash)
Changing 1->0 on most of the flashes will result in proper write (including TI flashes)

@devyte Currently there is no workaround for 2 in any of ESP core code, since nobody encountered such a flash on ESP

TD-er · 2020-10-08T07:27:53Z

@drzony Your second point about checking a special register sounds interesting.
Could that be used to 'test' if other libraries rely on specific flash behavior?
That would probably simplify testing libraries to see if they would cause issues if the flash write behavior would be changed.
The 'normal' way for testing would be to include such testing in some debug mode of the write call, similar to what is done now for the Puya patch.

@devyte How do you imagine to switch between flash write behavior per library?
For example add a define in the .cpp of a library to force using a different flash write strategy? (that's what I was thinking of when I mentioned to only allow libraries to use the "optimal" write strategy for libraries known to work well with it)

drzony · 2020-10-08T07:32:57Z

@TD-er It cannot be used that way, since it only prevents 0->1 changes, which is ignored on other flashes (0 stays 0, but write succeeds)

TD-er · 2020-10-08T08:02:40Z

I meant if it marks such an operation as error in a special register, then such a chip could be used to help testing existing libraries without the penalty of needing extra reads.

drzony · 2020-10-08T08:12:25Z

Not really, you need to know beforehand that such chip has this register, otherwise you can't tell whether this is exactly this error (and if you know the chip, then you're done with detection). Anyway, let's not hijack API change thread with autodetection which is a whole large topic on it's own.

TD-er · 2020-10-08T09:38:39Z

It was by no means intended to switch back to "autodetect" but more to address my worries on decisions to make what a "default" flash write strategy should be.
Either way it is about knowingly render existing libraries incompatible (but not knowing which are incompatible as it is impossible to test it all) or keep the existing strategy as default, and only allow libraries known to work with the new strategy to use the new calls.

Or am I missing the point here?

drzony · 2020-10-08T10:02:39Z

Since we are allowing API breaking changes in v3.0 we can rename flashWrite to flashWriteNormal or something like that and add flashWriteNoErase, then the libraries that use flashWrite would need to make counscious decision about what strategy to use. (As an alternative we could add strategy parameter to flashWrite)

earlephilhower · 2020-10-11T01:58:32Z

My $0.02, for what it's worth: Flash != EEPROM, and I'm not really sure it's worth the effort in the end.

For EEPROMs there is expectation of byte-level (and even bit-level) update and everything should "just work." The current EEPROM.cpp library does this pretty well for apps, as far as I can see. It emulates using plain RAM and then flushes on final requests.

NAND Flash, however, has significant limitations in all incarnations I've run into. You have things like erase blocks, partial page programming, and a host of other ugliness. So, apps writing to flash need to take heed of these. Most basic would be flash word alignment and sizing. Current code requires 32b aligned, 32b sizes writes (due to a combination of the HW engine that does the writes and the blob/ROM which have minimal protections). That seems like a fine requirement because the core itself only writes using the blob (already 4K aligned, 4K sized AFAIK), the EEPROM (aligned properly AFAIK), LittleFS (aligned properly AFAIK), and SPIFFS (AFAIK may be aligned properly but (ab)uses some mode which is fine for NOR flash but isn't a property of the NAND flash people are shipping on the 8266 boards today).

SPIFFS is deprecated and not supported upstream, so I really would not sweat that too much.

If a user decides to use the raw flash outside of the API the Arduino core provides, they're welcome to but need to follow the restrictions. It might be worthwhile to throw in some assert()s on the flash write params so users would get warning they were doing something bad at the exact line they do it, as opposed to later on when some bit of flash has been changed in an unpredictable way.

drzony mentioned this issue Dec 29, 2020

Ota commands in flash #6538

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate flash write API into raw and quirk-handling versions #7644

Separate flash write API into raw and quirk-handling versions #7644

devyte commented Oct 7, 2020

TD-er commented Oct 7, 2020

devyte commented Oct 8, 2020

drzony commented Oct 8, 2020

drzony commented Oct 8, 2020

drzony commented Oct 8, 2020 •

edited

Loading

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

earlephilhower commented Oct 11, 2020

Separate flash write API into raw and quirk-handling versions #7644

Separate flash write API into raw and quirk-handling versions #7644

Comments

devyte commented Oct 7, 2020

Basic Infos

Platform

Settings in IDE

Problem Description

TD-er commented Oct 7, 2020

devyte commented Oct 8, 2020

drzony commented Oct 8, 2020

drzony commented Oct 8, 2020

drzony commented Oct 8, 2020 • edited Loading

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

TD-er commented Oct 8, 2020

drzony commented Oct 8, 2020

earlephilhower commented Oct 11, 2020

drzony commented Oct 8, 2020 •

edited

Loading