Skip to content

Crashes since #6565 flashstring dedup #6574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
5 of 6 tasks
TD-er opened this issue Oct 1, 2019 · 24 comments
Closed
5 of 6 tasks

Crashes since #6565 flashstring dedup #6574

TD-er opened this issue Oct 1, 2019 · 24 comments
Assignees

Comments

@TD-er
Copy link
Contributor

TD-er commented Oct 1, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [ESP-12|ESP-01|ESP-07|ESP8285 device|other]
  • Core Version: [1aeea12]
  • Development Env: [Platformio]
  • Operating System: [Windows]

Settings in IDE

  • Module: [Generic ESP8266 Module]
  • Flash Mode: [dio]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Lower Memory|Higher Bandwidth]
  • Reset Method: [nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz]
  • Upload Using: [OTA|SERIAL]
  • Upload Speed: [115200] (serial upload only)

Problem Description

Update to last feature/stage state does reduce bin size, but also causes reboots with exception 28
See #6565 #6565 (comment)

Decoded stack trace:

PS C:\GitHub\TD-er\ESPEasy> python .\tools\decoder.py -t C:\Users\gijs\.platformio\packages\toolchain-xtensa\ -e .\.pio\build\custom_ESP8266_4M1M\firmware.elf .\stacktrace.txt
Exception: 28 (LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads)

epc1:     0x4000e140
epc2:     0x00000000
epc3:     0x00000000
excvaddr: 0x00000000
depc:     0x00000000

ctx: cont

sp:       0x3fff20b0
end:      0x3fff2480
offset:   0x000001a0

stack:
0x402636f9: String::operator=(char const*) at ??:?
0x4026354d: String::move(String&) at ??:?
0x40264495: String::substring(unsigned int, unsigned int) const at ??:?
0x402636aa: String::operator=(String&&) at ??:?
0x4023ab9e: parseCompleteNonCommentLine(String&, String&, String&, String&, bool&, bool&, bool&, bool*, bool*, unsigned char&, unsigned char&) at ??:?
0x402b49e0: chip_v6_unset_chanfreq at ??:?
0x40263928: String::operator=(__FlashStringHelper const*) at ??:?
0x4026328d: String::invalidate() at ??:?
0x4024412f: rulesProcessingFile(String const&, String&) at ??:?
0x402639b1: String::concat(char const*, unsigned int) at ??:?
0x40270e88: spiffs_cache_page_get_by_fd at ??:?
0x40263427: String::reserve(unsigned int) at ??:?
0x40263b22: String::concat(int) at ??:?
0x402b0031: sleep_reset_analog_rtcreg_8266 at ??:?
0x40233159: rulesProcessing(String&) at ??:?
0x40263928: String::operator=(__FlashStringHelper const*) at ??:?
0x40263958: String::String(__FlashStringHelper const*) at ??:?
0x402bd840: chip_v6_unset_chanfreq at ??:?
0x4025a55b: setup at ??:?
0x4026f6c4: std::_Function_handler<void (WiFiEventSoftAPModeStationDisconnected const&), void (*)(WiFiEventSoftAPModeStationDisconnected const&)>::_M_invoke(std::_Any_data const&, WiFiEventSoftAPModeStationDisconnected const&) at ??:?
0x40264cf8: loop_wrapper() at core_esp8266_main.cpp:?
0x401003d1: cont_wrapper at ??:?

It is a call to this function (and this revision of the file): https://github.com/letscontrolit/ESPEasy/blob/6267b9f00fbfbb9aedb61b408d51e17f76fd6330/src/ESPEasyRules.ino#L313-L422

@earlephilhower
Copy link
Collaborator

earlephilhower commented Oct 1, 2019

The one change apps will see is that all PSTRs may no longer be aligned to 4-byte boundaries anymore due to coalescing (i.e. auto a = PSTR("Theend"); auto b = PSTR("end"); will make b point to (a+3) ).

So code which assumed a by-4 alignment for PSTRs may have trouble (but by-4 was never guaranteed and is a relatively new addition).

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

I assume it does also apply to F() macro strings, since I don't have that many PSTR defined strings in the code and the significant binary size reduction suggests a lot of deduped strings.

@earlephilhower
Copy link
Collaborator

F() uses PSTR() under the hood.

@mhightower83, I have a niggling sensation that you had a by-4 alignment assumption for PSTRs somewhere in your code. Alignment should be by-4 unless the PSTR in question is made to point into another string (i.e. another PSTR of longer length ends with your full PSTR).

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

Added a decoded stack trace to the start post.

Perhaps it is already the first line in that function:

  const bool lineStartsWith_on = line.substring(0, 3).equalsIgnoreCase(F("on "));

I looks like the only calls of functions in the String class with F() strings are:

  • equalsIgnoreCase
  • indexOf

There is also a simple assignment, but that's unlikely the cause, since that is being used before this function is called. ( String log = F("RuleDebug: "); )

@earlephilhower
Copy link
Collaborator

So the me21 script is not showing line numbers? That stinks, there's no reason it can't find lines, the exe used dumps them...

Anyway, looks like you have a NULL pointer problem. Could be related to the F("on ") or the line.substring variable (in RAM). Can you give the ELF and MAP for the file that the crash comes from to check the address used for F("on ").

Also, what would be most useful, can you get a MCVE from this so we can run ourselves and see the crash and peek around?

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

How can I get the address of F("on ") from the elf file?

@earlephilhower
Copy link
Collaborator

Just attach the ELF and MAP, I can open them up and see...

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

Since I did make a new build (with NDEBUG removed as build flag) the stack trace may also be changed.
Stack trace is included in the rar file

Not sure where the map file is though, I don't see a file with map in the name in the .pio/build directory

@earlephilhower
Copy link
Collaborator

@mhightower83, my comment above looks to be in error. GCC and LD still obey the __align directive even when coalescing strings. A simple test shows it only merging strings which match on a by-4 address:

int i=0;

void call1() {
  const char *str2 = PSTR("ello, world!");
  Serial.println(str2);
  Serial.printf("%p\n", &(str2[0]));
  i++;
}
void call2() {
  const char *str3 = PSTR(", world!");
  Serial.println(str3);
  Serial.printf("%p\n", &(str3[0]));
  i++;
}
void setup() {
  Serial.begin(115200);
}

void loop() {
  const char *str1 = PSTR("Hello, world!");
  call1();
  call2();
  Serial.println(str1);
  Serial.printf("%p\n", &(str1[0]));
  delay(1000);
}

Shows

ello, world!
0x40239d60
, world!
0x40239d64
Hello, world!
0x40239d50

(i.e. str2 of (===str1 starting at 2nd char) was not merged w/str1, but str3, which could merge with str1 or str2 ended up merging w/str3 to ensure alignment.)

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

Just looking at the code of String.

For example: inline void setLen(int len) { if (isSSO()) sso.len = len; else ptr.len = len; }
There is no check to see if the capacity of the buffers is sufficient.

What if the new string objects are always a multiple of 4 in size, but the strlen like functions only return upto the first occurrence of a 0?
The value of len() in String is being used as index for the buf in subString, like here:

String String::substring(unsigned int left, unsigned int right) const {
    if(left > right) {
        unsigned int temp = right;
        right = left;
        left = temp;
    }
    String out;
    if(left >= len())
        return out;
    if(right > len())
        right = len();
    char temp = buffer()[right];  // save the replaced character
    wbuffer()[right] = '\0';
    out = wbuffer() + left;  // pointer arithmetic
    wbuffer()[right] = temp;  //restore character
    return out;
}

And the function len() does not check if it is within the capacity of the buffers.
So maybe it is not directly related to the flash string calls, but it could be a failing check on the size of subString where it wants to get a substring with indices out of range of the string.

@earlephilhower
Copy link
Collaborator

It's not a String problem, that bit has been stable for months as-is here and in the ESP32. I wouldn't waste time there. And strings are not a multiple of 4 in length, they're whatever size is allocated (or SSO'd).

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

But you stated that a PSTR may no longer be bound to address multiples of 4.
When running unsigned char String::concat(const __FlashStringHelper * str) , it calls strlen_P which was (if my memory serves me well) optimized to only check if the end was on a byte address and not a word address, but it does start with checking per word.

Edit:
Nope, I was wrong, it does check the initial start per byte: https://github.com/earlephilhower/newlib-xtensa/blob/d9b11922aaf1f796524f46ae7d69bac7e13668ca/newlib/libc/sys/xtensa/string_pgmspace.c#L39-L45

Edit2:

Shows

ello, world!
0x40239d60
, world!
0x40239d64
Hello, world!
0x40239d50

(i.e. str2 of (===str1 starting at 2nd char) was not merged w/str1, but str3, which could merge with str1 or str2 ended up merging w/str3 to ensure alignment.)

It does look like both strings str2 & str3 were merged, right?
So if you somehow change either string to have an odd offset (e.g. "o, world!") then one of then is not at a multiple of 4 bytes for the first byte.

@earlephilhower
Copy link
Collaborator

It does look like both strings str2 & str3 were merged, right?
So if you somehow change either string to have an odd offset (e.g. "o, world!") then one of then is not at a multiple of 4 bytes for the first byte.

Not following you there, sorry.

It did merge str 2 and 3, but only because the alignment matched. If the strings were "ello, world!" and "yellow, world" then the linker would not combine them. PSTRs don't have any reason other than speed of access to need a 4-byte alignment (they didn't in prior releases), so in fact I might want to consider dropping the alignment for even better savings. Now it's probably getting mostly duplicated strings as opposed to any substring matches.

Anyway, the crash in your code is in memmove. TRying to find if it;s source or destination. Could be an OOM condition in the code and not related to the change, but need more debug to see.

If you can find a small app snippet that I can compile as opposed to the full app (which I can't because PIO and me never worked right for the 8266) that would speed things.

earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Oct 1, 2019
GAS doesn't support the C language idiom of catenating two strings
together with quotes (i.e. "x" "y" === "xy").

Specify the section attribute fully in the section attribute, instead,
to allow this.

Fixes esp8266#6575 and probably esp8266#6574
@earlephilhower
Copy link
Collaborator

Try the change in #6577. It looks like your substring was returning NULL (string empty) and so the this pointer for the equalsIgnoreCase was NULL (I think, it's hard w/o a MCVE to be 100% certain), possibly due to the the issue in #6575 .

@dok-net
Copy link
Contributor

dok-net commented Oct 1, 2019

@earlephilhower In the meantime, could you please just:
Revert "Speed up empty String creation slightly (#6573)"
This reverts commit 1aeea12.

That's HEAD of master at the time of writing this, and breaks everything (like simplest WebServer demo).

@dok-net
Copy link
Contributor

dok-net commented Oct 1, 2019

Thank you :-)

@earlephilhower
Copy link
Collaborator

@dok-net, do you have a sample of code that breaks because of the String(const char *p=nullptr) PR? This is taking something that was tested in optimizing for #6569 and applying generally.

Not saying it's not a problem, but code inspection doesn't seem to show anything glaring, so maybe there's a corner case we're missing.

@earlephilhower
Copy link
Collaborator

Ah, I see it does kill HelloServer. Let me do some debug...

@earlephilhower
Copy link
Collaborator

@dok-net try the latest push to #6577, another 1-line fix to avoid String getting into an invalid state.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 1, 2019

I just applied the changes you made in WString.cpp for #6577 and it does now boot fine.
Not sure if we run into other issues, but at least it is now not crashing.

N.B. did not change anything in the pgmspace file.

Edit:
Hmm it may not crash on the String operations, but it still does not work.
Webserver does not start.
Will now apply the pgmspace.h patch.

Edit2:
With also the pgmspace.h changes applied, the unit does seem to work fine again (as far as can be tested in a few minutes)

@earlephilhower
Copy link
Collaborator

Thanks for the updates. The concatenated strings assembling w/o warnings or errors really bothers me. GAS just ignored the remainder of the line after the first ending quote instead of giving a syntax error ("unexpected junk after .string" or something I'd have hoped for).

As I now have 2 reports of success, I'll merge that PR. When you have more runtime on the chip, report back, please.

@dok-net
Copy link
Contributor

dok-net commented Oct 1, 2019

@earlephilhower Master works for me now, again.

@earlephilhower
Copy link
Collaborator

Great, thanks for the update @dok-net .

@devyte devyte added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Oct 1, 2019
@devyte
Copy link
Collaborator

devyte commented Oct 4, 2019

#6577 is merged, and no new reports are in. If there are further problems, please open a new issue.
Closing.

@devyte devyte closed this as completed Oct 4, 2019
@devyte devyte removed the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Oct 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants