Skip to content

Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TD-er opened this issue Mar 15, 2019 · 9 comments
Closed

Comments

@TD-er
Copy link
Contributor

TD-er commented Mar 15, 2019

In ESPeasy we have some web page rendering all kinds of variables/strings as a cheat-sheet for the internal variable substitutions.
The same code built using core 2.4.2 and core 2.6.0 (both SDK 2.2.1 and SDK 2.2.2) shows different output.

Core 2.6.0:

  • %ssid% MikroTik0�$ MikroTik0�$
  • %sysname% CO2_defect$ CO2_defect$
  • {E} € %e2%82%ac�$
  • € € %e2%82%ac�$

Core 2.4.2:

  • %ssid% MikroTik MikroTik
  • %sysname% CO2_defect CO2_defect
  • {E} € %e2%82%ac
  • € € %e2%82%ac

This is an example of a conversion which appears to be correct in both:

  • {Y} ¥ %c2%a5
  • ¥ ¥ %c2%a5

These are the ones giving different results (may be others, but these are noticeable)

These are the result from a call to this function:

void repl(const String& key, const String& val, String& s, boolean useURLencode)
{
  if (useURLencode) {
    s.replace(key, URLEncode(val.c_str()));
  } else {
    s.replace(key, val);
  }
}


const char euro[4] = {0xe2, 0x82, 0xac, 0}; // Unicode euro symbol
const char yen[3]   = {0xc2, 0xa5, 0}; // Unicode yen symbol
repl(F("{E}"), euro, s, useURLencode);
repl(F("€"), euro, s, useURLencode);
repl(F("{Y}"), yen, s, useURLencode);
repl(F("¥"), yen, s, useURLencode);

repl(F("%ssid%"), WiFi.SSID(), s, useURLencode);
repl(F("%sysname%"), Settings.Name, s, useURLencode);

The ones that work just fine are the ones that were converted to a String object before calling this function.
The ones that fail are all char arrays, but not all of these fail like can be seen with the "yen" conversion.

It looks like the ones that fail have an odd number of characters (e.g. the example sysname is 10 characters).
These char arrays are all 0-terminated, or else they would have given issues in core 2.4.2 too.
The ones that were converted to a String before calling this repl function, were probably padded with more zeroes due to how the String implementation now does its allocation.

@devyte
Copy link
Collaborator

devyte commented Mar 15, 2019

@TD-er
Your repl() function takes as args const references to Strings. A reference is in essence a glorified pointer that has specific restrictions (e.g.: you can't change a reference once it's inited).
When you call repl() like so:

repl(F("{E}"), euro, s, useURLencode);
repl(F("€"), euro, s, useURLencode);
repl(F("{Y}"), yen, s, useURLencode);
repl(F("¥"), yen, s, useURLencode);

you're passing in a F("blah") macro and a pointer to const char [] instead of references to String objects. What exactly do you expect? There is no String object anywhere for which to pass a reference into repl(). I have to wonder how this code builds at all.

If repl() were to take in String args by value instead of reference, then the F() and const char[] would be implicitly cast to a String object that would be passed in. However, in this case passing in a String object would create a duplicate temp String to be passed in, which is not very good.

I suggest one of the following:

  • implementing repl overloads for the specific types you need and forwarding to a central implementation with String
  • a templated function similar to this (untested, no idea if it works):
template <typename keyType, typename valType>
void repl(const keyType &key, const valType &val, bool useURLencode)
{
  if (useURLencode) {
    s.replace(key, URLEncode(val.c_str()));
  } else {
    s.replace(key, val);
  }
}
  • just simply constructing temp String objects explicitly, which makes clear what is going on, like this:
repl(String(F("{E}")), String(euro), s, useURLencode);
repl(String(F("&euro;")), String(euro), s, useURLencode);
repl(String(F("{Y}")), String(yen), s, useURLencode);
repl(String(F("&yen;")), yen, s, useURLencode);

Closing due to user error.

@devyte devyte closed this as completed Mar 15, 2019
@TD-er
Copy link
Contributor Author

TD-er commented Mar 15, 2019

I really don't understand what's wrong here.
I am changing s, which is a String object.
The others are just const types and the compiler should perform the conversion to String when I hand it a char*
I can make it a template function, but then the compiler will create 2 (or more) implementations of the same function that's being templated.
And this is just one example. I guess the code we have is filled with these kind of calls.

@devyte
Copy link
Collaborator

devyte commented Mar 15, 2019

The others are just const types and the compiler should perform the conversion to String

You have this:

void repl(const String& key, const String& val, String& s, boolean useURLencode)

The arg types of key and val are not String, but references to String (String &). The compiler can't convert to a reference, because there is no object that the reference can point to.
You can expect the compiler to convert a const char * to a String, but not to a String *. This is similar: the compiler should not be converting to String &. Like I said, I don't know how this even builds.

If you want to go the template repl way, it may be better like so:

void repl<String>
template <typename keyType, typename valType>
void repl(const keyType &key, const valType &val, bool useURLencode)
{
  if (useURLencode) {
    s.replace(String(key), URLEncode(val.c_str()));
  } else {
    s.replace(String(key), String(val));
  }
}

What does URLEncode() return? If something other than a String, then it needs a wrapper with a temp String obj as well.

There might be a way to avoid the duplication due to the temp Strings by using std::move, but I'm not sure, I don't think I've ever seen move semantics for String being used.

@TD-er
Copy link
Contributor Author

TD-er commented Mar 15, 2019

Is there some way to make the compiler not accept the automatic casts from char* (or other variants) to a String& ?
Otherwise I have to manually look for all calls to functions which accept a String& or const String&
And that's probably going to take weeks and then there will still instances be missed.
In our code that's about 500 function calls using String&, so each and every call to those should be checked.

@devyte
Copy link
Collaborator

devyte commented Mar 15, 2019

I said:

I don't know how this even builds

I'd have to investigate why your code is building at all, and I won't have the bandwidth until the end of the month.

One thing you could do is search for all functions that take a String & as part of its definition. Once you have function signatures, you could:

  • change them (e.g.: add an unused arg), and look for all the places that fail to compile (i.e.: the 500 places you mentioned where the functions get called), then fix each one as appropriate. That shouldn't really take too long if you use appropriate tools.
  • Remove the reference from the function to convert to pass-by-value. Except for some obscure cases, this should "just work", but depending on the call stack, and how the functions get called, RAM usage and heap fragmentation could increase a bit.

@TD-er
Copy link
Contributor Author

TD-er commented Mar 17, 2019

I've been looking into the changes of the String class: https://github.com/esp8266/Arduino/pull/5690/files

Also I do not agree that this is not a bug, since it is a bug.
The compiler does call the constructor of the String class (constructor with a single argument), which means in my example it will call the constructor String(const char *cstr = "");

String::String(const char *cstr) {
init();
if(cstr)
copy(cstr, strlen(cstr));
}

The code called in init() doesn't do that much exciting stuff.

But the copy does:

String & String::copy(const char *cstr, unsigned int length) {
if(!reserve(length)) {
invalidate();
return *this;
}
setLen(length);
strcpy(wbuffer(), cstr);
return *this;
}

This is calling reserve(), which calls changeBuffer():

unsigned char String::reserve(unsigned int size) {
if(buffer() && capacity() >= size)
return 1;
if(changeBuffer(size)) {
if(len() == 0)
wbuffer()[0] = 0;
return 1;
}
return 0;
}
unsigned char String::changeBuffer(unsigned int maxStrLen) {
// Can we use SSO here to avoid allocation?
if (maxStrLen < sizeof(sso_buf)) {
if (sso() || !buffer()) {
// Already using SSO, nothing to do
setSSO(true);
return 1;
} else { // if bufptr && !sso()
// Using bufptr, need to shrink into sso_buff
char temp[sizeof(sso_buf)];
memcpy(temp, buffer(), maxStrLen);
free(wbuffer());
setSSO(true);
memcpy(wbuffer(), temp, maxStrLen);
return 1;
}
}
// Fallthrough to normal allocator
size_t newSize = (maxStrLen + 16) & (~0xf);
// Make sure we can fit newsize in the buffer
if (newSize > CAPACITY_MAX) {
return false;
}
uint16_t oldLen = len();
char *newbuffer = (char *) realloc(sso() ? nullptr : wbuffer(), newSize);
if(newbuffer) {
size_t oldSize = capacity() + 1; // include NULL.
if (sso()) {
// Copy the SSO buffer into allocated space
memcpy(newbuffer, sso_buf, sizeof(sso_buf));
}
if (newSize > oldSize)
{
memset(newbuffer + oldSize, 0, newSize - oldSize);
}
setSSO(false);
setCapacity(newSize - 1);
setLen(oldLen); // Needed in case of SSO where len() never existed
setBuffer(newbuffer);
return 1;
}
return 0;
}

Not sure what would be done when not in SSO mode, but to me it looks like this internal SSO-buffer should be initialized with zeroes first before the tekst is copied.
The copy does only copy the data and not the terminating 0. (using memcpy and not strcpy)

char temp[sizeof(sso_buf)];
memcpy(temp, buffer(), maxStrLen);
free(wbuffer());
setSSO(true);
memcpy(wbuffer(), temp, maxStrLen);

N.B. the use of memcpy is the right choice here, but then you must make sure to have the next byte 0-terminated.
Also the reserve does only take into account the length computed by strlen, which does not include the 0-terminator position, so I guess that may also lead to some issues sometimes. (have to check)

I have to check it first, but I guess that initializing the array at line 164 with zeroes will get rid of these problems.
For example:

 char temp[sizeof(sso_buf)] = {0};

It should also lead to some issues when just creating a String with a const char* as argument.
And then one with less than the size of the SSO block.

@TD-er
Copy link
Contributor Author

TD-er commented Mar 17, 2019

Just made a quick test and this does fix this issue:

diff --git a/cores/esp8266/WString.cpp b/cores/esp8266/WString.cpp
index 4d4fa3b..8a62e2f 100644
--- a/cores/esp8266/WString.cpp
+++ b/cores/esp8266/WString.cpp
@@ -129,6 +129,7 @@ String::~String() {
 // /*********************************************/

 inline void String::init(void) {
+    memset(sso_buf, 0, sizeof(sso_buf));
     setSSO(false);
     setCapacity(0);
     setLen(0);
@@ -161,11 +162,11 @@ unsigned char String::changeBuffer(unsigned int maxStrLen) {
             return 1;
         } else { // if bufptr && !sso()
             // Using bufptr, need to shrink into sso_buff
-            char temp[sizeof(sso_buf)];
+            char temp[sizeof(sso_buf)] = {0};
             memcpy(temp, buffer(), maxStrLen);
             free(wbuffer());
             setSSO(true);
-            memcpy(wbuffer(), temp, maxStrLen);
+            memcpy(wbuffer(), temp, sizeof(sso_buf));
             return 1;
         }
     }

I will try to look into a few more places of the code and see if some similar issues arise and als try to make some tests that are reproducible for this.
Then I will make a PR for it.

@TD-er
Copy link
Contributor Author

TD-er commented Mar 17, 2019

When looking at the other parts of the code, I noticed these functions to convert a float or double to a String:

String::String(float value, unsigned char decimalPlaces) {
init();
char buf[33];
*this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf);
}
String::String(double value, unsigned char decimalPlaces) {
init();
char buf[33];
*this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf);
}

There is no check in here for the max. number of decimal places.
So unless dtostrf does this check, it could be that calling it with 31 (or more) decimals will lead to some issues. (not a 0 terminator any more or out of bound issue)

TD-er added a commit to TD-er/Arduino that referenced this issue Mar 18, 2019
A fix for some issue introduced in PR esp8266#5690
See discussion in esp8266#5883
@TD-er
Copy link
Contributor Author

TD-er commented Mar 18, 2019

Another note on the conversions made by the compiler.
See section 12.3 Conversions in the C++ standard and then 12.3.1.

Thing is, the closing of this issue made me doubt about my C++ insights, so I asked a former colleague of mine, who quotes C++ standard section numbers like some quote bible verses and he assured me it was indeed a constructor being called.
So then I started to look further into the code of the String class and its recent changes.
Also mainly because I'm trying to be as lazy as can be, not willing to alter numerous lines of my own code.

Maybe @earlephilhower can also have a look, since he made the SSO feature. (which I like a lot by the way)

earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Mar 20, 2019
Fixes esp8266#5883 and supercedes esp8266#5890

The replace() function was using len() while in the middle of buffer
operations.  In SSO mode len() is not stored separately and is a call to
strlen(), which may not be legal if you're in the middle of overwriting
the SSO buffer, as was the case in ::replace when the replacement string
was longer than the find string.  This caused potential garbage at the
end of the string when accessed.  Instead, just cache the length in a
local while doing the operation.

Also make setLength() explicitly write a 0 in the SSO buffer at the
specific offset.  It's probably not needed, but is safe to do and makes
logical sense.

Add in test cases from esp8266#5890 as well as some new ones that fail on the
unmodified core.
earlephilhower added a commit that referenced this issue Mar 20, 2019
* Fix String::replace() 

Fixes #5883 and supercedes #5890

The replace() function was using len() while in the middle of buffer
operations.  In SSO mode len() is not stored separately and is a call to
strlen(), which may not be legal if you're in the middle of overwriting
the SSO buffer, as was the case in ::replace when the replacement string
was longer than the find string.  This caused potential garbage at the
end of the string when accessed.  Instead, just cache the length in a
local while doing the operation.

Add in test cases from #5890 as well as some new ones that fail on the
unmodified core.

* Fix stack smashing error on 64b

When pointers are 8 bytes long, the size of a String is larger than 16
chars.  Increase the allocated array we're using in the test to avoid a
"stack smashing" error.

* Manually call destructor in test

Just for clarity, manually call the destructor for the Strings() that
are "placement new'd" in the String tests.  It is a no-op for the
existing test, since thanks to SSO there are no memory allocations, but
will help in case someone adds tests later which include longer strings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants