Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

TD-er · 2019-03-15T19:38:45Z

In ESPeasy we have some web page rendering all kinds of variables/strings as a cheat-sheet for the internal variable substitutions.
The same code built using core 2.4.2 and core 2.6.0 (both SDK 2.2.1 and SDK 2.2.2) shows different output.

Core 2.6.0:

%ssid% MikroTik0�$ MikroTik0�$
%sysname% CO2_defect$ CO2_defect$
{E} € %e2%82%ac�$
€ € %e2%82%ac�$

Core 2.4.2:

%ssid% MikroTik MikroTik
%sysname% CO2_defect CO2_defect
{E} € %e2%82%ac
€ € %e2%82%ac

This is an example of a conversion which appears to be correct in both:

{Y} ¥ %c2%a5
¥ ¥ %c2%a5

These are the ones giving different results (may be others, but these are noticeable)

These are the result from a call to this function:

void repl(const String& key, const String& val, String& s, boolean useURLencode)
{
  if (useURLencode) {
    s.replace(key, URLEncode(val.c_str()));
  } else {
    s.replace(key, val);
  }
}


const char euro[4] = {0xe2, 0x82, 0xac, 0}; // Unicode euro symbol
const char yen[3]   = {0xc2, 0xa5, 0}; // Unicode yen symbol
repl(F("{E}"), euro, s, useURLencode);
repl(F("&euro;"), euro, s, useURLencode);
repl(F("{Y}"), yen, s, useURLencode);
repl(F("&yen;"), yen, s, useURLencode);

repl(F("%ssid%"), WiFi.SSID(), s, useURLencode);
repl(F("%sysname%"), Settings.Name, s, useURLencode);

The ones that work just fine are the ones that were converted to a String object before calling this function.
The ones that fail are all char arrays, but not all of these fail like can be seen with the "yen" conversion.

It looks like the ones that fail have an odd number of characters (e.g. the example sysname is 10 characters).
These char arrays are all 0-terminated, or else they would have given issues in core 2.4.2 too.
The ones that were converted to a String before calling this repl function, were probably padded with more zeroes due to how the String implementation now does its allocation.

The text was updated successfully, but these errors were encountered:

devyte · 2019-03-15T20:46:25Z

@TD-er
Your repl() function takes as args const references to Strings. A reference is in essence a glorified pointer that has specific restrictions (e.g.: you can't change a reference once it's inited).
When you call repl() like so:

repl(F("{E}"), euro, s, useURLencode);
repl(F("&euro;"), euro, s, useURLencode);
repl(F("{Y}"), yen, s, useURLencode);
repl(F("&yen;"), yen, s, useURLencode);

you're passing in a F("blah") macro and a pointer to const char [] instead of references to String objects. What exactly do you expect? There is no String object anywhere for which to pass a reference into repl(). I have to wonder how this code builds at all.

If repl() were to take in String args by value instead of reference, then the F() and const char[] would be implicitly cast to a String object that would be passed in. However, in this case passing in a String object would create a duplicate temp String to be passed in, which is not very good.

I suggest one of the following:

implementing repl overloads for the specific types you need and forwarding to a central implementation with String
a templated function similar to this (untested, no idea if it works):

template <typename keyType, typename valType>
void repl(const keyType &key, const valType &val, bool useURLencode)
{
  if (useURLencode) {
    s.replace(key, URLEncode(val.c_str()));
  } else {
    s.replace(key, val);
  }
}

just simply constructing temp String objects explicitly, which makes clear what is going on, like this:

repl(String(F("{E}")), String(euro), s, useURLencode);
repl(String(F("&euro;")), String(euro), s, useURLencode);
repl(String(F("{Y}")), String(yen), s, useURLencode);
repl(String(F("&yen;")), yen, s, useURLencode);

Closing due to user error.

TD-er · 2019-03-15T20:56:22Z

I really don't understand what's wrong here.
I am changing s, which is a String object.
The others are just const types and the compiler should perform the conversion to String when I hand it a char*
I can make it a template function, but then the compiler will create 2 (or more) implementations of the same function that's being templated.
And this is just one example. I guess the code we have is filled with these kind of calls.

devyte · 2019-03-15T21:18:31Z

The others are just const types and the compiler should perform the conversion to String

You have this:

void repl(const String& key, const String& val, String& s, boolean useURLencode)

The arg types of key and val are not String, but references to String (String &). The compiler can't convert to a reference, because there is no object that the reference can point to.
You can expect the compiler to convert a const char * to a String, but not to a String *. This is similar: the compiler should not be converting to String &. Like I said, I don't know how this even builds.

If you want to go the template repl way, it may be better like so:

void repl<String>
template <typename keyType, typename valType>
void repl(const keyType &key, const valType &val, bool useURLencode)
{
  if (useURLencode) {
    s.replace(String(key), URLEncode(val.c_str()));
  } else {
    s.replace(String(key), String(val));
  }
}

What does URLEncode() return? If something other than a String, then it needs a wrapper with a temp String obj as well.

There might be a way to avoid the duplication due to the temp Strings by using std::move, but I'm not sure, I don't think I've ever seen move semantics for String being used.

TD-er · 2019-03-15T21:29:20Z

Is there some way to make the compiler not accept the automatic casts from char* (or other variants) to a String& ?
Otherwise I have to manually look for all calls to functions which accept a String& or const String&
And that's probably going to take weeks and then there will still instances be missed.
In our code that's about 500 function calls using String&, so each and every call to those should be checked.

devyte · 2019-03-15T21:43:12Z

I said:

I don't know how this even builds

I'd have to investigate why your code is building at all, and I won't have the bandwidth until the end of the month.

One thing you could do is search for all functions that take a String & as part of its definition. Once you have function signatures, you could:

change them (e.g.: add an unused arg), and look for all the places that fail to compile (i.e.: the 500 places you mentioned where the functions get called), then fix each one as appropriate. That shouldn't really take too long if you use appropriate tools.
Remove the reference from the function to convert to pass-by-value. Except for some obscure cases, this should "just work", but depending on the call stack, and how the functions get called, RAM usage and heap fragmentation could increase a bit.

TD-er · 2019-03-17T20:44:16Z

I've been looking into the changes of the String class: https://github.com/esp8266/Arduino/pull/5690/files

Also I do not agree that this is not a bug, since it is a bug.
The compiler does call the constructor of the String class (constructor with a single argument), which means in my example it will call the constructor String(const char *cstr = "");

Arduino/cores/esp8266/WString.cpp

Lines 32 to 36 in d19fc3b

    
           String::String(const char *cstr) { 
        
               init(); 
        
               if(cstr) 
        
                   copy(cstr, strlen(cstr)); 
        
           }

The code called in init() doesn't do that much exciting stuff.

But the copy does:

Arduino/cores/esp8266/WString.cpp

Lines 203 to 211 in d19fc3b

    
           String & String::copy(const char *cstr, unsigned int length) { 
        
               if(!reserve(length)) { 
        
                   invalidate(); 
        
                   return *this; 
        
               } 
        
               setLen(length); 
        
               strcpy(wbuffer(), cstr); 
        
               return *this; 
        
           }

This is calling reserve(), which calls changeBuffer():

Arduino/cores/esp8266/WString.cpp

Lines 144 to 197 in d19fc3b

    
           unsigned char String::reserve(unsigned int size) { 
        
               if(buffer() && capacity() >= size) 
        
                   return 1; 
        
               if(changeBuffer(size)) { 
        
                   if(len() == 0) 
        
                       wbuffer()[0] = 0; 
        
                   return 1; 
        
               } 
        
               return 0; 
        
           } 
        
           unsigned char String::changeBuffer(unsigned int maxStrLen) { 
        
               // Can we use SSO here to avoid allocation? 
        
               if (maxStrLen < sizeof(sso_buf)) { 
        
                   if (sso() || !buffer()) { 
        
                       // Already using SSO, nothing to do 
        
                       setSSO(true); 
        
                       return 1; 
        
                   } else { // if bufptr && !sso() 
        
                       // Using bufptr, need to shrink into sso_buff 
        
                       char temp[sizeof(sso_buf)]; 
        
                       memcpy(temp, buffer(), maxStrLen); 
        
                       free(wbuffer()); 
        
                       setSSO(true); 
        
                       memcpy(wbuffer(), temp, maxStrLen); 
        
                       return 1; 
        
                   } 
        
               } 
        
               // Fallthrough to normal allocator 
        
               size_t newSize = (maxStrLen + 16) & (~0xf); 
        
               // Make sure we can fit newsize in the buffer 
        
               if (newSize > CAPACITY_MAX) { 
        
                   return false; 
        
               } 
        
               uint16_t oldLen = len(); 
        
               char *newbuffer = (char *) realloc(sso() ? nullptr : wbuffer(), newSize); 
        
               if(newbuffer) { 
        
                   size_t oldSize = capacity() + 1; // include NULL. 
        
                   if (sso()) { 
        
                       // Copy the SSO buffer into allocated space 
        
                       memcpy(newbuffer, sso_buf, sizeof(sso_buf)); 
        
                   } 
        
                   if (newSize > oldSize) 
        
                   { 
        
                       memset(newbuffer + oldSize, 0, newSize - oldSize); 
        
                   } 
        
                   setSSO(false); 
        
                   setCapacity(newSize - 1); 
        
                   setLen(oldLen); // Needed in case of SSO where len() never existed 
        
                   setBuffer(newbuffer); 
        
                   return 1; 
        
               } 
        
               return 0; 
        
           }

Not sure what would be done when not in SSO mode, but to me it looks like this internal SSO-buffer should be initialized with zeroes first before the tekst is copied.
The copy does only copy the data and not the terminating 0. (using memcpy and not strcpy)

Arduino/cores/esp8266/WString.cpp

Lines 164 to 168 in d19fc3b

    
           char temp[sizeof(sso_buf)]; 
        
           memcpy(temp, buffer(), maxStrLen); 
        
           free(wbuffer()); 
        
           setSSO(true); 
        
           memcpy(wbuffer(), temp, maxStrLen);

N.B. the use of memcpy is the right choice here, but then you must make sure to have the next byte 0-terminated.
Also the reserve does only take into account the length computed by strlen, which does not include the 0-terminator position, so I guess that may also lead to some issues sometimes. (have to check)

I have to check it first, but I guess that initializing the array at line 164 with zeroes will get rid of these problems.
For example:

 char temp[sizeof(sso_buf)] = {0};

It should also lead to some issues when just creating a String with a const char* as argument.
And then one with less than the size of the SSO block.

TD-er · 2019-03-17T22:13:11Z

Just made a quick test and this does fix this issue:

diff --git a/cores/esp8266/WString.cpp b/cores/esp8266/WString.cpp
index 4d4fa3b..8a62e2f 100644
--- a/cores/esp8266/WString.cpp
+++ b/cores/esp8266/WString.cpp
@@ -129,6 +129,7 @@ String::~String() {
 // /*********************************************/

 inline void String::init(void) {
+    memset(sso_buf, 0, sizeof(sso_buf));
     setSSO(false);
     setCapacity(0);
     setLen(0);
@@ -161,11 +162,11 @@ unsigned char String::changeBuffer(unsigned int maxStrLen) {
             return 1;
         } else { // if bufptr && !sso()
             // Using bufptr, need to shrink into sso_buff
-            char temp[sizeof(sso_buf)];
+            char temp[sizeof(sso_buf)] = {0};
             memcpy(temp, buffer(), maxStrLen);
             free(wbuffer());
             setSSO(true);
-            memcpy(wbuffer(), temp, maxStrLen);
+            memcpy(wbuffer(), temp, sizeof(sso_buf));
             return 1;
         }
     }

I will try to look into a few more places of the code and see if some similar issues arise and als try to make some tests that are reproducible for this.
Then I will make a PR for it.

TD-er · 2019-03-17T23:12:49Z

When looking at the other parts of the code, I noticed these functions to convert a float or double to a String:

Arduino/cores/esp8266/WString.cpp

Lines 111 to 121 in d19fc3b

    
           String::String(float value, unsigned char decimalPlaces) { 
        
               init(); 
        
               char buf[33]; 
        
               *this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf); 
        
           } 
        
           String::String(double value, unsigned char decimalPlaces) { 
        
               init(); 
        
               char buf[33]; 
        
               *this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf); 
        
           }

There is no check in here for the max. number of decimal places.
So unless dtostrf does this check, it could be that calling it with 31 (or more) decimals will lead to some issues. (not a 0 terminator any more or out of bound issue)

A fix for some issue introduced in PR esp8266#5690 See discussion in esp8266#5883

TD-er · 2019-03-18T08:25:18Z

Another note on the conversions made by the compiler.
See section 12.3 Conversions in the C++ standard and then 12.3.1.

Thing is, the closing of this issue made me doubt about my C++ insights, so I asked a former colleague of mine, who quotes C++ standard section numbers like some quote bible verses and he assured me it was indeed a constructor being called.
So then I started to look further into the code of the String class and its recent changes.
Also mainly because I'm trying to be as lazy as can be, not willing to alter numerous lines of my own code.

Maybe @earlephilhower can also have a look, since he made the SSO feature. (which I like a lot by the way)

Fixes esp8266#5883 and supercedes esp8266#5890 The replace() function was using len() while in the middle of buffer operations. In SSO mode len() is not stored separately and is a call to strlen(), which may not be legal if you're in the middle of overwriting the SSO buffer, as was the case in ::replace when the replacement string was longer than the find string. This caused potential garbage at the end of the string when accessed. Instead, just cache the length in a local while doing the operation. Also make setLength() explicitly write a 0 in the SSO buffer at the specific offset. It's probably not needed, but is safe to do and makes logical sense. Add in test cases from esp8266#5890 as well as some new ones that fail on the unmodified core.

* Fix String::replace() Fixes #5883 and supercedes #5890 The replace() function was using len() while in the middle of buffer operations. In SSO mode len() is not stored separately and is a call to strlen(), which may not be legal if you're in the middle of overwriting the SSO buffer, as was the case in ::replace when the replacement string was longer than the find string. This caused potential garbage at the end of the string when accessed. Instead, just cache the length in a local while doing the operation. Add in test cases from #5890 as well as some new ones that fail on the unmodified core. * Fix stack smashing error on 64b When pointers are 8 bytes long, the size of a String is larger than 16 chars. Increase the allocated array we're using in the test to avoid a "stack smashing" error. * Manually call destructor in test Just for clarity, manually call the destructor for the Strings() that are "placement new'd" in the String tests. It is a no-op for the existing test, since thanks to SSO there are no memory allocations, but will help in case someone adds tests later which include longer strings.

devyte closed this as completed Mar 15, 2019

TD-er added a commit to TD-er/Arduino that referenced this issue Mar 18, 2019

Fix Small String Optimization misplaced 0 terminator

db88c82

A fix for some issue introduced in PR esp8266#5690 See discussion in esp8266#5883

TD-er mentioned this issue Mar 18, 2019

Fix Small String Optimization misplaced 0 terminator #5890

Closed

earlephilhower mentioned this issue Mar 20, 2019

Fix String::replace() #5897

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 17, 2019 •

edited

Loading

TD-er commented Mar 17, 2019 •

edited

Loading

TD-er commented Mar 17, 2019

TD-er commented Mar 18, 2019

Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

Some strings seem to have bogus data at the end (core 2.6.0 vs. core 2.4.2) #5883

Comments

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 15, 2019

devyte commented Mar 15, 2019

TD-er commented Mar 17, 2019 • edited Loading

TD-er commented Mar 17, 2019 • edited Loading

TD-er commented Mar 17, 2019

TD-er commented Mar 18, 2019

TD-er commented Mar 17, 2019 •

edited

Loading

TD-er commented Mar 17, 2019 •

edited

Loading