Skip to content

Disallow creation of CFStrings from non-8bit c-strings #5165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 10, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions Sources/CoreFoundation/CFString.c
Original file line number Diff line number Diff line change
Expand Up @@ -1294,6 +1294,12 @@ CF_PRIVATE CFStringRef __CFStringCreateImmutableFunnel3(
Boolean possiblyExternalFormat, Boolean tryToReduceUnicode, Boolean hasLengthByte, Boolean hasNullByte, Boolean noCopy,
CFAllocatorRef contentsDeallocator, UInt32 converterFlags) {

if (hasNullByte && !__CFStringEncodingIsSupersetOfASCII(encoding)) {
// Non-8bit encodings cannot be safely read as c-strings because they may contain many null bytes
// This was documented as invalid previously, but now we validate that eagerly here to prevent creating truncated strings or strings that incorrectly assume 8bit representation
HALT_MSG("CFStringCreateWithCString can only be called with an 8-bit encoding");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it might be possible that people are calling this with some non-8bit encoding, but only ASCII data, and getting away with it today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For many encodings I suspect they couldn't get away with it (for example in UTF-16 ASCII text would have a lot of intermediate null bytes so depending on the endianness either string creation would fail (because there would be an odd number of bytes) or the string would be empty (because the first byte is a null byte)). So it's possible that someone could be relying on the failing (NULL return) or empty string behavior, but I think that's not super likely. It's possible that there's another encoding that we have that might work - I can double check - but I can't think of one off the top of my head that would behave correctly

}

CFMutableStringRef str = NULL;
CFVarWidthCharBuffer vBuf;
CFIndex size;
Expand Down Expand Up @@ -2232,6 +2238,10 @@ static inline const char * _CFStringGetCStringPtrInternal(CFStringRef str, CFStr

__CFAssertIsString(str);

// __CFStrHasNullByte(str) implies the string was created from a c-string
// All strings created from c-strings must be 8bit since c-strings are not possible with non-8bit encodings
// CFStringCreateWithCString validates that all strings created must have been created from bytes of an 8bit encoding, so __CFStrHasNullByte alone is sufficient here since it implies __CFStrIsEightBit
// For the non-null-terminated case, we must still validate that the underlying contents are an 8bit representation
if ((!requiresNullTermination && __CFStrIsEightBit(str)) || __CFStrHasNullByte(str)) {
// Note: this is called a lot, 27000 times to open a small xcode project with one file open.
// Of these uses about 1500 are for cStrings/utf8strings.
Expand Down