Commit 1b76440

BUG: Add extra check for failing UTF-8 conversion (#32548)
1 parent 5e27d0a commit 1b76440

4 files changed: 12 additions, 0 deletions

doc/source/whatsnew/v1.1.0.rst (+1)

@@ -305,6 +305,7 @@ I/O
 timestamps with ``version="2.0"`` (:issue:`31652`).
 - Bug in :meth:`read_csv` was raising `TypeError` when `sep=None` was used in combination with `comment` keyword (:issue:`31396`)
 - Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`)
+- Bug in :meth:`read_excel` where a UTF-8 string with a high surrogate would cause a segmentation violation (:issue:`23809`)
 
 
 Plotting
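
Code points in the surrogate range U+D800 to U+DFFF have no UTF-8 encoding of their own, so Python's strict UTF-8 encoder rejects any `str` that contains one. The standard-library sketch below checks the `"\udc88"` value used in the new test; `surrogate_kind` and `try_utf8` are illustrative helper names, not pandas functions. Note that U+DC88 falls in the U+DC00 to U+DFFF block, which Unicode designates as low (trailing) surrogates.

```python
# Shows that a str containing a surrogate code point cannot be strictly
# encoded to UTF-8 (standard library only; helper names are illustrative).

SURROGATE = "\udc88"  # the value the new test expects in the DataFrame


def surrogate_kind(ch: str) -> str:
    """Classify a single code point by Unicode surrogate range."""
    cp = ord(ch)
    if 0xD800 <= cp <= 0xDBFF:
        return "high"
    if 0xDC00 <= cp <= 0xDFFF:
        return "low"
    return "not a surrogate"


def try_utf8(s: str):
    """Return the UTF-8 bytes of s, or None if strict encoding fails."""
    try:
        return s.encode("utf-8")  # "strict" is the default error handler
    except UnicodeEncodeError as exc:
        print(f"cannot encode {s!r}: {exc.reason}")
        return None


if __name__ == "__main__":
    print(surrogate_kind(SURROGATE))  # low
    print(try_utf8("abc"))            # b'abc'
    print(try_utf8(SURROGATE))        # prints the error reason, then None
```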

pandas/_libs/src/parse_helper.h (+3)

@@ -34,6 +34,9 @@ int floatify(PyObject *str, double *result, int *maybe_int) {
         data = PyBytes_AS_STRING(str);
     } else if (PyUnicode_Check(str)) {
         tmp = PyUnicode_AsUTF8String(str);
+        if (tmp == NULL) {
+            return -1;
+        }
         data = PyBytes_AS_STRING(tmp);
     } else {
         PyErr_SetString(PyExc_TypeError, "Invalid object type");
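
`PyUnicode_AsUTF8String` encodes with the "strict" error handler and returns `NULL`, with an exception set, when the string cannot be encoded. Before this change that result went straight into `PyBytes_AS_STRING`, which is consistent with the segmentation violation reported in the issue. The Python sketch below mirrors only the control flow of the patched branch: `floatify_sketch` is a hypothetical stand-in, not pandas' `floatify`, and it signals failure by letting the exception propagate instead of returning `-1`.

```python
# Hypothetical Python analogue of the patched branch in floatify():
# accept bytes or str, and stop as soon as the UTF-8 conversion fails
# instead of continuing with an invalid buffer.


def floatify_sketch(obj) -> float:
    if isinstance(obj, bytes):
        data = obj
    elif isinstance(obj, str):
        # Like PyUnicode_AsUTF8String, this strict encode fails for
        # surrogate code points. The C patch checks `tmp == NULL` and
        # returns -1; here the UnicodeEncodeError propagates to the caller.
        data = obj.encode("utf-8")
    else:
        raise TypeError("Invalid object type")
    # Parse the bytes as a floating-point literal.
    return float(data.decode("utf-8"))
```

Called as `floatify_sketch("1.5")` it returns `1.5`, while `floatify_sketch("\udc88")` raises `UnicodeEncodeError` before any parsing is attempted.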

Binary file not shown (10.1 KB).

pandas/tests/io/excel/test_readers.py (+8)

@@ -1044,3 +1044,11 @@ def test_excel_read_binary(self, engine, read_ext):
 
         actual = pd.read_excel(data, engine=engine)
         tm.assert_frame_equal(expected, actual)
+
+    def test_excel_high_surrogate(self, engine):
+        # GH 23809
+        expected = pd.DataFrame(["\udc88"], columns=["Column1"])
+
+        # should not produce a segmentation violation
+        actual = pd.read_excel("high_surrogate.xlsx")
+        tm.assert_frame_equal(expected, actual)
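
The commit does not say how the test spreadsheet came to contain U+DC88. One standard-library route that yields exactly this code point is Python's `surrogateescape` error handler (PEP 383), which maps each undecodable byte 0x80 to 0xFF onto U+DC80 to U+DCFF. The sketch below only illustrates where such strings can come from; it is not a claim about how `high_surrogate.xlsx` was produced.

```python
# Where lone surrogates such as U+DC88 can come from: the
# "surrogateescape" error handler carries undecodable bytes through str
# by mapping byte 0xNN to code point U+DC00 + 0xNN.

raw = b"\x88"  # a lone UTF-8 continuation byte, so not valid UTF-8

text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # '\udc88'

# The same handler reverses the mapping on encode...
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True

# ...but a strict encode, which is what PyUnicode_AsUTF8String performs,
# rejects the surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 encode failed:", exc.reason)
```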
