
Commit 64b88e8

troels authored and gfyoung committed
BUG: read_table and read_csv crash (#22750)
A missing null-pointer check made read_table and read_csv prone to crash on badly encoded text. Add null-pointer check. Closes gh-22748.
1 parent 2ab57b2 commit 64b88e8

3 files changed, with 15 additions and 1 deletion
doc/source/whatsnew/v0.24.0.txt

+1
@@ -756,6 +756,7 @@ I/O
 
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
+- :func:`read_csv()` and :func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)
 - :func:`read_csv()` will correctly parse timezone-aware datetimes (:issue:`22256`)
 - :func:`read_sas()` will parse numbers in sas7bdat-files that have width less than 8 bytes correctly. (:issue:`21616`)
 - :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)

pandas/_libs/src/parser/io.c

+5 -1
@@ -150,7 +150,11 @@ void *buffer_rd_bytes(void *source, size_t nbytes, size_t *bytes_read,
         return NULL;
     } else if (!PyBytes_Check(result)) {
         tmp = PyUnicode_AsUTF8String(result);
-        Py_XDECREF(result);
+        Py_DECREF(result);
+        if (tmp == NULL) {
+            PyGILState_Release(state);
+            return NULL;
+        }
         result = tmp;
     }
 
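
The control flow of the patched branch can be modelled in Python. This is a rough sketch, not pandas code: `read_chunk_as_bytes` is a hypothetical name, and the Python exception stands in for the C convention in which `PyUnicode_AsUTF8String` returns NULL with an exception set. The point is that the conversion step can fail, so its result has to be checked before it is used.

```python
import io


def read_chunk_as_bytes(source, nbytes):
    """Hypothetical Python model of the patched branch in buffer_rd_bytes."""
    result = source.read(nbytes)
    if not isinstance(result, bytes):
        # Strict UTF-8 encoding, as PyUnicode_AsUTF8String does. It fails on
        # lone surrogates. In C that failure is a NULL return that must be
        # checked; here it surfaces as UnicodeEncodeError.
        result = result.encode("utf-8")
    return result


# A well-formed text source is converted to bytes without trouble.
good = io.StringIO("a,b\n1,2\n")
print(read_chunk_as_bytes(good, 1024))

# A text source carrying a surrogate-escaped byte cannot be encoded.
bad = io.TextIOWrapper(
    io.BytesIO(b"\xb0"), encoding="ascii", errors="surrogateescape"
)
try:
    read_chunk_as_bytes(bad, 1024)
except UnicodeError as exc:
    print("conversion failed:", type(exc).__name__)
```

Before the patch, the C code had no equivalent of the `except` path: the NULL result was assigned to `result` and used as if it were a valid object.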

pandas/tests/io/parser/common.py

+9
@@ -9,6 +9,7 @@
 import sys
 from datetime import datetime
 from collections import OrderedDict
+from io import TextIOWrapper
 
 import pytest
 import numpy as np
@@ -1609,3 +1610,11 @@ def test_skip_bad_lines(self):
             val = sys.stderr.getvalue()
             assert 'Skipping line 3' in val
             assert 'Skipping line 5' in val
+
+    def test_buffer_rd_bytes_bad_unicode(self):
+        # Regression test for #22748
+        t = BytesIO(b"\xB0")
+        if PY3:
+            t = TextIOWrapper(t, encoding='ascii', errors='surrogateescape')
+        with pytest.raises(UnicodeError):
+            pd.read_csv(t, encoding='UTF-8')
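
The input built by the regression test can be inspected with the standard library alone. The sketch below covers only the Python 3 path of the test and does not call pandas; the variable names are illustrative. It shows that the `surrogateescape` handler turns the undecodable byte `0xB0` into a lone surrogate, which round-trips through the same handler but is rejected by a strict UTF-8 encode.

```python
from io import BytesIO, TextIOWrapper

# Same construction as in the test: one byte that is not valid ASCII.
raw = BytesIO(b"\xB0")
text = TextIOWrapper(raw, encoding="ascii", errors="surrogateescape")
chunk = text.read()

# The byte is carried through as the lone surrogate U+DCB0.
assert chunk == "\udcb0"

# The same error handler recovers the original byte.
assert chunk.encode("ascii", errors="surrogateescape") == b"\xb0"

# A strict UTF-8 encode refuses the surrogate.
try:
    chunk.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
assert strict_encode_failed

# UnicodeEncodeError is a subclass of UnicodeError, so a check written
# against UnicodeError also accepts it.
assert issubclass(UnicodeEncodeError, UnicodeError)
```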
