
Commit 10d3615

Aliebc and mroeschke authored
BUG: Add type check for encoding_errors in pd.read_csv (pandas-dev#59075)
* BUG: Add type check for encoding_errors in pd.read_csv
* BUG: Add type check for encoding_errors in pd.read_csv
* pre-commit
* Update pandas/io/parsers/readers.py
  Co-authored-by: Matthew Roeschke <[email protected]>
* Unit test
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update pandas/io/parsers/readers.py
  Co-authored-by: Matthew Roeschke <[email protected]>
* update the unit test for `encoding_errors`
* Update doc/source/whatsnew/v3.0.0.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* add a unit test
* update unit test
* update unit test
* update unit test
* update unit test
* Update pandas/tests/io/test_common.py
  Co-authored-by: Matthew Roeschke <[email protected]>
* Update pandas/tests/io/test_common.py
  Co-authored-by: Matthew Roeschke <[email protected]>
* update unit test
* update unit test

---------
Co-authored-by: Matthew Roeschke <[email protected]>
1 parent 42082a8 commit 10d3615

3 files changed, +21 -1 lines changed


doc/source/whatsnew/v3.0.0.rst

+1
@@ -558,6 +558,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
+- Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
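
For context on why ``encoding_errors`` is expected to be a string: it names a codec error handler, the same kind of value accepted by the `errors` argument of Python's `bytes.decode`. The sketch below uses only the standard library (no pandas) to show what three common handler names do; the sample bytes are an illustrative assumption, not taken from the commit.

```python
# Standard-library illustration only; this does not call pandas.
raw = b"caf\xe9"  # "café" encoded as Latin-1, which is not valid UTF-8

# "replace" substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "strict" raises UnicodeDecodeError on the undecodable byte.
try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

A value such as `0` or `None` names no handler at all, which is the case the changelog entry above refers to.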

pandas/io/parsers/readers.py

+8
@@ -674,6 +674,14 @@ def _read(
     # Extract some of the arguments (pass chunksize on).
     iterator = kwds.get("iterator", False)
     chunksize = kwds.get("chunksize", None)
+
+    # Check type of encoding_errors
+    errors = kwds.get("encoding_errors", "strict")
+    if not isinstance(errors, str):
+        raise ValueError(
+            f"encoding_errors must be a string, got {type(errors).__name__}"
+        )
+
     if kwds.get("engine") == "pyarrow":
         if iterator:
             raise ValueError(
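
The check added in this hunk can be tried in isolation. The following is a minimal sketch, assuming a standalone helper named `check_encoding_errors` (a hypothetical name, not a pandas function) that receives the same kind of `kwds` dict as `_read`. Note that the `"strict"` default only applies when the key is absent; an explicit `None` is returned by `kwds.get` unchanged and is therefore rejected.

```python
def check_encoding_errors(kwds: dict) -> str:
    """Hypothetical helper mirroring the added check; not part of pandas."""
    # The default is used only when the key is missing from kwds.
    errors = kwds.get("encoding_errors", "strict")
    if not isinstance(errors, str):
        raise ValueError(
            f"encoding_errors must be a string, got {type(errors).__name__}"
        )
    return errors


def error_message(kwds: dict):
    """Return the ValueError message for kwds, or None if the check passes."""
    try:
        check_encoding_errors(kwds)
    except ValueError as exc:
        return str(exc)
    return None
```

With this sketch, `check_encoding_errors({})` returns `"strict"`, while `{"encoding_errors": None}` and `{"encoding_errors": 0}` both raise `ValueError`.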

pandas/tests/io/test_common.py

+12 -1
@@ -555,7 +555,7 @@ def test_explicit_encoding(io_class, mode, msg):
             expected.to_csv(buffer, mode=f"w{mode}")
 
 
-@pytest.mark.parametrize("encoding_errors", [None, "strict", "replace"])
+@pytest.mark.parametrize("encoding_errors", ["strict", "replace"])
 @pytest.mark.parametrize("format", ["csv", "json"])
 def test_encoding_errors(encoding_errors, format):
     # GH39450
@@ -590,6 +590,17 @@ def test_encoding_errors(encoding_errors, format):
         tm.assert_frame_equal(df, expected)
 
 
+@pytest.mark.parametrize("encoding_errors", [0, None])
+def test_encoding_errors_badtype(encoding_errors):
+    # GH 59075
+    content = StringIO("A,B\n1,2\n3,4\n")
+    reader = partial(pd.read_csv, encoding_errors=encoding_errors)
+    expected_error = "encoding_errors must be a string, got "
+    expected_error += f"{type(encoding_errors).__name__}"
+    with pytest.raises(ValueError, match=expected_error):
+        reader(content)
+
+
 def test_bad_encdoing_errors():
     # GH 39777
     with tm.ensure_clean() as path:
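
The mechanics of the new test (binding the bad argument with `functools.partial`, building the expected message from the type name, and matching it as a regular expression the way `pytest.raises(..., match=...)` does via `re.search`) can be sketched without pandas or pytest. `fake_read_csv` below is a hypothetical stand-in that only reproduces the type check; it is not the real `pd.read_csv`.

```python
import re
from functools import partial
from io import StringIO


def fake_read_csv(buffer, encoding_errors="strict"):
    # Hypothetical stand-in for pd.read_csv: only the type check is modeled.
    if not isinstance(encoding_errors, str):
        raise ValueError(
            f"encoding_errors must be a string, got {type(encoding_errors).__name__}"
        )
    return buffer.read()


def raised_message(bad_value):
    """Call the stand-in with bad_value bound and return the error text, if any."""
    reader = partial(fake_read_csv, encoding_errors=bad_value)
    try:
        reader(StringIO("A,B\n1,2\n3,4\n"))
    except ValueError as exc:
        return str(exc)
    return None


for bad_value in (0, None):
    expected_error = "encoding_errors must be a string, got "
    expected_error += type(bad_value).__name__
    # re.search mirrors how pytest.raises applies its match= pattern.
    assert re.search(expected_error, raised_message(bad_value))
```

Since `None` is now rejected by the type check, it moves from the parametrization of `test_encoding_errors` to the bad-type test, as the first hunk of this file shows.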
