Commit ff652a5

gfyoung authored and jreback committed
BUG: Patch handling no NA values in TextFileReader
When cleaning `na_values` during initialization of `TextFileReader`, we return a `list` whenever we specify that `na_values` should be empty. However, the rest of the code expects a `set`.

Closes #15835.

Author: gfyoung <[email protected]>

Closes #15881 from gfyoung/keep-default-na-excel and squashes the following commits:

0bb6f64 [gfyoung] BUG: Patch handling no NA values in TextFileReader
1 parent cd51bdd commit ff652a5
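
The mismatch described in the commit message can be illustrated without pandas: a `list` does not support the set operators that set-based code relies on. The snippet below is a minimal sketch. `merge_na_values` is a hypothetical stand-in for downstream parser code and is not a function from pandas.

```python
def merge_na_values(na_values, extra):
    """Hypothetical consumer that assumes `na_values` is a set."""
    # `|` is set union: it works for set | set but raises TypeError for list | set.
    return na_values | extra


extra = {"NULL"}

# With a set (the patched return value), the union succeeds.
merged = merge_na_values(set(), extra)

# With a list (the pre-patch return value), the same call fails.
try:
    merge_na_values([], extra)
    list_failed = False
except TypeError:
    list_failed = True
```

The failure only appears on code paths that actually apply a set operation, which is consistent with the bug needing a specific combination of options to surface.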

3 files changed, +12 -2 lines changed

doc/source/whatsnew/v0.20.0.txt

+1
@@ -995,6 +995,7 @@ I/O
 - Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
 - Bug in ``pd.read_csv()`` with ``parse_dates`` when multiline headers are specified (:issue:`15376`)
 - Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed (:issue:`15140`)
+- Bug in ``pd.read_csv()`` when an index was specified and no values were specified as null values (:issue:`15835`)
 - Added checks in ``pd.read_csv()`` ensuring that values for ``nrows`` and ``chunksize`` are valid (:issue:`15767`)
 - Bug in ``pd.tools.hashing.hash_pandas_object()`` in which hashing of categoricals depended on the ordering of categories, instead of just their values. (:issue:`15143`)
 - Bug in ``.to_json()`` where ``lines=True`` and contents (keys or values) contain escaped characters (:issue:`15096`)

pandas/io/parsers.py

+1-1
@@ -2890,7 +2890,7 @@ def _clean_na_values(na_values, keep_default_na=True):
         if keep_default_na:
             na_values = _NA_VALUES
         else:
-            na_values = []
+            na_values = set()
         na_fvalues = set()
     elif isinstance(na_values, dict):
         na_values = na_values.copy()  # Prevent aliasing.
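
For readers who want to see the patched branch in isolation, here is a simplified sketch. It assumes the enclosing condition is `na_values is None`, which the hunk does not show. `clean_na_values_sketch` and `DEFAULT_NA_VALUES` are hypothetical names, and the default set is a small stand-in rather than the real `_NA_VALUES`.

```python
# Stand-in for the parser's default NA strings; the real pandas set is larger.
DEFAULT_NA_VALUES = {"", "NA", "NaN", "NULL"}


def clean_na_values_sketch(na_values=None, keep_default_na=True):
    """Simplified, hypothetical version of the `na_values is None` branch."""
    if na_values is None:
        if keep_default_na:
            na_values = set(DEFAULT_NA_VALUES)
        else:
            na_values = set()  # was `[]` before the patch
        na_fvalues = set()
        return na_values, na_fvalues
    # Other input types (dict, list, scalar) are out of scope for this sketch.
    raise NotImplementedError("only the na_values=None branch is sketched")


values, fvalues = clean_na_values_sketch(keep_default_na=False)
```

Both paths through the branch now yield a `set`, so callers do not need to special-case the empty result.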

pandas/tests/io/parser/na_values.py

+10-1
@@ -11,7 +11,7 @@
 import pandas.io.parsers as parsers
 import pandas.util.testing as tm
 
-from pandas import DataFrame, MultiIndex
+from pandas import DataFrame, Index, MultiIndex
 from pandas.compat import StringIO, range
 
 
@@ -303,3 +303,12 @@ def test_na_values_uint64(self):
         expected = DataFrame([[str(2**63), 1], ['', 2]])
         out = self.read_csv(StringIO(data), header=None)
         tm.assert_frame_equal(out, expected)
+
+    def test_empty_na_values_no_default_with_index(self):
+        # see gh-15835
+        data = "a,1\nb,2"
+
+        expected = DataFrame({'1': [2]}, index=Index(["b"], name="a"))
+        out = self.read_csv(StringIO(data), keep_default_na=False, index_col=0)
+
+        tm.assert_frame_equal(out, expected)
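
The new test goes through the suite's `self.read_csv` harness. A rough standalone equivalent, assuming a pandas version that includes this fix and using the public `pd.read_csv` with `io.StringIO` in place of `pandas.compat.StringIO`, might look like this:

```python
from io import StringIO

import pandas as pd

data = "a,1\nb,2"

# The first row is treated as the header, so "a" names the index
# and "1" is the only remaining column.
out = pd.read_csv(StringIO(data), keep_default_na=False, index_col=0)
```

This uses the same combination of options as the test: `keep_default_na=False` with no explicit `na_values`, together with `index_col`.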
