BUG: Patch handling of keep_default_na=False #19260

gfyoung · 2018-01-16T00:22:46Z

Patches very buggy behavior of keep_default_na=False whenever na_values is a dict:

* Respect keep_default_na for a column that doesn't exist in the na_values dictionary.
* Don't crash / break when a value in the na_values dictionary is a scalar.

In addition, clarifies the documentation on how the keep_default_na parameter is handled with respect to na_filter and na_values.
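For readers who want to see the two fixed cases concretely, here is a minimal sketch using `pandas.read_csv` on in-memory data. The variable names are made up for illustration, and the sketch assumes a pandas version that includes this patch (0.23.0 or later).

```python
from io import StringIO

import pandas as pd

# Column "a" has no entry in the na_values dict, so with
# keep_default_na=False its empty field should stay an empty string
# instead of being converted to NaN.
df_missing_key = pd.read_csv(
    StringIO("a,b\n,2"),
    na_values={"b": ["2"]},
    keep_default_na=False,
)

# A scalar (rather than list-like) value in the dict should be accepted
# and treated as a single NaN marker for column "b".
df_scalar = pd.read_csv(
    StringIO("a,b\n1,2"),
    na_values={"b": 2},
    keep_default_na=False,
)

print(df_missing_key)
print(df_scalar)
```

In both frames only column "b" should come back as NaN; column "a" keeps whatever was in the file.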

Closes #19227.

cc @neilser

jreback · 2018-01-16T00:31:00Z

doc/source/io.rst

+    the default NaN values are used for parsing.
+  * If `keep_default_na` is False, and `na_values` are not specified, only
+    the NaN values specified `na_values` are used for parsing.
+  * If `keep_default_na` is False, and `na_values` are not specified, no


is this the same as the 3rd bullet?

No, that first line is just a copy-paste fail.
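As a rough illustration of the rules these doc bullets are meant to convey, the sketch below computes which strings end up being treated as NaN for each combination of arguments. It is plain Python, not pandas code: `effective_na_strings` and `DEFAULT_NA` are hypothetical names, and `DEFAULT_NA` is only a small stand-in for the much longer default list that pandas uses.

```python
# Small stand-in for pandas' default NaN strings; the real list is longer.
DEFAULT_NA = {"", "NA", "NaN", "null"}


def effective_na_strings(na_values=None, keep_default_na=True, na_filter=True):
    """Return the set of strings that would be parsed as NaN."""
    # With na_filter=False, no NaN detection happens at all.
    if not na_filter:
        return set()

    custom = set(na_values) if na_values is not None else set()

    # keep_default_na=True: the defaults are used, plus any custom markers.
    if keep_default_na:
        return DEFAULT_NA | custom

    # keep_default_na=False: only the custom markers (possibly none) remain.
    return custom


print(effective_na_strings())
print(effective_na_strings(["missing"]))
print(effective_na_strings(["missing"], keep_default_na=False))
print(effective_na_strings(keep_default_na=False))
```

The last call is the case the fourth bullet describes: with the defaults dropped and nothing specified, no string is parsed as NaN.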

jreback · 2018-01-16T00:31:20Z

doc/source/io.rst

-  If na_values are specified and keep_default_na is ``False`` the default NaN
-  values are overridden, otherwise they're appended to.
+  Whether or not to include the default NaN values when parsing the data.
+  Provided that `na_filter` is True, depending on whether `na_values` is


the 2nd sentence is slightly confusing here

jreback · 2018-01-16T00:32:29Z

pandas/tests/io/parser/na_values.py

@@ -224,6 +224,47 @@ def test_na_values_keep_default(self):
                                  'seven']})
        tm.assert_frame_equal(xp.reindex(columns=df.columns), df)

+        # see gh-19227: keep_default_na=False should be enforced


can you split this test up, getting a bit long

jreback · 2018-01-16T00:44:57Z

doc/source/io.rst

-  If na_values are specified and keep_default_na is ``False`` the default NaN
-  values are overridden, otherwise they're appended to.
+  Whether or not to include the default NaN values when parsing the data.
+  Depending on whether `na_values` is passed in, the behavior is as follows:


jreback · 2018-01-16T00:45:44Z

pandas/io/parsers.py

@@ -3098,15 +3110,17 @@ def _clean_na_values(na_values, keep_default_na=True):
        na_fvalues = set()
    elif isinstance(na_values, dict):
        na_values = na_values.copy()  # Prevent aliasing.
-        if keep_default_na:
-            for k, v in compat.iteritems(na_values):


might be nice to have a comment or 2 to explain the logic here for future readers

Makes sense. Done.
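For future readers, the logic in the dict branch can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in `pandas/io/parsers.py`: `clean_dict_na_values` and `DEFAULT_NA` are hypothetical names, the scalar check is a simplification, and the float conversion that the real function also performs is left out.

```python
# Tiny stand-in for pandas' default NaN strings; the real set is larger.
DEFAULT_NA = frozenset({"", "NA", "NaN", "null"})


def clean_dict_na_values(na_values, keep_default_na=True):
    """Return a new dict mapping each column to a set of NaN markers."""
    cleaned = {}
    for column, markers in na_values.items():
        # Wrap scalars (including plain strings) so they behave like lists.
        if isinstance(markers, (str, bytes)) or not hasattr(markers, "__iter__"):
            markers = [markers]
        markers = set(markers)

        # The defaults are appended only when the caller asked to keep them.
        if keep_default_na:
            markers |= DEFAULT_NA

        cleaned[column] = markers
    return cleaned


print(clean_dict_na_values({"b": 2}, keep_default_na=False))
print(clean_dict_na_values({"b": ["x"]}, keep_default_na=True))
```

Columns that are absent from the dict get no entry here; what the parser does for them depends on `keep_default_na`, which is the other half of this fix.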

codecov · 2018-01-16T08:11:33Z

Codecov Report

Merging #19260 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19260      +/-   ##
==========================================
+ Coverage   91.56%   91.56%   +<.01%     
==========================================
  Files         148      148              
  Lines       48874    48878       +4     
==========================================
+ Hits        44751    44755       +4     
  Misses       4123     4123

Flag	Coverage Δ
#multiple	`89.94% <100%> (ø)`	⬆️
#single	`41.69% <21.42%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.49% <100%> (+0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 04a6815...63faf8d.

jreback · 2018-01-16T11:22:19Z

pandas/io/parsers.py

-            (k, _floatify_na_values(v)) for k, v in na_values.items()  # noqa
-        )
+
+            na_values[k] = v


this is an anti-pattern to modify the dict that you are iterating over. can you create a new one here?

@jreback : I realize that, but I think that's why up above, someone wrote na_values = na_values.copy(). The reference to the original input is dropped and a "new" dictionary is created. Do you just want me to change the assigned variable name?

yeah you should create a new empty dict and then assign to it (it doesn't have to be a copy). iterating and modifying is a no-no.

I've decided to create a copy because at the bottom, we return na_values and na_fvalues regardless of the logic branch. However, I'll iterate over the old_na_values instead.
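The pattern being discussed can be illustrated in isolation. The sketch below is generic Python with made-up function names, not the pandas code. It shows why mutating a dict mid-iteration is fragile (inserting keys raises `RuntimeError`; merely re-assigning existing keys does not raise in CPython, but it is easy to break later) and what the "iterate over the old one, write into a new one" alternative looks like.

```python
def add_suffix_keys_unsafely(mapping):
    # Inserting keys while iterating changes the dict's size mid-loop,
    # which Python detects and rejects with a RuntimeError.
    for key in mapping:
        mapping[key + "_copy"] = mapping[key]


def listify_values(old_mapping):
    # Iterate over the input and write only into a fresh dict, so the
    # object being looped over is never touched.
    new_mapping = {}
    for key, value in old_mapping.items():
        new_mapping[key] = value if isinstance(value, list) else [value]
    return new_mapping


try:
    add_suffix_keys_unsafely({"a": 1})
    raised = False
except RuntimeError:
    raised = True

original = {"a": 1, "b": [2, 3]}
converted = listify_values(original)

print(raised)
print(converted)
print(original)
```

Because `listify_values` never writes to its argument, the caller's dict is also left untouched, which covers the aliasing concern that the original `.copy()` call was guarding against.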

Patches very buggy behavior of keep_default_na=False whenever na_values is a dict

* Respect keep_default_na for column that doesn't exist in na_values dictionary
* Don't crash / break when na_value is a scalar in the na_values dictionary.

In addition, clarifies documentation on behavior of keep_default_na with respect to na_filter and na_values.

Closes gh-19227.

gfyoung · 2018-01-17T19:28:56Z

@jreback : The OSX builds on Travis seem to have stalled completely (all of the other tests have been done for a LONG time now).

jreback · 2018-01-18T00:23:20Z

thanks @gfyoung

gfyoung added Bug IO CSV read_csv, to_csv labels Jan 16, 2018

gfyoung added this to the 0.23.0 milestone Jan 16, 2018

jreback reviewed Jan 16, 2018


gfyoung force-pushed the na-values-csv branch from 8ef3ae5 to 9cc889c Compare January 16, 2018 00:42

jreback reviewed Jan 16, 2018


gfyoung force-pushed the na-values-csv branch from 9cc889c to dd5880f Compare January 16, 2018 01:05

jreback requested changes Jan 16, 2018


gfyoung force-pushed the na-values-csv branch from dd5880f to ecbe462 Compare January 17, 2018 04:41

gfyoung force-pushed the na-values-csv branch from ecbe462 to 63faf8d Compare January 17, 2018 05:22

jreback approved these changes Jan 18, 2018


jreback merged commit c9532f0 into pandas-dev:master Jan 18, 2018

gfyoung deleted the na-values-csv branch January 18, 2018 04:21
