-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Patch handling of keep_default_na=False #19260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/source/io.rst
Outdated
the default NaN values are used for parsing. | ||
* If `keep_default_na` is False, and `na_values` are not specified, only | ||
the NaN values specified `na_values` are used for parsing. | ||
* If `keep_default_na` is False, and `na_values` are not specified, no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the same as the 3rd bullet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that first line is a just copy-paste fail.
doc/source/io.rst
Outdated
If na_values are specified and keep_default_na is ``False`` the default NaN | ||
values are overridden, otherwise they're appended to. | ||
Whether or not to include the default NaN values when parsing the data. | ||
Provided that `na_filter` is True, depending on whether `na_values` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 2nd sentence is slightly confusing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
pandas/tests/io/parser/na_values.py
Outdated
@@ -224,6 +224,47 @@ def test_na_values_keep_default(self): | |||
'seven']}) | |||
tm.assert_frame_equal(xp.reindex(columns=df.columns), df) | |||
|
|||
# see gh-19227: keep_default_na=False should be enforced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split this test up, getting a bit long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
8ef3ae5
to
9cc889c
Compare
If na_values are specified and keep_default_na is ``False`` the default NaN | ||
values are overridden, otherwise they're appended to. | ||
Whether or not to include the default NaN values when parsing the data. | ||
Depending on whether `na_values` is passed in, the behavior is as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
pandas/io/parsers.py
Outdated
@@ -3098,15 +3110,17 @@ def _clean_na_values(na_values, keep_default_na=True): | |||
na_fvalues = set() | |||
elif isinstance(na_values, dict): | |||
na_values = na_values.copy() # Prevent aliasing. | |||
if keep_default_na: | |||
for k, v in compat.iteritems(na_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to have a comment or 2 to explain the logic here for future readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
9cc889c
to
dd5880f
Compare
Codecov Report
@@ Coverage Diff @@
## master #19260 +/- ##
==========================================
+ Coverage 91.56% 91.56% +<.01%
==========================================
Files 148 148
Lines 48874 48878 +4
==========================================
+ Hits 44751 44755 +4
Misses 4123 4123
Continue to review full report at Codecov.
|
(k, _floatify_na_values(v)) for k, v in na_values.items() # noqa | ||
) | ||
|
||
na_values[k] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an anti-pattern to modify the dict that you are iterating. can you create you create a new one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback : I realize that, but I think that's why up above, someone wrote na_values = na_values.copy()
. The reference to the original input is destroyed and created a "new" dictionary. Do you just want me to change the assigned variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you should create a new empty dict and then assign to it (it doesn't have to be a copy). iterating and modifying is a no-no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to create a copy because at the bottom, we return na_values
and na_fvalues
regardless of the logic branch. However, I'll iterate over the old_na_values
instead.
dd5880f
to
ecbe462
Compare
Patches very buggy behavior of keep_default_na=False whenever na_values is a dict * Respect keep_default_na for column that doesn't exist in na_values dictionary * Don't crash / break when na_value is a scalar in the na_values dictionary. In addition, clarifies documentation on behavior of keep_default_na with respect to na_filter and na_values. Closes pandas-devgh-19227.
ecbe462
to
63faf8d
Compare
@jreback : The OSX builds on Travis seemed to have stalled completely (all of the other tests have been done for a LONG time now). |
thanks @gfyoung |
Patches very buggy behavior of
keep_default_na=False
wheneverna_values
is a dictkeep_default_na
for column that doesn't exist inna_values
dictionaryna_value
is a scalar in thena_values
dictionary.In addition, clarifies documentation on the handling of the keep
keep_default_na
parameter with respect tona_filter
andna_values
.Closes #19227.
cc @neilser