-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG, ENH: Improve infinity parsing for read_csv #13274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
84f91d4
to
e9d3ac9
Compare
@@ -252,3 +252,4 @@ Bug Fixes | |||
- Bug in ``groupby`` where ``apply`` returns different result depending on whether first result is ``None`` or not (:issue:`12824`) | |||
|
|||
- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`) | |||
- Bug in ``pd.lib.maybe_convert_numeric`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a more user friendly example (the user doesn't see this function as its internal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I'll rewrite it to refer to read_csv
and the Python engine.
ok looks reasonable. pls run parser asv's just to be sure (I doubt we have something that specifically has infs in it; you can add if you want). |
btw, not averse to signed inf (e.g. |
|
pls just run the current benchmarks - don't need a new one you don't need libpython on Linux |
I'm just running |
e9d3ac9
to
784f07a
Compare
784f07a
to
8455f53
Compare
@jreback : Added |
strcpy(inf_str, data); | ||
for ( ; *inf_ptr; ++inf_ptr) *inf_ptr = tolower(*inf_ptr); | ||
|
||
if (0 == strcmp(inf_str, "-inf")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aren't you using strcasecomp
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I just do include "headers/portable.h"
and then I can call it? That would indeed save me pain of this char
copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't do C very much :> but I think yes; though its included the same way as strcmp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, fair enough. Will give it a shot and see. Otherwise, I'll ask Google.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not sure if this will be more clear to make a if/else if for len 3 and another for len 4 (I 'c' lots of c code doing things like this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Can break up for clarity.
8455f53
to
9d60901
Compare
1) Python infinity parsing bug Initially an attempt to fix a Python parsing bug of mixed-case infinity strings, the bug was traced back via lib.maybe_convert_numeric to the 'floatify' method in pandas/src/parse_helper.h. In addition to correcting the bug and adding tests for it, this commit also moves the infinity-parsing test from CParser-only to common. 2) Interpret '+inf' as positive infinity This is consistent with the Python API, where float('+inf') is interpreted as positive infinity.
9d60901
to
f37b130
Compare
lgtm, ping on green. we don't have an associated issue right? |
Nope. Discovered it when I was refactoring |
@jreback : Travis giving the green light . Ready to merge if there is nothing else. |
nice PR @gfyoung only comment is that don't put Bug Fixes at the very end, causes conflicts, stick them somewhat randomly in the spaces within the current list (that's why I leave the space) |
@jreback : Ah, fair point. Bad habit on my part. Thanks for the reminder! |
Bug was traced back via
lib.maybe_convert_numeric
to thefloatify
function inpandas/src/parse_helper.h
. In addition to correcting the bug and adding tests for it, this PR also moves thetest_inf_parsing
test fromc_parser_only.py
tocommon.py
in thepandas/io/tests/parser
dir.+inf
as positive infinity for both enginesfloat('+inf')
in Python is interpreted as positive infinity, so we should allow it too in parsing.