
BUG: Check for NaN after data conversion to numeric #13314


gfyoung
Member

gfyoung commented May 28, 2016

In an attempt to squash a Python parser bug in which weirdly formed floats weren't being checked for NaN, the problem was traced back to the maybe_convert_numeric function in pandas/src/inference.pyx. Added tests for the bug in test_lib.py and adjusted the original NaN tests in na_values.py to test all of the engines.
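The failure mode can be illustrated with a minimal pure-Python sketch. It assumes, hypothetically, that NA sentinels are held both as strings and as floats (`na_strings`, `na_floats`); `convert_with_na_check` is an illustrative name and is not the actual Cython code in maybe_convert_numeric. A value such as "-999.000" never matches the string set, but once converted it equals -999.0, so only a check on the converted value catches it.

```python
import math


def convert_with_na_check(values, na_strings, na_floats):
    """Hypothetical sketch: convert strings to floats, mapping NA sentinels to NaN.

    Not the pandas implementation; it only illustrates checking for NA
    both before and after numeric conversion.
    """
    result = []
    for val in values:
        # First check: exact string match against the NA sentinels.
        if val in na_strings:
            result.append(math.nan)
            continue
        fval = float(val)
        # Second check: compare the *converted* value, so oddly formatted
        # floats such as "-999.000" or "-9.99e2" are still recognized as NA.
        if fval in na_floats:
            result.append(math.nan)
        else:
            result.append(fval)
    return result


out = convert_with_na_check(["1.5", "-999", "-999.000", "-9.99e2"],
                            na_strings={"-999"}, na_floats={-999.0})
print(out)  # [1.5, nan, nan, nan]
```

Without the second check, "-999.000" and "-9.99e2" would come through as -999.0 instead of NaN.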

gfyoung force-pushed the nan-check-post-numeric-conversion branch from 5a15c7f to 9e17995 May 28, 2016 22:47

codecov-io commented May 29, 2016

Current coverage is 84.22%

Merging #13314 into master will not change coverage

@@             master     #13314   diff @@
==========================================
  Files           138        138          
  Lines         50667      50667          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42672      42672          
  Misses         7995       7995          
  Partials          0          0          

Powered by Codecov. Last updated by 70be8a9...9e17995

@@ -291,6 +291,7 @@ Bug Fixes



- Bug in ``pd.read_csv()`` with ``engine='python'`` in which NaN values weren't being detected after data was converted to numeric values (:issue:`13314`)
Contributor


double backticks around NaN

Member Author


Done.

jreback added the Missing-data, Dtype Conversions, and IO CSV labels May 29, 2016
jreback added this to the 0.18.2 milestone May 29, 2016
Contributor

jreback commented May 29, 2016

I suspect to_numeric (tested in tools/tests/test_util.py) might also be affected; can you add a test there as well?

Can you also run the related perf tests, just to see if there is any effect?

Member Author

gfyoung commented May 29, 2016

No need, I think: to_numeric doesn't pass any NaN values to the function, so this branch has no effect.
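The reasoning in this comment can be sketched in hypothetical terms: if the caller supplies no NA sentinels, as this comment says is the case for to_numeric, a post-conversion NA check can never match, so the result is identical to a plain float conversion. `post_conversion_na_check` is an illustrative name, not a pandas function.

```python
import math


def post_conversion_na_check(values, na_floats):
    """Hypothetical sketch: convert to float, then map any value in na_floats to NaN."""
    out = []
    for val in values:
        fval = float(val)
        out.append(math.nan if fval in na_floats else fval)
    return out


values = ["1", "-999.000", "2.5e1"]
# With an empty NA set (the to_numeric case described above), the extra
# branch never fires and the output equals plain float conversion.
plain = [float(v) for v in values]
checked = post_conversion_na_check(values, na_floats=set())
print(checked == plain)  # True
```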

Member Author

gfyoung commented May 29, 2016

Ran the perf tests on Windows and didn't see any noticeable changes.

gfyoung force-pushed the nan-check-post-numeric-conversion branch from 9e17995 to 07f0538 May 29, 2016 22:25
Member Author

gfyoung commented May 29, 2016

@jreback: Made the requested changes, and Travis is giving the green light. Ready to merge if there are no other concerns.

jreback closed this in 721be62 May 30, 2016
gfyoung deleted the nan-check-post-numeric-conversion branch May 30, 2016 14:10