BUG: Parse trailing NaN values for the Python parser #13320

gfyoung · 2016-05-29T22:35:20Z

Fixes a bug in which the Python parser failed to detect trailing NaN values in rows.
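For illustration, here is a minimal sketch of the kind of input this description appears to refer to. The data string and column names are hypothetical (they are not taken from the PR's tests), and the exact reproducer used in the PR may differ. Every data row stops one field short of the header, so the last column is missing at the end of each row.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: the header names three columns, but every data row
# ends after two fields, so column "c" is missing at the end of each row.
data = "a,b,c\n1,2\n3,4\n"

# engine="python" selects the Python parser that this PR touches.
df = pd.read_csv(StringIO(data), engine="python")
print(df)
```

On a pandas version that includes this fix, column `c` should come back filled with NaN rather than being dropped or mis-parsed.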

codecov-io · 2016-05-30T02:25:01Z

Current coverage is 84.22%

Merging #13320 into master will decrease coverage by <.01%

@@             master     #13320   diff @@
==========================================
  Files           138        138          
  Lines         50710      50667    -43   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42712      42672    -40   
+ Misses         7998       7995     -3   
  Partials          0          0

Powered by Codecov. Last updated by 8bbd2bc...7515029

jreback · 2016-05-30T14:22:21Z

doc/source/whatsnew/v0.18.2.txt

@@ -296,6 +296,7 @@ Bug Fixes


 - Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`)
+- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing NaN values were not being parsed (:issue:`13320`)


Use double backticks around ``NaN``.

jreback · 2016-05-30T14:23:16Z

rebase after #13325

jreback · 2016-05-30T14:43:27Z

OK, merged #13325, so go ahead and rebase this and the na_filter one.

gfyoung · 2016-05-30T14:45:36Z

Sounds good. Don't have access to my computer at the moment, but will do ASAP.

jreback · 2016-05-31T13:11:24Z

pandas/src/inference.pyx

@@ -1132,15 +1132,15 @@ def map_infer(ndarray arr, object f, bint convert=1):
    return result


-def to_object_array(list rows):
+def to_object_array(list rows, int width=0):
    cdef:


Can you add a doc-string? What exactly does width do?

Doc-string added

jreback · 2016-05-31T20:31:29Z

pandas/src/inference.pyx

    cdef:
        Py_ssize_t i, j, n, k, tmp
        ndarray[object, ndim=2] result
        list row

    n = len(rows)

-    k = 0
+    k = width


Actually, I would make width=None the default. If it's specified, then you can skip the first part of the check, which is just there to figure out k.

Why? That's just asking for data loss (i.e. when k < len(row)). Having width=0 as default also helps to keep the checking largely intact from before.

Maybe you misunderstand. This ONLY matters if width is NOT None. If it's None, then you compute k as now. The point is that you are not testing what happens if width exceeds the width of the structure, or what happens if it's less. (So I guess you can mitigate by taking width to mean at least the max, i.e. max(width or 0, max_found_width).)

The former case is the whole point of this PR.

You didn't address the data loss issue when k < len(row)

Well, what should you do here? E.g. you are specifying width, which could be less than len(row).

I thought the doc-string I added would illuminate the meaning of width and the purpose it serves?
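The semantics debated in this thread can be sketched in pure Python. This is only an analogue, not the Cython code in pandas/src/inference.pyx: the function name `to_object_rows`, the list-of-lists return type, and the NaN fill value are assumptions made for the sketch. It treats width as a lower bound, so k is the larger of width and the longest row, which is why a width smaller than len(row) does not lose data.

```python
import math


def to_object_rows(rows, width=0):
    """Pad ragged rows to a common width (hypothetical pure-Python analogue).

    `width` acts as a minimum: the output width k is the larger of `width`
    and the longest row, so no row is ever truncated.
    """
    # Start from the requested width, then grow to fit the longest row.
    k = max([width] + [len(row) for row in rows])
    # Fill the missing trailing cells with NaN (an assumption of this sketch).
    return [list(row) + [math.nan] * (k - len(row)) for row in rows]


# width larger than every row: trailing cells are padded.
padded = to_object_rows([[1, 2], [3]], width=3)

# width smaller than a row: the row is kept whole, nothing is dropped.
wide = to_object_rows([[1, 2, 3, 4]], width=2)
```

With width=0 as the default, the function behaves as if no width had been given, which matches the argument above for keeping the existing length check largely intact.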

gfyoung · 2016-06-01T03:01:54Z

@jreback : Rebased and Travis is happy. Ready to merge if there are no other concerns.

jreback · 2016-06-01T11:14:00Z

thanks!

gfyoung force-pushed the trailing-nan-conversion branch 3 times, most recently from f9db081 to 7515029 Compare May 29, 2016 23:34

jreback added Bug IO CSV read_csv, to_csv labels May 30, 2016

jreback added this to the 0.18.2 milestone May 30, 2016

jreback reviewed May 30, 2016

gfyoung force-pushed the trailing-nan-conversion branch 7 times, most recently from e8704ce to 7d72579 Compare May 30, 2016 23:52

jreback reviewed May 31, 2016

gfyoung force-pushed the trailing-nan-conversion branch 2 times, most recently from 20462b5 to f6f7cc2 Compare May 31, 2016 17:46

jreback reviewed May 31, 2016

BUG: Parse trailing NaN values for the Python parser

590874d

gfyoung force-pushed the trailing-nan-conversion branch from f6f7cc2 to 590874d Compare May 31, 2016 21:52

jreback closed this in 45bab82 Jun 1, 2016

gfyoung deleted the trailing-nan-conversion branch June 1, 2016 11:18
