-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Parse trailing NaN values for the Python parser #13320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f9db081
to
7515029
Compare
Current coverage is 84.22%@@ master #13320 diff @@
==========================================
Files 138 138
Lines 50710 50667 -43
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42712 42672 -40
+ Misses 7998 7995 -3
Partials 0 0
|
@@ -296,6 +296,7 @@ Bug Fixes | |||
|
|||
|
|||
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`) | |||
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing NaN values were not being parsed (:issue:`13320`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks around NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
rebase after #13325 |
ok merged #13325 so go ahead and rebase this and |
Sounds good. Don't have access to my computer at the moment, but will do ASAP. |
e8704ce
to
7d72579
Compare
@@ -1132,15 +1132,15 @@ def map_infer(ndarray arr, object f, bint convert=1): | |||
return result | |||
|
|||
|
|||
def to_object_array(list rows): | |||
def to_object_array(list rows, int width=0): | |||
cdef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string. what exactly does width
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc-string added
20462b5
to
f6f7cc2
Compare
cdef: | ||
Py_ssize_t i, j, n, k, tmp | ||
ndarray[object, ndim=2] result | ||
list row | ||
|
||
n = len(rows) | ||
|
||
k = 0 | ||
k = width |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I would make width=None
the default. If its specified, then you can skip the first part of the check which is just to figure out k
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? That's just asking for data loss (i.e. when k < len(row)
). Having width=0
as default also helps to keep the checking largely intact from before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you misunderstand. This ONLY matters if width
is NOT NONE. If its None
then you compute as now. The point is you a) you are not testing if width exceeds the width of the structure or what happens if its less. (so I guess you can mitigate to mean at least themax(width or 0, max_found_width)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The former case is the whole point of this PR.
- You didn't address the data loss issue when
k < len(row)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well what should you do here? e.g. you are specifying width, which could be less < len(row).?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the doc-string I added would illuminate the meaning of width
and the purpose it serves?
f6f7cc2
to
590874d
Compare
@jreback : Rebased and Travis is happy. Ready to merge if there are no other concerns. |
thanks! |
Fixes bug in which the Python parser failed to detect trailing
NaN
values in rows