read_csv() ignores na_filter=False for index columns #7518
Comments
I'll mark it as a bug, but the 2nd soln looks fine to me. Trying to have the parser do too much is in general a problem IMHO.
@jreback, the parser already knows how to distinguish NaNs, or not to distinguish them, right? Isn't that what The obvious user expectation is that
I marked it as a bug. You are welcome to do a pull-request. My point was that there are close to 50 options for the parser, so there are obviously some untested paths.
This bug has been fixed and the issue can be closed.
@gfyoung do we have a test for this?
Using pandas 0.14.0.
pandas.io.parsers.read_csv is supposed to ignore blank-looking values if na_filter=False, but it does not do this for index_col columns.
foo.csv:
The default behavior gives a dataframe with a NaN in place of the empty value from this last row:
This gives the same dataframe with a blank string instead of a NaN. So far so good:
My expectation was that this next version would give a dataframe with no NaN values in the index, but it does not:
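A minimal sketch of the three calls described above. The CSV contents and the column names a, b, c are assumptions made up for illustration (they are not the original foo.csv), and the data is read from an in-memory string rather than a file. No result is shown for the last call because, per the comments, it depends on the pandas version: 0.14.0 reportedly puts a NaN in the index, while fixed versions should not.

```python
import io

import pandas as pd

# Hypothetical stand-in for foo.csv: the last row has an empty value in column "b".
CSV_TEXT = "a,b,c\n1,x,10\n2,y,20\n3,,30\n"


def read(**kwargs):
    """Parse the sample CSV with the given read_csv options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


# 1. Default behavior: the empty field becomes NaN.
default_df = read()

# 2. na_filter=False: the empty field stays an empty string.
unfiltered_df = read(na_filter=False)

# 3. na_filter=False together with index_col: the case this issue is about.
indexed_df = read(na_filter=False, index_col=["a", "b"])

# Check whether level "b" of the index contains a NaN (version dependent).
index_has_nan = bool(pd.isnull(indexed_df.index.get_level_values("b")).any())
print("NaN in index:", index_has_nan)
```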
Because it unexpectedly includes NaNs, I've been fighting with issue 4862 in unstack for hours :-(.
In order to get the desired behavior, a DF with no NaNs in the index, I have to read the data without a multi-index, then set_index afterwards:
As a temporary fix, perhaps the documentation ought to clarify the behavior of na_filter with respect to index_col.
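The read-then-set_index workaround could look roughly like this. As before, the sample data and the column names a, b, c are hypothetical and the CSV is read from an in-memory string; only read_csv, na_filter and set_index come from the report itself.

```python
import io

import pandas as pd

# Hypothetical sample data: the last row has an empty value in column "b".
CSV_TEXT = "a,b,c\n1,x,10\n2,y,20\n3,,30\n"

# Read without index_col so na_filter=False applies to every column.
flat = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

# Build the multi-index afterwards from the already-parsed columns.
df = flat.set_index(["a", "b"])

# Confirm that no level of the index contains a NaN.
has_nan = any(
    pd.isnull(df.index.get_level_values(name)).any() for name in df.index.names
)
print(has_nan)  # False
```

Because the index is built from columns that were parsed with NaN detection turned off, the empty field stays an empty string in the index regardless of how index_col is handled by the parser.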