index_col and usecols do not work reliably together in read_csv #9098
I wish we could fix this issue. I would like to contribute but the huge codebase looks intimidating. Here is what I noticed so far:
And here is the reduced reproduction test case for (4):

```python
import pandas as pd
from io import StringIO

data = """\
Gene Control_1 Control_2 Tumour_1 Tumour_2
TP53 6 6 7 6
BRCA2 6 7 7 9\
"""

expected_result = {
    'Control_1': {'TP53': 6, 'BRCA2': 6},
    'Control_2': {'TP53': 6, 'BRCA2': 7},
}

# when index_col is in usecols (sep=r"\s+" accepts tabs or spaces as delimiters):
df = pd.read_table(StringIO(data), sep=r"\s+", usecols=[0, 1, 2], index_col=0, header=0)
assert df.to_dict() == expected_result  # evaluates to True and passes :)

# when index_col is not in usecols:
df = pd.read_table(StringIO(data), sep=r"\s+", usecols=[1, 2], index_col=0, header=0)
assert df.to_dict() == expected_result  # fails :(
```

For the latter case, `df` is malformed:
And here is my question (@jreback?): is the latter case (when
Is your example on master? We have had a number of fixes related to … cc @gfyoung
I just cloned the repo and tested: `pd.read_table(StringIO(data), usecols=[1, 2], index_col=0, header=0)` still does not work as I would expect. Importantly, the test cases from #12408 and from this issue run perfectly fine on the version from master :)
So what cases could we close if we have some validation tests? Want to do a PR?
Then again, given the confusion, we should either make this clear that
@gfyoung That is really what I started to think after I submitted my comment. Thanks for the clarification. A sentence or two in the docs could probably improve the situation significantly.
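For the validation tests and docs discussed here, a minimal sketch of such a test, reusing the gene table from the reproduction above. It only pins down the combinations that behave as expected in that reproduction: the index column is itself listed in `usecols`, referenced either by position or by label. The test name `test_index_col_in_usecols` is hypothetical, and `sep=r"\s+"` is an assumption that the sample is whitespace-delimited.

```python
from io import StringIO

import pandas as pd

# Whitespace-delimited sample, same values as the reproduction in this thread.
DATA = """\
Gene Control_1 Control_2 Tumour_1 Tumour_2
TP53 6 6 7 6
BRCA2 6 7 7 9
"""

EXPECTED = {
    "Control_1": {"TP53": 6, "BRCA2": 6},
    "Control_2": {"TP53": 6, "BRCA2": 7},
}


def test_index_col_in_usecols():
    # Positional form: column 0 ("Gene") is included in usecols.
    by_position = pd.read_csv(
        StringIO(DATA), sep=r"\s+", usecols=[0, 1, 2], index_col=0, header=0
    )
    assert by_position.to_dict() == EXPECTED

    # Label form: the index column is named explicitly and also listed in usecols.
    by_label = pd.read_csv(
        StringIO(DATA),
        sep=r"\s+",
        usecols=["Gene", "Control_1", "Control_2"],
        index_col="Gene",
    )
    assert by_label.to_dict() == EXPECTED
    assert by_label.index.name == "Gene"


if __name__ == "__main__":
    test_index_col_in_usecols()
```

Referring to the index column by label sidesteps the question of which column list a positional `index_col` counts into.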
@krassowski: Add the test anyway (more tests are good), just without
Just ran into this and reported here:
This code shows 3 situations.
Here are the results:
- `fun(123, 10, 4)`: an exception occurs, but when `index_col` is omitted and `set_index` is used later, it works OK.
- `fun(123, 10, 5)`: this worked OK.
- `fun(123, 20, 4)`: this worked, but it picked up the wrong value for the index.
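The workaround from the first result (leave `index_col` out of the read and call `set_index` afterwards) can be sketched as below. The `fun` helper and its data are not shown in the thread, so this uses a small made-up CSV; the column names `id`, `a`, `b`, `c` and the helper `read_subset_with_index` are purely illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical comma-separated data; not the data from the original report.
CSV = """\
id,a,b,c
x,1,2,3
y,4,5,6
"""


def read_subset_with_index(text, index_label, value_labels):
    """Read only the wanted columns, then promote one of them to the index.

    The index column is listed in usecols and set afterwards with set_index,
    so no positional index_col has to be interpreted against usecols.
    """
    wanted = [index_label, *value_labels]
    frame = pd.read_csv(StringIO(text), usecols=wanted)
    return frame.set_index(index_label)


df = read_subset_with_index(CSV, "id", ["b", "c"])
print(df)
```

Column `a` is skipped by `usecols`, and the resulting frame is indexed by `id` with columns `b` and `c`.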
`pandas.__version__` is '0.15.2', 64-bit Arch Linux.
$ python --version
Python 3.4.2