BUG: read_excel with multi-indexed column ignores index_col=None #11733
Comments
We don't support writing to this format (multi-index columns w/ no row index) because it's ambiguous on the way back in. But it seems reasonable to support reading it, I'll take a look. |
I got the same problem |
From #15180 (comment) - consider changing default of |
@chris-b1 @jreback I have a proposed fix here: stephenrauch@1204b31 Before I do all the docs and stuff I wanna make sure I am headed in the right direction. Thanks. |
Per review comment from @jreback here is proposed api for read_excel. It is the same as read_csv.
Updates are here: So if this proposed API looks ok, I will do the PR. |
Thanks @stephenrauch, that api looks good to me, please open it as a PR. One subtlety for the docs, if not using a MultiIndex header,
|
@chris-b1 @jreback Are there any test cases (or maybe some other description) that show how |
Not sure csv actually works this way. But for current Excel behavior:

parses a row index

In [297]: pd.read_excel('temp.xlsx', sheetname='Sheet1')
Out[297]:
      a  b  c
row1  1  2  4
row2  1  2  4
row3  1  2  4

No row index

In [298]: pd.read_excel('temp.xlsx', sheetname='Sheet2')
Out[298]:
   a  b  c
0  1  2  4
1  1  2  4
2  1  2  4 |
Has this problem been solved? I ran into the same issue. |
As of now I still see the same issue. When using multi-row headers with read_excel, pandas always assigns the first column as the index. |
Another vote for a fix. I'm running 0.22 and it seems so hackish to have to write the code to save the 'index' to a named column and then reset the index. Pandas is a 'great' module - thank you - |
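For anyone landing here, a rough sketch of the workaround mentioned above (copy the index back into a column, then reset the index). This is only an illustration under assumptions: the frame is built by hand to mimic the result described in the issue (first column absorbed as the index, its header cells turned into the column level names) rather than read from an actual Excel file, and `restore_first_column` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Hand-built stand-in for what read_excel is reported to return in this issue:
# the first column's values ([1, 1]) became the index, and its header cells
# ("A", "key") became the names of the column levels. The data values are made up.
columns = pd.MultiIndex.from_tuples([("B", "x"), ("B", "y")], names=["A", "key"])
parsed = pd.DataFrame([[10, 20], [30, 40]], index=[1, 1], columns=columns)


def restore_first_column(df):
    """Hypothetical helper: move the absorbed first column back into the data."""
    # Assumes the column level names hold the first column's header cells.
    first_col = tuple(df.columns.names)
    out = df.copy()
    # Put the index values back as the first data column.
    out.insert(0, first_col, out.index.to_numpy())
    # Replace the index with a default 0..n-1 range.
    out = out.reset_index(drop=True)
    # The level names were really header cells, so clear them.
    out.columns = out.columns.set_names([None] * out.columns.nlevels)
    return out


restored = restore_first_column(parsed)
print(restored)
```

If the level names come back as None for a given file, the tuple built from them will not be a meaningful header, so check `df.columns.names` before relying on this.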
Vote for index_col=False to fix this |
Just encountered this issue and I am looking for a fix. @stephenrauch has the PR been made for this? |
@rileymcdowell no PRs have been made but would welcome any if you are interested |
I'll put together a PR from @stephenrauch's work in the next couple of days. |
I've dug into this and ran into a decision point. Consider the following spreadsheet (Taken from the test suite). Right now, this is interpreted by the
The situation that brought me to find this github issue is that in this example, I expect cells
The test suite explicitly covers the existing functionality of the former. This behavior differs from that of the A workaround is to allow a sentinel value of Any thoughts about how best to tackle this? |
Hmm well I disagree since this representation matches what you'd see with a normal data frame representation, but regardless of opinions I think it just speaks to what @chris-b1 mentioned earlier that this is really ambiguous so there's not necessarily a right answer
Can you clarify this with an example? CSV doesn't have the concept of a merged cell like you have with the
Isn't Thanks for the investigation! |
@rileymcdowell I would agree with @WillAyd that the former of the two behaviors you describe is the most intuitive way to interpret an Excel file. If you can I would encourage you to submit a PR with this behavior. |
From SO: http://stackoverflow.com/questions/34020061/excel-to-pandas-dataframe-using-first-column-as-index
@chris-b1 another one on the multi-index excel issues .. :-)
Small test case: content of excel file:
gives:
It's not super clear in the formatting of the dataframe, but the [1, 1] is the index and [A, key] are seen as the level names of the multi-indexed columns.