BUG: read_excel failed with empty rows after MultiIndex header #40649
Thanks @ahawryluk, can you update the title to a more meaningful one?
@dsaxton thanks for catching those
Thanks @ahawryluk - looks good. A few questions below.
lgtm, cc @jreback
@@ -554,7 +554,9 @@ def parse(
 header_name, _ = pop_header_name(data[row], index_col)
 header_names.append(header_name)

-has_index_names = is_list_like(header) and len(header) > 1
+has_index_names = (
can you add a comment here on what the inference rules are
@@ -702,7 +702,7 @@ cdef class TextReader:
 ic = (len(self.index_col) if self.index_col
 is not None else 0)

-if lc != unnamed_count and lc - ic > unnamed_count:
+if (lc != unnamed_count and lc - ic > unnamed_count) or ic == 0:
can you add a similar comment to below (e.g. for future readers what are the inference rules)
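For readers following the thread, the changed condition can be restated as a small standalone predicate. This is only a sketch: the boolean expressions are copied from the diff above, but the function names and the reading of `lc` (number of fields in the row after the header), `ic` (number of index columns) and `unnamed_count` (number of blank fields in that row) are assumptions made for illustration, not taken from the PR.

```python
def rejects_index_names_old(lc: int, ic: int, unnamed_count: int) -> bool:
    # Condition before the change, copied from the removed diff line.
    return lc != unnamed_count and lc - ic > unnamed_count


def rejects_index_names_new(lc: int, ic: int, unnamed_count: int) -> bool:
    # Condition after the change: additionally true whenever ic == 0,
    # i.e. when no index column was requested.
    return (lc != unnamed_count and lc - ic > unnamed_count) or ic == 0


# Hypothetical inputs: a fully blank two-field row with no index column.
blank_row_no_index = dict(lc=2, ic=0, unnamed_count=2)
old_result = rejects_index_names_old(**blank_row_no_index)
new_result = rejects_index_names_new(**blank_row_no_index)
print(old_result, new_result)
```

Under these assumed meanings, the old condition is false for a blank row when `ic == 0`, while the new one is true, which matches the behavior change described in the PR.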
columns = MultiIndex.from_tuples([("a", "A"), ("b", "B")])
expected = DataFrame(data, columns=columns)
data = "a,b\nA,B\n,\n1,2\n3,4"
result = parser.read_csv(StringIO(data), header=[0, 1])
what happens with index_col not None (do we already test this)?
Yes, these two tests both have MultiIndex headers and index_col not None:
pandas/pandas/tests/io/parser/test_header.py
Line 126 in 526d52f
def test_header_multi_index(all_parsers):
pandas/pandas/tests/io/parser/test_header.py
Line 219 in 526d52f
def test_header_multi_index_common_format1(all_parsers, kwargs):
@jreback I've added the new comments; let me know if anything else is needed. Thanks
thanks @ahawryluk very nice
Prior to this fix, a blank data row after a MultiIndex header was interpreted as containing a blank index name, but that interpretation is only valid if the user has specified an index column. If index_col is None, all subsequent rows should be treated as data, even if the first one is empty.
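A minimal sketch of the behavior described above, reusing the CSV string from the test in this PR. It assumes a pandas version that includes this fix; the variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

# Two header rows, then an empty data row (","), then two ordinary rows.
csv_text = "a,b\nA,B\n,\n1,2\n3,4"

# index_col is left at its default (None), so every row after the
# two header rows should be read as data, including the empty one.
df = pd.read_csv(StringIO(csv_text), header=[0, 1])
print(df)
```

With the fix in place, the empty row is kept as a row of missing values rather than being consumed as index names, so the frame has three data rows under the two-level column header.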