-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: read_excel skips single-column empty rows #40214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good - two requests below.
pandas/io/excel/_base.py
Outdated
@@ -598,6 +598,7 @@ def parse( | |||
skiprows=skiprows, | |||
nrows=nrows, | |||
na_values=na_values, | |||
skip_blank_lines=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not expose this parameter in read_excel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried that (#40095) but the behaviour when skip_blank_lines=True
isn't very useful---it only skips empty spreadsheet rows if the entire spreadsheet is contained in column A. The original intent of skip_blank_lines
was to skip \n
lines in CSV files, but not lines with empty data elements such as ""\n
or ,,,,\n
, so there isn't an equivalent feature for spreadsheets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Can we make a comment about this and reference your original PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure. Do you mean an inline comment in the code, or in the description of this pull request?
Not sure if this closes the original issue, as the parameter would have to be exposed in order to truly fix. pandas/pandas/io/parsers/python_parser.py Line 596 in fad96e1
|
@lithomas1 I'm nervous about breaking the CSV behaviour, because it seems deliberate. I found one test where an input line pandas/pandas/tests/io/parser/common/test_common_basic.py Lines 412 to 414 in 154026c
Maybe this behaviour should be documented better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; @lithomas1 - agreed that this PR does not fully close the issue as stated there, however I'm -0 on adding the skip_blank_lines argument to read_excel. read_csv needs this because you can't tell a blank line from e.g. ,,,,
from the resulting DataFrame; this does not exist for excel files and it is straightforward to find and drop blank lines with df.dropna(how='all')
.
@ahawryluk - can you merge master
@simonjayhawkins @jreback - This is a bit of an odd case, wanted to get some eyes to make sure it's being handled appropriately. This PR fixes the following bug (not a regression, AFAICT) on master: when reading an excel file, blank rows would be skipped if and only if a single column is being read. This PR makes it so that blank rows are never skipped. However, the previous workaround was to supply Since this is a rather localized case, my thinking is to have this be a bugfix for 1.3 and leave the regression in 1.1.x and 1.2.x unresolved. |
sgtm. does the OP need to change to not close the issue? |
@simonjayhawkins, thanks for taking a look at this. My preference is to close #39808 based on
But I'm happy to edit the PR if everyone prefers to leave #39808 open. |
The issue could be closed once that is agreed as they way forward on the issue itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. back to you @rhshadrach
for 1.3 |
Thanks @ahawryluk! I've left the issue open for now, but will put my thoughts there shortly. |
Ref #39808
This is the fix proposed in the discussion of #40095