BUG: read_excel with openpyxl and missing dimension #39486
This approach can be used along with calling
This can probably be a follow-up issue, but running this patch with the openpyxl engine generates 39 rows vs 9 rows for xlrd. The first 9 rows seem equivalent for both. The later rows with openpyxl are all NaNs.
Agreed @asishm - I think that is the cause for #39181 as well, and my guess is that adding a call to
My previous comment was not correct - if there are empty cells then we still get null values coming through and need to trim; e.g. the following would result in a DataFrame in which rows 7 through 17 are all np.nan. I've modified the added
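A minimal sketch of the trimming idea described in the comment above, for illustration only. It is not the code from this patch: the helper name `trim_trailing_empty` and the choice of `""` / `None` as "empty" markers are assumptions. It strips trailing empty cells from each row, then drops every row after the last one that still holds data, while keeping blank rows in the middle of the sheet.

```python
def trim_trailing_empty(rows, empty=""):
    """Drop trailing empty cells per row, then trailing empty rows.

    Hypothetical helper: `rows` is any iterable of rows of already
    converted cell values; `empty` is the assumed empty-cell marker.
    """
    data = []
    last_row_with_data = -1
    for row_number, row in enumerate(rows):
        cells = list(row)
        # Strip empty cells from the right-hand end of this row.
        while cells and cells[-1] in (empty, None):
            cells.pop()
        if cells:
            last_row_with_data = row_number
        data.append(cells)
    # Interior blank rows are kept; everything after the last row
    # containing data is cut off.
    return data[: last_row_with_data + 1]


rows = [["a", "b"], [1, 2], ["", ""], [3, ""], ["", ""], [None, None]]
trimmed = trim_trailing_empty(rows)
```

With this input, the two trailing blank rows are removed and the blank row between the data rows survives as an empty list.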
…enpyxl_header
Conflicts:
doc/source/whatsnew/v1.2.2.rst
Current minimum version for openpyxl is 2.6.0; this patch would need 2.6.1 (reset_dimensions doesn't exist before this; working around that would mean accessing protected members). Is increasing the minimum version here okay to do for 1.2.2? cc @jreback @simonjayhawkins
yep that's fine (just update all the locations, install, show_versions, and put it in the whatsnew note)
can you rebase
@jreback I've merged master and increased the minimum version for openpyxl.
:-<
Thanks @jreback, turns out that this needs openpyxl 3.0.0. Prior versions still suffer from not reading the full Excel file even after calling
can we easily fall back to the previous behavior if an older version of openpyxl is installed? (just because this is a minor release)
Yes - good idea and I think it should be easy enough. Because I don't understand what happened between openpyxl 2.6.4 and 3.0.0 (there are no release notes for 3.0.0 that I can find), my thought is to not modify behavior if openpyxl < 3.0.0. If that is the case, then what should be done with the minimum version here? Keep at 2.6.0 until pandas 1.3 (or 2.0)?
yes exactly, so we don't actually bump the min in 1.2.2 (or just bump to 2.6.1) and just fall back (even if slow / whatever), and make the bump for 1.3
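The fallback agreed on above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `version_tuple`, `read_rows`, and `FakeSheet` are hypothetical names, the version parser is deliberately naive, and `FakeSheet` only mimics the two worksheet members used here (`rows` and `reset_dimensions()`), so the sketch runs without openpyxl installed.

```python
import re

# Threshold taken from the discussion above: only openpyxl >= 3.0.0
# gets the new behavior; older versions keep the existing behavior.
MIN_VERSION_FOR_RESET = (3, 0, 0)


def version_tuple(version):
    """Naive parser: first three numeric components, padded with zeros."""
    nums = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


def read_rows(sheet, openpyxl_version):
    """Reset the sheet dimensions only when the version allows it."""
    if version_tuple(openpyxl_version) >= MIN_VERSION_FOR_RESET:
        sheet.reset_dimensions()
    return [list(row) for row in sheet.rows]


class FakeSheet:
    """Stand-in for a read-only worksheet (assumption, for the demo)."""

    def __init__(self, rows):
        self.rows = rows
        self.reset_called = False

    def reset_dimensions(self):
        self.reset_called = True


new_sheet = FakeSheet([[1, 2], [3, 4]])
old_sheet = FakeSheet([[1, 2], [3, 4]])
new_rows = read_rows(new_sheet, "3.0.0")
old_rows = read_rows(old_sheet, "2.6.4")
```

Gating on the installed version keeps the minor release from raising the minimum requirement, at the cost of older openpyxl versions still showing the original problem.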
@jreback - changes made. One thing I noticed is that the minimum version in
thanks @rhshadrach happy to have the min version bump for 1.3
@meeseeksdev backport 1.2.x
Targeted for 1.2.2 because this resolves an issue that occurs from the change of default engine from xlrd to openpyxl in 1.2.