Skip to content

BUG: cannot open xlsx file with active filter #53238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
azhu26 opened this issue May 15, 2023 · 9 comments
Closed
2 of 3 tasks

BUG: cannot open xlsx file with active filter #53238

azhu26 opened this issue May 15, 2023 · 9 comments
Labels
Bug Closing Candidate May be closeable, needs more eyeballs IO Excel read_excel, to_excel

Comments

@azhu26
Copy link

azhu26 commented May 15, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd;

pd.read_excel(excel_file, sheet_name=sheet, header=0);

Issue Description

I recently upgraded pandas to the latest 2.0.1 version. the problem appears after this upgrade: if the input excel file has an active filter (filtered on the header row and set some values to be filtered out). then the read_excel will throw exception: value must be either numerical or a string containing a wildcard. the filtered values contain blank cell.

if leaving the filter with "select all", then there is no exception from read_excel. so this is definitely from the new version of pandas. my previous version is 1.2.3 and I never had this problem before. (but the same issue may be raised in earlier version but don't know for sure)

Expected Behavior

the read_excel should be able to proceed without raising exception. if it can honor the excel filter, that is better.

Installed Versions

2.0.1

@azhu26 azhu26 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 15, 2023
@topper-123
Copy link
Contributor

Thanks for the report, @azhu26.

We need a working example in order to look into this. Can you upload a excel file with this issue.

@topper-123 topper-123 added IO Excel read_excel, to_excel Needs Info Clarification about behavior needed to assess issue labels May 15, 2023
@topper-123
Copy link
Contributor

Can you also make the title more descriptive.

@azhu26
Copy link
Author

azhu26 commented May 16, 2023

thank you for looking into this.

The problem occurs when the filter on "blank" cells, see attached example below.

Also, not sure how to change the title; when open the ticket to report, not aware a place to input the title though.

Please let me know if anything that I can help.

test-filter.xlsx

@topper-123 topper-123 changed the title BUG: BUG: cannot open xlsx file with active filter May 16, 2023
@topper-123
Copy link
Contributor

Ok thanks @azhu26.

I tried this on the dev branch with openpyxl v.3.0.10 and everything worked as intended. I then tried upgrading to openpyxl v3.1.2 and I got the error you describe ValueError: Value must be either numerical or a string containing a wildcard.

I then tried open the file directly in openpyxl:

  • using v3.1.2: failed with message: ValueError: Unable to read workbook: could not read worksheets from ../test-filter.xlsx. This is most probably because the workbook source files contain some invalid XML. Please see the exception for more details.
  • using v.3.0.10: it opened without any issue.

So this is likely an openpyxl issue on v.3.1, and not a pandas issue. Can you verify that the file can be opened if you downgrade openpyxl to v.3.0.

@topper-123 topper-123 added Closing Candidate May be closeable, needs more eyeballs and removed Needs Info Clarification about behavior needed to assess issue Needs Triage Issue that has not been reviewed by a pandas team member labels May 16, 2023
@azhu26
Copy link
Author

azhu26 commented May 16, 2023

Hi, Terji: thank you very much.

Yes, I can confirm: after downgrading openpyxl to 3.0.10, the exception disappeared and everything works smoothly.

@topper-123
Copy link
Contributor

Ok, I'll close this issue then.

Can you bring up this issue with the openpyxl people, to see if this is an issues with openpyxl?

@rhshadrach
Copy link
Member

Related: #51557, #51392

@azhu26
Copy link
Author

azhu26 commented May 16, 2023

Hi, Terji:

thanks. Yes, there is an existing open issue on openpyxl and I added my comments there. Thanks again for quick efforts.

@whuwangyong
Copy link

related: versions 3.1.0 and up fail to open files saved with active filters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Closing Candidate May be closeable, needs more eyeballs IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

4 participants