-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: read_excel (xlrd engine) add parameter for ignore_workbook_corruption #40274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Since it's an issue unique to XLS files, would |
Yes, as you know xlrd dropped support to all formats, except xls. So in my opinion it's an appropriate name Edit: just thinking, maybe also openpyxl and odf have a flag to ignore corrupted files? If yes maybe a more general name could be used Edit 2: openpyxl seems to not support such a flag, I am not sure but I didn't found anything similar also in odfpy. So yes, |
take |
@rhshadrach Is this issue okay to pick up and implement? |
I'm generally negative on adding engine-specific keywords. pandas calls This would require:
|
@rhshadrach Okay! I can do that if that's the official route that we want to take for this issue! |
cc @phofl for any thoughts |
agree with @rhshadrach An engine specific kwarg is not a great idea, engine_kwargs makes more sense |
@rmhowe425 - I think we have a good way forward. Please go ahead with this if you're still interested! |
* Fixing merge conflicts * Fixing merge conflict * Fixing documentation issues * standardized usage of engine_kwargs, fixed unit tests & doc strings * Fixing documentation issues * Fixing implementation logic and unit tests * Fixing implementation logic * Fixing formatting issues * Fixing error for test Docstring validation, typing, and other manual pre-commit hooks * Fixing documentation error * Standardizing engine_kwarg types * Fixing minor issues with unit tests and documentation * Fixing documentation issue * Fixing a formatting / documentation error * Fixing documentation errors * Fixing documentation errors * Fixing documentation errors * Fixing documentation errors * Fixing documentation errors * Adding an extra blank line to troubleshoot documentation error * Adding an extra blank line to troubleshoot documentation error * Fixing documentation issues * Fixing formatting errors * Fixing formatting errors * Fixing formatting errors * Fixing logic and formatting issues in unit tests * Fixing issues with merge conflict * Fixing formatting issue * Update pandas/io/excel/_base.py --------- Co-authored-by: Matthew Roeschke <[email protected]>
Is your feature request related to a problem?
yes, sometimes, expecially when created by 3rd party application old excel file can be corrupted:
https://www.codeforests.com/2020/05/28/fix-the-compdocerror-for-xlrd/
https://stackoverflow.com/questions/12705527/reading-excel-files-with-xlrd
https://stackoverflow.com/questions/34550624/python-xlrd-read-excel-file-error/
https://titanwolf.org/Network/Articles/Article?AID=f1370e16-c9ef-4892-a00e-b1bafee2f6a8#gsc.tab=0
Describe the solution you'd like
xlrd's team recently in version 2.0 (december 2020) solved this with a flag:
ignore_workbook_corruption
, please see https://xlrd.readthedocs.io/en/latest/changes.htmlSo my desire would be to just have this flag inside the pandas's function
read_excel
API breaking implications
any, I guess
Describe alternatives you've considered
Open the file directly with xlrd and pass it to pandas
I may be unaware of other solutions
The text was updated successfully, but these errors were encountered: