Reading xlsx returns incorrect data due to a bug in openpyxl when load_workbook is called with use_iterators=True #1629
Comments
If you provide a test case, I'll take a look.
So will pandas switch to using xlrd in preference to openpyxl?
Don't know yet. It's actually Python 3 support in xlwt that would be helpful.
Sample xlsx file and code to reproduce here:
I added a pull request to openpyxl to fix this issue there:
Don't know about making xlrd >= 0.9.0 required in 0.11; punting to 0.12, unless
Indeed it is (at least when recent xlrd is used). Thanks.
The situation is described in https://bitbucket.org/ericgazoni/openpyxl/issue/124/rawcellis_date-returns-false-positive
and results in a datetime being returned instead of a float, causing incorrect data.
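To illustrate why a false-positive date check corrupts data: Excel stores dates as floating-point serial numbers, so a reader that wrongly decides a numeric cell is date-formatted turns the number into a datetime. The sketch below is not openpyxl's code; `excel_serial_to_datetime` and `read_cell` are hypothetical helpers, `is_date` simply stands in for whatever check misfires, and the conversion assumes the 1900 date system with the 1899-12-30 base (valid for serials after 60).

```python
from datetime import datetime, timedelta

# Base date for Excel's 1900 date system. It is valid for serials > 60 because
# Excel counts a nonexistent 1900-02-29 (the 1900 leap-year bug).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel 1900-system serial number to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


def read_cell(raw_value: float, is_date: bool):
    """Hypothetical reader: convert the raw number only if judged date-formatted."""
    return excel_serial_to_datetime(raw_value) if is_date else raw_value


# A plain float is returned unchanged when the date check is correct...
print(read_cell(41275.0, is_date=False))  # 41275.0
# ...but silently becomes a date when the check gives a false positive.
print(read_cell(41275.0, is_date=True))   # 2013-01-01 00:00:00
```

The float itself is never lost; it is reinterpreted, which is why the result looks plausible but is wrong.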
Could the use_iterators argument be exposed in the ExcelFile constructor, with a default of True?
Passing False works around the issue, presumably at the cost of performance.
Iteration code for the non-iterator case would then be needed in pandas' ExcelFile._parse_xlsx.
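A minimal sketch of the proposal, assuming an openpyxl 1.x-era `load_workbook(filename, use_iterators=...)` (newer openpyxl releases replaced that keyword with `read_only`). `XlsxReaderSketch` is a hypothetical stand-in for pandas' ExcelFile, not its real code, and the worksheet attributes it reads (`iter_rows()` with `internal_value` for iterator sheets, `rows` with `value` for regular sheets) are assumptions to check against the installed openpyxl. The `loader` parameter is also hypothetical; it exists so the sketch can be exercised without openpyxl.

```python
from typing import Any, Callable, List, Optional


class XlsxReaderSketch:
    """Hypothetical ExcelFile-like wrapper exposing use_iterators.

    use_iterators defaults to True (the existing behaviour); passing False
    requests a fully loaded workbook and avoids the iterator reader.
    """

    def __init__(self, path: str, use_iterators: bool = True,
                 loader: Optional[Callable[..., Any]] = None) -> None:
        if loader is None:
            # Assumed 1.x-era signature: load_workbook(filename, use_iterators=...).
            from openpyxl import load_workbook
            loader = load_workbook
        self.use_iterators = use_iterators
        self.book = loader(path, use_iterators=use_iterators)

    def parse_rows(self, sheet: Any) -> List[List[Any]]:
        """Return cell values row by row, using the iteration path for the mode."""
        if self.use_iterators:
            # Iterator worksheets (assumed): iter_rows() yields cells with internal_value.
            return [[cell.internal_value for cell in row] for row in sheet.iter_rows()]
        # Regular worksheets (assumed): .rows yields cells with .value.
        return [[cell.value for cell in row] for row in sheet.rows]
```

Keeping True as the default leaves existing callers unaffected, while users hit by the date bug can opt out per file.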
... This bug has existed in openpyxl for months without comment, and the code is still described as "very raw" in openpyxl:
https://bitbucket.org/ericgazoni/openpyxl/src/0082a961cf8b/openpyxl/reader/iter_worksheet.py#cl-27
Alternatively, if xlrd 0.8.0 is released with xlsx support, pandas could use that instead.