Skip to content

reading xslx returns incorrect data due to bug in openpyxl load_workbook being called with use_iterators=True #1629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ruidc opened this issue Jul 16, 2012 · 8 comments · Fixed by #3164
Labels
Bug IO Data IO issues that don't fit into a more specific label
Milestone

Comments

@ruidc
Copy link
Contributor

ruidc commented Jul 16, 2012

The situation is described in https://bitbucket.org/ericgazoni/openpyxl/issue/124/rawcellis_date-returns-false-positive
and results in datetime being returned instead of float causing incorrect data.

Could this method argument be exposed to the ExcelFile init with a default of True?

using False will work around the issue, presumably at the cost of performance.
Iteration code would then be needed in pandas ExcelFile._parse_xlsx

... As this bug has existed for months in openpyxl without comment and code is still described as "very raw" in openpyxl

https://bitbucket.org/ericgazoni/openpyxl/src/0082a961cf8b/openpyxl/reader/iter_worksheet.py#cl-27

Alternatively, if xlrd 0.8.0 is released with xlsx support, pandas could use that instead.

@ghost
Copy link

ghost commented Mar 18, 2013

If you provide a test case, I'll take a look.
xlrd 0.9.0 is out with py3 support, need to do some work there anyway.

@ruidc
Copy link
Contributor Author

ruidc commented Mar 18, 2013

So will pandas switch to using xlrd in preference of openpyxl?
I'll see if I can prepare a simple test case.

@ghost
Copy link

ghost commented Mar 18, 2013

Don't know yet. it's actually python3 support in xlwt which would be helpful.

@ruidc
Copy link
Contributor Author

ruidc commented Mar 18, 2013

sample xlsx file and code to reproduce here:
https://bitbucket.org/ruidc/openpyxl/commits/2cc76b66a0543b145a2eedd7860b7e0ef143c4c7

@dieterv77
Copy link
Contributor

I added a pull request to openpyxl to fix this issue there:
https://bitbucket.org/ericgazoni/openpyxl/pull-request/31/fix-issue-124-use-same-check-for-dates-in/diff

@ghost
Copy link

ghost commented Mar 27, 2013

Don't know about making xlrd>= 0.9.0 required in 0.11, punting to 0.12, unless
someone disagrees.

@jtratner
Copy link
Contributor

jtratner commented Sep 5, 2013

@y-p @ruidc is this resolved now? I think it's covered by using xlrd, right?

@ruidc
Copy link
Contributor Author

ruidc commented Sep 5, 2013

Indeed it is (at least when recent xlrd is used). Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants