Reading xlsx returns incorrect data due to a bug in openpyxl when load_workbook is called with use_iterators=True #1629

ruidc · 2012-07-16T13:19:03Z

The situation is described in https://bitbucket.org/ericgazoni/openpyxl/issue/124/rawcellis_date-returns-false-positive
and results in a datetime being returned instead of a float, which produces incorrect data.
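If a numeric cell has already come back as a `datetime` because of this false positive, the original float can in principle be recovered, since Excel stores dates as serial day counts. A minimal sketch, assuming the workbook uses the default 1900 date system, that the bad conversion used a 1899-12-30 epoch (exact for serials from 61 onward), and that `datetime_to_excel_serial` / `excel_serial_to_datetime` are hypothetical helpers, not part of pandas or openpyxl:

```python
from datetime import datetime, timedelta

# Assumed epoch for the 1900 date system. It is exact for serials >= 61;
# earlier serials are shifted by Excel's fictitious 1900-02-29.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def datetime_to_excel_serial(value: datetime) -> float:
    """Hypothetical helper: map a datetime back to its Excel serial number."""
    delta = value - EXCEL_EPOCH_1900
    # Whole days plus the fraction of the day, as Excel stores it.
    return delta.total_seconds() / 86400


def excel_serial_to_datetime(serial: float) -> datetime:
    """Inverse mapping, useful for round-trip checks."""
    return EXCEL_EPOCH_1900 + timedelta(days=serial)


print(datetime_to_excel_serial(datetime(2000, 1, 1)))  # 36526.0
```

This only undoes the symptom for values known to be numeric; it cannot tell a genuine date from a mis-detected one, which is why fixing the loader matters.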

Could this method argument be exposed to the ExcelFile init with a default of True?

Using False will work around the issue, presumably at the cost of performance.
Iteration code would then be needed in pandas ExcelFile._parse_xlsx.
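The proposal amounts to threading one keyword from the constructor down to the loader, plus iteration code that works in either mode. A minimal sketch of that shape, in which `ExcelFileSketch`, `FakeCell`, and `fake_loader` are hypothetical stand-ins (the real code would call openpyxl's load_workbook with use_iterators) and each row is assumed to be a sequence of cell objects exposing `.value`:

```python
from typing import Callable, Iterable, List, Sequence


class FakeCell:
    """Hypothetical stand-in for a spreadsheet cell that exposes `.value`."""

    def __init__(self, value):
        self.value = value


# A loader takes (path, use_iterators) and returns rows of cells.
Loader = Callable[[str, bool], Iterable[Sequence[FakeCell]]]


class ExcelFileSketch:
    """Hypothetical wrapper that exposes the loader flag, defaulting to True."""

    def __init__(self, path: str, loader: Loader, use_iterators: bool = True):
        self.path = path
        self.use_iterators = use_iterators
        self._loader = loader

    def parse(self) -> List[list]:
        # Consume rows the same way whether the loader yields them lazily
        # (iterator mode) or hands back a fully built list (regular mode).
        rows = self._loader(self.path, self.use_iterators)
        return [[cell.value for cell in row] for row in rows]


def fake_loader(path: str, use_iterators: bool):
    """Hypothetical loader used only to exercise both code paths."""
    data = [[FakeCell(1.5), FakeCell("a")], [FakeCell(2.0), FakeCell("b")]]
    return (row for row in data) if use_iterators else data


# Usage: both modes produce the same values.
print(ExcelFileSketch("book.xlsx", fake_loader).parse())  # [[1.5, 'a'], [2.0, 'b']]
print(ExcelFileSketch("book.xlsx", fake_loader, use_iterators=False).parse())
```

Keeping the default at True preserves the existing (faster) behaviour, while users hitting the false positive can opt out per file.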

... as this bug has existed for months in openpyxl without comment, and the code is still described as "very raw" by openpyxl.

Alternatively, if xlrd 0.8.0 is released with xlsx support, pandas could use that instead.

ghost · 2013-03-18T04:58:57Z

If you provide a test case, I'll take a look.
xlrd 0.9.0 is out with py3 support; we need to do some work there anyway.

ruidc · 2013-03-18T08:19:32Z

So will pandas switch to using xlrd in preference to openpyxl?
I'll see if I can prepare a simple test case.

ghost · 2013-03-18T08:24:48Z

Don't know yet. It's actually Python 3 support in xlwt that would be helpful.

ruidc · 2013-03-18T13:50:55Z

dieterv77 · 2013-03-20T02:28:00Z

ghost · 2013-03-27T18:44:12Z

Not sure about making xlrd >= 0.9.0 required in 0.11; punting to 0.12 unless someone disagrees.
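Requiring xlrd >= 0.9.0 comes down to a version gate at import time. A minimal sketch that compares dotted version strings numerically, since a plain string comparison would wrongly rank "0.10.1" below "0.9.0"; `parse_version` and `meets_minimum` are hypothetical helpers, and how the installed version string is read from xlrd is left as an assumption:

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted release string like '0.9.2' into a comparable tuple."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so '0a1' counts as 0.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.9.0") -> bool:
    """True if `installed` is at least `minimum`, compared component-wise."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '0.9' compares equal to '0.9.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("0.8.0"))   # False
print(meets_minimum("0.10.1"))  # True
```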

jtratner · 2013-09-05T00:07:35Z

@y-p @ruidc is this resolved now? I think it's covered by using xlrd, right?

ruidc · 2013-09-05T06:55:20Z

Indeed it is (at least when recent xlrd is used). Thanks.

ghost mentioned this issue Mar 25, 2013

ENH: Use xlrd >=0.9.0 for both xls/xlsx, sidesteps GH1629 #3164

Merged

ruidc closed this as completed Sep 5, 2013

jtratner mentioned this issue Oct 3, 2013

TST: Add skip test to excelwriter contextmanager #5095

Merged
