Skip to content

BUG: Fix to read decimal seconds from Excel. #6934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2014
Merged

BUG: Fix to read decimal seconds from Excel. #6934

merged 1 commit into from
Apr 24, 2014

Conversation

jmcnamara
Copy link
Contributor

Fix to allow decimal seconds to be read from Excel dates and times
into datetime objects. #5945.

This required a fix to the xlrd module to return milliseconds from Excel dates and times. That fix was recently released to PyPI in xlrd version 0.9.3.

Tests, version updates and release note included.

@@ -2,7 +2,7 @@ python-dateutil==2.1
pytz==2013b
openpyxl==1.6.2
xlsxwriter==0.4.6
xlrd==0.9.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this one (so that the test will skip)

@jreback
Copy link
Contributor

jreback commented Apr 23, 2014

this needs to be able to raise on an older version of xlrd (and a test that will correctly skip) on the same.

@jmcnamara
Copy link
Contributor Author

@jreback

Patch updated with your comments, squashed and re-pushed:

  • I made the ci/requirements-3.2.txt change to xlrd version 0.9.2 for testing (although I'd prefer to change that back to 0.9.3 before merging since it is bound to cause confusion because of inconsistency).
  • I made the LooseVersion() changes in pandas/io/excel.py and pandas/io/tests/test_excel.py.
  • I tested with an older version of xlrd. The tests all skip and the module throws an exception:
>>> import pandas as pd
>>>
>>> pd.read_excel('decimal_seconds_1900.xls', 'Sheet1')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/io/excel.py", line 108, in read_excel
    return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
  File "pandas/io/excel.py", line 131, in __init__
    "support, current version " + xlrd.__VERSION__)
ImportError: pandas requires xlrd >= 0.9.3 for excel support, current version 0.9.2

ver = tuple(map(int, xlrd.__VERSION__.split(".")[:2]))
if ver < (0, 9): # pragma: no cover
raise ImportError("pandas requires xlrd >= 0.9.0 for excel "
if LooseVersion(xlrd.__VERSION__) < LooseVersion("0.9.3"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. I think this is too restrictive, its only the this new feature that requires 0.9.3. So instead should require 0.9.0 and raise and Import error as before. If its >= 0.9.0 but less than 0.9.3, just set a module level flag (e.g. something like _XLRD_093 or something to False (and True otherwise); then raise if you are actually parsing an excel file and encounter a timedelta that needs this. Then we don't force users to upgrade unless they need the new capability.

@jreback
Copy link
Contributor

jreback commented Apr 23, 2014

@jmcnamara see my comment above. The ci tests multiple versions of every library so that back compat can be maintained. 0.9.2 should work unless I need to the new capability. Its not realistic to force a user to upgrade xlrd unless they actually need to use the new capabilities (because even though its the current pypi release, what's going to happen is people may not upgrade to it at the same time as they upgrade pandas or vice-versa).

@jreback jreback added this to the 0.14.0 milestone Apr 23, 2014
@jmcnamara
Copy link
Contributor Author

@jrebeck

Ok. Fair enough.

I'll revert most of the changes and just put a conditional in the date conversion to use the millisecond method if available or use the 0.9.2 method if not.

@jmcnamara
Copy link
Contributor Author

I've reworked the PR to handle millisecond times when xlrd >= 0.9.3 is available or to use the previous method if not.

I've added a test case for both conditions and also for the twin Excel epochs for good measure.

else:
value = datetime.datetime(*dt)
# Use the xlrd <= 0.9.2 date handling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this works, just doesn't handle decimal seconds right? is it appropriate to raise if you detect that? (can you even)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. xlrd in previous versions rounded the milliseconds into the seconds. It isn't possible to detect (at least not easily) since all dates and times are stored as floats (with no distinction).

@jreback
Copy link
Contributor

jreback commented Apr 24, 2014

hmm...it is skipping on 3.2 for a different reason (xlwt is not installed). so not running your test. https://travis-ci.org/jmcnamara/pandas/jobs/23636210

so really only testing excel in 2.7...(the locale build is not even running the tests)...

let me add these deps to 2.6

hold a sec

@jreback
Copy link
Contributor

jreback commented Apr 24, 2014

ok...rebase on master...I added the deps to the 2.6 build; so should skip correctly there (the 3.2 build doesn't really matter) because almost all tests that need to do writing are skipping anyhow (because xlwt is not installed)

Fix to allow decimal seconds to be read from Excel dates and times
into datetime objects. #5945.
@jmcnamara
Copy link
Contributor Author

Rebase done so let's see what Travis says.

I tested locally with xlrd 0.9.2 and 0.9.3 and both passed.

jreback added a commit that referenced this pull request Apr 24, 2014
BUG: Fix to read decimal seconds from Excel.
@jreback jreback merged commit a5cab86 into pandas-dev:master Apr 24, 2014
@jreback
Copy link
Contributor

jreback commented Apr 24, 2014

looks good...thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants