Write Excel Round-Trippable Timedeltas Regression #21502

nathanielatom · 2018-06-15T20:46:04Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from io import BytesIO

pd.__version__ # 0.22 or 0.23.1

import openpyxl
openpyxl.__version__ # 2.5.4

import xlsxwriter
xlsxwriter.__version__ # 1.0.5

df = pd.DataFrame({'Times': pd.TimedeltaIndex(['00:01:00', '00:02:00', '00:03:00', np.nan])})
memfile = BytesIO()
df.to_excel(memfile, engine='openpyxl') # 'xlsxwriter' or 'openpyxl'
memfile.seek(0)
reread = pd.read_excel(memfile)
reread.iloc[0] == df.iloc[0]
# raises TypeError (reread has float type) when engine='xlsxwriter' or pd.__version__ == '0.23.1'
# if pd.__version__ == '0.22' and engine='openpyxl' evaluates to False

Problem description

Writing a Dataframe with timedeltas to excel using the openpyxl engine has changed behaviour from pandas version 0.22 to 0.23.1. Previously, the excel file produced could be read in as a Dataframe again with values matching the original Dataframe. If the xlsxwriter engine or pandas 0.23.1 is used, the timedeltas become floats in the re-read Dataframe.

Engine \ Pandas Version	0.22	0.23.1
openpyxl	√	X
xlsxwriter	X	X

Expected Output

One the one hand, it's nice that pandas now has consistent behaviour between engines, however I think the desired behaviour would be for both engines to match the old openpyxl behaviour rather than the old xlsxwriter behaviour, as this would enable Dataframes written to excel to be used again when read back from excel.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.22.0 or 0.23.1
pytest: 3.6.1
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: 0.10.3
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.5
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.8
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

WillAyd · 2018-06-15T20:55:53Z

This would have been a result of #19900

Are you saying you were getting those objects back as Timedelta objects or Datetimes before? I think the main motivator to get rid of this in the first place was that it not only was inconsistent across the engines but would implicitly convert a Timedelta to a Datetime in the round-trip process, neither I would think are desirable

nathanielatom · 2018-06-15T21:48:04Z

It looks like before they were object columns with datetime.time instances. These just got printed the same as timedeltas.

reread.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Times    3 non-null object
dtypes: object(1)
memory usage: 64.0+ bytes

reread.iloc[0].values
Out[8]: array([datetime.time(0, 1)], dtype=object)

Is it possible to make the original timedelta recoverable?

Found this to convert back, but it requires manually tracking the timedelta columns:

pd.TimedeltaIndex(reread['Times'], unit='D').round(freq='s')
Out[36]: TimedeltaIndex(['00:01:00', '00:02:00', '00:03:00', NaT], dtype='timedelta64[ns]', freq=None)

WillAyd · 2018-06-15T22:01:19Z

Yea I think that speaks to the danger that was present with the previous implementation - you could very easily mistake a Timedelta for a Time object and they are not the same.

I'm going to close this under the assumption that a lossless conversion of Timedelta information to/from Excel simply is not possible because Excel does not support Timedeltas natively. If you disagree and can think of a way to do that feel free to reopen - investigation and PRs are always welcome!

To answer your last question, you should be able to construct a Timedelta from a float value directly. For specific usage questions however, you are better asking on Stack Overflow both to get an answer and help others out in the future that may have the same question (we typically reserve this tracker for issues and enhancement requests)

nathanielatom · 2018-06-15T22:07:02Z

Ok, thank you. I didn't know if there was a way to use metadata to track which columns were timedeltas.

WillAyd added IO Excel read_excel, to_excel Timedelta Timedelta data type labels Jun 15, 2018

WillAyd closed this as completed Jun 15, 2018

WillAyd added this to the No action milestone Jun 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Write Excel Round-Trippable Timedeltas Regression #21502

Write Excel Round-Trippable Timedeltas Regression #21502

nathanielatom commented Jun 15, 2018 •

edited

Loading

INSTALLED VERSIONS

WillAyd commented Jun 15, 2018

nathanielatom commented Jun 15, 2018 •

edited

Loading

WillAyd commented Jun 15, 2018

nathanielatom commented Jun 15, 2018

Write Excel Round-Trippable Timedeltas Regression #21502

Write Excel Round-Trippable Timedeltas Regression #21502

Comments

nathanielatom commented Jun 15, 2018 • edited Loading

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

WillAyd commented Jun 15, 2018

nathanielatom commented Jun 15, 2018 • edited Loading

WillAyd commented Jun 15, 2018

nathanielatom commented Jun 15, 2018

nathanielatom commented Jun 15, 2018 •

edited

Loading

Output of `pd.show_versions()`

nathanielatom commented Jun 15, 2018 •

edited

Loading