API: Wrong DateOffset behaviour with DST changes #16980

Closed
tapia opened this issue Jul 16, 2017 · 6 comments
Labels
API Design Docs Frequency DateOffsets Timezones Timezone data dtype

tapia commented Jul 16, 2017

Given a timezone-aware Timestamp:

foo = pd.Timestamp('2016-10-30 00:00:00', tz=pytz.timezone('Europe/Helsinki'))

(Please note that 2016-10-30 is a 25-hour day because of a DST change: on this day the UTC offset changes from +0300 to +0200.)

I'm trying to get the next day. If I understand the DateOffset behaviour correctly, all of these lines should be equivalent:

foo + pd.tseries.frequencies.to_offset('D')
foo + pd.tseries.offsets.Day()
foo + pd.DateOffset()
foo + pd.DateOffset(1)
foo + pd.DateOffset(days=1)

But the first four return the wrong date, presumably because they add 24 hours rather than one calendar day:

Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

And for some reason, the last one returns the correct date:

Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.6.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8
LOCALE: es_ES.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

tapia changed the title from "DateOffset behaviour with DST changes" to "BUG: Wrong DateOffset behaviour with DST changes" on Jul 16, 2017
jreback (Contributor) commented Jul 16, 2017

xref to #8774 and #7825

The first four are equivalent to Day, so what you are really asking is why these two differ:

In [44]: foo + pd.DateOffset(days=1)
Out[44]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')

In [45]: foo + pd.offsets.Day()
Out[45]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

Day is a subclass of Tick which, like Minute and Second, is applied to the UTC time directly (in other words, the timestamp is converted to UTC, the offset is added, and the result is converted back to local time). So D is effectively 24 hours, not a calendar day.

Other offsets, e.g. the month-based ones, instead seek to preserve the DST semantics exactly.

DateOffset(days=1) works like the month-based offsets and respects DST.
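
To make the two semantics concrete, here is a minimal sketch that uses only the Python standard library rather than pandas internals. The helper names add_absolute and add_wall_clock are hypothetical, chosen for illustration, and zoneinfo (Python 3.9+, with system time zone data available) stands in for pytz. It mimics the two behaviours described above and is not how pandas implements them.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def add_absolute(ts, delta):
    """Tick-like addition: convert to UTC, add, convert back to local time."""
    return (ts.astimezone(timezone.utc) + delta).astimezone(ts.tzinfo)


def add_wall_clock(ts, delta):
    """Calendar-like addition: add to the local wall-clock time, then
    re-attach the zone. Ambiguous or nonexistent local times are not
    handled in this sketch."""
    return (ts.replace(tzinfo=None) + delta).replace(tzinfo=ts.tzinfo)


foo = datetime(2016, 10, 30, 0, 0, tzinfo=HELSINKI)
absolute = add_absolute(foo, timedelta(days=1))
wall = add_wall_clock(foo, timedelta(days=1))

print(absolute.isoformat())  # 2016-10-30T23:00:00+02:00 (24 elapsed hours)
print(wall.isoformat())      # 2016-10-31T00:00:00+02:00 (next calendar day)
```

The first result matches what Day() returns in the report above, and the second matches DateOffset(days=1).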

So this is a bit confusing. I don't really recall why it was structured like this. You are welcome to delve into the linked issues and code and see if you can write up a better explanation (which maybe we want to add to the docs).

jreback added the Frequency (DateOffsets), Timezones (Timezone data dtype), and Docs labels on Jul 16, 2017
tapia (Author) commented Jul 16, 2017

The problem is that I need a way to write a string offset ('D', '10m', whatever) that is timezone aware. If I'm going to add a day, I need to add an actual day, not 24 hours. How can I achieve this using to_offset?

I can't use DateOffset to do this, because my code receives the offset string as a parameter, and writing a parser that "overrides" the Pandas parser looks like a very bad idea.

gfyoung (Member) commented Jul 17, 2017

I'm not sure if you can, but at the very least, what you can do is check for "D" in the string parameter and use pd.DateOffset(days=n) as the workaround. Does that work for you?
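
A rough sketch of that workaround, assuming pandas is installed. The function name calendar_aware_offset is hypothetical and not part of pandas. Instead of searching the string for "D", it lets pandas parse the string with to_offset and swaps the result only when it is a plain Day offset, so no custom parser is needed.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def calendar_aware_offset(freq):
    """Parse an offset string with pandas, but turn plain Day offsets into
    DateOffset(days=n) so that adding them keeps the local wall-clock time."""
    offset = to_offset(freq)
    if isinstance(offset, pd.offsets.Day):
        # offset.n is the multiple, e.g. 2 for "2D"
        return pd.DateOffset(days=offset.n)
    return offset


foo = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")
next_day = foo + calendar_aware_offset("D")
print(next_day)
```

Other offset strings pass through unchanged, so sub-daily offsets such as minutes keep their usual elapsed-time behaviour. Strings that combine days with smaller units are not parsed as Day and are therefore left as they are in this sketch.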

tapia (Author) commented Jul 17, 2017

Yes, of course, that's what I ended up doing. But regardless of how I work around the problem, I still think this is a bug in Pandas.

Thank you guys for your help :-)

gfyoung changed the title from "BUG: Wrong DateOffset behaviour with DST changes" to "API: Wrong DateOffset behaviour with DST changes" on Jul 17, 2017
jreback (Contributor) commented Jul 19, 2017

If you want to take a detailed look at the code/tests for this, that would be great. I don't remember exactly why it is the way it is.

jbrockmendel mentioned this issue on Dec 19, 2017
jreback added this to the No action milestone on Apr 9, 2018
jreback (Contributor) commented Apr 9, 2018

consolidated to #20633

jreback closed this as completed on Apr 9, 2018