Timestamp subtraction should work for differing timezones #15249

Closed
filmor opened this issue Jan 27, 2017 · 13 comments
Labels
API Design, Datetime (Datetime data dtype), Timedelta (Timedelta data type), Timezones (Timezone data dtype)

Comments

@filmor
Contributor

filmor commented Jan 27, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
In [2]: pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-691a8df26ecd> in <module>()
----> 1 pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")

pandas\tslib.pyx in pandas.tslib._Timestamp.__sub__ (pandas\tslib.c:23697)()

TypeError: Timestamp subtraction must have the same timezones or no timezones

Problem description

If both timestamps have a timezone specified, the result of this operation is perfectly well-defined. It's quite surprising that I have to riddle my code with lhs.tz_convert("UTC") - rhs.tz_convert("UTC") lines to get the difference of timestamps.
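
For reference, the workaround mentioned above can be wrapped in a small helper. This is only a minimal sketch: `aware_diff` is a hypothetical name, not a pandas function, and it assumes both arguments are tz-aware `pd.Timestamp` objects.

```python
import pandas as pd


def aware_diff(lhs, rhs):
    """Hypothetical helper: difference between two tz-aware Timestamps.

    Both operands are converted to UTC explicitly before subtracting,
    which is the workaround described in this issue.
    """
    if lhs.tz is None or rhs.tz is None:
        raise TypeError("both timestamps must be tz-aware")
    return lhs.tz_convert("UTC") - rhs.tz_convert("UTC")


berlin_midnight = pd.Timestamp("2016-01-01", tz="Europe/Berlin")
utc_midnight = pd.Timestamp("2016-01-01", tz="UTC")

# Midnight in Berlin (UTC+1 in winter) is one hour before midnight UTC.
delta = aware_diff(berlin_midnight, utc_midnight)
print(delta)
```

The explicit conversion works regardless of whether the `-` operator itself accepts mixed timezones.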

Expected Output

Timedelta('-393 days +06:29:07.057926')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.0
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jan 27, 2017

we could certainly make this work

generally though if someone is trying to subtract different time zones it's an error (they meant to convert)

but maybe that's not typical

jreback added the API Design, Timedelta (Timedelta data type), Datetime (Datetime data dtype), and Timezones (Timezone data dtype) labels on Jan 27, 2017
@Vutsuak16
Contributor

Vutsuak16 commented Feb 7, 2017

@jreback should it be implemented for addition too? Adding two timestamps with different timezones raises the same error.
Also, which side of the operator should be given preference here? To subtract, we ultimately need both operands in the same timezone. So should tz=Europe/Berlin be converted to UTC, or should UTC be converted to Europe/Berlin?

@jreback
Contributor

jreback commented Feb 7, 2017

addition of timestamps is not a meaningful operation

you will have to handle both sub and rsub
both are converted to UTC before subtraction
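
To illustrate what handling both sub and rsub with UTC normalization could look like, here is a toy sketch in plain Python. It is not the pandas implementation: `AwareStamp` is a hypothetical wrapper class, and a fixed UTC+1 offset stands in for Europe/Berlin in winter.

```python
from datetime import datetime, timedelta, timezone


class AwareStamp:
    """Toy wrapper (hypothetical, not pandas code) around a tz-aware datetime."""

    def __init__(self, dt):
        if dt.utcoffset() is None:
            raise ValueError("AwareStamp requires a tz-aware datetime")
        self.dt = dt

    @staticmethod
    def _to_utc(value):
        # Accept either an AwareStamp or a tz-aware datetime; else give up.
        dt = value.dt if isinstance(value, AwareStamp) else value
        if not isinstance(dt, datetime) or dt.utcoffset() is None:
            return None
        return dt.astimezone(timezone.utc)

    def __sub__(self, other):
        # Handles: AwareStamp - other
        rhs = self._to_utc(other)
        if rhs is None:
            return NotImplemented
        return self._to_utc(self) - rhs

    def __rsub__(self, other):
        # Handles: other - AwareStamp, tried after other.__sub__ declines.
        lhs = self._to_utc(other)
        if lhs is None:
            return NotImplemented
        return lhs - self._to_utc(self)


cet = timezone(timedelta(hours=1))  # assumed fixed offset for Berlin in winter
stamp = AwareStamp(datetime(2016, 1, 1, tzinfo=cet))
plain = datetime(2016, 1, 1, tzinfo=timezone.utc)

forward = stamp - plain   # goes through __sub__
backward = plain - stamp  # goes through __rsub__
```

Because both operands are normalized to UTC, neither side of the operator is given preference and the result is antisymmetric.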

@jreback
Contributor

jreback commented Apr 3, 2017

I am actually now a bit -0 on this as this is a touch too magical. This should be an explicit operation. In reality it doesn't come up that much, so making this a user-explicit operation should not be much of a burden.

But will leave the issue open for discussion / implementation.

@jreback
Contributor

jreback commented Dec 15, 2017

closing this, we are moving towards things being very explicit.

jreback closed this as completed on Dec 15, 2017
jreback added this to the No action milestone on Dec 15, 2017
@filmor
Contributor Author

filmor commented Dec 15, 2017

I really don't see any magic here. Every non-naive Timestamp specifies a unique point on a common time-axis, completely independent of the timezone being used, so their difference is unambiguously defined as the distance on that time-axis.
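
A quick way to see the "common time-axis" point, sketched with the standard library only (a fixed UTC+1 offset is assumed here in place of a real Berlin timezone):

```python
from datetime import datetime, timedelta, timezone

utc_noon = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
berlin_winter = timezone(timedelta(hours=1))  # assumed fixed offset

# Same instant, different wall-clock representation.
same_instant = utc_noon.astimezone(berlin_winter)

wall_clock_hour = same_instant.hour            # 13, not 12
epoch_seconds_equal = same_instant.timestamp() == utc_noon.timestamp()
distance = same_instant - utc_noon             # zero distance on the time axis
```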

Contrary to what you said before, subtracting timestamps with different timezones /is/ a valid action that can happen in particular in library code, e.g. if I provide a function time_to_some_event(ts), where the event time is given by some backend system's value (i.e. machine-to-machine communication, so usually UTC) while ts comes from user code, so it will probably have a non-UTC timezone attached.
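
A rough sketch of that scenario, under stated assumptions: `EVENT_TIME` is a made-up backend value and the body of `time_to_some_event` is only one possible way to write it with the explicit conversion.

```python
import pandas as pd

# Hypothetical value from a backend system, delivered in UTC.
EVENT_TIME = pd.Timestamp("2017-06-01 12:00", tz="UTC")


def time_to_some_event(ts):
    """Time remaining until EVENT_TIME for a tz-aware user timestamp."""
    if ts.tz is None:
        raise ValueError("ts must be tz-aware")
    # The library has to normalize here because it cannot know the
    # timezone the caller attached to ts.
    return EVENT_TIME - ts.tz_convert("UTC")


# User code passes a local timestamp; Berlin is UTC+2 in June.
user_ts = pd.Timestamp("2017-06-01 12:00", tz="Europe/Berlin")
remaining = time_to_some_event(user_ts)
```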

If you want to be explicit, what you should not allow is subtraction of naive timestamps, as this gives you essentially random values if the naive timestamps don't happen to be UTC.
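
To make the naive case concrete, here is a sketch around the 2016-03-27 DST change in Europe/Berlin. The two UTC offsets are hardcoded assumptions (UTC+1 before the change, UTC+2 after) so the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone

cet = timezone(timedelta(hours=1))   # assumed offset before the DST change
cest = timezone(timedelta(hours=2))  # assumed offset after the DST change

# Wall-clock readings on either side of the skipped hour.
start_naive = datetime(2016, 3, 27, 1, 30)
end_naive = datetime(2016, 3, 27, 3, 30)

# Naive subtraction just compares wall-clock values.
naive_diff = end_naive - start_naive

# Attaching the offsets that were actually in effect gives the elapsed time.
aware_diff = end_naive.replace(tzinfo=cest) - start_naive.replace(tzinfo=cet)
```

The naive result is two hours, while only one hour actually elapsed.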

@jreback
Contributor

jreback commented Dec 15, 2017

we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).

this is complicated by the fact that we generally turn strings into naive timestamps, xref #18435

subtracting in different timezones would be valid, but is just plain confusing; forcing folks to put things in the same timezone to subtract is not very burdensome and is much, much more explicit.

@jorisvandenbossche
Member

> we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).

That is not really true, for comparisons we allow this:

In [13]: pd.Timestamp('2016-01-01', tz='Europe/Brussels') > pd.Timestamp('2017-01-01', tz='UTC')
Out[13]: False

In [14]: pd.DatetimeIndex([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[14]: array([False], dtype=bool)

In [15]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[15]: 
0    False
dtype: bool

In [16]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Series([pd.Timestamp('2017-01-01', tz='UTC')])
Out[16]: 
0    False
dtype: bool

@jreback
Contributor

jreback commented Dec 15, 2017

see my comment in the linked issue
these should all raise (on the list)

@jorisvandenbossche
Member

> these should all raise

I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why break backwards compatibility to start erroring on this?

@filmor
Contributor Author

filmor commented Dec 15, 2017

Also, the actual issue in the linked comment is about tz-naive vs. tz-aware timestamps. Of course neither comparison nor difference makes sense if tz-naive timestamps are involved (even naive vs. naive is dubious, cf. pd.Timestamp("DST-Day 02:30") < pd.Timestamp("DST-Day 02:31")).

@foolcage

They are just times, and should always contain tz info natively, so they can always be compared.

@guyer

guyer commented Nov 30, 2019

I know this is long closed, but if all of your Timestamps are in the same timezone, why are you squandering cycles on time zones at all? They're only relevant if you have different events in different time zones.
