ENH: timezone-aware Period
#45736
Comments
Is there a reason you can't pass freq as a separate argument to ts_right? |
Thanks @jbrockmendel. The In addition, the integration with the (Including where this is done under the hood, e.g. when selecting a single Don't get me wrong, I agree that timestamps denote infinitely short moments in time and have no 'duration' associated with them. But it's a workaround until |
Semantically Periods are spans of time (akin to timedeltas) which are supposed to be agnostic to timezone arithmetic in the first place, so I don't think it makes much sense to add timezone to a period |
Thanks for your reply @mroeschke. Could you please expand on that? I've seen similar comments before, but I never understand the reasoning behind them. The analogy to timestamps is very clear to me:
The natural relation between periods and timestamps is that some timestamps fall inside the period, and others do not. If the timestamp is UTC-offset-agnostic, I can verify whether it falls inside a timezone-agnostic period. Likewise, I should be able to verify whether a timestamp with a UTC-offset falls inside a timezone-aware period. I.e., the timestamp |
Now, that is not to say that I don't see that timezone-awareness in periods is more complex than in timestamps. Here are some of the difficulties:
NB: The latter example is what I'm currently using timezone-aware timestamps with frequency for. I hope I've been able to make my case :) But please share your perspectives and/or try to convince me otherwise. |
One final thought. I don't want to go into possible implementation details here, but I think it's important to realize that a timezone can be expressed in two ways: as an offset from UTC (e.g., '+01:00'), or as tied to a geographic location (e.g., 'Europe/Berlin').
|
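To illustrate the distinction between the two ways of expressing a timezone, here is a minimal sketch using only the Python standard library (not pandas). The variable names are chosen for illustration. It shows that a fixed offset keeps the same UTC offset all year, while a location-based zone changes its offset with DST:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database

fixed = timezone(timedelta(hours=1))   # offset-based: '+01:00'
berlin = ZoneInfo("Europe/Berlin")     # location-based

# A fixed offset is the same in winter and in summer.
winter_fixed = datetime(2022, 1, 1, tzinfo=fixed).utcoffset()
summer_fixed = datetime(2022, 7, 1, tzinfo=fixed).utcoffset()

# A geographic zone follows the local DST rules.
winter_berlin = datetime(2022, 1, 1, tzinfo=berlin).utcoffset()
summer_berlin = datetime(2022, 7, 1, tzinfo=berlin).utcoffset()
```

Only the location-based form carries enough information to say where a local month or day starts and ends throughout the year.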
a period could have time zone aware end points but i don't see a good reason for a time zone aware duration (not to mention a massive increase in complexity) same reason we are killing the freq attribute in a Timestamp which is an instant |
If the end points are timezone-aware, the duration (which I understand to be the |
exactly, so why would you then need a time zone in a Period? or rather, what does it mean at all? what if you have naive endpoints? what if it had time zone aware endpoints? this is such a huge ambiguity that it is unworkable |
Let me try to make it more clear:
Could you please elaborate on where you think the ambiguity comes in? In case it's not clear why such a timezone-aware period might be needed: it is needed whenever the actual duration is relevant in order to aggregate values. I will give some use-cases that I can think of:
Note that this is not much different from the case where months have variable lengths. Just that the variation comes from another source, namely that a timezone ( |
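The point about variable durations can be made concrete with a small sketch, again using only the standard library. The helper name `hours_in_month` is hypothetical. Note that the endpoints are converted to UTC before subtracting, because Python subtracts two datetimes that share the same `tzinfo` as wall-clock times:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def hours_in_month(year, month, tz):
    """Elapsed hours between local midnight on the 1st and the next 1st."""
    start = datetime(year, month, 1, tzinfo=tz)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=tz)
    else:
        end = datetime(year, month + 1, 1, tzinfo=tz)
    # Convert to UTC first so the difference is real elapsed time.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


berlin = ZoneInfo("Europe/Berlin")
january = hours_in_month(2022, 1, berlin)   # 31 days, no DST change
march = hours_in_month(2022, 3, berlin)     # 31 days, one hour lost
october = hours_in_month(2022, 10, berlin)  # 31 days, one hour gained
march_utc = hours_in_month(2022, 3, timezone.utc)
```

Three months with 31 calendar days each end up with three different durations in Europe/Berlin, while the same calendar month in UTC always has 744 hours.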
@rwijtvliet you are missing the point. having start/end points as tz aware is possible; these describe instants of time. but having a tz attribute on a period itself doesn't make sense; it is literally a duration / time delta, which doesn't have a notion of a time zone. geographic location is also not relevant (well, it may be relevant to you but not to a generic concept; and out of scope) |
@jreback can you show me an example of such a period with a tz-aware start/end point? |
These don't exist but I suppose could. These would do almost exactly what you want and not break the idiom |
Do you have thoughts on how they would be constructed or used? I can currently do something like But if there is indeed a way to implement that for |
If the timezone is important for your application, since it sounds like you are really interested in a start date w/ timezone + a duration as a "period", I think you may want to use Interval / IntervalIndex instead |
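As a rough sketch of what the Interval suggestion could look like for a single "period", the following builds one month with timezone-aware endpoints. The variable names are illustrative, and this is only one possible way to use `pd.Interval` here, not an established idiom for replacing `Period`:

```python
import pandas as pd

# One local calendar month as a left-closed interval with tz-aware endpoints.
left = pd.Timestamp("2022-03-01", tz="Europe/Berlin")
right = pd.Timestamp("2022-04-01", tz="Europe/Berlin")
march = pd.Interval(left, right, closed="left")

# Containment works for tz-aware timestamps, also when given in another zone.
inside_local = pd.Timestamp("2022-03-27 03:30", tz="Europe/Berlin") in march
inside_utc = pd.Timestamp("2022-02-28 23:30", tz="UTC") in march  # 00:30 local
outside = pd.Timestamp("2022-02-28 23:30", tz="Europe/Berlin") in march

# The length is the actual elapsed time, including the DST change.
hours = march.length.total_seconds() / 3600
```

What the interval does not carry is the frequency itself: nothing records that this span is "one month" rather than an arbitrary stretch of 743 hours.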
@rwijtvliet in the not-too-distant future we'll have Timestamps that have resolution other than nanosecond, e.g. second or millisecond, and those will continue to have timezone support. That is a bit like what you are suggesting, but won't go all the way up to e.g. MonthStart. For your use case, could you just pin the freq to the Timestamp object under a new name? |
As reference, R's I think it's very natural to want to have Periods that are (in some way) timezone-aware. Thanks @rwijtvliet for making the case. I only don't know if the |
Ok thanks guys for all your replies. My current takeaway is
|
I had a quick look at
>>> i = pd.date_range('2022', freq='MS', periods=5, tz='Europe/Berlin')
>>> idx = pd.IntervalIndex.from_arrays(i, i + i.freq, closed='left')
>>> idx[2].left, idx[2].right # so far, so good
Timestamp('2022-03-01 00:00:00+0100', tz='Europe/Berlin', freq='MS'), Timestamp('2022-04-01 00:00:00+0200', tz='Europe/Berlin') # (the 'freq' attribute will deprecate shortly)
>>> s = pd.Series([1,2,30,400,-500], idx)
Some quick questions for me to judge their usefulness:
def freq(i:pd.Interval):
if pd.Timedelta(hours=23) <= i.length <= pd.Timedelta(hours=25): # got to consider variable day length due to DST-changeover
return pd.offsets.Day()
elif pd.Timedelta(days=27, hours=23) <= i.length <= pd.Timedelta(days=31, hours=1): # got to consider variable number of days and their variable length due to DST-changeover
if i.left.day == 1 and i.left == i.left.floor('D'):
return pd.offsets.MonthBegin()
    # ... and many other checks for various edge cases and particulars of time intervals based on human calendars.
If I'm honest: even if I get this to work, and I find a way to use |
Can you clarify what you mean here? There is no DatetimeInterval. |
Ah, @jbrockmendel, I meant Here's a concrete example of one of the difficulties I see with the "add
import pandas as pd
# Example: running costs of a machine that is always on.
idx = pd.date_range("2022", freq="MS", periods=12, tz="Europe/Berlin")
costs = pd.DataFrame({"USD_per_h": 1.0}, idx) # fixed running costs for sake of example
# The total costs can be calculated with this function.
def costs_USD(ts, costs_USD_per_h):
timedelta = (ts + ts.freq) - ts
hours = timedelta.total_seconds() / 3600
return costs_USD_per_h * hours
# (A) As long as I'm working with entire series/dataframes, this works, as the `freq` attribute of the index is available, also in the future:
costs["USD"] = costs_USD(costs.index, costs.USD_per_h)
# Note this is correct, with odd number in March and Oct which have DST-change
# costs
# USD_per_h USD
# 2022-01-01 00:00:00+01:00 1.0 744.0
# 2022-02-01 00:00:00+01:00 1.0 672.0
# 2022-03-01 00:00:00+01:00 1.0 743.0
# 2022-04-01 00:00:00+02:00 1.0 720.0
# 2022-05-01 00:00:00+02:00 1.0 744.0
# 2022-06-01 00:00:00+02:00 1.0 720.0
# 2022-07-01 00:00:00+02:00 1.0 744.0
# 2022-08-01 00:00:00+02:00 1.0 744.0
# 2022-09-01 00:00:00+02:00 1.0 720.0
# 2022-10-01 00:00:00+02:00 1.0 745.0
# 2022-11-01 00:00:00+01:00 1.0 720.0
# 2022-12-01 00:00:00+01:00 1.0 744.0
# (B) But, I might deal with a single timestamp and single value; and a 'lone timestamp' will be passed to the function:
costs_in_january = costs_USD(costs.index[0], costs.USD_per_h[0]) # FutureWarning
# (C) Or, I (for whatever reason) might want/have to apply a function row-wise; here too a 'lone timestamp' is passed:
costs['USD'] = costs.apply(lambda row: costs_USD(row.name, row.USD_per_h), axis=1)
Surely, there are workarounds once the My point concerning your suggestion in particular is that, yes, I could add
def costs_USD2(ts, costs_USD_per_h):
if hasattr(ts, "old_freq"): # or catch AttributeError
freq = ts.old_freq
else:
freq = ts.freq
timedelta = (ts + freq) - ts
hours = timedelta.total_seconds() / 3600
    return costs_USD_per_h * hours
And then (A) would still work, and for (B) I could do some prep-work before calling:
ts = costs.index[0]
ts.old_freq = costs.index.freq
costs_in_january = costs_USD2(ts, costs.USD_per_h[0]) # No FutureWarning
However, (C) would not work anymore, and in general, I'd need to remember to add the |
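For comparison, the suggestion made at the top of the thread (passing the frequency as a separate argument) could be sketched as follows. The function name `costs_usd_explicit` is hypothetical, and the sketch assumes the frequency is always available from the index at the call site, which is exactly the bookkeeping the comment above is concerned about:

```python
import pandas as pd


def costs_usd_explicit(ts, freq, usd_per_h):
    """Hypothetical variant of costs_USD that receives the frequency explicitly."""
    delta = (ts + freq) - ts               # actual elapsed time of the period
    hours = delta.total_seconds() / 3600
    return usd_per_h * hours


idx = pd.date_range("2022", freq="MS", periods=12, tz="Europe/Berlin")
costs = pd.DataFrame({"USD_per_h": 1.0}, index=idx)
freq = costs.index.freq                    # kept by the caller, not by the timestamp

# (A) whole index at once
all_months = costs_usd_explicit(costs.index, freq, costs["USD_per_h"])

# (B) a lone timestamp and a single value
january = costs_usd_explicit(costs.index[0], freq, costs["USD_per_h"].iloc[0])

# (C) row-wise application
row_wise = costs.apply(
    lambda row: costs_usd_explicit(row.name, freq, row["USD_per_h"]), axis=1
)
```

All three cases work without reading `freq` from an individual timestamp, at the price of threading `freq` through every call.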
@rwijtvliet i appreciate the discussion, but freq is being removed partially because of the added complexity. i don't know what to tell you except try interval index |
Yes, I understand and appreciate that, @jreback. Having a However, there is no alternative available that is better suited.
I guess what surprises me the most is that, from your comments, I'm apparently the only person that needs this functionality. But if that's how it is, and you also see no options for me, I'll have to get creative and add workarounds. I just wanted to make sure that that is really how it is, before I start - because I really doubt it. Thanks for your time and input; if you have any questions about my use case or if I can help clarify things, feel free to contact me directly or in this thread. |
If pinning an old_freq attr isn't viable, can you make the timezones unnecessary? Could tz_convert to UTC, then do everything with Periods. |
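A small sketch of what the UTC route does to local month boundaries, assuming monthly values aligned to local midnight in Europe/Berlin as in the example earlier in the thread. The timezone is dropped explicitly before calling `to_period`, since periods are timezone-naive:

```python
import pandas as pd

# Start of March in local time.
local_start = pd.Timestamp("2022-03-01 00:00", tz="Europe/Berlin")

# The same instant expressed in UTC falls on the last day of February.
as_utc = local_start.tz_convert("UTC")

# Periods are naive, so strip the timezone before converting.
utc_period = as_utc.tz_localize(None).to_period("M")
local_period = local_start.tz_localize(None).to_period("M")
```

Here `utc_period` is February 2022 while `local_period` is March 2022, so a month-level Period derived from UTC values no longer lines up with the local calendar month.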
Sorry didn't mean to close |
(damn I shouldn't be doing this on my phone 😬)
Thanks @jbrockmendel for your continued constructive input. I thought about that as well, but I see many problems there, too. E.g.
|
If I were interested in trying to write up this functionality for a timezone-aware I'm thinking about wrappers around the The It's probably a lot more involved than I'm naively thinking, and it'll be a while before I could actually start on it. But that gives us some time to discuss if it's possible and worth doing - I would appreciate some transparent opinions and open discussion on this. |
Surely possible, but daunting. Going back to the old_freq idea, what if you just patched DatetimeIndex's
|
That's an interesting suggestion that I hadn't thought about. This does indeed work:
import pandas as pd
def apply_wrapper(Index):
if hasattr(Index, '_iswrapped'):
return # Avoid wrapping multiple times
getitem_original = Index.__getitem__
def getitem_wrapped(self, item):
ts = getitem_original(self, item)
if not hasattr(ts, 'freq'):
ts.freq = self.freq
return ts
Index.__getitem__ = getitem_wrapped
# ToDo: add similar thing for `__iter__`
Index._iswrapped = True
apply_wrapper(pd.DatetimeIndex)
I still want to make that |
If we ever decide to implement a timezone-aware period, I think the way to go is to inherit from Relying on |
Is your feature request related to a problem?
A Timestamp with a freq attribute can be used as a replacement/workaround for a Period, if time-zone awareness is needed. Below is an example.
With the deprecation of the .freq attribute for individual timestamps, this workaround will stop working in a future release.
Describe the solution you'd like
Implementation of timezone-awareness in the Period class.
API breaking implications
Period and PeriodIndex initializers take an additional tz parameter.
Describe alternatives you've considered
The alternative is what I'm currently using (Timestamp.freq) but will be deprecated.
Additional context
Current implementation of this functionality: