BUG: fix origin epoch when freq is Day and harmonize epoch between timezones #34474

hasB4K · 2020-05-30T00:44:46Z

Follow-up of #31809.

The purpose of this PR is to fix the current behavior on master:

import pandas as pd
import numpy as np

start, end = "2000-08-02 23:30:00+0500", "2000-12-02 00:30:00+0500"
rng = pd.date_range(start, end, freq="7min")
ts = pd.Series(np.random.randn(len(rng)), index=rng)

result_1 = ts.resample("D", origin="epoch").count()
result_2 = ts.resample("24H", origin="epoch").count()

print(f"result_1:\n\n{result_1}\n")
print(f"result_2:\n\n{result_2}\n")

Outputs on master:

result_1:

2000-08-02 00:00:00+05:00      5
2000-08-03 00:00:00+05:00    205
2000-08-04 00:00:00+05:00    206
2000-08-05 00:00:00+05:00    206
2000-08-06 00:00:00+05:00    206
                            ... 
2000-11-28 00:00:00+05:00    206
2000-11-29 00:00:00+05:00    206
2000-11-30 00:00:00+05:00    205
2000-12-01 00:00:00+05:00    206
2000-12-02 00:00:00+05:00      5
Freq: D, Length: 123, dtype: int64

result_2:

2000-08-02 05:00:00+05:00     48
2000-08-03 05:00:00+05:00    205
2000-08-04 05:00:00+05:00    206
2000-08-05 05:00:00+05:00    206
2000-08-06 05:00:00+05:00    205
                            ... 
2000-11-27 05:00:00+05:00    206
2000-11-28 05:00:00+05:00    206
2000-11-29 05:00:00+05:00    206
2000-11-30 05:00:00+05:00    205
2000-12-01 05:00:00+05:00    168
Freq: 24H, Length: 122, dtype: int64

Expected Outputs (with this PR):

result_1:

2000-08-02 00:00:00+05:00      5
2000-08-03 00:00:00+05:00    205
2000-08-04 00:00:00+05:00    206
2000-08-05 00:00:00+05:00    206
2000-08-06 00:00:00+05:00    206
                            ... 
2000-11-28 00:00:00+05:00    206
2000-11-29 00:00:00+05:00    206
2000-11-30 00:00:00+05:00    205
2000-12-01 00:00:00+05:00    206
2000-12-02 00:00:00+05:00      5
Freq: D, Length: 123, dtype: int64

result_2:

2000-08-02 00:00:00+05:00      5
2000-08-03 00:00:00+05:00    205
2000-08-04 00:00:00+05:00    206
2000-08-05 00:00:00+05:00    206
2000-08-06 00:00:00+05:00    206
                            ... 
2000-11-28 00:00:00+05:00    206
2000-11-29 00:00:00+05:00    206
2000-11-30 00:00:00+05:00    205
2000-12-01 00:00:00+05:00    206
2000-12-02 00:00:00+05:00      5
Freq: 24H, Length: 123, dtype: int64

I thought of two possible solutions while fixing that:

Consider 'epoch' as a UNIX epoch: pd.Timestamp(0, tz=index_tz)
Consider that 'epoch' is just an helper to fix the origin: pd.Timestamp("1970-01-01", tz=index_tz)

To have similar results between timezones, I thought the solution 2 was a better choice. The solution 2 could also help us to set the behavior origin=epoch as the default one in a future version since the results would be quite similar with origin=start_day (that is not quite the case with a DST timezone, I can provide further details if needed on why this is the case).

(The solution 1 is correct in theory but is giving results that could be hard to explain to the end user.)

tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry (follow-up PR, not necessary I think)

jreback

lgtm @mroeschke if you’d have a quick look

pandas/core/resample.py

…mezones

hasB4K · 2020-06-01T10:45:34Z

It should be good now @jreback 😉. I had an issue with the CI, but #34509 has been merged since.

jreback · 2020-06-01T13:08:33Z

pandas/core/resample.py

    if isinstance(freq, Tick):
+        index_tz = first.tz


shouldn't this actually be close to where you set the origin (e.g. in the TimeGrouper)?

I don't have access to the variables first or end to infer the index timezone in TimeGrouper.__init__ so I don't think I can do what you suggest. 😕

We used to infer the index_tz exactly on the same place before #31809 as you can see here on the tag v.1.0.4:

pandas/pandas/core/resample.py

Line 1650 in 1ce1c3c

tz = first.tz

In my opinion that make sense since the API of groupby (shared with the resample API) can accept any kind of grouper, so the bins (that needs the information of first and end) can only be calculated when we apply the grouper against a Series or a DataFrame.

import pandas as pd import numpy as np start, end = "2000-08-02 23:30:00+0500", "2000-12-02 00:30:00+0500" rng = pd.date_range(start, end, freq="7min") ts_1 = pd.Series(np.random.randn(len(rng)), index=rng) ts_2 = pd.Series(np.random.randn(len(rng)), index=rng) # => The example line that create this impossibility: grouper = pd.Grouper(freq="D", origin="epoch") result_1 = ts_1.groupby(grouper).mean() result_2 = ts_2.groupby(grouper).mean()

jreback · 2020-06-01T22:18:06Z

thanks @hasB4K very nice

hasB4K added API - Consistency Internal Consistency of API/Behavior Resample resample method labels May 30, 2020

hasB4K requested review from WillAyd and jreback May 30, 2020 00:45

hasB4K force-pushed the bugfix_origin_epoch branch from 9de9dcd to 9480311 Compare May 30, 2020 00:47

jreback approved these changes Jun 1, 2020

View reviewed changes

jreback added this to the 1.1 milestone Jun 1, 2020

mroeschke reviewed Jun 1, 2020

View reviewed changes

pandas/core/resample.py Outdated Show resolved Hide resolved

hasB4K force-pushed the bugfix_origin_epoch branch from 9480311 to 0635651 Compare June 1, 2020 06:49

hasB4K requested a review from mroeschke June 1, 2020 07:00

BUG: fix origin epoch when freq is Day and harmonize epoch between ti…

2771381

…mezones

hasB4K force-pushed the bugfix_origin_epoch branch from 0635651 to c42e788 Compare June 1, 2020 08:39

CLN: small refactor in resample.py

c78e29b

hasB4K force-pushed the bugfix_origin_epoch branch from c42e788 to c78e29b Compare June 1, 2020 11:52

jreback reviewed Jun 1, 2020

View reviewed changes

mroeschke approved these changes Jun 1, 2020

View reviewed changes

jreback merged commit 3970153 into pandas-dev:master Jun 1, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: fix origin epoch when freq is Day and harmonize epoch between timezones #34474

BUG: fix origin epoch when freq is Day and harmonize epoch between timezones #34474

Uh oh!

hasB4K commented May 30, 2020 •

edited

Loading

Uh oh!

jreback left a comment

Uh oh!

Uh oh!

hasB4K commented Jun 1, 2020

Uh oh!

jreback Jun 1, 2020

Uh oh!

hasB4K Jun 1, 2020 •

edited

Loading

Uh oh!

jreback commented Jun 1, 2020

Uh oh!

Uh oh!

Uh oh!

BUG: fix origin epoch when freq is Day and harmonize epoch between timezones #34474

BUG: fix origin epoch when freq is Day and harmonize epoch between timezones #34474

Uh oh!

Conversation

hasB4K commented May 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jreback left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hasB4K commented Jun 1, 2020

Uh oh!

jreback Jun 1, 2020

Choose a reason for hiding this comment

Uh oh!

hasB4K Jun 1, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Jun 1, 2020

Uh oh!

Uh oh!

hasB4K commented May 30, 2020 •

edited

Loading

hasB4K Jun 1, 2020 •

edited

Loading