Skip to content

BUG - anchoring dates for resample with Day(n>1) #24159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 13, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1516,6 +1516,7 @@ Groupby/Resample/Rolling

- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` with ``as_index=False`` leading to the loss of timezone information (:issue:`15884`)
- Bug in :meth:`DatetimeIndex.resample` when downsampling across a DST boundary (:issue:`8531`)
- Bug in date anchoring for :meth:`DatetimeIndex.resample` with offset :class:`Day` when n > 1 (:issue:`24127`)
- Bug where ``ValueError`` is wrongly raised when calling :func:`~pandas.core.groupby.SeriesGroupBy.count` method of a
``SeriesGroupBy`` when the grouping variable only contains NaNs and numpy version < 1.13 (:issue:`21956`).
- Multiple bugs in :func:`pandas.core.Rolling.min` with ``closed='left'`` and a
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -1588,7 +1588,7 @@ def _get_range_edges(first, last, offset, closed='left', base=0):
day_nanos = delta_to_nanoseconds(timedelta(1))

# #1165
if (is_day and day_nanos % offset.nanos == 0) or not is_day:
if (is_day and not offset.nanos % day_nanos) or not is_day:
return _adjust_dates_anchored(first, last, offset,
closed=closed, base=base)

Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/groupby/test_timegrouper.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,8 @@ def test_groupby_with_timegrouper(self):

expected = DataFrame(
{'Quantity': 0},
index=date_range('20130901 13:00:00',
'20131205 13:00:00', freq='5D',
index=date_range('20130901',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this test have to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior for Day(n>1) is inconsistent with all other offsets and their multiples, in that it is not getting anchored on the offset when being resampled. This was due to the reversal of the modulo op (which was valid for n=1, but not otherwise).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the result/expected before and the result/expected after this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> s = pd.Series(range(50), index=pd.date_range('11/11/2000 06:00:00', periods=50, freq='H'))
>>> s.resample('2D').mean()

Before:

# Result - note that the freq is not set either 
>>> s.resample('2D').mean()
2000-11-11 06:00:00    48
2000-11-13 06:00:00     2
dtype: int64

# Expected - freq should be set to 2D and index should be anchored to 0hour of the day
2000-11-11    42
2000-11-13     8
Freq: 2D, dtype: int64

After: both match expected from above

The correct analogous behavior with hourly resample can be seen with the current master code

>>> s = pd.Series(range(200), index=pd.date_range('11/11/2000 06:23', periods=200, freq='Min'))
>>> s.resample('2H').mean()
2000-11-11 06:00:00     48
2000-11-11 08:00:00    148
Freq: 2H, dtype: int64

Notice that the daterange didn't start on the offset, but the result was anchored to the hour start

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks

'20131205', freq='5D',
name='Date', closed='left'))
expected.iloc[[0, 6, 18], 0] = np.array([24, 6, 9], dtype='int64')

Expand Down
21 changes: 21 additions & 0 deletions pandas/tests/resample/test_datetime_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -1463,3 +1463,24 @@ def f(data, add_arg):
result = df.groupby("A").resample("D").agg(f, multiplier)
expected = df.groupby("A").resample('D').mean().multiply(multiplier)
assert_frame_equal(result, expected)

@pytest.mark.parametrize('n1, freq1, n2, freq2', [
(60, 'Min', 1, 'H'),
(1440, 'Min', 1, 'D'),
(24, 'H', 1, 'D'),
(60, 'S', 1, 'Min'),
(3600, 'S', 1, 'H'),
(86400, 'S', 1, 'D')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a frequency where the n=0.5? e.g. 0.5D and 12 H

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

])
def test_resample_equivalent_offsets(self, n1, freq1, n2, freq2):
for i in range(1, 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can parametrize on this too

n1_ = n1 * i
n2_ = n2 * i
s = pd.Series(0, index=pd.date_range('19910905 13:00',
'19911005 07:00',
freq=freq1))
s = s + range(len(s))

result1 = s.resample(str(n1_) + freq1).mean()
result2 = s.resample(str(n2_) + freq2).mean()
assert_series_equal(result1, result2)