Skip to content

BUG - anchoring dates for resample with Day(n>1) #24159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 13, 2018

Conversation

ms7463
Copy link
Contributor

@ms7463 ms7463 commented Dec 8, 2018

@pep8speaks
Copy link

pep8speaks commented Dec 8, 2018

Hello @ArtinSarraf! Thanks for updating the PR.

Comment last updated on December 08, 2018 at 22:17 Hours UTC

@codecov
Copy link

codecov bot commented Dec 8, 2018

Codecov Report

Merging #24159 into master will decrease coverage by 49.17%.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24159       +/-   ##
===========================================
- Coverage    92.2%   43.02%   -49.18%     
===========================================
  Files         162      162               
  Lines       51700    51700               
===========================================
- Hits        47670    22245    -25425     
- Misses       4030    29455    +25425
Flag Coverage Δ
#multiple ?
#single 43.02% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/resample.py 23.36% <0%> (-73.63%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-98.62%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
... and 120 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c911151...be07f5b. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 8, 2018

Codecov Report

Merging #24159 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24159      +/-   ##
==========================================
+ Coverage    92.2%   92.21%   +0.01%     
==========================================
  Files         162      162              
  Lines       51700    51763      +63     
==========================================
+ Hits        47669    47733      +64     
+ Misses       4031     4030       -1
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 43.01% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.81% <100%> (-0.18%) ⬇️
pandas/core/arrays/categorical.py 95.3% <0%> (-0.1%) ⬇️
pandas/util/testing.py 87.41% <0%> (-0.1%) ⬇️
pandas/core/frame.py 96.91% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.27% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.9% <0%> (ø) ⬆️
pandas/core/arrays/datetimes.py 98.56% <0%> (ø) ⬆️
pandas/core/generic.py 96.65% <0%> (ø) ⬆️
pandas/io/pytables.py 92.31% <0%> (ø) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0b9eea...37252a0. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

i think there are a number of open issues related to this, see if you can find them and add these as test cases.

(86400, 'S', 1, 'D')
])
def test_resample_equivalent_offsets(self, n1, freq1, n2, freq2):
for i in range(1, 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can parametrize on this too

@jreback jreback requested a review from mroeschke December 9, 2018 14:26
@jreback jreback added Bug Frequency DateOffsets Resample resample method labels Dec 9, 2018
@ms7463
Copy link
Contributor Author

ms7463 commented Dec 9, 2018

@jreback - I don't see any open issues that seem to be directly relevant to this issue. I see a few resampling ones, but they have more to do with the behavior of the closed parameter. This issue is specific to resampling with pd.offsets.Day(n>1). If you can point me to any issues I can take a look and add test cases for it.

@@ -43,8 +43,8 @@ def test_groupby_with_timegrouper(self):

expected = DataFrame(
{'Quantity': 0},
index=date_range('20130901 13:00:00',
'20131205 13:00:00', freq='5D',
index=date_range('20130901',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this test have to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior for Day(n>1) is inconsistent with all other offsets and their multiples, in that it is not getting anchored on the offset when being resampled. This was due to the reversal of the modulo op (which was valid for n=1, but not otherwise).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the result/expected before and the result/expected after this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> s = pd.Series(range(50), index=pd.date_range('11/11/2000 06:00:00', periods=50, freq='H'))
>>> s.resample('2D').mean()

Before:

# Result - note that the freq is not set either 
>>> s.resample('2D').mean()
2000-11-11 06:00:00    48
2000-11-13 06:00:00     2
dtype: int64

# Expected - freq should be set to 2D and index should be anchored to 0hour of the day
2000-11-11    42
2000-11-13     8
Freq: 2D, dtype: int64

After: both match expected from above

The correct analogous behavior with hourly resample can be seen with the current master code

>>> s = pd.Series(range(200), index=pd.date_range('11/11/2000 06:23', periods=200, freq='Min'))
>>> s.resample('2H').mean()
2000-11-11 06:00:00     48
2000-11-11 08:00:00    148
Freq: 2H, dtype: int64

Notice that the daterange didn't start on the offset, but the result was anchored to the hour start

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks

(24, 'H', 1, 'D'),
(60, 'S', 1, 'Min'),
(3600, 'S', 1, 'H'),
(86400, 'S', 1, 'D')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a frequency where the n=0.5? e.g. 0.5D and 12 H

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Off to you @jreback

@jreback jreback added this to the 0.24.0 milestone Dec 13, 2018
@jreback jreback merged commit b7ee829 into pandas-dev:master Dec 13, 2018
@jreback
Copy link
Contributor

jreback commented Dec 13, 2018

thanks @ArtinSarraf

I have a feeling this might solve some open resample issues, if you'd have a look thru the tracker (resample is a label)

if you find any pls send a PR with tests!

@ms7463
Copy link
Contributor Author

ms7463 commented Dec 13, 2018

@jreback - I just looked through all the open issues labeled with "resample" I didn't see any other ones that I think this would have solved, since this is very specific to daily frequencies.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.DataFrame.resample behaves inconsistently when the rule is set to 24H/1D and 48H/2D
5 participants