-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG - anchoring dates for resample with Day(n>1) #24159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @ArtinSarraf! Thanks for updating the PR.
Comment last updated on December 08, 2018 at 22:17 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #24159 +/- ##
===========================================
- Coverage 92.2% 43.02% -49.18%
===========================================
Files 162 162
Lines 51700 51700
===========================================
- Hits 47670 22245 -25425
- Misses 4030 29455 +25425
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24159 +/- ##
==========================================
+ Coverage 92.2% 92.21% +0.01%
==========================================
Files 162 162
Lines 51700 51763 +63
==========================================
+ Hits 47669 47733 +64
+ Misses 4031 4030 -1
Continue to review full report at Codecov.
|
i think there are a number of open issues related to this, see if you can find them and add these as test cases. |
(86400, 'S', 1, 'D') | ||
]) | ||
def test_resample_equivalent_offsets(self, n1, freq1, n2, freq2): | ||
for i in range(1, 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can parametrize on this too
@jreback - I don't see any open issues that seem to be directly relevant to this issue. I see a few resampling ones, but they have more to do with the behavior of the |
@@ -43,8 +43,8 @@ def test_groupby_with_timegrouper(self): | |||
|
|||
expected = DataFrame( | |||
{'Quantity': 0}, | |||
index=date_range('20130901 13:00:00', | |||
'20131205 13:00:00', freq='5D', | |||
index=date_range('20130901', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this test have to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior for Day(n>1) is inconsistent with all other offsets and their multiples, in that it is not getting anchored on the offset when being resampled. This was due to the reversal of the modulo op (which was valid for n=1, but not otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the result/expected before and the result/expected after this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> s = pd.Series(range(50), index=pd.date_range('11/11/2000 06:00:00', periods=50, freq='H'))
>>> s.resample('2D').mean()
Before:
# Result - note that the freq is not set either
>>> s.resample('2D').mean()
2000-11-11 06:00:00 48
2000-11-13 06:00:00 2
dtype: int64
# Expected - freq should be set to 2D and index should be anchored to 0hour of the day
2000-11-11 42
2000-11-13 8
Freq: 2D, dtype: int64
After: both match expected from above
The correct analogous behavior with hourly resample can be seen with the current master code
>>> s = pd.Series(range(200), index=pd.date_range('11/11/2000 06:23', periods=200, freq='Min'))
>>> s.resample('2H').mean()
2000-11-11 06:00:00 48
2000-11-11 08:00:00 148
Freq: 2H, dtype: int64
Notice that the daterange didn't start on the offset, but the result was anchored to the hour start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks
(24, 'H', 1, 'D'), | ||
(60, 'S', 1, 'Min'), | ||
(3600, 'S', 1, 'H'), | ||
(86400, 'S', 1, 'D') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a frequency where the n=0.5? e.g. 0.5D and 12 H
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Off to you @jreback
thanks @ArtinSarraf I have a feeling this might solve some open resample issues, if you'd have a look thru the tracker (resample is a label) if you find any pls send a PR with tests! |
@jreback - I just looked through all the open issues labeled with "resample" I didn't see any other ones that I think this would have solved, since this is very specific to daily frequencies. |
git diff upstream/master -u -- "*.py" | flake8 --diff