-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: resampling with NaT in TimedeltaIndex (#13223) #14649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
303541e
b3ae4c7
fb7dc7d
6c17f67
25dcff5
55eccd9
fed1827
e15de4d
251826f
61fa8be
e0647ba
edd2939
23889d3
7b84eb6
dd368eb
d0a281f
2340fb8
1c106c8
4121c75
5441d39
29d81f3
3f91d5a
2eb6d38
d92a759
37fe2c4
f000a4e
cd67704
211ecd5
24a2155
524a9a0
0b07b07
04e1168
7ae4fd1
07ac39e
ee406d2
d652485
5f0b69a
ca6d88b
ed2a2e4
0b77680
c198e28
f5b7bcb
0159dc2
a00ad37
84bbeae
f4a03d9
5067708
09360d8
dc32350
0bf4532
c61b350
7740231
e097bf5
1123982
fdee922
38a34be
a347ecb
c52ff68
648ae4f
9da0e0b
8daf677
c9d4e0b
11c9479
d32acaa
b508a04
3d69988
54e71a7
1a75f49
2229c26
a1d3ff3
ae0a92a
27b0ba7
df6783f
a4bba28
0cfc950
5667a3a
470c327
a703abc
2203808
4ce9c0c
a37c610
94c6c0c
67d529a
1be66ac
15e8e9a
5dee1f1
026e748
aa53e4f
5eac08a
e0b37f9
a212738
7c5ebd5
3510956
56b5a30
7d04391
97c065e
03dca96
32df1e6
998c801
05d70f4
7d34d4d
c7c74ad
2621b31
2cad4dd
76e5185
6821291
e7956c4
37e5f78
fe15466
3cac2d5
acb9d01
61f6f63
d313808
087c2f1
b69c877
3ba68a7
de17fd9
a73e451
2c3f808
d968260
f0533e4
0d9d27c
0ad8976
ad3d886
a1b118c
043efa6
5e96fb0
a9c8239
ee19222
492b8f7
be2dad1
6a52c15
fe8420a
6333412
59b88ab
5f4a5b4
9ab57dc
bd24926
b1e29db
8bde21a
771e36c
2b45e44
bff47f2
f2e942e
92239f5
8c80b6b
783ae69
1e753d7
aa9d0cf
19c8032
163d18e
1c9d46a
32dd929
a20009f
bc1235e
2a3b05a
fb7af6e
79581ff
1a266ee
94720d9
7fa7752
1bcb671
56ccad8
39a46ff
9d3554c
5d28f26
59f977f
8c8dd88
80f30b4
83e24ca
7a42240
c577c19
156bfd2
22f9d0d
d2f32a0
1058988
c80bd19
18ac0b7
0caf685
1793637
da92411
7e3dd90
aff78d9
056c0a6
80280ec
686e9e0
71f621f
7e43c78
1dab800
a940605
66fb0a3
d96ff29
6f789e1
ec84ae3
34c6bd0
2e64614
bd169dc
f3e3cfe
abf1697
ecaeea1
0ab0813
de589c2
046d3be
3d6c5a8
48749ce
1e0fbd2
9c98e13
1f89060
b6d405d
a108651
1b53d88
e7201ca
57c7c87
a57e681
d1e1ba0
74f527f
a293d22
67cc021
cd24fa9
f49f905
7059d89
4cb730e
b199fbf
ed07df1
456e729
ca7207f
cd51bdd
ff652a5
eedcc8f
da0523a
faf6401
e50d397
e0b60c0
ca8ef49
0a37067
dbc1654
e4e87ec
ba30e3a
1fbdc23
b070d51
763197c
a0b089e
c112252
4502e82
0cfc08c
078dbdf
3c54c4b
ec144f3
30c749c
5372175
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,7 +7,8 @@ | |
from pandas.core.base import AbstractMethodError, GroupByMixin | ||
|
||
from pandas.core.groupby import (BinGrouper, Grouper, _GroupBy, GroupBy, | ||
SeriesGroupBy, groupby, PanelGroupBy) | ||
SeriesGroupBy, groupby, PanelGroupBy, | ||
DataError) | ||
|
||
from pandas.tseries.frequencies import to_offset, is_subperiod, is_superperiod | ||
from pandas.tseries.index import DatetimeIndex, date_range | ||
|
@@ -1219,13 +1220,18 @@ def _get_time_delta_bins(self, ax): | |
raise TypeError('axis must be a TimedeltaIndex, but got ' | ||
'an instance of %r' % type(ax).__name__) | ||
|
||
if len(ax) > 0 and all(ax._isnan): | ||
raise DataError('all-nan groupings not valid') | ||
|
||
if not len(ax): | ||
binner = labels = TimedeltaIndex( | ||
data=[], freq=self.freq, name=ax.name) | ||
return binner, [], labels | ||
|
||
start = ax[0] | ||
end = ax[-1] | ||
# Addresses GH #13223 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a comment on what you are doing here (and why not just selecting start/end) |
||
start = ax.min() | ||
end = ax.max() | ||
|
||
labels = binner = TimedeltaIndex(start=start, | ||
end=end, | ||
freq=self.freq, | ||
|
@@ -1234,6 +1240,13 @@ def _get_time_delta_bins(self, ax): | |
end_stamps = labels + 1 | ||
bins = ax.searchsorted(end_stamps, side='left') | ||
|
||
if ax.hasnans: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. though we actually need to ignore the nans, doesn't this actually create a NaT group? this is inconcistent with grouping. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NaT group will be ignored in aggregation. It is handled the same way as in function There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a comment here on what you are doing |
||
binner = binner.insert(0, tslib.NaT) | ||
labels = labels.insert(0, tslib.NaT) | ||
|
||
n_NaT = ax._isnan.sum() | ||
bins = np.insert(bins, 0, n_NaT) | ||
|
||
# Addresses GH #10530 | ||
if self.base > 0: | ||
labels += type(self.freq)(self.base) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1025,6 +1025,20 @@ def test_resample_timedelta_idempotency(self): | |
expected = series | ||
assert_series_equal(result, expected) | ||
|
||
def test_resample_timedelta_missing_values(self): | ||
# GH 13223 | ||
index = pd.to_timedelta(['0s', pd.NaT, '2s']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. test with an all NaT index There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do |
||
series = pd.Series([2, 3, 5], index=index) | ||
result = series.resample('1s').mean() | ||
expected = pd.Series([2, np.nan, 5], index=pd.timedelta_range( | ||
start='0s', end='2s', freq='1s')) | ||
assert_series_equal(result, expected) | ||
|
||
# all NaT | ||
index = pd.to_timedelta([pd.NaT, pd.NaT, pd.NaT]) | ||
series = pd.Series([2, 3, 5], index=index) | ||
self.assertRaises(DataError, series.resample('1s').mean) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is probably the right idea (e.g. raising a nice message), e.g. we do this for all-nan groups (which is unfriendly)
|
||
|
||
def test_resample_rounding(self): | ||
# GH 8371 | ||
# odd results when rounding is needed | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would give a better error message here. Is this consistent with how we handle all-nan grouping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All-nan grouping doesn't seem to be handled elsewhere. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we currently ignore all nan groups, that's why I think it is there, so this is consistent, maybe a comment is worth it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is all data is nan. Hmm, I would just give a better error message then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
ax.hasnans