When downsampling a Series indexed by a PeriodIndex, resampling with how='mean' fails if the series does not span more than one lower-frequency bin.
For example:
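```python
import numpy as np
from pandas import Series, period_range

ix = period_range(start="2012-01-01", end="2012-12-31", freq="M")
s = Series(np.random.randn(len(ix)), index=ix)
s.resample("A", how='mean')  # pandas 0.9/0.10-era `how=` argument
```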
This fails because the period range is entirely contained within a single year. I've been able to reproduce it going from quarterly to annual, monthly to quarterly, and so on. As of 0.9.1-dev, it crashes Python without raising an exception, because the Cython function group_mean_bin() attempts to index into an empty bins array:

```cython
if bins[len(bins) - 1] == len(values):  # Crash
```
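To make the failure mode concrete, here is a minimal pure-Python sketch of a binned mean. The function name `group_mean_bin_sketch` is hypothetical, and the bin convention it uses (each entry of `bins` is the exclusive end offset of a bin, with one final implicit bin unless `bins[-1] == len(values)`) is an assumption inferred from the crashing line above, not taken from the pandas source. Under that convention, data that falls entirely into one bin produces an empty `bins` array, so the last-element check needs a length guard.

```python
import numpy as np


def group_mean_bin_sketch(values, bins):
    """Mean of `values` over consecutive bins (hypothetical helper).

    Assumed convention: `bins` holds exclusive end offsets, and a final
    implicit bin runs to len(values) unless bins[-1] == len(values).
    """
    values = np.asarray(values, dtype=float)
    bins = np.asarray(bins, dtype=np.int64)

    # Guard: an empty `bins` array means every value is in a single bin,
    # so there is no last element to compare against len(values).
    if len(bins) > 0 and bins[-1] == len(values):
        ngroups = len(bins)
    else:
        ngroups = len(bins) + 1

    out = np.full(ngroups, np.nan)
    start = 0
    for i in range(ngroups):
        end = bins[i] if i < len(bins) else len(values)
        chunk = values[start:end]
        # Ignore NaNs; a bin with no valid values stays NaN.
        chunk = chunk[~np.isnan(chunk)]
        if len(chunk):
            out[i] = chunk.mean()
        start = end
    return out
```

With `bins=[]` the sketch returns a single bin averaging all values instead of reading past the end of the array, which is the single-year case from the report.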
I don't know how fine-grained pandas currently is when aggregating partially filled periods, but it would be nice to have an option to return NaN when a lower-frequency period (e.g. a month) is only partially covered by the higher-frequency data. For example, suppose we sum daily data to monthly and take the percent change across months, and either recording started partway through the first month or data is only available partway through the last month. The percent change for the first or last period may then show a dramatic swing, and the user may not realize it's simply an artifact of data availability rather than a genuine move in the underlying process. When running a lot of automated aggregations, the user may prefer not to aggregate partially filled periods at all, to avoid drawing false conclusions about the trend at the beginning or end of the series.
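One way to approximate the requested behaviour with today's pandas API is sketched below; it is not an existing pandas option. The helper `sum_full_months` is hypothetical and assumes a `DatetimeIndex` with at most one observation per calendar day. It sums to month-start bins and keeps a monthly total only when every day of that month has a non-NaN value.

```python
import numpy as np
import pandas as pd


def sum_full_months(daily: pd.Series) -> pd.Series:
    """Monthly sums, NaN for partially covered months (hypothetical helper).

    Assumes `daily` has a DatetimeIndex with at most one value per day.
    """
    monthly = daily.resample("MS")  # bins labelled by month start
    sums = monthly.sum()
    counts = monthly.count()  # non-NaN observations per month
    # Calendar days in each month bin.
    expected = np.asarray(sums.index.days_in_month)
    # Keep a sum only where every day of the month was observed.
    return sums.where(counts.to_numpy() == expected)


idx = pd.date_range("2012-01-15", "2012-03-10", freq="D")
s = pd.Series(1.0, index=idx)
full = sum_full_months(s)
```

Here January and March are only partly covered, so only February keeps its total; a percent change computed on `full` then has no artificial swing at either end.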
It appears this issue can be resolved by uncommenting the Cython decorators at the top of the function group_mean_bin in src/groupby.pyx:

```cython
@cython.boundscheck(False)
```

This was the only function where I saw these decorators commented out; I don't know whether that was an oversight or left over from some earlier testing.
When I install the pre-built binaries for Python 2.7 on Windows, I don't run into quite the same error: Python doesn't crash, but the call to resample() still returns NaN. I built the dev version with MinGW and afterwards noticed unexpected failures elsewhere, so the compilation may not have been clean. I will also try building with Visual C++. In any case, the original problem of taking the mean when all the data falls within one period is still outstanding.
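For comparison, the result the report expects (one annual bin holding the mean of all twelve months) can be sketched without `resample` by grouping each monthly period under its annual period via `PeriodIndex.asfreq`. This is a workaround sketch, not the fix applied in pandas itself, and it uses the `"Y"` annual alias (older pandas versions spell it `"A"`).

```python
import numpy as np
import pandas as pd

# Same shape of data as the report: twelve monthly periods inside 2012.
ix = pd.period_range(start="2012-01-01", end="2012-12-31", freq="M")
s = pd.Series(np.random.default_rng(0).standard_normal(len(ix)), index=ix)

# Map each monthly period to its annual period, then average within each year.
annual = s.groupby(s.index.asfreq("Y")).mean()
```

Because every month maps to the same annual period, `annual` has a single entry equal to `s.mean()`.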
Version 0.10.1 (tag 'v0.10.1') includes the commit "BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070".