QST: FutureWarning: Resampling with a PeriodIndex is deprecated, how to resample now? #57033
Comments
I just hit this too.

import pandas as pd
s = pd.Series(1, index=pd.period_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.resample("1h").ffill()

This would create a series including ALL hours of 2024-01-02. If, instead, we first convert the index with to_timestamp():

import pandas as pd
s = pd.Series(1, index=pd.period_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.index = s.index.to_timestamp()
s.resample("1h").ffill()

The output will only contain 1 hour of 2024-01-02. The same happens when the series is built on a DatetimeIndex directly:

import pandas as pd
s = pd.Series(1, index=pd.date_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.resample("1h").ffill()

Will there be no way to obtain the old behaviour of |
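One possible workaround for the example above, offered as a sketch rather than an officially recommended replacement: build the hourly grid explicitly so that it runs to the end of the last period, then forward-fill onto it. The names `hourly_index` and `hourly` are illustrative only.

```python
import pandas as pd

# Daily periods, as in the example above.
s = pd.Series(1, index=pd.period_range("2024-01-01", "2024-01-02", freq="D"))

# Hourly grid from the start of the first period to the last full hour
# of the last period, so the final day is covered completely.
start = s.index[0].start_time
end = s.index[-1].end_time.floor("h")
hourly_index = pd.date_range(start, end, freq="h")

# Convert to timestamps (period starts), then forward-fill onto the grid.
hourly = s.to_timestamp(how="start").reindex(hourly_index, method="ffill")

print(hourly.index[0], hourly.index[-1], len(hourly))
```

The result has a DatetimeIndex; if a PeriodIndex is needed afterwards, `hourly.to_period("h")` should convert it back.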
I agree. Now you'll need to do the reindexing manually, while with PeriodIndex this was a one-liner.

import pandas as pd
# some sample data
data = {2023: 1, 2024: 2}
df = pd.DataFrame(list(data.values()), index=pd.PeriodIndex(data.keys(), freq="Y"))
# Old style resampling, just a one-liner
old_style_resampling = df.resample("M").ffill()
print(old_style_resampling)
print(type(old_style_resampling.iloc[0][0]))
# Convert index to DatetimeIndex
df.index = pd.to_datetime(df.index.start_time)
last_date = df.index[-1] + pd.offsets.YearEnd()
df_extended = df.reindex(
    df.index.union(pd.date_range(start=df.index[-1], end=last_date, freq="D"))
).ffill()
new_style_resampling = df_extended.resample("ME").ffill()
print(new_style_resampling)
print(type(new_style_resampling.iloc[0][0]))

I also opt for keeping the PeriodIndex resampling. |
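For the yearly-to-monthly case above, a shorter route that stays on a PeriodIndex and avoids `resample` altogether might look like the following. This is only a sketch under the assumption that every target month should simply inherit the value of its containing year; the column name `value` and the variable names are made up for illustration.

```python
import pandas as pd

# Same sample data as above, with a named column for readability.
df = pd.DataFrame({"value": [1, 2]}, index=pd.PeriodIndex(["2023", "2024"], freq="Y"))

# All months from the first month of the first year
# to the last month of the last year.
monthly = pd.period_range(
    df.index[0].asfreq("M", how="start"),
    df.index[-1].asfreq("M", how="end"),
    freq="M",
)

# Map every month back to its year and look up the yearly row,
# then relabel the rows with the monthly periods.
upsampled = df.reindex(monthly.asfreq("Y"))
upsampled.index = monthly

print(upsampled)
```

Because no missing values are introduced, the integer dtype of the column is kept, and the index remains a PeriodIndex with monthly frequency.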
Related to my comments in #56588, I think that this is another example where Period is being deprecated too fast without a clear replacement in mind. |
Are all the relevant cases about upsampling and never downsampling? A big part of the motivation for deprecating was that PeriodIndexResampler._downsample is deeply broken in a way that didn't seem worth fixing. Potentially we could just deprecate downsampling and not upsampling? |
The upsampling example is just the one where it's most obvious what will be missing once PeriodIndex resampling no longer works. If downsampling stopped working too, I would have to convert the index, downsample, and convert the index back again. That does not sound very compelling.

Period index resampling (up and down) is very convenient when one has to combine different data sources in days, months, quarters and years. I can't remember a project where I did not use period resampling. The convenience was always an argument for using pandas instead of other libraries like polars, where one has to handle all the conversions oneself. From my point of view the PeriodIndex was always one of the great things about pandas.

I have very limited experience with pandas internals, so I don't understand how downsampling can be so deeply broken that it's not worth fixing when "just" converting to a datetime index would fix it. Can't the datetime indexing be used internally to fix it? |
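The "convert, downsample, convert back" round trip mentioned above could be sketched like this. The monthly-to-quarterly sum is an arbitrary example chosen for illustration, not something taken from the thread.

```python
import pandas as pd

# Monthly data on a PeriodIndex.
monthly = pd.Series(range(1, 13), index=pd.period_range("2024-01", periods=12, freq="M"))

quarterly = (
    monthly.to_timestamp(how="start")  # PeriodIndex -> DatetimeIndex (period starts)
    .resample("QS")                    # quarter-start bins on the DatetimeIndex
    .sum()
    .to_period("Q")                    # back to a quarterly PeriodIndex
)

print(quarterly)
```

This avoids resampling on a PeriodIndex, at the cost of two explicit conversions.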
I agree with @andreas-wolf. |
Any news about this issue? The deprecation was due to #53481, correct @jbrockmendel?
I'm naively changing the lines "return self.asfreq()" to "return super()._downsample(how, **kwargs)", delegating the responsibility to DatetimeIndexResampler._downsample. |
I don't have the bandwidth to give you a thorough answer. What I can tell you is that there are no plans to enforce this deprecation in 3.0. |
Is there an option to not deprecate resample with PeriodIndex? I think it's possible to fix the example in #53481 doing what I wrote above:

if is_subperiod(ax.freq, self.freq):
    # Downsampling
    return self._groupby_and_aggregate(how, **kwargs)
elif is_superperiod(ax.freq, self.freq):
    if how == "ohlc":
        # GH #13083
        # upsampling to subperiods is handled as an asfreq, which works
        # for pure aggregating/reducing methods
        # OHLC reduces along the time dimension, but creates multiple
        # values for each period -> handle by _groupby_and_aggregate()
        return self._groupby_and_aggregate(how)
    return super()._downsample(how, **kwargs)  # fixed here, it was: return self.asfreq()
elif ax.freq == self.freq:
    return super()._downsample(how, **kwargs)  # fixed here, it was: return self.asfreq()

raise IncompatibleFrequency(
    f"Frequency {ax.freq} cannot be resampled to {self.freq}, "
    "as they are not sub or super periods"
)

About #58021 (comment):
if isinstance(obj.index, PeriodIndex):
    obj.index = PeriodIndex(obj.index, freq=self.freq)
else:
    obj.index = obj.index._with_freq(self.freq)

def test_monthly_convention_span(self):
    rng = period_range("2000-01", periods=3, freq="M")
    ts = Series(np.arange(3), index=rng)
    # hacky way to get same thing
    exp_index = period_range("2000-01-01", "2000-03-31", freq="D")
    expected = ts.asfreq("D", how="start").reindex(exp_index)
    expected = expected.ffill()
    result = ts.resample("D").ffill()
    tm.assert_series_equal(result, expected)

we have

AssertionError: Attributes of Series are different
Attribute "dtype" are different
[left]: int32
[right]: float64

which I think is expected (after reindex we have many NaN, changing the type to float64 and this continues after ffill). And, using the fix that I'm suggesting, we will have:

FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[mean] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[sem] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[median] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[var] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[std] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[ohlc] - AttributeError: 'DataFrame' object has no attribute 'dtype'. Did you mean: 'dtypes'?
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[quantile] - AssertionError: Attributes of Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[count] - AssertionError: Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[size] - AssertionError: Series are different
FAILED test_period_index.py::TestPeriodIndex::test_resample_same_freq[nunique] - AssertionError: Series are different

The test

def test_resample_same_freq(self, resample_method):
    # GH12770
    series = Series(range(3), index=period_range(start="2000", periods=3, freq="M"))
    expected = series
    result = getattr(series.resample("M"), resample_method)()
    tm.assert_series_equal(result, expected)

is expecting the wrong behavior pointed out by #53481. |
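The dtype change described for test_monthly_convention_span can be reproduced in isolation. The sketch below mirrors the "expected" side of that test without calling `resample`; it relabels the index with `PeriodIndex.asfreq` instead of `Series.asfreq`, which is an assumption made here to keep the example independent of the resampling code path. The final `astype` step is one possible way to get the integer dtype back, not something proposed in the thread.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(3), index=pd.period_range("2000-01", periods=3, freq="M"))

# Relabel each month by its first day.
daily_start = ts.copy()
daily_start.index = ts.index.asfreq("D", how="start")

# Reindexing onto every day of the span introduces NaN for all days
# that are not the first of a month, which forces a float dtype.
exp_index = pd.period_range("2000-01-01", "2000-03-31", freq="D")
reindexed = daily_start.reindex(exp_index)

# Forward-filling removes the NaN but does not restore the integer dtype.
filled = reindexed.ffill()

# Casting back explicitly is safe here because no NaN remain.
restored = filled.astype(ts.dtype)

print(ts.dtype, reindexed.dtype, filled.dtype, restored.dtype)
```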
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/77862775/pandas-2-2-futurewarning-resampling-with-a-periodindex-is-deprecated
Question about pandas
Pandas version 2.2 raises a warning when using this code:
This does not work:
I have PeriodIndex all over the place and I need to resample them a lot, filling gaps with ffill.
How to do this with Pandas 2.2?