Commit fe17818

hkhojasteh, jorisvandenbossche, and mroeschke authored
BUG: .rolling() returns incorrect values when ts index is not nano seconds (pandas-dev#55173)
* Fix rolling microseconds for sum
* - adding a test for rounding sum
* Update pandas/tests/window/test_rolling.py
  Co-authored-by: Joris Van den Bossche <[email protected]>
* Update test_rolling.py
  the df variable name fixed in the test
* Reverted version variable in doc/source/conf.py
* Units generalised to us/ms/s and test parameterised
* Rolling max tests added, related to pandas-dev#55026
* whatsnew note for 2.1.2 added
* Update doc/source/whatsnew/v2.1.2.rst
  Co-authored-by: Matthew Roeschke <[email protected]>
* UTC timezone for _index_array of rolling
* Validating tz-aware data
  Data conversion removed
  Tests merged
* Conversion replaced by Timedelta.as_unit
* fixes for failing tests
* update whatsnew
* type checking

---------

Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
1 parent 074ab2f commit fe17818

3 files changed: +42 -1 lines changed

doc/source/whatsnew/v2.1.2.rst

+1
@@ -14,6 +14,7 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
+- Fixed regression in :meth:`~DataFrame.rolling` where non-nanosecond index or ``on`` column would produce incorrect results (:issue:`55026`, :issue:`55106`, :issue:`55299`)
 - Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
 - Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
 - Fixed regression in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` where if the option ``compute.use_numba`` was set to True, groupby methods not supported by the numba engine would raise a ``TypeError`` (:issue:`55520`)

pandas/core/window/rolling.py

+9 -1
@@ -21,6 +21,7 @@

 from pandas._libs.tslibs import (
     BaseOffset,
+    Timedelta,
     to_offset,
 )
 import pandas._libs.window.aggregations as window_aggregations
@@ -112,6 +113,8 @@
     from pandas.core.generic import NDFrame
     from pandas.core.groupby.ops import BaseGrouper

+from pandas.core.arrays.datetimelike import dtype_to_unit
+

 class BaseWindow(SelectionMixin):
     """Provides utilities for performing windowing operations."""
@@ -1887,7 +1890,12 @@ def _validate(self):
                     self._on.freq.nanos / self._on.freq.n
                 )
             else:
-                self._win_freq_i8 = freq.nanos
+                try:
+                    unit = dtype_to_unit(self._on.dtype)  # type: ignore[arg-type]
+                except TypeError:
+                    # if not a datetime dtype, eg for empty dataframes
+                    unit = "ns"
+                self._win_freq_i8 = Timedelta(freq.nanos).as_unit(unit)._value

         # min_periods must be an integer
         if self.min_periods is None:
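
The change above expresses the window length in the same unit as the index instead of always using nanoseconds. The following pure-Python model is a rough sketch of why that matters. It is not how pandas computes rolling windows: `window_in_unit` and `rolling_sum` are hypothetical helpers, and the model assumes a right-closed window `(t - window, t]`.

```python
# Nanoseconds per integer tick for each supported resolution.
NANOS_PER_TICK = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def window_in_unit(window_nanos: int, unit: str) -> int:
    """Hypothetical helper: express a nanosecond window as ticks of `unit`."""
    return window_nanos // NANOS_PER_TICK[unit]


def rolling_sum(ticks, values, window_ticks):
    """Hypothetical helper: sum values whose tick lies in (t - window, t]."""
    out = []
    for t in ticks:
        out.append(
            sum(v for s, v in zip(ticks, values) if t - window_ticks < s <= t)
        )
    return out


values = [0, 1, 2, 3, 4]
# One sample per second, stored as integer microseconds.
ticks_us = [i * 1_000_000 for i in range(5)]
one_second_ns = 1_000_000_000

# Mismatch: a nanosecond count is compared against microsecond ticks,
# so the "1 second" window effectively spans 1000 seconds.
mismatched = rolling_sum(ticks_us, values, one_second_ns)

# Matched: the window is first converted to microsecond ticks.
matched = rolling_sum(ticks_us, values, window_in_unit(one_second_ns, "us"))

print(mismatched)  # [0, 1, 3, 6, 10]
print(matched)     # [0, 1, 2, 3, 4]
```

In this model the mismatched window swallows every earlier row and degenerates into a cumulative sum, while the converted window contains only the current row.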

pandas/tests/window/test_rolling.py

+32
@@ -1950,3 +1950,35 @@ def test_numeric_only_corr_cov_series(kernel, use_arg, numeric_only, dtype):
     op2 = getattr(rolling2, kernel)
     expected = op2(*arg2, numeric_only=numeric_only)
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("unit", ["s", "ms", "us", "ns"])
+@pytest.mark.parametrize("tz", [None, "UTC", "Europe/Prague"])
+def test_rolling_timedelta_window_non_nanoseconds(unit, tz):
+    # Test Sum, GH#55106
+    df_time = DataFrame(
+        {"A": range(5)}, index=date_range("2013-01-01", freq="1s", periods=5, tz=tz)
+    )
+    sum_in_nanosecs = df_time.rolling("1s").sum()
+    # microseconds / milliseconds should not break the correct rolling
+    df_time.index = df_time.index.as_unit(unit)
+    sum_in_microsecs = df_time.rolling("1s").sum()
+    sum_in_microsecs.index = sum_in_microsecs.index.as_unit("ns")
+    tm.assert_frame_equal(sum_in_nanosecs, sum_in_microsecs)
+
+    # Test max, GH#55026
+    ref_dates = date_range("2023-01-01", "2023-01-10", unit="ns", tz=tz)
+    ref_series = Series(0, index=ref_dates)
+    ref_series.iloc[0] = 1
+    ref_max_series = ref_series.rolling(Timedelta(days=4)).max()
+
+    dates = date_range("2023-01-01", "2023-01-10", unit=unit, tz=tz)
+    series = Series(0, index=dates)
+    series.iloc[0] = 1
+    max_series = series.rolling(Timedelta(days=4)).max()
+
+    ref_df = DataFrame(ref_max_series)
+    df = DataFrame(max_series)
+    df.index = df.index.as_unit("ns")
+
+    tm.assert_frame_equal(ref_df, df)
