BUG fixing module first() and DateOffset #52487

Closed
wants to merge 17 commits into from
Changes from 12 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.1.rst
@@ -33,6 +33,7 @@ Bug fixes
- Bug in :meth:`ArrowDtype.__from_arrow__` not respecting if dtype is explicitly given (:issue:`52533`)
- Bug in :meth:`DataFrame.max` and related casting different :class:`Timestamp` resolutions always to nanoseconds (:issue:`52524`)
- Bug in :meth:`Series.describe` not returning :class:`ArrowDtype` with ``pyarrow.float64`` type with numeric data (:issue:`52427`)
- Fixed bug in :func:`first` when used with a :class:`DateOffset` (:issue:`45908`)
Member

there is no `pd.first`. Maybe :meth:`DataFrame.first`?

Member Author

Change done

- Fixed bug in :func:`merge` when merging with ``ArrowDtype`` on one side and a NumPy dtype on the other side (:issue:`52406`)
- Fixed segfault in :meth:`Series.to_numpy` with ``null[pyarrow]`` dtype (:issue:`52443`)

9 changes: 9 additions & 0 deletions pandas/core/generic.py
@@ -41,6 +41,7 @@
    Period,
    Tick,
    Timestamp,
    offsets,
    to_offset,
)
from pandas._typing import (
@@ -9070,7 +9071,15 @@ def first(self, offset) -> Self:
        if len(self.index) == 0:
            return self.copy(deep=False)

        if isinstance(offset, offsets.DateOffset):
Member

doesn't the `offset = to_offset(offset)` below make this check unnecessary?

Member

the issue is that doing this check after `to_offset` would make it impossible to distinguish a `DateOffset` input from something like `MonthEnd`, as that would also satisfy `isinstance(offset, DateOffset)`

Member

they need to be handled differently because something like `DateOffset(days=15)` would always satisfy the condition `not isinstance(offset, Tick) and offset.is_on_offset(self.index[0])`, and so the end date would be shifted backward by `offset.base` (this is why, in the linked issues, the result only has a single row for `DateOffset` input)
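
The point made in this thread can be checked directly. The following is a minimal sketch, assuming a pandas 2.x install; the timestamp is an arbitrary illustration value, not taken from the PR. It suggests that a plain `DateOffset` reports a timestamp as on-offset, so subtracting `offset.base` and then adding the offset lands back on the starting timestamp, and that a `MonthEnd` instance also passes the `isinstance` check against `DateOffset`.

```python
import pandas as pd

# Arbitrary example timestamp (an assumption for illustration only).
start = pd.Timestamp("2018-04-09")
offset = pd.DateOffset(days=15)

# A plain DateOffset is not a Tick, and with normalize=False it reports
# this timestamp as on-offset.
not_tick = not isinstance(offset, pd.offsets.Tick)
on_offset = offset.is_on_offset(start)

# The pre-existing branch computes start - offset.base + offset.
# For DateOffset(days=15), base is the same 15-day offset, so the two
# shifts cancel and the slice end collapses to the first index value.
end = start - offset.base + offset

# A MonthEnd instance also satisfies isinstance(..., DateOffset).
month_end_is_date_offset = isinstance(pd.offsets.MonthEnd(), pd.DateOffset)

print(not_tick, on_offset, end == start, month_end_is_date_offset)
```

With the slice end equal to the first index value, a label slice up to that end keeps only the first row, which is consistent with the single-row result described above.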

            end_date = end = self.index[0] + offset
            if end_date in self.index:
                end = self.index.searchsorted(end_date, side="left")
                return self.iloc[:end]
            return self.loc[:end]
Member

a lot of this looks really similar to what we have below. can any of this be de-duplicated?

Member Author

Done
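
For readers following the `DateOffset` branch above, here is a pandas-free sketch of the slicing rule it appears to implement, assuming a sorted index. When `start + offset` is itself an index value it is excluded (the `searchsorted(..., side="left")` path); otherwise every value up to that point is kept (the `.loc[:end]` path). Both cases reduce to "keep values strictly before the end date". The helper name `first_by_offset` is hypothetical, and `datetime.timedelta` stands in for a day-based `DateOffset`; neither is a pandas internal.

```python
from bisect import bisect_left
from datetime import date, timedelta


def first_by_offset(index, delta):
    """Return the leading values of a sorted `index` that fall before
    index[0] + delta. Hypothetical helper for illustration only."""
    if not index:
        return []
    end_date = index[0] + delta
    # bisect_left gives the position of the first value >= end_date, so an
    # exact match is excluded and every strictly earlier value is kept.
    return index[: bisect_left(index, end_date)]


# Daily index starting 2010-03-31, and an every-other-day index starting
# 2018-04-09, mirroring the shapes used in the PR's tests.
daily = [date(2010, 3, 31) + timedelta(days=i) for i in range(100)]
every_other = [date(2018, 4, 9) + timedelta(days=2 * i) for i in range(30)]

two_days = first_by_offset(daily, timedelta(days=2))
fifteen_days = first_by_offset(every_other, timedelta(days=15))
print(len(two_days), len(fifteen_days))
```

Because the two paths collapse to a single `bisect_left` position in this sketch, it also hints at why the branch is a candidate for de-duplication.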


        offset = to_offset(offset)

        if not isinstance(offset, Tick) and offset.is_on_offset(self.index[0]):
            # GH#29623 if first value is end of period, remove offset with n = 1
            #  before adding the real offset
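
The GH#29623 comment above describes an adjustment for anchored offsets. A small sketch, assuming pandas 2.x and using `MonthEnd` purely as an illustrative anchored offset: when the first index value already sits on the offset, adding the offset directly jumps a full period ahead, whereas subtracting `offset.base` first keeps the end inside the period the data starts in.

```python
import pandas as pd

start = pd.Timestamp("2010-03-31")  # already a month end
offset = pd.offsets.MonthEnd(1)

on_offset = offset.is_on_offset(start)

# Adding the offset directly rolls forward to the next month end.
naive_end = start + offset

# Removing one base period first, then adding the offset, returns to the
# month end that the data starts on.
adjusted_end = start - offset.base + offset

print(on_offset, naive_end.date(), adjusted_end.date())
```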
48 changes: 48 additions & 0 deletions pandas/tests/frame/methods/test_first_and_last.py
@@ -1,6 +1,7 @@
"""
Note: includes tests for `last`
"""
import numpy as np
import pytest

import pandas as pd
@@ -95,3 +96,50 @@ def test_empty_not_input(self, func):
        result = getattr(df, func)(offset=1)
        tm.assert_frame_equal(df, result)
        assert df is not result

    @pytest.mark.parametrize("start, periods", [("2010-03-31", 1), ("2010-03-30", 2)])
    def test_last_day_of_months_with_date_offset(self, frame_or_series, start, periods):
        x = frame_or_series([1] * 100, index=pd.date_range(start, periods=100))
        result = x.first(pd.DateOffset(days=periods))
        expected = frame_or_series(
            [1] * periods, index=pd.date_range(start, periods=periods)
        )
        tm.assert_equal(result, expected)

    def test_date_offset_multiple_days(self, frame_or_series):
        x = frame_or_series([1] * 100, index=pd.date_range("2010-03-31", periods=100))
        result = x.first(pd.DateOffset(days=2))
        expected = frame_or_series(
            [1] * 2, index=pd.date_range("2010-03-31", "2010-04-01")
        )
        tm.assert_equal(result, expected)

    def test_first_with_date_offset(self):
        # GH#51284
        i = pd.to_datetime(["2018-04-09", "2018-04-10", "2018-04-11", "2018-04-12"])
        x = DataFrame({"A": [1, 2, 3, 4]}, index=i)
        result = x.first(pd.DateOffset(days=2))
        expected = DataFrame(
            {"A": [1, 2]}, index=pd.to_datetime(["2018-04-09", "2018-04-10"])
        )
        tm.assert_equal(result, expected)

    def test_date_offset_15_days(self):
        # GH#45908
        i = pd.date_range("2018-04-09", periods=30, freq="2D")
        x = DataFrame({"A": np.arange(30)}, index=i)
        result = x.first(pd.DateOffset(days=15))
        i2 = pd.date_range("2018-04-09", periods=8, freq="2D")
        expected = DataFrame({"A": np.arange(8)}, index=i2)
        tm.assert_equal(result, expected)

    def test_first_with_date_offset_months(self, frame_or_series):
        periods = 40
        x = frame_or_series(
            [1] * periods, index=pd.date_range("2010-03-31", periods=periods)
        )
        result = x.first(pd.DateOffset(months=1))
        expected = frame_or_series(
            [1] * 30, index=pd.date_range("2010-03-31", periods=30)
        )
        tm.assert_equal(result, expected)