BUG: first("1M") returning two months when first day is last day of month #38331

Merged
merged 6 commits into from
Dec 12, 2020
Changes from 4 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -590,6 +590,7 @@ Datetimelike
 - Bug in :meth:`Series.isin` with ``datetime64[ns]`` dtype and :meth:`.DatetimeIndex.isin` failing to consider timezone-aware and timezone-naive datetimes as always different (:issue:`35728`)
 - Bug in :meth:`Series.isin` with ``PeriodDtype`` dtype and :meth:`PeriodIndex.isin` failing to consider arguments with different ``PeriodDtype`` as always different (:issue:`37528`)
 - Bug in :class:`Period` constructor now correctly handles nanoseconds in the ``value`` argument (:issue:`34621` and :issue:`17053`)
+- Bug in :meth:`DataFrame.first` and :meth:`Series.first` returning two months for offset one month when first day is last calendar day (:issue:`29623`)

 Timedelta
 ^^^^^^^^^
5 changes: 4 additions & 1 deletion pandas/core/generic.py
@@ -8426,7 +8426,10 @@ def first(self: FrameOrSeries, offset) -> FrameOrSeries:
             return self

         offset = to_offset(offset)
-        end_date = end = self.index[0] + offset
+        if offset._day_opt == "end" and offset.is_on_offset(self.index[0]):
Member

Could we use rollforward?

Member Author

Instead of the if-else? Then no. This would work for offsets like 1M, but not for something like 10 Days. That is also what led me to put offset._day_opt == "end" in. But we could use it like

        if offset._day_opt == "end":
            end_date = end = offset.rollforward(self.index[0])
        else:
            end_date = end = self.index[0] + offset

if this would be preferable.
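
To illustrate the distinction drawn in this comment, here is a minimal sketch. It assumes `MonthEnd(1)` as a stand-in for the `1M` offset and `Day(10)` for "10 Days"; the timestamps are illustrative and borrowed from the dates used in this PR's test. `rollforward` only moves a timestamp that is not already on the offset, so it gives the desired end date for month-end offsets but leaves the timestamp unchanged for a day-based offset.

```python
import pandas as pd
from pandas.tseries.offsets import Day, MonthEnd

last_day = pd.Timestamp("2010-03-31")   # already a month end
day_before = pd.Timestamp("2010-03-30")  # not a month end

month_end = MonthEnd(1)

# Adding the offset to a month end jumps to the NEXT month end,
# which is the source of the "two months" result.
added_on_offset = last_day + month_end              # 2010-04-30
# rollforward leaves a timestamp that is already on the offset untouched.
rolled_on_offset = month_end.rollforward(last_day)  # 2010-03-31

# Off the offset, both approaches agree.
added_off_offset = day_before + month_end               # 2010-03-31
rolled_off_offset = month_end.rollforward(day_before)   # 2010-03-31

# For a day-based offset every timestamp counts as "on offset",
# so rollforward returns its input and no window is spanned.
ten_days = Day(10)
rolled_days = ten_days.rollforward(day_before)  # unchanged
added_days = day_before + ten_days              # 2010-04-09
```

This is why `rollforward` alone cannot replace the if-else: it would need to be restricted to end-anchored offsets, with plain addition kept for everything else.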

Member

this is an OK solution, but ideally there should be a way to do this with only public methods.

> but not for something like 10 Days

Can you give an example? (Possibly merits a test so I don't try to incorrectly simplify this somewhere down the line.) If it's only for Tick offsets, special-casing wouldn't be that bad, since we already special-case them a few lines down.

Member Author

Tick works, thanks.

test_first_subset in pandas/tests/frame/methods/test_first_and_last.py covers the behavior and avoids the simplification (as I learned myself a bit earlier :))

+            end_date = end = self.index[0]
+        else:
+            end_date = end = self.index[0] + offset

         # Tick-like, e.g. 3 weeks
         if isinstance(offset, Tick):
12 changes: 11 additions & 1 deletion pandas/tests/frame/methods/test_first_and_last.py
@@ -3,7 +3,7 @@
 """
 import pytest

-from pandas import DataFrame
+from pandas import DataFrame, bdate_range
 import pandas._testing as tm


@@ -69,3 +69,13 @@ def test_last_subset(self, frame_or_series):

         result = ts[:0].last("3M")
         tm.assert_equal(result, ts[:0])
+
+    @pytest.mark.parametrize("start, periods", [("2010-03-31", 1), ("2010-03-30", 2)])
+    def test_first_with_first_day_last_of_month(self, frame_or_series, start, periods):
+        # GH#29623
+        x = frame_or_series([1] * 100, index=bdate_range(start, periods=100))
+        result = x.first("1M")
+        expected = frame_or_series(
+            [1] * periods, index=bdate_range(start, periods=periods)
+        )
+        tm.assert_equal(result, expected)
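
The scenario the new test parametrizes can be reproduced outside the test suite with a rough sketch like the one below. It does not call `first` itself, whose behavior depends on the installed pandas version. Instead, a hypothetical helper `window_end` (not part of pandas) mirrors the branch added in `generic.py`, and as an assumption it uses `isinstance(offset, MonthEnd)` in place of the private `offset._day_opt == "end"` check, so it covers only the month-end case.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd


def window_end(first_ts, offset):
    """Hypothetical helper: end of the selection window for a 'first'-style slice.

    Assumption: isinstance(offset, MonthEnd) stands in for the private
    ``offset._day_opt == "end"`` check used in the diff.
    """
    if isinstance(offset, MonthEnd) and offset.is_on_offset(first_ts):
        # The first timestamp is already the month end, so the window stops there.
        return first_ts
    return first_ts + offset


offset = MonthEnd(1)

# Case 1: the index starts on the last calendar day of the month.
s_end = pd.Series(1, index=pd.bdate_range("2010-03-31", periods=100))
naive = s_end.loc[: s_end.index[0] + offset]           # old behavior: runs into April
fixed = s_end.loc[: window_end(s_end.index[0], offset)]

# Case 2: the index starts one business day earlier.
s_mid = pd.Series(1, index=pd.bdate_range("2010-03-30", periods=100))
fixed_mid = s_mid.loc[: window_end(s_mid.index[0], offset)]

print(len(naive), len(fixed), len(fixed_mid))
```

The lengths of `fixed` and `fixed_mid` correspond to the `periods` values 1 and 2 in the parametrization above, while `naive` extends to 2010-04-30.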