-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: incorrect EA casting in groubpy.agg #38254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
99073b8
1ed9d8a
21b4f0f
56b42bb
e01487f
cf5bd53
02bffdb
f950c75
0544c8b
5a30b93
3410102
b481049
2047f3c
b96835f
7bc78eb
643fab6
eddb089
2572dda
5590ad7
5b73850
896a97a
5e611b2
961304f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -432,10 +432,13 @@ def test_agg_over_numpy_arrays(): | |
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
def test_agg_tzaware_non_datetime_result(): | ||
@pytest.mark.parametrize("as_period", [True, False]) | ||
def test_agg_tzaware_non_datetime_result(as_period): | ||
# discussed in GH#29589, fixed in GH#29641, operating on tzaware values | ||
# with function that is not dtype-preserving | ||
dti = pd.date_range("2012-01-01", periods=4, tz="UTC") | ||
if as_period: | ||
dti = dti.tz_localize(None).to_period("D") | ||
df = DataFrame({"a": [0, 0, 1, 1], "b": dti}) | ||
gb = df.groupby("a") | ||
|
||
|
@@ -454,6 +457,8 @@ def test_agg_tzaware_non_datetime_result(): | |
result = gb["b"].agg(lambda x: x.iloc[-1] - x.iloc[0]) | ||
expected = Series([pd.Timedelta(days=1), pd.Timedelta(days=1)], name="b") | ||
expected.index.name = "a" | ||
if as_period: | ||
expected = expected.astype(object) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the correct expected result? When subtracting Periods, you get offset objects, not Timedelta objects? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point. probably not great that tm.assert_series_equal can't tell the difference either |
||
tm.assert_series_equal(result, expected) | ||
|
||
|
||
|
@@ -627,7 +632,8 @@ def test_groupby_agg_err_catching(err_cls): | |
{"id1": [0, 0, 0, 1, 1], "id2": [0, 1, 0, 1, 1], "decimals": DecimalArray(data)} | ||
) | ||
|
||
expected = Series(to_decimal([data[0], data[3]])) | ||
# GH#38254 until _from_sequence is strict, we cannot reliably cast agg results | ||
expected = Series(to_decimal([data[0], data[3]])).astype(object) | ||
|
||
def weird_func(x): | ||
# weird function that raise something other than TypeError or IndexError | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside
maybe_cast_result
, we already have a special case check for not casting back when having datetime or categorical data. You could add periods to that check, and then this line can be left as is, while still fixing the period issue.Then we can add the period test, without needing to change the decimal tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually what I find most objectionable about maybe_cast_result. It is a kludge that special-cases pandas-internal EAs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as I said below, it would at least be a consistent "kludge"... ;)
Not calling the method here that was exactly written to be called here, it not making it any less kludgy IMO.
So I think it would be good to add the period test case here, and then focus on #38315 to properly fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only remaining usage of that kludge, so getting rid of it would certainly make it less kludgy.
It's going to be a while before 1.3, so I'm fine sticking a pin in this to see if a better solution is implemented between now and then. If it doesn't, we should do this for 1.3.