REGR: fix case all-NaN/numeric object column in groupby #39655
Changes from all commits: 6e1514a, 62cc348, 9d0ea88, 383bf50, a21c4ed, 9ae380a, 7fb7108, baae4a3, 277c0fe
```diff
@@ -392,3 +392,34 @@ def test_resample_groupby_agg():
     result = resampled.agg({"num": "sum"})
 
     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("consolidate", [True, False])
+def test_resample_groupby_agg_object_dtype_all_nan(consolidate):
+    # https://github.com/pandas-dev/pandas/issues/39329
+
+    dates = pd.date_range("2020-01-01", periods=15, freq="D")
+    df1 = DataFrame({"key": "A", "date": dates, "col1": range(15), "col_object": "val"})
+    df2 = DataFrame({"key": "B", "date": dates, "col1": range(15)})
+    df = pd.concat([df1, df2], ignore_index=True)
+    if consolidate:
+        df = df._consolidate()
+
+    result = df.groupby(["key"]).resample("W", on="date").min()
+    idx = pd.MultiIndex.from_arrays(
+        [
+            ["A"] * 3 + ["B"] * 3,
+            pd.to_datetime(["2020-01-05", "2020-01-12", "2020-01-19"] * 2),
+        ],
+        names=["key", "date"],
+    )
+    expected = DataFrame(
+        {
+            "key": ["A"] * 3 + ["B"] * 3,
+            "date": pd.to_datetime(["2020-01-01", "2020-01-06", "2020-01-13"] * 2),
+            "col1": [0, 5, 12] * 2,
+            "col_object": ["val"] * 3 + [np.nan] * 3,
+        },
+        index=idx,
+    )
+    tm.assert_frame_equal(result, expected)
```

Inline review thread on the `pd.concat` line:

Comment: does the consolidation status of …

Reply: It doesn't matter in this case. The dataframe that gets resampled (after the first groupby) seems to be consolidated anyway (I assume that creating the sub-dataframe to resample might incur consolidation?). But I parametrized the test to cover both cases to be sure.
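The test parametrizes over `consolidate` because the frame built by `pd.concat` may or may not hold a single block per dtype. A minimal sketch of how to inspect this, relying on pandas' private block-manager attributes (`_mgr`, `nblocks`, `is_consolidated`) — internal API that can change between versions:

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=15, freq="D")
df1 = pd.DataFrame({"key": "A", "date": dates, "col1": range(15), "col_object": "val"})
df2 = pd.DataFrame({"key": "B", "date": dates, "col1": range(15)})
df = pd.concat([df1, df2], ignore_index=True)

# Internal API: the block manager stores column data as dtype-homogeneous
# blocks; a frame is "consolidated" once same-dtype blocks are merged.
print(df._mgr.nblocks, df._mgr.is_consolidated())

df = df._consolidate()  # force merging of same-dtype blocks
print(df._mgr.nblocks, df._mgr.is_consolidated())
```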
Comment: instead of this, can you now just define `as_array` properly?

Comment: or simply consolidate first?
Reply: It's already consolidated above, but if you have multiple dtypes, that doesn't help.
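Consolidation only merges blocks that share a dtype, so a mixed-dtype frame keeps at least one block per dtype no matter what. A small illustration, again poking at the private `_mgr` attribute:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0], "c": ["x", "y"]})
df = df._consolidate()

# Still three blocks after consolidating: one each for int64,
# float64, and object -- consolidation cannot cross dtypes.
print(df._mgr.nblocks)
print([blk.dtype for blk in df._mgr.blocks])
```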
Reply: To be clear, the diff seems larger than it is (because I removed the assert above and moved the comment), but basically the only code addition is the part handling the case of multiple blocks that need to be converted into an array (which currently raises an AssertionError because of `assert len(mgr.blocks) == 1`).
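For illustration only: converting several blocks into one ndarray has roughly the shape sketched below. This is not the actual pandas implementation; `blocks_to_array` and its `(values, placement)` block representation are hypothetical stand-ins for the internal structures the real `as_array` works with.

```python
import numpy as np

def blocks_to_array(blocks, n_rows, n_cols, dtype):
    """Interleave (values, placement) blocks into one 2D array.

    Each block carries a 2D values array plus the row positions it
    occupies in the full result. With a single block this is just a
    copy; the multi-block case is what used to trip the assertion.
    """
    result = np.empty((n_rows, n_cols), dtype=dtype)
    for values, placement in blocks:
        result[placement] = values
    return result

# Two blocks covering rows 0/2 and row 1 of a 3x2 result:
blocks = [
    (np.array([[1, 2], [5, 6]]), [0, 2]),
    (np.array([[3, 4]]), [1]),
]
print(blocks_to_array(blocks, n_rows=3, n_cols=2, dtype=np.int64))
```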