-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Preserve subclassing with groupby operations #28573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Looks good overall.
@@ -134,6 +134,23 @@ def func(dataf): | |||
result = df.groupby("X", squeeze=False).count() | |||
assert isinstance(result, DataFrame) | |||
|
|||
# https://github.com/pandas-dev/pandas/issues/28330 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we split these into several tests, but could be a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test change, pls merge master, ping on green.
@@ -134,6 +134,23 @@ def func(dataf): | |||
result = df.groupby("X", squeeze=False).count() | |||
assert isinstance(result, DataFrame) | |||
|
|||
# https://github.com/pandas-dev/pandas/issues/28330 | |||
# Test groupby operations on subclassed dataframes/series | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually these tests are different that the above ones, can you make a new test
cdf = tm.SubclassedDataFrame( | ||
[ | ||
{"val1": 1, "val2": 20}, | ||
{"val1": 1, "val2": 19}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls parameterize over as many functions that you can here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use reduction_func
from conftest. Probably worth adding something similar for transforming functions as well
@alkasm the fixes here look great. can you rebase and address comments about making the tests more thorough |
{"val1": 1, "val2": 20}, | ||
{"val1": 1, "val2": 19}, | ||
{"val1": 2, "val2": 27}, | ||
{"val1": 2, "val2": 12}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update to comments here
pls also merge master |
@alkasm can you rebase |
Sorry for the delay y'all. Should be able to clean up, write new tests, and rebase this week sometime. |
Due to the many changes from #28835 I found it easier to just make a new branch. Here's my current branch off master: https://github.com/alkasm/pandas/tree/alkasm/groupby-preserve-subclass (I can force push the new changes to this current branch, or just open a new PR if it's more helpful). However there's a few issues I'm running into getting this to work in all cases; and I'm having trouble locating the source of the issues. Borrowing from import pandas as pd
import numpy as np
from pandas.util.testing import SubclassedSeries, SubclassedDataFrame
from pandas.tests.groupby.conftest import reduction_kernels
objs = [
SubclassedDataFrame(
dict(a=[0, 0, 0, 1, 1, 1], b=range(6)), index=["A", "B", "C", "D", "E", "F"]
),
SubclassedSeries([0, 0, 0, 1, 1, 1], index=["A", "B", "C", "D", "E", "F"]),
]
for obj in objs:
g = obj.groupby(np.repeat([0, 1], 3))
for k in reduction_kernels:
if k in ("ngroup", "size"):
continue
args = {"nth": [0], "quantile": [0.5]}.get(k, [])
xform = g.transform(k, *args)
agg = g.agg(k, *args)
if not (type(obj) == type(xform) == type(agg)):
print(f"{k:>10}, {obj.__class__.__name__}, {xform.__class__.__name__}, {agg.__class__.__name__}") and from running this script, it's seen that three of the aggregation functions don't produce the proper subclasses (i.e. somewhere they are not using
I can't really find how these functions end up constructing their result as a |
can you merge master |
Based on a quick look, I think you're best bet is in core.groupby.generic I also recently noticed that core.apply has a place where it does |
can you merge master |
@alkasm can you fix up merge conflicts? |
can you merge master and will look again |
@alkasm do you have time to fix the merge conflicts? We're doing a release this week. |
Nice PR but looks like this has gone stale. Ping if you'd like to pick back up and can address merge conflicts / comments |
I recently came across this bug/PR while working on a project that uses a subclassed I just wanted to check on the current status here -- it seems like this PR was abandoned. If this were to be picked up, would you recommend starting from here or in a fresh PR? |
@JBGreisman if interested you can merge these changes into your own local branch and push up a new PR |
@JBGreisman thank you for picking this back up! |
@alkasm No worries -- thanks for laying the groundwork! |
Preserve subclassing with groupby operations
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff