
Fixed metadata propagation in DataFrame.apply (issue #28283) #44041


Merged: 3 commits, merged Oct 29, 2021

@moha-rk (Contributor) commented Oct 15, 2021

Co-authored-by: Mohamad Rkein [email protected]
Co-authored-by: Rafael Rodrigues [email protected]

In reference to #28283

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry
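A minimal sketch of the behaviour this PR targets, assuming pandas 1.4 or later (the milestone assigned below) and using the experimental DataFrame.attrs dictionary as the metadata being propagated. The column names and attrs values are made up for illustration. After the fix, the result of DataFrame.apply is passed through __finalize__, so it carries the caller's attrs whether apply returns a Series or a DataFrame.

```python
import pandas as pd

# Hypothetical frame with experimental attrs metadata attached
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.attrs = {"source": "sensor-1"}

# Column-wise reduction: apply returns a Series
col_sums = df.apply(lambda s: s.sum())

# Element-wise transformation per column: apply returns a DataFrame
doubled = df.apply(lambda s: s * 2)

# With the fix, both results inherit df.attrs
print(col_sums.attrs)
print(doubled.attrs)
```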

@jreback jreback added the Apply Apply, Aggregate, Transform, Map label Oct 16, 2021
@jreback jreback added this to the 1.4 milestone Oct 16, 2021
@jreback (Contributor) commented Oct 17, 2021

This appears to have xpassed some previously xfailed tests. @moha-rk, you may need to adjust some other tests (they may be indirectly calling .apply).

@moha-rk (Contributor, Author) commented Oct 24, 2021

Thank you for your feedback, @jreback. I hadn't noticed the failures (the "F"s in the test output) that appeared after my changes.

DataFrame.nunique() is implemented (in pandas/core/frame.py) by calling DataFrame.apply(), which applies Series.nunique() (pandas/core/base.py) to each Series (column). It therefore now gets metadata propagation through apply. I'm not sure whether this is a problem, but the "method" argument passed to __finalize__ will be "apply" rather than "nunique".
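A small check of the nunique case described above, assuming pandas 1.4 or later; the data and attrs values are hypothetical. Because nunique goes through DataFrame.apply, the returned Series picks up the frame's attrs even though nunique itself never calls __finalize__.

```python
import pandas as pd

# Hypothetical data: two distinct values in each column
df = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "b", "b"]})
df.attrs = {"units": "counts"}

# DataFrame.nunique delegates to DataFrame.apply(Series.nunique)
counts = df.nunique()

print(counts.tolist())  # [2, 2]
print(counts.attrs)
```

Note that attrs alone cannot reveal which "method" string was passed to __finalize__; the sketch only shows that the metadata arrives.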

As for mode(), the same thing happens when it is called on a DataFrame (pandas/core/frame.py). However, there's a catch: when mode() is called directly on a Series (pandas/core/base.py), there is no apply call. This is a problem because Series.mode() returns a new Series containing the mode(s), and that Series does not receive the original metadata. I added a test marked with "not_implemented_mark" to capture this behaviour. I'm not sure calling __finalize__ inside Series.mode() is a good idea, since it would then also be called once for each column when DataFrame.mode() is used. Do you have any suggestions? For now, I'll leave the test with that mark.
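A sketch contrasting the two mode() paths discussed above, assuming pandas 1.4 or later and hypothetical data. DataFrame.mode goes through DataFrame.apply, so its result keeps the frame's attrs. Series.mode builds its result directly, so whether its attrs survive depends on whether the installed pandas version calls __finalize__ there; that line is printed but not asserted.

```python
import pandas as pd

# Hypothetical frame: column "a" has mode 1, column "b" has mode 3
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 3]})
df.attrs = {"source": "survey"}

# DataFrame.mode applies Series.mode per column via DataFrame.apply
frame_mode = df.mode()
print(frame_mode.attrs)

# Standalone Series with its own attrs; Series.mode has no apply call
s = pd.Series([1, 1, 2])
s.attrs = {"source": "survey"}
series_mode = s.mode()
print(series_mode.attrs)  # version-dependent: may be {} without __finalize__
```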

Finally, the last method whose behaviour changed is DataFrame.transform(). It creates a FrameApply object and calls its .transform() method. Following the chain of calls, we end up in transform_str_or_callable() (pandas/core/apply.py), which calls DataFrame.apply() at the end, so the attributes are propagated. I removed the "not_implemented_mark" from its test.
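A minimal sketch of the transform path described above, assuming pandas 1.4 or later and hypothetical data. For a plain callable such as a lambda, transform_str_or_callable falls through to DataFrame.apply, so the transformed frame inherits attrs.

```python
import pandas as pd

# Hypothetical frame with attrs metadata
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.attrs = {"scale": "raw"}

# A callable (not a string or list) ends up in DataFrame.apply internally
shifted = df.transform(lambda col: col + 1)

print(shifted.to_dict("list"))  # {'a': [2, 3], 'b': [4, 5]}
print(shifted.attrs)
```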

@jreback (Contributor) left a comment

pls also merge master

@jreback jreback merged commit e2d0288 into pandas-dev:master Oct 29, 2021
Labels
Apply Apply, Aggregate, Transform, Map

2 participants