Skip to content

Doc: Add warning to treat group chunks as immutable #19175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

pdpark
Copy link

@pdpark pdpark commented Jan 10, 2018

@pdpark
Copy link
Author

pdpark commented Jan 10, 2018

New clean pull request with only the changes for this issue.

@@ -944,13 +944,17 @@ that is itself a series, and possibly upcast the result to a DataFrame:
So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in
the output as well as set the indices.

.. warning::
.. warnings::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its warning

first group to decide whether it can take a fast or slow code
path. This can lead to unexpected behavior if func has
side-effects, as they will take effect twice for the first
group.

* Apply should not perform in-place operations on the group chunk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply functions

@codecov
Copy link

codecov bot commented Jan 12, 2018

Codecov Report

Merging #19175 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #19175   +/-   ##
=======================================
  Coverage   92.19%   92.19%           
=======================================
  Files         169      169           
  Lines       50959    50959           
=======================================
  Hits        46980    46980           
  Misses       3979     3979
Flag Coverage Δ
#multiple 90.61% <ø> (ø) ⬆️
#single 42.29% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 913f71f...b98e0ce. Read the comment docs.

@pdpark
Copy link
Author

pdpark commented Jan 12, 2018

Requested changes made.

@jreback
Copy link
Contributor

jreback commented Jul 7, 2018

@pandas-dev/pandas-core if someone wants to have a look here

@jorisvandenbossche
Copy link
Member

As discussed in one of the other issues (#14927), I think we should rather deprecate the special cased behaviour when the group is mutated, than documenting it.

@toobaz
Copy link
Member

toobaz commented Jul 8, 2018

As discussed in one of the other issues (#14927), I think we should rather deprecate the special cased behaviour when the group is mutated, than documenting it.

If you mean "the return value of apply() should not depend on whether the data was mutated, so the current special case is not worth documenting", I agree.

But my understanding is that the actual changes to the original DataFrame are unpredictable/not guaranteed if you mutate the groups, and that this PR warns about that.

Did I misunderstand? If not, maybe the warning can be made more specific.

@jorisvandenbossche
Copy link
Member

But my understanding is that the actual changes to the original DataFrame are unpredictable/not guaranteed if you mutate the groups, and that this PR warns about that.

Ah, but I was not talking about changes to the original dataframe. Does that ever happen? (that would mean the group is a view on the original dataframe). I would like to see a code example for that.

What I thought it was about that mutating the group (inplace), so the same object is returned from the function, leads to a different behaviour of how the results are back combined together.
And it is this different behaviour that I would want to deprecate.

@toobaz
Copy link
Member

toobaz commented Jul 9, 2018

Ah, but I was not talking about changes to the original dataframe. Does that ever happen?

To be honest I have no idea. If it does not happen, we might then say (after fixing #14927 ) that mutating the chunks can be freely done, because/but it won't affect the original df.

(if it can possibly happen, we might want to keep the warning, and make it refer specifically to the original df)

@pdpark pdpark force-pushed the doc-groupby-14180 branch from 4800ea6 to b98e0ce Compare October 16, 2018 15:17
@pdpark
Copy link
Author

pdpark commented Oct 16, 2018

How does this look?

@datapythonista
Copy link
Member

@toobaz @jorisvandenbossche are you happy to merge the latest version of this, or should we close the PR instead?

@datapythonista
Copy link
Member

Closing this if it's not going to be merged, feel free to reopen if anyone is interested in this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: clarify dangers of fast apply in GroupBy apply docs
6 participants