-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
current implementation of DataFrame.apply where passed function has side effects is a real "gotcha" #10222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
there is no reason I have ever seen to actually modify something in an apply. that said, you are welcome to try to fix this if you want. It is doable. |
Thanks, I will look into it and see if it something I can tackle |
dupe of #2936 |
Just stumbled upon this. I don't mean to sound harsh below but I must vocalize a bit my suprise ;) Given the popularity of the pandas library, we can assume that this 'feature' will seriously impact scientific results and business decisions if the apply function does transform data (twice) and it goes unnoticed - easily possible in settings where many groups are formed in big data sets. As for possible improvements:
I hope the person knowing this piece of code in and out reads this. Greetings! |
@mpschr well, a pull-request to do either/both would be great! (both are not very difficult). |
From the documentation for DataFrame.apply:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
I am sure there are well-thought out reasons why it is currently implemented this way, but this behavior can be a real "gotcha". I grant that having side effects inside an apply function is not good standard practice, but I would argue that there are times when it is a good solution to a problem. ( I can elaborate on this if needed). Thankfully this behavior is documented, but I think it is reasonable to expect that most users will not always be mindful of secondary notes that exist throughout the documentation, and the source of problems caused by this behavior is not at all obvious.
While I don't understand the engineering issues regarding the fast/slow path, I would suggest that some better solution to the problem be introduced.
The text was updated successfully, but these errors were encountered: