-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: standardize usage in DataFrame vs SparseDataFrame ops #28027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As mentioned in the issue: we could also leave the SparseDataFrame code alone as is (it's deprecated, so I don't think we should change behaviour, and I would personally also not put effort in cleaning it up), but still refactor the DataFrame ops (which would mean less sharing of code, for now, but I don't think that is a problem since the sparse code will be removed anyway, so then there will be no duplication anymore) |
If I understand the suggestion correctly, this would involve copy/pasting all of the ops-based code that is currently shared. Or do you have something else in mind? |
Yes, copy/paste all ops-based code that sparse needs and is currently shared with the other frame ops to the sparse module (and then we basically don't touch that code anymore until it is removed), and then you are free to refactor the ops code without needing to care about the deprecated sparse (and eg don't need to care about the small differences like #28025) (note this is just talk without looking at the actual code, I don't know if it is practically feasible. I just wouldn't care in this specific case about temporarily duplicating some code if that allows to refactor more easily the code that we are going to keep). |
Thanks for clarifying. Does the status quo for SparseDataFrame include bugfixes until it is removed? If so, that would push towards continuing to share code. |
Well, if somebody wants to do a PR for a bug fix, that is of course welcome (and non-shared code can still receive bugfixes). But if you ask me if I find it worth your time to do bug fixes for SparseDataFrame, then I would say no. Even if it would be benefiting from a generic bug fix that you do in the ops code, I don't think it is a problem that sparse wouldn't automatically get that bug fix anymore |
I think I understand the suggestion, thanks. Would you be OK with moving forward with the continue-sharing approach in this PR and in the next pass I'll try to use the approach you've described? |
If we would go with a non-sharing appraoch, then I personally think we should do it already now and remove the sparse changes here. But that is still an "if" of course. |
This PR changes behavior, but we have not established that the behavior in question was intentional.
I understand your preference. My question is if the strength of that preference would make you -1 on moving forward with the less-preferred option (this) (conditional on resolving the behavior change) |
Well, that depends on the potential behaviour change .. :)
In the issue, I asked about a code example, to be able to see what the change is in an example (and not in a theoretical explanation of the code that I can only understand by reading and understanding the code). But that is actually another reason to stop the sharing, then we wouldn't need to need to solve this question, we could just ignore the potential difference with SparseDataFrame, and use our mental energy for discussing more interesting things :-) To be clear, I don't really know what effort it would take to not share the code (I would think that, in this PR, it would mean reverting the changes in the sparse code, and copying (the original version of) some of the methods in that non-sparse code that you changed to the sparse code; but I didn't try it of course), so I do maybe a bit too lightly about it |
@jreback if you're around, thoughts? after more experimentation, im increasing convinced that a) the small behavior change is probably slightly more correct, b) it really doesn't matter, and c) it is possible to rip out a lot of the sparse ops code after implementing _construct_result |
agree on all points here; +1 on consolidating code and then simplifying ops. trying to isolate sparse via copy paste i think is just a mess and not worth any effort. |
Then again, can you try to explain (show) the behaviour change? |
|
Essentially, binops propagate the default fill value more consistently? It can probably be called a bug fix (sorry about the close and reopen). |
OK, looking at this again, I think the different @jbrockmendel can you merge master to fix the merge conflicts? @jorisvandenbossche any objections to declaring #28027 (comment) a bug? |
done |
# non-unique columns case | ||
out.columns = self.columns | ||
return out | ||
# TODO: finalize? we do for SparseDataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalize doesn't do really anything for frames, prob can't hurt though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially in the frame+frame case it isn't obvious that is correct; I'd prefer to revisit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle we should finalize I think, but since it was not done right now, fine to leave it for later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Any other comments @jreback?
gentle ping. This and #28268 are pseudo-blockers for implementing blockwise arithmetic. (psuedo-blocker as in not actual blockers, but will make the implementation much easier) |
@jreback if we have to choose one more to get in more the weekend productivity streak ends, this should be it |
yeah this was fine (got lost in the queue). maybe rebase @jbrockmendel to be sure. |
Did we decide on removing SparseDataFrame? |
I think there were no objections to removing SparseSeries and
SparseDataFrame for 1.0.
…On Thu, Sep 12, 2019 at 3:09 PM Joris Van den Bossche < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/frame.py
<#28027 (comment)>:
> + new_data = ops.dispatch_to_series(self, other, func)
+ return self._construct_result(other, new_data, func)
+
+ def _construct_result(self, other, result, func):
+ """
+ Compat for DataFrame/SparseDataFrame op result wrapping.
+
+ `func` is included for compat with SparseDataFrame signature, is not
+ needed here.
+ """
+ out = self._constructor(result, index=self.index, copy=False)
+ # Pin columns instead of passing to constructor for compat with
+ # non-unique columns case
+ out.columns = self.columns
+ return out
+ # TODO: finalize? we do for SparseDataFrame
In principle we should finalize I think, but since it was not done right
now, fine to leave it for later?
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#28027?email_source=notifications&email_token=AAKAOIRYPBPTJHTRAAI5L5DQJKOYRA5CNFSM4INMR4GKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCESZICA#discussion_r323927166>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOITR2YW7ZKPA6N4UW5TQJKOYRANCNFSM4INMR4GA>
.
|
This should be merged regardless of the decision about removing SparseDataFrame.
|
Yep, let's merge this.
…On Thu, Sep 12, 2019 at 4:05 PM jbrockmendel ***@***.***> wrote:
This should be merged regardless of the decision about removing
SparseDataFrame.
1. The edits here are worthwhile for DataFrame alone
2. if we are going to remove SparseDataFrame that will take a while,
during which this will block other ops work
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#28027?email_source=notifications&email_token=AAKAOIXECOSQ4S7J73YRKGTQJKVLJA5CNFSM4INMR4GKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6TIJKY#issuecomment-531006635>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIRLCGDPWMADDIA4T63QJKVLJANCNFSM4INMR4GA>
.
|
thanks @jbrockmendel @TomAugspurger you will likely need to rebase on the removals PR |
I think that after this we're not far from being able to use the base class versions of _combine_frame, _combine_match_index, _combine_match_columns, and _combine_const. That'll be a good day.
The only actual behavior changed is in SparseDataFrame._combine_match_columns, which is changed to match the other methods. See #28025