REF: simplify operating-columnwise dispatch #40256
rebase after we merged the other one, but have some questions here
rebased + green. questions?
return self.split_and_operate(None, f, False)
@maybe_split
I find using a decorator & having _split very confusing.
hah i find split_and_operate (the non-decorator alternative) confusing. maybe it can be simplified now that putmask no longer uses it (so one or more of its args can be removed)
so i can re-write split_and_operate to be basically just the `if self.ndim == 2 and self.shape[0] != 1` part of maybe_split, but then every method that is decorated in this PR looks like:
def foo(self, *args, **kwargs):
    if self.ndim == 2 and self.shape[0] != 1:
        return self._split_and_operate(Block.foo, *args, **kwargs)
    [...]
and avoiding that repetition seems like exactly the use case for a decorator
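For readers following the thread, the decorator approach under discussion can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: `ToyBlock`, its list-of-rows storage, and `fill_missing` are hypothetical stand-ins, and only the `self.ndim == 2 and self.shape[0] != 1` check and the names `maybe_split`, `_split` and `split_and_operate` come from the comments above.

```python
import functools


def maybe_split(meth):
    """Apply `meth` piece by piece when a 2D block holds more than one row."""

    @functools.wraps(meth)
    def newfunc(self, *args, **kwargs):
        if self.ndim == 2 and self.shape[0] != 1:
            # Several columns live in this block: operate on each one separately.
            return self.split_and_operate(meth, *args, **kwargs)
        return meth(self, *args, **kwargs)

    return newfunc


class ToyBlock:
    """Hypothetical stand-in for a Block; `values` is a list of rows."""

    def __init__(self, values):
        self.values = values
        self.ndim = 2

    @property
    def shape(self):
        return (len(self.values), len(self.values[0]))

    def _split(self):
        # One single-row block per stored row.
        return [ToyBlock([row]) for row in self.values]

    def split_and_operate(self, func, *args, **kwargs):
        res_blocks = []
        for nb in self._split():
            res_blocks.extend(func(nb, *args, **kwargs))
        return res_blocks

    @maybe_split
    def fill_missing(self, value):
        # The decorator guarantees exactly one row by the time we get here.
        (row,) = self.values
        return [ToyBlock([[value if x is None else x for x in row]])]


blocks = ToyBlock([[1, None], [None, 4]]).fill_missing(0)
result = [b.values for b in blocks]
```

The point of the sketch is that the split check is written once in `maybe_split`, so a decorated method body only has to handle the single-row case.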
tried this out and it gets more verbose bc we have to write out *args/**kwargs each time
mixing these approaches just makes this more confusing. pick.
ok, will update shortly.
pandas/core/internals/blocks.py
Outdated
else:
    # operate column-by-column
    res_blocks = []
    nbs = self._split()
if you can remove _split entirely and just use the decorator, I think that might be better.
trouble is we also use it in BlockManager.get_bool_data
ohh, i think i may now understand the confusion.
is the confusing thing "why are we iterating over self._split() here instead of using the decorator? i.e. why not use the decorator everywhere?"?
if so, the reason is bc in this method, the conditions under which we want to split are slightly different from the conditions checked in the decorator. i'm hopeful we can get some further simplification, but don't want to get it by jamming too much complexity into maybe_split (which is what made split_and_operate confusing in the first place)
updated to re-write and re-use split_and_operate
rebase, but need to relook
rebased, green ex two issues on master being addressed elsewhere
ok, improves on existing (tangled) code. it's not very clear where the splitting of block ops is happening, but I am sure future PRs can help address that.
After this we have one usage of split_and_operate left and 2 places where we can almost use @maybe_split. In both cases I'm still trying to untangle things.