REF: simplify operating-columnwise dispatch #40256

Merged
merged 13 commits into pandas-dev:master from ref-split_and_operate-2
Mar 16, 2021

jbrockmendel
Member

After this we have one usage of split_and_operate left and 2 places where we can almost use @maybe_split. In both cases I'm still trying to untangle things.

Contributor

@jreback jreback left a comment

rebase after we merged the other one, but have some questions here

@jreback jreback added the Refactor Internal refactoring of code label Mar 8, 2021
@jbrockmendel
Member Author

> rebase after we merged the other one, but have some questions here

rebased + green. questions?


return self.split_and_operate(None, f, False)
@maybe_split
Contributor

I find using a decorator & having _split very confusing.

Member Author

hah i find split_and_operate (the non-decorator alternative) confusing. maybe it can be simplified now that putmask no longer uses it (so one or more of its args can be removed)

Member Author

so i can re-write split_and_operate to be basically just the if self.ndim == 2 and self.shape[0] != 1 part of maybe_split, but then every method that is decorated in this PR looks like:

def foo(self, *args, **kwargs):
    if self.ndim == 2 and self.shape[0] != 1:
        return self._split_and_operate(Block.foo, *args, **kwargs)
    [...]

and avoiding that repetition seems like exactly the use case for a decorator
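
For readers following along, here is a minimal sketch of the decorator approach being discussed. It is not the pandas implementation: `ToyBlock` and `fill_missing` are hypothetical stand-ins built on plain lists, and only the names `maybe_split`, `split_and_operate`, `_split`, and the `ndim`/`shape[0]` check come from the thread above.

```python
from functools import wraps


def maybe_split(meth):
    """Toy decorator: run `meth` per column when the block holds several columns."""

    @wraps(meth)
    def newfunc(self, *args, **kwargs):
        # Same check as quoted in the comment above.
        if self.ndim == 2 and self.shape[0] != 1:
            # Pass the undecorated method so each piece is handled directly.
            return self.split_and_operate(meth, *args, **kwargs)
        return meth(self, *args, **kwargs)

    return newfunc


class ToyBlock:
    """Hypothetical stand-in for a 2D block: `values` is a list of columns."""

    ndim = 2

    def __init__(self, values):
        self.values = values

    @property
    def shape(self):
        ncols = len(self.values)
        nrows = len(self.values[0]) if ncols else 0
        return (ncols, nrows)

    def _split(self):
        # One single-column block per column.
        return [ToyBlock([col]) for col in self.values]

    def split_and_operate(self, func, *args, **kwargs):
        # Apply `func` to every single-column block and collect the results.
        res_blocks = []
        for nb in self._split():
            res_blocks.extend(func(nb, *args, **kwargs))
        return res_blocks

    @maybe_split
    def fill_missing(self, value):
        # Thanks to the decorator, this body only ever sees one column.
        (col,) = self.values
        return [ToyBlock([[value if x is None else x for x in col]])]


blocks = ToyBlock([[1, None], [None, 4]]).fill_missing(0)
print([b.values for b in blocks])  # [[[1, 0]], [[0, 4]]]
```

The method body contains no splitting logic at all, which is the repetition the decorator is meant to remove.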

Member Author

tried this out and it gets more verbose bc we have to write out *args/**kwargs each time

Contributor

mixing these approaches just makes this more confusing. pick.

Member Author

ok, will update shortly.

else:
    # operate column-by-column
    res_blocks = []
    nbs = self._split()
Contributor

if you can remove _split entirely and just use the decorator, I think that might be better.

Member Author

trouble is we also use it in BlockManager.get_bool_data

Member Author

ohh, i think i may now understand the confusion.

is the confusing thing "why are we iterating over self._split() here instead of using the decorator? i.e. why not use the decorator everywhere?"?

if so, the reason is bc in this method, the conditions under which we want to split are slightly different from the conditions checked in the decorator. i'm hopeful we can get some further simplification, but don't want to get it by jamming too much complexity into maybe_split (which is what made split_and_operate confusing in the first place)
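
To illustrate the point, here is a sketch of a method whose split condition differs from the decorator's, so it loops over `_split()` itself. The `ToyBlock` class, the `replace` method, and the specific condition (split only when some, but not all, columns contain the value) are hypothetical; the thread does not say which pandas method or condition is involved.

```python
class ToyBlock:
    """Hypothetical stand-in for a 2D block: `values` is a list of columns."""

    def __init__(self, values):
        self.values = values

    @property
    def shape(self):
        ncols = len(self.values)
        nrows = len(self.values[0]) if ncols else 0
        return (ncols, nrows)

    def _split(self):
        # One single-column block per column.
        return [ToyBlock([col]) for col in self.values]

    def replace(self, old, new):
        hits = [old in col for col in self.values]
        if not any(hits):
            # Nothing to do: return the block itself, without splitting.
            return [self]
        if all(hits):
            # Every column is affected: operate on the whole block at once.
            replaced = [[new if x == old else x for x in col] for col in self.values]
            return [ToyBlock(replaced)]
        # Assumed condition: only some columns are affected, so
        # operate column-by-column and leave the others untouched.
        res_blocks = []
        for nb in self._split():
            res_blocks.extend(nb.replace(old, new))
        return res_blocks


blocks = ToyBlock([[1, 2], [3, 4]]).replace(1, 9)
print([b.values for b in blocks])  # [[[9, 2]], [[3, 4]]]
```

A decorator that checks only `ndim` and `shape[0]` would split this block in all three cases, which is why folding such a condition into `maybe_split` would add complexity to it.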

@jbrockmendel
Member Author

updated to re-write and re-use split_and_operate

@jreback
Contributor

jreback commented Mar 16, 2021

rebase, but need to relook

@jbrockmendel
Member Author

rebased, green except for two issues on master being addressed elsewhere

@jreback jreback added this to the 1.3 milestone Mar 16, 2021
@jreback
Contributor

jreback commented Mar 16, 2021

ok, improves on the existing (tangled) code. it's still not very clear where splitting block ops is happening, but I am sure future PRs can help address that.

@jreback jreback merged commit 126bb92 into pandas-dev:master Mar 16, 2021
@jbrockmendel jbrockmendel deleted the ref-split_and_operate-2 branch March 16, 2021 17:03
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021