REF: avoid broadcasting of Series to DataFrame in ops for ArrayManager #40482
BTW I think the array manager build can be simplified with something like
I am running into the following issue:
while
This happens if you do eg (from
@@ -6675,6 +6678,9 @@ def _dispatch_frame_op(self, right, func: Callable, axis: Optional[int] = None):

    right = right._values

    if isinstance(right, TimedeltaArray):
        right = right._data
- this is AM-only, right?
- won't the array_op call just re-wrap this?
- if not, will something like DataFrame[int] + Series[timedelta64ns] fail to raise because of this?
- if this is a hack as the commit message suggests, then a comment would be worthwhile
See #40482 (comment) for a general explanation of the cases in which this happens. I did not intend to keep this; I am hoping to find a better solution.
Yah, I saw that and thought about commenting, but decided that "last time I handled that by reshaping to 2D" wouldn't be something you'd find that helpful.
if isna(y):
    mask = np.zeros(x.size, dtype=bool)
else:
    mask = notna(xrav)
this is just an optimization, independent of the broadcasting thing?
@@ -291,6 +294,7 @@ def na_logical_op(x: np.ndarray, y, op):
            y = ensure_object(y)
            result = libops.vec_binop(x.ravel(), y.ravel(), op)
        else:
            x = ensure_object(x)
if we're doing this both here and on L293, then might as well do it just once after L289
Do we have cases that get here with x not object dtype?
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
Appears this PR has been dormant for a while so closing. Feel free to reopen when you have time to work on this PR.
@jorisvandenbossche if you can rebase
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
xref #39772
Currently, we broadcast a Series to a DataFrame for elementwise operations in the "op(df, series)" case. This is done (I think) so that the operation can be performed block-wise (which is only implemented for the block/block case, and not for block/array).
But when using the ArrayManager, this broadcasting is 1) not necessary (since we perform the operation column-by-column anyway) and 2) costly.
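
To illustrate the difference, here is a rough NumPy-only sketch. It is not the pandas internals: `op_broadcast` and `op_columnwise` are hypothetical helper names, the columns are modelled as a plain list of 1D arrays, and the Series is assumed to align on the columns (one value per column).

```python
import operator

import numpy as np


def op_broadcast(columns, row, op):
    """Materialize `row` into a full 2D array, then apply `op` once.

    This mimics broadcasting the Series to a DataFrame first: the
    right-hand side is copied once per row before the op runs.
    """
    n_rows = len(columns[0])
    left = np.column_stack(columns)    # shape (n_rows, n_cols)
    right = np.tile(row, (n_rows, 1))  # materialized copy, same shape
    return op(left, right)


def op_columnwise(columns, row, op):
    """Apply `op` to each 1D column paired with its matching value.

    No 2D copy of `row` is created; each column only sees a scalar.
    """
    return [op(col, val) for col, val in zip(columns, row)]


columns = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0])]
row = np.array([0.5, 2.0])

full = op_broadcast(columns, row, operator.add)
per_column = op_columnwise(columns, row, operator.add)

# Both paths compute the same values; only the intermediate copy differs.
assert np.array_equal(full, np.column_stack(per_column))
```

The column-wise path skips the `n_rows x n_cols` intermediate entirely, which is the cost this PR tries to avoid when the data is already stored as separate 1D arrays.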