-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: flex op with DataFrame, Series and ea vs ndarray #34277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This overlaps now with #34312 |
rvalues = rvalues.reshape(1, -1) | ||
|
||
rvalues = np.broadcast_to(rvalues, frame.shape) | ||
return type(frame)(rvalues, index=frame.index, columns=frame.columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to turn this into a DataFrame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to re-implement the ensure-shape-shape-values code in ops.blockwise (see the first commit, which did it that way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be the same shape? The block op or array op should be able to handle this broadcasting automatically themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ndarray handles broadcasting, but DTA/TDA dont. I checked with seberg a few months ago who said there isnt a perf penalty to operating on the broadcasted ndarray.
ive got plans to make it so we dont need to wrap the ndarray in a DataFrame (which we actually do in _align_method_FRAME too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but DTA/TDA dont.
You mean the "hidden" 2D version of DTA/TDA? (EAs in general handle broadcasting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EAs in general handle broadcasting
im not sure thats accurate in general (in particular the op(ea_len_1, arr_len_n)
), but that can be discussed separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in particular the op(ea_len_1, arr_len_n)
Ah, yes, indeed, that's a broadcasting that will not be generally supported (we probably should though, worth to open an issue about that).
Now, I was thinking about the op(frame_with_EA_column, array_len_n)
case, where the array is aligned on the columns of the frame. Which means that each EA (1 column) gets a scalar, I think (op(EA, scalar)
?
Or, we might want to change this to an array of len 1 in case the scalar loses information? (which can be the case where the 1D array-like is an EA / Series[EA] instead of a numpy array)
I could also port your test to my PR, so can you try to explain why this fix would be better? (it might be doing more now, but considering only what is required for fixing the original issue, i.e. the interval flex op test you added) |
please do! |
Only if you agree it's a good fix (I don't fully understand the rationale of the change in this PR, so I can't really judge at this time which fix is "better") |
I agree that #34312 is a valid fix to a legitimate regression, and is well-written, minimally-invasive. So I would have no problem with merging it on green (especially if the test is ported). It is especially worthwhile if we don't expect this to get merged for 1.1 The advantage of the approach in this PR is that it continues operating blockwise in cases where #34312 does not. It also gets a bunch of code out of _combine_series_frame, which will allow for streamlining in upcoming pass. |
OK, I added the test in the other PR, then we can continue the discussion here regardless of the regression fix. |
docbuild fail looks unrelated |
It's failing everywhere -> #34306 |
green |
thanks @jbrockmendel |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I have a couple ideas in mind that might make the edit in blocks here unnecessary, but want to profile them first, so will do that cleanup in a refactor following the bugfix.Updated with a much cleaner implementation, it also de-special-cases one of our timedelta64 tests.