Skip to content

BUG: flex op with DataFrame, Series and ea vs ndarray #34277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 25, 2020

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 20, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I have a couple ideas in mind that might make the edit in blocks here unnecessary, but want to profile them first, so will do that cleanup in a refactor following the bugfix. Updated with a much cleaner implementation, it also de-special-cases one of our timedelta64 tests.

@jbrockmendel jbrockmendel added Bug Numeric Operations Arithmetic, Comparison, and Logical operations labels May 21, 2020
@jorisvandenbossche
Copy link
Member

This overlaps now with #34312

rvalues = rvalues.reshape(1, -1)

rvalues = np.broadcast_to(rvalues, frame.shape)
return type(frame)(rvalues, index=frame.index, columns=frame.columns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to turn this into a DataFrame?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to re-implement the ensure-shape-shape-values code in ops.blockwise (see the first commit, which did it that way)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to be the same shape? The block op or array op should be able to handle this broadcasting automatically themselves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndarray handles broadcasting, but DTA/TDA dont. I checked with seberg a few months ago who said there isnt a perf penalty to operating on the broadcasted ndarray.

ive got plans to make it so we dont need to wrap the ndarray in a DataFrame (which we actually do in _align_method_FRAME too)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but DTA/TDA dont.

You mean the "hidden" 2D version of DTA/TDA? (EAs in general handle broadcasting)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EAs in general handle broadcasting

im not sure thats accurate in general (in particular the op(ea_len_1, arr_len_n)), but that can be discussed separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in particular the op(ea_len_1, arr_len_n)

Ah, yes, indeed, that's a broadcasting that will not be generally supported (we probably should though, worth to open an issue about that).

Now, I was thinking about the op(frame_with_EA_column, array_len_n) case, where the array is aligned on the columns of the frame. Which means that each EA (1 column) gets a scalar, I think (op(EA, scalar) ?
Or, we might want to change this to an array of len 1 in case the scalar loses information? (which can be the case where the 1D array-like is an EA / Series[EA] instead of a numpy array)

@jorisvandenbossche
Copy link
Member

I could also port your test to my PR, so can you try to explain why this fix would be better? (it might be doing more now, but considering only what is required for fixing the original issue, i.e. the interval flex op test you added)

@jbrockmendel
Copy link
Member Author

I could also port your test to my PR

please do!

@jorisvandenbossche
Copy link
Member

Only if you agree it's a good fix (I don't fully understand the rationale of the change in this PR, so I can't really judge at this time which fix is "better")

@jbrockmendel
Copy link
Member Author

Only if you agree it's a good fix (I don't fully understand the rationale of the change in this PR, so I can't really judge at this time which fix is "better")

I agree that #34312 is a valid fix to a legitimate regression, and is well-written, minimally-invasive. So I would have no problem with merging it on green (especially if the test is ported). It is especially worthwhile if we don't expect this to get merged for 1.1

The advantage of the approach in this PR is that it continues operating blockwise in cases where #34312 does not. It also gets a bunch of code out of _combine_series_frame, which will allow for streamlining in upcoming pass.

@jorisvandenbossche
Copy link
Member

OK, I added the test in the other PR, then we can continue the discussion here regardless of the regression fix.

@jbrockmendel
Copy link
Member Author

docbuild fail looks unrelated

@jorisvandenbossche
Copy link
Member

docbuild fail looks unrelated

It's failing everywhere -> #34306

@jreback jreback added this to the 1.1 milestone May 22, 2020
@jbrockmendel
Copy link
Member Author

green

@jreback jreback merged commit 014d8ea into pandas-dev:master May 25, 2020
@jreback
Copy link
Contributor

jreback commented May 25, 2020

thanks @jbrockmendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants