-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: block-wise arithmetic for frame-with-frame #32779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Did numpy recently change something that would make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookimg pretty good. do you have benchmarks?
if you can create some helper functions as indicated would be great.
pandas/core/ops/__init__.py
Outdated
nbs = blk._split_op_result(res_values) | ||
res_blks.extend(nbs) | ||
continue | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments here.(general + line comments as needed)
pandas/core/ops/__init__.py
Outdated
|
||
if not isinstance(blk_vals, np.ndarray): | ||
# 1D EA | ||
assert len(locs) == 1, locs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be more clear to make for example this section (368 to 374) as a function
so the structure of what you are doing is more clear.
also this has a linked issue IIRC? (you just commented today on it, I think) |
Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-17 21:35:14 UTC |
Refactored the new function here so as to have one main execution path. This makes the debugging assertions clunkier, but whats left is much smoother than the previous implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems nice overall. Can you ensure we have ASVs for this (I think we do) and post the results?
pandas/core/ops/array_ops.py
Outdated
result = masked_arith_op(left, right, op) | ||
with warnings.catch_warnings(): | ||
# suppress warnings from numpy about element-wise comparison | ||
warnings.simplefilter("ignore", DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What warning are we suppressing here, and is this future-proof? If this is the usual
In [2]: np.array([1, 2]) == 'a'
/Users/taugspurger/.virtualenvs/dask-dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
#!/Users/taugspurger/Envs/dask-dev/bin/python
Out[2]: False
then in the future the result will be array([False, False])
. Is that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this warning-catching was necessary to make the npdev build, pass, im going to see if i can revert it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like catching this is needed in some form, but i can move the catching so as to cast a less-wide net.
Just pushed with a some new asvs, about 200x speedup in the best-case scenario:
|
gentle ping |
Sorry for the slow follow-up, will take a look at your latest changes tomorrow! |
That indeed nicely avoids copies in some cases! Personally, I still think that with something a little bit simpler (eg the "only do block-wise in case of identical block layout"), we can reduce the complexity quite a bit while at the same time preserving the performance speedup for the majority of use cases of wide frames. |
asv_bench/benchmarks/arithmetic.py
Outdated
operator.gt, | ||
operator.ge, | ||
operator.lt, | ||
operator.le, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it needed to test it with all those different ops?
I think what we are interested in catching with the benchmark, is the general code handling wide dataframes (how the dispatching to the actual op is done, dispathing to block/column instead of series, etc), not the actual op right? So for those aspects, they all use the same code, and testing all ops seems overkill? (it just adds to the number of benchmarks needlessly, making it harder to run the benchmark suite / interpret the results)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it needed to test it with all those different ops?
No strong opinion here.
making it harder to run the benchmark suite
Yah, this is a hassle. Best guess is eventually we're going to land on a "throw hardware at the problem" solution, but that doesn't help the local-running troubles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion here.
Then let's keep only 2 of them or so, I would say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about four: one comparison, one logical, floordiv (for which we have special handling), and one other arithmetic
warn = PerformanceWarning if box_with_array is not pd.DataFrame else None | ||
warn = None | ||
if box_with_array is not pd.DataFrame or tz_naive_fixture is None: | ||
warn = PerformanceWarning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why this changed? We do / do not raise a warning on the array-level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't do a warning when operating op(arr, length_1_object_array)
, which turns out to be the cases described here (in reverse)
Expanding on the last paragraph of my comment above: Personally, I still think that with something a little bit simpler (eg the "only do block-wise in case of identical block layout"), we can reduce the complexity quite a bit while at the same time preserving the block-wise performance speedup for the majority of use cases of wide frames (in the assumption that wide dataframes typically have uniform dtypes). Now, if both Jeff and Tom are fine with the current PR instead of my reduced proposal, I am not going to further block it. I made my stance clear, but if the majority prefers otherwise, I go with that. |
@jorisvandenbossche thanks for new round of comments. Looking into the non-slicing case you pointed out now. |
Updated to avoid copy in the case Joris identified. gentle ping @jreback @TomAugspurger |
I haven't been able to stay up to date with this. I wouldn't wait around for me. |
Travis is green, just hasnt updated the icon here |
rebased+green |
pandas/core/internals/managers.py
Outdated
blocks = [] | ||
for i, ml in enumerate(slobj): | ||
nb = blk.getitem_block([ml], new_mgr_locs=i) | ||
print(nb.shape, np.values.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra print here, canyou use a list-comprehension here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated+green
elif only_slice: | ||
# GH#33597 slice instead of take, so we get | ||
# views instead of copies | ||
for i, ml in zip(taker, mgr_locs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a list comprehesnion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above on 1317 i can, here it is less clean
pandas/core/ops/blockwise.py
Outdated
# else: | ||
# assert res_values.shape == lvals.shape, (res_values.shape, lvals.shape) | ||
|
||
for nb in nbs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this here
return new_mgr | ||
|
||
|
||
def _reset_block_mgr_locs(nbs: List["Block"], locs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though I would actually return the List['Block'] here as conceptually simpler to grok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though I would actually return the List['Block'] here as conceptually simpler to grok
that can give the misleading impression that args are not being altered in place.
ok thanks @jbrockmendel let's see how this goes |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff