PERF: block-wise arithmetic for frame-with-frame #32779

Merged (63 commits) on May 19, 2020
Commits (63 total; the diff below shows changes from 5 commits)
1697252
PERF: block-wise arithmetic for frame-with-frame
jbrockmendel Mar 17, 2020
a7764d6
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 17, 2020
30a836d
lint fixup
jbrockmendel Mar 17, 2020
3559698
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 18, 2020
4334353
troubleshoot npdev build
jbrockmendel Mar 18, 2020
cb40b0c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 24, 2020
713a776
comment
jbrockmendel Mar 25, 2020
95ef3ad
checkpoint passing
jbrockmendel Mar 25, 2020
61e5cd6
checkpoint passing
jbrockmendel Mar 25, 2020
89c3d7b
refactor
jbrockmendel Mar 25, 2020
e348e46
blackify
jbrockmendel Mar 25, 2020
519c757
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Mar 31, 2020
2b1ba18
disable assertions for perf
jbrockmendel Mar 31, 2020
53e93fc
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 1, 2020
91c86a3
asv
jbrockmendel Apr 1, 2020
2034084
whatsnew
jbrockmendel Apr 1, 2020
8aedf35
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 3, 2020
0c12d35
revert warning suppression
jbrockmendel Apr 3, 2020
9727562
Fixupm indentation
jbrockmendel Apr 3, 2020
6661dd3
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 4, 2020
42bbbf3
suppress warning
jbrockmendel Apr 4, 2020
65ab023
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 6, 2020
0d958a3
update asv
jbrockmendel Apr 6, 2020
7f91e74
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 6, 2020
56eef51
_data->_mgr
jbrockmendel Apr 6, 2020
4baea6f
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 7, 2020
41a4e7a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 7, 2020
b23144e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 9, 2020
7f24d57
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 9, 2020
ae744b7
update to use faspath constructor
jbrockmendel Apr 9, 2020
b14a98c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 10, 2020
f42c403
update import
jbrockmendel Apr 10, 2020
8a2807e
remove unused import
jbrockmendel Apr 10, 2020
fa046f0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 10, 2020
fd10fb6
rebase compat
jbrockmendel Apr 10, 2020
7ea5d3a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 11, 2020
7150e87
slice instead of take
jbrockmendel Apr 12, 2020
bddfbb0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 13, 2020
25f83d6
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 13, 2020
2142d29
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 14, 2020
0ca2125
Dummy commit to force CI
jbrockmendel Apr 14, 2020
1ea0cc0
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 15, 2020
2bfc308
update call bound
jbrockmendel Apr 15, 2020
d5ad2a0
update max_len
jbrockmendel Apr 15, 2020
108004b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 16, 2020
7ca5f9a
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 17, 2020
e78570d
never take
jbrockmendel Apr 17, 2020
33dfbdf
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 25, 2020
30f655b
REF: move operate_blockwise to new file
jbrockmendel Apr 25, 2020
65a4eaf
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 25, 2020
30d6580
ndim compat
jbrockmendel Apr 25, 2020
fe21f9c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 26, 2020
f86deb4
separate out helper function
jbrockmendel Apr 26, 2020
f3dc465
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel Apr 27, 2020
b57f52c
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel May 11, 2020
0c46531
update per comments
jbrockmendel May 11, 2020
463a145
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel May 12, 2020
32e70d8
update per comments
jbrockmendel May 12, 2020
7989251
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel May 14, 2020
455e45e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel May 16, 2020
41e8e78
update asv
jbrockmendel May 16, 2020
ac8eea8
Merge branch 'master' of https://github.com/pandas-dev/pandas into pe…
jbrockmendel May 17, 2020
8c4f951
requested edits
jbrockmendel May 17, 2020
2 changes: 1 addition & 1 deletion pandas/core/arrays/datetimelike.py
@@ -1299,7 +1299,7 @@ def _addsub_object_array(self, other: np.ndarray, op):
result : same class as self
"""
assert op in [operator.add, operator.sub]
if len(other) == 1:
if len(other) == 1 and self.ndim == other.ndim == 1:
return op(self, other[0])

warnings.warn(
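
A rough sketch of the behavior this guard controls (illustrative; the offsets here are arbitrary): a same-length object array of offsets takes the warned element-wise path, while a length-1 object array against a 1-D datetime array takes the op(self, other[0]) fast path and skips the warning.

    import warnings

    import numpy as np
    import pandas as pd

    dti = pd.date_range("2016-01-01", periods=2)
    offsets = np.array([pd.offsets.Day(1), pd.offsets.MonthEnd()], dtype=object)

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = dti + offsets  # element-wise object op, not vectorized
    assert any(issubclass(w.category, pd.errors.PerformanceWarning) for w in caught)
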
72 changes: 69 additions & 3 deletions pandas/core/ops/__init__.py
@@ -5,7 +5,7 @@
"""
import datetime
import operator
from typing import TYPE_CHECKING, Optional, Set, Tuple
from typing import TYPE_CHECKING, List, Optional, Set, Tuple

import numpy as np

@@ -58,6 +58,7 @@

if TYPE_CHECKING:
from pandas import DataFrame # noqa:F401
from pandas.core.internals.blocks import Block # noqa: F401

# -----------------------------------------------------------------------------
# constants
@@ -353,6 +354,70 @@ def fill_binop(left, right, fill_value):
# Dispatch logic


def operate_blockwise(left, right, array_op):
Contributor: OK, much nicer / more readable than before. I would consider moving this to a new module to de-nest these functions for blockwise operations (e.g. pandas/core/ops/blockwise.py), but that can be later.

Contributor: I think if you move this to ops/blockwise.py it would be a +1.

Contributor: Would really like to put this in a new file.

    assert right._indexed_same(left)

    res_blks: List["Block"] = []
    rmgr = right._data
    for n, blk in enumerate(left._data.blocks):
        locs = blk.mgr_locs

        blk_vals = blk.values

        if not isinstance(blk_vals, np.ndarray):
            # 1D EA
            assert len(locs) == 1, locs
Contributor: It might be clearer to make this section (368 to 374), for example, into a separate function, so the structure of what you are doing is more apparent.

            rser = right.iloc[:, locs[0]]
            rvals = extract_array(rser, extract_numpy=True)
            res_values = array_op(blk_vals, rvals)
            nbs = blk._split_op_result(res_values)
            res_blks.extend(nbs)
            continue

Contributor: Can you add some comments here (general + line comments as needed)?

        rblks = rmgr._slice_take_blocks_ax0(locs.indexer)

        for k, rblk in enumerate(rblks):
            lvals = blk_vals[rblk.mgr_locs.indexer, :]
            rvals = rblk.values

            if not isinstance(rvals, np.ndarray):
                # 1D EA
                assert lvals.shape[0] == 1, lvals.shape
                lvals = lvals[0, :]
                res_values = array_op(lvals, rvals)
                nbs = rblk._split_op_result(res_values)
                assert len(nbs) == 1
                nb = nbs[0]
                nb.mgr_locs = locs.as_array[nb.mgr_locs]
                res_blks.append(nb)
                continue

            assert lvals.shape == rvals.shape, (lvals.shape, rvals.shape)

            res_values = array_op(lvals, rvals)
            assert res_values.shape == lvals.shape, (res_values.shape, lvals.shape)
            nbs = rblk._split_op_result(res_values)
            for nb in nbs:
                # TODO: maybe optimize by sticking with slices?
                nb_mgr_locs = nb.mgr_locs
                nblocs = locs.as_array[nb_mgr_locs.indexer]
                nb.mgr_locs = nblocs
                assert len(nblocs) == nb.shape[0], (len(nblocs), nb.shape)
                assert all(x in locs.as_array for x in nb.mgr_locs.as_array)

            res_blks.extend(nbs)

    slocs = {y for nb in res_blks for y in nb.mgr_locs.as_array}
    nlocs = sum(len(nb.mgr_locs.as_array) for nb in res_blks)
    assert nlocs == len(left.columns), (nlocs, len(left.columns))
    assert len(slocs) == nlocs, (len(slocs), nlocs)
    assert slocs == set(range(nlocs)), slocs

    # TODO: once this is working, pass do_integrity_check=False
    new_mgr = type(rmgr)(res_blks, axes=rmgr.axes)
    return new_mgr
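
To make the payoff concrete, a rough sketch (assuming a build that includes this branch): for a wide single-dtype frame, the old column-by-column dispatch pays per-column Series overhead, while the block-wise path issues roughly one array op per block.

    import numpy as np
    import pandas as pd

    # A wide all-float64 frame is stored internally as a single 2-D block.
    df = pd.DataFrame(np.random.randn(1_000, 1_000))
    other = df.copy()

    # With block-wise dispatch this is roughly one ndarray addition rather
    # than 1,000 Series additions.
    result = df + other

    # In IPython, comparing before and after this branch:
    #   %timeit df + other
    # the wide-frame case should get markedly cheaper; exact numbers depend
    # on the build and machine.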


def dispatch_to_series(left, right, func, str_rep=None, axis=None):
"""
Evaluate the frame operation func(left, right) by evaluating
@@ -385,8 +450,9 @@ def dispatch_to_series(left, right, func, str_rep=None, axis=None):
elif isinstance(right, ABCDataFrame):
assert right._indexed_same(left)

def column_op(a, b):
return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
Member: An alternative would be to improve the performance of this line, and avoid the complexity that is added in operate_blockwise. It will not give the same performance boost as what you now have with the blockwise operation, but it might already be a nice speedup with much less complex code, so a trade-off to consider IMO.

Member Author: Regardless, this is worth doing on L343/424 and L348/429.

array_op = get_array_op(func, str_rep=str_rep)
bm = operate_blockwise(left, right, array_op)
return type(left)(bm)

elif isinstance(right, ABCSeries) and axis == "columns":
# We only get here if called via _combine_series_frame,
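
A conceptual sketch of the trade-off discussed above (illustrative only, not the internals API): the removed column_op applied func one column pair at a time, while the new path hands whole blocks to the array op.

    import numpy as np
    import pandas as pd

    left = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])
    right = pd.DataFrame(np.ones((3, 2)), columns=["a", "b"])

    # Column-wise, roughly what column_op did (the internals relabel the
    # integer-keyed result afterwards):
    col_result = pd.DataFrame(
        {i: left.iloc[:, i] + right.iloc[:, i] for i in range(len(left.columns))}
    )
    col_result.columns = left.columns

    # Block-wise: both frames here are a single float64 block, so the same
    # result falls out of one ndarray addition.
    blk_result = pd.DataFrame(
        left.values + right.values, columns=left.columns, index=left.index
    )

    pd.testing.assert_frame_equal(col_result, blk_result)
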
24 changes: 14 additions & 10 deletions pandas/core/ops/array_ops.py
@@ -5,6 +5,7 @@
from functools import partial
import operator
from typing import Any, Optional
import warnings

import numpy as np

@@ -132,7 +133,7 @@ def masked_arith_op(x: np.ndarray, y, op):
return result


def define_na_arithmetic_op(op, str_rep: str):
def define_na_arithmetic_op(op, str_rep: Optional[str]):
def na_op(x, y):
return na_arithmetic_op(x, y, op, str_rep)

@@ -163,15 +164,18 @@ def na_arithmetic_op(left, right, op, str_rep: Optional[str], is_cmp: bool = False):
"""
import pandas.core.computation.expressions as expressions

try:
result = expressions.evaluate(op, str_rep, left, right)
except TypeError:
if is_cmp:
# numexpr failed on comparison op, e.g. ndarray[float] > datetime
# In this case we do not fall back to the masked op, as that
# will handle complex numbers incorrectly, see GH#32047
raise
result = masked_arith_op(left, right, op)
with warnings.catch_warnings():
# suppress warnings from numpy about element-wise comparison
warnings.simplefilter("ignore", DeprecationWarning)
Contributor: What warning are we suppressing here, and is this future-proof? If this is the usual

    In [2]: np.array([1, 2]) == 'a'
    /Users/taugspurger/.virtualenvs/dask-dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      #!/Users/taugspurger/Envs/dask-dev/bin/python
    Out[2]: False

then in the future the result will be array([False, False]). Is that what we want?

Member Author: This warning-catching was necessary to make the npdev build pass; I'm going to see if I can revert it.

Member Author: Looks like catching this is needed in some form, but I can move the catching so as to cast a less-wide net.

try:
result = expressions.evaluate(op, str_rep, left, right)
except TypeError:
if is_cmp:
# numexpr failed on comparison op, e.g. ndarray[float] > datetime
# In this case we do not fall back to the masked op, as that
# will handle complex numbers incorrectly, see GH#32047
raise
result = masked_arith_op(left, right, op)

if is_cmp and (is_scalar(result) or result is NotImplemented):
# numpy returned a scalar instead of operating element-wise
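
Illustrative only: the kind of numpy warning the catch_warnings block above is meant to silence; the exact category has varied across numpy versions, which is what the reviewer is asking about.

    import warnings

    import numpy as np

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        res = np.array([1, 2]) == "a"
    # Older numpy warns (FutureWarning/DeprecationWarning about a failed
    # element-wise comparison) and returns a scalar False; newer numpy
    # quietly returns array([False, False]).
    print(res, [w.category.__name__ for w in caught])
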
9 changes: 8 additions & 1 deletion pandas/tests/arithmetic/common.py
@@ -70,7 +70,14 @@ def assert_invalid_comparison(left, right, box):
result = right != left
tm.assert_equal(result, ~expected)

msg = "Invalid comparison between|Cannot compare type|not supported between"
msg = "|".join(
[
"Invalid comparison between",
"Cannot compare type",
"not supported between",
"invalid type promotion",
]
)
with pytest.raises(TypeError, match=msg):
left < right
with pytest.raises(TypeError, match=msg):
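
A minimal illustration of the combined pattern (the sample error strings below are made up for demonstration): joining the alternatives with "|" gives pytest.raises a single regex that matches any of the expected TypeError messages.

    import re

    msg = "|".join(
        [
            "Invalid comparison between",
            "Cannot compare type",
            "not supported between",
            "invalid type promotion",
        ]
    )

    # Hypothetical messages, just to show the alternation matching:
    assert re.search(msg, "Invalid comparison between dtype=datetime64[ns] and int")
    assert re.search(msg, "'<' not supported between instances of 'Timestamp' and 'str'")
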
7 changes: 4 additions & 3 deletions pandas/tests/arithmetic/test_datetime64.py
@@ -964,7 +964,9 @@ def test_dt64arr_sub_dt64object_array(self, box_with_array, tz_naive_fixture):
obj = tm.box_expected(dti, box_with_array)
expected = tm.box_expected(expected, box_with_array)

warn = PerformanceWarning if box_with_array is not pd.DataFrame else None
warn = None
if box_with_array is not pd.DataFrame or tz_naive_fixture is None:
warn = PerformanceWarning
Member: Do you know why this changed? Do we or do we not raise a warning at the array level?

Member Author: We don't raise a warning when operating op(arr, length_1_object_array), which turns out to be the cases described here (in reverse).

with tm.assert_produces_warning(warn):
result = obj - obj.astype(object)
tm.assert_equal(result, expected)
@@ -1388,8 +1390,7 @@ def test_dt64arr_add_mixed_offset_array(self, box_with_array):
s = DatetimeIndex([Timestamp("2000-1-1"), Timestamp("2000-2-1")])
s = tm.box_expected(s, box_with_array)

warn = None if box_with_array is pd.DataFrame else PerformanceWarning
with tm.assert_produces_warning(warn):
with tm.assert_produces_warning(PerformanceWarning):
other = pd.Index([pd.offsets.DateOffset(years=1), pd.offsets.MonthEnd()])
other = tm.box_expected(other, box_with_array)
result = s + other
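
A sketch of the behavior the updated test asserts (assuming this branch; the frame construction here is simplified relative to tm.box_expected): with block-wise dispatch the DataFrame case now reaches the array-level object path and warns, as Series and Index already did.

    import pandas as pd
    import pandas._testing as tm
    from pandas.errors import PerformanceWarning

    dti = pd.DatetimeIndex([pd.Timestamp("2000-1-1"), pd.Timestamp("2000-2-1")])
    offsets = pd.Index([pd.offsets.DateOffset(years=1), pd.offsets.MonthEnd()])

    df = pd.DataFrame({"ts": dti})          # datetime64 column
    off_df = pd.DataFrame({"ts": offsets})  # object-dtype column of offsets

    # Adding object-dtype offsets is element-wise, hence the warning.
    with tm.assert_produces_warning(PerformanceWarning):
        result = df + off_df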