
PERF: repeated slicing along index in groupby #40353


Merged: 26 commits, Mar 16, 2021

Conversation

jbrockmendel (Member):

This cuts another 25% off of the benchmark discussed in #40171 (comment), getting us to within 3x of the Cython-path performance.

@@ -817,6 +818,15 @@ def get_slice(self, slobj: slice, axis: int = 0) -> BlockManager:

return type(self)._simple_new(tuple(new_blocks), new_axes)

def get_slice_index(self, slobj: slice) -> BlockManager:
Contributor:

can you add a get_slice_columns and change get_slice to just dispatch? (to share code)
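The suggested dispatch pattern can be sketched on a toy manager (a simplified stand-in, not the actual pandas BlockManager; the method names get_slice_index / get_slice_columns follow the comment above):

```python
class Manager:
    """Toy stand-in for a 2D block manager: blocks is a list of columns."""

    def __init__(self, blocks):
        self.blocks = blocks  # each block is a 1D list (one per column)

    def get_slice_index(self, slobj):
        # slice every block along the index (row) axis
        return Manager([blk[slobj] for blk in self.blocks])

    def get_slice_columns(self, slobj):
        # take a contiguous subset of the blocks themselves
        return Manager(self.blocks[slobj])

    def get_slice(self, slobj, axis=0):
        # generic entry point just dispatches to the specialized methods
        if axis == 0:
            return self.get_slice_columns(slobj)
        elif axis == 1:
            return self.get_slice_index(slobj)
        raise IndexError(f"Requested axis not found in manager: {axis}")
```

Callers on a hot path can then invoke the specialized method directly and skip the axis check entirely.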

jbrockmendel (Member Author):

The only code-sharing I see available would be to change L819-L819 to return self.get_slice_index(slobj), which only saves one line.

Flip side: one of the things I'm looking at is making another attempt at getting axes out of BlockManager (and ArrayManager), which would marginally widen the scope for sharing.

Member:

What does this specialized version avoid? Isn't it only the if axis == 0: / elif axis == 1: checks?
(Which I assume are quite fast checks, although if done repeatedly they might indeed still be worth avoiding.)

The other difference is that get_slice does

            slicer = (slice(None), slobj)
            new_blocks = [blk.getitem_block(slicer) for blk in self.blocks]

while this new function does:

new_blocks = [blk.getitem_block_index(slobj) for blk in self.blocks]

but get_slice could do that as well?

Member:

Can you answer this?

jbrockmendel (Member Author):

Isn't it only the if axis == 0: / elif axis == 1: checks? (which I assume are quite fast checks, although if done repeatedly might still be worth avoiding)

Yes. The code I'm profiling to optimize is (#40171 (comment))

import numpy as np
import pandas as pd

N = 10 ** 4
labels = np.random.randint(0, 2000, size=N)
labels2 = np.random.randint(0, 3, size=N)
df = pd.DataFrame(
    {
        "key": labels,
        "key2": labels2,
        "value1": np.random.randn(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)

%prun -s cumtime for i in range(100): df.groupby("key").apply(lambda x: 1)  # on master as of this writing

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.170    4.170 {built-in method builtins.exec}
        1    0.001    0.001    4.170    4.170 <string>:1(<module>)
      100    0.001    0.000    4.156    0.042 groupby.py:911(apply)
      100    0.004    0.000    4.150    0.042 groupby.py:960(_python_apply_general)
      100    0.497    0.005    4.084    0.041 ops.py:263(apply)
   199000    0.176    0.000    3.223    0.000 ops.py:1005(__iter__)
   198900    0.248    0.000    2.986    0.000 ops.py:1046(_chop)
   198900    0.339    0.000    2.522    0.000 managers.py:796(get_slice)
   198900    0.188    0.000    1.631    0.000 managers.py:803(<listcomp>)
   596700    0.823    0.000    1.443    0.000 blocks.py:324(getitem_block)
   198900    0.135    0.000    0.409    0.000 base.py:4638(_getitem_slice)

We're still spending about 8% of the runtime in get_slice itself (0.339s tottime of 4.17s total), which is why I think avoiding re-doing these checks is worthwhile.

but get_slice could do that as well?

Sure, we could update get_slice to also use getitem_block_index (I think the first draft of this PR started before get_slice reliably got an actual slice object).

Member:

we're still spending 8% of the runtime in get_slice itself, which is why i think avoiding re-doing these checks is worthwhile.

And did that number reduce by using get_slice_index instead?

sure, could update get_slice to also use getitem_block_index

Yes, let's at least make the code consistent there

Member:

Using this branch, timing mgr.get_slice_index(slobj) vs mgr.get_slice(slobj, axis=1) (but making both use getitem_block_index, and removing the assert), I only get a tiny difference: less than 5%, and that is on something that only takes 8% of the total runtime.

jbrockmendel (Member Author):

I'm seeing 17-22% gains with just getitem_block_index and not the extra BlockManager/ArrayManager methods, so I will revert those for this PR.

@jreback added the Groupby and Performance (memory or execution speed) labels Mar 11, 2021
@@ -349,6 +349,15 @@ def __getitem__(self, item: Any) -> Any:
"Only integers, slices and integer or "
"boolean arrays are valid indices."
)
elif isinstance(item, tuple):
# possibly unpack arr[:, n] to arr[n]
Member:

Why is this needed for the Arrow string array? That's invalid indexing for a 1D array, and should raise an IndexError (as numpy does, or as we do for e.g. StringArray).

jbrockmendel (Member Author):

Typo; the comment should read arr[..., n]. Will update.
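The unpacking under discussion can be illustrated with a small standalone helper (unpack_ellipsis_key is a hypothetical name for illustration, not the actual pandas implementation). For 1D input, numpy lets Ellipsis expand to zero axes, so arr[..., n] is equivalent to arr[n]:

```python
def unpack_ellipsis_key(item):
    """Normalize arr[..., n] / arr[n, ...] style keys for a 1D array.

    For 1D input, Ellipsis expands to zero axes (matching numpy), so a
    2-tuple containing Ellipsis reduces to its single other element.
    """
    if isinstance(item, tuple) and len(item) == 2:
        left, right = item
        if left is Ellipsis:
            return right
        if right is Ellipsis:
            return left
    return item
```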

Member:

Even for arr[..., n], why is it needed specifically here for Arrow string array? That also doesn't work for StringArray

jbrockmendel (Member Author):

Got a test failure without this; will add a base extension test to require it for all EAs.

Member:

Do you remember which test?

Member:

(BTW, I am not sure we should require this behaviour from EAs, as we can't be consistent with numpy in returning a 0-dim array)

jbrockmendel (Member Author):

Do you remember which test?

test_groupby_extension_apply * 8


@jreback added this to the 1.3 milestone Mar 12, 2021
@@ -1012,19 +1012,27 @@ def __iter__(self):

starts, ends = lib.generate_slices(self.slabels, self.ngroups)

for i, (start, end) in enumerate(zip(starts, ends)):
yield i, self._chop(sdata, slice(start, end))
if self.axis == 0:
Contributor:

can make this more readable by:

    method = self._chop_index if self.axis == 0 else self._chop_column

    for i, ...:
        yield i, method(...)
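The suggestion can be sketched on a toy splitter (a simplified stand-in; _chop_index / _chop_column are the hypothetical names from the comment, and the real pandas splitter chops managers rather than lists):

```python
class Splitter:
    """Toy illustration: hoist the axis check out of the iteration loop."""

    def __init__(self, data, starts, ends, axis=0):
        self.data = data      # list of rows
        self.starts = starts  # group start positions
        self.ends = ends      # group end positions
        self.axis = axis

    def _chop_index(self, slobj):
        # slice along the rows
        return self.data[slobj]

    def _chop_column(self, slobj):
        # slice along the columns of every row
        return [row[slobj] for row in self.data]

    def __iter__(self):
        # resolve the chop method once, instead of re-checking axis per group
        method = self._chop_index if self.axis == 0 else self._chop_column
        for i, (start, end) in enumerate(zip(self.starts, self.ends)):
            yield i, method(slice(start, end))
```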

@@ -245,6 +245,17 @@ def test_getitem_slice(self, data):
result = data[slice(1)] # scalar
assert isinstance(result, type(data))

def test_getitem_ellipsis_and_slice(self, data):
# GH#40353 this is called from getitem_block_index
result = data[..., :]
Member:

Also test data[:, ...]?

jbrockmendel (Member Author):

good idea, will update
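The extended check could look roughly like this (check_getitem_ellipsis_and_slice is a hypothetical free-standing version; the real base extension test is a method that receives each EA's data fixture). Demonstrated here on a plain numpy array, which supports both key forms on 1D input:

```python
import numpy as np

def check_getitem_ellipsis_and_slice(data):
    # GH#40353: getitem_block_index can produce keys like data[..., slice]
    result = data[..., :]
    assert isinstance(result, type(data))
    assert list(result) == list(data)

    # the suggested additional case: Ellipsis on the right
    result = data[:, ...]
    assert isinstance(result, type(data))
    assert list(result) == list(data)

check_getitem_ellipsis_and_slice(np.array([1, 2, 3]))
```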

jbrockmendel (Member Author):

I think following today's call we're all set here?

jreback (Contributor) commented Mar 16, 2021:

Guess I merged in the wrong order. Can you merge master and ping on green?

jbrockmendel (Member Author) commented Mar 16, 2021:

@jorisvandenbossche any guesses why the ArrayManager SQL tests are breaking on a bunch of PRs?

Update: looks like they break for non-ArrayManager builds too; that's just the only place we have them running with up-to-date SQLAlchemy.

jorisvandenbossche (Member):

Yes, it seems there are some new deprecation warnings with the new SQLAlchemy release.

jbrockmendel (Member Author):

Green, except for the unrelated SQLAlchemy issue.

jreback (Contributor) commented Mar 16, 2021:

Merging this; the SQLAlchemy patch can be a follow-up.
