Performance Regression in frame_methods.Equals.time_frame_float_equal #35249

Closed
TomAugspurger opened this issue Jul 12, 2020 · 5 comments · Fixed by #35328
Labels: Performance (Memory or execution speed performance)
Milestone: 1.1

Comments

@TomAugspurger (Contributor) commented on Jul 12, 2020

https://pandas.pydata.org/speed/pandas/index.html#frame_methods.Equals.time_frame_float_equal?commits=77f4a4d673e2d2b3f7396a94818771d332bd913c

Some discussion starting at #34962 (comment)

@jbrockmendel what needs to be done for 1.1: just optimize array_equivalent, or something larger? Is this a blocker (and if so, can you add the blocker tag)?
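
For reference, the regressed benchmark does roughly the following (a sketch inferred from the benchmark name and the reproduction further down this thread; the exact setup lives in pandas' asv_bench suite):

    import numpy as np
    import pandas as pd

    # Rough reproduction of frame_methods.Equals.time_frame_float_equal:
    # an all-float 1000x1000 frame compared against itself.
    N = 10 ** 3
    float_df = pd.DataFrame(np.random.randn(N, N))
    float_df.equals(float_df)  # the call being timed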

TomAugspurger added the Bug and Needs Triage labels on Jul 12, 2020
TomAugspurger added this to the 1.1 milestone on Jul 12, 2020
TomAugspurger added the Performance label and removed the Bug and Needs Triage labels on Jul 12, 2020
@jbrockmendel (Member)

Looks like optimizing array_equivalent for the floating case gets a 3x speedup. We'll need to operate blockwise to get the rest back.
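
A minimal sketch of what that floating-point fast path could look like (hypothetical helper name; the real code path is array_equivalent in pandas.core.dtypes.missing):

    import numpy as np

    def float_array_equivalent(left: np.ndarray, right: np.ndarray) -> bool:
        # Hypothetical fast path for float/complex arrays: NaNs in matching
        # positions compare equal, everything else must match exactly.
        # np.isnan is much cheaper than the generic object-level comparison.
        left, right = left.ravel(), right.ravel()
        if left.shape != right.shape:
            return False
        return bool(((left == right) | (np.isnan(left) & np.isnan(right))).all())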

@jreback (Contributor) commented on Jul 16, 2020

I guess we didn't merge #35256.

@TomAugspurger (Contributor, Author)

Yeah. I started on some small changes that get a chunk of the performance back by:

  1. Using np.isnan in array_equivalent for float / complex.
  2. Noting that NDFrame.equals requires that dtypes match, so we can skip a bunch of dtype checks.

But these just get us partway back. We're still orders of magnitude slower for the somewhat wide table case.

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: N = 10**3
In [4]: float_df = pd.DataFrame(np.random.randn(N, N))
# 1.0.5
1.52 ms ± 94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# master
83.6 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# branch
31.2 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

One thought, though: this new implementation is better in some cases since it short-circuits. As soon as there's a column that differs we return False. So that's something.
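
(The timings above presumably measure float_df.equals(float_df), the call behind time_frame_float_equal.) The short-circuiting amounts to something like the following column-by-column loop (an illustrative sketch, not the actual pandas internals):

    import pandas as pd

    def equals_columnwise(left: pd.DataFrame, right: pd.DataFrame) -> bool:
        # Illustrative sketch: compare column by column, treating NaNs in the
        # same positions as equal (as DataFrame.equals does), and bail out as
        # soon as one column differs. Index comparison is omitted for brevity.
        if left.shape != right.shape or not left.columns.equals(right.columns):
            return False
        for col in left.columns:
            a, b = left[col].to_numpy(), right[col].to_numpy()
            if a.dtype != b.dtype:
                return False  # equals() requires matching dtypes
            if not ((a == b) | (pd.isna(a) & pd.isna(b))).all():
                return False  # short circuit on the first mismatching column
        return True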

@jbrockmendel (Member)

The "correct" fix is to operate blockwise. I'm happy to implement that, and it can re-use a bunch of code from the blockwise arithmetic. But I'm not inclined to spend a lot of time on blockwise stuff if we're going all-1D.

@TomAugspurger (Contributor, Author)

Mostly agreed. I think for now I'll push up my small optimizations, and we can accept some slowdown for wide dataframes.
