
PERF: DataFrame construction #42631


Merged
merged 5 commits into from
Jul 23, 2021

Conversation

jbrockmendel
Member

In [1]: from asv_bench.benchmarks.frame_ctor import *

In [2]: cls = FromArrays

In [3]: self = cls()

In [4]: self.setup()

In [5]: %timeit self.time_frame_from_arrays_int()
6.21 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- master
3.27 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR
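
For readers without the asv checkout, a rough stand-alone timing can be sketched as below. This is not the `FromArrays` benchmark itself: the sizes, the `build_frame` helper, and the use of the public `pd.DataFrame` constructor are all assumptions made for illustration, so the absolute numbers will not match the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes, chosen only to keep the run short.
N_ROWS, N_COLS = 1000, 100

rng = np.random.default_rng(0)
# One nullable-integer ("Int64") extension array per column.
int_arrays = {
    col: pd.array(rng.integers(0, 1000, size=N_ROWS), dtype="Int64")
    for col in range(N_COLS)
}


def build_frame() -> pd.DataFrame:
    # Public constructor; the asv benchmark may exercise a different code path.
    return pd.DataFrame(int_arrays)


# Best of 5 repeats, 10 constructions each, reported per construction.
best = min(timeit.repeat(build_frame, number=10, repeat=5)) / 10
print(f"{best * 1e3:.2f} ms per construction")
```

Running this once on master and once on the PR branch gives a coarse before/after comparison; asv remains the better tool for anything that needs to be reported.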

@jreback jreback added the Performance Memory or execution speed performance label Jul 22, 2021
@jreback jreback added this to the 1.4 milestone Jul 22, 2021
@jreback jreback added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 22, 2021
@jreback
Contributor

jreback commented Jul 22, 2021

Cool, I would actually add a release note about this, as it is a significant perf improvement. Does this hold more generally for the other construction asvs?

Also, do you think it's removing the asserts or no longer using is_sparse that is the big win here? (Guessing it's the asserts.)

@jbrockmendel
Member Author

Also, do you think it's removing the asserts or no longer using is_sparse that is the big win here? (Guessing it's the asserts.)

I forget the exact numbers, but I remember being surprised by how big a difference the is_sparse change made.

@jreback jreback merged commit 9d6f8fe into pandas-dev:master Jul 23, 2021
@jbrockmendel jbrockmendel deleted the perf-construction branch July 23, 2021 23:24
@rhshadrach
Member

I'm seeing a failure in the groupby.apply doctest on master. The condition that leads to raising the error is

if newb.shape != self.shape:

Looks like shape is being cached here - might be related?
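
To illustrate why a cached shape could trip a check like the one above, here is a toy sketch. It assumes the cache behaves like `functools.cached_property`; the thread does not say how the caching was implemented, and `CachedShapeBlock` is a hypothetical stand-in, not the pandas Block class.

```python
from functools import cached_property


class CachedShapeBlock:
    """Toy stand-in for a block that holds 2D data as a list of rows."""

    def __init__(self, values):
        self.values = values

    @cached_property
    def shape(self):
        # Computed on first access, then stored on the instance.
        n_cols = len(self.values[0]) if self.values else 0
        return (len(self.values), n_cols)


block = CachedShapeBlock([[1, 2], [3, 4]])
first = block.shape            # computed and cached
block.values = [[1, 2, 3]]     # underlying data replaced afterwards
stale = block.shape            # still the cached value
fresh = (len(block.values), len(block.values[0]))

print(first, stale, fresh)     # (2, 2) (2, 2) (1, 3)
```

Once the underlying values change after the first access, any comparison against `shape` sees the old value, which is one way a shape-equality check could start misfiring.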

@jbrockmendel
Member Author

good catch, that caching should be reverted (and ideally a non-doctest test implemented)

simonjayhawkins added a commit that referenced this pull request Jul 24, 2021
simonjayhawkins added a commit that referenced this pull request Jul 24, 2021
@jbrockmendel jbrockmendel mentioned this pull request Jul 24, 2021
4 tasks
CGe0516 pushed a commit to CGe0516/pandas that referenced this pull request Jul 29, 2021
CGe0516 pushed a commit to CGe0516/pandas that referenced this pull request Jul 29, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021