Skip to content

PERF: indexing #43274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

debnathshoham
Copy link
Member

@debnathshoham debnathshoham commented Aug 28, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Added a check as suggested in #39280 (comment)

In [1]: from asv_bench.benchmarks.indexing import InsertColumns

In [2]: self = InsertColumns()

In [3]: self.setup()

In [4]: %timeit self.time_assign_with_setitem()
27.6 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) #master

In [4]: %timeit self.time_assign_with_setitem()
8.6 ms ± 6.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) #PR

Edit:

       before           after         ratio
     [2b3a9b16]       [fa9c7140]
     <master>         <indexing.InsertColumns.time_assign_with_setitem>
-         179±4μs        163±0.6μs     0.91  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        838±50μs          759±7μs     0.90  indexing.NonNumericSeriesIndexing.time_getitem_list_like('datetime', 'nonunique_monotonic_inc')
-        37.7±4μs       34.0±0.5μs     0.90  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-     2.91±0.09ms       2.61±0.1ms     0.90  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        60.1±2μs       54.0±0.5μs     0.90  indexing.NumericSeriesIndexing.time_iloc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.56±0.03ms      1.40±0.04ms     0.90  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-        183±20μs          164±1μs     0.89  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     2.39±0.05ms         2.14±0ms     0.89  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-        879±30μs         780±50μs     0.89  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_incr')
-        184±20μs          163±1μs     0.88  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-         131±7μs          115±4μs     0.87  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     3.02±0.03ms      2.63±0.09ms     0.87  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     3.19±0.04ms      2.78±0.04ms     0.87  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      11.7±0.2ms       10.2±0.2ms     0.87  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-     1.89±0.04ms         1.65±0ms     0.87  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-      4.63±0.1ms      3.99±0.02ms     0.86  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.82±0.04ms      1.56±0.01ms     0.86  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-      3.69±0.1ms       3.14±0.1ms     0.85  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-     4.08±0.06ms      3.43±0.02ms     0.84  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-        150±30μs          120±3μs     0.80  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        961±30μs         762±20μs     0.79  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_decr')
-        295±30μs          231±3μs     0.78  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-      1.11±0.3ms         825±20μs     0.74  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt8Engine'>, <class 'numpy.uint8'>), 'monotonic_incr')
-        741±80μs          288±9μs     0.39  indexing.AssignTimeseriesIndex.time_frame_assign_timeseries_index
-        49.3±2ms       18.2±0.2ms     0.37  indexing.InsertColumns.time_assign_with_setitem
-       54.3±20ms         18.5±1ms     0.34  indexing.InsertColumns.time_assign_list_like_with_setitem

@debnathshoham debnathshoham changed the title PERF: indexing.InsertColumns.time_assign_with_setitem PERF: indexing Aug 28, 2021
@jbrockmendel
Copy link
Member

nice cc @phofl

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

can u run the indexing asvs and report here

@phofl
Copy link
Member

phofl commented Aug 28, 2021

Really nice, good catch.

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

also pls add a whatsnew note for 1.4

@mzeitlin11
Copy link
Member

Could also go in 1.3.3, looks like this helps top regression here: https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x

@phofl
Copy link
Member

phofl commented Aug 28, 2021

Yeah good point, this check was added in 1.3

@debnathshoham
Copy link
Member Author

should I put the whatsnew in 1.3.3 fixed regression or under others?
@jreback - I have added the indexing asvs report in the top comment

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

great yeah 1.3.3 is fine (add a perf section)


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify in which cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @phofl - updated. Let me know if this is fine.

@simonjayhawkins simonjayhawkins added this to the 1.3.3 milestone Aug 29, 2021
@simonjayhawkins simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Aug 29, 2021
@simonjayhawkins
Copy link
Member

simonjayhawkins commented Aug 29, 2021

Could also go in 1.3.3, looks like this helps top regression here: https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x

side note: class InsertColumns was updated in #42998. Changing the code or setup code invalidates previous results for that benchmark. So that regression is not currently reported.

@simonjayhawkins
Copy link
Member

side note: class InsertColumns was updated in #42998. Changing the code or setup code invalidates previous results for that benchmark. So that regression is not currently reported.

benchmarks have been re-run and gh-pages updated.


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` with dataframes containing unique columns (:issue:`43274`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not specific enough. If key is any list-like or frame we don't hit this at all. Key has to be a scalar I think and value is not a df

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. let me know how this looks


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` when the key or value is not a dataframe, or key is not list-like (:issue:`43274`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:class:DataFrame, otherwise lgtm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @phofl merge when ready.

~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` when the key or value is not a :class:`DataFrame`, or key is not list-like (:issue:`43274`)
-
-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a blank line, docs aren't building like that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! I see. I was confused why the docs were consistently failing.

@phofl phofl merged commit 1009a7d into pandas-dev:master Aug 30, 2021
@phofl
Copy link
Member

phofl commented Aug 30, 2021

thx @debnathshoham

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 30, 2021
@debnathshoham debnathshoham deleted the indexing.InsertColumns.time_assign_with_setitem branch August 30, 2021 15:49
phofl pushed a commit that referenced this pull request Aug 30, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
* PERF: indexing.InsertColumns.time_assign_with_setitem

* added whatsnew

* more descriptive whatsnew

* rectified whatsnew

* added PR ref in whatsnew

* clear whatsnew

* suggested change

* added blank line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants