Place the calculation of mask prior to the calls of comp in replace_list to improve performance #35229
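The idea in the title is to hoist a loop-invariant computation: the "not missing" mask depends only on the values, not on the item being replaced, so it can be computed once before the per-item comparisons instead of inside each one. The following is a minimal pure-Python sketch of that pattern, not the pandas implementation. The names `is_missing`, `masks_recomputed` and `masks_hoisted` are hypothetical and exist only for this illustration.

```python
import math


def is_missing(x):
    # Hypothetical helper: treat None and float NaN as missing values.
    return x is None or (isinstance(x, float) and math.isnan(x))


def masks_recomputed(values, targets):
    # Slow pattern: the validity mask is rebuilt for every target,
    # although it does not depend on the target at all.
    masks = []
    for target in targets:
        valid = [not is_missing(v) for v in values]
        masks.append([ok and v == target for ok, v in zip(valid, values)])
    return masks


def masks_hoisted(values, targets):
    # Faster pattern: build the validity mask once, then reuse it
    # for every comparison.
    valid = [not is_missing(v) for v in values]
    return [
        [ok and v == target for ok, v in zip(valid, values)]
        for target in targets
    ]


values = ["a", None, "b", "a", float("nan")]
targets = ["a", "b"]

# Both variants produce identical masks; only the amount of work differs.
assert masks_recomputed(values, targets) == masks_hoisted(values, targets)
```

With `k` targets, the first variant scans the values for missing entries `k` times and the second only once, which is presumably why the gain is largest when many values are replaced at once.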

Merged
jreback merged 11 commits into pandas-dev:master from improve-replace_list-performance on Jul 15, 2020

Conversation

chrispe
Contributor

@chrispe chrispe commented Jul 11, 2020

@chrispe
Contributor Author

chrispe commented Jul 11, 2020

I've included the change in the whatsnew document of the 1.1 milestone. Do you think we can include it in that release if this PR is approved in time? It would be nice, since this change is related to #32890, which is also in 1.1.

Contributor

@jreback jreback left a comment

can you show the results of the asv's

@jreback jreback added the Performance (memory or execution speed performance) and Strings (string extension data type and string data) labels Jul 13, 2020
@jreback jreback added this to the 1.1 milestone Jul 13, 2020
@chrispe
Contributor Author

chrispe commented Jul 14, 2020

> can you show the results of the asv's

       before           after         ratio
     [0ed1dcd5]       [5b6912c5]
     <master>         <improve-replace_list-performance>
-      27.2±0.8μs      24.7±0.04μs     0.91  tslibs.timestamp.TimestampAcrossDst.time_replace_across_dst
-        41.3±3μs       37.2±0.4μs     0.90  tslibs.timestamp.TimestampOps.time_replace_tz(datetime.timezone(datetime.timedelta(seconds=3600)))
-        9.03±3μs       8.02±0.1μs     0.89  tslibs.timestamp.TimestampOps.time_replace_None(tzfile('/usr/share/zoneinfo/US/Central'))
-        115±10ms         82.1±9ms     0.72  replace.Convert.time_replace('Series', 'Timedelta')
-         5.33±0s        3.59±0.1s     0.67  replace.ReplaceDict.time_replace_series(True)
-        130±20ms         85.2±8ms     0.66  replace.Convert.time_replace('Series', 'Timestamp')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
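For reading the table above: the ratio column is the "after" time divided by the "before" time, so values below 1 mean the branch is faster. A minimal sketch that recomputes the ratio and the corresponding speedup for the `replace.ReplaceDict.time_replace_series(True)` row from the displayed means (the helper name `asv_ratio` is hypothetical). Because the displayed means are rounded, a ratio recomputed this way can differ from the printed one in the last digit for some rows.

```python
def asv_ratio(before, after):
    # Ratio as reported in the comparison table: after / before.
    return after / before


# Displayed means for replace.ReplaceDict.time_replace_series(True), in seconds.
before_s, after_s = 5.33, 3.59

ratio = asv_ratio(before_s, after_s)
speedup = 1 / ratio

print(f"ratio={ratio:.2f}, speedup={speedup:.2f}x")  # ratio=0.67, speedup=1.48x
```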

Contributor

@jreback jreback left a comment

minor comment, pls ping on green.

@@ -856,6 +856,7 @@ Performance improvements
- Performance improvement in arithmetic operations (sub, add, mul, div) for MultiIndex (:issue:`34297`)
- Performance improvement in `DataFrame[bool_indexer]` when `bool_indexer` is a list (:issue:`33924`)
- Significant performance improvement of :meth:`io.formats.style.Styler.render` with styles added with various ways such as :meth:`io.formats.style.Styler.apply`, :meth:`io.formats.style.Styler.applymap` or :meth:`io.formats.style.Styler.bar` (:issue:`19917`)
- Performance improvement in the :func:`Series.replace` when `to_replace` is a dict (:issue:`33920`)
Contributor

Does this improve performance relative to 1.0.5, or just master? We only need this if it's faster relative to 1.0.5

Contributor

Locally, I have this at ~1.5x slower than 1.0.5 on the example from #33920. So seems like we could still use some work to get us back to 1.0.x speeds, as long as we don't sacrifice correctness.

Contributor Author

@chrispe chrispe Jul 15, 2020

Yes indeed, those improvements are only relative to master. So maybe it doesn't make sense to list this change in the whatsnew document at all; I'll remove it. However, this should still be merged into 1.1.0 (if we make it in time), right? We can then keep improving on it.

@jreback jreback merged commit 8090479 into pandas-dev:master Jul 15, 2020
@jreback
Contributor

jreback commented Jul 15, 2020

thanks @chrispe92

happy to take additional perf improvements in strings or anywhere :->

@chrispe chrispe deleted the improve-replace_list-performance branch July 16, 2020 07:08
fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jul 16, 2020
Labels
Performance (memory or execution speed performance), Strings (string extension data type and string data)
Development

Successfully merging this pull request may close these issues.

Performance regression in replace.ReplaceDict.time_replace_series
3 participants