[PERF] Get rid of MultiIndex conversion in IntervalIndex.is_monotonic methods #25820

makbigc · 2019-03-21T11:56:57Z

xref PERF: Avoid MultiIndex conversion for IntervalIndex methods #24813
passes git diff upstream/master -u -- "*.py" | flake8 --diff

All benchmarks:

       before           after         ratio
     [85572de5]       [445fbe56]
     <master>         <GH24813-inc-dec>
-         261±6ns          134±3ns     0.51  index_object.IntervalIndexMethod.time_monotonic_inc

codecov · 2019-03-21T12:33:12Z

Codecov Report

Merging #25820 into master will decrease coverage by <.01%.
The diff coverage is 75%.

@@            Coverage Diff             @@
##           master   #25820      +/-   ##
==========================================
- Coverage   91.27%   91.26%   -0.01%     
==========================================
  Files         173      173              
  Lines       53002    53007       +5     
==========================================
+ Hits        48375    48379       +4     
- Misses       4627     4628       +1

Flag	Coverage Δ
#multiple	`89.83% <75%> (-0.01%)`	⬇️
#single	`41.76% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`94.94% <75%> (-0.31%)`	⬇️
pandas/util/testing.py	`89.4% <0%> (+0.1%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f39a9ce...445fbe5. Read the comment docs.

codecov · 2019-03-21T12:33:14Z

Codecov Report

Merging #25820 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25820      +/-   ##
==========================================
- Coverage   91.98%   91.97%   -0.01%     
==========================================
  Files         175      175              
  Lines       52377    52377              
==========================================
- Hits        48180    48176       -4     
- Misses       4197     4201       +4

Flag	Coverage Δ
#multiple	`90.53% <100%> (ø)`	⬆️
#single	`40.71% <0%> (-0.15%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`95.22% <100%> (ø)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`96.9% <0%> (-0.12%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ab84a0...480b353. Read the comment docs.

pandas/core/indexes/interval.py

jschendel · 2019-03-25T21:08:49Z

Can you add a whatsnew note to pandas/doc/source/whatsnew/v0.25.0.rst under the "Performance Improvements" section?

jreback · 2019-03-25T23:45:44Z

asv_bench/benchmarks/index_object.py

+class IntervalIndexMethod(object):
+    # GH 24813
+    def setup(self):
+        N = 10**5


can you add N=10**3 as well

jreback · 2019-03-30T19:22:43Z

can you update

topper-123 · 2019-04-01T21:16:51Z

pandas/core/indexes/interval.py

+        values = [self.right, self.left]
+
+        sort_order = np.lexsort(values)
+        return Index(sort_order).is_monotonic


Hey @makbigc,

Thansk for contributing.

I worked on CategoricalIndex.is_monotonic_increasing a while ago and I can remember that instantiating an Index was quite heavy, probably because a lot of things are being set up when instantiation an Index. Can you try using pd._libs.algos.is_monotonic(sort_order, False) instead of making an index. Should be even faster.

I get from a quick timeit test:

>>> N = 10**5 >>> left = np.append(np.arange(N), np.array(0)) >>> right = np.append(np.arange(1, N + 1), np.array(1)) >>> sort_order = np.lexsort([left, right]) >>> %timeit pd.Index(sort_order).is_monotonic 42.8 µs ± 3.48 µs per loop >>> %timeit pd._libs.algos.is_monotonic(sort_order, False)[0] 4.15 µs ± 123 ns per loop

So a 10x improvement.

Architectually, can you also see if it makes sense to add InterValTree.is_monotonic_[increasing|decreasing] methods using _libs.algos.is_monotonic and then in IntervalIndex.is_monotonic_increasing just make a call to self._engine.is_monotonic_increasing. That would make the call structure for IntervalIndex similar to the one for other indexes, see e.g. Index.is_monotonic_increasing.

@jreback, do you agree?

@topper-123 Thanks for your elucidative advice. The use of algos.is_monotonic does improve the performance.

All benchmarks: before after ratio [23650ef5] [3e32db5b] <GH24813-inc-dec~2> <GH24813-inc-dec> 213±3ns 210±2ns 0.98 index_object.IntervalIndexMethod.time_monotonic_inc(1000) 268±1ns 260±2ns 0.97 index_object.IntervalIndexMethod.time_monotonic_inc(100000)

While putting is_monotonic_increasing in IntervalIndex._engine slows down the performance. I guess this is due to the formation of IntervalTree is expensive in the _engine method. The order is increased from ns to ms.

You're not really measuring the speed of the operation in the above benchmark. Notice that @cache_readonly above the is_monotonic_increasing method. That means that the result of the call to is_monotonic_increasing replaces the method itelf upon first call, so you're just measuring attribute access speed.

The formation of IntervalTree should happen only once, so should be in the ns-range also on subsequent calls. Are you forgtting to cache the result of IntervalTree.is_monotonic_increasing? If it resly is being constructed anew each time, can you post an issue to the issue list.

topper-123 · 2019-04-01T21:25:55Z

asv_bench/benchmarks/index_object.py

+        self.intv = IntervalIndex.from_arrays(left, right)
+
+    def time_monotonic_inc(self):
+        self.intv.is_monotonic_increasing


Just note that this measures access speed to the cached value, not the time cost of the underlying operation. To time the underlying method, you need to do pd.IntervalIndex.is_monotonic_increasing.func(self.intv), but that hits a cached value at self.intv._multiindex.is_monotonic_increasing in master, so will still seem to be in the nanosecond range, but really won't be because it's just the caching doing tricks.

You're right. The performance is even better. The _engine is set in prior in asv setup method.

All benchmarks: before after ratio [a1fee919] [6fb6d2cb] <master> <GH24813-inc-dec> 229±1ns 209±1ns 0.91 index_object.IntervalIndexMethod.time_monotonic_inc(1000) - 450±10ns 260±2ns 0.58 index_object.IntervalIndexMethod.time_monotonic_inc(100000)

jreback · 2019-04-05T00:39:08Z

@makbigc can you merge master, also @topper-123 offered some nice suggestions, see if they can work

topper-123 · 2019-04-07T09:03:56Z

pandas/core/indexes/interval.py


    @cache_readonly
    def is_monotonic_decreasing(self):
        """
        Return True if the IntervalIndex is monotonic decreasing (only equal or
        decreasing values), else False
        """
-        return self._multiindex.is_monotonic_decreasing
+        return self[::-1].is_monotonic_increasing


I would prefer checking monotonic_decreasing directly, so:

values = [self.right, self.left] sort_order = np.lexsort(values) return is_monotonic(sort_order, False)[1]

(But adding the properties to IntervalTree is even better).

I have tried in this way. But is_monotonic(sort_order, False)[1] returns False for stationary IntervalIndex, e.g. IntervalIndex([(0, 1], (0, 1]], closed='right', dtype='interval[int64]'). By definition, is_monotonic_decreasing should be True for equal values.

is_monotonic(sort_order, False)[0] returns True for the stationary IntervalIndex.

You are right, as lexsort does treats equal values as increasing their index location, so is_monotonic(...)[1] will fail.

It would still deliver efficiency gains from calculating is_monotonic_increasing and unique simultaneous using is_monotonic in IntervalTree as described above. Would you be up for adding this? Else this PR is good IMO.

When implementing is_unique in this way, I got a similar problem. That is, is_monotonic(sort_order, False)[2] returns False for an IntervalIndex with consecutively identity element, e.g. IntervalIndex([(0, 1], (0, 1], (2, 3]]) .

IntervalIndex.is_unique is being handled in #25159

jreback · 2019-04-20T17:02:02Z

can you merge master and ping on green.

jreback · 2019-04-21T16:02:30Z

thanks @makbigc

makbigc · 2019-04-22T02:41:36Z

thanks @topper-123 @jschendel and @jreback

… methods (pandas-dev#25820)

jschendel reviewed Mar 21, 2019

View reviewed changes

pandas/core/indexes/interval.py Outdated Show resolved Hide resolved

jschendel added Performance Memory or execution speed performance Interval Interval data type labels Mar 21, 2019

jreback requested changes Mar 22, 2019

View reviewed changes

pandas/core/indexes/interval.py Outdated Show resolved Hide resolved

jreback added this to the 0.25.0 milestone Mar 22, 2019

jreback requested changes Mar 25, 2019

View reviewed changes

topper-123 requested changes Apr 1, 2019

View reviewed changes

topper-123 reviewed Apr 7, 2019

View reviewed changes

makbigc added 9 commits April 14, 2019 23:15

Get rid of _multiindex

7cb99e1

Add benchmark test

0c54e55

Decrease test_monotonic_inc by one order and change is_monotonic

e9f2601

Remove the exception for the mixed type

8c5c580

Add a whatsnew note

543b0d3

Use algos.is_monotonic instead of Index instant

0bba602

Add one more bench

ceb91f3

Move is_monotonic_inc to IntervalTree

2e6fa28

Run _engine method in setup

34ad0f2

makbigc force-pushed the GH24813-inc-dec branch from 4f53e80 to 34ad0f2 Compare April 14, 2019 15:19

Remove trailing space

57f8404

jschendel mentioned this pull request Apr 20, 2019

PERF: Avoid MultiIndex conversion for IntervalIndex methods #24813

Closed

9 tasks

Merge master

480b353

jreback approved these changes Apr 21, 2019

View reviewed changes

jreback merged commit ef3bec2 into pandas-dev:master Apr 21, 2019

makbigc deleted the GH24813-inc-dec branch April 22, 2019 02:41

ryanreh99 added a commit to ryanreh99/pandas that referenced this pull request Apr 22, 2019

[PERF] Get rid of MultiIndex conversion in IntervalIndex.is_monotonic…

38ea1f2

… methods (pandas-dev#25820)

Uh oh!

[PERF] Get rid of MultiIndex conversion in IntervalIndex.is_monotonic methods #25820

[PERF] Get rid of MultiIndex conversion in IntervalIndex.is_monotonic methods #25820

Uh oh!

Conversation

makbigc commented Mar 21, 2019 • edited by jreback Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 21, 2019

Codecov Report

Uh oh!

codecov bot commented Mar 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

jschendel commented Mar 25, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Mar 30, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Apr 5, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Apr 20, 2019

Uh oh!

jreback commented Apr 21, 2019

Uh oh!

makbigc commented Apr 22, 2019

Uh oh!

Uh oh!

makbigc commented Mar 21, 2019 •

edited by jreback

Loading

codecov bot commented Mar 21, 2019 •

edited

Loading