PERF: use uniqueness_check from monotonic check when possible #14270

jreback · 2016-09-21T13:06:43Z

jreback · 2016-09-21T13:09:06Z

can you give this a check (ideally you can perf test this as well)......

TomAugspurger · 2016-09-21T14:06:15Z

Is this big enough to warrant a second release candidate? (you have a better understanding for how big a change this is).

jreback · 2016-09-21T15:19:07Z

@TomAugspurger well our entire test suite passes, so I don't think this is a big deal as not changing functionaility (this is almost a bug fix).

jreback · 2016-09-21T15:19:34Z

doc/source/whatsnew/v0.19.0.txt

@@ -1404,6 +1404,7 @@ Performance Improvements
 - Improved performance of datetime string parsing in ``DatetimeIndex`` (:issue:`13692`)
 - Improved performance of hashing ``Period`` (:issue:`12817`)
 - Improved performance of ``factorize`` of datetime with timezone (:issue:`13750`)
+- Improved performance of of not eagerly creating indexing hashtables on larges Indexes (:issue:`14266`)


sp, I know!

jreback · 2016-09-21T15:56:38Z

some nice perf improvements as well (note these are backwards, in that a7469rc9 is master)

  [9ced5d50] [a7469cf9]
+   14.98μs    54.75ms   3656.26  indexing.series_getitem_scalar.time_series_getitem_scalar
+   30.83μs    54.60ms   1770.70  indexing.series_ix_scalar.time_series_ix_scalar
+   50.48μs    52.97ms   1049.33  indexing.series_loc_slice.time_series_loc_slice
+   54.64μs    53.11ms    972.14  indexing.series_ix_slice.time_series_ix_slice
+    2.21ms    56.38ms     25.52  timeseries.timeseries_large_lookup_value.time_timeseries_large_lookup_value
+    1.36ms     4.17ms      3.08  timeseries.timeseries_datetimeindex_offset_delta.time_timeseries_datetimeindex_offset_delta
+  231.80μs   514.32μs      2.22  timeseries.datetimeindex_add_offset.time_datetimeindex_add_offset
-    3.45ms     1.50ms      0.43  index_object.index_datetime_set_difference.time_index_datetime_difference_disjoint
-   51.35ms    21.71ms      0.42  frame_methods.frame_getitem_single_column.time_frame_getitem_single_column
-  312.45μs   129.48μs      0.41  timeseries.timeseries_asof_single.time_timeseries_asof_single
-  239.91μs    92.65μs      0.39  index_object.datetime_index_union.time_datetime_index_union
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

jreback · 2016-09-21T16:08:14Z

@ssanderson if you could test out would be great.

codecov-io · 2016-09-21T17:11:57Z

Current coverage is 85.25% (diff: 100%)

Merging #14270 into master will decrease coverage by <.01%

@@             master     #14270   diff @@
==========================================
  Files           140        140          
  Lines         50568      50568          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43112      43111     -1   
- Misses         7456       7457     +1   
  Partials          0          0

Powered by Codecov. Last update a7469cf...968a4f7

ssanderson · 2016-09-21T17:44:28Z

@jreback I'll take a look shortly.

llllllllll · 2016-09-21T17:55:04Z

This change looks good to me, not sure if I can easily test zipline with this though. Running the memory usage example from the issue shows the memory usage pretty constant across the get_loc call. Thanks for the really quick fix!

ssanderson · 2016-09-21T17:56:49Z

I'm installing this into a fresh zipline build to see if it runs (there may
be upgrade work that needs to happen to be compat with 0.19, so if that
doesn't work I'll probably just confirm the minimal repro case as well.)

On Wed, Sep 21, 2016 at 1:55 PM, Joe Jevnik [email protected]
wrote:

This change looks good to me, not sure if I can easily test zipline with
this though. Running the memory usage example from the issue shows the
memory usage pretty constant across the get_loc call. Thanks for the really
quick fix!

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#14270 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABg8hf0Wb0pogI1wR6RgrZFbCKUhloBPks5qsW-IgaJpZM4KCxuk
.

ssanderson · 2016-09-21T18:20:10Z

LGTM. Thanks for the quick turnaround @jreback.

closes pandas-dev#14266

jreback added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Sep 21, 2016

jreback added this to the 0.19.0 milestone Sep 21, 2016

jreback force-pushed the memory branch from ad5a646 to 97b3c69 Compare September 21, 2016 13:08

jreback force-pushed the memory branch 2 times, most recently from c2c4927 to 9ced5d5 Compare September 21, 2016 13:55

jreback force-pushed the memory branch from 9ced5d5 to 443d5e3 Compare September 21, 2016 15:05

jreback commented Sep 21, 2016

View reviewed changes

PERF: use uniqueness_check from monotonic check when possible

968a4f7

closes pandas-dev#14266

jreback force-pushed the memory branch from 443d5e3 to 968a4f7 Compare September 21, 2016 22:01

jreback closed this in 3c96442 Sep 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: use uniqueness_check from monotonic check when possible #14270

PERF: use uniqueness_check from monotonic check when possible #14270

jreback commented Sep 21, 2016

jreback commented Sep 21, 2016

TomAugspurger commented Sep 21, 2016

jreback commented Sep 21, 2016

jreback Sep 21, 2016

jreback commented Sep 21, 2016

jreback commented Sep 21, 2016

codecov-io commented Sep 21, 2016 •

edited

Loading

ssanderson commented Sep 21, 2016

llllllllll commented Sep 21, 2016

ssanderson commented Sep 21, 2016

ssanderson commented Sep 21, 2016

PERF: use uniqueness_check from monotonic check when possible #14270

PERF: use uniqueness_check from monotonic check when possible #14270

Conversation

jreback commented Sep 21, 2016

jreback commented Sep 21, 2016

TomAugspurger commented Sep 21, 2016

jreback commented Sep 21, 2016

jreback Sep 21, 2016

Choose a reason for hiding this comment

jreback commented Sep 21, 2016

jreback commented Sep 21, 2016

codecov-io commented Sep 21, 2016 • edited Loading

Current coverage is 85.25% (diff: 100%)

ssanderson commented Sep 21, 2016

llllllllll commented Sep 21, 2016

ssanderson commented Sep 21, 2016

ssanderson commented Sep 21, 2016

codecov-io commented Sep 21, 2016 •

edited

Loading