Skip to content

PERF: use uniqueness_check from monotonic check when possible #14270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Sep 21, 2016

closes #14266

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Sep 21, 2016
@jreback jreback added this to the 0.19.0 milestone Sep 21, 2016
@jreback
Copy link
Contributor Author

jreback commented Sep 21, 2016

@ssanderson @llllllllll

can you give this a check (ideally you can perf test this as well)......

@jreback jreback force-pushed the memory branch 2 times, most recently from c2c4927 to 9ced5d5 Compare September 21, 2016 13:55
@TomAugspurger
Copy link
Contributor

Is this big enough to warrant a second release candidate? (you have a better understanding for how big a change this is).

@jreback
Copy link
Contributor Author

jreback commented Sep 21, 2016

@TomAugspurger well our entire test suite passes, so I don't think this is a big deal as not changing functionaility (this is almost a bug fix).

@@ -1404,6 +1404,7 @@ Performance Improvements
- Improved performance of datetime string parsing in ``DatetimeIndex`` (:issue:`13692`)
- Improved performance of hashing ``Period`` (:issue:`12817`)
- Improved performance of ``factorize`` of datetime with timezone (:issue:`13750`)
- Improved performance of of not eagerly creating indexing hashtables on larges Indexes (:issue:`14266`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp, I know!

@jreback
Copy link
Contributor Author

jreback commented Sep 21, 2016

some nice perf improvements as well (note these are backwards, in that a7469rc9 is master)

  [9ced5d50] [a7469cf9]
+   14.98μs    54.75ms   3656.26  indexing.series_getitem_scalar.time_series_getitem_scalar
+   30.83μs    54.60ms   1770.70  indexing.series_ix_scalar.time_series_ix_scalar
+   50.48μs    52.97ms   1049.33  indexing.series_loc_slice.time_series_loc_slice
+   54.64μs    53.11ms    972.14  indexing.series_ix_slice.time_series_ix_slice
+    2.21ms    56.38ms     25.52  timeseries.timeseries_large_lookup_value.time_timeseries_large_lookup_value
+    1.36ms     4.17ms      3.08  timeseries.timeseries_datetimeindex_offset_delta.time_timeseries_datetimeindex_offset_delta
+  231.80μs   514.32μs      2.22  timeseries.datetimeindex_add_offset.time_datetimeindex_add_offset
-    3.45ms     1.50ms      0.43  index_object.index_datetime_set_difference.time_index_datetime_difference_disjoint
-   51.35ms    21.71ms      0.42  frame_methods.frame_getitem_single_column.time_frame_getitem_single_column
-  312.45μs   129.48μs      0.41  timeseries.timeseries_asof_single.time_timeseries_asof_single
-  239.91μs    92.65μs      0.39  index_object.datetime_index_union.time_datetime_index_union
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@jreback
Copy link
Contributor Author

jreback commented Sep 21, 2016

@ssanderson if you could test out would be great.

@codecov-io
Copy link

codecov-io commented Sep 21, 2016

Current coverage is 85.25% (diff: 100%)

Merging #14270 into master will decrease coverage by <.01%

@@             master     #14270   diff @@
==========================================
  Files           140        140          
  Lines         50568      50568          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43112      43111     -1   
- Misses         7456       7457     +1   
  Partials          0          0          

Powered by Codecov. Last update a7469cf...968a4f7

@ssanderson
Copy link
Contributor

@jreback I'll take a look shortly.

@llllllllll
Copy link
Contributor

This change looks good to me, not sure if I can easily test zipline with this though. Running the memory usage example from the issue shows the memory usage pretty constant across the get_loc call. Thanks for the really quick fix!

@ssanderson
Copy link
Contributor

I'm installing this into a fresh zipline build to see if it runs (there may
be upgrade work that needs to happen to be compat with 0.19, so if that
doesn't work I'll probably just confirm the minimal repro case as well.)

On Wed, Sep 21, 2016 at 1:55 PM, Joe Jevnik [email protected]
wrote:

This change looks good to me, not sure if I can easily test zipline with
this though. Running the memory usage example from the issue shows the
memory usage pretty constant across the get_loc call. Thanks for the really
quick fix!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#14270 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABg8hf0Wb0pogI1wR6RgrZFbCKUhloBPks5qsW-IgaJpZM4KCxuk
.

@ssanderson
Copy link
Contributor

LGTM. Thanks for the quick turnaround @jreback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large Monotonic Index Objects Always Allocate Hash Tables on get_loc
5 participants