REF: avoid getattr pattern for join_indexer #29117

Merged (5 commits) on Oct 22, 2019

jbrockmendel
Member

No description provided.
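
For readers without the diff at hand, the "getattr pattern" in the title can be illustrated with a small sketch. This is not the pandas code: `fake_lib`, the `join_indexer_*` names, and the list-based lookup are all hypothetical stand-ins for a compiled module that exposes one function per dtype. The earlier title of this PR suggests that the dtype specialization is instead left to Cython, so callers use a single name.

```python
from types import SimpleNamespace


def _lookup_positions(left, right):
    # Position of each left value in right, or -1 when it is missing.
    first_pos = {}
    for pos, value in enumerate(right):
        first_pos.setdefault(value, pos)
    return [first_pos.get(value, -1) for value in left]


# Hypothetical module with one function per dtype suffix.
fake_lib = SimpleNamespace(
    join_indexer_int64=_lookup_positions,
    join_indexer_float64=_lookup_positions,
)


def join_indexer_via_getattr(left, right, dtype_name):
    # getattr pattern: the function name is assembled from a string at
    # runtime, so a typo or an unsupported dtype only fails when called.
    func = getattr(fake_lib, f"join_indexer_{dtype_name}")
    return func(left, right)


def join_indexer(left, right):
    # Single entry point: callers never build a per-dtype name.
    return _lookup_positions(left, right)
```

With the single entry point, a static search for `join_indexer` finds every caller, which is harder when the name is built from a string.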

@jreback
Contributor

jreback commented Oct 20, 2019

can u perf check to make sure that does the right thing

@jbrockmendel
Member Author

> can u perf check to make sure that does the right thing

Noise

    before     after       ratio
  [e623f0fb] [7b88515e]
+  135.77ms   292.28ms      2.15  binary_ops.Ops.time_frame_multi_and(True, 'default')
+   13.20ms    19.18ms      1.45  binary_ops.Ops.time_frame_add(True, 'default')
+    2.10ms     2.74ms      1.31  stat_ops.Correlation.time_corr('pearson', True)
+    7.23μs     8.76μs      1.21  timeseries.DatetimeIndex.time_get('tz_aware')
+   13.04μs    15.67μs      1.20  offset.OffestDatetimeArithmetic.time_apply(<YearEnd: month=12>)
+    1.92μs     2.30μs      1.20  timestamp.TimestampOps.time_tz_convert(<UTC>)
+   27.29μs    32.50μs      1.19  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<Day>)
+   13.31ms    15.82ms      1.19  timeseries.ToDatetimeFormat.time_no_exact
+   14.45μs    17.10μs      1.18  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessQuarterEnd: startingMonth=3>)
+   13.34μs    15.63μs      1.17  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<MonthEnd>)
+   13.09ms    15.31ms      1.17  rolling.EWMMethods.time_ewm('DataFrame', 1000, 'float', 'std')
+   56.09ms    65.07ms      1.16  eval.Eval.time_add('python', 'all')
+    1.28ms     1.48ms      1.15  inference.NumericInferOps.time_subtract(<class 'numpy.int32'>)
+    4.94ms     5.70ms      1.15  strings.Cat.time_cat(0, ',', '-', 0.0)
+    2.34μs     2.69μs      1.15  timestamp.TimestampOps.time_replace_None(tzutc())
+   15.80ms    18.16ms      1.15  gil.ParallelReadCSV.time_read_csv('object')
+    6.36μs     7.27μs      1.14  timestamp.TimestampOps.time_normalize(None)
+    3.68μs     4.19μs      1.14  timedelta.TimedeltaConstructor.time_from_np_timedelta
+  564.77μs   642.94μs      1.14  stat_ops.Correlation.time_corr_series('kendall', True)
+    4.94ms     5.63ms      1.14  inference.DateInferOps.time_subtract_datetimes
+    6.55μs     7.44μs      1.14  timeseries.AsOf.time_asof_single_early('Series')
+   17.61ms    19.99ms      1.13  frame_methods.MaskBool.time_frame_mask_bools
+    2.54ms     2.88ms      1.13  inference.NumericInferOps.time_divide(<class 'numpy.int64'>)
+   13.01μs    14.73μs      1.13  offset.OffestDatetimeArithmetic.time_apply(<MonthEnd>)
+     1.19s      1.35s      1.13  timeseries.Iteration.time_iter(<function period_range at 0x7f11b4b42b70>)
+   17.21μs    19.45μs      1.13  period.PeriodUnaryMethods.time_now('M')
+  125.99ms   142.15ms      1.13  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
+   13.72μs    15.45μs      1.13  offset.OffestDatetimeArithmetic.time_apply(<BusinessMonthBegin>)
+   23.05ms    25.93ms      1.13  strings.Methods.time_title
+  265.95ms   298.69ms      1.12  io.json.ToJSON.time_to_json('index', 'df_int_floats')
+     8.99s     10.09s      1.12  stat_ops.Correlation.time_corr_wide('kendall', False)
+  901.09ns     1.01μs      1.12  period.PeriodProperties.time_property('min', 'dayofweek')
+   23.52μs    26.34μs      1.12  offset.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterEnd: startingMonth=3>)
+   34.80ms    38.95ms      1.12  gil.ParallelGroupbyMethods.time_parallel(2, 'prod')
+  819.06μs   915.96μs      1.12  stat_ops.Correlation.time_corr_series('spearman', True)
+  572.74μs   640.07μs      1.12  stat_ops.Correlation.time_corr_series('kendall', False)
+   10.10ms    11.28ms      1.12  algorithms.Duplicated.time_duplicated('first', 'string')
+    3.50μs     3.91μs      1.11  indexing_engines.ObjectEngineIndexing.time_get_loc('monotonic_incr')
+   14.17μs    15.78μs      1.11  offset.OffestDatetimeArithmetic.time_apply(<SemiMonthEnd: day_of_month=15>)
+   23.55μs    26.20μs      1.11  offset.OffestDatetimeArithmetic.time_add(<CustomBusinessDay>)
+  135.84ms   151.01ms      1.11  inference.ToNumericDowncast.time_downcast('string-nint', None)
+    7.64ms     8.48ms      1.11  series_methods.NanOps.time_func('kurt', 1000000, 'int64')
+   27.82μs    30.85μs      1.11  timestamp.TimestampProperties.time_is_quarter_end(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, 'B')
+  390.29ms   432.65ms      1.11  rolling.Apply.time_rolling('Series', 1000, 'int', <function sum at 0x7f11bbdd99d8>, True)
+  891.93ns   988.13ns      1.11  period.PeriodProperties.time_property('M', 'month')
+   13.77μs    15.26μs      1.11  series_methods.SearchSorted.time_searchsorted('int32')
+    8.58ms     9.49ms      1.11  timeseries.Factorize.time_factorize('Asia/Tokyo')
+   12.81μs    14.16μs      1.11  offset.OffestDatetimeArithmetic.time_apply(<MonthBegin>)
+   25.46ms    28.13ms      1.11  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlite', 'bool')
+   14.45μs    15.96μs      1.10  offset.OffestDatetimeArithmetic.time_add(<BusinessMonthBegin>)
+     9.15s     10.10s      1.10  stat_ops.Correlation.time_corr_wide('kendall', True)
+   21.89μs    24.15μs      1.10  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessDay>)
+   14.26μs    15.73μs      1.10  offset.OffestDatetimeArithmetic.time_add(<MonthBegin>)
+  445.02ms   490.51ms      1.10  rolling.Apply.time_rolling('Series', 1000, 'int', <function Apply.<lambda> at 0x7f11d5229598>, True)
+   24.81μs    27.33μs      1.10  offset.OffestDatetimeArithmetic.time_subtract_10(<SemiMonthEnd: day_of_month=15>)
+   60.44ms    66.56ms      1.10  gil.ParallelFactorize.time_parallel(4)
-    7.73μs     7.02μs      0.91  indexing.NonNumericSeriesIndexing.time_getitem_scalar('string', 'unique_monotonic_inc')
-    2.12ms     1.92ms      0.91  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-   17.03ms    15.44ms      0.91  frame_methods.MaskBool.time_frame_mask_floats
-   75.17ms    68.04ms      0.91  gil.ParallelGroupbyMethods.time_parallel(4, 'prod')
-   14.77μs    13.37μs      0.91  offset.OffestDatetimeArithmetic.time_apply(<BusinessQuarterEnd: startingMonth=3>)
-   30.46μs    27.48μs      0.90  series_methods.All.time_all(1000, 'slow')
-  216.05ns   194.88ns      0.90  timestamp.TimestampProperties.time_weekday_name(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
-    5.75ms     5.19ms      0.90  gil.ParallelRolling.time_rolling('mean')
-   10.30μs     9.28μs      0.90  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-   16.47μs    14.81μs      0.90  offset.OffestDatetimeArithmetic.time_add(<BusinessMonthEnd>)
-    8.27ms     7.43ms      0.90  io.csv.ParseDateComparison.time_to_datetime_dayfirst(False)
-    3.70μs     3.33μs      0.90  timeseries.DatetimeIndex.time_get('dst')
-  239.93ms   215.40ms      0.90  multiindex_object.Values.time_datetime_level_values_copy
-   15.54μs    13.92μs      0.90  series_methods.SearchSorted.time_searchsorted('uint32')
-   14.77μs    13.23μs      0.90  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<QuarterEnd: startingMonth=3>)
-   76.47ms    68.49ms      0.90  strings.Repeat.time_repeat('array')
-   48.40ms    43.35ms      0.90  timedelta.ToTimedeltaErrors.time_convert('coerce')
-   15.80μs    14.12μs      0.89  offset.OffestDatetimeArithmetic.time_apply_np_dt64(<YearBegin: month=1>)
-    1.87μs     1.67μs      0.89  timestamp.TimestampConstruction.time_parse_today
-   11.00μs     9.82μs      0.89  timestamp.TimestampProperties.time_is_quarter_start(tzutc(), 'B')
-   48.04ms    42.87ms      0.89  io.parsers.DoesStringLookLikeDatetime.time_check_datetimes('2Q2005')
-    7.10μs     6.33μs      0.89  timestamp.TimestampOps.time_normalize(<UTC>)
-   51.82μs    46.18μs      0.89  multiindex_object.GetLoc.time_med_get_loc
-  577.90μs   514.94μs      0.89  multiindex_object.Duplicates.time_remove_unused_levels
-   17.18μs    15.25μs      0.89  categoricals.Contains.time_categorical_contains
-   15.23μs    13.52μs      0.89  offset.OffestDatetimeArithmetic.time_apply(<BusinessMonthEnd>)
-    4.91μs     4.33μs      0.88  timedelta.TimedeltaConstructor.time_from_int
-    5.81ms     5.10ms      0.88  gil.ParallelRolling.time_rolling('var')
-   12.65μs    11.08μs      0.88  timestamp.TimestampOps.time_to_julian_date('US/Eastern')
-  105.53ms    92.30ms      0.87  sparse.SparseSeriesToFrame.time_series_to_frame
-    2.46ms     2.14ms      0.87  categoricals.Concat.time_union
-   23.74μs    20.43μs      0.86  offset.OffestDatetimeArithmetic.time_subtract(<QuarterBegin: startingMonth=3>)
-   23.19μs    19.91μs      0.86  period.PeriodUnaryMethods.time_asfreq('min')
-   25.57ms    21.88ms      0.86  stat_ops.FrameOps.time_op('median', 'float', 1, False)
-  185.16μs   157.25μs      0.85  period.PeriodConstructor.time_period_constructor('D', False)
-   27.40μs    23.22μs      0.85  offset.OffestDatetimeArithmetic.time_subtract_10(<QuarterEnd: startingMonth=3>)
-   14.28μs    12.06μs      0.84  timestamp.TimestampOps.time_replace_None('US/Eastern')
-   28.62μs    23.97μs      0.84  timestamp.TimestampProperties.time_month_name(<DstTzInfo 'Europe/Amsterdam' LMT+0:20:00 STD>, None)
-  290.44ns   236.05ns      0.81  timestamp.TimestampProperties.time_is_month_start(tzutc(), None)
-    3.27ms     2.62ms      0.80  binary_ops.Ops2.time_frame_series_dot
-   25.24μs    20.09μs      0.80  offset.OffestDatetimeArithmetic.time_subtract_10(<BusinessMonthEnd>)
-    2.59ms     2.02ms      0.78  indexing.IntervalIndexing.time_getitem_scalar
-   18.03μs    13.95μs      0.77  series_methods.SearchSorted.time_searchsorted('int8')
-    1.16μs   900.98ns      0.77  period.PeriodProperties.time_property('M', 'year')
-  205.07ms   153.54ms      0.75  binary_ops.Ops2.time_frame_dot
-   19.81μs     7.74μs      0.39  algorithms.MaybeConvertObjects.time_maybe_convert_objects

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
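
As a reading aid for the table above: the `ratio` column is the `after` time divided by the `before` time, so rows marked `+` got slower and rows marked `-` got faster. A minimal sketch that recomputes the ratio from two timing strings, assuming only the units that appear in this output (`ns`, `μs`, `ms`, `s`); the helper names are made up for illustration and are not part of asv:

```python
import re

# Conversion factors for the units that appear in the table.
_UNIT_TO_SECONDS = {"ns": 1e-9, "μs": 1e-6, "ms": 1e-3, "s": 1.0}
_TIME_RE = re.compile(r"^([0-9.]+)(ns|μs|ms|s)$")


def to_seconds(text):
    # Parse a timing such as "135.77ms" into seconds.
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized timing: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_SECONDS[unit]


def ratio(before, after):
    # after / before: above 1 is a slowdown, below 1 is a speedup.
    return to_seconds(after) / to_seconds(before)


print(round(ratio("135.77ms", "292.28ms"), 2))  # first row of the table
print(round(ratio("19.81μs", "7.74μs"), 2))     # last row of the table
```

Mixed units are handled because both values are converted to seconds first, as in the `901.09ns` to `1.01μs` row.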

@jbrockmendel jbrockmendel changed the title from "REF: let cython handle dtype specialization for join_indexer" to "REF: avoid getattr pattern for join_indexer" Oct 21, 2019
Member

@WillAyd WillAyd left a comment

lgtm @jbrockmendel a lot of nice cleanups lately great job

@WillAyd WillAyd added the Clean label Oct 22, 2019
@WillAyd WillAyd added this to the 1.0 milestone Oct 22, 2019
exp = np.array([-1, -1, -1], dtype=np.int64)
tm.assert_numpy_array_equal(rindexer, exp)
@pytest.mark.parametrize(
"dtype", ["int32", "int64", "float32", "float64", "object"]
Contributor

at some point, if we can, change this to use a fixture to ensure more coverage (we may need to expand the indexers here as well); pls create a followup issue.
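
A fixture-based version of the parametrization under review might look roughly like the sketch below. The fixture name `join_dtype` and the helper `make_sorted_values` are hypothetical, and the test body only checks the generated input; a real test would pass the arrays to the join indexer functions, which are left out here because they are not shown in this excerpt.

```python
import numpy as np
import pytest

# Same dtype list as the parametrize call under review.
JOIN_DTYPES = ["int32", "int64", "float32", "float64", "object"]


@pytest.fixture(params=JOIN_DTYPES)
def join_dtype(request):
    # Hypothetical fixture: every test that requests it runs once per dtype.
    return request.param


def make_sorted_values(dtype):
    # Small monotonic array cast to the requested dtype.
    return np.arange(5).astype(dtype)


def test_sorted_values_keep_dtype(join_dtype):
    values = make_sorted_values(join_dtype)
    assert values.dtype == np.dtype(join_dtype)
    assert list(values) == sorted(values)
```

Placing such a fixture in a shared `conftest.py` would let other join tests pick up the same dtype list, which is one way to get the broader coverage mentioned in the comment.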

@jreback jreback merged commit c7462ed into pandas-dev:master Oct 22, 2019
@jreback
Contributor

jreback commented Oct 22, 2019

lgtm, followup issue.

@jbrockmendel jbrockmendel deleted the cyfuncs branch October 22, 2019 15:07
HawkinsBA pushed a commit to HawkinsBA/pandas that referenced this pull request Oct 29, 2019
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
3 participants