
PERF: periodarr_to_dt64arr #35171

Merged: 12 commits merged into pandas-dev:master from perf-periodarr_to_dt64arr on Jul 9, 2020

jbrockmendel (Member)

Reuse ensure_datetime64ns for the subset of cases where it applies. I'm at a loss as to why the other cases slow down so much.

       before           after         ratio
     [42a6d445]       [a0ece778]
     <master>         <perf-periodarr_to_dt64arr>
+      48.7±0.8ms          133±6ms     2.72  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 4000)
+        48.0±1ms        131±0.8ms     2.72  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 5000)
+        493±10μs      1.33±0.05ms     2.70  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 4000)
+      47.8±0.4ms          125±3ms     2.62  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 4006)
+        599±20μs      1.47±0.07ms     2.46  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 2000)
+      60.6±0.9ms          149±7ms     2.45  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 3000)
+        524±40μs      1.27±0.04ms     2.42  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 4006)
+        587±10μs      1.42±0.04ms     2.41  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 1000)
+      59.9±0.8ms        144±0.9ms     2.40  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 1000)
+        526±40μs      1.26±0.05ms     2.39  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 5000)
+        61.0±1ms          144±2ms     2.35  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 2011)
+      60.8±0.5ms        143±0.9ms     2.35  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 2000)
+      7.72±0.3μs         17.7±3μs     2.30  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 4000)
+      63.7±0.6ms          145±3ms     2.28  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 1011)
+     3.04±0.06μs         6.94±1μs     2.28  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 6000)
+        611±10μs      1.38±0.06ms     2.27  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 2011)
+        653±20μs      1.44±0.03ms     2.20  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 1011)
+     2.82±0.07μs       6.12±0.2μs     2.17  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 6000)
+     2.81±0.06μs       6.06±0.2μs     2.16  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 9000)
+     2.99±0.06μs       6.16±0.2μs     2.06  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 8000)
+     2.80±0.06μs       5.75±0.2μs     2.05  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 11000)
+     2.95±0.09μs       6.02±0.4μs     2.04  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 9000)
+     2.94±0.06μs       5.97±0.2μs     2.03  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 7000)
+        701±50μs      1.41±0.01ms     2.01  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 3000)
+      3.09±0.1μs       6.18±0.1μs     2.00  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 7000)
+      3.02±0.2μs       6.02±0.3μs     1.99  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 8000)
+      3.05±0.1μs       5.95±0.2μs     1.95  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 10000)
+      7.98±0.3μs       15.4±0.3μs     1.93  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 4006)
+      9.01±0.4μs       17.3±0.8μs     1.92  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 2011)
+      2.99±0.1μs       5.66±0.2μs     1.89  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 11000)
+     3.09±0.09μs       5.82±0.1μs     1.88  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 10000)
+      8.20±0.4μs       15.3±0.5μs     1.87  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 5000)
+      9.37±0.6μs       17.4±0.5μs     1.86  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 2000)
+      9.28±0.6μs       17.2±0.7μs     1.85  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 3000)
+      9.61±0.3μs       17.6±0.6μs     1.83  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 1011)
+      10.1±0.7μs       17.5±0.7μs     1.73  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 1000)
-      2.93±0.1μs       2.48±0.1μs     0.85  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 4000)
-     2.90±0.04μs      2.44±0.08μs     0.84  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 2011)
-     2.92±0.06μs       2.44±0.1μs     0.84  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 1011)
-      3.08±0.1μs       2.56±0.1μs     0.83  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 2011)
-      3.03±0.2μs      2.52±0.09μs     0.83  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 4006)
-      9.41±0.4μs       7.79±0.2μs     0.83  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 8000)
-      2.97±0.1μs       2.42±0.1μs     0.81  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 2000)
-      2.99±0.1μs      2.40±0.07μs     0.80  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 4006)
-      3.18±0.1μs       2.55±0.1μs     0.80  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 1000)
-      3.21±0.2μs      2.48±0.03μs     0.77  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 5000)
-      3.07±0.2μs      2.34±0.05μs     0.76  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 3000)
-      3.08±0.1μs      2.34±0.04μs     0.76  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 5000)
-      3.23±0.1μs       2.43±0.1μs     0.75  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 1000)
-      3.16±0.1μs      2.37±0.05μs     0.75  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 2000)
-      3.28±0.3μs      2.33±0.08μs     0.71  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 4000)
-      3.44±0.3μs       2.36±0.1μs     0.69  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 1011)
-        398±20μs          242±7μs     0.61  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 6000)
-        39.0±1ms       23.7±0.4ms     0.61  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 6000)
-      62.4±0.5ms       28.9±0.5ms     0.46  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 11000)
-      64.4±0.2ms       28.6±0.4ms     0.44  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 10000)
-        657±20μs         290±10μs     0.44  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 11000)
-      64.7±0.4ms       27.9±0.4ms     0.43  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 9000)
-        659±20μs          279±8μs     0.42  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 7000)
-      64.1±0.6ms       27.1±0.4ms     0.42  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 8000)
-      64.6±0.9ms       26.7±0.3ms     0.41  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 7000)
-        661±20μs          272±8μs     0.41  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 8000)
-        661±10μs         269±10μs     0.41  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 9000)
-        698±50μs          282±7μs     0.40  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 10000)
-      3.02±0.1μs         344±10ns     0.11  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 12000)
-      3.21±0.2μs         342±10ns     0.11  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 12000)
-      10.6±0.5μs         339±10ns     0.03  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 12000)
-        661±20μs          346±5ns     0.00  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 12000)
-      62.3±0.6ms         368±10ns     0.00  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 12000)
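Each asv parameter pair above is (array length, period frequency code). To make the table easier to read, here is a small decoder for the codes. It assumes pandas' internal period code numbering (FR_ANN = 1000 up through FR_NS = 12000, with FR_DAY = 6000 as in the diff below, and anchored annual, quarterly and weekly variants adding an offset, so 1011 = A-NOV, 2011 = Q-NOV, 4006 = W-SAT). The helper names are hypothetical, not pandas API. Read this way, every large-array slowdown in this run falls on a code below FR_DAY and every large-array speedup on a code at or above it.

```python
# Assumed numbering, mirroring pandas' internal period codes:
# group * 1000, plus an anchor offset for annual, quarterly and weekly.
BASE_NAMES = {
    1000: "annual", 2000: "quarterly", 3000: "monthly", 4000: "weekly",
    5000: "business day", 6000: "daily", 7000: "hourly", 8000: "minutely",
    9000: "secondly", 10000: "millisecond", 11000: "microsecond",
    12000: "nanosecond",
}
FR_DAY = 6000
# Offset 0 is the default anchor (DEC for annual/quarterly, SUN for weekly).
MONTHS = ["DEC", "JAN", "FEB", "MAR", "APR", "MAY",
          "JUN", "JUL", "AUG", "SEP", "OCT", "NOV"]
WEEKDAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]


def describe(code):
    """Return (frequency name, anchor or None, day-or-finer?) for a period code."""
    group, offset = divmod(code, 1000)
    name = BASE_NAMES[group * 1000]
    anchor = None
    if group in (1, 2):  # annual / quarterly: offset picks the year-end month
        anchor = MONTHS[offset]
    elif group == 4:  # weekly: offset picks the week-end day
        anchor = WEEKDAYS[offset]
    return name, anchor, code >= FR_DAY


# Decode the frequency codes that appear in the benchmark table.
for code in (1000, 1011, 2000, 2011, 3000, 4000, 4006, 5000, 6000, 12000):
    print(code, describe(code))
```

The last tuple element marks which codes can take a "freq >= FR_DAY" branch; the coarser ones (annual through business day) have no fixed-length unit and still need per-element conversion.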

jreback (Contributor) left a comment

Can you comment on the slowdowns in the particular cases, e.g. 4000 and 5000? I guess we don't care about these as much.

@@ -946,9 +947,26 @@ def periodarr_to_dt64arr(const int64_t[:] periodarr, int freq):
int64_t[:] out
Py_ssize_t i, l

l = len(periodarr)
if freq >= FR_DAY:
Contributor

Can you add a comment here saying that you are short-circuiting for performance?


out = np.empty(l, dtype='i8')
l = len(periodarr)
Contributor

Can you add a comment on which cases this hits?

jreback added the Performance (Memory or execution speed performance) and Period (Period data type) labels Jul 8, 2020

out = np.empty(l, dtype='i8')
l = len(periodarr)
out = np.empty(l, dtype="i8")
Contributor

This could be a nogil loop, right? (Actually, couldn't the entire function be nogil?)

Contributor

Worth doing this here?

jbrockmendel (Member, Author)

This can't be nogil because it calls check_dts_bounds, which may raise.
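The reason the loop needs the GIL is that each converted value has to be validated against the datetime64[ns] range, and that validation can raise partway through the array. Below is a pure-Python sketch of the idea. It is a simplification: it checks the scaled int64 value directly rather than a datetime struct as check_dts_bounds does, OutOfBoundsError is a hypothetical stand-in for pandas' OutOfBoundsDatetime, and it assumes pandas' convention that the minimum int64 is reserved for NaT.

```python
INT64_MIN = -(2**63)  # reserved for NaT in pandas, so not a valid timestamp
INT64_MAX = 2**63 - 1
NS_PER_UNIT = {"D": 86_400 * 10**9, "h": 3_600 * 10**9, "m": 60 * 10**9,
               "s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


class OutOfBoundsError(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def ordinals_to_ns_checked(ordinals, unit):
    """Scale epoch-based ordinals to nanoseconds, raising on the first overflow."""
    scale = NS_PER_UNIT[unit]
    out = []
    for i, ordinal in enumerate(ordinals):
        value = ordinal * scale  # Python ints never overflow, so compare afterwards
        if not INT64_MIN < value <= INT64_MAX:
            # Raising a Python exception mid-loop is what requires the GIL in Cython.
            raise OutOfBoundsError(
                f"element {i}: {ordinal} {unit} is outside the datetime64[ns] range"
            )
        out.append(value)
    return out
```

For example, ordinals_to_ns_checked([0, 1], "D") returns [0, 86400000000000], while a daily ordinal of 10**6 (about 2,700 years after 1970) raises on that element.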

jbrockmendel (Member, Author)

> can you comment on the slowdowns on the particular cases, e.g. 4000, 5000, I guess we don't care about these as much

As mentioned in the OP, I don't understand why we see slowdowns this big. It should be just one extra if freq >= FR_DAY check, which shouldn't make a major difference in the non-short-circuited cases.

TomAugspurger added this to the 1.1 milestone Jul 8, 2020
jbrockmendel (Member, Author)

I think I might be able to fast-path some more cases; will give that a shot.
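One way to fast-path day-and-finer frequencies: for those, a period ordinal is a count of units since 1970-01-01, so the int64 array can be reinterpreted as datetime64 of that unit and cast to nanoseconds in one vectorized step, and nanosecond periods need no work at all. That would fit the flat, roughly 1.1 to 1.2 μs timings for code 12000 reported later in the thread. This NumPy sketch is not the PR's code: the code-to-unit mapping and the function name are assumptions, and it does no bounds checking of its own (the per-element check that pandas' ensure_datetime64ns does, as far as I know, is left out).

```python
import numpy as np

# Assumed mapping from day-and-finer period codes to NumPy datetime64 units.
UNIT_FOR_FREQ = {6000: "D", 7000: "h", 8000: "m", 9000: "s",
                 10000: "ms", 11000: "us", 12000: "ns"}


def fastpath_to_dt64ns(ordinals, freq):
    """Reinterpret period ordinals as datetime64[ns] for day-and-finer codes.

    Returns None for coarser codes (annual .. business day), whose periods
    have no fixed length, so the caller can fall back to a per-element loop.
    """
    unit = UNIT_FOR_FREQ.get(freq)
    if unit is None:
        return None
    ordinals = np.asarray(ordinals, dtype="i8")
    if unit == "ns":
        # Already nanoseconds since the epoch: a zero-copy view, O(1) work.
        return ordinals.view("M8[ns]")
    # Zero-copy view as M8[unit], then a single vectorized unit cast.
    # NOTE: no explicit out-of-bounds check is done here.
    return ordinals.view(f"M8[{unit}]").astype("M8[ns]")
```

For instance, daily ordinals [0, 1] come back as 1970-01-01 and 1970-01-02 in nanosecond resolution, and a monthly code such as 3000 returns None so the existing loop still handles it.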

jbrockmendel (Member, Author)

OK, I think I've addressed these:

       before           after         ratio
     [42fd7e7d]       [3fe9e107]
     <cln-nat-cmp>       <perf-periodarr_to_dt64arr>
+      2.98±0.1μs      8.88±0.08μs     2.97  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 6000)
+      3.32±0.2μs       8.17±0.7μs     2.46  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 11000)
+     3.01±0.09μs       7.33±0.3μs     2.44  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 8000)
+     3.02±0.08μs       7.32±0.3μs     2.42  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 10000)
+      2.93±0.1μs       7.07±0.2μs     2.41  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 8000)
+      3.25±0.3μs       7.43±0.2μs     2.28  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 9000)
+      3.38±0.2μs       7.70±0.6μs     2.28  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 7000)
+      3.14±0.1μs       6.98±0.2μs     2.23  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 11000)
+      3.10±0.2μs       6.68±0.2μs     2.16  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 9000)
+      3.58±0.4μs       7.34±0.3μs     2.05  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 7000)
+      3.34±0.3μs       6.76±0.2μs     2.02  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 10000)
+      3.59±0.3μs       7.07±0.2μs     1.97  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 6000)
-        39.2±1ms       23.3±0.6ms     0.59  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 6000)
-        439±30μs         239±10μs     0.54  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 6000)
-      63.1±0.5ms         30.8±3ms     0.49  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 10000)
-        649±10μs         309±30μs     0.48  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 10000)
-        65.5±2ms         30.2±3ms     0.46  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 11000)
-        64.7±2ms         29.6±3ms     0.46  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 9000)
-      62.9±0.6ms       27.1±0.7ms     0.43  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 7000)
-        64.0±1ms       26.9±0.6ms     0.42  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 8000)
-        664±20μs          267±6μs     0.40  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 7000)
-       784±100μs         308±20μs     0.39  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 11000)
-        725±90μs          282±7μs     0.39  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 9000)
-      3.16±0.2μs      1.14±0.04μs     0.36  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1, 12000)
-     3.11±0.07μs      1.11±0.04μs     0.36  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(0, 12000)
-       809±100μs          287±6μs     0.36  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 8000)
-      9.57±0.3μs      1.12±0.03μs     0.12  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(100, 12000)
-        654±20μs      1.18±0.06μs     0.00  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(10000, 12000)
-        64.4±1ms      1.18±0.04μs     0.00  tslibs.period.TimePeriodArrToDT64Arr.time_periodarray_to_dt64arr(1000000, 12000)

We still have some cases that see a 2-3x slowdown, but those are now all length-0 or length-1 cases, where the absolute difference is tiny (a few microseconds) and makes sense.
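The remaining small-array regressions look like fixed per-call overhead. A crude two-point linear model, t(n) ≈ fixed + per_element × n, fitted to the n = 0 and n = 1,000,000 rows for code 6000 in the table above, puts a rough number on where the new path starts to win. It ignores noise, cache effects and the n = 10,000 row, so treat the result as an estimate only; the helper name is mine.

```python
# Timings in microseconds, copied from the (n, 6000) rows of the table above.
before = {0: 2.98, 1_000_000: 39_200.0}   # 39.2 ms at n = 1e6
after = {0: 8.88, 1_000_000: 23_300.0}    # 23.3 ms at n = 1e6


def fit_linear(points):
    """Fit t(n) = fixed + per_element * n through the n=0 and n=1e6 timings."""
    fixed = points[0]
    per_element = (points[1_000_000] - fixed) / 1_000_000
    return fixed, per_element


fixed_before, per_elem_before = fit_linear(before)
fixed_after, per_elem_after = fit_linear(after)

extra_fixed = fixed_after - fixed_before            # added cost per call, us
saved_per_elem = per_elem_before - per_elem_after   # saved cost per element, us
breakeven = extra_fixed / saved_per_elem            # length where both cost the same

print(f"extra fixed cost: {extra_fixed:.1f} us")
print(f"saved per element: {saved_per_elem * 1000:.1f} ns")
print(f"break-even length: about {breakeven:.0f} elements")
```

With these inputs the model gives about 5.9 μs of extra fixed cost and about 15.9 ns saved per element, so the break-even length lands at roughly 370 elements; below that the constant overhead dominates, which is the length-0/length-1 effect described above.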

jreback merged commit 9b95eeb into pandas-dev:master Jul 9, 2020
jreback (Contributor) commented Jul 9, 2020

thanks

jbrockmendel deleted the perf-periodarr_to_dt64arr branch July 9, 2020 23:21
Labels: Performance (Memory or execution speed performance), Period (Period data type)

Successfully merging this pull request may close this issue: Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime

3 participants