Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime #33919

TomAugspurger · 2020-05-01T13:47:36Z

import pandas as pd
import numpy as np
from pandas._testing import test_parallel

N = 10 ** 6
dti = pd.date_range("1900-01-01", periods=N, freq="T")
period = dti.to_period("D")



@test_parallel(num_threads=2)
def run(period):
    period.to_timestamp()

%timeit run(period)

# 1.0.2
96.8 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# master
129 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

https://pandas.pydata.org/speed/pandas/index.html#gil.ParallelDatetimeFields.time_period_to_datetime?commits=d106b81ce532bc71ec6cced944ddb751a4b0e5a3-577de1c6b5cda7f5ae0e4832c2bc3f97ca186e9b points to d106b81...577de1c, perhaps #33491 or #33047 (cc @jbrockmendel)

The text was updated successfully, but these errors were encountered:

jbrockmendel · 2020-05-01T18:18:57Z

using %prun -s cumtime for n in range(100): period.to_timestamp() the only thing that stands out is periodarr_to_dt64. It's plausible that #33491 is responsible.

TomAugspurger · 2020-07-02T21:00:36Z

One note: my code above used test_parallel, but the single-threaded version is also slower:

# 1.0.4
%timeit period.to_timestamp()
53 ms ± 740 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# master
In [10]: %timeit period.to_timestamp()
71.1 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So the current implementation is likely just slower.

TomAugspurger · 2020-07-07T20:56:50Z

It is periodarr_to_dt64 that's slower

In [1]: import numpy as np
   ...: from pandas._libs.tslibs import period as libperiod
   ...:
   ...: i8vals = np.arange(0, 100_000)
   ...: %timeit libperiod.periodarr_to_dt64arr(i8vals, 6000)
   ...:
   ...:

# 1.0.4
3.74 ms ± 73 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# master
5.35 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Any thoughts on how to progress @jbrockmendel? I'm not familiar with how to profile C/Cython code.

jbrockmendel · 2020-07-07T21:19:38Z

I've got a couple of ideas that might optimize this a bit, will give it a go. If those don't work, then I think we're back to reverting #33491

jbrockmendel · 2020-07-07T21:34:49Z

yep, 2.44ms on the example above, will make a PR

TomAugspurger added Datetime Datetime data dtype Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version labels May 1, 2020

TomAugspurger added this to the 1.1 milestone May 1, 2020

jbrockmendel mentioned this issue Jul 8, 2020

PERF: periodarr_to_dt64arr #35171

Merged

5 tasks

jreback closed this as completed in #35171 Jul 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime #33919

Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime #33919

TomAugspurger commented May 1, 2020

jbrockmendel commented May 1, 2020

TomAugspurger commented Jul 2, 2020

TomAugspurger commented Jul 7, 2020 •

edited

Loading

jbrockmendel commented Jul 7, 2020

jbrockmendel commented Jul 7, 2020

Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime #33919

Performance regression in gil.ParallelDatetimeFields.time_period_to_datetime #33919

Comments

TomAugspurger commented May 1, 2020

jbrockmendel commented May 1, 2020

TomAugspurger commented Jul 2, 2020

TomAugspurger commented Jul 7, 2020 • edited Loading

jbrockmendel commented Jul 7, 2020

jbrockmendel commented Jul 7, 2020

TomAugspurger commented Jul 7, 2020 •

edited

Loading