PERF: Period factorization very slow in 0.19.0 #14338
Comments
Can you simplify the example here to the simplest possible setup? e.g., by removing the |
It's not the
In [24]: %time p = pd.DatetimeIndex(df.date).to_period('D')
CPU times: user 339 ms, sys: 6.71 ms, total: 345 ms
Wall time: 350 ms
Also, post the actual code you're running if you could (imports too). |
Here's a smaller example:
import time
import pandas as pd
p = pd.period_range('2010-01-01', freq='D', periods=100000)
t0 = time.time()
pd.factorize(p)
t1 = time.time()
print('{}: {:.2f}s'.format(pd.__version__, t1 - t0))
Some outputs:
|
This is probably due to
# 0.18.1
In [5]: np.asarray(p)
Out[5]: array([ 14610, 14611, 14612, ..., 114607, 114608, 114609])
# 0.19
In [4]: np.asarray(p)
Out[4]:
array([Period('2010-01-01', 'D'), Period('2010-01-02', 'D'),
Period('2010-01-03', 'D'), ..., Period('2283-10-14', 'D'),
Period('2283-10-15', 'D'), Period('2283-10-16', 'D')], dtype=object)
cc @sinhrks I think. |
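The difference shown in the two outputs above can be checked directly. This is a minimal sketch, assuming a pandas version from 0.19 onward in which `np.asarray` on a `PeriodIndex` yields `Period` objects and `PeriodIndex.asi8` exposes the underlying integer ordinals; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

p = pd.period_range('2010-01-01', freq='D', periods=5)

# Object array: one Python-level Period object per element.
as_objects = np.asarray(p)

# Integer ordinals (days since 1970-01-01 for freq='D').
as_ints = p.asi8

print(as_objects.dtype)  # object
print(as_ints.dtype)     # int64
print(as_ints[0])        # 14610, matching the 0.18.1 output quoted above
```

Hashing an object array means hashing every `Period` individually, which is a plausible reason the object path is so much slower than the integer path.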
Probably just need a check similar to datetimetz around here to view as an https://github.com/pydata/pandas/blob/v0.19.0/pandas/core/algorithms.py#L294 |
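Until a fix lands, the same idea can be approximated in user code. The sketch below is not the change made inside `pandas/core/algorithms.py`; it is a hypothetical helper (`factorize_periods` is an invented name) that factorizes the integer ordinals and then maps the result back to periods. It assumes the index contains no `NaT` values.

```python
import numpy as np
import pandas as pd


def factorize_periods(p):
    """Factorize a PeriodIndex through its int64 ordinals (assumes no NaT)."""
    # Factorizing integers avoids hashing individual Period objects.
    codes, _ = pd.factorize(p.asi8)
    # Codes are numbered in order of first appearance, so the first position
    # of each code, sorted by code, gives the uniques in that same order.
    _, first_idx = np.unique(codes, return_index=True)
    return codes, p.take(first_idx)


p = pd.PeriodIndex(['2010-01-02', '2010-01-01', '2010-01-02'], freq='D')
codes, uniques = factorize_periods(p)
```

Here `uniques.take(codes)` reconstructs the original index, which is a quick way to confirm the round trip.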
@MattRijk Personally, I use SublimeText, usually just on a laptop. But this is off topic for this issue. |
yeah this is a pretty easy fix, IIRC this was in @sinhrks PeriodBlock PR, but must have been backed out...something like
|
Caused by #13988. I think the logic of period/datetimetz can be merged using And the following comment is no longer correct... |
Looks like a 0.19.1 may be close around the corner... |
Expected Output
outputs dataframe
Output of pd.show_versions()
0.19.0
The output is not the issue. The issue is that in any version before 0.19.0 this was incredibly fast, taking about 1 second or less. With 0.19.0, I give up after waiting many minutes.