I find working with pd.Period very slow, much slower than working with pd.Timestamp, and I was wondering why that is. I have around 2 million records and I'm forced to work with full timestamps even though my data is really at monthly granularity. Here is an example that I think shows the discrepancy between working with these types.
Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import numpy as np

# create a data frame with 500K dates
dates = pd.date_range('1/1/2011', periods=500000, freq='H')
values = np.random.random(size=len(dates))
df = pd.DataFrame({"Date": dates, "Values": values})
```
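The snippet that builds the Period column did not survive in this report. As an illustration only, the sketch below assumes the column holds the calendar month of Date and derives it with the vectorized Series.dt.to_period('M') instead of constructing one pd.Period per row. In recent pandas versions this yields a column with period[M] dtype. It uses a smaller daily range than the example above so it runs quickly.

```python
import numpy as np
import pandas as pd

# Assumption: "Period" is the monthly period of "Date".
# One year of daily dates keeps the sketch fast to run.
dates = pd.date_range("2011-01-01", periods=365, freq="D")
df = pd.DataFrame({"Date": dates, "Values": np.random.random(size=len(dates))})

# Vectorized conversion: no per-row pd.Period(...) construction in Python.
df["Period"] = df["Date"].dt.to_period("M")

print(df["Period"].dtype)  # period[M]
```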
Returning the unique values of the Period column is also very slow compared with doing the same on the Date column:
```python
%%time
periods = df.Period.unique()
```

Wall time: 17.4 s

```python
%%time
times = df.Date.unique()
```

Wall time: 69 ms
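One plausible explanation for this gap, offered as an assumption rather than a diagnosis: in older pandas releases a Series of Periods had object dtype, so unique() had to hash one Python pd.Period object per row, while a datetime64 column is handled as plain 64-bit integers. The sketch below emulates that object layout with astype(object) and sets it beside a period[M] column; both return the same twelve months, but only the object version goes through Python-level objects.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-01-01", periods=365, freq="D")
periods = pd.Series(dates).dt.to_period("M")

# Emulates the assumed old layout: one Python pd.Period object per row.
as_objects = periods.astype(object)

fast = periods.unique()     # PeriodArray, backed by integer ordinals
slow = as_objects.unique()  # NumPy object array of pd.Period instances

print(periods.dtype, as_objects.dtype)  # period[M] object
print(len(fast), len(slow))             # 12 12
```

Timing the two unique() calls with %timeit on a larger range is one way to check whether the dtype accounts for the slowdown on a given pandas version.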
Creating the period index is fairly slow as well, and boolean indexing with periods is extremely slow, especially when compared with using timestamps!

Unrelated, but I also found that I can only do the period comparison on a Series in one direction. This throws an error even though all I did was flip the terms in the inequality.
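For the boolean indexing and the one-directional comparison, here is a small sketch that filters monthly data against a pd.Period scalar written both ways round, next to the equivalent Timestamp filter. It assumes a recent pandas, where cutoff < series is expected to fall back to the Series' reflected comparison and give the same mask as series > cutoff; whether the version in this report behaves that way is not established here. The cutoff month is arbitrary.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-01-01", periods=365, freq="D")
df = pd.DataFrame({"Date": dates, "Values": np.random.random(size=len(dates))})
df["Period"] = df["Date"].dt.to_period("M")

cutoff = pd.Period("2011-06", freq="M")

# Period comparison, written in both operand orders.
mask_a = df["Period"] > cutoff
mask_b = cutoff < df["Period"]

# Equivalent timestamp-based filter: everything from July 2011 onwards.
mask_ts = df["Date"] >= pd.Timestamp("2011-07-01")

after_june = df[mask_a]
print(len(after_june))  # 184 (July through December 2011)
```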
Output of pd.show_versions()