PERF: Some faster date/time string formatting #46759
…imes faster ! Related to pandas-dev#44764
get_repr = lambda: repr(self.offset1)
glob = globals()
glob.update(locals())
res = Timer("get_repr()", globals=glob).repeat(3, 10000)
before: [0.15039550000000013, 0.16580459999999952, 0.12403839999999988] = 4 times faster!
…eIndex`: string formatting is now up to 80% faster (as fast as default) when one of the default strftime formats ``"%Y-%m-%d %H:%M:%S"`` or ``"%Y-%m-%d %H:%M:%S.%f"`` is used. See pandas-dev#44764
…me.to_excel` (:class:`ExcelFormatter`): processing dates can be up to 4% faster. (related to :issue:`44764`)
…QLiteTable`): processing time arrays can be up to 65% faster ! (related to pandas-dev#44764)
…ure/44764_strftime_perf_datetimes
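The commit messages above describe a fast path for the default strftime formats. A minimal stdlib-only sketch of that idea follows. It is not the pandas implementation: `format_default_fast` and `format_all` are hypothetical names, and plain `datetime` objects stand in for a `DatetimeIndex`. The assumption is that building the string directly with an f-string avoids the cost of parsing the format on every call.

```python
from datetime import datetime, timedelta
from timeit import Timer

DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"


def format_default_fast(dt: datetime) -> str:
    """Hypothetical fast path: build the default-format string directly."""
    return (
        f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d} "
        f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}"
    )


def format_all(values, fmt):
    """Use the fast path for the default format, strftime for anything else."""
    if fmt == DEFAULT_FMT:
        return [format_default_fast(v) for v in values]
    return [v.strftime(fmt) for v in values]


# Same shape of data as the mini-benchmarks in this PR: 1000 points, 10h apart.
start = datetime(2014, 1, 1)
values = [start + timedelta(hours=10 * i) for i in range(1000)]

# The fast path must produce exactly what strftime produces.
fast = format_all(values, DEFAULT_FMT)
slow = [v.strftime(DEFAULT_FMT) for v in values]
assert fast == slow

if __name__ == "__main__":
    # Timings depend on the machine, so they are printed rather than asserted.
    t_fast = min(Timer(lambda: format_all(values, DEFAULT_FMT)).repeat(3, 20))
    t_slow = min(Timer(lambda: [v.strftime(DEFAULT_FMT) for v in values]).repeat(3, 20))
    print(f"fast path: {t_fast:.4f}s, strftime: {t_slow:.4f}s")
```

The equality check matters more than the timing: a fast path is only safe if its output is identical to the generic one for every value it handles.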
SQL mini-bench:
import sqlite3
from timeit import Timer
import pandas as pd
con = sqlite3.connect(':memory:')
dt = pd.date_range('2014-01-01', periods=1000, freq='10h')
df = pd.DataFrame({'time': dt.time})
# write the time column to SQLite, replacing the table on each run
to_assess = lambda: df.to_sql('test_datetime', con, index=False, if_exists='replace')
glob = globals()
glob.update(locals())
res = Timer("to_assess()", globals=glob).repeat(5, 300)
print(res)
before: [2.667514099999991, 2.711463600000002, 2.7073388000000023, 2.6616616999999962, 2.7392122000000114]
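For readers without pandas at hand, the same kind of change can be sketched with the standard library alone. This assumes the cost being measured is the conversion of `datetime.time` values to strings before they reach SQLite. The adapter names are hypothetical and this is not the `SQLiteTable` code.

```python
import sqlite3
from datetime import time


def adapt_time_strftime(t: time) -> str:
    """Generic conversion through strftime."""
    return t.strftime("%H:%M:%S.%f")


def adapt_time_fstring(t: time) -> str:
    """Same output, built directly from the time fields."""
    return f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}.{t.microsecond:06d}"


# Register the f-string adapter so sqlite3 converts time objects on insert.
sqlite3.register_adapter(time, adapt_time_fstring)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_datetime (time TEXT)")

# 1000 time-of-day values, stepping 10 hours and wrapping around midnight.
times = [time(hour=(10 * i) % 24) for i in range(1000)]
con.executemany("INSERT INTO test_datetime VALUES (?)", [(t,) for t in times])

rows = [r[0] for r in con.execute("SELECT time FROM test_datetime ORDER BY rowid")]

# Both adapters must agree, otherwise stored values would change format.
assert all(adapt_time_fstring(t) == adapt_time_strftime(t) for t in times)
```

Wrapping the `executemany` call in a `timeit.Timer`, once per adapter, gives a rough before/after comparison in the spirit of the mini-bench above.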
Ready for review
do these have specific asv's that test performance? if not pls add. pls show them in the PR
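For context on what such a benchmark looks like: asv discovers plain classes whose `time_*` methods are timed after `setup` runs, once per value in `params`. The sketch below follows that convention but is only illustrative. The class name and parameters are hypothetical, and a list of stdlib `datetime` objects stands in for the pandas objects that the benchmarks added in this PR exercise.

```python
from datetime import datetime, timedelta


class StrftimeDefaultFormat:
    # asv runs every time_* method once for each value listed here.
    params = [1_000, 10_000]
    param_names = ["nobs"]

    def setup(self, nobs):
        # Build the input outside the timed region.
        start = datetime(2014, 1, 1)
        self.data = [start + timedelta(hours=10 * i) for i in range(nobs)]

    def time_str(self, nobs):
        # Baseline: default string conversion.
        [str(d) for d in self.data]

    def time_strftime_default(self, nobs):
        # Explicit format that matches the default output.
        [d.strftime("%Y-%m-%d %H:%M:%S") for d in self.data]

    def time_strftime_custom(self, nobs):
        # A non-default format, which cannot use any default fast path.
        [d.strftime("%d/%m/%Y %H:%M") for d in self.data]
```

Keeping the baseline, the default format, and a custom format side by side makes it easy to see whether an explicit default format is "as fast as default", which is the claim under review.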
…formatting" This reverts commit 139f229
…mezone. Made it faster using f-string
Thanks to both of you for the quick review! I took care of all comments.
OK. I have never used asv, so I'll have to check how it works. It may take a while before I come back with something.
small comments, pls rebase
…ure/44764_strftime_perf_datetimes Conflicts: doc/source/whatsnew/v1.5.0.rst, pandas/_libs/tslib.pyx
It seems that everything is OK now. Thanks for this review!
@jreback: could we move forward on this? Thanks in advance!
this looks fine. @jbrockmendel @mroeschke if any comments
…ure/44764_strftime_perf_datetimes Conflicts: pandas/_libs/tslib.pyx
Note after updating the branch to the latest @jbrockmendel @jreback @mroeschke if one of you originated this mod, could you please suggest a docstring sentence and check that my interpretation (of not taking it into account for basic format day branch) is still relevant? EDIT: @jbrockmendel answered so this is OK now.
…ure/44764_strftime_perf_datetimes Conflicts: pandas/_libs/tslib.pyx
Ping?
…ure/44764_strftime_perf_datetimes Conflicts: doc/source/whatsnew/v1.5.0.rst
I updated the PR to the latest head, it builds OK. @jreback approved. Could we proceed? @jreback @mroeschke @jbrockmendel
thanks @smarie |
Great! Thanks all for the reviews.
@@ -0,0 +1,64 @@
import numpy as np

import pandas as pd
sorry for the late review; these benchmarks don't belong in the tslibs directory (unless you can re-write them to not depend on anything outside of tslibs). can you do a follow-up to move them?
yes sure, do you think that moving them one folder up would be enough ?
that would do it, yes. thanks
Done in #47665. Let's wait for a successful build, just in case.
…libs dir (#47665) Code review from #46759 : moved benchmark file Co-authored-by: Sylvain MARIE <[email protected]>
…side of tslibs dir (pandas-dev#47665) Code review from pandas-dev#46759 : moved benchmark file Co-authored-by: Sylvain MARIE <[email protected]>
"Easy" fixes related to #44764, taken from the big PR #46116:
- `BusinessHour`: `repr` is now 4 times faster.
- `DatetimeArray` and `DatetimeIndex`: string formatting is now up to 80% faster (as fast as default) when one of the default strftime formats `"%Y-%m-%d %H:%M:%S"` or `"%Y-%m-%d %H:%M:%S.%f"` is used (see full bench in PR [WIP] perf improvements for strftime #46116).
- `Series.to_excel` and `DataFrame.to_excel` (see `ExcelFormatter`): processing dates can be up to 4% faster.
- `Series.to_sql` and `DataFrame.to_sql` (`SQLiteTable`): processing time arrays can be up to 65% faster!

I also added a few code comments about datetime formatting sections, to improve maintenance.
`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.