-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[READY] Improved performance of Period
's default formatter (period_format
)
#51459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[READY] Improved performance of Period
's default formatter (period_format
)
#51459
Conversation
…ure/44764_perf_issue_new_period
…com/smarie/pandas into feature/44764_perf_issue_new_period
It seems that the failing tests are not related to this PR's contents, do you confirm ? |
I added a few asvs for the to_csv operations and updated the what's new. This is now ready for review @MarcoGorelli
I found the reason and added a docstring on that bench function. The |
…t was improved: the date_format parameter is not taken into account today.
Note: the failing test seeems to be related to a network issue, so completely unrelated.
@mroeschke since you have reviewed my similar PRs for strftime in the past, do not hesitate to jump in too. Note that once this PR will be merged, #51298 will be finally small enough to be revewed |
Period
's default formatter (period_format
)Period
's default formatter (period_format
)
…ure/44764_perf_issue_new_period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one, the changes themselves look good to me! Just got a question
Regarding date_format
only taking effect for Period
if it's in the index - that looks like a bug to me, well done for noticing! I don't see any issue open about this, do you want to open one? (else I can, no worries)
asv_bench/benchmarks/strftime.py
Outdated
def time_frame_period_formatting_custom(self, obs, fq): | ||
self.data["p"].dt.strftime(date_format="%Y-%m-%d --- %H:%M:%S") | ||
|
||
def time_frame_period_formatting_iso8601_strftime_Z(self, obs, fq): | ||
self.data["p"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%SZ") | ||
|
||
def time_frame_period_formatting_iso8601_strftime_offset(self, obs, fq): | ||
"""Not optimized yet as %z is not supported by `convert_strftime_format`""" | ||
self.data["p"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%S%z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do these not show up in the asv results you posted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, it seems that I forgot them in the copy paste. I'll try to rerun asv partially to get them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: Actually I had removed all asvs that were expected not to show improvements (custom date formats). Here are all ASVs for PeriodStrftime
:
15.4±0.2ms 17.1±2ms ~1.11 strftime.PeriodStrftime.time_frame_period_formatting_custom(1000, 'D')
15.0±0.7ms 14.1±0.2ms 0.94 strftime.PeriodStrftime.time_frame_period_formatting_custom(1000, 'H')
142±6ms 153±5ms 1.08 strftime.PeriodStrftime.time_frame_period_formatting_custom(10000, 'D')
143±5ms 157±7ms 1.10 strftime.PeriodStrftime.time_frame_period_formatting_custom(10000, 'H')
- 4.37±0.07ms 2.24±0.06ms 0.51 strftime.PeriodStrftime.time_frame_period_formatting_default(1000, 'D')
- 4.72±0.06ms 2.90±0.1ms 0.61 strftime.PeriodStrftime.time_frame_period_formatting_default(1000, 'H')
- 42.8±1ms 24.1±0.7ms 0.56 strftime.PeriodStrftime.time_frame_period_formatting_default(10000, 'D')
- 48.4±0.5ms 24.4±0.2ms 0.50 strftime.PeriodStrftime.time_frame_period_formatting_index_default(10000, 'D')
- 50.9±3ms 27.1±0.7ms 0.53 strftime.PeriodStrftime.time_frame_period_formatting_index_default(10000, 'H')
7.52±0.7ms 7.22±0.3ms 0.96 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_Z(1000, 'D')
7.12±0.1ms 7.00±0.3ms 0.98 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_Z(1000, 'H')
75.2±4ms 68.1±1ms ~0.91 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_Z(10000, 'D')
71.9±4ms 77.3±8ms 1.08 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_Z(10000, 'H')
13.5±0.5ms 14.0±0.4ms 1.03 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_offset(1000, 'D')
15.9±3ms 14.4±1ms ~0.91 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_offset(1000, 'H')
135±3ms 148±7ms 1.10 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_offset(10000, 'D')
140±3ms 136±2ms 0.97 strftime.PeriodStrftime.time_frame_period_formatting_iso8601_strftime_offset(10000, 'H')
Thanks @MarcoGorelli !
Done #51621 |
…ure/44764_perf_issue_new_period # Conflicts: # doc/source/whatsnew/v1.5.4.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @smarie !
Leaving open for a bit in case others have comments
@jbrockmendel @MarcoGorelli would you have a few mins this week to check this PR so that I can move to the next one ? |
yup taking a look now (was off last week) |
pandas/_libs/tslibs/period.pyx
Outdated
return f"{dts.year}" | ||
|
||
elif freq_group == FR_QTR and (is_fmt_none or fmt == "%FQ%q"): | ||
# get quarter and modify dts.year to be the fiscal year (?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes themselves look good, I'm just confused by this (?)
. are you asking for confirmation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @MarcoGorelli , "fiscal year" is the wording to confirm in this comment, I wasn't sure that this was the right definition.
get_yq
docstring is not very explicit about this, apart from stating Sets dts.year in-place.
Maybe this docstring could be improved according to the actual answer.
pandas/pandas/_libs/tslibs/period.pyx
Line 900 in 74cbabf
cdef int get_yq(int64_t ordinal, int freq, npy_datetimestruct* dts): |
…ure/44764_perf_issue_new_period # Conflicts: # doc/source/whatsnew/v2.1.0.rst
…iod' into feature/44764_perf_issue_new_period
…ure/44764_perf_issue_new_period
ok should be good - I'll give it a final look and merge this week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had another look, and to be honest I'm slightly scared about merging something this user-facing
What I'm most concerned about is that not all these reprs seems to be explicitly tested for. Some of them are, e.g.
pandas/pandas/tests/scalar/period/test_period.py
Lines 711 to 719 in 044227f
def test_millisecond_repr(self): | |
p = Period("2000-01-01 12:15:02.123") | |
assert repr(p) == "Period('2000-01-01 12:15:02.123', 'L')" | |
def test_microsecond_repr(self): | |
p = Period("2000-01-01 12:15:02.123567") | |
assert repr(p) == "Period('2000-01-01 12:15:02.123567', 'U')" |
but then FR_SEC
isn't (as far as I can tell).
Reckon we should add such unit tests for each period group (perhaps in a separate PR, which we could get in very quickly)? That would give me much more confidence about merging this
Really sorry this is taking so long
I did not review the available tests extensively so maybe this is indeed needed. I'll have a look one of these days |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Yes this is still active and even ready for merge, but it would require that another PR pushes a few more tests first so that @MarcoGorelli is more confident about the non-regressive impact :) |
…ure/44764_perf_issue_new_period # Conflicts: # doc/source/whatsnew/v2.1.0.rst
Now that #53003 is merged this should be ok - all new tests seem to pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, let's get this in, thanks @smarie !
let's get this in, thanks @smarie |
Great news @MarcoGorelli thanks ! Thanks again ! |
…_format`) (pandas-dev#51459) * Improved performance of default period formatting (`period_format`). Added corresponding ASVs * Improved ASV for period frames and datetimes * What's new * Update asv_bench/benchmarks/strftime.py * Fixed whats new backticks * Completed whatsnew * Added ASVs for to_csv for period * Aligned the namings * Completed Whats new * Added a docstring explaining why the ASV bench with custom date format was improved: the date_format parameter is not taken into account today. * Moved whatsnew to 2.0.0 * Moved whatsnew to 2.1 * Improved docstring as per code review * Renamed asv params as per code review * Fixed ASV comment as per code review * ASV: renamed parameters as per code review * Improved `period_format`: now the performance is the same when no format is provided and if an explicit `fmt` is provided matching the actual default format for this `freq` * Code review: Improved strftime ASV: set_index is now in the setup * Removed useless main * Removed wrong code * Improved ASVs for period formatting: now there is a "default explicit" everywhere * Update pandas/_libs/tslibs/period.pyx * Update pandas/_libs/tslibs/period.pyx * Update pandas/_libs/tslibs/period.pyx * Minor refactoring to avoid retesting for none several time * Fixed issue: bool does not exist, using bint * Added missing quarter variable as cdef * Fixed asv bug * Code review: fixed docstring * Update doc/source/whatsnew/v2.1.0.rst * fixup --------- Co-authored-by: Sylvain MARIE <[email protected]> Co-authored-by: Marco Edward Gorelli <[email protected]> Co-authored-by: MarcoGorelli <>
…_format`) (pandas-dev#51459) * Improved performance of default period formatting (`period_format`). Added corresponding ASVs * Improved ASV for period frames and datetimes * What's new * Update asv_bench/benchmarks/strftime.py * Fixed whats new backticks * Completed whatsnew * Added ASVs for to_csv for period * Aligned the namings * Completed Whats new * Added a docstring explaining why the ASV bench with custom date format was improved: the date_format parameter is not taken into account today. * Moved whatsnew to 2.0.0 * Moved whatsnew to 2.1 * Improved docstring as per code review * Renamed asv params as per code review * Fixed ASV comment as per code review * ASV: renamed parameters as per code review * Improved `period_format`: now the performance is the same when no format is provided and if an explicit `fmt` is provided matching the actual default format for this `freq` * Code review: Improved strftime ASV: set_index is now in the setup * Removed useless main * Removed wrong code * Improved ASVs for period formatting: now there is a "default explicit" everywhere * Update pandas/_libs/tslibs/period.pyx * Update pandas/_libs/tslibs/period.pyx * Update pandas/_libs/tslibs/period.pyx * Minor refactoring to avoid retesting for none several time * Fixed issue: bool does not exist, using bint * Added missing quarter variable as cdef * Fixed asv bug * Code review: fixed docstring * Update doc/source/whatsnew/v2.1.0.rst * fixup --------- Co-authored-by: Sylvain MARIE <[email protected]> Co-authored-by: Marco Edward Gorelli <[email protected]> Co-authored-by: MarcoGorelli <>
Period
's default formatter (period_format
) is now significantly (~twice) faster.str(Period)
,repr(Period)
, andPeriod.strftime(fmt=None)
,PeriodArray.strftime(fmt=None)
,PeriodIndex.strftime(fmt=None)
andPeriodIndex.format(fmt=None)
.to_csv
operations involvingPeriodArray
orPeriodIndex
with defaultdate_format
are also significantly accelerated.ASVs:
EDIT: additional for
to_csv
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.