[READY] Improved performance of Period's default formatter (period_format) #51459

Merged
Changes from 50 commits
53 commits
2504a90
Improved performance of default period formatting (`period_format`). …
Feb 17, 2023
c1bbfd6
Improved ASV for period frames and datetimes
Feb 17, 2023
3693b30
What's new
Feb 17, 2023
3c4efea
Update asv_bench/benchmarks/strftime.py
smarie Feb 17, 2023
7ccb423
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 17, 2023
e4a58e7
Fixed whats new backticks
Feb 17, 2023
714d518
Merge branch 'feature/44764_perf_issue_new_period' of https://github.…
Feb 17, 2023
04fbf6b
Completed whatsnew
Feb 17, 2023
279114d
Added ASVs for to_csv for period
Feb 18, 2023
35ec182
Aligned the namings
Feb 18, 2023
28d8b5d
Completed Whats new
Feb 18, 2023
a69aeca
Added a docstring explaining why the ASV bench with custom date forma…
Feb 19, 2023
705ff81
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 24, 2023
08d8d5e
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Feb 28, 2023
c1ff5fb
Moved whatsnew to 2.0.0
Feb 28, 2023
5b0acc8
Moved whatsnew to 2.1
Mar 1, 2023
f4cef3a
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 1, 2023
4cefded
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 2, 2023
99b82f9
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 3, 2023
3df0666
Merge branch 'main' into feature/44764_perf_issue_new_period
smarie Mar 3, 2023
576e475
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 6, 2023
d55e47f
Improved docstring as per code review
Mar 6, 2023
316d2b7
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new_per…
Mar 6, 2023
e62867e
Merge branch 'main' into feature/44764_perf_issue_new_period
smarie Mar 7, 2023
2cd568c
Renamed asv params as per code review
Mar 8, 2023
a07d89e
Fixed ASV comment as per code review
Mar 8, 2023
8ad4827
ASV: renamed parameters as per code review
Mar 8, 2023
03d9778
Improved `period_format`: now the performance is the same when no for…
Mar 8, 2023
8d21053
Code review: Improved strftime ASV: set_index is now in the setup
Mar 8, 2023
13acb29
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new_per…
Mar 8, 2023
4fd20ad
Removed useless main
Mar 8, 2023
5d4bfc1
Removed wrong code
Mar 8, 2023
77b8b7a
Improved ASVs for period formatting: now there is a "default explicit…
Mar 8, 2023
24a6b63
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 8, 2023
a309388
Update pandas/_libs/tslibs/period.pyx
smarie Mar 8, 2023
17eedeb
Update pandas/_libs/tslibs/period.pyx
smarie Mar 8, 2023
14b7489
Update pandas/_libs/tslibs/period.pyx
smarie Mar 8, 2023
e1a81ef
Minor refactoring to avoid retesting for none several time
Mar 8, 2023
faf97eb
Merge branch 'feature/44764_perf_issue_new_period' of https://github.…
Mar 8, 2023
f6ced94
Fixed issue: bool does not exist, using bint
Mar 8, 2023
3da1e4b
Added missing quarter variable as cdef
Mar 8, 2023
55d180e
Fixed asv bug
Mar 8, 2023
ea9fc47
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 9, 2023
4becd71
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 10, 2023
74cbabf
Merge branch 'main' into feature/44764_perf_issue_new_period
smarie Mar 13, 2023
b3d5963
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 14, 2023
1301795
Merge remote-tracking branch 'origin/feature/44764_perf_issue_new_per…
Mar 14, 2023
61ce96e
Code review: fixed docstring
Mar 15, 2023
6890776
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 15, 2023
e14c383
Merge branch 'main' into feature/44764_perf_issue_new_period
smarie Mar 16, 2023
ac68ab9
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
May 6, 2023
1a05c22
Update doc/source/whatsnew/v2.1.0.rst
MarcoGorelli May 6, 2023
d3c19d0
fixup
May 6, 2023
65 changes: 59 additions & 6 deletions asv_bench/benchmarks/io/csv.py
@@ -12,6 +12,7 @@
     DataFrame,
     concat,
     date_range,
+    period_range,
     read_csv,
     to_datetime,
 )
@@ -98,24 +99,76 @@ def time_frame_date_no_format_index(self):
         self.data.to_csv(self.fname)


+class ToCSVPeriod(BaseIO):
+    fname = "__test__.csv"
+
+    params = ([1000, 10000], ["D", "H"])
+    param_names = ["nobs", "freq"]
+
+    def setup(self, nobs, freq):
+        rng = period_range(start="2000-01-01", periods=nobs, freq=freq)
+        self.data = DataFrame(rng)
+        if freq == "D":
+            self.default_fmt = "%Y-%m-%d"
+        elif freq == "H":
+            self.default_fmt = "%Y-%m-%d %H:00"
+
+    def time_frame_period_formatting_default(self, nobs, freq):
+        self.data.to_csv(self.fname)
+
+    def time_frame_period_formatting_default_explicit(self, nobs, freq):
+        self.data.to_csv(self.fname, date_format=self.default_fmt)
+
+    def time_frame_period_formatting(self, nobs, freq):
+        # Nb: `date_format` is not actually taken into account here today, so the
+        # performance is currently identical to `time_frame_period_formatting_default`
+        # above. This timer is therefore expected to degrade when GH#51621 is fixed.
+        # (Remove this comment when GH#51621 is fixed.)
+        self.data.to_csv(self.fname, date_format="%Y-%m-%d___%H:%M:%S")
+
+
+class ToCSVPeriodIndex(BaseIO):
+    fname = "__test__.csv"
+
+    params = ([1000, 10000], ["D", "H"])
+    param_names = ["nobs", "freq"]
+
+    def setup(self, nobs, freq):
+        rng = period_range(start="2000-01-01", periods=nobs, freq=freq)
+        self.data = DataFrame({"a": 1}, index=rng)
+        if freq == "D":
+            self.default_fmt = "%Y-%m-%d"
+        elif freq == "H":
+            self.default_fmt = "%Y-%m-%d %H:00"
+
+    def time_frame_period_formatting_index(self, nobs, freq):
+        self.data.to_csv(self.fname, date_format="%Y-%m-%d___%H:%M:%S")
+
+    def time_frame_period_formatting_index_default(self, nobs, freq):
+        self.data.to_csv(self.fname)
+
+    def time_frame_period_formatting_index_default_explicit(self, nobs, freq):
+        self.data.to_csv(self.fname, date_format=self.default_fmt)
+
+
 class ToCSVDatetimeBig(BaseIO):
     fname = "__test__.csv"
     timeout = 1500
     params = [1000, 10000, 100000]
-    param_names = ["obs"]
+    param_names = ["nobs"]

-    def setup(self, obs):
+    def setup(self, nobs):
         d = "2018-11-29"
         dt = "2018-11-26 11:18:27.0"
         self.data = DataFrame(
             {
-                "dt": [np.datetime64(dt)] * obs,
-                "d": [np.datetime64(d)] * obs,
-                "r": [np.random.uniform()] * obs,
+                "dt": [np.datetime64(dt)] * nobs,
+                "d": [np.datetime64(d)] * nobs,
+                "r": [np.random.uniform()] * nobs,
             }
         )

-    def time_frame(self, obs):
+    def time_frame(self, nobs):
         self.data.to_csv(self.fname)
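
For reference, the `to_csv` path timed by `ToCSVPeriodIndex` can be exercised on a tiny frame. This is only a sketch under a few assumptions: it uses 3 rows instead of the benchmark's 1000/10000, it returns the CSV as a string rather than writing `__test__.csv`, and it only uses the daily frequency. The variable names `default_csv` and `explicit_csv` are illustrative and not part of the PR.

```python
import pandas as pd

# Tiny stand-in for the benchmark's frame (3 rows, daily periods).
rng = pd.period_range(start="2000-01-01", periods=3, freq="D")
data = pd.DataFrame({"a": 1}, index=rng)

# Default formatting: each Period in the index is rendered with the
# default format of its frequency ("%Y-%m-%d" for daily periods).
default_csv = data.to_csv()

# Same call with the default format spelled out, mirroring the
# *_default_explicit benchmarks.
explicit_csv = data.to_csv(date_format="%Y-%m-%d")

print(default_csv.splitlines())
# ['', ...] style output is avoided here on purpose: see the lines below.
print(explicit_csv.splitlines() == default_csv.splitlines())
```

Comparing the two outputs is what makes the "default" and "default explicit" timers meaningful as a pair: they produce the same text, so any timing gap between them comes from the formatting path rather than from the data written.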


87 changes: 69 additions & 18 deletions asv_bench/benchmarks/strftime.py
@@ -7,58 +7,109 @@
 class DatetimeStrftime:
     timeout = 1500
     params = [1000, 10000]
-    param_names = ["obs"]
+    param_names = ["nobs"]

-    def setup(self, obs):
+    def setup(self, nobs):
         d = "2018-11-29"
         dt = "2018-11-26 11:18:27.0"
         self.data = pd.DataFrame(
             {
-                "dt": [np.datetime64(dt)] * obs,
-                "d": [np.datetime64(d)] * obs,
-                "r": [np.random.uniform()] * obs,
+                "dt": [np.datetime64(dt)] * nobs,
+                "d": [np.datetime64(d)] * nobs,
+                "r": [np.random.uniform()] * nobs,
             }
         )

-    def time_frame_date_to_str(self, obs):
+    def time_frame_date_to_str(self, nobs):
         self.data["d"].astype(str)

-    def time_frame_date_formatting_default(self, obs):
+    def time_frame_date_formatting_default(self, nobs):
+        self.data["d"].dt.strftime(date_format=None)
+
+    def time_frame_date_formatting_default_explicit(self, nobs):
         self.data["d"].dt.strftime(date_format="%Y-%m-%d")

-    def time_frame_date_formatting_custom(self, obs):
+    def time_frame_date_formatting_custom(self, nobs):
         self.data["d"].dt.strftime(date_format="%Y---%m---%d")

-    def time_frame_datetime_to_str(self, obs):
+    def time_frame_datetime_to_str(self, nobs):
         self.data["dt"].astype(str)

-    def time_frame_datetime_formatting_default_date_only(self, obs):
+    def time_frame_datetime_formatting_default(self, nobs):
+        self.data["dt"].dt.strftime(date_format=None)
+
+    def time_frame_datetime_formatting_default_explicit_date_only(self, nobs):
         self.data["dt"].dt.strftime(date_format="%Y-%m-%d")

-    def time_frame_datetime_formatting_default(self, obs):
+    def time_frame_datetime_formatting_default_explicit(self, nobs):
         self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S")

-    def time_frame_datetime_formatting_default_with_float(self, obs):
+    def time_frame_datetime_formatting_default_with_float(self, nobs):
         self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S.%f")

-    def time_frame_datetime_formatting_custom(self, obs):
+    def time_frame_datetime_formatting_custom(self, nobs):
         self.data["dt"].dt.strftime(date_format="%Y-%m-%d --- %H:%M:%S")


+class PeriodStrftime:
+    timeout = 1500
+    params = ([1000, 10000], ["D", "H"])
+    param_names = ["nobs", "freq"]
+
+    def setup(self, nobs, freq):
+        self.data = pd.DataFrame(
+            {
+                "p": pd.period_range(start="2000-01-01", periods=nobs, freq=freq),
+                "r": [np.random.uniform()] * nobs,
+            }
+        )
+        self.data["i"] = self.data["p"]
+        self.data.set_index("i", inplace=True)
+        if freq == "D":
+            self.default_fmt = "%Y-%m-%d"
+        elif freq == "H":
+            self.default_fmt = "%Y-%m-%d %H:00"
+
+    def time_frame_period_to_str(self, nobs, freq):
+        self.data["p"].astype(str)
+
+    def time_frame_period_formatting_default(self, nobs, freq):
+        self.data["p"].dt.strftime(date_format=None)
+
+    def time_frame_period_formatting_default_explicit(self, nobs, freq):
+        self.data["p"].dt.strftime(date_format=self.default_fmt)
+
+    def time_frame_period_formatting_index_default(self, nobs, freq):
+        self.data.index.format()
+
+    def time_frame_period_formatting_index_default_explicit(self, nobs, freq):
+        self.data.index.format(self.default_fmt)
+
+    def time_frame_period_formatting_custom(self, nobs, freq):
+        self.data["p"].dt.strftime(date_format="%Y-%m-%d --- %H:%M:%S")
+
+    def time_frame_period_formatting_iso8601_strftime_Z(self, nobs, freq):
+        self.data["p"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%SZ")
+
+    def time_frame_period_formatting_iso8601_strftime_offset(self, nobs, freq):
+        """Not optimized yet as %z is not supported by `convert_strftime_format`"""
+        self.data["p"].dt.strftime(date_format="%Y-%m-%dT%H:%M:%S%z")
+
+
 class BusinessHourStrftime:
     timeout = 1500
     params = [1000, 10000]
-    param_names = ["obs"]
+    param_names = ["nobs"]

-    def setup(self, obs):
+    def setup(self, nobs):
         self.data = pd.DataFrame(
             {
-                "off": [offsets.BusinessHour()] * obs,
+                "off": [offsets.BusinessHour()] * nobs,
             }
         )

-    def time_frame_offset_str(self, obs):
+    def time_frame_offset_str(self, nobs):
         self.data["off"].apply(str)

-    def time_frame_offset_repr(self, obs):
+    def time_frame_offset_repr(self, nobs):
         self.data["off"].apply(repr)
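
The `setup` / `time_*` pattern used by `PeriodStrftime` can be tried outside of asv with a small hand-rolled driver. The sketch below is not how asv itself schedules benchmarks: `MiniPeriodStrftime` is a hypothetical, cut-down class, the `timeit` loop is an assumed stand-in for the asv runner, and the methods return their result (real asv timers do not need to) so the two outputs can be compared. Only the daily frequency is used.

```python
import timeit

import pandas as pd


class MiniPeriodStrftime:
    """Hypothetical, cut-down variant of the PeriodStrftime benchmark."""

    def setup(self, nobs, freq):
        self.data = pd.DataFrame(
            {"p": pd.period_range(start="2000-01-01", periods=nobs, freq=freq)}
        )
        # Default format for daily periods, as in the benchmark's setup().
        self.default_fmt = "%Y-%m-%d"

    def time_default(self, nobs, freq):
        # date_format=None selects the default formatter.
        return self.data["p"].dt.strftime(date_format=None)

    def time_default_explicit(self, nobs, freq):
        # Same output, but the format string is passed explicitly.
        return self.data["p"].dt.strftime(date_format=self.default_fmt)


bench = MiniPeriodStrftime()
bench.setup(nobs=1000, freq="D")

# Both timers should format to identical strings.
same_output = bench.time_default(1000, "D").equals(
    bench.time_default_explicit(1000, "D")
)

# Best-of-3 wall time for 5 calls of each timer (seconds).
timings = {
    name: min(
        timeit.repeat(lambda: getattr(bench, name)(1000, "D"), number=5, repeat=3)
    )
    for name in ("time_default", "time_default_explicit")
}

print(same_output)
print(timings)
```

The absolute numbers depend on the machine and the pandas version, so no timings are quoted here; the point is only that the two timers do the same work through different entry points.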
19 changes: 19 additions & 0 deletions asv_bench/benchmarks/tslibs/period.py
@@ -60,6 +60,10 @@ class PeriodUnaryMethods:

     def setup(self, freq):
         self.per = Period("2012-06-01", freq=freq)
+        if freq == "M":
+            self.default_fmt = "%Y-%m"
+        elif freq == "min":
+            self.default_fmt = "%Y-%m-%d %H:%M"

     def time_to_timestamp(self, freq):
         self.per.to_timestamp()
@@ -70,6 +74,21 @@ def time_now(self, freq):
     def time_asfreq(self, freq):
         self.per.asfreq("A")

+    def time_str(self, freq):
+        str(self.per)
+
+    def time_repr(self, freq):
+        repr(self.per)
+
+    def time_strftime_default(self, freq):
+        self.per.strftime(None)
+
+    def time_strftime_default_explicit(self, freq):
+        self.per.strftime(self.default_fmt)
+
+    def time_strftime_custom(self, freq):
+        self.per.strftime("%b. %d, %Y was a %A")
+

 class PeriodConstructor:
     params = [["D"], [True, False]]
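
To make the relationship between the new scalar timers concrete, here is a short sketch of the calls they exercise, using the same `"M"` and `"min"` frequencies and the same `default_fmt` strings as `setup` above. The variable names are illustrative only. The custom-format call is printed without an expected value because `%b` and `%A` depend on the active locale.

```python
import pandas as pd

monthly = pd.Period("2012-06-01", freq="M")
minutely = pd.Period("2012-06-01", freq="min")

# str(), repr() and strftime(None) all go through the default formatter.
print(str(monthly))            # 2012-06
print(repr(monthly))           # Period('2012-06', 'M')
print(monthly.strftime(None))  # 2012-06

# Spelling out the default format gives the same text as the default path.
print(monthly.strftime("%Y-%m"))            # 2012-06
print(minutely.strftime("%Y-%m-%d %H:%M"))  # 2012-06-01 00:00

# A custom format, as in time_strftime_custom (locale-dependent output).
print(monthly.strftime("%b. %d, %Y was a %A"))
```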
6 changes: 6 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -123,6 +123,12 @@ Performance improvements
 - Performance improvement in :meth:`Series.combine_first` (:issue:`51777`)
 - Performance improvement in :meth:`MultiIndex.set_levels` and :meth:`MultiIndex.set_codes` when ``verify_integrity=True`` (:issue:`51873`)
 - Performance improvement in :func:`factorize` for object columns not containing strings (:issue:`51921`)
+- :class:`Period`'s default formatter (`period_format`) is now significantly
+  (~twice) faster. This improves performance of ``str(Period)``, ``repr(Period)``, and
+  :meth:`Period.strftime(fmt=None)`, as well as ``PeriodArray.strftime(fmt=None)``,
+  ``PeriodIndex.strftime(fmt=None)`` and ``PeriodIndex.format(fmt=None)``. Finally,
+  ``to_csv`` operations involving :class:`PeriodArray` or :class:`PeriodIndex` with
+  default ``date_format`` are also significantly accelerated. (:issue:`51459`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.bug_fixes: