PERF: Speed up Period construction #50149

Merged
merged 12 commits into from
Jan 18, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -768,6 +768,7 @@ Performance improvements
- Performance improvement in :func:`merge` when not merging on the index - the new index will now be :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49478`)
- Performance improvement in :meth:`DataFrame.to_dict` and :meth:`Series.to_dict` when using any non-object dtypes (:issue:`46470`)
- Performance improvement in :func:`read_html` when there are multiple tables (:issue:`49929`)
- Performance improvement in :class:`Period` constructor when constructing from a string or integer (:issue:`38312`)

.. ---------------------------------------------------------------------------
.. _whatsnew_200.bug_fixes:
3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/parsing.pyx
@@ -386,7 +386,7 @@ cdef parse_datetime_string_with_reso(
&out_tzoffset, False
)
if not string_to_dts_failed:
- if dts.ps != 0 or out_local:
+ if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns or out_local:
Member

If `out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns` is handled here, should it be removed from the dict on L399-L409?

Member Author

No, both blocks go through the dict to get the string representation of the reso.

This is just to force the returned datetime object to be Timestamp, which has a nanosecond attribute.

# TODO: the not-out_local case we could do without Timestamp;
# avoid circular import
from pandas import Timestamp
@@ -395,6 +395,7 @@ cdef parse_datetime_string_with_reso(
parsed = datetime(
dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us
)

reso = {
NPY_DATETIMEUNIT.NPY_FR_Y: "year",
NPY_DATETIMEUNIT.NPY_FR_M: "month",
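To illustrate the review exchange above: a plain `datetime` stops at microseconds, so the nanosecond branch has to return a richer type, while both branches still look the resolution string up in the same dict. The sketch below is not the pandas implementation. `NanoDatetime` is a hypothetical stand-in for `Timestamp`, `build_parsed` is a hypothetical helper, and `RESO_BY_UNIT` is an illustrative subset keyed by strings rather than `NPY_DATETIMEUNIT` members.

```python
from datetime import datetime


class NanoDatetime(datetime):
    """Hypothetical stand-in for Timestamp: a datetime that also carries nanoseconds."""

    def __new__(cls, *args, nanosecond=0):
        self = super().__new__(cls, *args)
        self._nanosecond = nanosecond
        return self

    @property
    def nanosecond(self):
        return self._nanosecond


# Illustrative subset only (assumption); the real dict is keyed by NPY_DATETIMEUNIT.
RESO_BY_UNIT = {"Y": "year", "M": "month", "ns": "nanosecond"}


def build_parsed(fields, ns, unit, out_local=False):
    """Return (parsed, reso), using the nanosecond-aware type only when needed."""
    if unit == "ns" or out_local:
        parsed = NanoDatetime(*fields, nanosecond=ns)
    else:
        parsed = datetime(*fields)
    # Both branches go through the same dict for the resolution string.
    return parsed, RESO_BY_UNIT[unit]


coarse, coarse_reso = build_parsed((2023, 1, 18), 0, "M")
fine, fine_reso = build_parsed((2023, 1, 18, 12, 0, 0, 123456), 789, "ns")
```

Here `coarse` is a plain `datetime` with no `nanosecond` attribute, while `fine` exposes one, which is why the `"ns"` entry has to stay in the dict even though the branch above already checks for that unit.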
14 changes: 5 additions & 9 deletions pandas/_libs/tslibs/period.pyx
@@ -2590,18 +2590,14 @@ class Period(_Period):
value = str(value)
value = value.upper()
dt, reso = parse_time_string(value, freq)
- try:
- ts = Timestamp(value)
- except ValueError:
- nanosecond = 0
- else:
- nanosecond = ts.nanosecond
- if nanosecond != 0:
- reso = "nanosecond"
+ if reso == "nanosecond":
+ nanosecond = dt.nanosecond

if dt is NaT:
ordinal = NPY_NAT

- if freq is None:
+ if freq is None and ordinal != NPY_NAT:
+ # Skip NaT, since it doesn't have a resolution
try:
freq = attrname_to_abbrevs[reso]
except KeyError:
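The control flow in the `period.pyx` hunk above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Cython source: `resolve` is a hypothetical helper, `NAT` is a stand-in for the `NaT` singleton, `ATTRNAME_TO_ABBREV` is an illustrative subset standing in for `attrname_to_abbrevs`, and the `KeyError` handling that the diff truncates is left out.

```python
from types import SimpleNamespace

NPY_NAT = -(2**63)  # integer NaT sentinel (minimum int64)
NAT = object()  # hypothetical stand-in for the NaT singleton

# Illustrative subset (assumption) standing in for attrname_to_abbrevs.
ATTRNAME_TO_ABBREV = {"year": "A", "month": "M", "day": "D", "nanosecond": "N"}


def resolve(dt, reso, freq=None):
    """Mirror the new branch order for a (dt, reso) pair returned by the parser."""
    nanosecond = 0
    ordinal = None
    # Read dt.nanosecond only when the parser reported nanosecond resolution,
    # instead of parsing the string a second time.
    if reso == "nanosecond":
        nanosecond = dt.nanosecond
    if dt is NAT:
        ordinal = NPY_NAT
    # Skip NaT, since it has no resolution to infer a frequency from.
    if freq is None and ordinal != NPY_NAT:
        freq = ATTRNAME_TO_ABBREV[reso]
    return nanosecond, ordinal, freq


ns_case = resolve(SimpleNamespace(nanosecond=789), "nanosecond")
day_case = resolve(SimpleNamespace(), "day")  # dt.nanosecond is never touched
nat_case = resolve(NAT, "")  # empty reso for NaT is an assumption
```

The `day_case` call shows why the parser only needs to return a nanosecond-aware object at nanosecond resolution, and the `nat_case` call shows the frequency lookup being skipped for `NaT`.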