
PERF: PeriodArray._format_native_types #51793


Merged (1 commit), Mar 7, 2023


jbrockmendel (Member):

#23979 suggests implementing PeriodArray._format_native_types as self.to_timestamp()._format_native_types. That approach is faster than this one, but this one avoids copies, is robust to weekly freqs, and should improve further with #51459.

pi = pd.period_range("2016-01-01", periods=10**5, freq='H')
pa = pi._data
pa[4] = pd.NaT

%timeit pa._format_native_types()
246 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- PR
411 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- main

%timeit pa._format_native_types(date_format="%Y-%m-%d")
274 ms ± 8.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- PR
420 ms ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- main

In [18]: %timeit pa.to_timestamp()._format_native_types()
164 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
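One plausible reading of "robust to weekly freqs" is that a weekly period renders as a start/end span, whereas converting to timestamps first keeps only one instant per period (assuming `to_timestamp()` uses its default of the period start). The pure-Python sketch below illustrates that loss of information; both helper functions are hypothetical and only mimic the string forms, they are not pandas code.

```python
from datetime import date, timedelta


def weekly_period_repr(week_end: date) -> str:
    """Hypothetical: mimic how a W-SUN weekly period renders, as 'start/end'."""
    start = week_end - timedelta(days=6)
    return f"{start.isoformat()}/{week_end.isoformat()}"


def start_timestamp_repr(week_end: date) -> str:
    """Hypothetical: roughly what survives if only the week's start
    timestamp is formatted (a to_timestamp()-first approach)."""
    start = week_end - timedelta(days=6)
    return start.isoformat()


week_end = date(2016, 1, 10)  # a Sunday
print(weekly_period_repr(week_end))    # full span of the week
print(start_timestamp_repr(week_end))  # the end of the week is lost
```

Formatting the periods directly keeps the span, so no special-casing of weekly frequencies is needed.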

    cdef:
        Py_ssize_t i, n = values.size
        int64_t ordinal
        object item_repr
Review comment (Member):

This can't be a str?

Member Author:

In _format_native_types it is annotated as str | float

Member Author:

Whoops, I thought you were referring to na_rep. This can't be str because na_rep isn't always a str.

Member:

Darn (odd that na_rep can be a float, but okay).
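The cdef declarations above (an index `i`, a length `n`, an int64 `ordinal`, and an `object item_repr`) suggest a single pass over the period ordinals that writes each formatted value straight into the output. Below is a minimal pure-Python sketch of that loop shape, assuming hourly periods whose ordinal counts hours since 1970-01-01 00:00 and a NaT sentinel equal to the minimum int64 (pandas' iNaT). The function name `format_hourly_ordinals` and the default `"%Y-%m-%d %H:00"` format are hypothetical stand-ins, not the Cython implementation from this PR.

```python
from datetime import datetime, timedelta

# Assumption: NaT is stored as the minimum int64 value (iNaT).
NAT_SENTINEL = -(2**63)
# Assumption: an hourly period's ordinal counts hours since this epoch.
EPOCH = datetime(1970, 1, 1)


def format_hourly_ordinals(ordinals, date_format=None, na_rep="NaT"):
    """Hypothetical pure-Python analogue of the formatting loop.

    Walks the ordinals once and formats each one directly, without first
    building an intermediate array of timestamps.
    """
    n = len(ordinals)
    # Mirrors `object item_repr`: an entry is either a str or na_rep,
    # and na_rep may be a float, so entries cannot be typed as str.
    result = [None] * n
    for i in range(n):
        ordinal = ordinals[i]
        if ordinal == NAT_SENTINEL:
            item_repr = na_rep
        else:
            dt = EPOCH + timedelta(hours=ordinal)
            # Default mimics the hourly Period repr, e.g. "2016-01-01 00:00".
            fmt = "%Y-%m-%d %H:00" if date_format is None else date_format
            item_repr = dt.strftime(fmt)
        result[i] = item_repr
    return result


base = (datetime(2016, 1, 1) - EPOCH) // timedelta(hours=1)
print(format_hourly_ordinals([base, base + 1, NAT_SENTINEL]))
print(format_hourly_ordinals([base, NAT_SENTINEL], "%Y-%m-%d", na_rep=-1.0))
```

The second call shows why typing the per-item variable as `str` would fail: with a float `na_rep`, the output mixes strings and floats.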

mroeschke added the Performance (Memory or execution speed performance) and Period (Period data type) labels Mar 6, 2023
mroeschke changed the milestone from 2.0 to 2.1 Mar 7, 2023
mroeschke merged commit 948392b into pandas-dev:main Mar 7, 2023
mroeschke (Member):

Thanks @jbrockmendel

jbrockmendel deleted the perf-23979 branch March 7, 2023 23:43
Successfully merging this pull request may close these issues:

CLN: Refactor PeriodArray._format_native_types to use DatetimeArray
2 participants