PERF: Speeds up creation of Period, PeriodArray, with Offset freq #23589
master:

```python
In [2]: freq = pd.tseries.offsets.Day()
   ...:
   ...: %timeit pd.Period("2001", freq=freq)
294 µs ± 5.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit pd.Period._maybe_convert_freq(freq)
   ...:
64.7 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

branch:

```python
In [2]: freq = pd.tseries.offsets.Day()
   ...:
   ...: %timeit pd.Period("2001", freq=freq)
158 µs ± 2.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit pd.Period._maybe_convert_freq(freq)
193 ns ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

While looking at the profile plot in snakeviz, it seems like a lot of the time in `Period._maybe_convert_freq` was spent importing modules. `_maybe_convert_freq` calls `offsets.to_offset`, which imports a Python function inside the method. Does Cython not handle this well?
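To illustrate the pattern described above, here is a minimal pure-Python sketch, not the actual pandas or Cython code. It assumes a hypothetical `DateOffset` stand-in class and two hypothetical converters: one runs a function-local `import` on every call, the other returns already-converted offsets immediately and defers the import to the string-parsing path. The regex-based `"D"` parsing is invented purely for illustration, and the absolute timings in plain CPython will differ from the Cython numbers reported in this PR.

```python
import timeit


class DateOffset:
    """Hypothetical stand-in for an offset object (not the real pandas class)."""

    def __init__(self, n=1):
        self.n = n


def to_offset_local_import(freq):
    # Hypothetical "slow" variant: this import statement executes on every call,
    # even for the common case where freq is already an offset.
    import re

    if isinstance(freq, DateOffset):
        return freq
    match = re.fullmatch(r"(\d*)D", freq)  # toy parser: only "D", "2D", ...
    if match is None:
        raise ValueError(f"invalid frequency: {freq!r}")
    return DateOffset(int(match.group(1) or 1))


def to_offset_fast_path(freq):
    # Hypothetical "fast" variant: return existing offsets before doing any work,
    # so the deferred import only runs when a string actually has to be parsed.
    if isinstance(freq, DateOffset):
        return freq
    import re

    match = re.fullmatch(r"(\d*)D", freq)
    if match is None:
        raise ValueError(f"invalid frequency: {freq!r}")
    return DateOffset(int(match.group(1) or 1))


if __name__ == "__main__":
    day = DateOffset()
    number = 100_000
    for fn in (to_offset_local_import, to_offset_fast_path):
        # Time only the common "already an offset" case.
        seconds = timeit.timeit(lambda: fn(day), number=number)
        print(f"{fn.__name__}: {seconds / number * 1e9:.0f} ns per call")
```

Keeping the import deferred, rather than hoisting it to module level, is one way to preserve a function-level import (for example, one that avoids a circular import) while keeping it off the hot path.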
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #23589      +/-   ##
==========================================
- Coverage   92.25%   92.25%   -0.01%
==========================================
  Files         161      161
  Lines       51237    51260      +23
==========================================
+ Hits        47269    47290      +21
- Misses       3968     3970       +2
```
thanks!

To be clear, although this PR improved performance a bit, the perf regression is not yet fixed.
IIRC the version of … The long-term solution may just be to move …

Not saying that the remaining performance hit is necessarily due to … For the plotting, it seems to be coming from `Period.asfreq`, which uses …