PERF: short-circuit allocations in infer_dtype, ensure_datetime64ns #44971

Merged: jreback merged 3 commits on Dec 22, 2021

@jbrockmendel jbrockmendel commented Dec 19, 2021

import numpy as np
from pandas._libs.tslibs.conversion import *

arr = np.arange(10**6).view("M8[ns]")

%timeit ensure_datetime64ns(arr, copy=False)
423 ns ± 10.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- PR
3.5 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # <- master

%timeit ensure_datetime64ns(arr, copy=True)
517 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)   # <- PR
1.77 ms ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <- master


from pandas._libs.lib import *

farr = np.arange(10**6)[::2]

%timeit infer_dtype(farr, skipna=True)
2.94 µs ± 48.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # <- PR
412 µs ± 34.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <- master
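
For readers without the diff at hand, here is a rough pure-Python sketch of the two ideas these timings point at. It is an illustration under assumptions, not the pandas implementation: `ensure_ns_sketch` is a hypothetical name, and the link to `np.ravel` is inferred only from the branch name `perf-tslibs-noravel`. The first part shows a dtype fast path that skips conversion when the array is already `M8[ns]`; the second shows that flattening a strided array forces an allocation, while a contiguous array can be flattened as a view.

```python
import numpy as np


def ensure_ns_sketch(arr: np.ndarray, copy: bool = True) -> np.ndarray:
    """Hypothetical analogue of a dtype fast path (not pandas code)."""
    ns_dtype = np.dtype("M8[ns]")
    if arr.dtype == ns_dtype:
        # Already nanosecond resolution: nothing to convert, so the only
        # allocation is the copy the caller explicitly asked for.
        return arr.copy() if copy else arr
    # Slow path: unit conversion via NumPy, which allocates a new array.
    return arr.astype(ns_dtype)


# Explicit int64 so the view to M8[ns] keeps the same length on every platform.
arr = np.arange(10**6, dtype="i8").view("M8[ns]")

same = ensure_ns_sketch(arr, copy=False)
copied = ensure_ns_sketch(arr, copy=True)
print(same is arr)                     # True: no allocation at all
print(np.shares_memory(copied, arr))   # False: one explicit copy

# A strided (non-contiguous) array, like `farr` in the benchmark above.
farr = np.arange(10**6, dtype="i8")[::2]
print(farr.flags.c_contiguous)         # False

# Flattening a strided array has to copy to produce contiguous memory.
raveled = np.ravel(farr)
print(np.shares_memory(raveled, farr))  # False

# A contiguous array can be flattened as a view, with no new buffer.
contig = np.arange(10, dtype="i8")
print(np.shares_memory(np.ravel(contig), contig))  # True
```

If the old code path flattened its input before looking at the dtype (an assumption), the strided case would pay for a full copy on every call, which would be consistent with the gap between the PR and master numbers for `infer_dtype`.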

@jbrockmendel jbrockmendel reopened this Dec 20, 2021
@jreback jreback changed the title PERF: short-circuit allocations in infer_dtpye, ensure_datetime64ns PERF: short-circuit allocations in infer_dtype, ensure_datetime64ns Dec 20, 2021
@jreback jreback added the Performance Memory or execution speed performance label Dec 20, 2021
@jreback jreback added this to the 1.4 milestone Dec 20, 2021
@jreback jreback left a comment

nice.

jreback commented Dec 20, 2021

  File "pandas/_libs/tslibs/tzconversion.pyx", line 554, in pandas._libs.tslibs.tzconversion._tzlocal_get_offset_components
    dt = dt.astimezone(tz)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 144, in fromutc
    return f(self, dt)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 261, in fromutc
    _fold = self._fold_status(dt, dt_wall)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 196, in _fold_status
    if self.is_ambiguous(dt_wall):
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/tz.py", line 254, in is_ambiguous
    naive_dst = self._naive_is_dst(dt)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/tz.py", line 260, in _naive_is_dst
    return time.localtime(timestamp + time.timezone).tm_isdst
OverflowError: timestamp out of range for platform time_t
##[warning]Exception ignored in: 'pandas._libs.tslibs.conversion._localize_tso'

Traceback (most recent call last):
  File "pandas/_libs/tslibs/tzconversion.pyx", line 383, in pandas._libs.tslibs.tzconversion.tz_convert_utc_to_tzlocal
    return _tz_convert_tzlocal_utc(utc_val, tz, to_utc=False, fold=fold)
  File "pandas/_libs/tslibs/tzconversion.pyx", line 594, in pandas._libs.tslibs.tzconversion._tz_convert_tzlocal_utc
    delta = _tzlocal_get_offset_components(val, tz, to_utc, fold)
  File "pandas/_libs/tslibs/tzconversion.pyx", line 554, in pandas._libs.tslibs.tzconversion._tzlocal_get_offset_components
    dt = dt.astimezone(tz)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 144, in fromutc
    return f(self, dt)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 261, in fromutc
    _fold = self._fold_status(dt, dt_wall)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/_common.py", line 196, in _fold_status
    if self.is_ambiguous(dt_wall):
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/tz.py", line 254, in is_ambiguous
    naive_dst = self._naive_is_dst(dt)
  File "/root/virtualenvs/pandas-dev/lib/python3.8/site-packages/dateutil/tz/tz.py", line 260, in _naive_is_dst
    return time.localtime(timestamp + time.timezone).tm_isdst
OverflowError: timestamp out of range for platform time_t

looks legit (32-bit build)
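
As a small aside on why a 32-bit build can hit this `OverflowError`: `time.localtime` hands the value to the platform's `time_t`, and a signed 32-bit `time_t` only covers a limited range of seconds around the epoch. The sketch below is a range check only, assuming a signed `time_t` of the given width; `fits_in_time_t` is a hypothetical helper, and the actual timestamp that failed in the log above is not shown there.

```python
def fits_in_time_t(timestamp: int, bits: int = 32) -> bool:
    """Return True if an integer POSIX timestamp fits a signed time_t.

    Assumes time_t is a signed integer of `bits` width (hypothetical helper).
    """
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    return lo <= timestamp <= hi


print(fits_in_time_t(2**31 - 1))        # True: last second a 32-bit time_t holds
print(fits_in_time_t(2**31))            # False: one past the 32-bit limit
print(fits_in_time_t(2**31, bits=64))   # True on a 64-bit time_t
```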

@jbrockmendel

> looks legit (32-bit build)

those show up in the logs pretty frequently but don't cause failures. i'll wait until the other CI issues are resolved to re-push and see if it's an actual problem here

@jreback jreback merged commit 50b4df3 into pandas-dev:master Dec 22, 2021
@jbrockmendel jbrockmendel deleted the perf-tslibs-noravel branch December 22, 2021 03:07