-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: pd.to_datetime, unit='s' much slower for float64 than for int64 #35027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: pd.to_datetime, unit='s' much slower for float64 than for int64 #35027
Conversation
As a quick measure of the efficacy of this PR, I ran the checks from #20445 on current branch and master. The set-up is:
For ints
I get
and for floats
I get
|
@WillAyd @jbrockmendel I have something that works on some Linux environments but not on others and not on Mac/Windows. Not sure what the problem(s) are and the best way to go about debugging |
ping @WillAyd @jbrockmendel |
pandas/_libs/tslib.pyx
Outdated
# fill by comparing to NPY_NAT constant | ||
mask = fresult == NPY_NAT | ||
fresult[mask] = 0 | ||
fvalues = fresult * (<float64_t>m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast doesn't seem right and is probably the cause of your portability issue. What is the warning that this generates without?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The traceback if I remove this line:
File "/workspaces/pandas-arw2019/pandas/core/tools/datetimes.py", line 802, in to_datetime
values = convert_listlike(arg._values, format)
File "/workspaces/pandas-arw2019/pandas/core/tools/datetimes.py", line 338, in _convert_listlike_datetimes
result, tz_parsed = tslib.array_with_unit_to_datetime(
File "pandas/_libs/tslib.pyx", line 262, in pandas._libs.tslib.array_with_unit_to_datetime
if ((fvalues < Timestamp.min.value).any()
SystemError: /home/conda/feedstock_root/build_artifacts/python_1591030388223/work/Objects/object.c:769: bad argument to internal function
Benchmark results are below. Would appreciate any comments on how to triage the areas which take a hit
|
asv_bench/benchmarks/timeseries.py
Outdated
) | ||
self.timestamp_seconds_float = self.timestamp_seconds_int.astype("float64") | ||
|
||
def to_datetime_int(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these need to be def time_foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrote this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good a couple of comments. can you post an updated benchmark
pandas/_libs/tslib.pyx
Outdated
result = values.astype('M8[ns]') | ||
if unit == "ns": | ||
if issubclass(values.dtype.type, (np.integer, np.float_)): | ||
result = values.astype("M8[ns]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does astype of floats directly to M8 work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it does (as you use it below), but do we have a test specifically for float with unit='ns'?
also can try .astype(..., copy=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does astype of floats directly to M8 work?
would it be better to do, here and below:
ivalues = values.view("i8")
result = ivalues.astype("M8[ns]")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it does (as you use it below), but do we have a test specifically for float with unit='ns'?
I'll look some more but I think we don't. Will add unless I find one
@jreback with your suggestion (bb8c35b) benchmarks are now (largely) unaffected:
|
thanks @arw2019 very nice! |
thanks @jreback @WillAyd @jbrockmendel for reviewing!!! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
As per discussion in #20445 this PR addresses performance of
to_datetime
on a uniform type array of floats. The aim is to get a speed-up by implementingastype
-ing, and avoid looping, for floats.