-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
#18058: improve DatetimeIndex.date performance #18163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I ran asv with the tests from indexing.py, the output was "Benchmarks not significantly changed". A quick test with %timeit:
Please let me know if there are some other asv tests that need to be run. |
Codecov Report
@@ Coverage Diff @@
## master #18163 +/- ##
==========================================
- Coverage 91.35% 91.33% -0.03%
==========================================
Files 163 163
Lines 49714 49714
==========================================
- Hits 45415 45405 -10
- Misses 4299 4309 +10
Continue to review full report at Codecov.
|
pandas/_libs/tslib.pyx
Outdated
|
||
if tz is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the box param and have it take strings instead of boolean
iow box can be date/datetime/timestamp
you can assert that it is one of those
you will have to change places where this function. is called (eg box=True -> timestamp) etc
pandas/_libs/tslib.pyx
Outdated
|
||
def ints_to_pydatetime(ndarray[int64_t] arr, tz=None, freq=None, | ||
box="datetime"): | ||
# convert an i8 repr to an ndarray of datetimes (if box == "datetime"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a Parameters section here
pandas/_libs/tslib.pyx
Outdated
func_create = create_datetime_from_ts | ||
|
||
if tz is not None: | ||
if tz is not None and box != "date": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, put an assert in the box == 'date'
section that tz is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if box == "date":
assert (tz is not None), "tz should be None when converting to date"
func_create = create_date_from_ts
@jreback Can I do something like this instead? That way the error message will be more specific when tz is used with date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes absolutely. The point is we should not be passing a tz with the date routine.
pandas/core/dtypes/concat.py
Outdated
x = x.reshape(shape) | ||
|
||
elif x.dtype == _TD_DTYPE: | ||
shape = x.shape | ||
x = tslib.ints_to_pytimedelta(x.view(np.int64).ravel(), box=True) | ||
x = tslib.ints_to_pytimedelta(x.view(np.int64).ravel(), box="True") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a different routine
pandas/_libs/tslib.pyx
Outdated
func_create = create_datetime_from_ts | ||
|
||
if tz is not None: | ||
if tz is not None and box != "date": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes absolutely. The point is we should not be passing a tz with the date routine.
can u add an asv for this? |
add a whatsnew note for 0.22, perf section |
Is the change in the timeseries.py file what you meant? Do you know how I can resolve the conflict? |
asv_bench/benchmarks/timeseries.py
Outdated
@@ -107,6 +107,8 @@ def time_infer_freq_daily(self): | |||
def time_infer_freq_business(self): | |||
infer_freq(self.b_freq) | |||
|
|||
def time_to_date(self): | |||
self.rng.date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well as ones for self.rng.time()
and self.rng.to_pydatetime()
while at it here.
note that this should be self.rng.date()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeIndex.date and DatetimeIndex.time are tagged with the @property
tag at the moment. Do you want me to remove that tag so that they can be used as functions instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry that was a typo;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's ok
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -69,8 +69,9 @@ Removal of prior version deprecations/changes | |||
Performance Improvements | |||
~~~~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- Indexers on ``Series`` or ``DataFrame`` no longer create a reference cycle (:issue:`17956`) | |||
- Indexers on Series or DataFrame no longer create a reference cycle (:issue:`17956`) | |||
- Added a keyword argument, ``cache``, to :func:`to_datetime` that improved the performance of converting duplicate datetime arguments (:issue:`11665`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should only include your change.
pandas/core/indexes/datetimes.py
Outdated
@@ -1633,8 +1643,7 @@ def date(self): | |||
Returns numpy array of python datetime.date objects (namely, the date | |||
part of Timestamps without timezone information). | |||
""" | |||
return self._maybe_mask_results(libalgos.arrmap_object( | |||
self.asobject.values, lambda x: x.date())) | |||
return self.normalize().to_pydate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just call the code for what you have in to_pydate()
above, and remove that method; we don't actually want to expose this to the user.
doc/source/whatsnew/v0.22.0.txt
Outdated
- Added a keyword argument, ``cache``, to :func:`to_datetime` that improved the performance of converting duplicate datetime arguments (:issue:`11665`) | ||
- DatetimeIndex.date skips the iterations that create many python objects (:issue:`18058`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved performance of :func:`Series.dt.date` and :func:`DatetimeIndex.date`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. needs a rebase. can you show the updated asv results. (for those 3 that we added), you can do --bench timeseries
pandas/_libs/tslib.pyx
Outdated
Parameters | ||
---------- | ||
arr : array of i8 repr | ||
tz : the timezone to convert to, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so doc-strings should be slightly different
arr : array of i8
tz : str, default None
convert to this timezone
freq : str/Offset, default None
freq to convert
box : {'datetime', 'timestamp', 'date'}, default 'datetime'
# add what you have below here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asv results are:
asv continuous -f 1.1 upstream/master issue18058 -b timeseries.DatetimeIndex.time_dti_time -b timeseries.DatetimeIndex.time_to_date -b timeseries.DatetimeIndex.time_to_pydatetime
conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[ 0.00%] · For pandas commit hash 04c90d90:
[ 0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 16.67%] ··· Running timeseries.DatetimeIndex.time_dti_time 312ms
[ 33.33%] ··· Running timeseries.DatetimeIndex.time_to_date 40.6ms
[ 50.00%] ··· Running timeseries.DatetimeIndex.time_to_pydatetime 46.9ms
[ 50.00%] · For pandas commit hash c176a3c2:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 66.67%] ··· Running timeseries.DatetimeIndex.time_dti_time 375ms
[ 83.33%] ··· Running timeseries.DatetimeIndex.time_to_date 258±4ms
[100.00%] ··· Running timeseries.DatetimeIndex.time_to_pydatetime 48.8±1ms
before after ratio
[c176a3c2] [04c90d90]
- 375ms 312ms 0.83 timeseries.DatetimeIndex.time_dti_time
- 258±4ms 40.6ms 0.16 timeseries.DatetimeIndex.time_to_date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
we should fix time too :) (another PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yeah, i was about to ask
can u rebase again |
yeah, i'm on it |
just finished it, do i need to push again? |
yes need to push again. |
Hello @tmnhat2001! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 22, 2017 at 04:46 Hours UTC |
can you rebase |
thanks @tmnhat2001 |
FYI @jbrockmendel IIRC you may have a PR to move some of this code to tslibs.conversion |
thanks @jreback |
git diff upstream/master -u -- "*.py" | flake8 --diff