PERF: Add cache keyword to to_datetime (#11665) #17077
Merged
25 commits, all by mroeschke:

- 7f67ac9 move cache into convert_listlike
- a7c65f7 Move cache down the stack, explore threshold to trigger cache
- 243349a Add more cache conditions
- d154a6d Add some benchmarks
- b5e71d2 Some performance testing
- fb2e831 Add asvs, modify tests for caches
- 33c79d3 Fix asv errors and condition
- dcaafb6 Pep8 fixes
- 04df9d9 Remove unused import
- 34b468f Wrap cache logic in a function
- d287cc6 Fix Series test
- 1bf4c9d Add whatsnew and small documentation fix
- 3ffdd46 pep 8 fixes
- a093b88 Move box logic into maybe_convert_cache
- d1fc211 Use quicker unique check
- 9486df3 Move caching function outside to_datetime
- d059d44 Pass most tests
- 02ab4f3 Skip test related to GH 18111, lint
- 82f36d3 Update docstring
- 76547e1 adjust imports, docs and move whatsnew
- 590c9cc Remove whitespace
- 9a985ac Address comments
- 85a1f2d Lint fix
- 49f5850 Move docs and adjust test
- 07fa22d Lint
```diff
@@ -36,9 +36,77 @@ def _guess_datetime_format_for_array(arr, **kwargs):
         return _guess_datetime_format(arr[non_nan_elements[0]], **kwargs)
 
 
+def _maybe_cache(arg, format, cache, tz, convert_listlike):
+    """
+    Create a cache of unique dates from an array of dates
+
+    Parameters
+    ----------
+    arg : integer, float, string, datetime, list, tuple, 1-d array, Series
+    format : string
+        Strftime format to parse time
+    cache : boolean
+        True attempts to create a cache of converted values
+    tz : string
+        Timezone of the dates
+    convert_listlike : function
+        Conversion function to apply on dates
+
+    Returns
+    -------
+    cache_array : Series
+        Cache of converted, unique dates. Can be empty
+    """
+    from pandas import Series
+    cache_array = Series()
+    if cache:
+        # Perform a quicker unique check
+        from pandas import Index
+        if not Index(arg).is_unique:
+            unique_dates = algorithms.unique(arg)
+            cache_dates = convert_listlike(unique_dates, True, format, tz=tz)
+            cache_array = Series(cache_dates, index=unique_dates)
+    return cache_array
+
+
+def _convert_and_box_cache(arg, cache_array, box, errors, name=None):
+    """
+    Convert array of dates with a cache and box the result
+
+    Parameters
+    ----------
+    arg : integer, float, string, datetime, list, tuple, 1-d array, Series
+    cache_array : Series
+        Cache of converted, unique dates
+    box : boolean
+        True boxes result as an Index-like, False returns an ndarray
+    errors : string
+        'ignore' plus box=True will convert result to Index
+    name : string, default None
+        Name for a DatetimeIndex
+
+    Returns
+    -------
+    result : datetime of converted dates
+        Returns:
+
+        - Index-like if box=True
+        - ndarray if box=False
+    """
+    from pandas import Series, DatetimeIndex, Index
+    result = Series(arg).map(cache_array)
+    if box:
+        if errors == 'ignore':
+            return Index(result)
+        else:
+            return DatetimeIndex(result, name=name)
+    return result.values
+
+
 def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
                 utc=None, box=True, format=None, exact=True,
-                unit=None, infer_datetime_format=False, origin='unix'):
+                unit=None, infer_datetime_format=False, origin='unix',
+                cache=False):
     """
     Convert argument to datetime.
 
```
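The core idea of `_maybe_cache` plus `_convert_and_box_cache` is: if the input contains duplicates, parse each distinct value once and then look every element up in that small result. A minimal sketch of the same technique using only public pandas calls (`pd.Index(...).is_unique`, `pd.unique`, `Series.map`); `cached_to_datetime` is a hypothetical helper, not part of pandas, and it deliberately ignores the PR's `box`, `errors` and `tz` handling.

```python
import pandas as pd


def cached_to_datetime(values, fmt=None):
    """Hypothetical helper: convert each unique value once, then map the
    converted values back onto the full input."""
    values = pd.Series(values)
    # Cheap uniqueness test first: with no duplicates a cache cannot help
    if pd.Index(values).is_unique:
        return pd.to_datetime(values, format=fmt)
    unique_values = pd.unique(values)
    # Parse only the distinct values; index the results by the raw value
    cache = pd.Series(pd.to_datetime(unique_values, format=fmt),
                      index=unique_values)
    # Series.map with a Series looks each element up in cache's index
    return values.map(cache)


raw = ["2017-01-01", "2017-01-02", "2017-01-01", "2017-01-02"]
print(cached_to_datetime(raw))
```

The cache pays off only when the number of distinct values is much smaller than the input length, which is why the PR checks uniqueness before building it.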
```diff
@@ -111,7 +179,12 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
         origin.
 
         .. versionadded: 0.20.0
+    cache : boolean, default False
+        If True, use a cache of unique, converted dates to apply the datetime
+        conversion. May produce significant speed-up when parsing duplicate date
+        strings, especially ones with timezone offsets.
 
+        .. versionadded: 0.22.0
     Returns
     -------
     ret : datetime if parsing succeeded.
```

Review comment (on the `cache : boolean, default False` line): default is True
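From the caller's side the keyword can be checked by parsing the same input both ways, assuming a pandas release that includes `cache`: the values must be identical, and only the runtime should differ. The input below is made up to match the docstring's best case (few distinct strings, each with a timezone offset); no particular speed-up is implied.

```python
import timeit

import pandas as pd

# Hypothetical input: two distinct offset-carrying strings, heavily repeated
raw = ["2017-11-01 10:00:00+01:00", "2017-11-02 10:00:00+01:00"] * 5000
s = pd.Series(raw)

with_cache = pd.to_datetime(s, cache=True)
without_cache = pd.to_datetime(s, cache=False)

# The cache changes how the work is done, not the parsed values
assert with_cache.equals(without_cache)

# Rough timing only; numbers depend on the machine and pandas version
for flag in (True, False):
    secs = timeit.timeit(lambda: pd.to_datetime(s, cache=flag), number=3)
    print(f"cache={flag}: {secs:.3f}s for 3 runs")
```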
```diff
@@ -369,15 +442,28 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
     if isinstance(arg, tslib.Timestamp):
         result = arg
     elif isinstance(arg, ABCSeries):
-        from pandas import Series
-        values = _convert_listlike(arg._values, True, format)
-        result = Series(values, index=arg.index, name=arg.name)
+        cache_array = _maybe_cache(arg, format, cache, tz, _convert_listlike)
+        if not cache_array.empty:
+            result = arg.map(cache_array)
+        else:
+            from pandas import Series
+            values = _convert_listlike(arg._values, True, format)
+            result = Series(values, index=arg.index, name=arg.name)
     elif isinstance(arg, (ABCDataFrame, MutableMapping)):
         result = _assemble_from_unit_mappings(arg, errors=errors)
     elif isinstance(arg, ABCIndexClass):
-        result = _convert_listlike(arg, box, format, name=arg.name)
+        cache_array = _maybe_cache(arg, format, cache, tz, _convert_listlike)
+        if not cache_array.empty:
+            result = _convert_and_box_cache(arg, cache_array, box, errors,
+                                            name=arg.name)
+        else:
+            result = _convert_listlike(arg, box, format, name=arg.name)
     elif is_list_like(arg):
-        result = _convert_listlike(arg, box, format)
+        cache_array = _maybe_cache(arg, format, cache, tz, _convert_listlike)
+        if not cache_array.empty:
+            result = _convert_and_box_cache(arg, cache_array, box, errors)
+        else:
+            result = _convert_listlike(arg, box, format)
     else:
         result = _convert_listlike(np.array([arg]), box, format)[0]
 
```

Review comment (on the Series branch's `if not cache_array.empty:` line): this still looks pretty duplicative, but I guess ok for now.
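The Series, Index and list-like branches differ only in how the cached lookup is wrapped. A sketch that reproduces two of them with public pandas calls: the Series branch (`arg.map(cache_array)`, which keeps the caller's index and name) and the boxed path of `_convert_and_box_cache` with `box=True` and `errors != 'ignore'`. The sample data and the name `"when"` are hypothetical.

```python
import pandas as pd

arg = pd.Series(["2017-01-01", "2017-01-01", "2017-01-03"],
                index=["a", "b", "c"], name="when")

# Build the cache as _maybe_cache does: convert the unique values and
# index the converted dates by the original strings
unique_dates = pd.unique(arg)
cache_array = pd.Series(pd.to_datetime(unique_dates), index=unique_dates)

# Series branch: mapping keeps arg's index ("a", "b", "c") and name
result = arg.map(cache_array)

# Index / list-like branch: map a fresh Series through the cache, then
# box the mapped values into a named DatetimeIndex
boxed = pd.DatetimeIndex(pd.Series(list(arg)).map(cache_array), name="when")

print(result)
print(boxed)
```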
Review comment: ok, I think there is actually a way to do this with an asv matrix, but ok for now.
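If "asv matrix" here refers to asv's parameterized benchmarks (a `params` list and `param_names` on the benchmark class, which asv expands into one run per value), the cache on/off comparison could be written once instead of duplicated per setting. A hypothetical sketch under that assumption; the class name, method names and workload sizes are illustrative and not taken from the pandas benchmark suite.

```python
import pandas as pd


class ToDatetimeCacheSketch:
    # asv runs each time_* method once per entry in params,
    # passing the entry as the extra argument
    params = [True, False]
    param_names = ["cache"]

    def setup(self, cache):
        n = 10000
        # Hypothetical workloads: few distinct strings, many duplicates
        self.dup_strings = pd.Series(["2000-02-11", "2000-02-12"] * (n // 2))
        self.dup_tz_strings = pd.Series(["2000-02-11 15:00:00-0800"] * n)

    def time_dup_string_dates(self, cache):
        pd.to_datetime(self.dup_strings, cache=cache)

    def time_dup_string_tzoffset_dates(self, cache):
        pd.to_datetime(self.dup_tz_strings, cache=cache)


# asv normally drives this itself; exercising it by hand looks like:
for cache in ToDatetimeCacheSketch.params:
    bench = ToDatetimeCacheSketch()
    bench.setup(cache)
    bench.time_dup_string_dates(cache)
    bench.time_dup_string_tzoffset_dates(cache)
```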