-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Return DatetimeIndex or TimedeltaIndex bins for q/cut when input is datelike #20956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20956 +/- ##
==========================================
+ Coverage 91.9% 91.91% +<.01%
==========================================
Files 158 158
Lines 49690 49695 +5
==========================================
+ Hits 45670 45677 +7
+ Misses 4020 4018 -2
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -524,6 +524,7 @@ Other Enhancements | |||
- Added new writer for exporting Stata dta files in version 117, ``StataWriter117``. This format supports exporting strings with lengths up to 2,000,000 characters (:issue:`16450`) | |||
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`) | |||
- :func:`date_range` now returns a linearly spaced ``DatetimeIndex`` if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`) | |||
- :func:`cut` and :func:`qcut` now returns a ``DatetimeIndex`` or ``TimedeltaIndex`` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add :class:
for DTI / TDI
@@ -332,6 +332,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None, | |||
result = result.astype(np.float64) | |||
np.putmask(result, na_mask, np.nan) | |||
|
|||
bins = _convert_bin_to_datelike_type(bins, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this just call Index()
which will do all of this inference
Simplified this a bit with the Index constructor; however, I wanted to avoid turing all bins into an Index type unless this is an API change we want to make. Also, I created #20964 since the Index constructor does not correctly localize values with a datetimetz dtype specified. |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -550,6 +550,7 @@ Other Enhancements | |||
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`) | |||
- :func:`cut` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`20947`) | |||
- :func:`date_range`, :func:`timedelta_range`, and :func:`interval_range` now return a linearly spaced index if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`, :issue:`20983`, :issue:`20976`) | |||
- :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.24
pandas/core/reshape/tile.py
Outdated
bins : Array-like of bins, DatetimeIndex or TimedeltaIndex if dtype is | ||
datelike | ||
""" | ||
# Can be simplified once GH 20964 is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to fix #20964 first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke can you rebase & move note to 0.24.0. ping and will have another look.
Updated and ready for a final review @jreback |
@@ -364,6 +365,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None, | |||
result = result.astype(np.float64) | |||
np.putmask(result, na_mask, np.nan) | |||
|
|||
bins = _convert_bin_to_datelike_type(bins, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to make this logic flow better, e.g. _convert_bins_to_numeric_type
and _convert_bin_to_datelike_type
are disjointed (e.g. they are separate), maybe combine? I just find that we are doing conversions in 2 places here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well they both serve different purposes;
_convert_bins_to_numeric_type
is used at the start to prep the bins (datelike data -> numeric data) before the main array is cut.
_convert_bin_to_datelike_type
is used at the end to "stylize" the bins (numeric data -> datelike data) after the array is cut.
Plus, I like the explicit naming of these two processes. Just my 2c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's fine then.
git diff upstream/master -u -- "*.py" | flake8 --diff