Skip to content

ENH: Return DatetimeIndex or TimedeltaIndex bins for q/cut when input is datelike #20956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 3, 2018

Conversation

mroeschke
Copy link
Member

@codecov
Copy link

codecov bot commented May 5, 2018

Codecov Report

Merging #20956 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20956      +/-   ##
==========================================
+ Coverage    91.9%   91.91%   +<.01%     
==========================================
  Files         158      158              
  Lines       49690    49695       +5     
==========================================
+ Hits        45670    45677       +7     
+ Misses       4020     4018       -2
Flag Coverage Δ
#multiple 90.29% <100%> (ø) ⬆️
#single 41.95% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/tile.py 94.73% <100%> (+1.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7cd2679...17e96c4. Read the comment docs.

@@ -524,6 +524,7 @@ Other Enhancements
- Added new writer for exporting Stata dta files in version 117, ``StataWriter117``. This format supports exporting strings with lengths up to 2,000,000 characters (:issue:`16450`)
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`)
- :func:`date_range` now returns a linearly spaced ``DatetimeIndex`` if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`)
- :func:`cut` and :func:`qcut` now returns a ``DatetimeIndex`` or ``TimedeltaIndex`` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add :class: for DTI / TDI

@@ -332,6 +332,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None,
result = result.astype(np.float64)
np.putmask(result, na_mask, np.nan)

bins = _convert_bin_to_datelike_type(bins, dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this just call Index() which will do all of this inference

@jreback jreback added Datetime Datetime data dtype Timedelta Timedelta data type labels May 5, 2018
@mroeschke
Copy link
Member Author

Simplified this a bit with the Index constructor; however, I wanted to avoid turing all bins into an Index type unless this is an API change we want to make.

Also, I created #20964 since the Index constructor does not correctly localize values with a datetimetz dtype specified.

@@ -550,6 +550,7 @@ Other Enhancements
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`)
- :func:`cut` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`20947`)
- :func:`date_range`, :func:`timedelta_range`, and :func:`interval_range` now return a linearly spaced index if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`, :issue:`20983`, :issue:`20976`)
- :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.24

bins : Array-like of bins, DatetimeIndex or TimedeltaIndex if dtype is
datelike
"""
# Can be simplified once GH 20964 is fixed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to fix #20964 first

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke can you rebase & move note to 0.24.0. ping and will have another look.

@mroeschke
Copy link
Member Author

Updated and ready for a final review @jreback

@@ -364,6 +365,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None,
result = result.astype(np.float64)
np.putmask(result, na_mask, np.nan)

bins = _convert_bin_to_datelike_type(bins, dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to make this logic flow better, e.g. _convert_bins_to_numeric_type and _convert_bin_to_datelike_type are disjointed (e.g. they are separate), maybe combine? I just find that we are doing conversions in 2 places here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well they both serve different purposes;
_convert_bins_to_numeric_type is used at the start to prep the bins (datelike data -> numeric data) before the main array is cut.

_convert_bin_to_datelike_type is used at the end to "stylize" the bins (numeric data -> datelike data) after the array is cut.

Plus, I like the explicit naming of these two processes. Just my 2c.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine then.

@jreback jreback added this to the 0.24.0 milestone Jul 3, 2018
@jreback jreback merged commit 79bc335 into pandas-dev:master Jul 3, 2018
@mroeschke mroeschke deleted the cut_date_bins branch July 3, 2018 16:41
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: cut return bins should be Timestamp/Timedelta objects for datelike data
2 participants