
allowing datetime and timedelta datatype in pd cut bins #14798


Closed
wants to merge 3 commits into from


aileronajay
Contributor

@aileronajay aileronajay commented Dec 4, 2016

xref #14714, follow-on to #14737

@aileronajay
Contributor Author

This change is currently WIP; I will add tests and other changes. @jorisvandenbossche

@sinhrks sinhrks added Algos, Timedelta, Datetime labels Dec 4, 2016
@codecov-io

codecov-io commented Dec 5, 2016

Current coverage is 84.64% (diff: 88.88%)

Merging #14798 into master will increase coverage by <.01%

@@             master     #14798   diff @@
==========================================
  Files           144        144          
  Lines         51021      51030     +9   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43188      43196     +8   
- Misses         7833       7834     +1   
  Partials          0          0          

Powered by Codecov. Last update f79bc7a...82bffa1

@@ -313,6 +313,18 @@ def test_datetime_cut(self):
        result, bins = cut(data, 3, retbins=True)
        tm.assert_series_equal(Series(result), expected)

    def test_datetime_bin(self):
        data = [np.datetime64('2012-12-13'), np.datetime64('2012-12-15')]
Contributor


add the issue number as a comment

Contributor Author

@aileronajay aileronajay Dec 22, 2016


@jreback I don't think there is an open issue for this change; @jorisvandenbossche proposed it when I was changing cut to allow datetime and timedelta data types.

@@ -313,6 +313,18 @@ def test_datetime_cut(self):
        result, bins = cut(data, 3, retbins=True)
        tm.assert_series_equal(Series(result), expected)

    def test_datetime_bin(self):
        data = [np.datetime64('2012-12-13'), np.datetime64('2012-12-15')]
        bins = [np.datetime64('2012-12-12'), np.datetime64('2012-12-14'),
Contributor


Use pd.Timestamp(...) instead of np.datetime64 directly.

Contributor

@jreback jreback Dec 5, 2016


Actually, you probably want to test with datetime.datetime, Timestamp, and np.datetime64. Just put them in a loop, something like:

data = ['2012-12-12', '2012-12-14']

for conv in [lambda x: Timestamp(x).to_pydatetime(), Timestamp, np.datetime64]:
    bins = [conv(v) for v in data]

Also test:
bins = pd.to_datetime(data)

These should all work, because internally you need to wrap a Timestamp converter around each of the bins (if dtype == M8), or a Timedelta converter if dtype == m8.
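The loop suggested in this comment can be written out as a small runnable check. This is a minimal sketch, assuming a pandas version in which `pd.cut` accepts datetime-like bins (the behavior this PR targets). The third bin edge, `'2012-12-16'`, and the `converters` and `results` names are assumptions added for illustration; they are not taken from the PR's test.

```python
import numpy as np
import pandas as pd

# Values to bin (the same two dates used in the PR's test).
data = [np.datetime64("2012-12-13"), np.datetime64("2012-12-15")]

# Bin edges as strings; the third edge is an assumption for illustration.
edges = ["2012-12-12", "2012-12-14", "2012-12-16"]

# Each converter turns a date string into a different datetime-like type.
converters = {
    "datetime.datetime": lambda s: pd.Timestamp(s).to_pydatetime(),
    "Timestamp": pd.Timestamp,
    "np.datetime64": np.datetime64,
}

results = {}
for name, conv in converters.items():
    bins = [conv(e) for e in edges]
    # pd.cut on a list returns a Categorical; .codes holds the bin positions.
    results[name] = pd.cut(data, bins=bins).codes.tolist()

# A DatetimeIndex of edges should bin the data the same way.
results["DatetimeIndex"] = pd.cut(data, bins=pd.to_datetime(edges)).codes.tolist()

for name, codes in results.items():
    print(name, codes)
```

If every bin type is handled consistently, each entry in `results` should hold the same codes, with the first date in the first bin and the second date in the second.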

Contributor Author


@jreback I made the changes and added a new method in tile.py to handle time-type bins.
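The method added to tile.py is not shown in this conversation. As a rough illustration of the idea from the review (wrap each bin edge in Timestamp for M8 data, or in Timedelta for m8 data), a hypothetical helper might look like the sketch below. The name `time_bins_to_int64` and the choice to return int64 nanoseconds are assumptions for illustration, not the code that was merged.

```python
import numpy as np
import pandas as pd


def time_bins_to_int64(bins, dtype):
    """Hypothetical helper: convert time-like bin edges to int64 nanoseconds.

    `dtype` is the dtype of the data being cut; its kind decides which
    pandas scalar type is wrapped around each bin edge.
    """
    kind = np.dtype(dtype).kind
    if kind == "M":
        # datetime64 data: wrap each edge in Timestamp.
        return np.array([pd.Timestamp(b).value for b in bins], dtype="int64")
    if kind == "m":
        # timedelta64 data: wrap each edge in Timedelta.
        return np.array([pd.Timedelta(b).value for b in bins], dtype="int64")
    # Non-time data: leave the bins unchanged.
    return np.asarray(bins)


# Mixed edge types are normalised to the same integer representation.
dt_bins = time_bins_to_int64(
    [pd.Timestamp("1970-01-02"), np.datetime64("1970-01-03")], "M8[ns]"
)
td_bins = time_bins_to_int64(
    [pd.Timedelta("1 day"), np.timedelta64(2, "D")], "m8[ns]"
)
print(dt_bins, td_bins)
```

Converting edges to a common integer form is one way to let the numeric binning logic run unchanged on datetime and timedelta inputs.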

@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Dec 6, 2016
@jorisvandenbossche
Member

@aileronajay can you update this?

@aileronajay
Contributor Author

@jorisvandenbossche I am caught up with some other work right now; I should be able to make these changes next week.

@jreback
Contributor

jreback commented Dec 22, 2016

thanks!

ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016
xref pandas-dev#14714, follow-on to pandas-dev#14737

Author: Ajay Saxena

Closes pandas-dev#14798 from aileronajay/cut_timetype_bin and squashes the following commits:

82bffa1 [Ajay Saxena] added method for time type bins in pd cut and modified tests
ac919cf [Ajay Saxena] added test for datetime bin type
355e569 [Ajay Saxena]  allowing datetime and timedelta datatype in pd cut bins
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Datetime Datetime data dtype Timedelta Timedelta data type
5 participants