ENH: return .dt.weekday/isoweekday/month_name/day_name as ordered categoricals #12993

Open
jreback opened this issue Apr 26, 2016 · 14 comments
Labels: Categorical (Categorical Data Type), Datetime (Datetime data dtype), Enhancement

Comments

@jreback
Contributor

jreback commented Apr 26, 2016

#12803 added .dt.weekday_name. I think it's appropriate to return this (and .weekday) as ordered categoricals.

In [1]: s = pd.Series(pd.date_range('20130101', periods=10))

In [2]: s.dt.weekday
Out[2]: 
0    1
1    2
2    3
3    4
4    5
5    6
6    0
7    1
8    2
9    3
dtype: int64

In [3]: s.dt.weekday_name
Out[3]: 
0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
5       Sunday
6       Monday
7      Tuesday
8    Wednesday
9     Thursday
dtype: object
@jreback jreback added Enhancement Datetime Datetime data dtype Difficulty Novice Categorical Categorical Data Type labels Apr 26, 2016
@jreback jreback added this to the Next Major Release milestone Apr 26, 2016
@jreback
Contributor Author

jreback commented Apr 26, 2016

xref #12806

cc @BastiaanBergman

I realized while merging #12803 that we didn't actually have to do this in Cython; it is instead a trivial map operation.

In [7]: s.dt.weekday.map(dict(enumerate(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])))
Out[7]: 
0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
5       Sunday
6       Monday
7      Tuesday
8    Wednesday
9     Thursday
dtype: object
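
For anyone who wants to run the map approach outside an IPython session, here is a minimal self-contained sketch. The `DAY_NAMES` constant is a name introduced only for this example, and the comparison against `.dt.day_name()` assumes a pandas version that provides that method (it appears in the current issue title) with its default English names.

```python
import pandas as pd

# Monday=0 ... Sunday=6, matching the integers returned by .dt.weekday.
# DAY_NAMES is a name made up for this sketch.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

s = pd.Series(pd.date_range("2013-01-01", periods=10))

# Map each weekday integer to its name with a plain dict lookup.
mapped = s.dt.weekday.map(dict(enumerate(DAY_NAMES)))

# The built-in accessor method, for comparison.
builtin = s.dt.day_name()

print(mapped.tolist() == builtin.tolist())
```

Both results are plain strings here; neither carries any ordering information.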

@jreback
Contributor Author

jreback commented Apr 26, 2016

And if you categorize, it's even easier (and way more efficient):

In [18]: cats
Out[18]: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

In [19]: s.dt.weekday.astype('category',ordered=True).cat.rename_categories(cats)
Out[19]: 
0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
5       Sunday
6       Monday
7      Tuesday
8    Wednesday
9     Thursday
dtype: category
Categories (7, object): [Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday]
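
The session above reflects pandas as of 2016. As far as I know, later releases no longer accept `ordered=True` as a keyword to `astype`, so the following is a sketch of an equivalent using `CategoricalDtype`. The names `DAY_NAMES` and `weekday_dtype` are made up for this example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical constant for this sketch: Monday=0 ... Sunday=6.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

s = pd.Series(pd.date_range("2013-01-01", periods=10))

# Declare all seven weekday integers as categories up front, so the result
# has seven categories even when some days are missing from the data.
weekday_dtype = CategoricalDtype(categories=list(range(7)), ordered=True)

# Cast the integers to the ordered dtype, then relabel 0..6 with the names.
named = s.dt.weekday.astype(weekday_dtype).cat.rename_categories(DAY_NAMES)

print(named.dtype)
```

Listing the categories explicitly is a design choice: inferring them from the data would yield fewer than seven categories for a short series, and renaming with a seven-item list would then fail.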

@jreback jreback modified the milestones: 0.18.2, Next Major Release Apr 26, 2016
@BastiaanBergman

I don't know what the speed implications are for big dataframes. In any
case, implementing alongside the existing Cython code wasn't exactly
un-trivial.
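
One rough way to look at the size side of this question is to compare memory footprints. This is only a sketch: it measures memory, not speed, and the series length of 50,000 daily timestamps is an arbitrary choice for illustration.

```python
import pandas as pd

# 50,000 daily timestamps stay within the nanosecond datetime range.
s = pd.Series(pd.date_range("2013-01-01", periods=50_000))

# Day names as strings versus the same values stored as a categorical.
as_strings = s.dt.day_name()
as_category = as_strings.astype("category")

# deep=True asks pandas to account for the string payloads as well.
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(string_bytes, category_bytes)
```

The categorical stores one small integer code per row plus a single copy of the seven names, which is where the saving comes from.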

@jreback
Contributor Author

jreback commented Apr 26, 2016

@BastiaanBergman no, what I mean is that THIS impl is trivial. Of course the cython is not :<

@kawochen
Contributor

kawochen commented May 5, 2016

I would think they shouldn't be ordered (because it's cyclic). An order would probably only enable .max(), and .min(), right?
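
To make the cyclic concern concrete, here is a small sketch showing that the result of `.min()` and `.max()` depends entirely on which day is chosen as the start of the week. `MONDAY_FIRST` and `SUNDAY_FIRST` are names invented for this example.

```python
import pandas as pd

# Two equally defensible orderings of the same seven labels.
MONDAY_FIRST = ["Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday"]
SUNDAY_FIRST = MONDAY_FIRST[-1:] + MONDAY_FIRST[:-1]

s = pd.Series(pd.date_range("2013-01-01", periods=10))

# .dt.weekday uses Monday=0, so its integers can serve directly as codes.
monday_first = pd.Series(
    pd.Categorical.from_codes(
        s.dt.weekday.to_numpy(), categories=MONDAY_FIRST, ordered=True
    )
)

# Same values, but with the week starting on Sunday.
sunday_first = monday_first.cat.reorder_categories(SUNDAY_FIRST, ordered=True)

print(monday_first.max(), sunday_first.max())
```

The values are identical in both series; only the declared order differs, and with it the answer to "which day is largest".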

@jreback
Copy link
Contributor Author

jreback commented May 5, 2016

Well, it also allows comparisons, e.g.

In [4]: os = s.dt.weekday.astype('category',ordered=True).cat.rename_categories(cats)

In [5]: os
Out[5]: 
0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
5       Sunday
6       Monday
7      Tuesday
8    Wednesday
9     Thursday
dtype: category
Categories (7, object): [Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday]

In [10]: os.min()
Out[10]: 'Monday'

In [11]: os<'Wednesday'
Out[11]: 
0     True
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8    False
9    False
dtype: bool
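
Beyond `.min()` and element-wise comparisons, an ordered categorical also sorts in weekday order rather than alphabetically. The following sketch uses current-style pandas calls (`CategoricalDtype`, `.dt.day_name()`), which is an assumption about the installed version; `DAY_NAMES` and `ordered_days` are names made up here.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical constant for this sketch.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

s = pd.Series(pd.date_range("2013-01-01", periods=10))
plain = s.dt.day_name()
ordered_days = plain.astype(CategoricalDtype(DAY_NAMES, ordered=True))

# Sorting follows the declared category order for the categorical,
# and alphabetical order for the plain strings.
by_weekday = ordered_days.sort_values()
alphabetical = plain.sort_values()

# Boolean masks from comparisons can be used to filter rows.
before_wednesday = ordered_days[ordered_days < "Wednesday"]

print(by_weekday.iloc[0], alphabetical.iloc[0])
```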

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 29, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@sivakar12

I'd like to give this a try. Can I work on this?

@mroeschke
Member

Go for it @sivakar12! Some of the files you may want to edit are in this recent PR https://github.com/pandas-dev/pandas/pull/18164/files

@sivakar12

I found that Categorical is not defined in the Cython code, so I focused on the DatetimeIndex class: I tried calling astype and returning a CategoricalIndex from the _field_accessor method there. Neither works, and I always end up with dtype: object. What am I missing?

@mroeschke
Member

After the index is created, you can either use the map function or astype with predefined categories as described in these comments: #12993 (comment) or #12993 (comment)
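
As a user-level illustration of that suggestion (not a patch to pandas internals), the sketch below converts day names to an ordered categorical after the index or Series has been created. `DAY_NAMES` and `day_dtype` are hypothetical names, and `.day_name()` is assumed to be available in the installed pandas.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical names for this sketch.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
day_dtype = CategoricalDtype(DAY_NAMES, ordered=True)

idx = pd.date_range("2013-01-01", periods=10)

# Index path: wrap the day names in a CategoricalIndex.
cat_idx = pd.CategoricalIndex(idx.day_name(), dtype=day_dtype)

# Series path: cast the result of the .dt accessor with astype.
cat_series = pd.Series(idx).dt.day_name().astype(day_dtype)

print(type(cat_idx).__name__, cat_series.dtype)
```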

@sivakar12

I made the DatetimeIndex class return a CategoricalIndex when the weekday_name property is accessed. But s.dt.weekday_name goes through a DatetimeProperties object, which seems to convert the result back to object dtype.
The code in the comments applies map or astype to the output of DatetimeProperties rather than to a DatetimeIndex, and that works fine.
I can't figure out what's going on inside DatetimeProperties.

@mroeschke
Member

Feel free to open a pull request (you can mark it as a work in progress) with your initial changes. It will be easier for us to review and help debug the issue.

@mroeschke mroeschke changed the title ENH: return .dt.weekday/weekday_name as ordered categoricals ENH: return .dt.weekday/isoweekday/month_name/day_name as ordered categoricals Jan 24, 2020
@jbrockmendel
Member

Not wild about making DatetimeArray have a dependency on Categorical (which in turn has dependency on Index)

@jreback
Contributor Author

jreback commented Sep 20, 2020

this would be an indirect dependency and is for user convenience
