
ENH: dt.day_of_week should return int8 #58185

Open
3 tasks done
WillAyd opened this issue Apr 8, 2024 · 5 comments
Labels: Datetime (Datetime data dtype), Enhancement


@WillAyd
Member

WillAyd commented Apr 8, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

For NumPy-backed datetime types, dt.day_of_week currently returns int32:

In [3]: pd.Series(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[us]").dt.day_of_week
Out[3]: 
0    0
1    1
2    2
dtype: int32

Pyarrow-backed dates, on the other hand, return int64:

In [18]: import datetime
    ...: import pyarrow as pa
    ...: pa_arr = pa.array([
    ...:     datetime.date(2024, 1, 1),
    ...:     datetime.date(2024, 1, 2),
    ...:     datetime.date(2024, 1, 3),
    ...: ])
    ...: ser = pd.Series(pa_arr, dtype=pd.ArrowDtype(pa.date32()))

In [19]: ser.dt.day_of_week
Out[19]: 
0    0
1    1
2    2
dtype: int64[pyarrow]

Feature Description

Both could reasonably return int8, or even uint8, since the values only range from 0 to 6 (Monday=0 through Sunday=6).
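To illustrate why a single byte is enough, here is a NumPy-only sketch. It is not how pandas computes the field; it simply derives Monday=0 ... Sunday=6 from datetime64 values using the fact that 1970-01-01 was a Thursday, then stores the result as int8. The helper name day_of_week_int8 is hypothetical, and the sketch does not handle NaT.

```python
import numpy as np


def day_of_week_int8(values: np.ndarray) -> np.ndarray:
    """Weekday (Monday=0 ... Sunday=6) of datetime64 values, stored as int8.

    Hypothetical helper for illustration only; NaT is not handled.
    """
    # Whole days since the Unix epoch; 1970-01-01 was a Thursday (weekday 3)
    days = values.astype("datetime64[D]").astype(np.int64)
    # Shift by 3 so the epoch maps to Thursday; np.mod keeps results in [0, 6]
    # even for negative day counts (dates before 1970)
    weekday = np.mod(days + 3, 7)
    # Every value lies in [0, 6], so narrowing to int8 cannot overflow
    return weekday.astype(np.int8)


values = np.array(
    ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-07"], dtype="datetime64[us]"
)
result = day_of_week_int8(values)
print(result, result.dtype)
```

Compared with int32, the result takes one byte per element instead of four.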

Alternative Solutions

Keep the status quo.
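Under the status quo, callers who want the smaller dtype can narrow the result themselves. A minimal sketch, assuming a recent pandas and a column with no missing values:

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]))

# int32 for NumPy-backed data, per the report in this issue
dow = ser.dt.day_of_week
# Safe narrowing: the field only takes values 0-6
narrowed = dow.astype("int8")

print(dow.dtype, dow.nbytes)
print(narrowed.dtype, narrowed.nbytes)
```

If the column contains NaT, the field may no longer come back as a plain integer dtype, in which case a nullable type such as "Int8" would likely be needed instead of "int8".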

Additional Context

No response

@WillAyd added the Enhancement and Timestamp labels and removed the Needs Triage label on Apr 8, 2024
@mroeschke
Member

For pyarrow types, this would probably be better filed as an enhancement request with pyarrow, since pandas just delegates to pyarrow.compute.day_of_week.

@jbrockmendel added the Datetime label and removed the Timestamp label on Apr 8, 2024
@rmhowe425
Contributor

take

@rmhowe425
Contributor

Hi @WillAyd, I'm trying to work on the implementation for this issue and I'm getting a little lost.

It looks like when a DatetimeArray is initialized, day_of_week is automatically created as int32, and I don't see any way to change that in the Python code.

I dug a bit deeper, and it looks like I may need to change the underlying ccalendar.pxd and ccalendar.pyx files used for datetime objects, specifically the dayofweek function.

Would you agree that I'll need to change the underlying Cython code, or am I going down a rabbit hole?
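To illustrate where the dtype decision could live, here is a pure-Python sketch of the two layers being discussed: a per-element day-of-week function and a loop that fills an output array. Both function names are hypothetical stand-ins, not pandas' actual ccalendar or field-extraction code, and the per-element function uses Sakamoto's method purely for illustration. Assuming pandas' code has a similar shape, the element function already yields a small integer, so the result's dtype may be set by how the output array is allocated rather than by dayofweek itself.

```python
import numpy as np

# Month offsets for Sakamoto's method (January..December)
_SAKAMOTO_OFFSETS = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4)


def dayofweek_sketch(year: int, month: int, day: int) -> int:
    """Hypothetical helper: weekday with Monday=0 ... Sunday=6."""
    if month < 3:
        year -= 1  # January and February count toward the previous year
    sunday_based = (
        year + year // 4 - year // 100 + year // 400
        + _SAKAMOTO_OFFSETS[month - 1] + day
    ) % 7
    # Sakamoto's formula yields Sunday=0; rotate so Monday=0
    return (sunday_based + 6) % 7


def get_day_of_week_field_sketch(years, months, days) -> np.ndarray:
    """Hypothetical field loop: one weekday per (year, month, day) triple."""
    out = np.empty(len(years), dtype=np.int8)  # the output dtype is chosen here
    for i, (y, m, d) in enumerate(zip(years, months, days)):
        out[i] = dayofweek_sketch(y, m, d)
    return out


field = get_day_of_week_field_sketch([2024, 2024, 2024], [1, 1, 1], [1, 2, 3])
print(field, field.dtype)
```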

@rmhowe425
Contributor

rmhowe425 commented Jul 14, 2024

@WillAyd From what I can tell, ser.dt.day_of_week is created after cls._simple_new(subarr, freq=inferred_freq, dtype=data_dtype) is executed on line 403 of datetimes.py.

Looking at similar PRs, it looks like all DatetimeArray attributes are treated as int32, so I would only need to change day_of_week. That suggests the simplest approach would be to modify the ccalendar files in pandas._libs.tslibs to return what I assume is the Cython equivalent of numpy.uint8.

Looking at the DatetimeIndex attributes, a good number of these fields could be numpy.uint8 🤷‍♂️
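One quick way to check that claim is to ask NumPy for the smallest dtype that can hold each field's maximum value. The ranges below are written out by hand from the calendar (an assumption, not read from pandas), np.min_scalar_type returns unsigned types for non-negative values, and year is left out because its range is not small.

```python
import numpy as np

# Hand-written maximum values for several DatetimeIndex fields
FIELD_MAX = {
    "day_of_week": 6,
    "day": 31,
    "month": 12,
    "quarter": 4,
    "hour": 23,
    "minute": 59,
    "second": 59,
    "day_of_year": 366,
    "nanosecond": 999,
    "microsecond": 999_999,
}

# Smallest NumPy dtype able to represent each field's maximum
smallest = {name: np.min_scalar_type(hi) for name, hi in FIELD_MAX.items()}
for name, dtype in smallest.items():
    print(f"{name:>12}: {dtype}")
```

Every field whose maximum is at most 127 would also fit in signed int8, which avoids the wraparound that unsigned arithmetic can produce on subtraction; day_of_year, nanosecond, and microsecond need wider types either way.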

@WillAyd
Member Author

WillAyd commented Jul 14, 2024

Hey @rmhowe425, thanks for taking a look. I'm not sure of all the places that need to be updated, but yes, I expect the core of the issue will need to be tackled in Cython.

If you get somewhat close, I would advise just pushing up a draft PR for discussion; it's usually easier to discuss and advise that way.


4 participants