ENH: Period with YYYY-UU (week of the year) #48947

Closed
1 of 3 tasks
lohraspco opened this issue Oct 5, 2022 · 7 comments
Labels: Enhancement, Needs Triage, Period


@lohraspco

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I work with data indexed by week of the year, a convention that is widely used in many industries. Here is an example:
"2020-01"
"2020-02"
...
"2020-52"
"2021-01"
...
"2021-52"
dtype: str

Right now there is no way to represent this kind of Period, so I have to use it like this:

2019-12-29
2020-01-05
...
2020-12-20
2020-12-27
...
2021-12-19
dtype: period[7D]

What I suggest is a new period, say YW, which is formatted as YYYY-UU as below:
2020-01
2020-02
...
2020-52
2021-01
...
2021-52
dtype: period[YW]

Feature Description

def to_period(freq="YW"):
    """Gets a single index such as "2021-52";
    returns Period(2021-52, YW).
    """

Alternative Solutions

import pandas as pd


def convert_weekstr_to_period(idx: str) -> pd.Period:
    """Converts a single week string to a Period.

    Weeks are parsed with %U (Sunday-based, starting from 00), and an offset
    of one week is then subtracted.

    Args:
        idx (str): str of format yyyy-ww, e.g., 2022-51

    Returns:
        Pandas Period: e.g., Period('2022-12-18', 'D')
    """
    # Append "-0" so that %w pins the date to the Sunday of the %U week.
    current_idx_dt = pd.to_datetime(idx + "-0", format="%Y-%U-%w")
    one_week_offset = current_idx_dt - pd.Timedelta(weeks=1)
    return one_week_offset.to_period("D")


def pd_series_weekstr_to_period(dsw: pd.Series) -> pd.Series:
    """Converts a Pandas Series of week strings to Periods.

    Args:
        dsw (pd.Series): the Pandas Series with strings of format yyyy-ww

    Returns:
        Pandas Series: a Pandas Series with Periods such as Period('2022-12-18', 'D')
    """
    return dsw.map(convert_weekstr_to_period)
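
As a quick check of the workaround above, the following sketch re-implements the same %U parsing in a standalone helper and applies it to a few of the week strings from the problem description. The name `week_str_to_day` is hypothetical and not part of pandas; the sketch assumes `pd.to_datetime` accepts the `%U` and `%w` directives, as the workaround itself does.

```python
import pandas as pd


def week_str_to_day(idx: str) -> pd.Period:
    """Hypothetical helper mirroring the workaround above."""
    # "-0" selects Sunday (%w) of the Sunday-based %U week.
    sunday = pd.to_datetime(idx + "-0", format="%Y-%U-%w")
    # Step back one week, as the workaround does, and keep daily resolution.
    return (sunday - pd.Timedelta(weeks=1)).to_period("D")


for week_str in ["2020-01", "2020-52", "2021-01", "2021-52"]:
    print(week_str, "->", week_str_to_day(week_str))
```

With the Sunday-based %U numbering, these four strings map to 2019-12-29, 2020-12-20, 2020-12-27 and 2021-12-19, which are the daily values shown in the period[7D] listing in the problem description.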

Additional Context

The current format has a day attached that is redundant.

lohraspco added the Enhancement and Needs Triage labels on Oct 5, 2022
@u5927645

u5927645 commented Oct 7, 2022

take

@u7238722

u7238722 commented Oct 7, 2022

take

@joooeey
Contributor

joooeey commented Dec 14, 2022

This is part of #48000. I've written an overview of the state of pandas in that issue's conversation. It includes lots of examples of how weeks and other periods are handled now. It's convoluted and definitely worth a look.

@lohraspco
Author

Hi @joooeey

I like the explanations you provided in #48000. However, I would like to give my suggestion another try, and I have also modified my feature suggestion. I believe that reopening the ticket and exploring the potential benefits of this feature could bring significant value to pandas users, especially those who work extensively with week-of-year periods.

Problem statement:
Assume that we have an index in the form of a string ("2022-W31" or "202231") or an integer (202231), and we convert it to a period with df.index = pd.PeriodIndex(pd.to_datetime(df.index + "-0", format="%Y-W%W-%w"), freq='W'). This results in indices like Period('2022-05-16/2022-05-22', 'W-SUN'). While this works, it still takes effort to select a specific week. To locate week 202123, you either have to add a redundant column, df['WoY'] = df.index.strftime("%Y%W"), and filter on the WoY column, or use df.loc[(df.index.year == 2021) & (df.index.weekofyear == 23)]. Now assume that you have a MultiIndex DataFrame in which one level is the week of year: the code becomes messy.
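
To make the lookup pattern concrete, here is a minimal sketch that selects one week from a weekly PeriodIndex by year and week number. It uses ISO 8601 week numbering via `isocalendar()` on each period's start time, which is an assumption for this example; the `%W`-based numbering used in the comment can differ near year boundaries. The frame and its values are made up for illustration.

```python
import pandas as pd

# Four consecutive Monday-to-Sunday weeks, starting with 2021-05-31/2021-06-06.
idx = pd.period_range(start="2021-05-31", periods=4, freq="W-SUN")
df = pd.DataFrame({"data": [10, 20, 30, 40]}, index=idx)

# ISO year and week of the Monday that opens each period.
iso = df.index.start_time.isocalendar()

# Plain boolean array, so it can index the PeriodIndex-ed frame positionally.
mask = ((iso["year"] == 2021) & (iso["week"] == 23)).to_numpy(dtype=bool)

week_23 = df.loc[mask]
print(week_23)
```

The mask has to be rebuilt for every lookup, which illustrates why selecting by week number is clumsier than selecting by a label.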

Solution:
My suggestion is to have

  • a representation of the current PeriodIndex in the form yyyyuu (instead of '2022-05-16/2022-05-22' it becomes 202220)
  • easy construction from a string ("2022-W31" or "202231") or an integer (202231)
  • support for different company calendars (start and end weeks)
  • the week shown in yyyyuu format on plots

import pandas as pd

df4 = pd.DataFrame({'data': [5, 2, 3]}, index=['2022-W51', '2022-W53', '2023-W02'])

def parse_week_of_year(week_str):
    # Approximate the week as January 1 of the year plus (week - 1) weeks.
    year, week = week_str.split('-W')
    return pd.to_datetime(year, format='%Y') + pd.DateOffset(weeks=int(week) - 1)

df4.index = pd.PeriodIndex([parse_week_of_year(idx) for idx in df4.index], freq='W')

# Filling the missing weeks
full_index = pd.period_range(start=df4.index.min(), end=df4.index.max(), freq='W')
df4 = df4.reindex(full_index)
df4.fillna(0).plot()

[Image: plot produced by the code above]

The X-axis has datetime format, not week of year format

Thanks, Matt

@joooeey
Contributor

joooeey commented Jun 26, 2023

I agree with the others that we don't need an extra method to retrieve a year-week string from a Period, as you proposed in #49355. There are easy ways to achieve this, and if we go down this route the combinations get endless (methods for year and week separately already exist, and there's also .strftime).

However, as a user I'd appreciate it if the other direction (reading) was possible. In my opinion support for ISO 8601 is a worthwhile goal. I.e. Period("2022W31") should simply return a period spanning that week.

I'm not so excited about esoteric use cases such as Period("202231", freq="W"), which you appear to suggest, but I wouldn't rule out supporting that too.
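
A minimal sketch of the reading direction, assuming that an ISO 8601 week string should map to the Monday-to-Sunday week it names. `period_from_iso_week` is a hypothetical helper, not a pandas API; it relies on the standard library's `date.fromisocalendar` (Python 3.8+) and builds a `W-SUN` period from the resulting Monday.

```python
import re
from datetime import date

import pandas as pd

# Accepts both the extended ("2022-W31") and basic ("2022W31") ISO forms.
_ISO_WEEK = re.compile(r"^(\d{4})-?W(\d{2})$")


def period_from_iso_week(text: str) -> pd.Period:
    """Hypothetical helper: ISO 8601 week string -> weekly Period."""
    match = _ISO_WEEK.match(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 week string: {text!r}")
    year, week = int(match.group(1)), int(match.group(2))
    # Monday of the requested ISO week; raises ValueError for invalid weeks.
    monday = date.fromisocalendar(year, week, 1)
    # A 'W-SUN' period ends on Sunday, so it spans Monday through Sunday.
    return pd.Period(pd.Timestamp(monday), freq="W-SUN")


p = period_from_iso_week("2022W31")
print(p)
```

For "2022W31" this yields the period 2022-08-01/2022-08-07. Whether `Period` itself should accept such strings is the open design question in this thread; the sketch only shows that the mapping is well defined.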

@lohraspco
Author

Thanks, Joooeey, for your comment. Since this feature may not be widely used (although I still see many use cases in different industries), I will come up with another method. Here are my comments on the problems with some of the approaches to handling week of year:
1- string (e.g., 2022W23): date operations cannot be applied directly, and plotting won't recognize intervals or missing weeks.
2- current period format: plots show the date instead of the week of year (as in the figure in my previous comment).
3- integer format: date operations cannot be applied directly, and plots are incorrect across year boundaries.
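
One way to reconcile the three points above, sketched under assumptions: keep the data on a weekly PeriodIndex so that date operations and gap filling work, and derive year-week labels only for display. The ISO week numbering and the `YYYY-Www` label format are choices made for this example, not a period format that pandas provides.

```python
import pandas as pd

# Weekly periods (Monday to Sunday) that cross a year boundary.
idx = pd.period_range(start="2022-12-19", end="2023-01-09", freq="W-SUN")

# Derive the ISO year and week from each period's start (its Monday).
iso = idx.start_time.isocalendar()
labels = [
    f"{int(year)}-W{int(week):02d}"
    for year, week in zip(iso["year"], iso["week"])
]
print(labels)
```

Such labels could then be passed to a plotting library as tick labels while the index itself remains a PeriodIndex.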

jbrockmendel added the Period label on Nov 1, 2023
@mroeschke
Member

Closing as covered by #48000
