BUG: reset_index() loses the frequency of a DatetimeIndex #59273

Closed
3 tasks done
annika-rudolph opened this issue Jul 18, 2024 · 9 comments
Labels: Bug; Closing Candidate (may be closeable, needs more eyeballs)

Comments

@annika-rudolph
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> index = pd.DatetimeIndex(pd.date_range(start="2000", freq="YS", periods=10), name="Date")
>>> df = pd.DataFrame(data=list(range(10)), index=index)
>>> print(df.index.freq)
<YearBegin: month=1>
>>> print(df.reset_index()['Date']._values.freq)
None
>>> df = df.reset_index().set_index('Date')
>>> print(df.index.freq)
None

Issue Description

Calling reset_index() on a DataFrame with a DatetimeIndex drops the index's frequency. Although the newly created column is backed by a DatetimeArray, it does not carry the freq attribute. As a result, a reset_index() -> set_index() round trip does not restore the original index, which can cause problems downstream.

Expected Behavior

I would expect that reset_index().set_index() lets me recover the original index :)

Installed Versions

INSTALLED VERSIONS

commit : bfe5be0
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0+untagged.34794.gbfe5be0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 22.0.2
Cython : 3.0.10
sphinx : 7.3.7
IPython : 8.23.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.3.1
html5lib : 1.1
hypothesis : 6.100.1
gcsfs : 2024.3.1
jinja2 : 3.1.3
lxml.etree : 5.2.1
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 16.0.0
pyreadstat : 1.2.7
pytest : 8.1.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.3.1
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.3.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

@annika-rudolph added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jul 18, 2024
@aram-cedarwood
Contributor

take

@aram-cedarwood
Contributor

I did some digging, and it seems it's intended that freq becomes None in a column:

if isinstance(values, (DatetimeArray, TimedeltaArray)) and values.freq is not None:
    # freq is only stored in DatetimeIndex/TimedeltaIndex, not in Series/DataFrame
    values = values._with_freq(None)

The above was added in this PR #41425, which mentions that "The long-term behavior is definitely going to always drop the freq (more specifically, DTA/TDA won't have freq, xref #31218). So this PR standardizes always-dropping."

@annika-rudolph What do you think?
Also @jbrockmendel @jreback @mroeschke @jorisvandenbossche you created/reviewed/were mentioned in the PR. What are your thoughts on this issue?
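
For readers following along, here is a minimal sketch of the behaviour described above, using only public API. It assumes a recent pandas release; the variable names are illustrative, and pd.infer_freq is just one way to recover a frequency from plain datetime values, not something the quoted internals do.

```python
import pandas as pd

# The index itself carries the offset it was built with.
idx = pd.date_range(start="2000", freq="YS", periods=10, name="Date")
print(idx.freq)

# A column holds the same timestamps but, per the snippet quoted above,
# is not meant to store a freq. The regular spacing is still present in
# the values, so the frequency can be re-inferred on demand.
col = pd.Series(idx)
inferred = pd.infer_freq(col)
print(inferred)
```

The inferred value is a frequency string rather than an offset object, and its exact spelling may differ between pandas versions.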

@annika-rudolph
Contributor Author

annika-rudolph commented Jul 23, 2024

Thanks for digging into this! It is what I suspected :)

From a user perspective, I can say that frequencies on a DatetimeIndex are quite important, even more so since some functionality (like business-day frequencies and resample) will be dropped for PeriodIndex, which for us means that we recently moved everything to DatetimeIndex. Thus, it would be nice if DatetimeIndex could cover the same functionality as PeriodIndex and, specifically, if the frequency attribute could be retained in all transformations.
reset_index() -> set_index() is a common pattern that I see a lot when working with a MultiIndex, which is also very relevant in many of my projects.

It seems to me that the decision on always dropping the frequency was taken some time ago (before deciding to drop PeriodIndex functionality?), so maybe it could be reconsidered?

@yuanx749
Contributor

I encountered this issue and did some debugging. It is the reshape below that leads to loss of freq.

values = values.reshape(1, -1)

But as mentioned by @aram-cedarwood, I think this behaviour is expected.

@rhshadrach added the Closing Candidate (may be closeable, needs more eyeballs) label and removed the Needs Triage (issue that has not been reviewed by a pandas team member) label Aug 12, 2024
@TorstenPietrek

With the deprecation of PeriodIndex functionality progressing and the current recommendation to use a DatetimeIndex in its place, I think the frequency property of a DatetimeIndex should not be dropped. If you drop the frequency property, the object no longer holds information on the periods, so it could not be used to replace the PeriodIndex. @jbrockmendel it would be great to hear your opinion on that.

@annika-rudolph
Contributor Author

Hi @jbrockmendel, would you mind briefly confirming that this is indeed as intended (i.e. that the frequency attribute of DTI/DTA is intentionally not always carried through)?

Otherwise I would close this issue.

@jbrockmendel
Member

Yes, it is expected that obj.reset_index() will drop the .freq attribute from a DatetimeIndex. It is possible to change this, but the effort involved and the code complexity (and in some cases performance degradation) it would introduce would be significant.
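
Since the drop is expected, one possible workaround is to reattach the frequency after the round trip. The following is a sketch, not an official recommendation: it assumes the timestamps are still regularly spaced, and the names roundtrip and reinferred are made up for illustration. It relies on the public freq setter of DatetimeIndex and on the freq="infer" option of the DatetimeIndex constructor.

```python
import pandas as pd

index = pd.date_range(start="2000", freq="YS", periods=10, name="Date")
df = pd.DataFrame({"value": range(10)}, index=index)
original_freq = df.index.freq

# Option A: the frequency is known, so reattach it after the round trip.
# pandas validates the offset against the values and raises if it does not fit.
roundtrip = df.reset_index().set_index("Date")
roundtrip.index.freq = original_freq

# Option B: the frequency is not known in advance, so let pandas infer it.
reinferred = df.reset_index().set_index("Date")
reinferred.index = pd.DatetimeIndex(reinferred.index, freq="infer")

print(roundtrip.index.freq, reinferred.index.freq)
```

Option A fails loudly when rows were dropped or reordered in between, which is probably what you want; Option B needs enough regularly spaced values for inference to succeed.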

@jbrockmendel
Member

> With the deprecation of PeriodIndex

PeriodIndex is not deprecated and that is unlikely to change. PeriodIndex with BDay freq has been deprecated, but that deprecation is not being enforced until at least 4.0.

@annika-rudolph
Contributor Author

Alright, thanks for the clarification.
