
BUG: After running to_timestamp on Series with PeriodIndex, frequency information is lost #51256


Open
3 tasks done
j-bennet opened this issue Feb 9, 2023 · 4 comments
Labels
Bug freq retention User expects "freq" attribute to be preserved Frequency DateOffsets


@j-bennet

j-bennet commented Feb 9, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np


def make_df(size):
    # Annual periods ending in December ("A"; pandas 2.2 renamed this alias to "Y").
    index = pd.period_range(freq="A", start="1/1/2001", periods=size)
    return pd.DataFrame({"x": np.arange(0, size)}, index=index)


df1 = make_df(2)
df2 = make_df(8)
res1 = df1.x.to_timestamp()
res2 = df2.x.to_timestamp()
print(f"{df1.x.index=}")
print(f"before: {df1.x.index.freq=}")
print(f"after: {res1.index.freq=}")
print()
print(f"{df2.x.index=}")
print(f"before: {df2.x.index.freq=}")
print(f"after: {res2.index.freq=}")

Issue Description

I'm calling to_timestamp on a Series. Initially, the Series has a PeriodIndex with freq='A'. However, the Series returned from to_timestamp doesn't preserve that freq, and the result depends on the Series size: with fewer than 3 records, the resulting index has freq=None; with 3 or more, it has freq=YearBegin.

The snippet above creates 2 DataFrames, one with 2 records and one with 8 records, converts column x of each with to_timestamp, and prints the index freq before and after the conversion:

df1.x.index=PeriodIndex(['2001', '2002'], dtype='period[A-DEC]')
before: df1.x.index.freq=<YearEnd: month=12>
after: res1.index.freq=None

df2.x.index=PeriodIndex(['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008'], dtype='period[A-DEC]')
before: df2.x.index.freq=<YearEnd: month=12>
after: res2.index.freq=<YearBegin: month=1>

In both cases the original frequency information is lost, but in different ways.
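The size dependence is consistent with the resulting freq being inferred from the timestamp values rather than carried over from the PeriodIndex. A minimal sketch using the public `pd.infer_freq` function and the `DatetimeIndex.inferred_freq` property, assuming year-start timestamps like the ones `to_timestamp()` returns here (the variable names are illustrative):

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Year-start timestamps with no freq attached, similar to the to_timestamp() output.
two = pd.DatetimeIndex(["2001-01-01", "2002-01-01"])
eight = pd.DatetimeIndex([f"{year}-01-01" for year in range(2001, 2009)])

# inferred_freq returns None when inference is impossible ...
print(two.inferred_freq)

# ... because pd.infer_freq itself refuses to work with fewer than three timestamps.
try:
    pd.infer_freq(two)
except ValueError as err:
    print("infer_freq failed:", err)

# With enough points, a year-start alias is inferred
# (spelled "AS-JAN" or "YS-JAN" depending on the pandas version).
print(eight.inferred_freq)
print(to_offset(eight.inferred_freq))
```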

Possibly related (not the same):

Expected Behavior

Since the PeriodIndex already has freq information, it should be preserved when converting to DatetimeIndex.

Installed Versions

INSTALLED VERSIONS
------------------
commit : fcb8b80
python : 3.10.8.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Thu Jan 5 20:50:36 PST 2023; root:xnu-8792.81.2~2/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0.dev0+1448.gfcb8b809e9
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 23.0
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.9.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli :
fastparquet : 2023.1.0
fsspec : 2023.1.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.1.0
scipy : 1.10.0
snappy :
sqlalchemy : 1.4.46
tables : 3.7.0
tabulate : None
xarray : 2023.1.0
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

j-bennet added the Bug and Needs Triage labels Feb 9, 2023
@phofl
Member

phofl commented Feb 9, 2023

Hi, thanks for your report.

This is intended since 2 values do not reliably provide frequency information. The documentation is not easy to find when using to_timestamp, though (see for example https://pandas.pydata.org/docs/reference/api/pandas.infer_freq.html#pandas-infer-freq).

Edit: Forgot to reply to the second part. to_timestamp converts to the beginning of your period by default. That's why we are changing the frequency in the second example.

Interestingly enough, setting how="end" sets the freq to None; this seems like a bug.
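To make the start/end distinction concrete, here is a small sketch comparing the default `how="start"` with `how="end"` for annual periods. It assumes an offset object (`pd.offsets.YearEnd()`) in place of the `"A"` string alias, since that alias was renamed to `"Y"` in pandas 2.2; the resulting freq is printed rather than asserted because it is the behavior under discussion.

```python
import pandas as pd

# Annual periods ending in December, built from an offset object so the
# snippet does not depend on the "A" vs "Y" alias spelling.
pidx = pd.period_range(start="2001", periods=3, freq=pd.offsets.YearEnd())

start = pidx.to_timestamp()         # how="start" is the default: Jan 1 of each year
end = pidx.to_timestamp(how="end")  # last nanosecond of each period: Dec 31 23:59:59.999999999

print(start, start.freq)
print(end, end.freq)
```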

phofl added the Frequency and Closing Candidate labels and removed the Needs Triage and Closing Candidate labels Feb 9, 2023
@j-bennet
Author

> This is intended since 2 values do not reliably provide frequency information.

If PeriodIndex can have freq with only 2 records, why can't DatetimeIndex?

> to_timestamp converts to the beginning of your period by default. That's why we are changing the frequency in the second example.

But since it's an instance method of PeriodIndex, why does it disregard the PeriodIndex's original frequency?
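On the first question: a DatetimeIndex can hold a freq with only two rows when the freq is passed explicitly, because the constructor then validates the offset against the values instead of inferring one. The following is a possible workaround sketch, not something proposed in this thread; it assumes the year-start timestamps that `to_timestamp()` produces by default and uses `pd.offsets.YearBegin(month=1)` as the offset to re-attach.

```python
import pandas as pd

pidx = pd.period_range(start="2001", periods=2, freq=pd.offsets.YearEnd())
s = pd.Series([0, 1], index=pidx)

# The index freq here is inferred; in this report it came back as None for 2 rows.
res = s.to_timestamp()

# Passing freq explicitly makes the constructor check that every timestamp
# conforms to the offset (it raises ValueError otherwise).
fixed_index = pd.DatetimeIndex(res.index, freq=pd.offsets.YearBegin(month=1))
res_fixed = res.set_axis(fixed_index)

print(res_fixed.index.freq)
```

Validating an explicitly supplied offset avoids guessing from too few points, but the caller still has to know which offset matches the conversion (here, year *begin* rather than the period's year *end*).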

@phofl
Member

phofl commented Mar 1, 2023

It seems like the freq is always inferred; even if you specify it explicitly, it gets overridden. Inference only works with 3 or more elements. But I'm not familiar enough with the conversion that happens here to judge.

We can't simply use the PeriodIndex freq, because the freq might change during the conversion.

cc @MarcoGorelli any idea why a passed freq wouldn't be honoured?
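A small illustration of why a passed freq does not show up on the result: in `PeriodIndex.to_timestamp`, the `freq` argument selects the resolution used for the conversion (each period is first mapped to a period of that frequency), while the freq attached to the resulting DatetimeIndex is inferred from the values afterwards. This sketch reflects pandas 2.x behavior as described in this thread and may differ in other versions.

```python
import pandas as pd

pidx = pd.period_range(start="2001", periods=3, freq=pd.offsets.YearEnd())

# freq="D" chooses the conversion resolution: each annual period is mapped
# to the day at its start, and that day is turned into a timestamp.
ts = pidx.to_timestamp(freq="D")

print(ts)
print(ts.freq)  # inferred from the three year-start values, not the Day offset passed in
```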

@MarcoGorelli
Member

Thanks @j-bennet for the report

> it should be preserved when converting to DatetimeIndex.

So would your expected output be

DatetimeIndex(['2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31',
               '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

?

Not sure that would be expected.
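For reference, the index in the question above can be constructed by hand from the period ends. This is only a sketch of the proposed output, not behavior `to_timestamp` currently provides: take the end of each period, drop the time of day with `normalize()`, and attach a year-end offset explicitly (the offset object is used to avoid the "A"/"YE" alias differences between versions).

```python
import pandas as pd

pidx = pd.period_range(start="2001", periods=8, freq=pd.offsets.YearEnd())

# how="end" gives Dec 31 23:59:59.999999999; normalize() moves it to midnight.
end_days = pidx.to_timestamp(how="end").normalize()

# Attach the year-end offset explicitly; the constructor validates conformance.
candidate = pd.DatetimeIndex(end_days, freq=pd.offsets.YearEnd())
print(candidate)
```

Depending on the pandas version, the repr shows this offset as `freq='A-DEC'` or `freq='YE-DEC'`.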

jbrockmendel mentioned this issue May 29, 2023
jbrockmendel added the freq retention label Aug 24, 2023