BUG: After running to_timestamp on Series with PeriodIndex, frequency information is lost #51256
Comments
Hi, thanks for your report. This is intended, since 2 values do not reliably provide frequency information. The documentation for this is not easy to find when you are using to_timestamp, though (see for example https://pandas.pydata.org/docs/reference/api/pandas.infer_freq.html#pandas-infer-freq). Edit: Forgot to reply to the second part. to_timestamp converts to the beginning of your period by default. That's why the frequency changes in the second example. Interestingly enough, setting how="end" sets the freq to None, which seems like a bug.
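The three-value threshold mentioned in this comment can be checked directly against pandas.infer_freq. The following is a minimal sketch with made-up dates; whether to_timestamp goes through exactly this code path internally is an assumption, not something confirmed in this thread.

```python
import pandas as pd

# Illustrative dates only; they are not taken from the original report.
two = pd.DatetimeIndex(["2001-01-01", "2002-01-01"])
three = pd.DatetimeIndex(["2001-01-01", "2002-01-01", "2003-01-01"])

# With fewer than 3 values, infer_freq refuses to guess and raises ValueError.
try:
    pd.infer_freq(two)
    two_raised = False
except ValueError as exc:
    two_raised = True
    print("2 values:", exc)

# With 3 evenly spaced values, a frequency string is returned.
# The exact alias differs between pandas versions, so it is only printed here.
inferred = pd.infer_freq(three)
print("3 values:", inferred)
```

The inferred alias is a year-start frequency, which matches the observation that the converted index reports YearBegin rather than the original period frequency.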
If
But being an instance method of |
It seems like the freq is always inferred; even if you specify it explicitly, it gets overridden. Inference works only with 3 or more elements. But I'm not familiar enough with the conversion that happens here to judge. We can't simply use the PeriodIndex freq, because the freq might change during the conversion. cc @MarcoGorelli any idea why a passed freq wouldn't be honoured?
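To illustrate why the period frequency cannot simply be copied over, here is a small sketch. It assumes an annual PeriodIndex built with the "Y" alias and uses the offset classes pd.offsets.YearBegin and pd.offsets.YearEnd for the check; the variable names are made up for this example. The timestamps produced by the default conversion fall on January 1, so they conform to a year-start offset but not to a year-end one.

```python
import pandas as pd

# Annual periods; "Y" is used here as the alias (an assumption, the report used 'A').
periods = pd.period_range("2001", periods=3, freq="Y")
stamps = pd.Series(range(3), index=periods).to_timestamp()

# Drop any freq already attached, so the constructor validates from raw values.
values = stamps.index.to_numpy()

# A year-start offset matches the converted timestamps.
with_start = pd.DatetimeIndex(values, freq=pd.offsets.YearBegin())
print(with_start.freq)

# A year-end offset does not match timestamps that sit on January 1.
try:
    pd.DatetimeIndex(values, freq=pd.offsets.YearEnd())
    year_end_rejected = False
except ValueError as exc:
    year_end_rejected = True
    print("rejected:", exc)
```

Passing freq to the DatetimeIndex constructor is also one possible way to attach a frequency explicitly after the conversion, since the constructor validates the values against it.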
Thanks @j-bennet for the report
so would your expected output be
DatetimeIndex(['2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31',
               '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
? Not sure that would be expected
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I'm calling to_timestamp on Series. Initially, Series has a PeriodIndex, with freq='A'. However, the Series returned from to_timestamp doesn't preserve the same freq. The behavior depends on Series size. If it has fewer than 3 records, the resulting Series has freq=None. If it has 3 or more, the resulting Series has freq=YearBegin.
The snippet above creates 2 dataframes, one with 2 records, and one with 8 records, and converts the series with to_timestamp. It outputs index freq before and after conversion:
The frequency information is lost in both cases, but in a different way.
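The comparison described here can be sketched roughly as follows. This is not the reporter's original snippet: the helper freq_before_after, the start year, and the "Y" alias (used in place of 'A', which newer pandas versions deprecate) are assumptions made for illustration, and the printed frequencies may differ between pandas versions.

```python
import pandas as pd


def freq_before_after(n_records):
    """Return the index freq before and after Series.to_timestamp (hypothetical helper)."""
    index = pd.period_range("2001", periods=n_records, freq="Y")
    series = pd.Series(range(n_records), index=index)
    converted = series.to_timestamp()
    return series.index.freq, converted.index.freq


# Compare a 2-record Series with an 8-record Series, as in the description.
for n in (2, 8):
    before, after = freq_before_after(n)
    print(f"{n} records: {before!r} -> {after!r}")
```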
Possibly related (not the same):
Expected Behavior
Since the PeriodIndex already has freq information, it should be preserved when converting to DatetimeIndex.
Installed Versions
pandas : 2.0.0.dev0+1448.gfcb8b809e9
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 23.0
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.9.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : None
brotli :
fastparquet : 2023.1.0
fsspec : 2023.1.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.1.0
scipy : 1.10.0
snappy :
sqlalchemy : 1.4.46
tables : 3.7.0
tabulate : None
xarray : 2023.1.0
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None