BUG: PeriodIndex inconsistent deserialization with HDF5 - PyTables #41978
Comments
So I have figured out the issue: but the case was unhandled in For The fixed-format in both |
E.g. A simple but not clean fix will be to add a corner case in

    factory = Index
    if is_datetime64_dtype(values.dtype) or is_datetime64tz_dtype(values.dtype):
        factory = DatetimeIndex
    elif "freq" in kwargs:
        # workaround for PeriodIndex
        def f(values, freq=None, **kwargs):
            parr = PeriodArray._simple_new(values, freq=freq)
            return PeriodIndex._simple_new(parr, **kwargs)
        factory = f

From my understanding, the
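To illustrate why the frequency has to travel with the stored values, here is a minimal sketch that uses only public pandas API instead of the private `_simple_new` constructors in the workaround above. It assumes the index is persisted as its int64 ordinals plus a frequency string; the variable names are illustrative and are not pandas internals.

```python
import pandas as pd

# A monthly PeriodIndex similar to the one in the report.
original = pd.period_range("2015-01", periods=12, freq="M")

# Assumption: what reaches disk is the int64 ordinals plus the frequency string.
ordinals = original.asi8
freq = original.freqstr

# Rebuilding without the frequency yields a plain integer index.
naive = pd.Index(ordinals)

# Rebuilding with the frequency recovers a PeriodIndex.
restored = pd.PeriodIndex(
    pd.arrays.PeriodArray(ordinals, dtype=pd.PeriodDtype(freq))
)

print(type(naive).__name__, naive.dtype)
print(type(restored).__name__, restored.dtype)
```

If the reader drops the frequency on the way back, the integer index is all that can be reconstructed, which matches the symptom described in this issue.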
I also noticed that both fixed and table format cannot handle storing values where the underlying array is

    import pandas as pd

    # `store` is assumed to be an open pd.HDFStore
    series_p = pd.Series(data=pd.date_range(start='2015-01', end='2016-01', freq='M').to_period('M'))
    store.put('/a/c', series_p, format='fixed')
    store.put('/a/d', series_p, format='table')

Output:

    TypeError: objects of type ``PeriodArray`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or bytes

PyTables:

    TypeError: int() argument must be a string, a bytes-like object or a number, not 'Period'
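Until period values can be stored directly, one possible workaround is to convert them to timestamps before writing and back to periods after reading. The sketch below shows only the conversion round trip and does not touch HDF5. Carrying the frequency separately as a string is an assumption of this sketch, since a datetime64 column does not remember it.

```python
import pandas as pd

# Assumption: the frequency is kept alongside the data by the caller.
freq = "M"
series_p = pd.Series(pd.period_range("2015-01", periods=12, freq=freq))

# Periods become datetime64 values (the start of each period),
# which HDF5 storage handles as ordinary datetime data.
as_timestamps = series_p.dt.to_timestamp()

# After reading back, the stored frequency turns them into periods again.
restored = as_timestamps.dt.to_period(freq)

print(as_timestamps.dtype)
print(restored.dtype)
```

This trades a small amount of bookkeeping (remembering the frequency) for not depending on any change inside pandas.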
@mroeschke Is it ok if I start working on that since it's confirmed? I was able to patch my local pandas last year but haven't had time to revisit this since then.
Sure, go for it @ra1nty
I have checked that this issue has not already been reported.
There was an issue 5 years ago mentioning that .to_hdf() acts inconsistently across Python2 & 3 on PeriodIndex for fixed format: DataFrame with PeriodIndex written in Python2 gets an Int64Index when read back in Python3 #16781
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
The bug exists, but behavior is different - see next comment
I noticed that the deserialization of a pandas Series/DataFrame with a PeriodIndex from an HDF5 file is inconsistent when using PyTables format: the retrieved Series/DataFrame index is converted to Int64Index instead of PeriodIndex. See code below for example.
Output:
Output:
Problem description
Inconsistent output with HDF5 file & PyTables format
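A minimal reproduction sketch, in case it helps compare behavior across versions. It writes the same Series with a PeriodIndex in both formats and reports the index type that comes back. The file name, keys, and the helper `roundtrip_index_types` are hypothetical, and it needs PyTables (`tables`) installed. The result depends on the pandas version, so no expected output is shown.

```python
import os
import tempfile

import pandas as pd


def roundtrip_index_types():
    """Store a Series with a PeriodIndex in both formats and
    return the index type name read back for each format."""
    series = pd.Series(
        range(12), index=pd.period_range("2015-01", periods=12, freq="M")
    )
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "periods.h5")  # hypothetical file name
        with pd.HDFStore(path, mode="w") as store:
            for fmt in ("fixed", "table"):
                key = f"series_{fmt}"  # hypothetical key
                store.put(key, series, format=fmt)
                result[fmt] = type(store.get(key).index).__name__
    return result


if __name__ == "__main__":
    try:
        for fmt, index_type in roundtrip_index_types().items():
            print(fmt, index_type)
    except ImportError:
        # pd.HDFStore raises ImportError when PyTables is missing.
        print("PyTables is not installed; skipping")
```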
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.1
setuptools : 51.0.0.post20201207
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None