BUG: error writing sparse DataFrame to HDF #42070


Open
2 of 3 tasks
raphaelquast opened this issue Jun 17, 2021 · 2 comments
Labels
Bug IO HDF5 read_hdf, HDFStore Sparse Sparse Data Type

Comments


raphaelquast commented Jun 17, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

filepath = r"<path where the file should be stored>"

data = [1,2,None,None,None,3]
sparse_data = pd.arrays.SparseArray(data)
df = pd.DataFrame(dict(a=sparse_data))

df.to_hdf(filepath, "data", format="t", data_columns=True)
# raises:
#   File "...\lib\site-packages\pandas\io\pytables.py", line 2323, in _get_atom
#     itemsize = dtype.itemsize  # type: ignore[attr-defined]
# AttributeError: 'SparseDtype' object has no attribute 'itemsize'

Problem description

The code example tells it all: writing a sparse DataFrame to HDF raises an AttributeError.

Expected Output

The DataFrame is written to the HDF file without error.
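Not part of the original report, but a possible workaround sketch until native support exists: HDFStore only handles dense NumPy-backed dtypes, so densifying the sparse columns before the write avoids the error. The variable names below are illustrative.

```python
import pandas as pd

data = [1, 2, None, None, None, 3]
df = pd.DataFrame({"a": pd.arrays.SparseArray(data)})

# Densify the sparse columns before writing; HDFStore cannot serialize
# SparseDtype columns.
dense_df = df.sparse.to_dense()
# dense_df.to_hdf(filepath, "data", format="t", data_columns=True)  # no longer raises

# The sparse representation can be restored after a round trip:
restored = dense_df.astype(pd.SparseDtype("float"))
```

The trade-off is that the on-disk representation is dense, so the memory savings of the sparse dtype are lost in the file itself.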

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : German_Austria.1252

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : None
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.06.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.17.0
xlrd : None
xlwt : None
numba : None

@raphaelquast raphaelquast added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 17, 2021
@mroeschke mroeschke added IO HDF5 read_hdf, HDFStore Sparse Sparse Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 21, 2021

Eyon42 commented Oct 2, 2022

This error also arises when calling Series.nlargest() on a column sliced out of a sparse DataFrame:

sparseMatrix.iloc[:, id].nlargest(5)

will throw:
AttributeError: 'SparseDtype' object has no attribute 'itemsize'
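A sketch of the same densify-first workaround for this case, with illustrative data (on affected pandas versions, calling nlargest() directly on the sparse Series raises the AttributeError above):

```python
import pandas as pd

# Illustrative sparse Series standing in for one column of a sparse DataFrame.
s = pd.Series(pd.arrays.SparseArray([1.0, 5.0, None, 3.0, None, 4.0]))

# Workaround: densify the column first, then take the n largest values.
top = s.sparse.to_dense().nlargest(3)
```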

@sunghjung3

My issue is similar, and it would be awesome if sparse DataFrames could be written to HDF.

When I run this:

sparse_df = df.astype(pd.SparseDtype(float, fill_value=0.0))
sparse_df.to_hdf(...)

I get this error:

TypeError: objects of type ``SparseArray`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or bytes
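A sketch of the densify-first workaround for this variant as well; converting back to dense restores the fill_value, so no data is lost (the data and file name below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 0.0, 2.0], "b": [0.0, 0.0, 3.0, 0.0]})
sparse_df = df.astype(pd.SparseDtype(float, fill_value=0.0))

# Convert back to dense before the write; PyTables cannot serialize
# SparseArray blocks, which is what triggers the TypeError above.
dense_df = sparse_df.sparse.to_dense()
# dense_df.to_hdf("store.h5", "data")  # succeeds once the columns are dense
```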
