BUG: error writing sparse DataFrame to HDF #42070


Open
2 of 3 tasks
raphaelquast opened this issue Jun 17, 2021 · 2 comments
Labels
Bug IO HDF5 read_hdf, HDFStore Sparse Sparse Data Type

Comments


raphaelquast commented Jun 17, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

filepath = r"<path where the file should be stored>"

data = [1,2,None,None,None,3]
sparse_data = pd.arrays.SparseArray(data)
df = pd.DataFrame(dict(a=sparse_data))

df.to_hdf(filepath, "data", format="t", data_columns=True)
# raises:
#   File "...\lib\site-packages\pandas\io\pytables.py", line 2323, in _get_atom
#     itemsize = dtype.itemsize  # type: ignore[attr-defined]
# AttributeError: 'SparseDtype' object has no attribute 'itemsize'

Problem description

The code example tells it all: writing a sparse DataFrame to HDF raises an AttributeError.

Expected Output

The DataFrame is written to the HDF file without error.
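Not part of the original report, but a possible workaround sketch until native support exists: HDFStore only handles dense NumPy-backed dtypes, so densifying the sparse columns before the write avoids the error. The variable names below are illustrative.

```python
import pandas as pd

data = [1, 2, None, None, None, 3]
df = pd.DataFrame({"a": pd.arrays.SparseArray(data)})

# Densify the sparse columns before writing; HDFStore cannot serialize
# SparseDtype columns.
dense_df = df.sparse.to_dense()
# dense_df.to_hdf(filepath, "data", format="t", data_columns=True)  # no longer raises

# The sparse representation can be restored after a round trip:
restored = dense_df.astype(pd.SparseDtype("float"))
```

The trade-off is that the on-disk representation is dense, so the memory savings of the sparse dtype are lost in the file itself.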

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : German_Austria.1252

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : None
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.06.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.17.0
xlrd : None
xlwt : None
numba : None

@raphaelquast raphaelquast added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 17, 2021
@mroeschke mroeschke added IO HDF5 read_hdf, HDFStore Sparse Sparse Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 21, 2021

Eyon42 commented Oct 2, 2022

This error also arises when calling Series.nlargest() on a column sliced out of a sparse DataFrame:

sparseMatrix.iloc[:, id].nlargest(5)

will throw:
AttributeError: 'SparseDtype' object has no attribute 'itemsize'
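A sketch of the same densify-first workaround for this case, with illustrative data (on affected pandas versions, calling nlargest() directly on the sparse Series raises the AttributeError above):

```python
import pandas as pd

# Illustrative sparse Series standing in for one column of a sparse DataFrame.
s = pd.Series(pd.arrays.SparseArray([1.0, 5.0, None, 3.0, None, 4.0]))

# Workaround: densify the column first, then take the n largest values.
top = s.sparse.to_dense().nlargest(3)
```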

@sunghjung3

My issue is similar, and it would be awesome if sparse DataFrames could be written to HDF.

When I run this:

sparse_df = df.astype(pd.SparseDtype(float, fill_value=0.0))
sparse_df.to_hdf(...)

I get this error:

TypeError: objects of type ``SparseArray`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or bytes
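A sketch of the densify-first workaround for this variant as well; converting back to dense restores the fill_value, so no data is lost (the data and file name below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 0.0, 2.0], "b": [0.0, 0.0, 3.0, 0.0]})
sparse_df = df.astype(pd.SparseDtype(float, fill_value=0.0))

# Convert back to dense before the write; PyTables cannot serialize
# SparseArray blocks, which is what triggers the TypeError above.
dense_df = sparse_df.sparse.to_dense()
# dense_df.to_hdf("store.h5", "data")  # succeeds once the columns are dense
```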
