BUG: DataFrame.sparse.from_spmatrix hard codes an invalid fill_value for certain subtypes #59063

Closed
3 tasks done
christophertitchen opened this issue Jun 21, 2024 · 0 comments · Fixed by #59064
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

christophertitchen commented Jun 21, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from scipy.sparse import eye

pd.DataFrame.sparse.from_spmatrix(eye(2, dtype=bool))

Issue Description

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/pandas/core/arrays/sparse/accessor.py", line 316, in from_spmatrix
    dtype = SparseDtype(array_data.dtype, 0)
  File "/pandas/pandas/core/dtypes/dtypes.py", line 1751, in __init__
    self._check_fill_value()
  File "/pandas/pandas/core/dtypes/dtypes.py", line 1835, in _check_fill_value
    raise ValueError(
ValueError: fill_value must be a valid value for the SparseDtype.subtype
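
The traceback points at the `SparseDtype(array_data.dtype, 0)` call rather than at anything scipy-specific, so the same check can be probed without a sparse matrix. The following is a minimal sketch; `probe_fill_value` is a hypothetical helper name, not part of pandas. It assumes that on releases where this check is only a deprecation warning, the call succeeds and the helper returns `None`.

```python
import warnings

import numpy as np
import pandas as pd


def probe_fill_value(subtype, fill_value):
    """Hypothetical helper: return None if SparseDtype accepts the pair,
    otherwise return the ValueError message."""
    with warnings.catch_warnings():
        # Assumption: some releases only warn about a fill_value that the
        # subtype cannot hold, so warnings are silenced here.
        warnings.simplefilter("ignore")
        try:
            pd.SparseDtype(np.dtype(subtype), fill_value)
        except ValueError as exc:
            return str(exc)
    return None


# (bool, 0) is the combination that from_spmatrix builds for a bool matrix.
for subtype, fill_value in [(bool, 0), (bool, False), (float, 0), (complex, 0)]:
    result = probe_fill_value(subtype, fill_value)
    print(np.dtype(subtype), repr(fill_value), "->", result)
```

On the build described in this report, the `(bool, 0)` pair is the one expected to report the error from the traceback, while the float and complex pairs pass the check.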

Expected Behavior

from_spmatrix should use the default argument for fill_value instead of hard-coding 0. This fixes the error, because the default missing value selected for bool is False. The hard-coded 0 also affects other dtypes such as float and complex, although without raising a ValueError: for float, a fill_value of 0. or np.nan is more appropriate than 0, and for complex, 0. + 0.j, np.nan + 0.j, or np.nan is more appropriate.
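
To see which default `SparseDtype` selects when `fill_value` is omitted, the subtypes mentioned above can be inspected directly. This is only an illustrative sketch; the `defaults` dictionary is a name introduced here for the example.

```python
import numpy as np
import pandas as pd

# Omitting fill_value lets SparseDtype pick the default for each subtype.
defaults = {
    name: pd.SparseDtype(np.dtype(name)).fill_value
    for name in ("bool", "int64", "float64", "complex128")
}

for name, fill in defaults.items():
    print(f"{name:>10}: {fill!r}")
```

For bool the selected default is False, which is a value the subtype can hold, unlike the hard-coded 0.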

We can also introduce a fill_value parameter to the DataFrame.sparse.from_spmatrix method, with a default argument of None, to fix the issue whilst giving the user flexibility to select a fill_value of choice.

dtype = SparseDtype(array_data.dtype, 0)

elif dtype.kind == "b":
    if compat:
        return False
    return np.nan
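
The proposed `fill_value` parameter could behave roughly like the sketch below. It is built only on public API and is not the pandas implementation: `from_spmatrix_with_fill` is a hypothetical name, and it densifies one column at a time for simplicity, which the real method would presumably avoid.

```python
import numpy as np
import pandas as pd
from scipy.sparse import eye


def from_spmatrix_with_fill(data, fill_value=None):
    """Hypothetical sketch of from_spmatrix with a fill_value parameter.

    fill_value=None defers to the SparseDtype default for the subtype.
    """
    data = data.tocsc(copy=True)  # copy so the caller's matrix is untouched
    data.sum_duplicates()         # merge repeated entries before reading them
    dtype = pd.SparseDtype(data.dtype, fill_value)
    n_rows, n_cols = data.shape

    columns = {}
    for j in range(n_cols):
        start, stop = data.indptr[j], data.indptr[j + 1]
        # Entries not stored by scipy are implicit zeros (False for bool).
        dense_col = np.zeros(n_rows, dtype=data.dtype)
        dense_col[data.indices[start:stop]] = data.data[start:stop]
        columns[j] = pd.arrays.SparseArray(dense_col, dtype=dtype)
    return pd.DataFrame(columns)


df_bool = from_spmatrix_with_fill(eye(2, dtype=bool))
df_float = from_spmatrix_with_fill(eye(2), fill_value=0.0)
print(df_bool.dtypes)
print(df_float.dtypes)
```

Because the values are taken from the matrix itself, the choice of fill_value only changes which entries are stored explicitly, not the values the frame represents.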

Installed Versions

INSTALLED VERSIONS

commit : c46fb76
python : 3.10.14.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 3.0.0.dev0+1125.gc46fb76afa
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 70.1.0
pip : 24.0
Cython : 3.0.10
pytest : 8.2.2
hypothesis : 6.103.2
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.4.0
fastparquet : 2024.5.0
fsspec : 2024.6.0
gcsfs : 2024.6.0
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 16.1.0
pyreadstat : 1.2.7
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.6.0
scipy : 1.13.1
sqlalchemy : 2.0.31
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.6.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None
