BUG: default value for dtype_backend is not applied by default #58938

Closed
2 of 3 tasks
pumpkinlink opened this issue Jun 5, 2024 · 5 comments · Fixed by #59021
Labels: Docs; Dtype Conversions (unexpected or buggy dtype conversions); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

pumpkinlink commented Jun 5, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

```python
# (interactive shell)
import pandas
s = pandas.Series(['150', pandas.NA])
print(pandas.to_numeric(s))
# 0    150.0
# 1      NaN
# dtype: float64
print(pandas.to_numeric(s, dtype_backend='numpy_nullable'))
# 0     150
# 1    <NA>
# dtype: Int64
```

### Issue Description

The documentation says that `numpy_nullable` is the default value for the `dtype_backend` argument,
yet the function behaves differently depending on whether you pass that value explicitly or omit the argument.

### Expected Behavior

```python
# (interactive shell)
import pandas
s = pandas.Series(['150', pandas.NA])
print(pandas.to_numeric(s))
# 0     150
# 1    <NA>
# dtype: Int64
```

Installed Versions

/home/denisfranco/.local/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.10.12.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.5.0-35-generic
Version               : #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May  7 09:00:52 UTC 2
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : pt_BR.UTF-8

pandas                : 2.2.2
numpy                 : 1.22.4
pytz                  : 2022.1
dateutil              : 2.8.2
setuptools            : 65.6.3
pip                   : 22.0.2
Cython                : None
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : 4.8.0
html5lib              : 1.1
pymysql               : None
psycopg2              : None
jinja2                : 3.1.2
IPython               : 8.4.0
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : 3.1.2
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

pumpkinlink added the Bug and Needs Triage labels on Jun 5, 2024

@rhshadrach
Member

Thanks for the report. I'd guess that this is the intention as integers with NA values cannot be stored in a NumPy array (they would be coerced to float or object). If that is the case, the docstring could be improved to include this (or a more generic description of it).

This was originally added in #50505; @phofl - was this case considered?
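
The coercion described above can be checked directly. The following is a minimal sketch and not taken from the pandas source; the variable names are made up for illustration. It shows that a NumPy array holding an integer plus a missing value ends up as float or object, while the masked `Int64` extension dtype keeps the integer.

```python
import numpy as np
import pandas as pd

# A NumPy integer array has no way to mark a missing entry, so mixing
# an integer with NaN produces a float array.
floats = np.array([150, np.nan])
print(floats.dtype)  # float64

# Storing pd.NA itself requires falling back to Python objects.
objects = np.array([150, pd.NA], dtype=object)
print(objects.dtype)  # object

# The masked Int64 extension dtype stores the integers in one array and
# the missing-value mask in another, so 150 stays an integer.
nullable = pd.array([150, pd.NA], dtype="Int64")
print(nullable.dtype)  # Int64
```

This is consistent with the outputs in the report: without an explicit `dtype_backend` the result is a NumPy-backed `float64` Series, and with `numpy_nullable` it is `Int64`.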

rhshadrach added the Missing-data and Dtype Conversions labels on Jun 6, 2024
@phofl
Member

phofl commented Jun 6, 2024

The docstring is incorrect, sorry about this.

The behaviour is correct.

phofl added the Docs label and removed the Bug and Needs Triage labels on Jun 6, 2024
@luke396
Contributor

luke396 commented Jun 13, 2024

take

@luke396
Contributor

luke396 commented Jun 14, 2024

It is observed that the resulting data type is not solely dependent on the presence of NaN values.

import pandas as pd

ser2 = pd.Series(['150.0', '2'])
print(pd.to_numeric(ser2, dtype_backend='numpy_nullable'))
# 0    150.0
# 1      2.0
# dtype: Float64
print(pd.to_numeric(ser2, dtype_backend='pyarrow'))  # requires pyarrow to be installed
# 0    150.0
# 1      2.0
# dtype: double[pyarrow]
print(pd.to_numeric(ser2))
# 0    150.0
# 1      2.0
# dtype: float64

With the default `dtype_backend` value of `lib.no_default`, the result matches neither the `numpy_nullable` nor the `pyarrow` output: it stays a plain `float64`.

dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,

values, new_mask = lib.maybe_convert_numeric(  # type: ignore[call-overload]
    values,
    set(),
    coerce_numeric=coerce_numeric,
    convert_to_masked_nullable=dtype_backend is not lib.no_default
    or isinstance(values_dtype, StringDtype)
    and not values_dtype.storage == "pyarrow_numpy",
)
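
To make the condition passed as `convert_to_masked_nullable` easier to read, here is a small standalone sketch of the same boolean expression. The helper `wants_masked_result` and its two parameters are hypothetical and exist only for illustration; they are not part of pandas. Because `and` binds tighter than `or`, the expression reads as "backend given, or (string dtype and storage is not `pyarrow_numpy`)".

```python
from typing import Optional


def wants_masked_result(backend_given: bool, string_storage: Optional[str]) -> bool:
    """Hypothetical stand-in for the convert_to_masked_nullable expression.

    backend_given: True when the caller passed dtype_backend explicitly.
    string_storage: storage name if the input has a StringDtype, else None.
    """
    is_string_dtype = string_storage is not None
    # Same shape as the quoted expression: A or (B and not C).
    return backend_given or is_string_dtype and not string_storage == "pyarrow_numpy"


print(wants_masked_result(False, None))             # False
print(wants_masked_result(True, None))              # True
print(wants_masked_result(False, "python"))         # True
print(wants_masked_result(False, "pyarrow_numpy"))  # False
```

The first line of output corresponds to the case in this issue: an object-dtype Series with no `dtype_backend` argument, where the masked conversion is not requested and the result stays NumPy-backed.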

@rhshadrach
Member

Good find @luke396 - this documentation became incorrect in #54104. We should fix that as well!
