Series constructor with missing values and int dtype fails or passes depending on data type #22585

Closed
TomAugspurger opened this issue Sep 4, 2018 · 2 comments · Fixed by #37090
Labels
Bug; Constructors (Series/DataFrame/Index/pd.array constructors); Dtype Conversions (unexpected or buggy dtype conversions); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@TomAugspurger
Contributor

I'm not sure what the expected behavior is, but these two should match:

In [16]: pd.Series([1, 2, np.nan], dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-f0075752a7f2> in <module>()
----> 1 pd.Series([1, 2, np.nan], dtype=int)

~/sandbox/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    280             else:
    281                 data = _sanitize_array(data, index, dtype, copy,
--> 282                                        raise_cast_failure=True)
    283
    284                 data = SingleBlockManager(data, index, fastpath=True)

~/sandbox/pandas/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   4154         if dtype is not None:
   4155             try:
-> 4156                 subarr = _try_cast(data, False)
   4157             except Exception:
   4158                 if raise_cast_failure:  # pragma: no cover

~/sandbox/pandas/pandas/core/series.py in _try_cast(arr, take_fast_path)
   4087             # that we can convert the data to the requested dtype.
   4088             if is_float_dtype(dtype) or is_integer_dtype(dtype):
-> 4089                 subarr = maybe_cast_to_integer_array(arr, dtype)
   4090
   4091             subarr = maybe_cast_to_datetime(arr, dtype)

~/sandbox/pandas/pandas/core/dtypes/cast.py in maybe_cast_to_integer_array(arr, dtype, copy)
   1341     try:
   1342         if not hasattr(arr, "astype"):
-> 1343             casted = np.array(arr, dtype=dtype, copy=copy)
   1344         else:
   1345             casted = arr.astype(dtype, copy=copy)

ValueError: cannot convert float NaN to integer

and when data is an ndarray, no error is raised and we end up with float64 dtype instead of the requested int64:

In [17]: pd.Series(np.array([1, 2, np.nan]), dtype=np.int64)
Out[17]:
0    1.0
1    2.0
2    NaN
dtype: float64
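The two cases above can be compared side by side with a small helper. This is only a sketch: `construct_outcome` is a hypothetical name, not a pandas function, and the result for the ndarray input depends on the installed pandas version (the report above shows float64; later versions may raise instead).

```python
import warnings

import numpy as np
import pandas as pd


def construct_outcome(data, dtype):
    """Return the resulting dtype name, or "ValueError" if construction fails.

    Hypothetical helper for illustration; subclasses of ValueError are
    grouped together under the same label.
    """
    try:
        with warnings.catch_warnings():
            # Some pandas versions may warn rather than raise; silence that here.
            warnings.simplefilter("ignore")
            return str(pd.Series(data, dtype=dtype).dtype)
    except ValueError:
        return "ValueError"


list_outcome = construct_outcome([1, 2, np.nan], np.int64)
array_outcome = construct_outcome(np.array([1, 2, np.nan]), np.int64)

print("list input:   ", list_outcome)
print("ndarray input:", array_outcome)
print("consistent:   ", list_outcome == array_outcome)
```

If the last line prints `False`, the installed version still shows the mismatch described in this report.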
@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Dtype Conversions Unexpected or buggy dtype conversions labels Sep 4, 2018
@TomAugspurger TomAugspurger changed the title Series constructor with missing values and int dtype Series constructor with missing values and int dtype fails or passes depending on data type Sep 4, 2018
@alanbato
Contributor

alanbato commented Sep 4, 2018

My initial guess would be that, since we're explicitly requesting an int dtype, we should get that as the dtype of the resulting Series, and maybe special-case np.nan so the constructor doesn't fail when the value can't be cast?

However, that is neither of the two current behaviors, so if we want to go with one of them, I would fall back to what happens when data is an array.
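For readers who want an integer-typed Series that tolerates missing values, one option that is not proposed in this thread is pandas' nullable integer extension dtype, requested with the string "Int64" (capital I). The sketch below assumes a pandas version that ships that dtype; it is a workaround on the user side, not a description of how the constructor bug was resolved.

```python
import numpy as np
import pandas as pd

# "Int64" (capital I) selects the nullable integer extension dtype,
# which stores missing values separately from the integer data.
from_list = pd.Series([1, 2, np.nan], dtype="Int64")
from_array = pd.Series(np.array([1, 2, np.nan]), dtype="Int64")

print(from_list.dtype)
print(from_list.isna().tolist())
print(from_array.dtype)
```

With this dtype both the list and the ndarray input go through the same path, so the two results carry the same dtype.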

@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Dec 28, 2019
@mroeschke mroeschke added the Bug label Jun 28, 2020
@arw2019
Member

arw2019 commented Oct 12, 2020

They match on 1.2 master:

In [19]: pd.Series([1, 2, np.nan], dtype=np.int)
...
/workspaces/pandas-arw2019/pandas/core/dtypes/cast.py in maybe_cast_to_integer_array(arr, dtype, copy)
   1743     try:
   1744         if not hasattr(arr, "astype"):
-> 1745             casted = np.array(arr, dtype=dtype, copy=copy)
   1746         else:
   1747             casted = arr.astype(dtype, copy=copy)

ValueError: cannot convert float NaN to integer

In [20]: pd.Series([1, 2, np.nan], dtype=int)
(...)
/workspaces/pandas-arw2019/pandas/core/dtypes/cast.py in maybe_cast_to_integer_array(arr, dtype, copy)
   1743     try:
   1744         if not hasattr(arr, "astype"):
-> 1745             casted = np.array(arr, dtype=dtype, copy=copy)
   1746         else:
   1747             casted = arr.astype(dtype, copy=copy)

ValueError: cannot convert float NaN to integer
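Both tracebacks above end in the same ValueError. The underlying reason can be sketched without pandas or NumPy: NaN is a float with no integer counterpart, so converting it to int fails in plain Python too, with what appears to be the same message. `nan_to_int_error` is a hypothetical helper name used only for this illustration.

```python
import math


def nan_to_int_error() -> str:
    """Try int(NaN) and return the error message, or "no error"."""
    try:
        int(math.nan)
    except ValueError as exc:
        return str(exc)
    return "no error"


message = nan_to_int_error()
print(message)
```

On CPython this prints `cannot convert float NaN to integer`.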

5 participants