
BUG: Fix for convert_dtypes with mix of int and string #32126


Merged
merged 1 commit into from
Feb 21, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.2.rst
@@ -45,6 +45,7 @@ Bug fixes

 - Fix bug in :meth:`DataFrame.convert_dtypes` for columns that were already using the ``"string"`` dtype (:issue:`31731`).
 - Fixed bug in setting values using a slice indexer with string dtype (:issue:`31772`)
+- Fix bug in :meth:`Series.convert_dtypes` for series with mix of integers and strings (:issue:`32117`)

.. ---------------------------------------------------------------------------
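A quick way to exercise the GH32117 fix, assuming pandas 1.0.2 or later is installed: loop over all 16 combinations of the four flags used by the new test case and check that a Series mixing strings and an integer keeps the object dtype. Newer pandas releases accept extra arguments (such as `convert_floating`); this sketch leaves those at their defaults.

```python
import itertools

import numpy as np
import pandas as pd

# Series mixing strings and an integer, as in the GH32117 test case.
ser = pd.Series(["h", "i", 1])

flags = ("infer_objects", "convert_string", "convert_integer", "convert_boolean")

results = {}
for combo in itertools.product([True, False], repeat=len(flags)):
    kwargs = dict(zip(flags, combo))
    # Before the fix, with convert_integer=True the "mixed-integer" label was mapped to Int64.
    results[combo] = ser.convert_dtypes(**kwargs).dtype

# With the fix, every combination should keep the object dtype.
print(all(dtype == np.dtype("O") for dtype in results.values()))
```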

5 changes: 0 additions & 5 deletions pandas/core/dtypes/cast.py
@@ -1062,11 +1062,6 @@ def convert_dtypes(
         if convert_integer:
             target_int_dtype = "Int64"

-            if isinstance(inferred_dtype, str) and (
-                inferred_dtype == "mixed-integer"
-                or inferred_dtype == "mixed-integer-float"
-            ):
-                inferred_dtype = target_int_dtype
             if is_integer_dtype(input_array.dtype) and not is_extension_array_dtype(
                 input_array.dtype
             ):
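The deleted branch turned any column whose inferred type label was "mixed-integer" or "mixed-integer-float" into the Int64 target. Assuming `inferred_dtype` comes from pandas' type-inference function (this hunk does not show where it is computed), the public `pandas.api.types.infer_dtype` illustrates why that was too broad. The GH32117 input is labelled "mixed-integer", which only says that an integer appears alongside other values, not that the values can be represented as integers.

```python
from pandas.api.types import infer_dtype

# The two labels the removed branch mapped to the "Int64" target.
mixed_str_int = infer_dtype(["h", "i", 1])  # GH32117 input: strings plus an integer
mixed_int_float = infer_dtype([1, 2.0])  # an integer plus a float

print(mixed_str_int)  # mixed-integer
print(mixed_int_float)  # mixed-integer-float
```

With the branch gone, such a column is no longer forced to Int64. Per the updated tests, an object Series like [1, 2.0] still ends up as Int64, but only when the infer_objects step runs first.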
28 changes: 26 additions & 2 deletions pandas/tests/series/methods/test_convert_dtypes.py
@@ -81,6 +81,18 @@ class TestSeriesConvertDtypes:
                     ),
                 },
             ),
+            ( # GH32117
+                ["h", "i", 1],
+                np.dtype("O"),
+                {
+                    (
+                        (True, False),
+                        (True, False),
+                        (True, False),
+                        (True, False),
+                    ): np.dtype("O"),
+                },
+            ),
             (
                 [10, np.nan, 20],
                 np.dtype("float"),
@@ -144,11 +156,23 @@ class TestSeriesConvertDtypes:
                 [1, 2.0],
Member:

It's a bit hard to interpret the tests / diff below, but does pd.Series([1, 2.0], dtype=object).convert_dtypes() still give Int64?

Contributor Author:

Yes. Here's the interpretation of the tests for that case:
The code reads:

            (
                [1, 2.0],
                object,
                {
                    ((True,), (True, False), (True,), (True, False)): "Int64",
                    ((True,), (True, False), (False,), (True, False)): np.dtype(
                        "float"
                    ),
                    ((False,), (True, False), (True, False), (True, False)): np.dtype(
                        "object"
                    ),
                },
            ),

This means the following:

  1. Create a Series with entries [1, 2.0] and dtype object
  2. Consider the 16 possible combinations of the 4 arguments infer_objects, convert_string, convert_integer and convert_boolean; each key in the dict lists the allowed values of these four arguments, in that order
  3. If infer_objects==True and convert_integer==True, result should be Int64
  4. If infer_objects==True and convert_integer==False, result should be float
  5. If infer_objects==False, result is always object

Prior to this PR, the tests were as follows:
p3) If convert_integer==True, result should be Int64 regardless of the value of infer_objects
p4) If infer_objects==True and convert_integer==False, result should be float (same)
p5) If infer_objects==False and convert_integer==False, result is object
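The two rule lists can be compared one combination at a time. The sketch below uses two hypothetical functions that encode items 3 to 5 and p3 to p5 for the [1, 2.0] object Series. Neither convert_string nor convert_boolean affects the expected result, so those two flags are only enumerated. Only the four combinations with infer_objects=False and convert_integer=True change, from Int64 to object.

```python
import itertools


def expected_before(infer_objects, convert_integer):
    # p3 to p5: convert_integer=True gives Int64 whatever infer_objects is.
    if convert_integer:
        return "Int64"
    return "float64" if infer_objects else "object"


def expected_after(infer_objects, convert_integer):
    # Items 3 to 5: without the infer_objects step the result stays object.
    if not infer_objects:
        return "object"
    return "Int64" if convert_integer else "float64"


# Flag order: infer_objects, convert_string, convert_integer, convert_boolean.
changed = []
for infer, string, integer, boolean in itertools.product([True, False], repeat=4):
    before = expected_before(infer, integer)
    after = expected_after(infer, integer)
    if before != after:
        changed.append(((infer, string, integer, boolean), before, after))

for combo, before, after in changed:
    print(combo, before, "->", after)
```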

I think the new version is the behavior we want: if you start with object dtype and skip the infer_objects step, the result remains object.
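The answer can also be checked against an installed pandas. This sketch assumes pandas 1.0.2 or later. It checks only the default call and `infer_objects=False`, because later releases added a `convert_floating` argument (default True), under which the `convert_integer=False` case can yield the nullable `Float64` dtype instead of numpy `float64`.

```python
import pandas as pd

ser = pd.Series([1, 2.0], dtype=object)

# Default flags (infer_objects=True, convert_integer=True): integer-valued data -> Int64.
default_dtype = ser.convert_dtypes().dtype
# Skipping the infer_objects step: the data stays object (the behavior this PR adopts).
no_infer_dtype = ser.convert_dtypes(infer_objects=False).dtype

print(default_dtype, no_infer_dtype)  # Int64 object
```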

Member:

OK, thanks

                 object,
                 {
-                    ((True, False), (True, False), (True,), (True, False)): "Int64",
+                    ((True,), (True, False), (True,), (True, False)): "Int64",
                     ((True,), (True, False), (False,), (True, False)): np.dtype(
                         "float"
                     ),
-                    ((False,), (True, False), (False,), (True, False)): np.dtype(
+                    ((False,), (True, False), (True, False), (True, False)): np.dtype(
                         "object"
                     ),
                 },
+            ),
+            (
+                [1, 2.5],
+                object,
+                {
+                    ((True,), (True, False), (True, False), (True, False)): np.dtype(
+                        "float"
+                    ),
+                    ((False,), (True, False), (True, False), (True, False)): np.dtype(
+                        "object"
+                    ),
+                },