BUG: Fix for convert_dtypes with mix of int and string #32126

Merged on Feb 21, 2020 (1 commit)

Conversation

Dr-Irv
Contributor

@Dr-Irv Dr-Irv commented Feb 20, 2020

  • closes #32117 (convert_dtypes fails with int and str)
  • tests added / passed
    • added new cases for test_convert_dtypes
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
    • placed in v1.0.2 whatsnew

@jorisvandenbossche changed the title from "Fix for convert_dtypes with mix of int and string" to "BUG: Fix for convert_dtypes with mix of int and string" Feb 20, 2020
@jorisvandenbossche added this to the 1.0.2 milestone Feb 20, 2020
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks! Small question

@@ -144,11 +156,23 @@ class TestSeriesConvertDtypes:
[1, 2.0],
Member

The tests / diff below are a bit hard to interpret, but does pd.Series([1, 2.0], dtype=object).convert_dtypes() still give Int64?

Contributor Author

Yes. Here's how to interpret the tests for that case. The code reads:

            (
                [1, 2.0],
                object,
                {
                    ((True,), (True, False), (True,), (True, False)): "Int64",
                    ((True,), (True, False), (False,), (True, False)): np.dtype(
                        "float"
                    ),
                    ((False,), (True, False), (True, False), (True, False)): np.dtype(
                        "object"
                    ),
                },
            ),

This means the following:

  1. Create a Series with [1, 2.0] as the entries, with dtype object
  2. Consider the 16 possible combinations of the 4 arguments infer_objects, convert_string, convert_integer and convert_boolean
  3. If infer_objects==True and convert_integer==True, result should be Int64
  4. If infer_objects==True and convert_integer==False, result should be float
  5. If infer_objects==False, result is always object

Prior to this PR, the tests were as follows (p3 to p5 correspond to steps 3 to 5 above):
p3) If convert_integer==True, result should be Int64, independent of the value of infer_objects
p4) If infer_objects==True and convert_integer==False, result should be float (unchanged)
p5) If infer_objects==False and convert_integer==False, result is object

I think the new version is the behavior we want, i.e., if you start with object dtype and skip the infer_objects step, the result remains object.
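
For readers following along, here is a rough sketch of how a spec dict like the one quoted above can be expanded into the 16 concrete argument combinations, and which of them change relative to the old expectations. It is not the actual test code: `expand`, `PARAMS`, `NEW_SPEC` and `OLD_SPEC` are hypothetical names, dtypes are written as plain strings instead of `np.dtype(...)`, and `OLD_SPEC` is reconstructed from the prose description of the old tests rather than copied from the old test file.

```python
from itertools import product

# Argument order used by the keys in the test spec (taken from the comment above).
PARAMS = ("infer_objects", "convert_string", "convert_integer", "convert_boolean")

# Expected dtypes for a Series built from [1, 2.0] with dtype object.
# Copied from the quoted test case, with dtype names simplified to strings.
NEW_SPEC = {
    ((True,), (True, False), (True,), (True, False)): "Int64",
    ((True,), (True, False), (False,), (True, False)): "float",
    ((False,), (True, False), (True, False), (True, False)): "object",
}

# Assumption: reconstructed from the description of the pre-PR tests,
# not taken from the old test file.
OLD_SPEC = {
    ((True, False), (True, False), (True,), (True, False)): "Int64",
    ((True,), (True, False), (False,), (True, False)): "float",
    ((False,), (True, False), (False,), (True, False)): "object",
}


def expand(spec):
    """Map every concrete argument combination to its expected dtype."""
    expected = {}
    for allowed_values, dtype in spec.items():
        # Each key lists the allowed values per argument; take their product.
        for combo in product(*allowed_values):
            # A combination should be covered by exactly one key.
            assert combo not in expected, f"overlapping keys for {combo}"
            expected[combo] = dtype
    return expected


new_expected = expand(NEW_SPEC)
old_expected = expand(OLD_SPEC)

# Combinations whose expected dtype differs between the old and new spec.
changed = sorted(c for c in new_expected if new_expected[c] != old_expected[c])
for combo in changed:
    kwargs = dict(zip(PARAMS, combo))
    print(kwargs, old_expected[combo], "->", new_expected[combo])
```

Under these assumptions, the only combinations that change are the ones with infer_objects==False and convert_integer==True, which move from Int64 to object.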

Member

OK, thanks

@jorisvandenbossche
Member

@Dr-Irv Thanks!

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 21, 2020
simonjayhawkins pushed a commit that referenced this pull request Feb 21, 2020
roberthdevries pushed a commit to roberthdevries/pandas that referenced this pull request Mar 2, 2020
@Dr-Irv Dr-Irv deleted the issue32117 branch February 13, 2023 20:50

Successfully merging this pull request may close these issues.

convert_dtypes fails with int and str
2 participants