Skip to content

DOC: Updated convert_dtype of Series.apply #39941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 25, 2021

Conversation

sukriti1
Copy link
Contributor

@sukriti1 sukriti1 commented Feb 20, 2021

This updates the docs for parameter convert_dtype in pandas.Series.apply and adds dtypes for which the original dtype is kept.

@pep8speaks
Copy link

pep8speaks commented Feb 20, 2021

Hello @sukriti1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-16 09:22:45 UTC

@sukriti1 sukriti1 closed this Feb 20, 2021
@sukriti1 sukriti1 reopened this Feb 20, 2021
Comment on lines 4059 to 4061
False, leave as dtype=object. For the dtypes Categorical,
Sparse, Interval, Period, DatetimeArray and TimedeltaArray
the original dtype is kept.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sukriti1 - how did you find this list of types? Was it from the code, or are they documented elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @MarcoGorelli - I found this list of type from this comment :
#39580 (comment)

Copy link
Member

@MarcoGorelli MarcoGorelli Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks!

OK, so instead of listing the dtypes here (which could expand in the future) maybe it's better to just say something like

note that conversion doesn't happen for extension array dtypes which have a map method (e.g. Categorical)

cc @attack68

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli Changes made!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with @MarcoGorelli's second thought here, I do like the original wording better. Do we have tests for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can also add something to the effect that the list is not necessarily exhaustive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, "For some dtypes, such as Categorical,
Sparse, Interval, Period, DatetimeArray, and TimedeltaArray,
the original dtype is kept."?

@MarcoGorelli
Copy link
Member

Thanks for updating

Hey @rhshadrach - any thoughts on this? Re-reading this, I think my suggestion wasn't a very good one (besides, it's ._values that can have .map, not the dtype), I don't think internals need describing to users

@jreback jreback added the Docs label Mar 5, 2021
@@ -4056,7 +4056,9 @@ def apply(
Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If
False, leave as dtype=object.
False, leave as dtype=object. Note that conversion does not
happen for extension array dtypes which have a map method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
happen for extension array dtypes which have a map method
happen for extension array dtypes, such as Categorical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different behaviors for different extension array dtypes - using convert_dtype=False with Series([1], dtype=pd.Int64Dtype()) and Series([1], dtype="category") results in object and category dtypes respectively. Agreed on not mentioning the map method here, but it seems better to me to spell out which pandas dtypes will not come out as object when convert_dtype is False.

False, leave as dtype=object.
False, leave as dtype=object. Note that conversion does not
happen for extension array dtypes which have a map method
(e.g. Categorical).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this line

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping on green

@jreback jreback added the Apply Apply, Aggregate, Transform, Map label Mar 5, 2021
@mkp-gebensleben
Copy link

mkp-gebensleben commented Mar 5, 2021

Do you really expect users to know what an extension array is? Maybe it's my fault, but I have been using pandas on and off for years and never heard of extension arrays.

If a user wants to know what dtypes are excluded from conversion they will land on this page next. Which quite frankly is gibberish if you are not educated in pandas internals.

But even if the user gets past that, they now will have to research which extension arrays have a map method and which ones do not.

@MarcoGorelli
Copy link
Member

Hi @mkp-gebensleben - if you read through the discussion, you'll see that I went back on my initial suggestion to mention extension arrays and the map method

@MarcoGorelli MarcoGorelli self-requested a review March 10, 2021 09:23
@@ -4067,7 +4067,8 @@ def apply(
Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If
False, leave as dtype=object.
False, leave as dtype=object. Note that conversion does not
happen for extension array dtypes, such as Categorical.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "Note that conversion does not happen" seems ambiguous to me. Is it that the conversion to object doesn't happen, or the conversion to find a better dtype? Maybe something like "Note that the dtype is always preserved for extension array dtypes, such as Categorical."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sukriti1 - do you want to update as per @rhshadrach 's suggestion? Sorry for having led you down the wrong route initially

@rhshadrach
Copy link
Member

@sukriti1 - Are you interested in finishing this up? If not, would you mind if I pushed it over the finish line?

@sukriti1
Copy link
Contributor Author

@rhshadrach Sorry for the delay just pushed the changes! Please let me know if there's anymore feedback

@MarcoGorelli MarcoGorelli added this to the 1.3 milestone Apr 25, 2021
@MarcoGorelli
Copy link
Member

Thanks @sukriti1

@MarcoGorelli MarcoGorelli merged commit accf345 into pandas-dev:master Apr 25, 2021
@rhshadrach
Copy link
Member

@sukriti1 @MarcoGorelli

#39941 (comment) - As commented there, I think the changes made in the PR are not quite correct. While it seems to me undesirable to list out the behavior for various extension arrays, I don't see another way to have correct documentation.

An alternative would be to change the behavior of the method itself to match the current docstring. I don't know how palatable this would be.

@MarcoGorelli
Copy link
Member

Hey @rhshadrach - is it necessary to list out behaviour for various extension arrays, or would

Note that for some extension array dtypes, such as Categorical, the dtype is always preserved.

be enough to make it correct without being too long?

@rhshadrach
Copy link
Member

I didn't consider that @MarcoGorelli and do think it's better than incorrect documentation (although it really doesn't tell the user what will happen). I am still wondering if it's not possible to make the current documentation (that is now in master) correct and will take a look at this in a few days.

@rhshadrach rhshadrach mentioned this pull request May 4, 2021
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
* DOC: Updated convert_dtype of Series.apply

* DOC: Updated convert_dtype for pandas.Series.apply

* DOC: Updated convert_dtype for pandas.Series.apply

* Editing the doc

* only mention categorical as example

* updates

Co-authored-by: Marco Gorelli <[email protected]>
@MarcoGorelli
Copy link
Member

I didn't consider that @MarcoGorelli and do think it's better than incorrect documentation (although it really doesn't tell the user what will happen). I am still wondering if it's not possible to make the current documentation (that is now in master) correct and will take a look at this in a few days.

Hey @rhshadrach - just checking if you've had further thoughts on this, and what your preference would be

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
* DOC: Updated convert_dtype of Series.apply

* DOC: Updated convert_dtype for pandas.Series.apply

* DOC: Updated convert_dtype for pandas.Series.apply

* Editing the doc

* only mention categorical as example

* updates

Co-authored-by: Marco Gorelli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: apply on Series of dtype category produces dtype category when float was expected
7 participants