-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Updated convert_dtype of Series.apply #39941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/series.py
Outdated
False, leave as dtype=object. For the dtypes Categorical, | ||
Sparse, Interval, Period, DatetimeArray and TimedeltaArray | ||
the original dtype is kept. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sukriti1 - how did you find this list of types? Was it from the code, or are they documented elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MarcoGorelli - I found this list of type from this comment :
#39580 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
OK, so instead of listing the dtypes here (which could expand in the future) maybe it's better to just say something like
note that conversion doesn't happen for extension array dtypes which have a
map
method (e.g.Categorical
)
cc @attack68
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli Changes made!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeing with @MarcoGorelli's second thought here, I do like the original wording better. Do we have tests for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can also add something to the effect that the list is not necessarily exhaustive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like, "For some dtypes, such as Categorical,
Sparse, Interval, Period, DatetimeArray, and TimedeltaArray,
the original dtype is kept."?
Thanks for updating Hey @rhshadrach - any thoughts on this? Re-reading this, I think my suggestion wasn't a very good one (besides, it's |
pandas/core/series.py
Outdated
@@ -4056,7 +4056,9 @@ def apply( | |||
Python function or NumPy ufunc to apply. | |||
convert_dtype : bool, default True | |||
Try to find better dtype for elementwise function results. If | |||
False, leave as dtype=object. | |||
False, leave as dtype=object. Note that conversion does not | |||
happen for extension array dtypes which have a map method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happen for extension array dtypes which have a map method | |
happen for extension array dtypes, such as Categorical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are different behaviors for different extension array dtypes - using convert_dtype=False with Series([1], dtype=pd.Int64Dtype())
and Series([1], dtype="category")
results in object and category dtypes respectively. Agreed on not mentioning the map method here, but it seems better to me to spell out which pandas dtypes will not come out as object when convert_dtype is False.
pandas/core/series.py
Outdated
False, leave as dtype=object. | ||
False, leave as dtype=object. Note that conversion does not | ||
happen for extension array dtypes which have a map method | ||
(e.g. Categorical). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping on green
Do you really expect users to know what an extension array is? Maybe it's my fault, but I have been using pandas on and off for years and never heard of extension arrays. If a user wants to know what dtypes are excluded from conversion they will land on this page next. Which quite frankly is gibberish if you are not educated in pandas internals. But even if the user gets past that, they now will have to research which extension arrays have a |
Hi @mkp-gebensleben - if you read through the discussion, you'll see that I went back on my initial suggestion to mention extension arrays and the map method |
pandas/core/series.py
Outdated
@@ -4067,7 +4067,8 @@ def apply( | |||
Python function or NumPy ufunc to apply. | |||
convert_dtype : bool, default True | |||
Try to find better dtype for elementwise function results. If | |||
False, leave as dtype=object. | |||
False, leave as dtype=object. Note that conversion does not | |||
happen for extension array dtypes, such as Categorical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "Note that conversion does not happen" seems ambiguous to me. Is it that the conversion to object doesn't happen, or the conversion to find a better dtype? Maybe something like "Note that the dtype is always preserved for extension array dtypes, such as Categorical."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sukriti1 - do you want to update as per @rhshadrach 's suggestion? Sorry for having led you down the wrong route initially
@sukriti1 - Are you interested in finishing this up? If not, would you mind if I pushed it over the finish line? |
@rhshadrach Sorry for the delay just pushed the changes! Please let me know if there's anymore feedback |
Thanks @sukriti1 |
#39941 (comment) - As commented there, I think the changes made in the PR are not quite correct. While it seems to me undesirable to list out the behavior for various extension arrays, I don't see another way to have correct documentation. An alternative would be to change the behavior of the method itself to match the current docstring. I don't know how palatable this would be. |
Hey @rhshadrach - is it necessary to list out behaviour for various extension arrays, or would
be enough to make it correct without being too long? |
I didn't consider that @MarcoGorelli and do think it's better than incorrect documentation (although it really doesn't tell the user what will happen). I am still wondering if it's not possible to make the current documentation (that is now in master) correct and will take a look at this in a few days. |
* DOC: Updated convert_dtype of Series.apply * DOC: Updated convert_dtype for pandas.Series.apply * DOC: Updated convert_dtype for pandas.Series.apply * Editing the doc * only mention categorical as example * updates Co-authored-by: Marco Gorelli <[email protected]>
Hey @rhshadrach - just checking if you've had further thoughts on this, and what your preference would be |
* DOC: Updated convert_dtype of Series.apply * DOC: Updated convert_dtype for pandas.Series.apply * DOC: Updated convert_dtype for pandas.Series.apply * Editing the doc * only mention categorical as example * updates Co-authored-by: Marco Gorelli <[email protected]>
This updates the docs for parameter convert_dtype in pandas.Series.apply and adds dtypes for which the original dtype is kept.