-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Series.str.get_dummies() raise on string type (follow up to PR #59577) #59786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Series.str.get_dummies() raise on string type (follow up to PR #59577) #59786
Conversation
…hucarroll/pandas into stringmethods-get-dummies
…hucarroll/pandas into stringmethods-get-dummies
@aaronchucarroll - I believe there is an issue related to this, is that right? |
I don't think there exists an issue for this. These are the requested follow-up changes to my last PR (#59577) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the followup!
@mroeschke - I've resolved your comments from #59577 where they have been taken care of. There are a few comments that could use your review.
…hucarroll/pandas into stringmethods-get-dummies
@rhshadrach Please see new changes |
pandas/core/strings/accessor.py
Outdated
@@ -2482,12 +2482,16 @@ def get_dummies( | |||
1 False False False | |||
2 True False True | |||
""" | |||
from pandas.core.dtypes.common import is_string_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.core.dtypes.common
is already being imported from at the top of this file. Can you move this import up there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aaronchucarroll. Would you have time to update based on the comments?
pandas/core/strings/accessor.py
Outdated
if is_string_dtype(dtype): | ||
raise ValueError("string dtype not supported, please use a numeric dtype") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead of explicitly disallowing specifically string dtype, should we allow any numeric dtype? (so raise an error if not is_numeric_dtype(dtype)
)
Because right now this would also allow the user to specify they want datetime dtype, for example, which also does not make much sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhshadrach thought here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we can restrict to just numeric here.
with pytest.raises( | ||
ValueError, match="string dtype not supported, please use a numeric dtype" | ||
): | ||
s.str.get_dummies("|", dtype="str[pyarrow]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test above is sufficient (so keep test_get_dummies_with_str_dtype
but can remove test_get_dummies_with_pa_str_dtype
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
@@ -434,7 +434,7 @@ def _str_get_dummies(self, sep: str = "|", dtype: NpDtype | None = None): | |||
dummies_dtype = _dtype | |||
else: | |||
dummies_dtype = np.bool_ | |||
dummies = np.empty((len(arr), len(tags2)), dtype=dummies_dtype) | |||
dummies = np.empty((len(arr), len(tags2)), dtype=dummies_dtype, order="F") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronchucarroll @rhshadrach do you remember why this change was included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see that is based on a comment from @mroeschke at #59577 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @aaronchucarroll and @jorisvandenbossche |
Follow-up on #59577 (comment)