-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
numpydev ragged array dtype warning #31203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
numpydev ragged array dtype warning #31203
Conversation
I think right now I'd prefer the user take care of that categorical issue by passing in an ndarray. I'm going to update the tests to reflect that, but am certainly open to suggestions for how we can handle it within pandas. Edit: 78514cf So users will see the warning if they just pass |
@TomAugspurger |
We don't want that. You can have a categorical holding native types like integers, timestamps, etc. We want the In [4]: type(pd.Categorical([1, 2]).categories)
Out[4]: pandas.core.indexes.numeric.Int64Index |
Seems fair. Especially as the warning is thrown in a pretty obscure use case, and changing the default would mess with all cases. |
this looks good @TomAugspurger
agreed that this should just pass thru the warning (or maybe we want to have a warning; to having non-scalar based categories is odd); yes you can do it, but you should have to be specific about this |
All green. Merging in ~1 hour to get CI passing again. I'll be able to do followups as needed. |
@@ -2058,7 +2058,7 @@ def drop(self, codes, level=None, errors="raise"): | |||
|
|||
if not isinstance(codes, (np.ndarray, Index)): | |||
try: | |||
codes = com.index_labels_to_array(codes) | |||
codes = com.index_labels_to_array(codes, dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, at least in our tests, this isn't changing behavior. I added an assert com.index_labels_to_array(codes, dtype=object).dtype == com.index_labels_to_array(codes)
temporarily, so we were always inferring object dtype here (in our tests).
@TomAugspurger This needs a backport? |
Yep. Not sure why the bot hasn't picked it up yet. |
@meeseeksdev backport to 1.0.x |
@@ -79,7 +79,7 @@ def cat_core(list_of_columns: List, sep: str): | |||
return np.sum(arr_of_cols, axis=0) | |||
list_with_sep = [sep] * (2 * len(list_of_columns) - 1) | |||
list_with_sep[::2] = list_of_columns | |||
arr_with_sep = np.asarray(list_with_sep) | |||
arr_with_sep = np.asarray(list_with_sep, dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
darn, i thought i had already fixed this one
array = data_missing._from_sequence([na, fill_value, na]) | ||
array = data_missing._from_sequence( | ||
[na, fill_value, na], dtype=data_missing.dtype | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sort of changes the interface. do we want authors to handle this on their own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it change the interface?
We are reducing coverage of _from_sequence
inferring the dtype from an untyped list. We could restore that if desired (and probably skip for problematic arrays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are reducing coverage of _from_sequence inferring the dtype from an untyped list.
Yah, i guess that is a better description than "changes the interface"
Sorry for the breakage. and thanks for the quick fix. I thought I had tested this before re-merging the NumPy PR. Please ping me if I can help with any further issues. |
Co-authored-by: Tom Augspurger <[email protected]>
Closes #31201
There are still two failures in the Categorical constructor I'm looking into. Not sure what's best yet.
I think that perhaps the user should see this warning, but have an option to pass through a
dtype
to silence it... Not sure yet.Tagged for backport to keep the CI passing on that branch.