-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Wrong dtype when resetting a multiindex with missing values. (#1… #27370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Two possible improvements that i noticed while working on this :
It was introduced a very long time ago (#440) to try and infer the type of indexes with dtype object. Removing it breaks one test that relies on the fact that an index with integer underlying data and object dtype will be correctly casted to an Int64Index. I didn't have enough knowledge of the code base to know if inferring the type of object indexes is still the intended behavior, but if it's not these lines could be removed. |
- Removed unused import - Enforced column order in tests for compat with python 3.5
@jreback for info: I have made all the modifications you requested up to now and on my side its ready. If you need further changes I can try to handle them before next friday, but after that I will be away for 3 weeks and unable to update the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you ensure that we have a basic test for #19602. Something like
In [11]: idx = pd.MultiIndex.from_product([[0, 1], [1, 2]])
In [12]: pd.DataFrame(index=idx)[:0].reset_index().dtypes
Out[12]:
level_0 float64
level_1 float64
dtype: object
asserting that those are ints, not floats?
@@ -1391,3 +1391,42 @@ def maybe_cast_to_integer_array(arr, dtype, copy=False): | |||
|
|||
if is_integer_dtype(dtype) and (is_float_dtype(arr) or is_object_dtype(arr)): | |||
raise ValueError("Trying to coerce float values to integers") | |||
|
|||
|
|||
def maybe_casted_values(index, codes=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mark this as private with a leading underscore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure ? It's not really private since it's used in frame.py, and @jreback already asked me to remove the underscore when moving it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe wait to hear from @jreback but .core.*
is supposed to be private, but people still use methods from there. We plan to deprecate it, but until then it's probably best to prefix methods with underscores.
# we can have situations where the whole mask is -1, | ||
# meaning there is nothing found in labels, so make all nan's | ||
if mask.all(): | ||
values = np.empty(len(mask), dtype=values.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to get here with an extension dtype? Like pd.date_range('2000', periods=4, tz="CET")
? If so, I suspect that will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have time to really get into it, but after a quick look I would say that it's possible. I'll have to do something about it when i come back.
@ldmnt can you merge master and we'll take a look. |
can you merge master |
Sorry I haven't found time to do that recently. I saw your message, I will try to do the merge and think about Tom's question this week. |
Thanks for the PR but I think this has gone stale. Closing to clean up the queue for now, but certainly ping if you'd like to pick back up |
…9602 and #24206)
Fixed the bad path in reset_index where the dtype was ignored if there are only
missing values.
Simplified the structure of _maybe_casted_values in reset_index by
using as much as possible the take method of ExtensionArray's
closes reset_index() on MultiIndexed empty dataframe does not preserve dtypes #19602
closes BUG: reset_index of MultiIndex with CategoricalIndex levels with missing values fails #24206
tests added / passed
passes
black pandas
passes
git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry