BUG: Wrong dtype when resetting a multiindex with missing values. (#1… #27370

ldmnt · 2019-07-12T21:58:52Z

…9602 and #24206)

Fixed the bad path in reset_index where the dtype was ignored if there are only
missing values.
Simplified the structure of _maybe_casted_values in reset_index by
using the take method of ExtensionArrays as much as possible.
closes reset_index() on MultiIndexed empty dataframe does not preserve dtypes #19602
closes BUG: reset_index of MultiIndex with CategoricalIndex levels with missing values fails #24206
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry


pandas/tests/frame/test_alter_axes.py

ldmnt · 2019-07-12T22:25:10Z

Two possible improvements that I noticed while working on this:

The multiindex path of _maybe_casted_values is essentially a take(…, allow_fill=True) function that works on both ExtensionArrays and ndarrays. Since those are the types the _values attribute can have, it might be a good idea to extract it out of reset_index in case it is useful in other places. (I thought that it likely is, but I didn't really know.)
When refactoring _maybe_casted_values in reset_index, I tried removing the cast at the start of the function (frame.py line 4607):

            if not isinstance(index, (PeriodIndex, DatetimeIndex)):
                if values.dtype == np.object_:
                    values = lib.maybe_convert_objects(values)

It was introduced a very long time ago (#440) to try to infer the type of indexes with dtype object. Removing it breaks one test that relies on an index with integer underlying data and object dtype being correctly cast to an Int64Index. I don't know the code base well enough to tell whether inferring the type of object indexes is still the intended behavior, but if it's not, these lines could be removed.
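To illustrate the first point, here is a minimal sketch of a fill-aware take applied to both an ndarray and an ExtensionArray. It uses the public `pandas.api.extensions.take` helper and `Categorical.take` rather than the internal `_maybe_casted_values`, and the `codes` array and level data are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import take

# Hypothetical MultiIndex-style codes: -1 marks a missing entry.
codes = np.array([0, -1, 2])

# ndarray level: the -1 slot is filled with NaN, so the integer values
# are upcast to float64.
int_level = np.array([10, 20, 30])
int_taken = take(int_level, codes, allow_fill=True)

# ExtensionArray level: the same kind of call keeps the categorical dtype
# and marks the missing slot as NA.
cat_level = pd.Categorical(["a", "b", "c"])
cat_taken = cat_level.take(codes, allow_fill=True)

print(int_taken.dtype)  # float64
print(cat_taken.dtype)  # category
```

The ndarray case shows why a plain take with filling loses the integer dtype as soon as any code is -1, while the ExtensionArray case can keep its dtype because it has its own missing-value representation.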


pandas/tests/frame/test_alter_axes.py

pandas/core/frame.py

ldmnt · 2019-08-26T13:51:01Z

@jreback for info: I have made all the modifications you requested so far, and on my side it's ready. If you need further changes I can try to handle them before next Friday, but after that I will be away for 3 weeks and unable to update the pull request.

TomAugspurger

Can you ensure that we have a basic test for #19602? Something like

In [11]: idx = pd.MultiIndex.from_product([[0, 1], [1, 2]])

In [12]: pd.DataFrame(index=idx)[:0].reset_index().dtypes
Out[12]:
level_0    float64
level_1    float64
dtype: object

asserting that those are ints, not floats?
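A test along those lines might look like the sketch below. The function name is hypothetical and this is not the test that was added to the PR. Whether the assertion passes depends on the pandas version, since this is exactly the behavior #19602 reports as broken.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

idx = pd.MultiIndex.from_product([[0, 1], [1, 2]])
empty = pd.DataFrame(index=idx)[:0]

# dtypes of the index levels before resetting (integers here)
level_dtypes = [level.dtype for level in idx.levels]
# dtypes of the columns produced by reset_index on the empty frame
reset_dtypes = empty.reset_index().dtypes


def test_reset_index_empty_multiindex_keeps_int_dtype():
    # Hypothetical test name. The columns should keep the integer dtype
    # of the levels instead of falling back to float64.
    for name in ["level_0", "level_1"]:
        assert is_integer_dtype(reset_dtypes[name])
```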

doc/source/whatsnew/v0.25.1.rst

pandas/core/dtypes/cast.py

TomAugspurger · 2019-08-26T14:55:44Z

pandas/core/dtypes/cast.py

@@ -1391,3 +1391,42 @@ def maybe_cast_to_integer_array(arr, dtype, copy=False):

    if is_integer_dtype(dtype) and (is_float_dtype(arr) or is_object_dtype(arr)):
        raise ValueError("Trying to coerce float values to integers")
+
+
+def maybe_casted_values(index, codes=None):


I think this should be marked as private with a leading underscore.

ldmnt · 2019-08-26

Are you sure? It's not really private since it's used in frame.py, and @jreback already asked me to remove the underscore when moving it.

TomAugspurger · 2019-08-26

Maybe wait to hear from @jreback. .core.* is supposed to be private, but people still use methods from there. We plan to deprecate it, but until then it's probably best to prefix methods with underscores.

TomAugspurger · 2019-08-26T14:59:57Z

pandas/core/dtypes/cast.py

+            # we can have situations where the whole mask is -1,
+            # meaning there is nothing found in labels, so make all nan's
+            if mask.all():
+                values = np.empty(len(mask), dtype=values.dtype)


Is it possible to get here with an extension dtype? Like pd.date_range('2000', periods=4, tz="CET")? If so, I suspect that will fail.

ldmnt · 2019-08-28

I didn't have time to really get into it, but after a quick look I would say that it's possible. I'll have to do something about it when I come back.
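The concern can be reproduced outside of reset_index with a short sketch. It uses tz="UTC" instead of the reviewer's "CET" only to avoid depending on the local time zone database, and the variable names are made up. The take-based alternative is shown as one possible way around the problem, not as what the PR ended up doing.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000", periods=4, tz="UTC")
n = len(idx)

# NumPy cannot interpret a pandas extension dtype such as
# datetime64[ns, UTC], so allocating with it raises TypeError.
try:
    np.empty(n, dtype=idx.dtype)
    numpy_alloc_failed = False
except TypeError:
    numpy_alloc_failed = True

# A fill-aware take on the underlying ExtensionArray handles the
# "every code is -1" case and keeps the tz-aware dtype.
all_missing = np.full(n, -1)
filled = idx.array.take(all_missing, allow_fill=True)

print(numpy_alloc_failed)  # True
print(filled.dtype == idx.dtype)  # True
```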

jreback · 2019-10-06T22:42:11Z

@ldmnt can you merge master and we'll take a look.

jreback · 2019-10-18T21:33:13Z

can you merge master

ldmnt · 2019-11-01T17:49:07Z

Sorry I haven't found time to do that recently. I saw your message, I will try to do the merge and think about Tom's question this week.

WillAyd · 2019-12-17T17:42:06Z

Thanks for the PR but I think this has gone stale. Closing to clean up the queue for now, but certainly ping if you'd like to pick back up

jreback requested changes Jul 12, 2019


Extracted tests into new functions.

4d2b2e8

Louis Dumont added 3 commits July 13, 2019 10:18

Fixed linting and compatibility problems.

cbf7499

- Removed unused import - Enforced column order in tests for compat with python 3.5

Reformatted to pass black pandas.

07f8161

Added whatsnew entries.

865a6eb

jreback requested changes Jul 15, 2019


jreback added Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Dtype Conversions (Unexpected or buggy dtype conversions) labels Jul 15, 2019

Louis Dumont added 5 commits July 16, 2019 05:47

Refactored tests with parametrization.

94f3a28

Moved _maybe_convert_values to pandas.core.dtypes.cast

2af935a

Removed weird type check in _maybe_convert_values.

425acee

Merged master.

9e56281

Fixed import sorting.

7569ba8

TomAugspurger reviewed Aug 26, 2019


Louis Dumont added 3 commits August 26, 2019 19:41

Shortened summary for maybe_casted_values.

0cf50d0

Added basic test for GH19602.

e2288a1

Moved whatsnew entries to v1.0.0

f1d110a

WillAyd closed this Dec 17, 2019

jreback mentioned this pull request Jan 4, 2020: DataFrame.set_index() may not preserve dtype #30517 (Open)

trevorbye mentioned this pull request Jan 9, 2020: BUG: set_index() and reset_index() not preserving object dtypes #30870 (Closed, 5 tasks)

jreback mentioned this pull request May 23, 2020: BUG: Setting the index changes dtype #34304 (Closed, 3 tasks)

bergkvist mentioned this pull request May 24, 2020: BUG: reset_index() and set_index() expands uint32 to uint64 #34358 (Closed, 3 tasks)

arw2019 mentioned this pull request Oct 8, 2020: CLN: move maybe_casted_values from pandas/core/frame.py to pandas/core/dtype/cast.py #36985 (Merged, 5 tasks)
