ENH: Add allow_duplicates to MultiIndex.to_frame #45318

johnzangwill · 2022-01-11T19:19:20Z

tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Part of issue #45245 for multi-indexes with repeated level names.

Old behavior was to remove columns with repeated level names:

import pandas as pd
pd.MultiIndex.from_tuples([[1, 2]], names=["a", "a"]).to_frame()
     a
a a   
1 2  2

New behavior is to raise, unless allow_duplicates is True:

pd.MultiIndex.from_tuples([[1, 2]], names=["a", "a"]).to_frame(allow_duplicates=True)
     a  a
a a      
1 2  1  2
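The raise-unless-allowed rule above can be sketched in plain Python, without pandas. This is only an illustration under stated assumptions: `columns_from_levels` is a hypothetical helper, not the pandas implementation, and the error message is invented for the example.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Sequence


def columns_from_levels(
    names: Sequence[Hashable],
    level_values: Sequence[list],
    allow_duplicates: bool = False,
) -> list[tuple[Hashable, list]]:
    """Pair each level name with its values (hypothetical helper).

    Returns a list of (name, values) pairs rather than a dict, so that
    repeated names can coexist when allow_duplicates is True.
    """
    duplicated = [name for name, count in Counter(names).items() if count > 1]
    if duplicated and not allow_duplicates:
        # Message text is an assumption, not the wording pandas uses.
        raise ValueError(f"duplicate level names: {duplicated}")
    return list(zip(names, level_values))


# Stand-in for the levels of MultiIndex.from_tuples([[1, 2]], names=["a", "a"])
pairs = columns_from_levels(["a", "a"], [[1], [2]], allow_duplicates=True)
print(pairs)  # [('a', [1]), ('a', [2])]
```

Calling the same helper with the default `allow_duplicates=False` raises `ValueError` for these names, which mirrors the new default described above.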

Note on the other issues in #45245:

Missing column names. This has been left as it was: the name is filled with the level number, n. This is different from many other methods that fill with level_n, but obviously needs discussing.
bool values: might be fixed by ENH: Index[bool] #45061

doc/source/whatsnew/v1.4.0.rst

pandas/core/indexes/multi.py

jreback · 2022-01-16T16:42:22Z

pandas/core/indexes/multi.py

@@ -1774,14 +1784,20 @@ def to_frame(self, index: bool = True, name=lib.no_default) -> DataFrame:
        else:
            idx_names = self.names

+        idx_names = [


umm why are you repeating L1785?

It is doing a transform: filling in None names with the level number.

Whether that is the right thing to do is another issue. I am just preserving the existing behavior.

then this needs another argument similar to how this is done in .reset_index

I am not changing anything here. The old code is https://github.com/johnzangwill/pandas/blob/6cc5584bba59ef8f06d4dc901dc39ddd08d1519f/pandas/core/indexes/multi.py#L1780:

(level if lvlname is None else lvlname): self._get_level_values(level)

and I have just moved that logic earlier, since I need unique dictionary indexes.
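To see why unique dictionary keys matter here, the quoted comprehension can be tried in plain Python. The variables `names` and `level_values` are stand-ins for `self.names` and `self._get_level_values(level)`, assumed for illustration only.

```python
names = ["a", "a"]
level_values = [[1], [2]]  # stand-in for self._get_level_values(level)

# Same shape as the quoted comprehension: None names become the level number.
columns = {
    (level if lvlname is None else lvlname): level_values[level]
    for level, lvlname in enumerate(names)
}

# Both levels map to the key "a", so the second silently replaces the first.
print(columns)  # {'a': [2]}
```

This matches the old output shown in the description, where only the second level survived as column a.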

In any case, insert and reset_index do this differently, replacing None level labels with level_n. As I say, that is a separate issue and I have raised it elsewhere (#45245), but it is not the subject of this PR.

I don't think that this is conditional in reset_index or that there is an argument for it. Which argument are you referring to?

This is the code in reset_index:

if isinstance(self.index, MultiIndex):
    names = com.fill_missing_names(self.index.names)
    to_insert = zip(self.index.levels, self.index.codes)
else:
    default = "index" if "index" not in self else "level_0"
    names = [default] if self.index.name is None else [self.index.name]
    to_insert = ((self.index, None),)

that puts in "level_n" for multi-index and "index" or "level_0" for simple index.
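The two naming policies discussed in this thread can be put side by side in a small sketch. Both functions are hypothetical stand-ins written for this comparison; they are not the pandas helpers themselves.

```python
from __future__ import annotations

from typing import Hashable, Sequence


def fill_with_level_n(names: Sequence[Hashable]) -> list[Hashable]:
    """reset_index-style policy: a missing name becomes "level_n"."""
    return [f"level_{i}" if name is None else name for i, name in enumerate(names)]


def fill_with_number(names: Sequence[Hashable]) -> list[Hashable]:
    """to_frame-style policy: a missing name becomes the level number n."""
    return [i if name is None else name for i, name in enumerate(names)]


names = ["a", None]
print(fill_with_level_n(names))  # ['a', 'level_1']
print(fill_with_number(names))   # ['a', 1]
```

The only difference is the replacement value for a None entry, which is why the two policies cannot share one helper without an extra switch.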

ok can you just make a method on Index then to do this, repeating this code is not great

I have searched Pandas and I cannot find any other instance of this. The nearest is

pandas/pandas/core/indexes/base.py

Line 1631 in 4e034ec

name = self.name or 0

which does implement the policy (on self.name, not self.names). I can factor that down if you think that it is worth it.

yeah i think a common method on index is worth it here (to share here & reset_index)

Ok, but I have already explained that reset_index does not do this.

reset_index, to_records and many other methods all fill the None entries with "level_n", not with "n". As you know, I factored those out into a common method (com.fill_missing_names #44878) which is invoked in 6 different places.

These MI/Index.to_frame methods are the only ones which do it differently, filling the gaps with the column number. This difference could be discussed, and I have made an issue (#45245), but I don't suggest changing it without a lot of thought. Changing to_frame would break virtually all its tests.

pandas/core/indexes/multi.py

jreback · 2022-01-17T13:58:27Z

pandas/core/indexes/multi.py

@@ -1774,14 +1784,20 @@ def to_frame(self, index: bool = True, name=lib.no_default) -> DataFrame:
        else:
            idx_names = self.names

+        idx_names = [


yeah i think a common method on index is worth it here (to share here & reset_index)

jreback · 2022-01-17T13:58:33Z

doc/source/whatsnew/v1.4.0.rst

@@ -227,6 +227,7 @@ Other enhancements
 - Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`)
 - :meth:`DataFrame.to_sql` now returns an ``int`` of the number of written rows (:issue:`23998`)

+


revert this

pep8speaks · 2022-01-17T18:45:18Z

Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-20 11:35:58 UTC

johnzangwill · 2022-01-17T18:54:10Z

I have factored out a private Index._make_labels. It is just used in Index and MI to_frame (see my note above #45318 (comment))

A tiny change of behavior would be e.g. Index([0], name="") which would have been changed to 0 before. Now it is left as "" because I am looking for explicit None, rather than just Falsey.
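The difference between the falsy check and the explicit None check can be shown with two small functions. Their names are hypothetical and exist only for this comparison.

```python
def label_falsy(name):
    """Earlier style: any falsy name (None, "", 0) is replaced by 0."""
    return name or 0


def label_none_only(name):
    """Explicit check: only None is replaced by 0."""
    return 0 if name is None else name


for name in (None, "", "x"):
    print(repr(name), repr(label_falsy(name)), repr(label_none_only(name)))
# None 0 0
# '' 0 ''
# 'x' 'x' 'x'
```

Only the empty-string row differs, which is the small behavior change mentioned above.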

The tests all seem to pass.

jreback

small change, ping on green.

jreback · 2022-01-20T00:07:53Z

pandas/core/indexes/base.py

@@ -1353,6 +1353,18 @@ def _format_attrs(self) -> list[tuple[str_t, str_t | int | bool | None]]:
            attrs.append(("length", len(self)))
        return attrs

+    @final
+    def _make_labels(self) -> Hashable | Sequence[Hashable]:


can you change to _get_level_names to be consistent with naming.

johnzangwill · 2022-01-20T14:24:03Z

This is as green as CI allows...
@jreback Please see my note (#44755 (comment)). I think that you will find that my solution there is not as bad as you first thought.

jreback · 2022-01-22T00:14:39Z

thanks @johnzangwill

Add allow_duplicates to to_frame

3194d97

johnzangwill mentioned this pull request Jan 11, 2022

BUG: DataFrameGroupBy.value_counts() fails if as_index=False and there are duplicate column labels #45160

Merged

4 tasks

jreback requested changes Jan 16, 2022

jreback added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 16, 2022

johnzangwill added 2 commits January 16, 2022 19:14

Review changes

a3359b9

Merge branch 'main' into MultiIndex-to_frame

670ecf5

johnzangwill requested a review from jreback January 16, 2022 19:46

johnzangwill added 2 commits January 16, 2022 22:21

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

025c9d6

Merge branch 'main' into MultiIndex-to_frame

61bff46

jreback requested changes Jan 17, 2022

johnzangwill added 2 commits January 17, 2022 14:21

Update v1.4.0.rst

0444265

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

8052b13

johnzangwill requested a review from jreback January 17, 2022 15:31

factor out _make_labels

31110eb

Update base.py

4c0b994

johnzangwill added 3 commits January 18, 2022 14:10

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

066c34f

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

dbca195

Trigger CI

e69fc47

jreback requested changes Jan 20, 2022

johnzangwill added 4 commits January 20, 2022 08:27

_get_level_names

7f8fd32

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

e6f5894

Trigger CI

29ac6b6

Merge branch 'MultiIndex-to_frame' of https://github.com/johnzangwill…

ed26844

…/pandas into MultiIndex-to_frame

johnzangwill requested a review from jreback January 20, 2022 14:24

Merge branch 'pandas-dev:main' into MultiIndex-to_frame

3cfec79

jreback added this to the 1.5 milestone Jan 22, 2022

jreback approved these changes Jan 22, 2022

jreback merged commit 7dbfe9f into pandas-dev:main Jan 22, 2022

johnzangwill deleted the MultiIndex-to_frame branch January 22, 2022 10:30

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

ENH: Add allow_duplicates to MultiIndex.to_frame (pandas-dev#45318)

dd80df8
