MultiIndex Support for DataFrame.pivot #25330

thoo · 2019-02-15T04:47:24Z

closes #21425 (BUG: DataFrame.pivot fails on multiple columns to set as index)
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2019-02-15T05:29:42Z

Codecov Report

Merging #25330 into master will decrease coverage by 50%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           master   #25330       +/-   ##
===========================================
- Coverage   91.72%   41.71%   -50.01%     
===========================================
  Files         173      173               
  Lines       52831    52841       +10     
===========================================
- Hits        48457    22045    -26412     
- Misses       4374    30796    +26422

Flag	Coverage Δ
#multiple	`?`
#single	`41.71% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/pivot.py	`8.51% <0%> (-88.05%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.15%)`	⬇️
... and 130 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53281a5...b3dad89. Read the comment docs.

codecov · 2019-02-15T05:29:43Z

Codecov Report

Merging #25330 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25330      +/-   ##
==========================================
+ Coverage   91.27%   91.27%   +<.01%     
==========================================
  Files         173      173              
  Lines       53002    53011       +9     
==========================================
+ Hits        48375    48384       +9     
  Misses       4627     4627

Flag	Coverage Δ
#multiple	`89.83% <100%> (ø)`	⬆️
#single	`41.76% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/pivot.py	`96.64% <100%> (+0.09%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85c3f82...a4e8d38. Read the comment docs.

doc/source/whatsnew/v0.25.0.rst

gfyoung · 2019-02-15T09:09:04Z

pandas/core/reshape/pivot.py

        else:
            index = data[index]
-        index = MultiIndex.from_arrays([index, data[columns]])


Can't you construct "the first draft" of index and then call index = MultiIndex.from_arrays([index, data[columns]]) outside of the if-else block?
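A minimal sketch of the refactor suggested above, assuming only the two branches visible in the hunk (index is None, or index is a single column label). The helper name `build_pivot_index` and the sample frame are hypothetical and are not part of the PR.

```python
import pandas as pd


def build_pivot_index(data, index, columns):
    """Hypothetical helper: pick a 'first draft' index, then build the MultiIndex once."""
    if index is None:
        # No index column requested: fall back to the frame's existing index.
        first = data.index
    else:
        # A single column label was given: use that column's values.
        first = data[index]
    # The MultiIndex construction is shared by both branches.
    return pd.MultiIndex.from_arrays([first, data[columns]])


df = pd.DataFrame({"k": ["a", "a", "b"], "c": ["x", "y", "x"], "v": [1, 2, 3]})
mi_default = build_pivot_index(df, None, "c")
mi_named = build_pivot_index(df, "k", "c")
```

Both calls go through the same `from_arrays` line, so the if-else block only decides which first level to use.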

pandas/tests/reshape/test_pivot.py

doc/source/whatsnew/v0.25.0.rst

jreback · 2019-02-16T16:54:54Z

pandas/core/reshape/pivot.py

@@ -368,15 +368,29 @@ def _convert_by(by):
 @Appender(_shared_docs['pivot'], indents=1)
 def pivot(data, index=None, columns=None, values=None):
    if values is None:
-        cols = [columns] if index is None else [index, columns]
+        if index is None:


can you add comments here delineating the cases

jreback · 2019-02-16T16:55:13Z

pandas/core/reshape/pivot.py

        append = index is None
        indexed = data.set_index(cols, append=append)
+
    else:
        if index is None:


jreback · 2019-02-16T16:55:20Z

pandas/core/reshape/pivot.py

    else:
        if index is None:
            index = data.index
+            index = MultiIndex.from_arrays([index, data[columns]])


do you need to pass names?

jreback · 2019-02-16T16:55:28Z

pandas/core/reshape/pivot.py

+            # Iterating through the list of multiple columns of an index
+            indexes = [data[column] for column in index]
+            indexes.append(data[columns])
+            index = MultiIndex.from_arrays(indexes)


I don't think so but let me know if I should.
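On the question of passing names: to my understanding, `MultiIndex.from_arrays` falls back to each input's `name` attribute when `names` is omitted, so Series taken from the frame carry their column labels along. The sketch below uses a made-up frame to compare the inferred and explicit forms.

```python
import pandas as pd

df = pd.DataFrame({"lev1": [1, 1, 2], "lev3": [1, 2, 1], "values": [0, 1, 2]})

# Series inputs: level names are taken from the Series names.
inferred = pd.MultiIndex.from_arrays([df["lev1"], df["lev3"]])

# Plain NumPy arrays have no name, so the levels stay unnamed unless names= is given.
unnamed = pd.MultiIndex.from_arrays([df["lev1"].values, df["lev3"].values])
explicit = pd.MultiIndex.from_arrays(
    [df["lev1"].values, df["lev3"].values], names=["lev1", "lev3"]
)
```

Passing `names` explicitly would only matter if the inputs were unnamed arrays or if different level names were wanted.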

jreback · 2019-02-16T16:56:25Z

pandas/tests/reshape/test_pivot.py

+                          values='values')
+        result_no_values = df.pivot(index=['lev1', 'lev2'],
+                                    columns='lev3')
+        data = [[0, 1], [2, 3], [4, 5], [6, 7]]


can you do the test cases one after the other and use the standard

result=
expected=
assert_frame_equal(result, expected)
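For illustration, the result/expected pattern requested above might look like the following. This is a sketch with a made-up frame and a single-column index, using the public `pandas.testing.assert_frame_equal`; the test name is hypothetical and not taken from the PR.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_pivot_single_index_sketch():
    df = pd.DataFrame({"lev1": [1, 1, 2, 2],
                       "lev3": [1, 2, 1, 2],
                       "values": [0, 1, 2, 3]})

    result = df.pivot(index="lev1", columns="lev3", values="values")
    expected = pd.DataFrame([[0, 1], [2, 3]],
                            index=pd.Index([1, 2], name="lev1"),
                            columns=pd.Index([1, 2], name="lev3"))
    assert_frame_equal(result, expected)
    return result, expected


result, expected = test_pivot_single_index_sketch()
```

Keeping each case as its own result/expected pair makes a failing comparison point at exactly one call.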

pep8speaks · 2019-02-16T19:26:04Z

Hello @thoo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-03 06:10:42 UTC

jreback · 2019-02-16T19:27:47Z

pandas/core/reshape/pivot.py

-        cols = [columns] if index is None else [index, columns]
+        if index is None:
+            cols = [columns]
+        else:


can you make this an if/elif/else

jreback · 2019-02-16T19:28:05Z

pandas/core/reshape/pivot.py

        else:
+            # Build multi-indexes if index is not None and not a list.


can you make this an if/elif/else

I am not sure about the last if/else statement. I can pull if/elif/else for the first three but I still need to have another condition for the last two.

jreback · 2019-02-16T19:28:27Z

pandas/core/reshape/pivot.py

        else:
+            # Build multi-indexes if index is not None and not a list.


these comments are not very useful; you are just repeating the code. can you make them more informative?

jreback · 2019-02-16T19:28:42Z

pandas/core/reshape/pivot.py

+            cols = [columns]
+        else:
+            if is_list_like(index):
+                # If a given index is a list, set cols to index.


similar to below can you make more informative comments

Any suggestions? I am still new to this. Thanks.

pandas/tests/reshape/test_pivot.py

* upstream/master:
  BUG: Fix passing of numeric_only argument for categorical reduce (pandas-dev#25304)
  ENH: Support times with timezones in at_time (pandas-dev#25280)
  COMPAT: alias .to_numpy() for timestamp and timedelta scalars (pandas-dev#25142)
  DOC/CLN: Fix various docstring errors (pandas-dev#25295)
  Bug: OverflowError in resample.agg with tz data (pandas-dev#25297)
  Fixes Formatting Exception (pandas-dev#25088)
  BUG: groupby.transform retains timezone information (pandas-dev#25264)
  Doc: corrects spelling in generic.py (pandas-dev#25333)
  Fix typos in docs (pandas-dev#25305)

* upstream/master: TST: use a fixed seed to have the same uniques across python versions (pandas-dev#25346)

* upstream/master: TST: xfail excel styler tests, xref GH25351 (pandas-dev#25352)

doc/source/whatsnew/v0.25.0.rst

jreback · 2019-02-19T14:07:13Z

pandas/core/reshape/pivot.py

+        # Make acceptable for multiple column indexes.
+        cols = []
+        if is_list_like(index):
+            cols.extend(index)


the if/elif/else was fine here; don't create then extend, just create in each block
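A sketch of what "just create in each block" could look like for the `values is None` path, assuming the three cases discussed in this thread (no index, list-like index, single label). The function name `pivot_set_index_cols` is hypothetical; `is_list_like` is the public helper from `pandas.api.types`.

```python
from pandas.api.types import is_list_like


def pivot_set_index_cols(index, columns):
    """Hypothetical helper: list of labels to hand to DataFrame.set_index."""
    if index is None:
        # Only the pivot column is added; the existing index is kept via append=True.
        cols = [columns]
    elif is_list_like(index):
        # Several index columns: keep their order and put the pivot column last.
        cols = list(index) + [columns]
    else:
        # A single index label (strings are not treated as list-like).
        cols = [index, columns]
    return cols
```

Each branch builds its own list, so no branch depends on a partially filled `cols` from earlier code.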

pandas/tests/reshape/test_pivot.py

jreback · 2019-03-10T23:15:37Z

can you merge master and update to comments

* upstream/master: (110 commits)
  DOC: hardcode contributors for 0.24.x releases (pandas-dev#25662)
  DOC: restore toctree maxdepth (pandas-dev#25134)
  BUG: Redefine IndexOpsMixin.size, fix pandas-dev#25580. (pandas-dev#25584)
  BUG: to_csv line endings with compression (pandas-dev#25625)
  DOC: file obj for to_csv must be newline='' (pandas-dev#25624)
  Suppress incorrect warning in nargsort for timezone-aware DatetimeIndex (pandas-dev#25629)
  TST: fix incorrect sparse test (now failing on scipy master) (pandas-dev#25653)
  CLN: Removed debugging code (pandas-dev#25647)
  DOC: require Return section only if return is not None nor commentary (pandas-dev#25008)
  DOC:Remove hard-coded examples from _flex_doc_SERIES (pandas-dev#24589) (pandas-dev#25524)
  TST: xref pandas-dev#25630 (pandas-dev#25643)
  BUG: Fix pandas-dev#25481 by fixing the error message in TypeError (pandas-dev#25540)
  Fixturize tests/frame/test_mutate_columns.py (pandas-dev#25642)
  Fixturize tests/frame/test_join.py (pandas-dev#25639)
  Fixturize tests/frame/test_combine_concat.py (pandas-dev#25634)
  Fixturize tests/frame/test_asof.py (pandas-dev#25628)
  BUG: Fix user-facing AssertionError with to_html (pandas-dev#25608) (pandas-dev#25620)
  DOC: resolve all GL03 docstring validation errors (pandas-dev#25525)
  TST: failing wheel building on PY2 and old numpy (pandas-dev#25631)
  DOC: Remove makePanel from docs (pandas-dev#25609) (pandas-dev#25612)
  ...

jreback · 2019-03-20T01:51:58Z

can you merge master

* upstream/master: (55 commits)
  PERF: Improve performance of StataReader (pandas-dev#25780)
  Speed up tokenizing of a row in csv and xstrtod parsing (pandas-dev#25784)
  BUG: Fix _binop for operators for serials which has more than one returns (divmod/rdivmod). (pandas-dev#25588)
  BUG-24971 copying blocks also considers ndim (pandas-dev#25521)
  CLN: Panel reference from documentation (pandas-dev#25649)
  ENH: Quoting column names containing spaces with backticks to use them in query and eval. (pandas-dev#24955)
  BUG: reading windows utf8 filenames in py3.6 (pandas-dev#25769)
  DOC: clean bug fix section in whatsnew (pandas-dev#25792)
  DOC: Fixed PeriodArray api ref (pandas-dev#25526)
  Move locale code out of tm, into _config (pandas-dev#25757)
  Unpin pycodestyle (pandas-dev#25789)
  Add test for rdivmod on EA array (GH23287) (pandas-dev#24047)
  ENH: Support datetime.timezone objects (pandas-dev#25065)
  Cython language level 3 (pandas-dev#24538)
  API: concat on sparse values (pandas-dev#25719)
  TST: assert_produces_warning works with filterwarnings (pandas-dev#25721)
  make core.config self-contained (pandas-dev#25613)
  CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721)
  TST: Check pytables<3.5.1 when skipping (pandas-dev#25773)
  DOC: Fix typo in docstring of DataFrame.memory_usage (pandas-dev#25770)
  ...

WillAyd · 2019-05-03T05:42:29Z

Closing this one as stale. Let us know if you'd like to pick it back up @thoo !

WillAyd · 2019-05-03T06:10:39Z

Sorry, reopening this as I think it went stale on the review side. Will try to take a look and give feedback in the next few days.

WillAyd · 2019-05-05T00:28:25Z

pandas/tests/reshape/test_pivot.py

+        else:
+            result = df.pivot(index=['lev1', 'lev2'],
+                              columns='lev3')
+            exp_columns = MultiIndex(levels=[['values'], [1, 2]],


Can you use one of the more idiomatic MultiIndex.from_ constructors here instead? Would help readability
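As a sketch of the suggestion above, the levels/codes form from the test can be compared with two of the `MultiIndex.from_*` constructors. The codes and names are the ones that appear in the test hunks quoted in this thread; which constructor the PR would have adopted is not stated.

```python
import pandas as pd

# Explicit levels/codes form, as in the test under review.
explicit = pd.MultiIndex(levels=[["values"], [1, 2]],
                         codes=[[0, 0], [0, 1]],
                         names=[None, "lev3"])

# Cartesian product of the level values.
from_product = pd.MultiIndex.from_product([["values"], [1, 2]],
                                          names=[None, "lev3"])

# One tuple per column label.
from_tuples = pd.MultiIndex.from_tuples([("values", 1), ("values", 2)],
                                        names=[None, "lev3"])
```

The `from_product` form states the intent (every value under the single `'values'` key) without requiring the reader to decode integer codes.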

WillAyd · 2019-05-05T00:33:46Z

pandas/core/reshape/pivot.py

    else:
        if index is None:
-            index = data.index
+            index = MultiIndex.from_arrays([data.index, data[columns]])
+        elif is_list_like(index):


I think it would help readability if the conditions here were refactored to be in the same sequence as the conditions in the branch above, as IIUC we essentially evaluate the same conditions in both branches

WillAyd · 2019-05-05T00:35:30Z

pandas/core/reshape/pivot.py

+        elif is_list_like(index):
+            # Iterating through the list of multiple columns of an index.
+            indexes = [data[column] for column in index]
+            indexes.append(data[columns])


What does this do?
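To illustrate what the list-building lines in the hunk above appear to do: they collect one Series per requested index column, append the pivot column, and hand the whole list to `MultiIndex.from_arrays`. The sketch below reproduces that with a made-up frame and then reshapes with `Series.unstack`; the reshaping step is an assumption about how the result is used, not code quoted from the PR.

```python
import pandas as pd

df = pd.DataFrame({"lev1": [1, 1, 2, 2],
                   "lev2": [1, 2, 1, 2],
                   "lev3": [1, 2, 1, 2],
                   "values": [0, 1, 2, 3]})
index, columns = ["lev1", "lev2"], "lev3"

# One Series per index column, then the pivot column as the last level.
indexes = [df[column] for column in index]
indexes.append(df[columns])
mi = pd.MultiIndex.from_arrays(indexes)

# Assumed follow-up: attach the values and move the last level to the columns.
wide = pd.Series(df["values"].values, index=mi).unstack(columns)
```

The intermediate MultiIndex has three levels (`lev1`, `lev2`, `lev3`); after unstacking, `lev3` becomes the column axis and the row index keeps the two requested levels.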

mroeschke · 2019-05-05T01:04:52Z

doc/source/user_guide/reshaping.rst

@@ -90,6 +90,19 @@ You can then select subsets from the pivoted ``DataFrame``:
 Note that this returns a view on the underlying data in the case where the data
 are homogeneously-typed.

+Now :meth:`DataFrame.pivot` method also supports multiple columns as indexes.


Add a versionadded tag here.

WillAyd · 2019-05-05T01:50:30Z

pandas/core/reshape/pivot.py

@@ -368,18 +368,34 @@ def _convert_by(by):
 @Appender(_shared_docs['pivot'], indents=1)


Also think you will need to update the docstring

jreback · 2019-05-05T22:00:44Z

pandas/core/reshape/pivot.py

+            cols = [index]
+        else:
+            cols = []
+        cols.append(columns)


what happens when columns=None?
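A small sketch of the concern raised above, assuming the hunk's logic is taken at face value: with `columns=None`, the `append` call puts `None` into the label list unchecked. The two helper names and the guard's error message are hypothetical; the PR does not show how this case was meant to be handled.

```python
def build_cols_unchecked(index, columns):
    """Mirrors the hunk for a scalar or missing index: no validation of `columns`."""
    cols = [index] if index is not None else []
    cols.append(columns)
    return cols


def build_cols_checked(index, columns):
    """Hypothetical guard: reject a missing `columns` before building the list."""
    if columns is None:
        raise TypeError("pivot() needs a 'columns' argument")
    return build_cols_unchecked(index, columns)


# Without the guard, None ends up in the list of labels.
leaky = build_cols_unchecked("lev1", None)
```

An explicit check of this kind would surface the problem at the call site rather than later, when the labels are looked up in the frame.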

jreback · 2019-05-05T22:01:39Z

pandas/tests/reshape/test_pivot.py

+                                     codes=[[0, 0], [0, 1]],
+                                     names=[None, 'lev3'])
+
+        expected = DataFrame(data=data, index=exp_index,


is there a test for columns=None and list-like index?

jreback · 2019-05-12T21:09:35Z

can you merge master and update to comments

jreback · 2019-06-08T20:36:58Z

can you merge master

jreback · 2019-07-11T16:10:00Z

closing as stale, but pretty close if you'd like to finish up.

thoo added 4 commits February 14, 2019 23:25

DataFrame.pivot to support multiple columns as index

a92c29c

fix flake8

08650a2

update whatsnew

c6dddf2

Fix future_warning

b3dad89

gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Enhancement and removed Bug labels Feb 15, 2019

gfyoung reviewed Feb 15, 2019

doc/source/whatsnew/v0.25.0.rst (outdated)

gfyoung reviewed Feb 15, 2019

pandas/tests/reshape/test_pivot.py

jreback requested changes Feb 16, 2019

Fix based on suggestions

def0e2d

thoo added 2 commits February 16, 2019 14:28

fix flake8

6215146

Add more comments

565c41c

jreback requested changes Feb 16, 2019

thoo added 9 commits February 16, 2019 14:37

refactor pytest

d795e86

change comments

0a6e4be

fix if/else

ad4a761

Merge remote-tracking branch 'upstream/master' into pivot

064c085

* upstream/master: TST: use a fixed seed to have the same uniques across python versions (pandas-dev#25346)

fix failing tests

2e5dd67

fix failing tests

caed2cd

Retrigger :pandas-devs failed

406b690

Merge remote-tracking branch 'upstream/master' into pivot

b1db902

* upstream/master: TST: xfail excel styler tests, xref GH25351 (pandas-dev#25352)

jreback requested changes Feb 19, 2019

thoo added 4 commits March 11, 2019 13:38

Add example in reshaping.rst

2108301

if-elif

1d0ce5d

fix failing tests

0c25bba

WillAyd closed this May 3, 2019

WillAyd reopened this May 3, 2019

WillAyd requested changes May 5, 2019

mroeschke reviewed May 5, 2019

WillAyd requested changes May 5, 2019

jreback requested changes May 5, 2019

jreback closed this Jul 11, 2019
