-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Adding Multiindex support to dataframe pivot function(Fixes #21425) #21467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21467 +/- ##
==========================================
+ Coverage 91.9% 91.9% +<.01%
==========================================
Files 154 154
Lines 49657 49667 +10
==========================================
+ Hits 45638 45648 +10
Misses 4019 4019
Continue to review full report at Codecov.
|
@NikhilKumarM : Off to a good start here! Going to need a |
Thanks @gfyoung . I have added test cases and whatsnew entry |
@NikhilKumarM : You actually need to add the |
@NikhilKumarM Did you push your new changes? Because they do not show up on github here. Normally you only need to add new commits to the same branch, push that to github, and the PR here should get updated. |
@jorisvandenbossche : If you look at the PR description, you can see that @NikhilKumarM accidentally added them there. If this PR goes unresponsive, I can bring it to the finish line. |
Hello @NikhilKumarM! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 01, 2018 at 13:52 Hours UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you also add the ability to pass multiple values for
columns
? - Can you also update the docstring of
DataFrame.pivot
?
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -66,7 +66,7 @@ Bug Fixes | |||
|
|||
**Reshaping** | |||
|
|||
- | |||
- Bug in: DataFrame.pivot() function where error was raised when multiple columns are set as index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to the 0.24.0.txt
file? And you can mention it in the Enhancements section
(as this is an new feature, not really a bug fix, I think, as it was just not yet implemented)
pandas/core/reshape/reshape.py
Outdated
@@ -392,12 +392,22 @@ def pivot(self, index=None, columns=None, values=None): | |||
cols = [columns] if index is None else [index, columns] | |||
append = index is None | |||
indexed = self.set_index(cols, append=append) | |||
# adding the support for multi-index in pivot function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not really needed I think
pandas/tests/reshape/test_pivot.py
Outdated
@@ -283,6 +283,23 @@ def test_pivot_multi_functions(self): | |||
expected = concat([means, stds], keys=['mean', 'std'], axis=1) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
# adding the test case for multiple columns as index (#21425) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move this comment to the first line of the test function
pandas/tests/reshape/test_pivot.py
Outdated
expected = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], | ||
exp_index, | ||
exp_columns) | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an example where the values
argument is not mentioned? eg df.pivot(index=['lev1', 'lev2'], columns='lev3')
(I think this will not yet work, but should work)
|
@NikhilKumarM there is a section in the contributing guide focused on docstrings: https://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html |
pandas/core/reshape/reshape.py
Outdated
index = MultiIndex.from_arrays([index, self[columns]]) | ||
# added this case to handle multi-index | ||
elif isinstance(index, list): | ||
indexes = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a list-comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
pandas/core/reshape/reshape.py
Outdated
else: | ||
if index is None: | ||
index = self.index | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
# added this case to handle multi-index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a comment that reflects what is here (not added)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -16,6 +16,8 @@ Other Enhancements | |||
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`) | |||
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`) | |||
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`) | |||
- :func:'Dataframe.pivot' now supports multiple columns as index. (:issue:'21425') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'columns as an index'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will make those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
Why is continuous-integration/travis-ci failing? |
@NikhilKumarM : Restarted Travis for you. Failure unrelated to PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small change otherwise lgtm. does the doc string need updating?
pandas/core/reshape/reshape.py
Outdated
else: | ||
if index is None: | ||
index = self.index | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
elif isinstance(index, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_list_like
instead here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will do it.
@jreback @gfyoung @jorisvandenbossche |
@NikhilKumarM : Got a linting error on Travis: https://travis-ci.org/pandas-dev/pandas/jobs/398632775#L2904 |
@gfyoung Fixed the linting error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a whatsnew note for 0.24. I guess this is a bug fix, though was this documented before? if so its an enhancement. Can you also add an exampl ein reshape.rst with this. And in the pivot doc-string as well.
@@ -395,15 +395,29 @@ def pivot(self, index=None, columns=None, values=None): | |||
See DataFrame.pivot | |||
""" | |||
if values is None: | |||
cols = [columns] if index is None else [index, columns] | |||
if index is None: | |||
cols = [columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tests for each one of these paths here? e.g. branching a lot more now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an enhancement. I have added the test cases and need to update the docs. Thanks for your time.
also needs to rebase |
can you merge master |
Sure! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you merge master & add a whatsnew note.
cols = [columns] if index is None else [index, columns] | ||
if index is None: | ||
cols = [columns] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment explaining what is going on here, ideally we would have a comment for each branch of the if/else
can you update |
can you merge master |
I think pivot function has been removed from reshape.py and put in pivot.py. I will rebase with the master and make necessary changes by this weekend. Thank you !! |
can you merge master and update |
Closing as stale - ping if you'd like to reopen and continue working this |
git diff upstream/master -u -- "*.py" | flake8 --diff
Added extra case to handle indexing on multiple colums in dataframe.pivot function