-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix pivot index bug #37771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pivot index bug #37771
Conversation
Previously, the variable "cols" was created solely to concatenate the lists "index" and "columns" when the "values" parameter was left as None. It did so in a way that it modified the variable passed as "index" using list.extend(), affecting the caller's namespace. This change simply uses an anonymous list in place of the variable "cols"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jacob-Stevens-Haas for the PR!
Can you add tests? That's the first thing we look for when reviewing
Thanks for the quick reply @arw2019! Quick question: I'm essentially copying |
It's easiest if you post the test then will look. As a general rule though we like DRY code yes |
This is a amended commit to solve pre-commit test: "rst ``code`` is two backticks"
d683ad3
to
edebfaa
Compare
pandas/tests/reshape/test_pivot.py
Outdated
@@ -2153,3 +2153,20 @@ def test_pivot_index_none(self): | |||
|
|||
expected.columns.name = "columns" | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_pivot_index_list_values_none(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong but I don't think this is testing the OP...?
I'd use the example from the issue. Do the pivot and test the result of the pivot operation, then test that the arguments have not changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually definitely use that. This isn't quite the same use case
pandas/tests/reshape/test_pivot.py
Outdated
frame = DataFrame(data) | ||
frame.pivot(index=index, columns=columns, values=None) | ||
|
||
expected_ser = Series(["index"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you converting to Series? you want to check that index
itself (the list) did not change, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was trying to use the assert functions in pandas._testing
. Should I just use standard assert
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes use a standard assert here
* Use test case from the issue * Use standard assert instead of making series and using pandas._testing
Now I think I've got the hang of when to use one backtick versus two
Hello @Jacob-Stevens-Haas! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-11-13 18:14:02 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, some comments
pandas/tests/reshape/test_pivot.py
Outdated
|
||
def test_pivot_index_list_values_none(self): | ||
# Tests when index is a list and values None. See GH#37635 | ||
frame = pd.DataFrame({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-pick but call it df
(convention)
index = ["lev1", "lev2"] | ||
columns = ["lev3"] | ||
frame.pivot(index=index, columns=columns, values=None) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check here that you got the correct result from the pivot (might need to hard-code)
pandas/tests/reshape/test_pivot.py
Outdated
@@ -2153,3 +2153,18 @@ def test_pivot_index_none(self): | |||
|
|||
expected.columns.name = "columns" | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_pivot_index_list_values_none(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call it test_pivot_index_list_values_none_modifies_args
pandas/tests/reshape/test_pivot.py
Outdated
@@ -2153,3 +2153,18 @@ def test_pivot_index_none(self): | |||
|
|||
expected.columns.name = "columns" | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_pivot_index_list_values_none(self): | |||
# Tests when index is a list and values None. See GH#37635 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just do
# GH37635
* My previous change caused the append = index is None to never set append = True * black pandas fixed my KR braces in test_pivot.py * Also black pandas fixed my KR braces on test_pivot.py
* Naming the test * Commenting the test * `frame` is now named `df` * Added check that the frame actually pivots correctly * Also: solved inconsistent pandas namespace in tests (precommit CI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm pending CI turning anything up
pandas/tests/reshape/test_pivot.py
Outdated
"d": [1.0, float("nan"), 3.0, 5.0], | ||
} | ||
) | ||
expected.index = MultiIndex.from_arrays( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the index and column names into the data frame constructor instead of setting them like this (can pass them in as keyword args)
pandas/tests/reshape/test_pivot.py
Outdated
|
||
expected = DataFrame( | ||
{ | ||
"a": [1.0, 3.0, 5.0, float("nan")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do "np.nan" here?
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -553,6 +553,7 @@ Reshaping | |||
- Bug in :meth:`DataFrame.agg` with ``func={'name':<FUNC>}`` incorrectly raising ``TypeError`` when ``DataFrame.columns==['Name']`` (:issue:`36212`) | |||
- Bug in :meth:`Series.transform` would give incorrect results or raise when the argument ``func`` was dictionary (:issue:`35811`) | |||
- Bug in :meth:`DataFrame.pivot` did not preserve :class:`MultiIndex` level names for columns when rows and columns both multiindexed (:issue:`36360`) | |||
- Bug in :meth:`DataFrame.pivot` modified caller's ``index`` parameter when ``columns`` was passed but ``values`` was not (:issue:`37635`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... modified index argument
pandas/tests/reshape/test_pivot.py
Outdated
) | ||
index = ["lev1", "lev2"] | ||
columns = ["lev3"] | ||
pivoted = df.pivot(index=index, columns=columns, values=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a total nit but can you call this result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jacob-Stevens-Haas, thank you for the bug fix!
Please see the comment below.
pandas/tests/reshape/test_pivot.py
Outdated
def test_pivot_index_list_values_none_modifies_args(self): | ||
# GH37635 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is about making sure that the index list does not change.
Would it be more reasonable to test here only that, without comparing pivoted table with the expected one?
This is related to one of your prior questions.
Thanks for the quick reply @arw2019!
Quick question: I'm essentially copying
TestPivot.test_pivot()
. The point of the bugfix is to prevent thepivot()
method from modifying the passed argumentindex
. Should I still include the assert statements fromTestPivot.test_pivot()
to verify that the pivot itself worked, or just include an assert statement focused on the bugfix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that originally, but a previous review asked that I
check here that you got the correct result from the pivot (might need to hard-code)
Doesn't much matter to me - now that the code is written, it's easy to delete or leave in as desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly on this
But I'd say it's useful to check that the pivot is doing what it's supposed to before checking that the args aren't changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through the test module and noticed that there are no tests on pivot
when index is a list and values are None.
Thus, there is no repetition of any kind and it is perfectly fine to have the test as it is.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Only |
That too was a code-review-requested name, so I'll wait for @arw2019 to comment on that suggestion Separate issue that I may need guidance on - the test that's failing on the Azure pipeline is |
Yeah that's a better name!
Don't worry about it, it's unrelated (that test if flaky) |
* renamed test variable to 'result' * renamed test
🥳 . Now that it's passing tests, is there anything else that needs to be done? Also, should I also try to apply the changes to the 1.1.x branch too? (1.0.x branch is stale too... is that one at end of life?) |
|
||
tm.assert_frame_equal(result, expected) | ||
|
||
assert index == ["lev1", "lev2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also assert that columns is the same as original, ping on green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done but not green; new comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's green, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thanks! @jreback?
* Added test that columns variable is immutable * Changed name of test per code review (missed in prev commit)
Ok so same error to last time ( |
thanks @Jacob-Stevens-Haas yeah CI has been having troubles with some windows lately. |
Previously, the variable "cols" was created solely to concatenate
the lists "index" and "columns" when the "values" parameter was
left as None. It did so in a way that it modified the variable
passed as "index" using list.extend(), affecting the caller's
namespace.
This change simply uses an anonymous list in place of the
variable "cols"
Had some trouble setting up the dev environment to build pandas from source,
so no tests checked or docs built. However, it's fairly minor change, so hoping
that it builds green regardless...
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff