-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Pivot nans fix #28540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pivot nans fix #28540
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might not fully understand the referenced issue yet (couldn't copy / paste the example to match what was written) but I don't think adding a new keyword is appropriate here. Option 2 seems more appropriate - the existing behavior is a bug right?
], | ||
names=["measurer", "product"], | ||
) | ||
tm.assert_index_equal(pv_col.columns, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you construct the expected output and use tm.assert_frame_equal(result, expected
instead? You should see that in a lot of other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did one way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a non-obvious api and would some discussion as this is just api bloat AFIACT.
why can you not simply used the value in the |
Made some changes to move towards solution 2, but this requires a change in default value to observed, and it breaks 2 tests referencing #21133 those two tests basically require what this new one would like to eliminate (keeping records where values are all nan) which was possible with a new kwarg, but without it the combinations from observed/dropna cant work it out without modifying the old test |
I'm sorry, is there any way forward with this? Either past tests need to be modified and backwards compatibility broken, or the API needs to be bloated, or the whole thing abandoned. Should I modify the tests that drop the nancols now? |
@endremborza can you help summarize what behavior you are expecting from the various combinations of |
simply put, I think simplest example I can produce: df = pd.DataFrame(
{"number": ["one", "two", "one", "three"],
"letter": ["a", "a" ,"b", "b"],
"y": [1, 2, np.nan, np.nan],
"x": [1, 2, 3, 4]}
)
result = df.pivot_table(values="y",
columns=["number", "letter"],
index="x",
observed=True, dropna=False) here result is
I suppose it should be
However, this breaks current tests as I (and the red CI) pointed out
|
Thanks for the PR but I think this is stale and has some disagreement over implementation. If interested in pursuing further let's discuss more in issue before coming back to code |
@WillAyd I'm sorry but this has been an absolute disappointment. I like pandas and I believe in the idea of actively trying to fix your own issues with open source software, but the indecision and disinterest here has been painful. I think I've been quite clear with how all this could be handled, but it seems nobody took the time and the effort to evaluate these options. Calling it stale just claims that there was nothing wrong with the process, or the way this was handled, which is quite off-putting. |
@endremborza you never responded to any of my queries above you need to engage in discussion otherwise we cannot move forwards |
this gets the ball rolling on #18030 as it is very easy to fix, but requires some decisions, mostly for structure and naming
the basic options as I see them:
0 - this PR as is (maybe some name change to this 26 character monster)
1 - Outright changing the
dropna=False
default behavior I thought would be a bad idea as it breaks backwards compatibility, but this solution is quite clunky.2 - Change
dropna=True
behavior to satisfy the test added here, but again that is not backwards compatible, but seems reasonable.Also, I've noticed that taking out the line
pandas/pandas/core/reshape/pivot.py
Line 177 in 3f8c0c4
doesn't break any of the tests, so there is that. (this helps 2)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff