Pivot nans fix #28540

endremborza · 2019-09-19T20:33:41Z

This gets the ball rolling on #18030. The fix itself is very easy, but it requires some decisions, mostly about structure and naming.

the basic options as I see them:

0 - this PR as is (maybe with a shorter name for the new keyword, which is currently a 26-character monster)

1 - Outright changing the default dropna=False behavior. I thought this would be a bad idea because it breaks backwards compatibility, but the solution in this PR is quite clunky.

2 - Changing the dropna=True behavior to satisfy the test added here. Again, that is not backwards compatible, but it seems reasonable.

Also, I've noticed that taking out the line

pandas/pandas/core/reshape/pivot.py

Line 177 in 3f8c0c4

table = table.dropna(how="all", axis=1)

doesn't break any of the tests, so there is that. (this helps 2)
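To make the effect of that line concrete, here is a minimal sketch on a toy frame (the frame and its column names are made up for illustration and are not taken from pivot.py). `DataFrame.dropna(how="all", axis=1)` removes every column whose values are all NaN, so any combination that exists in the data but carries only NaNs would be discarded at this step.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pivoted table; the column names are hypothetical.
table = pd.DataFrame(
    {
        "has_values": [1.0, np.nan],
        "all_nan": [np.nan, np.nan],
    }
)

# Same call as the quoted line: drop columns that are entirely NaN.
trimmed = table.dropna(how="all", axis=1)

print(list(trimmed.columns))
```

Only the column with at least one non-NaN value survives the call.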

closes Pandas pivot_table MultiIndex and dropna=False generates all combinations of modalities instead of keeping existing one only #18030
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

update

WillAyd

I might not fully understand the referenced issue yet (I couldn't copy / paste the example to match what was written), but I don't think adding a new keyword is appropriate here. Option 2 seems more appropriate. The existing behavior is a bug, right?

WillAyd · 2019-09-20T02:49:27Z

pandas/tests/reshape/test_pivot.py

+            ],
+            names=["measurer", "product"],
+        )
+        tm.assert_index_equal(pv_col.columns, m)


Can you construct the expected output and use tm.assert_frame_equal(result, expected) instead? You should see that in a lot of other tests.
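A minimal sketch of that pattern, assuming a made-up two-row frame rather than the data used in this PR's test. It calls the public `pandas.testing.assert_frame_equal`; inside the pandas test suite the same function is reached through the `tm` alias.

```python
import pandas as pd

# Hypothetical input, not the frame from test_pivot.py.
df = pd.DataFrame(
    {"row": ["r1", "r2"], "col": ["c1", "c2"], "val": [1.0, 2.0]}
)

result = df.pivot_table(values="val", index="row", columns="col")

# Build the whole expected frame, including index and column names,
# instead of asserting on the columns alone.
expected = pd.DataFrame(
    [[1.0, float("nan")], [float("nan"), 2.0]],
    index=pd.Index(["r1", "r2"], name="row"),
    columns=pd.Index(["c1", "c2"], name="col"),
)

# Raises AssertionError if values, dtypes, labels or names differ.
pd.testing.assert_frame_equal(result, expected)
```

Comparing full frames also catches regressions in the values and the index, which an index-only assertion would miss.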

did one way

pandas/tests/reshape/test_pivot.py

jreback

this is a non-obvious api and would need some discussion, as this is just api bloat AFAICT.

jreback · 2019-09-20T12:41:35Z

why can you not simply use the value in the observed keyword?

endremborza · 2019-09-20T13:00:32Z

Made some changes to move towards solution 2, but this requires changing the default value of observed, and it breaks 2 tests referencing #21133

Those two tests basically require what the new test would like to eliminate (keeping records where the values are all NaN). Supporting both was possible with a new kwarg, but without it no combination of observed/dropna can satisfy both, so the old tests would have to be modified.

endremborza · 2019-10-02T18:23:13Z

I'm sorry, is there any way forward with this?

Either past tests need to be modified and backwards compatibility broken, or the API needs to be bloated, or the whole thing abandoned.

Should I modify the tests that drop the nancols now?

WillAyd · 2019-10-22T01:29:05Z

@endremborza can you help summarize what behavior you are expecting from the various combinations of dropna / observed and how they differ from today? We typically don't just change the default values (note CI going red) so would have to think through options and best approach
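One way to produce such a summary is to run `pivot_table` over every dropna / observed combination and record the resulting columns. The sketch below is only an illustration: it borrows the small frame posted later in this thread, and the name `columns_by_setting` is made up here. No expected output is given because the result depends on the pandas version in use.

```python
from itertools import product

import numpy as np
import pandas as pd

# Same data as the example given later in this conversation.
df = pd.DataFrame(
    {
        "number": ["one", "two", "one", "three"],
        "letter": ["a", "a", "b", "b"],
        "y": [1, 2, np.nan, np.nan],
        "x": [1, 2, 3, 4],
    }
)

# Map each (dropna, observed) pair to the column labels it produces.
columns_by_setting = {}
for dropna, observed in product([True, False], repeat=2):
    table = df.pivot_table(
        values="y",
        columns=["number", "letter"],
        index="x",
        dropna=dropna,
        observed=observed,
    )
    columns_by_setting[(dropna, observed)] = list(table.columns)

for (dropna, observed), cols in columns_by_setting.items():
    print(f"dropna={dropna}, observed={observed}: {cols}")
```

Note that observed only affects categorical groupers, so with the plain string columns used here any difference comes from dropna.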

endremborza · 2019-10-28T23:02:07Z

Simply put, I think observed=True with dropna=False should keep the combinations that are present in the data even when they are filled with NaNs, without generating combinations that never occur.

simplest example I can produce:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"number": ["one", "two", "one", "three"],
     "letter": ["a", "a", "b", "b"],
     "y": [1, 2, np.nan, np.nan],
     "x": [1, 2, 3, 4]}
)

result = df.pivot_table(values="y",
                        columns=["number", "letter"],
                        index="x",
                        observed=True, dropna=False)

here result is

number  one             three           two
letter  a       b       a       b       a       b
x
1       1.0     NaN     NaN     NaN     NaN     NaN
2       NaN     NaN     NaN     NaN     2.0     NaN
3       NaN     NaN     NaN     NaN     NaN     NaN
4       NaN     NaN     NaN     NaN     NaN     NaN

I suppose it should be

number  one             three   two
letter  a       b       b       a
x
1       1.0     NaN     NaN     NaN
2       NaN     NaN     NaN     2.0
3       NaN     NaN     NaN     NaN
4       NaN     NaN     NaN     NaN

However, this breaks current tests as I (and the red CI) pointed out
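For reference, the narrower column set can be approximated with the existing API by filtering the dropna=False result after the fact. This is a workaround sketch, not the change made in this PR; the names `present`, `mask` and `trimmed` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "number": ["one", "two", "one", "three"],
        "letter": ["a", "a", "b", "b"],
        "y": [1, 2, np.nan, np.nan],
        "x": [1, 2, 3, 4],
    }
)

full = df.pivot_table(
    values="y",
    columns=["number", "letter"],
    index="x",
    observed=True,
    dropna=False,
)

# (number, letter) pairs that actually occur in the input rows.
present = set(zip(df["number"], df["letter"]))

# Boolean mask over the pivoted columns: keep a column only if its
# label pair occurs in the data, even when all of its values are NaN.
mask = [col in present for col in full.columns]
trimmed = full.loc[:, mask]

print(list(trimmed.columns))
```

The filter keeps ("one", "b") and ("three", "b"), which hold only NaNs, and removes the pairs that never appear in the input.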

Also, I just noticed that if I leave out the x column and index="x", I get a weird AttributeError that I think should not happen, but that is another issue.

WillAyd · 2019-12-17T17:39:34Z

Thanks for the PR, but I think this is stale and there is some disagreement over the implementation. If you are interested in pursuing this further, let's discuss more in the issue before coming back to code.

endremborza · 2019-12-18T08:37:33Z

@WillAyd I'm sorry but this has been an absolute disappointment. I like pandas and I believe in the idea of actively trying to fix your own issues with open source software, but the indecision and disinterest here has been painful.

I think I've been quite clear with how all this could be handled, but it seems nobody took the time and the effort to evaluate these options.

Calling it stale just claims that there was nothing wrong with the process, or the way this was handled, which is quite off-putting.

jreback · 2019-12-18T11:16:38Z

@endremborza you never responded to any of my queries above

you need to engage in discussion otherwise we cannot move forwards

endremborza added 4 commits September 19, 2019 20:07

Merge pull request #1 from pandas-dev/master

79a08d7

update

keep nan cols test

114e2d0

add keep nan cols functionality

b9c237a

black formatting

07b3df7

WillAyd requested changes Sep 20, 2019

WillAyd added MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 20, 2019

extend pivot nancol test

39a6427

jreback requested changes Sep 20, 2019

endremborza added 2 commits September 20, 2019 14:54

restructure keep nan cols in tests

8c5fd32

restructure keep nan cols implementation

2bfdf2b

rumbin mentioned this pull request Oct 22, 2019

Pivot table null values in columns apache/superset#7406

Closed

3 tasks

WillAyd closed this Dec 17, 2019

randolf-scholz mentioned this pull request Feb 15, 2022

Pandas pivot_table MultiIndex and dropna=False generates all combinations of modalities instead of keeping existing one only #45994

Closed

1 task
