DEPR: Some dropna behaviors in DataFrame.pivot_table #53521
Comments
Instead of adding Doing this would mean more than just passing If we're going this route, then I think we should also adhere to groupby semantics with unobserved groupings for various ops. For example:
currently results in
Looking at this again - I think we should also deprecate (2). This can be done by the user as it's just a matter of dropping NA values from the input data.
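A minimal sketch of dropping NA values from the input up front, assuming the NA values in question sit in the grouping or value columns. The frame and column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing grouping key and one missing value.
df = pd.DataFrame(
    {
        "key": ["x", "x", np.nan, "y"],
        "val": [1.0, np.nan, 3.0, 4.0],
    }
)

# Drop the NA rows explicitly rather than relying on pivot_table's dropna.
clean = df.dropna(subset=["key", "val"])
table = clean.pivot_table(index="key", values="val", aggfunc="sum")
```

Because the filtering happens before the pivot, the result no longer depends on what `dropna` does inside `pivot_table`.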
+1 As a user, I've learned I have to avoid
Currently `dropna` is used in four places within `DataFrame.pivot_table`:
Behaviors (1), (2), and (4) were all implemented for `crosstab`, which is essentially a call to `pivot_table`.
The API docs for `crosstab` document the `dropna` argument as:
The only other documentation in the API and User Guide mentions using `dropna=False` to include rows/columns for categorical data with missing categorical values.
I think this is too much for a single Boolean argument to handle. I propose the following:
a. Add `cartesian_product=[True|False]` to `pivot_table` and `crosstab`
b. Add `observed=[True|False]` to `crosstab` for use with categoricals
c. Deprecate behavior (1) (with `dropna`), (3), and (4) above. The user may do each of these by dropping null values from the input data if they so desire.
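For context on (b), here is a short sketch of the categorical use of `dropna=False` that the existing documentation describes, which is the role an `observed` argument on `crosstab` would presumably take over. The variable names are illustrative, and `observed` on `crosstab` is only a proposal here, so it is not used in the code:

```python
import pandas as pd

# Two categoricals that each declare one category that never occurs.
foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])

# Default: only observed categories appear in the table.
default = pd.crosstab(foo, bar)

# dropna=False: unobserved categories "c" and "f" are included with zero counts.
full = pd.crosstab(foo, bar, dropna=False)
```

Here `dropna=False` has nothing to do with null values in the data; it only controls whether unobserved categories show up.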
We can implement (c) without affecting the behavior of `crosstab` by changing the data there to be a mixture of null/non-null values depending on the input and using the aggregation `count` instead of `len`.
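To illustrate why swapping `len` for `count` matters, here is a small sketch using a plain groupby as a stand-in for the aggregation step; the data is hypothetical and this is not the actual `crosstab` implementation:

```python
import numpy as np
import pandas as pd

# Hypothetical data: nulls mark the rows that should not be tallied.
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "val": [1.0, np.nan, np.nan],
    }
)

grouped = df.groupby("key")["val"]

# len counts every row in the group, null or not.
n_len = grouped.agg(len)

# count only tallies the non-null entries.
n_count = grouped.agg("count")
```

With `count`, placing nulls in the data is enough to exclude entries from the tally, so no separate `dropna` handling is needed at the aggregation step.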