From 7c5f8c7b821178633575a893ff1691521fe0a7ed Mon Sep 17 00:00:00 2001
From: richard <rhshadrach@gmail.com>
Date: Thu, 4 May 2023 22:23:26 -0400
Subject: [PATCH 1/3] PDEP-11: Change default of dropna to False

---
 web/pandas/pdeps/0011-dropna-default.md | 78 +++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 web/pandas/pdeps/0011-dropna-default.md

diff --git a/web/pandas/pdeps/0011-dropna-default.md b/web/pandas/pdeps/0011-dropna-default.md
new file mode 100644
index 0000000000000..83954166d388c
--- /dev/null
+++ b/web/pandas/pdeps/0011-dropna-default.md
@@ -0,0 +1,78 @@
+# PDEP-11: dropna default in pandas
+
+- Created: 4 May 2023
+- Status: Under discussion
+- Discussion: [PR ??](https://github.com/pandas-dev/pandas/pull/??)
+- Authors: [Richard Shadrach](https://github.com/rhshadrach)
+- Revision: 1
+
+## Abstract
+
+Throughout pandas, almost all of the methods that have a `dropna` argument default
+to `True`. Because this is the default, NA values can be silently dropped.
+This PDEP proposes to deprecate the current default value of `True` and change it
+to `False` in the next major release of pandas.
+
+## Motivation and Scope
+
+Upon seeing the output for a Series `ser`:
+
+```python
+print(ser.value_counts())
+
+1    3
+2    1
+dtype: Int64
+```
+
+users may be surprised that the Series can contain NA values. By then operating
+on data under the assumption NA values are not present, erroroneous results can
+arise. The same issue can occur with `groupby`, which can also be used to produce
+detailed summary statistics of data. We think it is not unreasonable that an
+experienced pandas user seeing the code
+
+    df[["a", "b"]].groupby("a").sum()
+
+would describe this operation as something like the following.
+
+> For each unique value in column `a`, compute the sum of corresponding values
+> in column `b` and return the results in a DataFrame indexed by the unique
+> values of `a`.
+
+This is correct, except that NA values in the column `a` will be dropped from
+the computation. That pandas is taking this additional step in the computation
+is not apparent from the code, and can surprise users.
+
+## Detailed Description
+
+We propose to deprecate the current default of `dropna` and change it to
+`False` across all applicable methods. The following methods have a dropna
+argument, those marked with a `*` already default to `False`.
+
+```python
+Series.groupby
+Series.mode
+Series.nunique
+*Series.to_hdf
+Series.value_counts
+DataFrame.groupby
+DataFrame.mode
+DataFrame.nunique
+DataFrame.pivot_table
+DataFrame.stack
+*DataFrame.to_hdf
+DataFrame.value_counts
+SeriesGroupBy.nunique
+SeriesGroupBy.value_counts
+DataFrameGroupBy.nunique
+DataFrameGroupBy.value_counts
+```
+
+## Timeline
+
+If accepted, the current `dropna` default would be deprecated as part of pandas
+2.x and this deprecation would be enforced in pandas 3.0.
+
+## PDEP History
+
+- 4 May 2023: Initial draft
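The sketch below illustrates the behavior described under "Motivation and Scope" above. It assumes pandas 2.x, where `dropna=True` is still the default for both `value_counts` and `groupby`, and uses made-up data. It only contrasts the current default with an explicit `dropna=False`.

```python
import numpy as np
import pandas as pd

# A Series with one missing value; np.nan is treated as NA.
ser = pd.Series([1, 1, 1, 2, np.nan])

# Current default (dropna=True): the NA value does not appear in the counts.
counts_default = ser.value_counts()

# The proposed default, requested explicitly: NA gets its own count.
counts_keep_na = ser.value_counts(dropna=False)

df = pd.DataFrame({"a": [1, 1, np.nan], "b": [10, 20, 30]})

# Current default: the row whose key in "a" is NA is excluded from the result.
sums_default = df.groupby("a").sum()

# With dropna=False, NA in "a" forms its own group.
sums_keep_na = df.groupby("a", dropna=False).sum()

print(counts_default, counts_keep_na, sums_default, sums_keep_na, sep="\n\n")
```

In both default outputs, nothing indicates that any rows were dropped. This is the surprise the PDEP describes.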

From e45bfeb321d0e61c7c44592fbe3611cae7ba341f Mon Sep 17 00:00:00 2001
From: richard <rhshadrach@gmail.com>
Date: Thu, 4 May 2023 22:29:24 -0400
Subject: [PATCH 2/3] PR #, fixups

---
 web/pandas/pdeps/0011-dropna-default.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/web/pandas/pdeps/0011-dropna-default.md b/web/pandas/pdeps/0011-dropna-default.md
index 83954166d388c..d3afb1a852101 100644
--- a/web/pandas/pdeps/0011-dropna-default.md
+++ b/web/pandas/pdeps/0011-dropna-default.md
@@ -2,7 +2,7 @@
 
 - Created: 4 May 2023
 - Status: Under discussion
-- Discussion: [PR ??](https://github.com/pandas-dev/pandas/pull/??)
+- Discussion: [PR #53094](https://github.com/pandas-dev/pandas/pull/53094)
 - Authors: [Richard Shadrach](https://github.com/rhshadrach)
 - Revision: 1
 
@@ -53,14 +53,14 @@ argument, those marked with a `*` already default to `False`.
 Series.groupby
 Series.mode
 Series.nunique
-*Series.to_hdf
+Series.to_hdf*
 Series.value_counts
 DataFrame.groupby
 DataFrame.mode
 DataFrame.nunique
 DataFrame.pivot_table
 DataFrame.stack
-*DataFrame.to_hdf
+DataFrame.to_hdf*
 DataFrame.value_counts
 SeriesGroupBy.nunique
 SeriesGroupBy.value_counts
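The sketch below covers two of the methods listed above, `Series.nunique` and `SeriesGroupBy.nunique`. It shows how the result changes when NA values are present, assuming pandas 2.x defaults and made-up data.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, np.nan, np.nan])

# Series.nunique: by default NA is not counted as a distinct value.
n_default = ser.nunique()              # counts 1.0 and 2.0
n_keep_na = ser.nunique(dropna=False)  # also counts NA once

df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1.0, np.nan, np.nan]})

# SeriesGroupBy.nunique: the same default applies per group, so group "y",
# which contains only NA, reports 0 distinct values by default.
per_group_default = df.groupby("key")["val"].nunique()
per_group_keep_na = df.groupby("key")["val"].nunique(dropna=False)

print(n_default, n_keep_na)
print(per_group_default, per_group_keep_na, sep="\n\n")
```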

From e1194a55e378e315d4c7468bd4cacc67049db3d8 Mon Sep 17 00:00:00 2001
From: Richard Shadrach <rhshadrach@gmail.com>
Date: Tue, 9 May 2023 16:56:04 -0400
Subject: [PATCH 3/3] Update from feedback

---
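The sketch below illustrates the contrast drawn in the "Keeping the default `skipna=True`" section added in this revision. It assumes pandas 2.x and uses made-up data. A single NA value turns a `skipna=False` reduction into NA. By contrast, `dropna=False` adds an NA entry without changing the results for the non-NA values.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, np.nan])

# skipna=False: the single NA value makes the entire result NA.
total = ser.sum(skipna=False)

# dropna=False: NA gets its own entry, and the counts for 1.0 and 2.0
# are the same as with dropna=True.
counts = ser.value_counts(dropna=False)

df = pd.DataFrame({"key": [1.0, 2.0, np.nan], "val": [10, 20, 30]})

# groupby with dropna=False adds an NA group; the sums for the non-NA keys
# match the dropna=True result.
sums_default = df.groupby("key")["val"].sum()
sums_keep_na = df.groupby("key", dropna=False)["val"].sum()

print(total, counts, sums_default, sums_keep_na, sep="\n\n")
```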
 web/pandas/pdeps/0011-dropna-default.md | 40 +++++++++++++++++++++----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/web/pandas/pdeps/0011-dropna-default.md b/web/pandas/pdeps/0011-dropna-default.md
index d3afb1a852101..30cb9f00cc319 100644
--- a/web/pandas/pdeps/0011-dropna-default.md
+++ b/web/pandas/pdeps/0011-dropna-default.md
@@ -25,7 +25,8 @@ print(ser.value_counts())
 dtype: Int64
 ```
 
-users may be surprised that the Series can contain NA values. By then operating
-on data under the assumption NA values are not present, erroroneous results can
+users may be surprised to learn that the Series contains NA values, as is argued in
+[#21890](https://github.com/pandas-dev/pandas/issues/21890). If they then operate
+on the data assuming that no NA values are present, erroneous results can
 arise. The same issue can occur with `groupby`, which can also be used to produce
 detailed summary statistics of data. We think it is not unreasonable that an
@@ -43,11 +44,35 @@ This is correct, except that NA values in the column `a` will be dropped from
 the computation. That pandas is taking this additional step in the computation
 is not apparent from the code, and can surprise users.
 
+### Keeping the default `skipna=True`
+
+Many reduction methods, such as `sum`, `mean`, and `var`, have a `skipna` argument.
+With `skipna=False`, the result of such a reduction is NA as soon as a single NA value
+is encountered in the data being reduced.
+
+```python
+import numpy as np
+import pandas as pd
+
+df = pd.DataFrame({'a': [1, np.nan], 'b': [2, np.nan]})
+print(df.sum(skipna=False))
+# a   NaN
+# b   NaN
+# dtype: float64
+```
+
+This makes `skipna=False` an undesirable default. The methods with a `dropna` argument
+do not have this problem: with `dropna=False`, NA values are reported alongside the
+results for non-NA values (for example, as a separate group or count) rather than
+obscuring them.
+
+### Possible deprecation of `dropna`
+
+This PDEP takes no position on whether the `dropna` argument should be deprecated in some of these methods.
+However, if such a deprecation is pursued, we believe that the final behavior should
+be that of `dropna=False` in all of the methods listed below. In that case, a necessary first step
+in the deprecation process would be to change the default value to `dropna=False`.
+
 ## Detailed Description
 
-We propose to deprecate the current default of `dropna` and change it to
-`False` across all applicable methods. The following methods have a dropna
-argument, those marked with a `*` already default to `False`.
+The following methods have a `dropna` argument; those marked with a `*` already default to `False`.
 
 ```python
 Series.groupby
@@ -68,10 +93,15 @@ DataFrameGroupBy.nunique
 DataFrameGroupBy.value_counts
 ```
 
+We propose to deprecate the current default of `dropna` and change it to
+`False` across all methods listed above.
+
 ## Timeline
 
 If accepted, the current `dropna` default would be deprecated as part of pandas
-2.x and this deprecation would be enforced in pandas 3.0.
+2.x and this deprecation would be enforced in pandas 3.0. In pandas 2.x, a `FutureWarning` would
+be emitted whenever one of these methods is called without specifying `dropna` and the input
+contains NA values that would be dropped.
 
 ## PDEP History
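The Timeline above proposes emitting a `FutureWarning` only when `dropna` is not specified and NA values are present. The sketch below applies that logic to `value_counts`, using a sentinel default to detect whether the caller passed `dropna`. The wrapper `value_counts_with_deprecation`, the sentinel `_NO_DEFAULT`, and the warning text are all hypothetical. This is not pandas' actual implementation.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sentinel meaning "the caller did not pass dropna".
_NO_DEFAULT = object()


def value_counts_with_deprecation(ser, dropna=_NO_DEFAULT):
    """Hypothetical wrapper sketching the proposed pandas 2.x behavior."""
    if dropna is _NO_DEFAULT:
        # Warn only when the new default would change the result,
        # i.e. when there are NA values that dropna=True would drop.
        if ser.isna().any():
            warnings.warn(
                "The default value of dropna will change from True to False in a "
                "future version. Pass dropna explicitly to silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
        # Keep the current default until the deprecation is enforced.
        dropna = True
    return ser.value_counts(dropna=dropna)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value_counts_with_deprecation(pd.Series([1.0, np.nan]))               # warns
    value_counts_with_deprecation(pd.Series([1.0, 2.0]))                  # no NA: silent
    value_counts_with_deprecation(pd.Series([1.0, np.nan]), dropna=True)  # explicit: silent

# Keep only the warnings emitted by the wrapper itself.
dropna_warnings = [w for w in caught if "dropna" in str(w.message)]
print(len(dropna_warnings))
```

Using a sentinel rather than `None` as the default lets the wrapper distinguish "not passed" from any value a caller passes explicitly. As a result, callers who already pass `dropna` are never warned.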