From 3a16f7b1ae24f37b89bc92ecb8a01b2bb5d45d62 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Date: Fri, 10 Jul 2020 11:13:03 +0200
Subject: [PATCH 1/3] ROADMAP: add consistent missing values for all dtypes to
 the roadmap

---
 doc/source/development/roadmap.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/source/development/roadmap.rst b/doc/source/development/roadmap.rst
index d331491d02883..6ddc0dc007d7c 100644
--- a/doc/source/development/roadmap.rst
+++ b/doc/source/development/roadmap.rst
@@ -53,6 +53,32 @@ need to implement certain operations expected by pandas users (for example
 the algorithm used in, ``Series.str.upper``). That work may be done outside of
 pandas.
 
+Consistent missing value handling
+---------------------------------
+
+Currently, pandas has varying missing data interfaces depending on the data
+type: pandas uses ``np.nan`` as missing value indicator in floating point data,
+``np.nan`` or ``None`` in object dtype data (eg strings, or booleans with
+missing values are cast to object), ``pd.NaT`` in datetimelike data. For
+categorical or interval data, they return ``np.nan`` on access even when the
+categories or intervals are datetime-like. Integer data cannot store missing
+data or are cast to float.
+
+Long term, we want to introduce consistent missing value handling accross the
+different data types: all data types should support missing values and with the
+same behaviour.
+
+To this end, a new experimental ``pd.NA`` scalar to be used as missing value
+indicator has already been added in pandas 1.0 (and used in the experimental
+nullable dtypes). Further work is needed to integrate this with other data
+types, and to provide a path forward to make this the default in a future
+version of pandas.
+
+This has been discussed at
+`github #28095 <https://github.com/pandas-dev/pandas/issues/28095>`__ (and
+linked issues), and described in more detail in this
+`design doc <https://hackmd.io/@jorisvandenbossche/Sk0wMeAmB>`__.
+
 Apache Arrow interoperability
 -----------------------------
 

From ee62bd07fd40c343dc0b9387fc4ec48ae5b306f5 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Date: Fri, 7 Aug 2020 13:19:22 +0200
Subject: [PATCH 2/3] add notion of different semantics

---
 doc/source/development/roadmap.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/source/development/roadmap.rst b/doc/source/development/roadmap.rst
index 6ddc0dc007d7c..33a4f547cb931 100644
--- a/doc/source/development/roadmap.rst
+++ b/doc/source/development/roadmap.rst
@@ -62,15 +62,17 @@ type: pandas uses ``np.nan`` as missing value indicator in floating point data,
 missing values are cast to object), ``pd.NaT`` in datetimelike data. For
 categorical or interval data, they return ``np.nan`` on access even when the
 categories or intervals are datetime-like. Integer data cannot store missing
-data or are cast to float.
+data or are cast to float. In addition, ``NaN`` has different semantics as
+"nulls" in many other data tools. 
 
 Long term, we want to introduce consistent missing value handling accross the
 different data types: all data types should support missing values and with the
 same behaviour.
 
-To this end, a new experimental ``pd.NA`` scalar to be used as missing value
-indicator has already been added in pandas 1.0 (and used in the experimental
-nullable dtypes). Further work is needed to integrate this with other data
+To this end, a new experimental ``pd.NA`` scalar that can be used as missing
+value indicator and with a behaviour that deviates from ``np.nan`` has already
+been added in pandas 1.0 (and used in the experimental nullable dtypes). Further
+work and research is needed to integrate these new semantics with other data
 types, and to provide a path forward to make this the default in a future
 version of pandas.
 

From 7bcb4e66a9dc77e937716604c66bb62a3bafe3eb Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Date: Fri, 7 Aug 2020 20:57:34 +0200
Subject: [PATCH 3/3] update with Tom's suggestions

---
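Note for reviewers (not part of the commit message or the roadmap text): the
behaviors described in the rewritten section can be illustrated with a minimal
sketch, assuming pandas >= 1.0 and NumPy are installed. The variable names are
only illustrative.

```python
import numpy as np
import pandas as pd

# Plain integer input with a missing value is cast to float (np.nan sentinel).
cast = pd.Series([1, None])
print(cast.dtype)  # float64

# Float data: a comparison involving np.nan evaluates to False.
floats = pd.Series([1.0, np.nan])
float_eq = floats == 1.0
print(float_eq.tolist())  # [True, False]

# Experimental nullable integer data: pd.NA propagates through the comparison.
ints = pd.Series([1, None], dtype="Int64")
int_eq = ints == 1
print(int_eq.dtype)  # boolean
print(int_eq.tolist())  # [True, <NA>]
```

The last comparison returns a nullable boolean result rather than a plain bool
one, which is the "propagating" behavior the new text refers to.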
 doc/source/development/roadmap.rst | 36 ++++++++++++++----------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/doc/source/development/roadmap.rst b/doc/source/development/roadmap.rst
index 33a4f547cb931..efee21b5889ed 100644
--- a/doc/source/development/roadmap.rst
+++ b/doc/source/development/roadmap.rst
@@ -56,25 +56,23 @@ pandas.
 Consistent missing value handling
 ---------------------------------
 
-Currently, pandas has varying missing data interfaces depending on the data
-type: pandas uses ``np.nan`` as missing value indicator in floating point data,
-``np.nan`` or ``None`` in object dtype data (eg strings, or booleans with
-missing values are cast to object), ``pd.NaT`` in datetimelike data. For
-categorical or interval data, they return ``np.nan`` on access even when the
-categories or intervals are datetime-like. Integer data cannot store missing
-data or are cast to float. In addition, ``NaN`` has different semantics as
-"nulls" in many other data tools. 
-
-Long term, we want to introduce consistent missing value handling accross the
-different data types: all data types should support missing values and with the
-same behaviour.
-
-To this end, a new experimental ``pd.NA`` scalar that can be used as missing
-value indicator and with a behaviour that deviates from ``np.nan`` has already
-been added in pandas 1.0 (and used in the experimental nullable dtypes). Further
-work and research is needed to integrate these new semantics with other data
-types, and to provide a path forward to make this the default in a future
-version of pandas.
+Currently, pandas handles missing data differently for different data types. We
+use different sentinels to indicate that a value is missing (``np.nan`` for
+floating-point data, ``np.nan`` or ``None`` for object-dtype data -- typically
+strings or booleans -- with missing values, and ``pd.NaT`` for datetimelike
+data). Integer data cannot store missing values and is cast to float. In
+addition, pandas 1.0 introduced a new missing value sentinel, ``pd.NA``, which
+is used by the experimental nullable integer, boolean, and string data types.
+
+These different missing values behave differently in user-facing operations.
+Specifically, we introduced different semantics for the nullable data types in
+certain operations (e.g. missing values propagate in comparison operations
+instead of comparing as False).
+
+Long term, we want to introduce consistent missing data handling for all data
+types. This includes consistent behavior in all operations (indexing, arithmetic
+operations, comparisons, etc.). We want to eventually make the new semantics the
+default.
 
 This has been discussed at
 `github #28095 <https://github.com/pandas-dev/pandas/issues/28095>`__ (and