pandas-dev · TomAugspurger · Jan 1, 2019 · Nov 10, 2018 · Nov 10, 2018 · Nov 10, 2018
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -84,7 +84,10 @@ unlike the axis labels, cannot be assigned to.
     When working with heterogeneous data, the dtype of the resulting ndarray
     will be chosen to accommodate all of the data involved. For example, if
     strings are involved, the result will be of object dtype. If there are only
-    floats and integers, the resulting array will be of float dtype.
+    floats and integers, the resulting array will be of float dtype. If you
+    need to store an integer column with missing data, use one of the "Int"
+    dtypes (``"Int8"``, ``"Int16"``, ``"Int32"``, ``"Int64"``). See
+    :ref:`integer_na` for more.
 
 .. _basics.accelerate:
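
A minimal sketch of the dtype behavior described in this hunk, assuming pandas 0.24 or later (for ``to_numpy`` and the ``"Int64"`` alias). The frame and variable names are illustrative and not part of the patch:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one integer column and one float column.
mixed_numeric = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})
# The same frame with an added string column.
with_strings = mixed_numeric.assign(labels=["a", "b", "c"])

# Integers and floats together are upcast to a float ndarray.
numeric_dtype = mixed_numeric.to_numpy().dtype
# Adding strings forces the ndarray to object dtype.
object_dtype = with_strings.to_numpy().dtype

# A nullable integer column keeps its integer dtype despite the missing value.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(numeric_dtype, object_dtype, nullable.dtype)
```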
 

diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -228,8 +228,17 @@ arrays. For example:
    s2.dtype
 
 This trade-off is made largely for memory and performance reasons, and also so
-that the resulting ``Series`` continues to be "numeric". One possibility is to 
-use ``dtype=object`` arrays instead.
+that the resulting ``Series`` continues to be "numeric".
+
+If you need to represent integers with possibly missing values, use one of
+the ``"Int"`` dtypes provided by pandas:
+
+* :class:`Int8Dtype`
+* :class:`Int16Dtype`
+* :class:`Int32Dtype`
+* :class:`Int64Dtype`
+
+See :ref:`integer_na` for more.
 
 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~

diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template
@@ -139,6 +139,7 @@ See the package overview for more detail about what's in the library.
     timeseries
     timedeltas
     categorical
+    integer_na
     visualization
     style
     io

diff --git a/doc/source/integer_na.rst b/doc/source/integer_na.rst
@@ -0,0 +1,87 @@
+.. currentmodule:: pandas
+
+.. ipython:: python
+    :suppress:
+
+    import numpy as np
+    import pandas as pd
+
+.. _integer_na:
+
+**************************
+Nullable Integer Data Type
+**************************
+
+.. versionadded:: 0.24.0
+
+In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
+missing data. Because ``NaN`` is a float, this forces an array of integers with
+any missing values to become floating point. In some cases, this may not matter
+much. But if your integer column is, say, an identifier, casting to float can
+be problematic.
+
+Pandas can represent integer data with missing values using
+:class:`arrays.IntegerArray`. This is an :ref:`extension type <extending.extension-types>`
+implemented within pandas. It is not the default dtype for integers and will not be inferred;
+you must explicitly pass the dtype to :func:`array` or :class:`Series`:
+
+.. ipython:: python
+
+   pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
+
+Or use the string alias ``"Int64"`` (note the capital ``"I"``, to differentiate it
+from NumPy's ``'int64'`` dtype):
+
+.. ipython:: python
+
+   pd.array([1, 2, np.nan], dtype="Int64")
+
+This array can be stored in a :class:`DataFrame` or :class:`Series` like any
+NumPy array.
+
+.. ipython:: python
+
+   arr = pd.array([1, 2, np.nan], dtype="Int64")
+   pd.Series(arr)
+
+You can also pass the list-like object to the :class:`Series` constructor
+with the dtype.
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, np.nan], dtype="Int64")
+   s
+
+By default (if you don't specify ``dtype``), NumPy is used, and you'll end
+up with a ``float64`` dtype Series:
+
+.. ipython:: python
+
+   pd.Series([1, 2, np.nan])
+
+Operations involving an integer array behave similarly to those on NumPy arrays.
+Missing values are propagated, and the data is coerced to another
+dtype if needed.
+
+.. ipython:: python
+
+   # arithmetic
+   s + 1
+
+   # comparison
+   s == 1
+
+   # indexing
+   s.iloc[1:3]
+
+   # operate with other dtypes
+   s + s.iloc[1:3].astype('Int8')
+
+   # coerce when needed
+   s + 0.01
+
+Reduction and groupby operations such as 'sum' work as well.
+
+.. ipython:: python
+
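+   # build a small frame from ``s`` so the reductions have data to work on
+   df = pd.DataFrame({'A': s, 'B': [1, 1, 3]})
+   df.sum()
+   df.groupby('B').A.sum()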
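
The operations documented in this new page can be sketched as a standalone script. This assumes pandas 0.24 or later; the sample values and the ``A``/``B`` column names are illustrative stand-ins, not data taken from the patch:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a nullable integer column and a grouping key.
s = pd.Series([1, 2, np.nan], dtype="Int64")
df = pd.DataFrame({"A": s, "B": [1, 1, 3]})

# Arithmetic keeps the nullable integer dtype and propagates the missing value.
shifted = s + 1

# Reductions skip missing values by default.
total = df["A"].sum()

# Groupby reductions work on the nullable column as well.
per_group = df.groupby("B")["A"].sum()

print(shifted.dtype)
print(total)
print(per_group)
```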
diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
@@ -29,76 +29,26 @@ pandas.
 
 See the :ref:`cookbook<cookbook.missing_data>` for some advanced strategies.
 
-Missing data basics
--------------------
+Integer Dtypes and Missing Data
+-------------------------------
 
-When / why does data become missing?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Some might quibble over our usage of *missing*. By "missing" we simply mean
-**NA** ("not available") or "not present for whatever reason". Many data sets simply arrive with
-missing data, either because it exists and was not collected or it never
-existed. For example, in a collection of financial time series, some of the time
-series might start on different dates. Thus, values prior to the start date
-would generally be marked as missing.
-
-In pandas, one of the most common ways that missing data is **introduced** into
-a data set is by reindexing. For example:
+Because ``NaN`` is a float, a column of integers with even one missing value
+is cast to floating-point dtype (see :ref:`gotchas.intna` for more). Pandas
+provides a nullable integer array, which can be used by explicitly requesting
+the dtype:
 
 .. ipython:: python
 
-   df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
-                     columns=['one', 'two', 'three'])
-   df['four'] = 'bar'
-   df['five'] = df['one'] > 0
-   df
-   df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
-   df2
-
-Values considered "missing"
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-As data comes in many shapes and forms, pandas aims to be flexible with regard
-to handling missing data. While ``NaN`` is the default missing value marker for
-reasons of computational speed and convenience, we need to be able to easily
-detect this value with data of different types: floating point, integer,
-boolean, and general object. In many cases, however, the Python ``None`` will
-arise and we wish to also consider that "missing" or "not available" or "NA".
+   pd.Series([1, 2, np.nan, 4], dtype=pd.Int64Dtype())
 
-.. note::
-
-   If you want to consider ``inf`` and ``-inf`` to be "NA" in computations,
-   you can set ``pandas.options.mode.use_inf_as_na = True``.
-
-.. _missing.isna:
-
-To make detecting missing values easier (and across different array dtypes),
-pandas provides the :func:`isna` and
-:func:`notna` functions, which are also methods on
-Series and DataFrame objects:
+Alternatively, the string alias ``'Int64'`` (note the capital ``"I"``) can be
+used:
 
 .. ipython:: python
 
-   df2['one']
-   pd.isna(df2['one'])
-   df2['four'].notna()
-   df2.isna()
-
-.. warning::
-
-   One has to be mindful that in Python (and NumPy), the ``nan's`` don't compare equal, but ``None's`` **do**.
-   Note that pandas/NumPy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``.
-
-   .. ipython:: python
-
-      None == None
-      np.nan == np.nan
-
-   So as compared to above, a scalar equality comparison versus a ``None/np.nan`` doesn't provide useful information.
-
-   .. ipython:: python
+   pd.Series([1, 2, np.nan, 4], dtype="Int64")
 
-      df2['one'] == np.nan
+See :ref:`integer_na` for more.
 
 Datetimes
 ---------
@@ -760,3 +710,19 @@ However, these can be filled in using :meth:`~DataFrame.fillna` and it will work
 
    reindexed[crit.fillna(False)]
    reindexed[crit.fillna(True)]
+
+Pandas provides a nullable integer dtype, but you must explicitly request it
+when creating the series or column. Notice the capital "I" in
+``dtype="Int64"``.
+
+.. ipython:: python
+
+   s = pd.Series([-2, -1, 0, 1, 2], index=[0, 2, 4, 6, 7],
+                 dtype="Int64")
+   s > 0
+   (s > 0).dtype
+   crit = (s > 0).reindex(list(range(8)))
+   crit
+   crit.dtype
+
+See :ref:`integer_na` for more.
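
A hedged sketch of the comparison-and-reindex pattern in this hunk, written as a standalone script. The integer values are illustrative. The dtype of the comparison result differs between pandas versions, so the sketch prints it rather than assuming a value, and normalizes the mask with ``fillna`` before use:

```python
import pandas as pd

# Hypothetical integer data on a sparse index; "Int64" requires whole numbers.
s = pd.Series([-2, -1, 0, 1, 2], index=[0, 2, 4, 6, 7], dtype="Int64")

# Element-wise comparison, then reindex to introduce missing entries.
crit = (s > 0).reindex(list(range(8)))
print(crit.dtype)  # depends on the installed pandas version

# Treat missing entries as False so the result is a plain boolean mask.
mask = crit.fillna(False).astype(bool)
print(mask.tolist())
```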
diff --git a/doc/source/whatsnew/v0.24.0.txt b/doc/source/whatsnew/v0.24.0.txt
@@ -99,7 +99,9 @@ Reduction and groupby operations such as 'sum' work.
 
 .. warning::
 
-   The Integer NA support currently uses the captilized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date.
+   The Integer NA support currently uses the capitalized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date.
+
+See :ref:`integer_na` for more.
 
 .. _whatsnew_0240.enhancements.read_html: