pandas-dev
diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst (+3 -1)
diff --git a/doc/source/api.rst b/doc/source/api.rst (+4 -1)
diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst (+95 -8)
diff --git a/doc/source/merging.rst b/doc/source/merging.rst (+8 -3)
diff --git a/doc/source/whatsnew/v0.21.0.txt b/doc/source/whatsnew/v0.21.0.txt (+27)
@@ -638,9 +638,11 @@ and allows efficient indexing and storage of an index with a large number of dup
 
 .. ipython:: python
 
+   from pandas.api.types import CategoricalDtype
+
    df = pd.DataFrame({'A': np.arange(6),
                       'B': list('aabbca')})
-   df['B'] = df['B'].astype('category', categories=list('cab'))
+   df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
    df
    df.dtypes
    df.B.cat.categories
 
@@ -646,7 +646,10 @@ strings and apply several methods to it. These can be accessed like
 Categorical
 ~~~~~~~~~~~
 
-If the Series is of dtype ``category``, ``Series.cat`` can be used to change the the categorical
+.. autoclass:: api.types.CategoricalDtype
+   :members: categories, ordered
+
+If the Series is of dtype ``CategoricalDtype``, ``Series.cat`` can be used to change the categorical
 data. This accessor is similar to the ``Series.dt`` or ``Series.str`` and has the
 following usable methods and properties:
 
 
@@ -89,12 +89,22 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df
 
-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+Anywhere above we passed a keyword ``dtype='category'``, we used the default behavior of
+
+1. categories are inferred from the data
+2. categories are unordered.
+
+To control those behaviors, instead of passing ``'category'``, use an instance
+of :class:`~pandas.api.types.CategoricalDtype`.
 
 .. ipython:: python
 
-    s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    s = pd.Series(["a", "b", "c", "a"])
+    cat_type = CategoricalDtype(categories=["b", "c", "d"],
+                                ordered=True)
+    s_cat = s.astype(cat_type)
     s_cat
 
 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
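Reviewer note on the ``astype`` hunk above: a minimal sketch, assuming pandas 0.21.0 or later, of what the new example produces. The names ``codes`` and ``n_missing`` are illustrative only and are not part of the patch. Values that are not listed in ``categories`` (here ``"a"``) are expected to become missing, which may be worth stating in the docs.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Same data and dtype as the documentation example in the hunk above.
s = pd.Series(["a", "b", "c", "a"])
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)
s_cat = s.astype(cat_type)

# "a" is not among the declared categories, so it maps to code -1 (missing).
codes = s_cat.cat.codes.tolist()
n_missing = int(s_cat.isna().sum())

# The declared categories and orderedness are kept, including the unused "d".
categories = list(s_cat.cat.categories)
is_ordered = s_cat.cat.ordered

print(codes, n_missing, categories, is_ordered)
```

The unused category ``"d"`` stays in the dtype, which is the point of specifying categories independently of the data.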
@@ -133,6 +143,75 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+.. _categorical.categoricaldtype:
+
+CategoricalDtype
+----------------
+
+.. versionchanged:: 0.21.0
+
+A categorical's type is fully described by
+
+1. ``categories``: a sequence of unique values and no missing values
+2. ``ordered``: a boolean
+
+This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
+The ``categories`` argument is optional, which implies that the actual categories
+should be inferred from whatever is present in the data when the
+:class:`pandas.Categorical` is created. The categories are assumed to be unordered
+by default.
+
+.. ipython:: python
+
+   from pandas.api.types import CategoricalDtype
+
+   CategoricalDtype(['a', 'b', 'c'])
+   CategoricalDtype(['a', 'b', 'c'], ordered=True)
+   CategoricalDtype()
+
+A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
+expects a `dtype`. For example :func:`pandas.read_csv`,
+:func:`pandas.DataFrame.astype`, or in the Series constructor.
+
+.. note::
+
+    As a convenience, you can use the string ``'category'`` in place of a
+    :class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
+    the categories being unordered, and equal to the set of values present in the
+    array. In other words, ``dtype='category'`` is equivalent to
+    ``dtype=CategoricalDtype()``.
+
+Equality Semantics
+~~~~~~~~~~~~~~~~~~
+
+Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
+whenever they have the same categories and orderedness. When comparing two
+unordered categoricals, the order of the ``categories`` is not considered.
+
+.. ipython:: python
+
+   c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False)
+
+   # Equal, since order is not considered when ordered=False
+   c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False)
+
+   # Unequal, since the second CategoricalDtype is ordered
+   c1 == CategoricalDtype(['a',  'b', 'c'], ordered=True)
+
+All instances of ``CategoricalDtype`` compare equal to the string ``'category'``.
+
+.. ipython:: python
+
+   c1 == 'category'
+
+.. warning::
+
+   Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
+   and since all instances of ``CategoricalDtype`` compare equal to ``'category'``,
+   all instances of ``CategoricalDtype`` compare equal to a
+   ``CategoricalDtype(None, False)``, regardless of ``categories`` or
+   ``ordered``.
+
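Reviewer note on the sentence "can be used in any place pandas expects a `dtype`": a short sketch, assuming pandas 0.21.0 or later, that passes one dtype instance to both the ``Series`` constructor and ``pandas.read_csv``. The column name ``grade``, the category labels, and the inline CSV text are made up for illustration.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# One dtype object, declared once and reused in several places.
grade_type = CategoricalDtype(["low", "mid", "high"], ordered=True)

# 1. In the Series constructor.
s = pd.Series(["low", "high"], dtype=grade_type)

# 2. In read_csv, via the per-column dtype mapping.
csv_text = "grade\nlow\nhigh\nmid\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"grade": grade_type})

# Both results carry the same dtype, and the declared order drives min/max.
print(s.dtype == grade_type, df["grade"].dtype == grade_type)
print(df["grade"].min(), df["grade"].max())
```

Reusing a single dtype object keeps the categories and their order consistent across inputs, rather than relying on whatever each input happens to contain.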
 Description
 -----------
 
@@ -184,7 +263,7 @@ It's also possible to pass in the categories in a specific order:
 
     .. ipython:: python
 
-         s = pd.Series(list('babc')).astype('category', categories=list('abcd'))
+         s = pd.Series(list('babc')).astype(CategoricalDtype(list('abcd')))
          s
 
          # categories
@@ -301,7 +380,9 @@ meaning and certain operations are possible. If the categorical is unordered, ``
 
     s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
     s.sort_values(inplace=True)
-    s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)
+    s = pd.Series(["a","b","c","a"]).astype(
+        CategoricalDtype(ordered=True)
+    )
     s.sort_values(inplace=True)
     s
     s.min(), s.max()
@@ -401,9 +482,15 @@ categories or a categorical with any list-like object, will raise a TypeError.
 
 .. ipython:: python
 
-    cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
+    cat = pd.Series([1,2,3]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base = pd.Series([2,2,2]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base2 = pd.Series([2,2,2]).astype(
+        CategoricalDtype(ordered=True)
+    )
 
     cat
     cat_base
 
@@ -830,8 +830,10 @@ The left frame.
 
 .. ipython:: python
 
+   from pandas.api.types import CategoricalDtype
+
    X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
-   X = X.astype('category', categories=['foo', 'bar'])
+   X = X.astype(CategoricalDtype(categories=['foo', 'bar']))
 
    left = pd.DataFrame({'X': X,
                         'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
@@ -842,8 +844,11 @@ The right frame.
 
 .. ipython:: python
 
-   right = pd.DataFrame({'X': pd.Series(['foo', 'bar']).astype('category', categories=['foo', 'bar']),
-                         'Z': [1, 2]})
+   right = pd.DataFrame({
+        'X': pd.Series(['foo', 'bar'],
+                       dtype=CategoricalDtype(['foo', 'bar'])),
+        'Z': [1, 2]
+   })
    right
    right.dtypes
 
 
@@ -10,6 +10,8 @@ users upgrade to this version.
 Highlights include:
 
 - Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>`.
+- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
+  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
 
 Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.
 
@@ -89,6 +91,31 @@ This does not raise any obvious exceptions, but also does not create a new colum
 
 Setting a list-like data structure into a new attribute now raises a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
 
+.. _whatsnew_0210.enhancements.categorical_dtype:
+
+``CategoricalDtype`` for specifying categoricals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`pandas.api.types.CategoricalDtype` has been added to the public API and
+expanded to include the ``categories`` and ``ordered`` attributes. A
+``CategoricalDtype`` can be used to specify the set of categories and
+orderedness of an array, independent of the data themselves. This can be useful,
+e.g., when converting string data to a ``Categorical`` (:issue:`14711`,
+:issue:`15078`, :issue:`16015`):
+
+.. ipython:: python
+
+   from pandas.api.types import CategoricalDtype
+
+   s = pd.Series(['a', 'b', 'c', 'a'])  # strings
+   dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
+   s.astype(dtype)
+
+The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a
+``Series`` with categorical type will now return an instance of ``CategoricalDtype``.
+
+See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
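Reviewer note on the ``.dtype`` statement in this whatsnew entry: a small sketch, assuming pandas 0.21.0 or later, that checks the three containers named above. The variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ["a", "b", "a"]

cat = pd.Categorical(values)
idx = pd.CategoricalIndex(values)
ser = pd.Series(values, dtype="category")

# Each container should report a CategoricalDtype instance as its dtype.
dtype_checks = [isinstance(obj.dtype, CategoricalDtype) for obj in (cat, idx, ser)]

# The dtype carries the inferred categories and the (default) orderedness.
inferred_categories = list(ser.dtype.categories)
inferred_ordered = ser.dtype.ordered

print(dtype_checks, inferred_categories, inferred_ordered)
```

Because the dtype now exposes ``categories`` and ``ordered``, it can be read off one object and passed to ``astype`` on another.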
+
 .. _whatsnew_0210.enhancements.other:
 
 Other Enhancements