From d8ee2752091645f593b37c453a024390797cf197 Mon Sep 17 00:00:00 2001
From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Wed, 28 Dec 2022 12:58:37 -0800
Subject: [PATCH 1/3] DOC: Add pyarrow type equivalency table

---
 doc/source/reference/arrays.rst | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index 5b41de4e12e6f..76219d544166a 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -60,6 +60,36 @@ is an :class:`ArrowDtype`.
 `Pyarrow `__ provides similar array and `data type `__
 support as NumPy including first-class nullability support for all data types, immutability and more.
 
+The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
+
+===================================== ========================== ===================
+Pyarrow type                          pandas extension type      Numpy type
+===================================== ========================== ===================
+``pd.ArroeDtype(pa.bool_())``         :class:`BooleanDtype`      ``np.bool_``
+``pd.ArrowDtype(pa.int8())``          :class:`Int8Dtype`         ``np.int8``
+``pd.ArrowDtype(pa.int16())``         :class:`Int16Dtype`        ``np.int16``
+``pd.ArrowDtype(pa.int32())``         :class:`Int32Dtype`        ``np.int32``
+``pd.ArrowDtype(pa.int64())``         :class:`Int64Dtype`        ``np.int64``
+``pd.ArrowDtype(pa.uint8())``         :class:`UInt8Dtype`        ``np.uint8``
+``pd.ArrowDtype(pa.uint16())``        :class:`UInt16Dtype`       ``np.uint16``
+``pd.ArrowDtype(pa.uint32())``        :class:`UInt32Dtype`       ``np.uint32``
+``pd.ArrowDtype(pa.uint64())``        :class:`UInt64Dtype`       ``np.uint64``
+``pd.ArrowDtype(pa.float32())``       :class:`Float32Dtype`      ``np.float32``
+``pd.ArrowDtype(pa.float64())``       :class:`Float64Dtype`      ``np.float64``
+``pd.ArrowDtype(pa.time32(...))``     (none)                     (none)
+``pd.ArrowDtype(pa.time64(...))``     (none)                     (none)
+``pd.ArrowDtype(pa.timestamp(...))``  :class:`DatetimeTZDtype`   ``np.datetime64``
+``pd.ArrowDtype(pa.date32())``        (none)                     (none)
+``pd.ArrowDtype(pa.date64())``        (none)                     (none)
+``pd.ArrowDtype(pa.duration(...))``   (none)                     ``np.timedelta64``
+``pd.ArrowDtype(pa.binary(...))``     (none)                     (none)
+``pd.ArrowDtype(pa.string())``        :class:`StringDtype`       ``np.str_``
+``pd.ArrowDtype(pa.decimal128(...))`` (none)                     (none)
+``pd.ArrowDtype(pa.list_(...))``      (none)                     (none)
+``pd.ArrowDtype(pa.map_(...))``       (none)                     (none)
+``pd.ArrowDtype(pa.dictionary(...))`` :class:`CategoricalDtype`  (none)
+===================================== ========================== ===================
+
 .. note::
 
     For string types (``pyarrow.string()``, ``string[pyarrow]``), PyArrow support is still facilitated

From 001b18b45240b5fcf72fd30d364975e681d162a0 Mon Sep 17 00:00:00 2001
From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Tue, 3 Jan 2023 13:42:38 -0800
Subject: [PATCH 2/3] Address review

---
 doc/source/reference/arrays.rst | 57 +++++++++++++++++----------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index 76219d544166a..e5f4c9b857e9a 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -61,34 +61,35 @@ is an :class:`ArrowDtype`.
 support as NumPy including first-class nullability support for all data types, immutability and more.
 
 The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-
-===================================== ========================== ===================
-Pyarrow type                          pandas extension type      Numpy type
-===================================== ========================== ===================
-``pd.ArroeDtype(pa.bool_())``         :class:`BooleanDtype`      ``np.bool_``
-``pd.ArrowDtype(pa.int8())``          :class:`Int8Dtype`         ``np.int8``
-``pd.ArrowDtype(pa.int16())``         :class:`Int16Dtype`        ``np.int16``
-``pd.ArrowDtype(pa.int32())``         :class:`Int32Dtype`        ``np.int32``
-``pd.ArrowDtype(pa.int64())``         :class:`Int64Dtype`        ``np.int64``
-``pd.ArrowDtype(pa.uint8())``         :class:`UInt8Dtype`        ``np.uint8``
-``pd.ArrowDtype(pa.uint16())``        :class:`UInt16Dtype`       ``np.uint16``
-``pd.ArrowDtype(pa.uint32())``        :class:`UInt32Dtype`       ``np.uint32``
-``pd.ArrowDtype(pa.uint64())``        :class:`UInt64Dtype`       ``np.uint64``
-``pd.ArrowDtype(pa.float32())``       :class:`Float32Dtype`      ``np.float32``
-``pd.ArrowDtype(pa.float64())``       :class:`Float64Dtype`      ``np.float64``
-``pd.ArrowDtype(pa.time32(...))``     (none)                     (none)
-``pd.ArrowDtype(pa.time64(...))``     (none)                     (none)
-``pd.ArrowDtype(pa.timestamp(...))``  :class:`DatetimeTZDtype`   ``np.datetime64``
-``pd.ArrowDtype(pa.date32())``        (none)                     (none)
-``pd.ArrowDtype(pa.date64())``        (none)                     (none)
-``pd.ArrowDtype(pa.duration(...))``   (none)                     ``np.timedelta64``
-``pd.ArrowDtype(pa.binary(...))``     (none)                     (none)
-``pd.ArrowDtype(pa.string())``        :class:`StringDtype`       ``np.str_``
-``pd.ArrowDtype(pa.decimal128(...))`` (none)                     (none)
-``pd.ArrowDtype(pa.list_(...))``      (none)                     (none)
-``pd.ArrowDtype(pa.map_(...))``       (none)                     (none)
-``pd.ArrowDtype(pa.dictionary(...))`` :class:`CategoricalDtype`  (none)
-===================================== ========================== ===================
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+
+=============================================== ========================== ===================
+PyArrow type                                    pandas extension type      NumPy type
+=============================================== ========================== ===================
+:external+pyarrow:py:func:`pyarrow.bool_`       :class:`BooleanDtype`      ``np.bool_``
+:external+pyarrow:py:func:`pyarrow.int8`        :class:`Int8Dtype`         ``np.int8``
+:external+pyarrow:py:func:`pyarrow.int16`       :class:`Int16Dtype`        ``np.int16``
+:external+pyarrow:py:func:`pyarrow.int32``      :class:`Int32Dtype`        ``np.int32``
+:external+pyarrow:py:func:`pyarrow.int64`       :class:`Int64Dtype`        ``np.int64``
+:external+pyarrow:py:func:`pyarrow.uint8`       :class:`UInt8Dtype`        ``np.uint8``
+:external+pyarrow:py:func:`pyarrow.uint16`      :class:`UInt16Dtype`       ``np.uint16``
+:external+pyarrow:py:func:`pyarrow.uint32`      :class:`UInt32Dtype`       ``np.uint32``
+:external+pyarrow:py:func:`pyarrow.uint64`      :class:`UInt64Dtype`       ``np.uint64``
+:external+pyarrow:py:func:`pyarrow.float32`     :class:`Float32Dtype`      ``np.float32``
+:external+pyarrow:py:func:`pyarrow.float64`     :class:`Float64Dtype`      ``np.float64``
+:external+pyarrow:py:func:`pyarrow.time32`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.time64`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.timestamp`   :class:`DatetimeTZDtype`   ``np.datetime64``
+:external+pyarrow:py:func:`pyarrow.date32`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.date64`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.duration`    (none)                     ``np.timedelta64``
+:external+pyarrow:py:func:`pyarrow.binary`      (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.string`      :class:`StringDtype`       ``np.str_``
+:external+pyarrow:py:func:`pyarrow.decimal128`  (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.list_`       (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.map_`        (none)                     (none)
+:external+pyarrow:py:func:`pyarrow.dictionary`  :class:`CategoricalDtype`  (none)
+=============================================== ========================== ===================
 
 .. note::
 

From cfc98ce8a530b09b0bb18367732d24c8b95d22a6 Mon Sep 17 00:00:00 2001
From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Tue, 3 Jan 2023 17:07:10 -0800
Subject: [PATCH 3/3] Fix another typo

---
 doc/source/reference/arrays.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index e5f4c9b857e9a..aeaca7caea25d 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -69,7 +69,7 @@ PyArrow type                                    pandas extension type      NumPy
 :external+pyarrow:py:func:`pyarrow.bool_`       :class:`BooleanDtype`      ``np.bool_``
 :external+pyarrow:py:func:`pyarrow.int8`        :class:`Int8Dtype`         ``np.int8``
 :external+pyarrow:py:func:`pyarrow.int16`       :class:`Int16Dtype`        ``np.int16``
-:external+pyarrow:py:func:`pyarrow.int32``      :class:`Int32Dtype`        ``np.int32``
+:external+pyarrow:py:func:`pyarrow.int32`       :class:`Int32Dtype`        ``np.int32``
 :external+pyarrow:py:func:`pyarrow.int64`       :class:`Int64Dtype`        ``np.int64``
 :external+pyarrow:py:func:`pyarrow.uint8`       :class:`UInt8Dtype`        ``np.uint8``
 :external+pyarrow:py:func:`pyarrow.uint16`      :class:`UInt16Dtype`       ``np.uint16``