
TST: Series repr with pyarrow date32 type #48258

Closed
wants to merge 37 commits into from
63091ac
BUG: is_datetime64tz_dtype shortcut only for DatetimeTZDtype
mroeschke Aug 25, 2022
99bfd65
add another whatsnew note
mroeschke Aug 25, 2022
17d87cd
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 25, 2022
46dde84
Use frame_or_series
mroeschke Aug 25, 2022
eff21a0
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 26, 2022
aa3b419
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 26, 2022
da96b81
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 29, 2022
5d78da7
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 29, 2022
4dfb27c
more specific warning condition
mroeschke Aug 29, 2022
9a58c96
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 29, 2022
21f1e52
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Aug 31, 2022
bfd28b8
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 1, 2022
13f5f3a
Try something random
mroeschke Sep 2, 2022
cce9a12
Remove random try
mroeschke Sep 2, 2022
8fe8de2
try skipping arg* tests for pyarrow?
mroeschke Sep 2, 2022
7607ef8
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 2, 2022
55b3c9d
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 6, 2022
f1253db
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 7, 2022
7c14b52
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 7, 2022
2f55e02
Add test for needs_i8_conversion
mroeschke Sep 7, 2022
59cc92c
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 9, 2022
0ad06c3
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 9, 2022
e83c007
Force colors for failures?
mroeschke Sep 9, 2022
011201d
PY_COLOR?
mroeschke Sep 9, 2022
6b02ed4
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 9, 2022
911c46e
See if no color logs errors
mroeschke Sep 10, 2022
59ea2a2
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 13, 2022
88d2e2c
Undo skips
mroeschke Sep 13, 2022
ddf8f61
Missed skip
mroeschke Sep 13, 2022
d548138
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 13, 2022
17b78e9
Remove test
mroeschke Sep 13, 2022
e051144
Undo test removal
mroeschke Sep 13, 2022
21d152c
Try not testing pyarrow timedelta types
mroeschke Sep 13, 2022
161ad6d
Revert the is_datetime64tz_dtype change
mroeschke Sep 14, 2022
91d4627
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 14, 2022
21ff5fd
Merge remote-tracking branch 'upstream/main' into bug/is_dttz_dtype
mroeschke Sep 14, 2022
8ef9f46
Validate it's pa.duration again
mroeschke Sep 14, 2022
4 changes: 4 additions & 0 deletions doc/source/reference/arrays.rst
@@ -52,6 +52,10 @@ PyArrow

     This feature is experimental, and the API can change in a future release without warning.

+The ``dtype`` argument of :class:`Series` and :class:`DataFrame` can accept a string of a `pyarrow data type <https://arrow.apache.org/docs/python/api/datatypes.html>`__
+with ``pyarrow`` in brackets, e.g. ``"int64[pyarrow]"``, or, for pyarrow data types that take parameters, an :class:`ArrowDtype`
+initialized with a ``pyarrow.DataType``.
+
 The :class:`arrays.ArrowExtensionArray` is backed by a :external+pyarrow:py:class:`pyarrow.ChunkedArray` with a
 :external+pyarrow:py:class:`pyarrow.DataType` instead of a NumPy array and data type. The ``.dtype`` of a :class:`arrays.ArrowExtensionArray`
 is an :class:`ArrowDtype`.
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -54,6 +54,8 @@ initialized with a ``pyarrow.DataType``.
 Most operations are supported and have been implemented using `pyarrow compute <https://arrow.apache.org/docs/python/api/compute.html>`__ functions.
 We recommend installing the latest version of PyArrow to access the most recently implemented compute functions.

+Please view the :ref:`API reference <api.arrays.arrow>` for more information.
+
 .. warning::

     This feature is experimental, and the API can change in a future release without warning.
7 changes: 7 additions & 0 deletions pandas/core/arrays/arrow/array.py
@@ -199,6 +199,13 @@ class ArrowExtensionArray(OpsMixin, ExtensionArray):
     <ArrowExtensionArray>
     [1, 1, <NA>]
     Length: 3, dtype: int64[pyarrow]
+
+    Create an ArrowExtensionArray directly from a pyarrow array.
+    >>> import pyarrow as pa
+    >>> pd.arrays.ArrowExtensionArray(pa.array([1, 1, None]))
+    <ArrowExtensionArray>
+    [1, 1, <NA>]
+    Length: 3, dtype: int64[pyarrow]
     """  # noqa: E501 (http link too long)

     _data: pa.ChunkedArray
5 changes: 3 additions & 2 deletions pandas/core/dtypes/common.py
@@ -379,9 +379,10 @@ def is_datetime64tz_dtype(arr_or_dtype) -> bool:
     >>> is_datetime64tz_dtype(s)
     True
     """
-    if isinstance(arr_or_dtype, ExtensionDtype):
+    if isinstance(arr_or_dtype, DatetimeTZDtype):
         # GH#33400 fastpath for dtype object
-        return arr_or_dtype.kind == "M"
+        # GH 34986
+        return True

     if arr_or_dtype is None:
         return False
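
The behavioral difference in this hunk can be sketched with stand-in classes. This is a simplified model, not pandas' actual code: the class names mirror pandas, but they are hypothetical minimal stand-ins, `fastpath_old` and `fastpath_new` are names invented for the example, and the real function goes on to perform further checks after the fast path.

```python
class ExtensionDtype:
    """Hypothetical stand-in for the pandas ExtensionDtype base class."""

    kind = "O"


class DatetimeTZDtype(ExtensionDtype):
    """Hypothetical stand-in for the tz-aware datetime dtype."""

    kind = "M"


class NotTZDtype(ExtensionDtype):
    """A custom extension dtype that reports kind "M" but is not tz-aware."""

    kind = "M"


def fastpath_old(dtype) -> bool:
    # Old shortcut: any extension dtype with kind "M" was treated as datetime64tz.
    if isinstance(dtype, ExtensionDtype):
        return dtype.kind == "M"
    return False


def fastpath_new(dtype) -> bool:
    # New shortcut: only an actual DatetimeTZDtype takes the fast path.
    if isinstance(dtype, DatetimeTZDtype):
        return True
    return False


for dtype in (DatetimeTZDtype(), NotTZDtype()):
    print(type(dtype).__name__, fastpath_old(dtype), fastpath_new(dtype))
```

Under the old shortcut, a dtype such as `NotTZDtype` is misclassified as tz-aware, which is the situation the new test in `test_common.py` guards against.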
11 changes: 11 additions & 0 deletions pandas/tests/dtypes/test_common.py
@@ -13,6 +13,7 @@
     CategoricalDtype,
     CategoricalDtypeType,
     DatetimeTZDtype,
+    ExtensionDtype,
     IntervalDtype,
     PeriodDtype,
 )
@@ -239,6 +240,16 @@ def test_is_datetime64tz_dtype():
     assert com.is_datetime64tz_dtype(pd.DatetimeIndex(["2000"], tz="US/Eastern"))


+def test_custom_ea_kind_M_not_datetime64tz():
+    # GH 34986
+    class NotTZDtype(ExtensionDtype):
+        @property
+        def kind(self) -> str:
+            return "M"
+
+    assert not com.is_datetime64tz_dtype(NotTZDtype())
Member

could also check needs_i8_conversion here

+
+
 def test_is_timedelta64_dtype():
     assert not com.is_timedelta64_dtype(object)
     assert not com.is_timedelta64_dtype(None)
98 changes: 15 additions & 83 deletions pandas/tests/extension/test_arrow.py
@@ -420,15 +420,6 @@ def test_groupby_extension_no_sort(self, data_for_grouping, request):
                     reason=f"pyarrow doesn't support factorizing {pa_dtype}",
                 )
             )
-        elif pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
         super().test_groupby_extension_no_sort(data_for_grouping)

     def test_groupby_extension_transform(self, data_for_grouping, request):
@@ -453,7 +444,6 @@ def test_groupby_extension_apply(
     ):
         pa_dtype = data_for_grouping.dtype.pyarrow_dtype
         # Is there a better way to get the "series" ID for groupby_apply_op?
-        is_series = "series" in request.node.nodeid
         is_object = "object" in request.node.nodeid
         if pa.types.is_duration(pa_dtype):
             request.node.add_marker(
@@ -472,13 +462,6 @@ def test_groupby_extension_apply(
                     reason="GH 47514: _concat_datetime expects axis arg.",
                 )
             )
-        elif not is_series:
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
         super().test_groupby_extension_apply(data_for_grouping, groupby_apply_op)

     def test_in_numeric_groupby(self, data_for_grouping, request):
@@ -508,16 +491,6 @@ def test_groupby_extension_agg(self, as_index, data_for_grouping, request):
                     reason=f"pyarrow doesn't support factorizing {pa_dtype}",
                 )
             )
-        elif as_index is True and (
-            pa.types.is_date(pa_dtype)
-            or (pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None)
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
         super().test_groupby_extension_agg(as_index, data_for_grouping)


@@ -752,22 +725,17 @@ def test_concat_extension_arrays_copy_false(self, data, na_value, request):

     def test_concat_with_reindex(self, data, request, using_array_manager):
         pa_dtype = data.dtype.pyarrow_dtype
-        if pa.types.is_duration(pa_dtype):
+        if (
+            pa.types.is_duration(pa_dtype)
+            or pa.types.is_date(pa_dtype)
+            or (pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None)
+        ):
             request.node.add_marker(
                 pytest.mark.xfail(
                     raises=TypeError,
                     reason="GH 47514: _concat_datetime expects axis arg.",
                 )
             )
-        elif pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError if not using_array_manager else TypeError,
-                    reason="GH 34986",
-                )
-            )
         super().test_concat_with_reindex(data)

     def test_align(self, data, na_value, request):
@@ -810,32 +778,6 @@ def test_merge(self, data, na_value, request):
             )
         super().test_merge(data, na_value)

-    def test_merge_on_extension_array(self, data, request):
-        pa_dtype = data.dtype.pyarrow_dtype
-        if pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
-        super().test_merge_on_extension_array(data)
-
-    def test_merge_on_extension_array_duplicates(self, data, request):
-        pa_dtype = data.dtype.pyarrow_dtype
-        if pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
-        super().test_merge_on_extension_array_duplicates(data)
-
     def test_ravel(self, data, request):
         tz = getattr(data.dtype.pyarrow_dtype, "tz", None)
         if pa_version_under2p0 and tz not in (None, "UTC"):
@@ -1348,16 +1290,7 @@ def test_diff(self, data, periods, request):
     @pytest.mark.parametrize("dropna", [True, False])
     def test_value_counts(self, all_data, dropna, request):
         pa_dtype = all_data.dtype.pyarrow_dtype
-        if pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
-        elif pa.types.is_duration(pa_dtype):
+        if pa.types.is_duration(pa_dtype):
             request.node.add_marker(
                 pytest.mark.xfail(
                     raises=pa.ArrowNotImplementedError,
@@ -1368,16 +1301,7 @@ def test_value_counts(self, all_data, dropna, request):

     def test_value_counts_with_normalize(self, data, request):
         pa_dtype = data.dtype.pyarrow_dtype
-        if pa.types.is_date(pa_dtype) or (
-            pa.types.is_timestamp(pa_dtype) and pa_dtype.tz is None
-        ):
-            request.node.add_marker(
-                pytest.mark.xfail(
-                    raises=AttributeError,
-                    reason="GH 34986",
-                )
-            )
-        elif pa.types.is_duration(pa_dtype):
+        if pa.types.is_duration(pa_dtype):
             request.node.add_marker(
                 pytest.mark.xfail(
                     raises=pa.ArrowNotImplementedError,
@@ -2063,3 +1987,11 @@ def test_mode(data_for_grouping, dropna, take_idx, exp_idx, request):
     result = ser.mode(dropna=dropna)
     expected = pd.Series(data_for_grouping.take(exp_idx))
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("box", ["Series", "DataFrame"])
+def test_repr_from_arrow_array(data, box):
+    # GH 34986 & 48238
+    pa_array = pa.array([data[0], None])
+    result = getattr(pd, box)(pa_array, dtype=ArrowDtype(pa_array.type))
+    repr(result)