Commit 3065ac5

Merge remote-tracking branch 'upstream/master' into bisect
2 parents 0d719ae + ca52e39 commit 3065ac5

63 files changed, +1181 -850 lines changed

.travis.yml

-6
@@ -35,12 +35,6 @@ matrix:
   fast_finish: true

   include:
-    - env:
-        - JOB="3.8, slow" ENV_FILE="ci/deps/travis-38-slow.yaml" PATTERN="slow" SQL="1"
-      services:
-        - mysql
-        - postgresql
-
     - env:
         - JOB="3.7, locale" ENV_FILE="ci/deps/travis-37-locale.yaml" PATTERN="((not slow and not network and not clipboard) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
       services:

ci/azure/posix.yml

+5
@@ -43,6 +43,11 @@ jobs:
           CONDA_PY: "38"
           PATTERN: "not slow and not network and not clipboard"

+        py38_slow:
+          ENV_FILE: ci/deps/azure-38-slow.yaml
+          CONDA_PY: "38"
+          PATTERN: "slow"
+
         py38_locale:
           ENV_FILE: ci/deps/azure-38-locale.yaml
           CONDA_PY: "38"
File renamed without changes.

doc/source/ecosystem.rst

+11-9
@@ -474,15 +474,16 @@ A directory of projects providing
 :ref:`extension accessors <extending.register-accessors>`. This is for users to
 discover new accessors and for library authors to coordinate on the namespace.

-=============== ========== ========================= ===============================================================
-Library         Accessor   Classes                   Description
-=============== ========== ========================= ===============================================================
-`cyberpandas`_  ``ip``     ``Series``                Provides common operations for working with IP addresses.
-`pdvega`_       ``vgplot`` ``Series``, ``DataFrame`` Provides plotting functions from the Altair_ library.
-`pandas_path`_  ``path``   ``Index``, ``Series``     Provides `pathlib.Path`_ functions for Series.
-`pint-pandas`_  ``pint``   ``Series``, ``DataFrame`` Provides units support for numeric Series and DataFrames.
-`composeml`_    ``slice``  ``DataFrame``             Provides a generator for enhanced data slicing.
-=============== ========== ========================= ===============================================================
+=============== ============ ==================================== ===============================================================
+Library         Accessor     Classes                              Description
+=============== ============ ==================================== ===============================================================
+`cyberpandas`_  ``ip``       ``Series``                           Provides common operations for working with IP addresses.
+`pdvega`_       ``vgplot``   ``Series``, ``DataFrame``            Provides plotting functions from the Altair_ library.
+`pandas_path`_  ``path``     ``Index``, ``Series``                Provides `pathlib.Path`_ functions for Series.
+`pint-pandas`_  ``pint``     ``Series``, ``DataFrame``            Provides units support for numeric Series and DataFrames.
+`composeml`_    ``slice``    ``DataFrame``                        Provides a generator for enhanced data slicing.
+`datatest`_     ``validate`` ``Series``, ``DataFrame``, ``Index`` Provides validation, differences, and acceptance managers.
+=============== ============ ==================================== ===============================================================

 .. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest
 .. _pdvega: https://altair-viz.github.io/pdvega/
@@ -492,3 +493,4 @@ Library Accessor Classes Description
 .. _pathlib.Path: https://docs.python.org/3/library/pathlib.html
 .. _pint-pandas: https://github.com/hgrecco/pint-pandas
 .. _composeml: https://github.com/FeatureLabs/compose
+.. _datatest: https://datatest.readthedocs.io/

doc/source/whatsnew/v1.2.0.rst

+2-1
@@ -747,6 +747,7 @@ I/O
 - :meth:`DataFrame.to_html` was ignoring ``formatters`` argument for ``ExtensionDtype`` columns (:issue:`36525`)
 - Bumped minimum xarray version to 0.12.3 to avoid reference to the removed ``Panel`` class (:issue:`27101`)
 - :meth:`DataFrame.to_csv` was re-opening file-like handles that also implement ``os.PathLike`` (:issue:`38125`)
+- Bug in the conversion of a sliced ``pyarrow.Table`` with missing values to a DataFrame (:issue:`38525`)

 Period
 ^^^^^^
@@ -859,7 +860,7 @@ Other
 - Bug in :meth:`RangeIndex.difference` returning :class:`Int64Index` in some cases where it should return :class:`RangeIndex` (:issue:`38028`)
 - Fixed bug in :func:`assert_series_equal` when comparing a datetime-like array with an equivalent non extension dtype array (:issue:`37609`)
 - Bug in :func:`.is_bool_dtype` would raise when passed a valid string such as ``"boolean"`` (:issue:`38386`)
-
+- Fixed regression in logical operators raising ``ValueError`` when columns of :class:`DataFrame` are a :class:`CategoricalIndex` with unused categories (:issue:`38367`)

 .. ---------------------------------------------------------------------------


pandas/core/arrays/_arrow_utils.py

+1-1
@@ -30,7 +30,7 @@ def pyarrow_array_to_numpy_and_mask(arr, dtype):
     bitmask = buflist[0]
     if bitmask is not None:
         mask = pyarrow.BooleanArray.from_buffers(
-            pyarrow.bool_(), len(arr), [None, bitmask]
+            pyarrow.bool_(), len(arr), [None, bitmask], offset=arr.offset
         )
         mask = np.asarray(mask)
     else:
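
Note on the `offset=arr.offset` change: a sliced Arrow array shares its parent's validity bitmap and records where the slice starts, so reading the bitmap from bit 0 picks up the wrong elements' null flags. The sketch below is a pure-Python illustration of that idea, not pyarrow code; `unpack_validity` and the sample bitmap are hypothetical, and it assumes only Arrow's convention of least-significant-bit-first validity bits where 1 means "valid".

```python
from typing import List


def unpack_validity(bitmask: bytes, length: int, offset: int = 0) -> List[bool]:
    """Read `length` validity bits starting at bit `offset` (LSB-first)."""
    return [
        bool((bitmask[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


# Hypothetical parent array of 6 values where index 3 is null, so bit 3 is 0.
parent_bitmask = bytes([0b00110111])

# A slice [2:6] reuses the same buffer with offset=2 and length=4.
ignoring_offset = unpack_validity(parent_bitmask, 4)            # reads bits 0..3
honoring_offset = unpack_validity(parent_bitmask, 4, offset=2)  # reads bits 2..5

print(ignoring_offset)  # null flag lands on the wrong element
print(honoring_offset)  # null flag lands on the slice's second element
```

Without the offset, the null that belongs to the slice's second element is reported on its fourth, which matches the kind of bug described in the whatsnew entry for sliced tables with missing values.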

pandas/core/arrays/boolean.py

+4-17
@@ -10,15 +10,14 @@

 from pandas.core.dtypes.common import (
     is_bool_dtype,
-    is_extension_array_dtype,
     is_float,
     is_float_dtype,
     is_integer_dtype,
     is_list_like,
     is_numeric_dtype,
     pandas_dtype,
 )
-from pandas.core.dtypes.dtypes import register_extension_dtype
+from pandas.core.dtypes.dtypes import ExtensionDtype, register_extension_dtype
 from pandas.core.dtypes.missing import isna

 from pandas.core import ops
@@ -372,34 +371,22 @@ def astype(self, dtype, copy: bool = True) -> ArrayLike:
             if incompatible type with an BooleanDtype, equivalent of same_kind
             casting
         """
-        from pandas.core.arrays.string_ import StringDtype
-
         dtype = pandas_dtype(dtype)

-        if isinstance(dtype, BooleanDtype):
-            values, mask = coerce_to_array(self, copy=copy)
-            if not copy:
-                return self
-            else:
-                return BooleanArray(values, mask, copy=False)
-        elif isinstance(dtype, StringDtype):
-            return dtype.construct_array_type()._from_sequence(self, copy=False)
+        if isinstance(dtype, ExtensionDtype):
+            return super().astype(dtype, copy)

         if is_bool_dtype(dtype):
             # astype_nansafe converts np.nan to True
             if self._hasna:
                 raise ValueError("cannot convert float NaN to bool")
             else:
                 return self._data.astype(dtype, copy=copy)
-        if is_extension_array_dtype(dtype) and is_integer_dtype(dtype):
-            from pandas.core.arrays import IntegerArray

-            return IntegerArray(
-                self._data.astype(dtype.numpy_dtype), self._mask.copy(), copy=False
-            )
         # for integer, error if there are missing values
         if is_integer_dtype(dtype) and self._hasna:
             raise ValueError("cannot convert NA to integer")
+
         # for float dtype, ensure we use np.nan before casting (numpy cannot
         # deal with pd.NA)
         na_value = self._na_value

pandas/core/arrays/datetimelike.py

+1-1
@@ -346,7 +346,7 @@ def astype(self, dtype, copy=True):
         elif is_string_dtype(dtype) and not is_categorical_dtype(dtype):
             if is_extension_array_dtype(dtype):
                 arr_cls = dtype.construct_array_type()
-                return arr_cls._from_sequence(self, dtype=dtype)
+                return arr_cls._from_sequence(self, dtype=dtype, copy=copy)
             else:
                 return self._format_native_types()
         elif is_integer_dtype(dtype):

pandas/core/arrays/datetimes.py

+19-14
@@ -31,7 +31,6 @@
     is_categorical_dtype,
     is_datetime64_any_dtype,
     is_datetime64_dtype,
-    is_datetime64_ns_dtype,
     is_datetime64tz_dtype,
     is_dtype_equal,
     is_extension_array_dtype,
@@ -587,24 +586,30 @@ def astype(self, dtype, copy=True):
         # DatetimeLikeArrayMixin Super handles the rest.
         dtype = pandas_dtype(dtype)

-        if is_datetime64_ns_dtype(dtype) and not is_dtype_equal(dtype, self.dtype):
+        if is_dtype_equal(dtype, self.dtype):
+            if copy:
+                return self.copy()
+            return self
+
+        elif is_datetime64tz_dtype(dtype) and self.tz is None:
+            # FIXME: GH#33401 this does not match Series behavior
+            return self.tz_localize(dtype.tz)
+
+        elif is_datetime64tz_dtype(dtype):
             # GH#18951: datetime64_ns dtype but not equal means different tz
-            new_tz = getattr(dtype, "tz", None)
-            if getattr(self.dtype, "tz", None) is None:
-                return self.tz_localize(new_tz)
-            result = self.tz_convert(new_tz)
+            result = self.tz_convert(dtype.tz)
             if copy:
                 result = result.copy()
-            if new_tz is None:
-                # Do we want .astype('datetime64[ns]') to be an ndarray.
-                # The astype in Block._astype expects this to return an
-                # ndarray, but we could maybe work around it there.
-                result = result._data
             return result
-        elif is_datetime64tz_dtype(self.dtype) and is_dtype_equal(self.dtype, dtype):
+
+        elif dtype == "M8[ns]":
+            # we must have self.tz is None, otherwise we would have gone through
+            # the is_dtype_equal branch above.
+            result = self.tz_convert("UTC").tz_localize(None)
             if copy:
-                return self.copy()
-            return self
+                result = result.copy()
+            return result
+
         elif is_period_dtype(dtype):
             return self.to_period(freq=dtype.freq)
         return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
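
Note on the reordered `astype` branches: the new code distinguishes four cases (same dtype, naive to tz-aware, tz-aware to another tz, tz-aware to naive). The sketch below mirrors that branch order on single standard-library `datetime` values as a rough analogy; `astype_like` is a hypothetical helper, not a pandas function, and scalar `datetime` semantics are only assumed to parallel `tz_localize` / `tz_convert`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def astype_like(value: datetime, target_tz: Optional[timezone]) -> datetime:
    """Hypothetical scalar analogue of the four branches in the hunk above."""
    if value.tzinfo == target_tz:
        # Same "dtype": nothing to convert.
        return value
    if target_tz is not None and value.tzinfo is None:
        # Naive -> aware: attach the zone, keeping the wall-clock time.
        return value.replace(tzinfo=target_tz)
    if target_tz is not None:
        # Aware -> other zone: keep the instant, change the wall-clock time.
        return value.astimezone(target_tz)
    # Aware -> naive: express the instant in UTC, then drop the zone.
    return value.astimezone(timezone.utc).replace(tzinfo=None)


plus5 = timezone(timedelta(hours=5))
naive = datetime(2020, 1, 1, 12, 0)
aware = datetime(2020, 1, 1, 12, 0, tzinfo=plus5)

localized = astype_like(naive, plus5)
converted = astype_like(aware, timezone.utc)
stripped = astype_like(aware, None)
print(localized, converted, stripped, sep="\n")
```

The last case corresponds to the `tz_convert("UTC").tz_localize(None)` line: the result is the UTC wall-clock time with no zone attached.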

pandas/core/arrays/floating.py

+3-18
@@ -19,14 +19,13 @@
     is_object_dtype,
     pandas_dtype,
 )
-from pandas.core.dtypes.dtypes import register_extension_dtype
+from pandas.core.dtypes.dtypes import ExtensionDtype, register_extension_dtype
 from pandas.core.dtypes.missing import isna

 from pandas.core import ops
 from pandas.core.ops import invalid_comparison
 from pandas.core.tools.numeric import to_numeric

-from .masked import BaseMaskedDtype
 from .numeric import NumericArray, NumericDtype


@@ -332,24 +331,10 @@ def astype(self, dtype, copy: bool = True) -> ArrayLike:
             if incompatible type with an FloatingDtype, equivalent of same_kind
             casting
         """
-        from pandas.core.arrays.string_ import StringArray, StringDtype
-
         dtype = pandas_dtype(dtype)

-        # if the dtype is exactly the same, we can fastpath
-        if self.dtype == dtype:
-            # return the same object for copy=False
-            return self.copy() if copy else self
-        # if we are astyping to another nullable masked dtype, we can fastpath
-        if isinstance(dtype, BaseMaskedDtype):
-            # TODO deal with NaNs
-            data = self._data.astype(dtype.numpy_dtype, copy=copy)
-            # mask is copied depending on whether the data was copied, and
-            # not directly depending on the `copy` keyword
-            mask = self._mask if data is self._data else self._mask.copy()
-            return dtype.construct_array_type()(data, mask, copy=False)
-        elif isinstance(dtype, StringDtype):
-            return StringArray._from_sequence(self, copy=False)
+        if isinstance(dtype, ExtensionDtype):
+            return super().astype(dtype, copy=copy)

         # coerce
         if is_float_dtype(dtype):

pandas/core/arrays/integer.py

+3-17
@@ -9,7 +9,7 @@
 from pandas.compat.numpy import function as nv
 from pandas.util._decorators import cache_readonly

-from pandas.core.dtypes.base import register_extension_dtype
+from pandas.core.dtypes.base import ExtensionDtype, register_extension_dtype
 from pandas.core.dtypes.common import (
     is_bool_dtype,
     is_datetime64_dtype,
@@ -390,24 +390,10 @@ def astype(self, dtype, copy: bool = True) -> ArrayLike:
             if incompatible type with an IntegerDtype, equivalent of same_kind
             casting
         """
-        from pandas.core.arrays.masked import BaseMaskedDtype
-        from pandas.core.arrays.string_ import StringDtype
-
         dtype = pandas_dtype(dtype)

-        # if the dtype is exactly the same, we can fastpath
-        if self.dtype == dtype:
-            # return the same object for copy=False
-            return self.copy() if copy else self
-        # if we are astyping to another nullable masked dtype, we can fastpath
-        if isinstance(dtype, BaseMaskedDtype):
-            data = self._data.astype(dtype.numpy_dtype, copy=copy)
-            # mask is copied depending on whether the data was copied, and
-            # not directly depending on the `copy` keyword
-            mask = self._mask if data is self._data else self._mask.copy()
-            return dtype.construct_array_type()(data, mask, copy=False)
-        elif isinstance(dtype, StringDtype):
-            return dtype.construct_array_type()._from_sequence(self, copy=False)
+        if isinstance(dtype, ExtensionDtype):
+            return super().astype(dtype, copy=copy)

         # coerce
         if is_float_dtype(dtype):

pandas/core/arrays/masked.py

+27-1
@@ -5,16 +5,18 @@
 import numpy as np

 from pandas._libs import lib, missing as libmissing
-from pandas._typing import Scalar
+from pandas._typing import ArrayLike, Dtype, Scalar
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import cache_readonly, doc

 from pandas.core.dtypes.base import ExtensionDtype
 from pandas.core.dtypes.common import (
+    is_dtype_equal,
     is_integer,
     is_object_dtype,
     is_scalar,
     is_string_dtype,
+    pandas_dtype,
 )
 from pandas.core.dtypes.missing import isna, notna

@@ -229,6 +231,30 @@ def to_numpy(
             data = self._data.astype(dtype, copy=copy)
         return data

+    def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
+        dtype = pandas_dtype(dtype)
+
+        if is_dtype_equal(dtype, self.dtype):
+            if copy:
+                return self.copy()
+            return self
+
+        # if we are astyping to another nullable masked dtype, we can fastpath
+        if isinstance(dtype, BaseMaskedDtype):
+            # TODO deal with NaNs for FloatingArray case
+            data = self._data.astype(dtype.numpy_dtype, copy=copy)
+            # mask is copied depending on whether the data was copied, and
+            # not directly depending on the `copy` keyword
+            mask = self._mask if data is self._data else self._mask.copy()
+            cls = dtype.construct_array_type()
+            return cls(data, mask, copy=False)
+
+        if isinstance(dtype, ExtensionDtype):
+            eacls = dtype.construct_array_type()
+            return eacls._from_sequence(self, dtype=dtype, copy=copy)
+
+        raise NotImplementedError("subclass must implement astype to np.dtype")
+
     __array_priority__ = 1000  # higher than ndarray so ops dispatch to us

     def __array__(self, dtype=None) -> np.ndarray:
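
Note on the refactor pattern: the boolean, floating, and integer arrays now hand every extension-dtype target to a shared base `astype` and keep only the NumPy-dtype coercion for themselves. The sketch below is a toy model of that dispatch with hypothetical class names (`ToyExtensionDtype`, `ToyMaskedArray`, and so on); it returns labels instead of arrays and is not the pandas implementation.

```python
class ToyExtensionDtype:
    """Stand-in for an extension dtype (hypothetical)."""


class ToyMaskedDtype(ToyExtensionDtype):
    """Stand-in for a nullable masked dtype (hypothetical)."""


class ToyStringDtype(ToyExtensionDtype):
    """Stand-in for any other extension dtype (hypothetical)."""


class ToyMaskedArray:
    dtype = ToyMaskedDtype()

    def astype(self, dtype):
        # Shared logic lives in the base class, in the same order as the hunk above.
        if dtype is self.dtype:
            return "same dtype: return self or a copy"
        if isinstance(dtype, ToyMaskedDtype):
            return "masked fastpath: cast data, reuse or copy mask"
        if isinstance(dtype, ToyExtensionDtype):
            return "generic extension path: _from_sequence"
        raise NotImplementedError("subclass must implement astype to np.dtype")


class ToyIntegerArray(ToyMaskedArray):
    def astype(self, dtype):
        # Subclass keeps only what is specific to it.
        if isinstance(dtype, ToyExtensionDtype):
            return super().astype(dtype)
        return "subclass path: coerce to a NumPy dtype"


arr = ToyIntegerArray()
for target in (arr.dtype, ToyMaskedDtype(), ToyStringDtype(), "float64"):
    print(type(target).__name__, "->", arr.astype(target))
```

The base-class `NotImplementedError` is only reachable when a subclass forwards a non-extension dtype, which is why each subclass guards the `super()` call with the `isinstance` check.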

pandas/core/arrays/period.py

+20-7
@@ -27,7 +27,7 @@
     period_asfreq_arr,
 )
 from pandas._typing import AnyArrayLike
-from pandas.util._decorators import cache_readonly
+from pandas.util._decorators import cache_readonly, doc

 from pandas.core.dtypes.common import (
     TD64NS_DTYPE,
@@ -51,6 +51,10 @@
 from pandas.core.arrays import datetimelike as dtl
 import pandas.core.common as com

+_shared_doc_kwargs = {
+    "klass": "PeriodArray",
+}
+

 def _field_accessor(name: str, docstring=None):
     def f(self):
@@ -67,8 +71,8 @@ class PeriodArray(PeriodMixin, dtl.DatelikeOps):
     """
     Pandas ExtensionArray for storing Period data.

-    Users should use :func:`period_range` to create new instances.
-    Alternatively, :func:`array` can be used to create new instances
+    Users should use :func:`~pandas.period_array` to create new instances.
+    Alternatively, :func:`~pandas.array` can be used to create new instances
     from a sequence of Period scalars.

     Parameters
@@ -495,15 +499,19 @@ def _time_shift(self, periods, freq=None):
     def _box_func(self, x) -> Union[Period, NaTType]:
         return Period._from_ordinal(ordinal=x, freq=self.freq)

+    @doc(**_shared_doc_kwargs, other="PeriodIndex", other_name="PeriodIndex")
     def asfreq(self, freq=None, how: str = "E") -> "PeriodArray":
         """
-        Convert the Period Array/Index to the specified frequency `freq`.
+        Convert the {klass} to the specified frequency `freq`.
+
+        Equivalent to applying :meth:`pandas.Period.asfreq` with the given arguments
+        to each :class:`~pandas.Period` in this {klass}.

         Parameters
         ----------
         freq : str
             A frequency.
-        how : str {'E', 'S'}
+        how : str {{'E', 'S'}}, default 'E'
             Whether the elements should be aligned to the end
             or start within pa period.

@@ -514,8 +522,13 @@ def asfreq(self, freq=None, how: str = "E") -> "PeriodArray":

         Returns
         -------
-        Period Array/Index
-            Constructed with the new frequency.
+        {klass}
+            The transformed {klass} with the new frequency.
+
+        See Also
+        --------
+        {other}.asfreq: Convert each Period in a {other_name} to the given frequency.
+        Period.asfreq : Convert a :class:`~pandas.Period` object to the given frequency.

         Examples
         --------
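
Note on the doubled braces in `{{'E', 'S'}}`: once the docstring becomes a template with `{klass}`-style placeholders, literal braces have to be escaped or they would be read as placeholders too. The sketch below shows this with a hypothetical `fill_doc` decorator built on `str.format`; it assumes the templating behaves like `str.format` and is not the pandas `doc` decorator itself.

```python
def fill_doc(**params):
    """Hypothetical decorator: fill {name} placeholders in a docstring."""

    def decorator(func):
        func.__doc__ = func.__doc__.format(**params)
        return func

    return decorator


@fill_doc(klass="PeriodArray")
def asfreq(freq=None, how="E"):
    """
    Convert the {klass} to the specified frequency `freq`.

    how : str {{'E', 'S'}}, default 'E'
    """


print(asfreq.__doc__)
```

After formatting, `{klass}` is replaced by the class name and `{{'E', 'S'}}` collapses to a single pair of literal braces.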
