Commit 31f48ed

Merge remote-tracking branch 'upstream/master' into numba
2 parents b10722f + db95fdc commit 31f48ed

94 files changed, +1970 -1276 lines changed

.github/workflows/ci.yml

+5 -1
@@ -153,13 +153,17 @@ jobs:
 run: |
   source activate pandas-dev
   pytest pandas/tests/frame/methods --array-manager
+  pytest pandas/tests/frame/test_reductions.py --array-manager
+  pytest pandas/tests/reductions/ --array-manager
+  pytest pandas/tests/generic/test_generic.py --array-manager
   pytest pandas/tests/arithmetic/ --array-manager
+  pytest pandas/tests/groupby/aggregate/ --array-manager
   pytest pandas/tests/reshape/merge --array-manager

   # indexing subset (temporary since other tests don't pass yet)
   pytest pandas/tests/frame/indexing/test_indexing.py::TestDataFrameIndexing::test_setitem_boolean --array-manager
   pytest pandas/tests/frame/indexing/test_where.py --array-manager
-  pytest pandas/tests/frame/indexing/test_indexing.py::TestDataFrameIndexing::test_setitem_multi_index --array-manager
+  pytest pandas/tests/frame/indexing/test_setitem.py::TestDataFrameSetItem::test_setitem_multi_index --array-manager
   pytest pandas/tests/frame/indexing/test_setitem.py::TestDataFrameSetItem::test_setitem_listlike_indexer_duplicate_columns --array-manager
   pytest pandas/tests/indexing/multiindex/test_setitem.py::TestMultiIndexSetItem::test_astype_assignment_with_dups --array-manager
   pytest pandas/tests/indexing/multiindex/test_setitem.py::TestMultiIndexSetItem::test_frame_setitem_multi_column --array-manager

doc/source/whatsnew/v1.2.3.rst

+3 -1
@@ -16,8 +16,10 @@ Fixed regressions
 ~~~~~~~~~~~~~~~~~

 - Fixed regression in :meth:`~DataFrame.to_excel` raising ``KeyError`` when giving duplicate columns with ``columns`` attribute (:issue:`39695`)
-- Fixed regression in :class:`IntegerArray` unary ops propagating mask on assignment (:issue:`39943`)
+- Fixed regression in nullable integer unary ops propagating mask on assignment (:issue:`39943`)
 - Fixed regression in :meth:`DataFrame.__setitem__` not aligning :class:`DataFrame` on right-hand side for boolean indexer (:issue:`39931`)
+- Fixed regression in :meth:`~DataFrame.to_json` failing to use ``compression`` with URL-like paths that are internally opened in binary mode or with user-provided file objects that are opened in binary mode (:issue:`39985`)
+-

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.3.0.rst

+9 -1
@@ -308,6 +308,7 @@ Timedelta
 - Bug in constructing :class:`Timedelta` from ``np.timedelta64`` objects with non-nanosecond units that are out of bounds for ``timedelta64[ns]`` (:issue:`38965`)
 - Bug in constructing a :class:`TimedeltaIndex` incorrectly accepting ``np.datetime64("NaT")`` objects (:issue:`39462`)
 - Bug in constructing :class:`Timedelta` from input string with only symbols and no digits failed to raise an error (:issue:`39710`)
+- Bug in :class:`TimedeltaIndex` and :func:`to_timedelta` failing to raise when passed non-nanosecond ``timedelta64`` arrays that overflow when converting to ``timedelta64[ns]`` (:issue:`40008`)

 Timezones
 ^^^^^^^^^
@@ -323,6 +324,8 @@ Numeric
 - Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
 - Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError`` (:issue:`38932`)
 - Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
+- Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
+- Bug in :meth:`DataFrame.transform` would raise ``SpecificationError`` when passed a dictionary and columns were missing; will now raise a ``KeyError`` instead (:issue:`40004`)
 -

 Conversion
@@ -376,7 +379,7 @@ Missing
 ^^^^^^^

 - Bug in :class:`Grouper` now correctly propagates ``dropna`` argument and :meth:`DataFrameGroupBy.transform` now correctly handles missing values for ``dropna=True`` (:issue:`35612`)
--
+- Bug in :func:`isna`, and :meth:`Series.isna`, :meth:`Index.isna`, :meth:`DataFrame.isna` (and the corresponding ``notna`` functions) not recognizing ``Decimal("NaN")`` objects (:issue:`39409`)
 -

 MultiIndex
@@ -439,6 +442,10 @@ Groupby/resample/rolling
 - Bug in :meth:`core.window.expanding.ExpandingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.cov` where 1 would be returned instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
 - Bug in :meth:`.GroupBy.mean`, :meth:`.GroupBy.median` and :meth:`DataFrame.pivot_table` not propagating metadata (:issue:`28283`)
 - Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly when window is an offset and dates are in descending order (:issue:`40002`)
+- Bug in :class:`SeriesGroupBy` and :class:`DataFrameGroupBy` on an empty ``Series`` or ``DataFrame`` would lose index, columns, and/or data types when directly using the methods ``idxmax``, ``idxmin``, ``mad``, ``min``, ``max``, ``sum``, ``prod``, and ``skew`` or using them through ``apply``, ``aggregate``, or ``resample`` (:issue:`26411`)
+- Bug in :meth:`DataFrameGroupBy.apply` where a :class:`MultiIndex` would be created instead of an :class:`Index` if a :class:`core.window.rolling.RollingGroupby` object was created (:issue:`39732`)
+- Bug in :meth:`DataFrameGroupBy.sample` where error was raised when ``weights`` was specified and the index was an :class:`Int64Index` (:issue:`39927`)
+- Bug in :meth:`DataFrameGroupBy.aggregate` and :meth:`.Resampler.aggregate` would sometimes raise ``SpecificationError`` when passed a dictionary and columns were missing; will now always raise a ``KeyError`` instead (:issue:`40004`)
 -

 Reshaping
@@ -454,6 +461,7 @@ Reshaping
 - Bug in :meth:`DataFrame.sort_values` not reshaping index correctly after sorting on columns, when ``ignore_index=True`` (:issue:`39464`)
 - Bug in :meth:`DataFrame.append` returning incorrect dtypes with combinations of ``ExtensionDtype`` dtypes (:issue:`39454`)
 - Bug in :meth:`DataFrame.append` returning incorrect dtypes with combinations of ``datetime64`` and ``timedelta64`` dtypes (:issue:`39574`)
+- Bug in :meth:`DataFrame.pivot_table` returning a ``MultiIndex`` for a single value when operating on an empty ``DataFrame`` (:issue:`13483`)

 Sparse
 ^^^^^^

pandas/_libs/missing.pyx

+17 -1
@@ -1,3 +1,4 @@
+from decimal import Decimal
 import numbers

 import cython
@@ -36,6 +37,8 @@ cdef:

     bint is_32bit = not IS64

+    type cDecimal = Decimal  # for faster isinstance checks
+

 cpdef bint is_matching_na(object left, object right, bint nan_matches_none=False):
     """
@@ -86,6 +89,8 @@ cpdef bint is_matching_na(object left, object right, bint nan_matches_none=False
             and util.is_timedelta64_object(right)
             and get_timedelta64_value(right) == NPY_NAT
         )
+    elif is_decimal_na(left):
+        return is_decimal_na(right)
     return False


@@ -113,7 +118,18 @@ cpdef bint checknull(object val):
     The difference between `checknull` and `checknull_old` is that `checknull`
     does *not* consider INF or NEGINF to be NA.
     """
-    return val is C_NA or is_null_datetimelike(val, inat_is_null=False)
+    return (
+        val is C_NA
+        or is_null_datetimelike(val, inat_is_null=False)
+        or is_decimal_na(val)
+    )
+
+
+cdef inline bint is_decimal_na(object val):
+    """
+    Is this a decimal.Decimal object Decimal("NAN").
+    """
+    return isinstance(val, cDecimal) and val != val


 cpdef bint checknull_old(object val):
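
The `is_decimal_na` helper in this diff relies on NaN being a value that compares unequal to itself. Since the Cython `cdef inline` function is not callable from Python, the following is only a minimal pure-Python sketch of the same check; the name `is_decimal_nan` is hypothetical and not part of pandas.

```python
from decimal import Decimal


def is_decimal_nan(val) -> bool:
    """Hypothetical pure-Python stand-in for the Cython is_decimal_na helper."""
    # isinstance runs first, so the self-comparison only happens for Decimal objects.
    return isinstance(val, Decimal) and val != val


print(is_decimal_nan(Decimal("NaN")))  # True
print(is_decimal_nan(Decimal("1.5")))  # False
print(is_decimal_nan(float("nan")))    # False: a float NaN is not a Decimal
```

Putting the `isinstance` test first means non-Decimal objects never reach the `val != val` comparison, which is presumably why the diff aliases `Decimal` to a typed `cDecimal` for a cheaper check.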

pandas/_libs/tslibs/conversion.pyx

+7 -3
@@ -239,6 +239,11 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True):
         return result

     unit = get_datetime64_unit(arr.flat[0])
+    if unit == NPY_DATETIMEUNIT.NPY_FR_GENERIC:
+        # without raising explicitly here, we end up with a SystemError
+        # built-in function ensure_datetime64ns returned a result with an error
+        raise ValueError("datetime64/timedelta64 must have a unit specified")
+
     if unit == NPY_FR_ns:
         if copy:
             arr = arr.copy()
@@ -289,9 +294,8 @@ def ensure_timedelta64ns(arr: ndarray, copy: bool=True):
         else:
             bad_val = tdmax

-        raise OutOfBoundsTimedelta(
-            f"Out of bounds for nanosecond {arr.dtype.name} {bad_val}"
-        )
+        msg = f"Out of bounds for nanosecond {arr.dtype.name} {str(bad_val)}"
+        raise OutOfBoundsTimedelta(msg)

     return dt64_result.view(TD64NS_DTYPE)

pandas/_libs/tslibs/timedeltas.pyx

+5 -1
@@ -347,9 +347,13 @@ def array_to_timedelta64(ndarray[object] values, str unit=None, str errors="rais
         for i in range(n):
             try:
                 result[i] = convert_to_timedelta64(values[i], parsed_unit)
-            except ValueError:
+            except ValueError as err:
                 if errors == 'coerce':
                     result[i] = NPY_NAT
+                elif "unit abbreviation w/o a number" in str(err):
+                    # re-raise with more pertinent message
+                    msg = f"Could not convert '{values[i]}' to NumPy timedelta"
+                    raise ValueError(msg) from err
                 else:
                     raise
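
The hunk above catches a `ValueError`, inspects its message, and re-raises a clearer error with `raise ... from err` so the original exception stays attached as `__cause__`. The sketch below reproduces that control flow in plain Python. It assumes a made-up converter, `parse_count`, in place of `convert_to_timedelta64`, and uses `None` where the real code stores `NPY_NAT`; neither name comes from pandas.

```python
def parse_count(text: str) -> int:
    """Hypothetical converter standing in for convert_to_timedelta64."""
    if not text.strip("+-").isdigit():
        raise ValueError("unit abbreviation w/o a number")
    return int(text)


def convert_all(values, errors="raise"):
    """Convert each value, mirroring the coerce / re-raise / raise branches."""
    result = []
    for value in values:
        try:
            result.append(parse_count(value))
        except ValueError as err:
            if errors == "coerce":
                result.append(None)  # stand-in for NPY_NAT
            elif "unit abbreviation w/o a number" in str(err):
                # re-raise with a message that names the offending value
                msg = f"Could not convert '{value}' to NumPy timedelta"
                raise ValueError(msg) from err
            else:
                raise
    return result


print(convert_all(["1", "abc"], errors="coerce"))  # [1, None]
```

Because of the `from err` clause, a traceback from `convert_all(["abc"])` shows both the new message and the original "unit abbreviation w/o a number" error.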

pandas/_testing/__init__.py

+2 -1
@@ -2,6 +2,7 @@

 import collections
 from datetime import datetime
+from decimal import Decimal
 from functools import wraps
 import operator
 import os
@@ -146,7 +147,7 @@
     + BYTES_DTYPES
 )

-NULL_OBJECTS = [None, np.nan, pd.NaT, float("nan"), pd.NA]
+NULL_OBJECTS = [None, np.nan, pd.NaT, float("nan"), pd.NA, Decimal("NaN")]

 EMPTY_STRING_PATTERN = re.compile("^$")

pandas/_testing/asserters.py

+6 -18
@@ -7,6 +7,7 @@
 import numpy as np

 from pandas._libs.lib import no_default
+from pandas._libs.missing import is_matching_na
 import pandas._libs.testing as _testing

 from pandas.core.dtypes.common import (
@@ -18,6 +19,7 @@
     is_numeric_dtype,
     needs_i8_conversion,
 )
+from pandas.core.dtypes.dtypes import PandasDtype
 from pandas.core.dtypes.missing import array_equivalent

 import pandas as pd
@@ -457,22 +459,8 @@ def assert_attr_equal(attr: str, left, right, obj: str = "Attributes"):

     if left_attr is right_attr:
         return True
-    elif (
-        is_number(left_attr)
-        and np.isnan(left_attr)
-        and is_number(right_attr)
-        and np.isnan(right_attr)
-    ):
-        # np.nan
-        return True
-    elif (
-        isinstance(left_attr, (np.datetime64, np.timedelta64))
-        and isinstance(right_attr, (np.datetime64, np.timedelta64))
-        and type(left_attr) is type(right_attr)
-        and np.isnat(left_attr)
-        and np.isnat(right_attr)
-    ):
-        # np.datetime64("nat") or np.timedelta64("nat")
+    elif is_matching_na(left_attr, right_attr):
+        # e.g. both np.nan, both NaT, both pd.NA, ...
         return True

     try:
@@ -630,12 +618,12 @@ def raise_assert_detail(obj, message, left, right, diff=None, index_values=None)

     if isinstance(left, np.ndarray):
         left = pprint_thing(left)
-    elif is_categorical_dtype(left):
+    elif is_categorical_dtype(left) or isinstance(left, PandasDtype):
         left = repr(left)

     if isinstance(right, np.ndarray):
         right = pprint_thing(right)
-    elif is_categorical_dtype(right):
+    elif is_categorical_dtype(right) or isinstance(right, PandasDtype):
         right = repr(right)

     msg += f"""
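
In `assert_attr_equal`, this diff replaces two hand-written NaN/NaT branches with one `is_matching_na` call, which treats two missing values as equal only when they are the same kind of missing value. The sketch below illustrates that idea using the standard library alone. `matching_na_sketch` is a hypothetical, heavily simplified stand-in: it covers only `None`, float NaN, and `Decimal` NaN, and omits the `nan_matches_none` option and the datetime-like cases of the real function.

```python
import math
from decimal import Decimal


def matching_na_sketch(left, right) -> bool:
    """Simplified, hypothetical stand-in for pandas' is_matching_na."""
    if left is None:
        return right is None
    if isinstance(left, float) and math.isnan(left):
        return isinstance(right, float) and math.isnan(right)
    if isinstance(left, Decimal) and left.is_nan():
        return isinstance(right, Decimal) and right.is_nan()
    # Anything that is not a recognized missing value never "matches".
    return False


print(matching_na_sketch(float("nan"), float("nan")))      # True
print(matching_na_sketch(Decimal("NaN"), Decimal("NaN")))  # True
print(matching_na_sketch(Decimal("NaN"), float("nan")))    # False: different kinds
print(matching_na_sketch(None, float("nan")))              # False
```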

pandas/conftest.py

+1 -1
@@ -304,7 +304,7 @@ def nselect_method(request):
 # ----------------------------------------------------------------
 # Missing values & co.
 # ----------------------------------------------------------------
-@pytest.fixture(params=tm.NULL_OBJECTS, ids=str)
+@pytest.fixture(params=tm.NULL_OBJECTS, ids=lambda x: type(x).__name__)
 def nulls_fixture(request):
     """
     Fixture for each null type in pandas.
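
The fixture change swaps `ids=str` for a callable that returns each parameter's type name. The commit does not state why, so the sketch below only shows how the two schemes label the same objects. It uses a standard-library subset of `NULL_OBJECTS` and the illustrative names `null_objects`, `ids_by_str`, and `ids_by_type`, and it does not invoke pytest.

```python
from decimal import Decimal

# Subset of NULL_OBJECTS that needs neither numpy nor pandas (illustrative only).
null_objects = [None, float("nan"), Decimal("NaN")]

ids_by_str = [str(x) for x in null_objects]              # old scheme: ids=str
ids_by_type = [type(x).__name__ for x in null_objects]   # new scheme: type name

print(ids_by_str)   # ['None', 'nan', 'NaN']
print(ids_by_type)  # ['NoneType', 'float', 'Decimal']
```

With `ids=str`, the float and `Decimal` NaN ids differ only by letter case, whereas the type-name ids say which kind of null object a test ran against.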
