Commit 449accc

Merge branch 'main' into bug-range
2 parents 64a591a + 4e3d691 commit 449accc

65 files changed, +650 -496 lines changed

asv_bench/benchmarks/frame_methods.py

+3
@@ -611,6 +611,9 @@ def time_frame_duplicated(self):
     def time_frame_duplicated_wide(self):
         self.df2.duplicated()

+    def time_frame_duplicated_subset(self):
+        self.df.duplicated(subset=["a"])
+

 class XS:
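
Note on the call being benchmarked: the sketch below shows `DataFrame.duplicated` restricted to a single column, which is the case the new benchmark exercises. The frame and its values are invented for illustration; the benchmark's own `self.df` is not shown in this diff.

```python
import pandas as pd

# Hypothetical data; column "a" contains repeated values.
df = pd.DataFrame({"a": [1, 1, 2, 3, 3], "b": [10, 20, 30, 40, 50]})

# Flag rows whose value in "a" has already appeared (keep="first" is the default).
dup_mask = df.duplicated(subset=["a"])

# Keep only the first row for each distinct value of "a".
unique_rows = df[~dup_mask]
```

Only column "a" is consulted, so rows that differ in "b" are still flagged.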

doc/source/development/contributing_environment.rst

+1 -1
@@ -222,7 +222,7 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
    pyenv virtualenv <version> <name-to-give-it>

    # For instance:
-   pyenv virtualenv 3.7.6 pandas-dev
+   pyenv virtualenv 3.9.10 pandas-dev

    # Activate the virtualenv
    pyenv activate pandas-dev

doc/source/getting_started/comparison/comparison_with_sql.rst

+1 -1
@@ -18,7 +18,7 @@ structure.

    url = (
        "https://raw.github.com/pandas-dev"
-       "/pandas/master/pandas/tests/io/data/csv/tips.csv"
+       "/pandas/main/pandas/tests/io/data/csv/tips.csv"
    )
    tips = pd.read_csv(url)
    tips

doc/source/whatsnew/v1.3.0.rst

+1 -1
@@ -811,7 +811,7 @@ Other Deprecations
 - Deprecated allowing scalars to be passed to the :class:`Categorical` constructor (:issue:`38433`)
 - Deprecated constructing :class:`CategoricalIndex` without passing list-like data (:issue:`38944`)
 - Deprecated allowing subclass-specific keyword arguments in the :class:`Index` constructor, use the specific subclass directly instead (:issue:`14093`, :issue:`21311`, :issue:`22315`, :issue:`26974`)
-- Deprecated the :meth:`astype` method of datetimelike (``timedelta64[ns]``, ``datetime64[ns]``, ``Datetime64TZDtype``, ``PeriodDtype``) to convert to integer dtypes, use ``values.view(...)`` instead (:issue:`38544`)
+- Deprecated the :meth:`astype` method of datetimelike (``timedelta64[ns]``, ``datetime64[ns]``, ``Datetime64TZDtype``, ``PeriodDtype``) to convert to integer dtypes, use ``values.view(...)`` instead (:issue:`38544`). This deprecation was later reverted in pandas 1.4.0.
 - Deprecated :meth:`MultiIndex.is_lexsorted` and :meth:`MultiIndex.lexsort_depth`, use :meth:`MultiIndex.is_monotonic_increasing` instead (:issue:`32259`)
 - Deprecated keyword ``try_cast`` in :meth:`Series.where`, :meth:`Series.mask`, :meth:`DataFrame.where`, :meth:`DataFrame.mask`; cast results manually if desired (:issue:`38836`)
 - Deprecated comparison of :class:`Timestamp` objects with ``datetime.date`` objects. Instead of e.g. ``ts <= mydate`` use ``ts <= pd.Timestamp(mydate)`` or ``ts.date() <= mydate`` (:issue:`36131`)
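
Note on the reverted `astype` deprecation: the sketch below compares the two spellings the changelog entry mentions, casting a datetime Series with `astype` versus reinterpreting the underlying NumPy values with `view`. The sample dates are invented, and the claim that both give the same integers is an assumption that holds for timezone-naive data.

```python
import pandas as pd

# Hypothetical timezone-naive datetimes starting at the Unix epoch.
ser = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02"]))

# Cast through astype; pandas 1.3.x warned here, and 1.4.0 reverted that deprecation.
via_astype = ser.astype("int64")

# Reinterpret the underlying NumPy buffer as integers without converting it.
via_view = ser.to_numpy().view("int64")
```

Both results count time units since the epoch, so the first element is 0 in either spelling.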

doc/source/whatsnew/v1.4.0.rst

+175-145
Large diffs are not rendered by default.

doc/source/whatsnew/v1.5.0.rst

+7 -3
@@ -31,9 +31,11 @@ enhancement2

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
+- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`)
 - :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`)
 - Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`)
 - :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`)
+- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports `Numba <https://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`45428`)
 -

 .. ---------------------------------------------------------------------------
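
Note on the `allow_duplicates` entry above: a minimal sketch, assuming a pandas version that includes this change (1.5.0 or later). The index and its duplicated level name are invented for illustration.

```python
import pandas as pd

# Hypothetical MultiIndex whose two levels share the name "a".
mi = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=["a", "a"])

# Without allow_duplicates the conversion refuses to create duplicate column labels.
try:
    mi.to_frame()
    raised = False
except ValueError:
    raised = True

# Opting in produces a frame with two columns both labelled "a".
frame = mi.to_frame(index=False, allow_duplicates=True)
```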
@@ -145,6 +147,7 @@ Other Deprecations
 - Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`)
 - Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`)
 - Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`)
+- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`)
 -


@@ -153,7 +156,7 @@ Other Deprecations

 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
--
+- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`)
 -

 .. ---------------------------------------------------------------------------
@@ -203,7 +206,7 @@ Strings

 Interval
 ^^^^^^^^
--
+- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`)
 -

 Indexing
@@ -213,6 +216,7 @@ Indexing
 - Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`)
 - Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`)
 - Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`)
+- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtype :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`)
 -

 Missing
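
Note on the `NA` entry above (:issue:`44199`): a minimal sketch of the fixed behavior, assuming a pandas version that includes the fix. A float Series is used so that no dtype change is needed at all; the values are invented.

```python
import pandas as pd

# Hypothetical float64 Series.
ser = pd.Series([1.0, 2.0, 3.0])

# Setting pd.NA is treated like np.nan, so the Series is not upcast to object.
ser[0] = pd.NA

result_dtype = str(ser.dtype)
missing = ser.isna().tolist()
```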
@@ -228,7 +232,7 @@ MultiIndex
 I/O
 ^^^
 - Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`)
--
+- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`)

 Period
 ^^^^^^

pandas/_version.py

+1 -1
@@ -199,7 +199,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
         # refs/heads/ and refs/tags/ prefixes that would let us distinguish
         # between branches and tags. By ignoring refnames without digits, we
         # filter out many common branch names like "release" and
-        # "stabilization", as well as "HEAD" and "master".
+        # "stabilization", as well as "HEAD" and "main".
         tags = {r for r in refs if re.search(r"\d", r)}
         if verbose:
             print("discarding '%s', no digits" % ",".join(refs - tags))
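
Note on the heuristic described in the comment above: the sketch below applies the same set comprehension to a made-up set of refnames to show which ones survive. The refnames are hypothetical; only the `re.search(r"\d", r)` filter comes from the diff.

```python
import re

# Hypothetical refnames as they might appear after git's %d expansion.
refs = {"HEAD", "main", "release", "stabilization", "v1.4.0", "1.5.x"}

# Keep only refnames that contain at least one digit, as in the diff.
tags = {r for r in refs if re.search(r"\d", r)}

# Everything else is treated as a branch name and discarded.
discarded = refs - tags
```

A branch whose name happens to contain a digit would also pass this filter, which is why the original comment calls it a heuristic.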

pandas/core/array_algos/putmask.py

+2
@@ -126,6 +126,8 @@ def putmask_without_repeat(
     mask : np.ndarray[bool]
     new : Any
     """
+    new = setitem_datetimelike_compat(values, mask.sum(), new)
+
     if getattr(new, "ndim", 0) >= 1:
         new = new.astype(values.dtype, copy=False)

pandas/core/arrays/base.py

+17 -4
@@ -427,7 +427,7 @@ def __contains__(self, item: object) -> bool | np.bool_:
             if not self._can_hold_na:
                 return False
             elif item is self.dtype.na_value or isinstance(item, self.dtype.type):
-                return self._hasnans
+                return self._hasna
             else:
                 return False
         else:
@@ -606,7 +606,7 @@ def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll:
         raise AbstractMethodError(self)

     @property
-    def _hasnans(self) -> bool:
+    def _hasna(self) -> bool:
         # GH#22680
         """
         Equivalent to `self.isna().any()`.
@@ -628,6 +628,16 @@ def _values_for_argsort(self) -> np.ndarray:
         See Also
         --------
         ExtensionArray.argsort : Return the indices that would sort this array.
+
+        Notes
+        -----
+        The caller is responsible for *not* modifying these values in-place, so
+        it is safe for implementors to give views on `self`.
+
+        Functions that use this (e.g. ExtensionArray.argsort) should ignore
+        entries with missing values in the original array (according to `self.isna()`).
+        This means that the corresponding entries in the returned array don't need to
+        be modified to sort correctly.
         """
         # Note: this is used in `ExtensionArray.argsort`.
         return np.array(self)
@@ -698,7 +708,7 @@ def argmin(self, skipna: bool = True) -> int:
         ExtensionArray.argmax
         """
         validate_bool_kwarg(skipna, "skipna")
-        if not skipna and self._hasnans:
+        if not skipna and self._hasna:
             raise NotImplementedError
         return nargminmax(self, "argmin")

@@ -722,7 +732,7 @@ def argmax(self, skipna: bool = True) -> int:
         ExtensionArray.argmin
         """
         validate_bool_kwarg(skipna, "skipna")
-        if not skipna and self._hasnans:
+        if not skipna and self._hasna:
             raise NotImplementedError
         return nargminmax(self, "argmax")

@@ -1534,6 +1544,9 @@ def _empty(cls, shape: Shape, dtype: ExtensionDtype):
         ExtensionDtype.empty
             ExtensionDtype.empty is the 'official' public version of this API.
         """
+        # Implementer note: while ExtensionDtype.empty is the public way to
+        # call this method, it is still required to implement this `_empty`
+        # method as well (it is called internally in pandas)
         obj = cls._from_sequence([], dtype=dtype)

         taker = np.broadcast_to(np.intp(-1), shape)
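
Note on `_hasna` and `__contains__`: the docstring above says the renamed property is equivalent to `self.isna().any()`. The sketch below uses only that public spelling and the `in` operator, on invented nullable-integer arrays, to show the behavior the `__contains__` branch relies on. It does not call the private `_hasna` attribute.

```python
import pandas as pd

# Hypothetical nullable-integer arrays, one with a missing value and one without.
with_na = pd.array([1, None, 3], dtype="Int64")
without_na = pd.array([1, 2, 3], dtype="Int64")

# Public equivalent of the private property, per its docstring.
has_na = bool(with_na.isna().any())

# Membership tests for the dtype's NA value follow the same answer.
na_in_with = pd.NA in with_na
na_in_without = pd.NA in without_na
```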

pandas/core/arrays/boolean.py

-18
@@ -421,24 +421,6 @@ def astype(self, dtype: AstypeArg, copy: bool = True) -> ArrayLike:
         # coerce
         return self.to_numpy(dtype=dtype, na_value=na_value, copy=False)

-    def _values_for_argsort(self) -> np.ndarray:
-        """
-        Return values for sorting.
-
-        Returns
-        -------
-        ndarray
-            The transformed values should maintain the ordering between values
-            within the array.
-
-        See Also
-        --------
-        ExtensionArray.argsort : Return the indices that would sort this array.
-        """
-        data = self._data.copy()
-        data[self._mask] = -1
-        return data
-
     def _logical_method(self, other, op):

         assert op.__name__ in {"or_", "ror_", "and_", "rand_", "xor", "rxor"}

pandas/core/arrays/datetimelike.py

+3 -11
@@ -430,14 +430,6 @@ def astype(self, dtype, copy: bool = True):
         elif is_integer_dtype(dtype):
             # we deliberately ignore int32 vs. int64 here.
             # See https://github.com/pandas-dev/pandas/issues/24381 for more.
-            warnings.warn(
-                f"casting {self.dtype} values to int64 with .astype(...) is "
-                "deprecated and will raise in a future version. "
-                "Use .view(...) instead.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-
             values = self.asi8

             if is_unsigned_integer_dtype(dtype):
@@ -849,7 +841,7 @@ def _isnan(self) -> npt.NDArray[np.bool_]:
         return self.asi8 == iNaT

     @property  # NB: override with cache_readonly in immutable subclasses
-    def _hasnans(self) -> bool:
+    def _hasna(self) -> bool:
         """
         return if I have any nans; enables various perf speedups
         """
@@ -874,7 +866,7 @@ def _maybe_mask_results(

         This is an internal routine.
         """
-        if self._hasnans:
+        if self._hasna:
             if convert:
                 result = result.astype(convert)
             if fill_value is None:
@@ -1133,7 +1125,7 @@ def _add_timedelta_arraylike(self, other):
         new_values = checked_add_with_arr(
             self_i8, other_i8, arr_mask=self._isnan, b_mask=other._isnan
         )
-        if self._hasnans or other._hasnans:
+        if self._hasna or other._hasna:
             mask = self._isnan | other._isnan
             np.putmask(new_values, mask, iNaT)
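
Note on the masking step in `_add_timedelta_arraylike`: the last hunk re-inserts `iNaT` wherever either operand was missing. The sketch below shows the user-visible effect through the public API only, with invented values; it does not call the private method.

```python
import pandas as pd

# Hypothetical operands; the second datetime is missing (NaT).
dti = pd.to_datetime(["2022-01-01", None])
tdi = pd.to_timedelta(["1D", "2D"])

# Element-wise addition; a missing operand yields a missing result.
result = dti + tdi

first = result[0]
missing = result.isna().tolist()
```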

pandas/core/arrays/datetimes.py

+2 -2
@@ -738,7 +738,7 @@ def _sub_datetime_arraylike(self, other):
         other_i8 = other.asi8
         arr_mask = self._isnan | other._isnan
         new_values = checked_add_with_arr(self_i8, -other_i8, arr_mask=arr_mask)
-        if self._hasnans or other._hasnans:
+        if self._hasna or other._hasna:
             np.putmask(new_values, arr_mask, iNaT)
         return new_values.view("timedelta64[ns]")

@@ -1356,7 +1356,7 @@ def isocalendar(self) -> DataFrame:
         iso_calendar_df = DataFrame(
             sarray, columns=["year", "week", "day"], dtype="UInt32"
         )
-        if self._hasnans:
+        if self._hasna:
             iso_calendar_df.iloc[self._isnan] = None
         return iso_calendar_df

pandas/core/arrays/floating.py

-3
@@ -244,9 +244,6 @@ def _coerce_to_array(
     ) -> tuple[np.ndarray, np.ndarray]:
         return coerce_to_array(value, dtype=dtype, copy=copy)

-    def _values_for_argsort(self) -> np.ndarray:
-        return self._data
-

 _dtype_docstring = """
 An ExtensionDtype for {dtype} data.

pandas/core/arrays/integer.py

-19
@@ -301,25 +301,6 @@ def _coerce_to_array(
     ) -> tuple[np.ndarray, np.ndarray]:
         return coerce_to_array(value, dtype=dtype, copy=copy)

-    def _values_for_argsort(self) -> np.ndarray:
-        """
-        Return values for sorting.
-
-        Returns
-        -------
-        ndarray
-            The transformed values should maintain the ordering between values
-            within the array.
-
-        See Also
-        --------
-        ExtensionArray.argsort : Return the indices that would sort this array.
-        """
-        data = self._data.copy()
-        if self._mask.any():
-            data[self._mask] = data.min() - 1
-        return data
-

 _dtype_docstring = """
 An ExtensionDtype for {dtype} integer data.

pandas/core/arrays/interval.py

+3 -1
@@ -1140,7 +1140,9 @@ def _validate_setitem_value(self, value):
             return self._validate_listlike(value)

         if needs_float_conversion:
-            raise ValueError("Cannot set float NaN to integer-backed IntervalArray")
+            # GH#45484 TypeError, not ValueError, matches what we get with
+            # non-NA un-holdable value.
+            raise TypeError("Cannot set float NaN to integer-backed IntervalArray")
         return value_left, value_right

     def value_counts(self, dropna: bool = True):
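
Note on the exception change above: a minimal sketch that triggers this code path through the public `IntervalArray` API. The array is invented; with this commit applied the error is expected to be a `TypeError`, while versions before the change raised `ValueError`, so the sketch catches both and records which one occurred.

```python
import numpy as np
import pandas as pd

# Hypothetical integer-backed IntervalArray: (0, 1] and (1, 2].
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2])

# NaN cannot be stored in integer-backed bounds, so the assignment is rejected.
try:
    arr[0] = np.nan
    error_name = None
except (TypeError, ValueError) as err:
    error_name = type(err).__name__
```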

pandas/core/arrays/masked.py

+4
@@ -757,6 +757,10 @@ def factorize(self, na_sentinel: int = -1) -> tuple[np.ndarray, ExtensionArray]:
         uniques_ea = type(self)(uniques, np.zeros(len(uniques), dtype=bool))
         return codes, uniques_ea

+    @doc(ExtensionArray._values_for_argsort)
+    def _values_for_argsort(self) -> np.ndarray:
+        return self._data
+
     def value_counts(self, dropna: bool = True) -> Series:
         """
         Returns a Series containing counts of each unique value.
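
Note on why returning `self._data` unmodified is enough: per the docstring notes added in `base.py`, `argsort` ignores positions that `isna()` reports as missing, so placeholder values under the mask do not affect the order. The sketch below checks this through the public `argsort` on an invented nullable-integer array, relying on the default `na_position="last"`.

```python
import pandas as pd

# Hypothetical nullable-integer array with one missing entry in the middle.
arr = pd.array([3, None, 1], dtype="Int64")

# Positions that would sort the array; the missing entry is placed last by default.
order = arr.argsort()

# Reorder the array using those positions.
sorted_arr = arr.take(order)
```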

pandas/core/arrays/period.py

+5-5
Original file line numberDiff line numberDiff line change
@@ -548,7 +548,7 @@ def _time_shift(self, periods: int, freq=None) -> PeriodArray:
548548
f"{type(self).__name__}._time_shift"
549549
)
550550
values = self.asi8 + periods * self.freq.n
551-
if self._hasnans:
551+
if self._hasna:
552552
values[self._isnan] = iNaT
553553
return type(self)(values, freq=self.freq)
554554

@@ -618,7 +618,7 @@ def asfreq(self, freq=None, how: str = "E") -> PeriodArray:
618618

619619
new_data = period_asfreq_arr(ordinal, base1, base2, end)
620620

621-
if self._hasnans:
621+
if self._hasna:
622622
new_data[self._isnan] = iNaT
623623

624624
return type(self)(new_data, freq=freq)
@@ -645,7 +645,7 @@ def _format_native_types(
645645
else:
646646
formatter = lambda dt: str(dt)
647647

648-
if self._hasnans:
648+
if self._hasna:
649649
mask = self._isnan
650650
values[mask] = na_rep
651651
imask = ~mask
@@ -712,7 +712,7 @@ def _sub_period(self, other):
712712
new_data = asi8 - other.ordinal
713713
new_data = np.array([self.freq * x for x in new_data])
714714

715-
if self._hasnans:
715+
if self._hasna:
716716
new_data[self._isnan] = NaT
717717

718718
return new_data
@@ -739,7 +739,7 @@ def _sub_period_array(self, other):
739739
)
740740

741741
new_values = np.array([self.freq.base * x for x in new_values])
742-
if self._hasnans or other._hasnans:
742+
if self._hasna or other._hasna:
743743
mask = self._isnan | other._isnan
744744
new_values[mask] = NaT
745745
return new_values

pandas/core/arrays/timedeltas.py

+2 -2
@@ -82,7 +82,7 @@ def _field_accessor(name: str, alias: str, docstring: str):
     def f(self) -> np.ndarray:
         values = self.asi8
         result = get_timedelta_field(values, alias)
-        if self._hasnans:
+        if self._hasna:
             result = self._maybe_mask_results(
                 result, fill_value=None, convert="float64"
             )
@@ -911,7 +911,7 @@ def components(self) -> DataFrame:
             "microseconds",
             "nanoseconds",
         ]
-        hasnans = self._hasnans
+        hasnans = self._hasna
         if hasnans:

             def f(x):
