Commit 40ca9d1 — Merge branch 'master' into ref-hybrid-3
2 parents: 874023e + 855696c

34 files changed: +621 −198 lines

doc/source/_static/style/hq_ax1.png — binary, 5.95 KB → 5.96 KB

doc/source/_static/style/hq_props.png — binary, 6.09 KB

doc/source/development/roadmap.rst — 2 additions & 2 deletions

@@ -71,8 +71,8 @@ instead of comparing as False).
 
 Long term, we want to introduce consistent missing data handling for all data
 types. This includes consistent behavior in all operations (indexing, arithmetic
-operations, comparisons, etc.). We want to eventually make the new semantics the
-default.
+operations, comparisons, etc.). There has been discussion of eventually making
+the new semantics the default.
 
 This has been discussed at
 `github #28095 <https://github.com/pandas-dev/pandas/issues/28095>`__ (and

doc/source/getting_started/intro_tutorials/01_table_oriented.rst — 1 addition & 1 deletion

@@ -176,7 +176,7 @@ these are by default not taken into account by the :func:`~DataFrame.describe` m
 
 Many pandas operations return a ``DataFrame`` or a ``Series``. The
 :func:`~DataFrame.describe` method is an example of a pandas operation returning a
-pandas ``Series``.
+pandas ``Series`` or a pandas ``DataFrame``.
 
 .. raw:: html

doc/source/whatsnew/v1.3.0.rst — 9 additions & 1 deletion

@@ -119,7 +119,9 @@ to accept more universal CSS language for arguments, such as ``'color:red;'`` in
 to allow custom CSS highlighting instead of default background coloring (:issue:`40242`).
 Enhancements to other built-in methods include extending the :meth:`.Styler.background_gradient`
 method to shade elements based on a given gradient map and not be restricted only to
-values in the DataFrame (:issue:`39930` :issue:`22727` :issue:`28901`).
+values in the DataFrame (:issue:`39930` :issue:`22727` :issue:`28901`). Additional
+built-in methods such as :meth:`.Styler.highlight_between` and :meth:`.Styler.highlight_quantile`
+have been added (:issue:`39821` and :issue:`40926`).
 
 The :meth:`.Styler.apply` now consistently allows functions with ``ndarray`` output to
 allow more flexible development of UDFs when ``axis`` is ``None`` ``0`` or ``1`` (:issue:`39393`).

@@ -219,6 +221,7 @@ Other enhancements
 - :meth:`pandas.read_csv` and :meth:`pandas.read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
+- Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`)
 -
 
 .. ---------------------------------------------------------------------------

@@ -601,8 +604,10 @@ Deprecations
 - Deprecated :meth:`.Styler.set_na_rep` and :meth:`.Styler.set_precision` in favour of :meth:`.Styler.format` with ``na_rep`` and ``precision`` as existing and new input arguments respectively (:issue:`40134`, :issue:`40425`)
 - Deprecated allowing partial failure in :meth:`Series.transform` and :meth:`DataFrame.transform` when ``func`` is list-like or dict-like and raises anything but ``TypeError``; ``func`` raising anything but a ``TypeError`` will raise in a future version (:issue:`40211`)
 - Deprecated support for ``np.ma.mrecords.MaskedRecords`` in the :class:`DataFrame` constructor, pass ``{name: data[name] for name in data.dtype.names}`` instead (:issue:`40363`)
+- Deprecated using :func:`merge` or :func:`join` on a different number of levels (:issue:`34862`)
 - Deprecated the use of ``**kwargs`` in :class:`.ExcelWriter`; use the keyword argument ``engine_kwargs`` instead (:issue:`40430`)
 - Deprecated the ``level`` keyword for :class:`DataFrame` and :class:`Series` aggregations; use groupby instead (:issue:`39983`)
+- Deprecated :func:`merge` producing duplicated columns through the ``suffixes`` keyword and already existing columns (:issue:`22818`)
 
 .. ---------------------------------------------------------------------------

@@ -749,6 +754,7 @@ Missing
 - Bug in :class:`Grouper` now correctly propagates ``dropna`` argument and :meth:`DataFrameGroupBy.transform` now correctly handles missing values for ``dropna=True`` (:issue:`35612`)
 - Bug in :func:`isna`, and :meth:`Series.isna`, :meth:`Index.isna`, :meth:`DataFrame.isna` (and the corresponding ``notna`` functions) not recognizing ``Decimal("NaN")`` objects (:issue:`39409`)
 - Bug in :meth:`DataFrame.fillna` not accepting dictionary for ``downcast`` keyword (:issue:`40809`)
+- Bug in :func:`isna` not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (:issue:`40935`)
 
 MultiIndex
 ^^^^^^^^^^

@@ -831,6 +837,7 @@ Groupby/resample/rolling
 - Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
 - Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
 - Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)
+- Bug in :meth:`.GroupBy.rank` with nullable dtypes incorrectly raising ``TypeError`` (:issue:`41010`)
 
 Reshaping
 ^^^^^^^^^

@@ -851,6 +858,7 @@ Reshaping
 - Bug in :meth:`DataFrame.stack` not preserving ``CategoricalDtype`` in a ``MultiIndex`` (:issue:`36991`)
 - Bug in :func:`to_datetime` raising error when input sequence contains unhashable items (:issue:`39756`)
 - Bug in :meth:`Series.explode` preserving index when ``ignore_index`` was ``True`` and values were scalars (:issue:`40487`)
+- Bug in :func:`to_datetime` raising ``ValueError`` when :class:`Series` contains ``None`` and ``NaT`` and has more than 50 elements (:issue:`39882`)
 
 Sparse
 ^^^^^^
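The ``GroupBy.any``/``GroupBy.all`` enhancement entries above refer to Kleene (three-valued) logic for nullable data. A minimal pure-Python sketch of that logic, using ``None`` as a stand-in for pandas' ``NA`` (the function names are illustrative, not pandas API):

```python
def kleene_any(values):
    """Three-valued 'any': a definite True decides; otherwise NA wins over False."""
    if any(v is True for v in values):
        return True   # at least one definite True
    if any(v is None for v in values):
        return None   # an unknown value could have been True
    return False

def kleene_all(values):
    """Three-valued 'all': a definite False decides; otherwise NA wins over True."""
    if any(v is False for v in values):
        return False  # at least one definite False
    if any(v is None for v in values):
        return None   # an unknown value could have been False
    return True

print(kleene_any([False, None, True]))  # True
print(kleene_any([False, None]))        # None
print(kleene_all([True, None]))         # None
```

This is why, with ``skipna=False``, a group containing ``NA`` can yield ``NA`` rather than raising, which is the behavior the bug-fix entry for :issue:`40585` restores.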

pandas/core/array_algos/take.py — 15 additions & 23 deletions

@@ -105,14 +105,13 @@ def take_nd(
         return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
 
     arr = np.asarray(arr)
-    return _take_nd_ndarray(arr, indexer, axis, None, fill_value, allow_fill)
+    return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
 
 
 def _take_nd_ndarray(
     arr: np.ndarray,
     indexer,
     axis: int,
-    out: np.ndarray | None,
     fill_value,
     allow_fill: bool,
 ) -> np.ndarray:

@@ -124,7 +123,7 @@ def _take_nd_ndarray(
         indexer = ensure_platform_int(indexer)
 
     indexer, dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
-        arr, indexer, out, fill_value, allow_fill
+        arr, indexer, fill_value, allow_fill
    )
 
     flip_order = False

@@ -134,23 +133,20 @@ def _take_nd_ndarray(
     if flip_order:
         arr = arr.T
         axis = arr.ndim - axis - 1
-        if out is not None:
-            out = out.T
 
     # at this point, it's guaranteed that dtype can hold both the arr values
     # and the fill_value
-    if out is None:
-        out_shape_ = list(arr.shape)
-        out_shape_[axis] = len(indexer)
-        out_shape = tuple(out_shape_)
-        if arr.flags.f_contiguous and axis == arr.ndim - 1:
-            # minor tweak that can make an order-of-magnitude difference
-            # for dataframes initialized directly from 2-d ndarrays
-            # (s.t. df.values is c-contiguous and df._mgr.blocks[0] is its
-            # f-contiguous transpose)
-            out = np.empty(out_shape, dtype=dtype, order="F")
-        else:
-            out = np.empty(out_shape, dtype=dtype)
+    out_shape_ = list(arr.shape)
+    out_shape_[axis] = len(indexer)
+    out_shape = tuple(out_shape_)
+    if arr.flags.f_contiguous and axis == arr.ndim - 1:
+        # minor tweak that can make an order-of-magnitude difference
+        # for dataframes initialized directly from 2-d ndarrays
+        # (s.t. df.values is c-contiguous and df._mgr.blocks[0] is its
+        # f-contiguous transpose)
+        out = np.empty(out_shape, dtype=dtype, order="F")
+    else:
+        out = np.empty(out_shape, dtype=dtype)
 
     func = _get_take_nd_function(
         arr.ndim, arr.dtype, out.dtype, axis=axis, mask_info=mask_info

@@ -195,7 +191,7 @@ def take_1d(
         return arr.take(indexer)
 
     indexer, dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
-        arr, indexer, None, fill_value, True
+        arr, indexer, fill_value, True
     )
 
     # at this point, it's guaranteed that dtype can hold both the arr values

@@ -521,7 +517,6 @@ def _take_2d_multi_object(
 def _take_preprocess_indexer_and_fill_value(
     arr: np.ndarray,
     indexer: np.ndarray,
-    out: np.ndarray | None,
     fill_value,
     allow_fill: bool,
 ):

@@ -539,10 +534,7 @@ def _take_preprocess_indexer_and_fill_value(
             mask = indexer == -1
             needs_masking = mask.any()
             mask_info = mask, needs_masking
-            if needs_masking:
-                if out is not None and out.dtype != dtype:
-                    raise TypeError("Incompatible type for fill_value")
-            else:
+            if not needs_masking:
                 # if not, then depromote, set fill_value to dummy
                 # (it won't be used but we don't want the cython code
                 # to crash when trying to cast it to dtype)
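The helpers above implement a "take with fill" contract: positions where the indexer is ``-1`` receive ``fill_value`` instead of indexing from the end of the array. A pure-Python sketch of that contract (a hypothetical helper, not the pandas/cython implementation):

```python
def take_1d_with_fill(arr, indexer, fill_value=None, allow_fill=True):
    """Gather arr[i] for each i in indexer; with allow_fill=True, -1 means
    'missing' and is replaced by fill_value rather than meaning the last element."""
    if not allow_fill:
        # plain positional take: -1 keeps its usual "last element" meaning
        return [arr[i] for i in indexer]
    return [fill_value if i == -1 else arr[i] for i in indexer]

print(take_1d_with_fill([10, 20, 30], [2, -1, 0], fill_value=-99))
# [30, -99, 10]
```

The removed ``out`` parameter let callers pass a preallocated result buffer; after this refactor the output array is always allocated internally, which is what makes the ``order="F"`` fast path above possible to apply uniformly.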

pandas/core/arrays/masked.py — 1 addition & 1 deletion

@@ -352,7 +352,7 @@ def _hasna(self) -> bool:
         return self._mask.any()  # type: ignore[return-value]
 
     def isna(self) -> np.ndarray:
-        return self._mask
+        return self._mask.copy()
 
     @property
     def _na_value(self):
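This one-line change is the fix for :issue:`40935`: returning the live mask lets a caller mutate the array's internal state by editing the returned value. A minimal sketch of the aliasing problem with a hypothetical ``MaskedArray`` class (not the pandas one):

```python
class MaskedArray:
    def __init__(self, values, mask):
        self._data = list(values)
        self._mask = list(mask)  # True where the value is missing

    def isna(self):
        # Returning self._mask directly would hand out a live alias of
        # internal state; a copy keeps the array safe from caller edits.
        return self._mask.copy()

arr = MaskedArray([1, 2, 3], [False, True, False])
mask = arr.isna()
mask[0] = True           # caller modifies the returned mask...
print(arr.isna())        # ...but the array's own mask is unchanged
# [False, True, False]
```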

pandas/core/arrays/string_arrow.py — 6 additions & 0 deletions

@@ -757,3 +757,9 @@ def _str_map(self, f, na_value=None, dtype: Dtype | None = None):
         # or .findall returns a list).
         # -> We don't know the result type. E.g. `.get` can return anything.
         return lib.map_infer_mask(arr, f, mask.view("uint8"))
+
+    def _str_lower(self):
+        return type(self)(pc.utf8_lower(self._data))
+
+    def _str_upper(self):
+        return type(self)(pc.utf8_upper(self._data))
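These additions dispatch ``.str.lower``/``.str.upper`` to pyarrow's vectorized kernels (``pc.utf8_lower``/``pc.utf8_upper``) instead of the generic element-wise ``_str_map`` fallback. The wrapping pattern can be sketched in pure Python with a hypothetical stand-in class (the vectorized kernel is simulated with a comprehension):

```python
class StringArray:
    """Toy stand-in: wraps backing storage; string ops return a new instance."""

    def __init__(self, data):
        self._data = list(data)

    def _str_lower(self):
        # type(self)(...) rather than StringArray(...) keeps subclasses
        # intact, mirroring the pandas code
        return type(self)([s.lower() for s in self._data])

    def _str_upper(self):
        return type(self)([s.upper() for s in self._data])

arr = StringArray(["Cat", "Dog"])
print(arr._str_lower()._data)  # ['cat', 'dog']
```

The real benefit in pandas is that the loop runs inside pyarrow's C++ kernel over the Arrow buffer rather than per-element in Python.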

pandas/core/frame.py — 9 additions & 1 deletion

@@ -4790,7 +4790,8 @@ def drop(
         Remove rows or columns by specifying label names and corresponding
         axis, or by specifying directly index or column names. When using a
         multi-index, labels on different levels can be removed by specifying
-        the level.
+        the level. See the `user guide <advanced.shown_levels>`
+        for more information about the now unused levels.
 
         Parameters
         ----------

@@ -7821,6 +7822,11 @@ def pivot(self, index=None, columns=None, values=None) -> DataFrame:
 
             .. versionchanged:: 0.25.0
 
+        sort : bool, default True
+            Specifies if the result should be sorted.
+
+            .. versionadded:: 1.3.0
+
         Returns
         -------
         DataFrame

@@ -7924,6 +7930,7 @@ def pivot_table(
         dropna=True,
         margins_name="All",
         observed=False,
+        sort=True,
     ) -> DataFrame:
         from pandas.core.reshape.pivot import pivot_table
 

@@ -7938,6 +7945,7 @@ def pivot_table(
             dropna=dropna,
             margins_name=margins_name,
             observed=observed,
+            sort=sort,
         )
 
     def stack(self, level: Level = -1, dropna: bool = True):
pandas/core/groupby/base.py — 0 additions & 77 deletions

@@ -7,83 +7,8 @@
 
 import collections
 
-from pandas._typing import final
-
-from pandas.core.dtypes.common import (
-    is_list_like,
-    is_scalar,
-)
-
-from pandas.core.base import PandasObject
-
 OutputKey = collections.namedtuple("OutputKey", ["label", "position"])
 
-
-class ShallowMixin(PandasObject):
-    _attributes: list[str] = []
-
-    @final
-    def _shallow_copy(self, obj, **kwargs):
-        """
-        return a new object with the replacement attributes
-        """
-        if isinstance(obj, self._constructor):
-            obj = obj.obj
-        for attr in self._attributes:
-            if attr not in kwargs:
-                kwargs[attr] = getattr(self, attr)
-        return self._constructor(obj, **kwargs)
-
-
-class GotItemMixin(PandasObject):
-    """
-    Provide the groupby facilities to the mixed object.
-    """
-
-    _attributes: list[str]
-
-    @final
-    def _gotitem(self, key, ndim, subset=None):
-        """
-        Sub-classes to define. Return a sliced object.
-
-        Parameters
-        ----------
-        key : string / list of selections
-        ndim : {1, 2}
-            requested ndim of result
-        subset : object, default None
-            subset to act on
-        """
-        # create a new object to prevent aliasing
-        if subset is None:
-            # error: "GotItemMixin" has no attribute "obj"
-            subset = self.obj  # type: ignore[attr-defined]
-
-        # we need to make a shallow copy of ourselves
-        # with the same groupby
-        kwargs = {attr: getattr(self, attr) for attr in self._attributes}
-
-        # Try to select from a DataFrame, falling back to a Series
-        try:
-            # error: "GotItemMixin" has no attribute "_groupby"
-            groupby = self._groupby[key]  # type: ignore[attr-defined]
-        except IndexError:
-            # error: "GotItemMixin" has no attribute "_groupby"
-            groupby = self._groupby  # type: ignore[attr-defined]
-
-        # error: Too many arguments for "GotItemMixin"
-        # error: Unexpected keyword argument "groupby" for "GotItemMixin"
-        # error: Unexpected keyword argument "parent" for "GotItemMixin"
-        self = type(self)(
-            subset, groupby=groupby, parent=self, **kwargs  # type: ignore[call-arg]
-        )
-        self._reset_cache()
-        if subset.ndim == 2 and (is_scalar(key) and key in subset or is_list_like(key)):
-            self._selection = key
-        return self
-
-
 # special case to prevent duplicate plots when catching exceptions when
 # forwarding methods from NDFrames
 plotting_methods = frozenset(["plot", "hist"])

@@ -122,8 +47,6 @@ def _gotitem(self, key, ndim, subset=None):
 # require postprocessing of the result by transform.
 cythonized_kernels = frozenset(["cumprod", "cumsum", "shift", "cummin", "cummax"])
 
-cython_cast_blocklist = frozenset(["rank", "count", "size", "idxmin", "idxmax"])
-
 # List of aggregation/reduction functions.
 # These map each group to a single numeric value
 reduction_kernels = frozenset(
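The deleted ``ShallowMixin`` implemented a "copy with replaced attributes" pattern: build a new instance of the same class, filling any attribute the caller did not override from the current instance. A minimal stand-alone sketch of that pattern with a hypothetical class (illustrative only):

```python
class Windowed:
    """Toy object carrying configuration attributes, like a rolling window."""

    _attributes = ["window", "center"]

    def __init__(self, obj, window=None, center=False):
        self.obj, self.window, self.center = obj, window, center

    def _shallow_copy(self, obj, **kwargs):
        # fill any attribute not explicitly overridden from this instance
        for attr in self._attributes:
            kwargs.setdefault(attr, getattr(self, attr))
        return type(self)(obj, **kwargs)

w = Windowed([1, 2, 3], window=2)
w2 = w._shallow_copy([4, 5], center=True)
print(w2.window, w2.center)  # 2 True
```

In this commit both mixins were removed from ``groupby/base.py`` because the resample/rolling code paths no longer rely on them.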

pandas/core/groupby/generic.py — 11 additions & 14 deletions

@@ -46,7 +46,6 @@
 
 from pandas.core.dtypes.cast import (
     find_common_type,
-    maybe_cast_result_dtype,
     maybe_downcast_numeric,
 )
 from pandas.core.dtypes.common import (

@@ -58,7 +57,6 @@
     is_interval_dtype,
     is_numeric_dtype,
     is_scalar,
-    needs_i8_conversion,
 )
 from pandas.core.dtypes.missing import (
     isna,

@@ -1104,13 +1102,11 @@ def _cython_agg_manager(
 
         using_array_manager = isinstance(data, ArrayManager)
 
-        def cast_agg_result(result, values: ArrayLike, how: str) -> ArrayLike:
+        def cast_agg_result(
+            result: ArrayLike, values: ArrayLike, how: str
+        ) -> ArrayLike:
             # see if we can cast the values to the desired dtype
             # this may not be the original dtype
-            assert not isinstance(result, DataFrame)
-
-            dtype = maybe_cast_result_dtype(values.dtype, how)
-            result = maybe_downcast_numeric(result, dtype)
 
             if isinstance(values, Categorical) and isinstance(result, np.ndarray):
                 # If the Categorical op didn't raise, it is dtype-preserving

@@ -1125,6 +1121,7 @@ def cast_agg_result(result, values: ArrayLike, how: str) -> ArrayLike:
             ):
                 # We went through a SeriesGroupByPath and need to reshape
                 # GH#32223 includes case with IntegerArray values
+                # We only get here with values.dtype == object
                 result = result.reshape(1, -1)
             # test_groupby_duplicate_columns gets here with
             # result.dtype == int64, values.dtype=object, how="min"

@@ -1140,8 +1137,11 @@ def py_fallback(values: ArrayLike) -> ArrayLike:
 
             # call our grouper again with only this block
             if values.ndim == 1:
+                # We only get here with ExtensionArray
+
                 obj = Series(values)
             else:
+                # We only get here with values.dtype == object
                 # TODO special case not needed with ArrayManager
                 obj = DataFrame(values.T)
             if obj.shape[1] == 1:

@@ -1193,7 +1193,8 @@ def array_func(values: ArrayLike) -> ArrayLike:
 
                 result = py_fallback(values)
 
-            return cast_agg_result(result, values, how)
+                return cast_agg_result(result, values, how)
+            return result
 
             # TypeError -> we may have an exception in trying to aggregate
             # continue and exclude the block

@@ -1366,11 +1367,7 @@ def _wrap_applied_output_series(
 
         # if we have date/time like in the original, then coerce dates
         # as we are stacking can easily have object dtypes here
-        so = self._selected_obj
-        if so.ndim == 2 and so.dtypes.apply(needs_i8_conversion).any():
-            result = result._convert(datetime=True)
-        else:
-            result = result._convert(datetime=True)
+        result = result._convert(datetime=True)
 
         if not self.as_index:
             self._insert_inaxis_grouper_inplace(result)

@@ -1507,7 +1504,7 @@ def _choose_path(self, fast_path: Callable, slow_path: Callable, group: DataFrame
         try:
             res_fast = fast_path(group)
         except AssertionError:
-            raise
+            raise  # pragma: no cover
         except Exception:
            # GH#29631 For user-defined function, we can't predict what may be
            # raised; see test_transform.test_transform_fastpath_raises
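The ``_choose_path`` hunk shows a fast-path/slow-path pattern: try the fast implementation, fall back on any user-level error, but let ``AssertionError`` (an internal invariant violation) propagate. A stand-alone sketch of the pattern with toy callables (illustrative, not the pandas code):

```python
def choose_path(fast_path, slow_path, group):
    """Try fast_path; on user-level errors fall back to slow_path."""
    try:
        return fast_path(group)
    except AssertionError:
        raise  # an internal bug: never mask it with the fallback
    except Exception:
        # user-defined functions can raise anything; fall back silently
        return slow_path(group)

# fast path raises ZeroDivisionError on an empty group -> slow path is used
print(choose_path(lambda g: 1 / len(g), lambda g: 0, []))  # 0
```

The ``# pragma: no cover`` added in the diff simply tells coverage tooling not to count the re-raise line, since tests are not expected to trigger internal assertion failures.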
