
Commit 8c19221

Merge branch 'main' into fix-issue-61221

2 parents 8afbad2 + 52e9767

14 files changed: +295 −67 lines

doc/source/reference/groupby.rst (+4 −0)

@@ -79,6 +79,8 @@ Function application
     DataFrameGroupBy.cumsum
     DataFrameGroupBy.describe
     DataFrameGroupBy.diff
+    DataFrameGroupBy.ewm
+    DataFrameGroupBy.expanding
     DataFrameGroupBy.ffill
     DataFrameGroupBy.first
     DataFrameGroupBy.head
@@ -130,6 +132,8 @@ Function application
     SeriesGroupBy.cumsum
     SeriesGroupBy.describe
     SeriesGroupBy.diff
+    SeriesGroupBy.ewm
+    SeriesGroupBy.expanding
     SeriesGroupBy.ffill
     SeriesGroupBy.first
     SeriesGroupBy.head

doc/source/whatsnew/v3.0.0.rst (+4 −0)

@@ -421,6 +421,7 @@ Other Deprecations
 - Deprecated lowercase strings ``w``, ``w-mon``, ``w-tue``, etc. denoting frequencies in :class:`Week` in favour of ``W``, ``W-MON``, ``W-TUE``, etc. (:issue:`58998`)
 - Deprecated parameter ``method`` in :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like` (:issue:`58667`)
 - Deprecated strings ``w``, ``d``, ``MIN``, ``MS``, ``US`` and ``NS`` denoting units in :class:`Timedelta` in favour of ``W``, ``D``, ``min``, ``ms``, ``us`` and ``ns`` (:issue:`59051`)
+- Deprecated the ``arg`` parameter of ``Series.map``; pass the added ``func`` argument instead. (:issue:`61260`)
 - Deprecated using ``epoch`` date format in :meth:`DataFrame.to_json` and :meth:`Series.to_json`, use ``iso`` instead. (:issue:`57063`)

 .. ---------------------------------------------------------------------------
@@ -622,6 +623,7 @@ Performance improvements
 - Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
 - Performance improvement in :meth:`DataFrame.__getitem__` when ``key`` is a :class:`DataFrame` with many columns (:issue:`61010`)
 - Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
+- Performance improvement in :meth:`DataFrame.stack` when using ``future_stack=True`` and the DataFrame does not have a :class:`MultiIndex` (:issue:`58391`)
 - Performance improvement in :meth:`DataFrame.where` when ``cond`` is a :class:`DataFrame` with many columns (:issue:`61010`)
 - Performance improvement in :meth:`to_hdf` avoid unnecessary reopenings of the HDF5 file to speedup data addition to files with a very large number of groups . (:issue:`58248`)
 - Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
@@ -637,6 +639,7 @@ Bug fixes
 Categorical
 ^^^^^^^^^^^
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
 - Bug in :meth:`Series.convert_dtypes` with ``dtype_backend="pyarrow"`` where empty :class:`CategoricalDtype` :class:`Series` raised an error or got converted to ``null[pyarrow]`` (:issue:`59934`)
 -

@@ -649,6 +652,7 @@ Datetimelike
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
+- Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)

pandas/__init__.py (+5 −7)

@@ -4,19 +4,17 @@

 # Let users know if they're missing any of our hard dependencies
 _hard_dependencies = ("numpy", "dateutil")
-_missing_dependencies = []

 for _dependency in _hard_dependencies:
     try:
         __import__(_dependency)
     except ImportError as _e:  # pragma: no cover
-        _missing_dependencies.append(f"{_dependency}: {_e}")
+        raise ImportError(
+            f"Unable to import required dependency {_dependency}. "
+            "Please see the traceback for details."
+        ) from _e

-if _missing_dependencies:  # pragma: no cover
-    raise ImportError(
-        "Unable to import required dependencies:\n" + "\n".join(_missing_dependencies)
-    )
-del _hard_dependencies, _dependency, _missing_dependencies
+del _hard_dependencies, _dependency

 try:
     # numpy compat
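The rewrite fails fast instead of collecting errors: the first missing dependency aborts the import, with the original error chained so the traceback shows the root cause. A standalone sketch of the same pattern, wrapped in a helper function for illustration (the function name is ours, not pandas'):

```python
def check_hard_dependencies(deps):
    """Raise ImportError on the first missing dependency, chaining the
    original error so the traceback shows the root cause."""
    for dep in deps:
        try:
            __import__(dep)
        except ImportError as e:
            raise ImportError(
                f"Unable to import required dependency {dep}. "
                "Please see the traceback for details."
            ) from e


# Both stdlib modules import fine, so this call is a no-op.
check_hard_dependencies(["math", "json"])
```

The `from e` chaining is the key detail: the user-facing message stays short while `__cause__` preserves the underlying import failure.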

pandas/core/arrays/categorical.py (+1 −1)

@@ -452,7 +452,7 @@ def __init__(
         if isinstance(values, Index):
             arr = values._data._pa_array.combine_chunks()
         else:
-            arr = values._pa_array.combine_chunks()
+            arr = extract_array(values)._pa_array.combine_chunks()
         categories = arr.dictionary.to_pandas(types_mapper=ArrowDtype)
         codes = arr.indices.to_numpy()
         dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
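The one-line fix unwraps a Series before touching the array-level ``_pa_array`` attribute, which lives on the backing ExtensionArray rather than on the Series container. ``extract_array`` is internal pandas API (``pandas.core.construction``); a minimal illustration of the unwrapping, without pyarrow involved:

```python
import numpy as np
import pandas as pd
from pandas.core.construction import extract_array  # internal pandas API

s = pd.Series([1, 2, 3])

# A Series is a container around an array; attributes like `_pa_array`
# (for pyarrow-backed data) exist on the backing array, not the Series.
# extract_array returns that backing ndarray or ExtensionArray.
backing = extract_array(s)
```

For a plain numpy-backed Series this yields the underlying ndarray; for a pyarrow dictionary-dtype Series it yields the ArrowExtensionArray that actually carries ``_pa_array``, which is why the fix resolves the pivot/set_index bug.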

pandas/core/groupby/groupby.py (+112 −6)

@@ -3803,16 +3803,58 @@ def rolling(
         )

     @final
-    @Substitution(name="groupby")
-    @Appender(_common_see_also)
     def expanding(self, *args, **kwargs) -> ExpandingGroupby:
         """
-        Return an expanding grouper, providing expanding
-        functionality per group.
+        Return an expanding grouper, providing expanding functionality per group.
+
+        Arguments are the same as :meth:`DataFrame.rolling` except that ``step``
+        cannot be specified.
+
+        Parameters
+        ----------
+        *args : tuple
+            Positional arguments passed to the expanding window constructor.
+        **kwargs : dict
+            Keyword arguments passed to the expanding window constructor.

         Returns
         -------
         pandas.api.typing.ExpandingGroupby
+            An object that supports expanding transformations over each group.
+
+        See Also
+        --------
+        Series.expanding : Expanding transformations for Series.
+        DataFrame.expanding : Expanding transformations for DataFrames.
+        Series.groupby : Apply a function groupby to a Series.
+        DataFrame.groupby : Apply a function groupby.
+
+        Examples
+        --------
+        >>> df = pd.DataFrame(
+        ...     {
+        ...         "Class": ["A", "A", "A", "B", "B", "B"],
+        ...         "Value": [10, 20, 30, 40, 50, 60],
+        ...     }
+        ... )
+        >>> df
+          Class  Value
+        0     A     10
+        1     A     20
+        2     A     30
+        3     B     40
+        4     B     50
+        5     B     60
+
+        >>> df.groupby("Class").expanding().mean()
+                 Value
+        Class
+        A     0   10.0
+              1   15.0
+              2   20.0
+        B     3   40.0
+              4   45.0
+              5   50.0
         """
         from pandas.core.window import ExpandingGroupby

@@ -3824,15 +3866,79 @@ def expanding(self, *args, **kwargs) -> ExpandingGroupby:
         )

     @final
-    @Substitution(name="groupby")
-    @Appender(_common_see_also)
     def ewm(self, *args, **kwargs) -> ExponentialMovingWindowGroupby:
         """
         Return an ewm grouper, providing ewm functionality per group.

+        Parameters
+        ----------
+        *args : tuple
+            Positional arguments passed to the EWM window constructor.
+        **kwargs : dict
+            Keyword arguments passed to the EWM window constructor, such as:
+
+            com : float, optional
+                Specify decay in terms of center of mass.
+                ``span``, ``halflife``, and ``alpha`` are alternative ways to
+                specify decay.
+            span : float, optional
+                Specify decay in terms of span.
+            halflife : float, optional
+                Specify decay in terms of half-life.
+            alpha : float, optional
+                Specify smoothing factor directly.
+            min_periods : int, default 0
+                Minimum number of observations in the window required to have
+                a value; otherwise, result is ``np.nan``.
+            adjust : bool, default True
+                Divide by decaying adjustment factor to account for imbalance
+                in relative weights.
+            ignore_na : bool, default False
+                Ignore missing values when calculating weights.
+            times : str or array-like of datetime64, optional
+                Times corresponding to the observations.
+            axis : {0 or 'index', 1 or 'columns'}, default 0
+                Axis along which the EWM function is applied.

         Returns
         -------
         pandas.api.typing.ExponentialMovingWindowGroupby
+            An object that supports exponentially weighted moving
+            transformations over each group.
+
+        See Also
+        --------
+        Series.ewm : EWM transformations for Series.
+        DataFrame.ewm : EWM transformations for DataFrames.
+        Series.groupby : Apply a function groupby to a Series.
+        DataFrame.groupby : Apply a function groupby.
+
+        Examples
+        --------
+        >>> df = pd.DataFrame(
+        ...     {
+        ...         "Class": ["A", "A", "A", "B", "B", "B"],
+        ...         "Value": [10, 20, 30, 40, 50, 60],
+        ...     }
+        ... )
+        >>> df
+          Class  Value
+        0     A     10
+        1     A     20
+        2     A     30
+        3     B     40
+        4     B     50
+        5     B     60
+
+        >>> df.groupby("Class").ewm(com=0.5).mean()
+                     Value
+        Class
+        A     0  10.000000
+              1  17.500000
+              2  26.153846
+        B     3  40.000000
+              4  47.500000
+              5  56.153846
         """
         from pandas.core.window import ExponentialMovingWindowGroupby
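The two docstring examples added above are runnable as shown; both windows restart at each group boundary. A compact check of the numbers they print (expanding means for group A are 10, 15, 20; EWM with ``com=0.5`` weights recent observations by ``alpha = 1 / (1 + com) = 2/3``):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Class": ["A", "A", "A", "B", "B", "B"],
        "Value": [10, 20, 30, 40, 50, 60],
    }
)

# Expanding mean restarts within each group: A -> 10, 15, 20; B -> 40, 45, 50.
expanding_mean = df.groupby("Class").expanding().mean()

# EWM mean with center of mass 0.5 also restarts per group; the third
# value of group A is (30 + 20/3 + 10/9) / (1 + 1/3 + 1/9) ≈ 26.1538.
ewm_mean = df.groupby("Class").ewm(com=0.5).mean()
```

Both results carry a MultiIndex of (group key, original row label), which is why the docstring output shows ``Class`` stacked above the original positions.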
pandas/core/internals/blocks.py (+4 −0)

@@ -1679,6 +1679,8 @@ def where(self, other, cond) -> list[Block]:

         try:
             res_values = arr._where(cond, other).T
+        except OutOfBoundsDatetime:
+            raise
         except (ValueError, TypeError):
             if self.ndim == 1 or self.shape[0] == 1:
                 if isinstance(self.dtype, (IntervalDtype, StringDtype)):
@@ -1746,6 +1748,8 @@ def putmask(self, mask, new) -> list[Block]:
         try:
             # Caller is responsible for ensuring matching lengths
             values._putmask(mask, new)
+        except OutOfBoundsDatetime:
+            raise
         except (TypeError, ValueError):
             if self.ndim == 1 or self.shape[0] == 1:
                 if isinstance(self.dtype, IntervalDtype):
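These two hunks work because `except` clauses are tried in order: putting the specific `OutOfBoundsDatetime` handler first and re-raising keeps the broad `(TypeError, ValueError)` fallback from swallowing it (OutOfBoundsDatetime subclasses ValueError, so without the re-raise it would be caught and coerced). A generic sketch of the pattern with illustrative stand-in names, not pandas code:

```python
class OutOfBounds(ValueError):
    """Stand-in for pandas' OutOfBoundsDatetime (a ValueError subclass)."""


def convert(value):
    if not isinstance(value, int):
        raise TypeError("need an int")
    if value > 100:
        raise OutOfBounds(f"{value} does not fit")
    return value


def convert_with_fallback(value):
    # Same shape as Block.where / Block.putmask above: re-raise the
    # specific error before the broad handler can swallow it and
    # silently fall back to a coerced result.
    try:
        return convert(value)
    except OutOfBounds:
        raise
    except (TypeError, ValueError):
        return None  # generic coercion fallback
```

Without the first clause, `convert_with_fallback(200)` would return `None`; with it, the out-of-bounds error reaches the caller, which is exactly the GH#61208 fix.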

pandas/core/reshape/reshape.py (+24 −13)

@@ -963,7 +963,20 @@ def stack_v3(frame: DataFrame, level: list[int]) -> Series | DataFrame:
         [k for k in range(frame.columns.nlevels - 1, -1, -1) if k not in set_levels]
     )

-    result = stack_reshape(frame, level, set_levels, stack_cols)
+    result: Series | DataFrame
+    if not isinstance(frame.columns, MultiIndex):
+        # GH#58817 Fast path when we're stacking the columns of a non-MultiIndex.
+        # When columns are homogeneous EAs, we pass through object
+        # dtype but this is still slightly faster than the normal path.
+        if len(frame.columns) > 0 and frame._is_homogeneous_type:
+            dtype = frame._mgr.blocks[0].dtype
+        else:
+            dtype = None
+        result = frame._constructor_sliced(
+            frame._values.reshape(-1, order="F"), dtype=dtype
+        )
+    else:
+        result = stack_reshape(frame, level, set_levels, stack_cols)

     # Construct the correct MultiIndex by combining the frame's index and
     # stacked columns.
@@ -1045,6 +1058,8 @@ def stack_reshape(
     -------
     The data of behind the stacked DataFrame.
     """
+    # non-MultiIndex takes a fast path.
+    assert isinstance(frame.columns, MultiIndex)
     # If we need to drop `level` from columns, it needs to be in descending order
     drop_levnums = sorted(level, reverse=True)

@@ -1054,18 +1069,14 @@ def stack_reshape(
     if len(frame.columns) == 1:
         data = frame.copy(deep=False)
     else:
-        if not isinstance(frame.columns, MultiIndex) and not isinstance(idx, tuple):
-            # GH#57750 - if the frame is an Index with tuples, .loc below will fail
-            column_indexer = idx
-        else:
-            # Take the data from frame corresponding to this idx value
-            if len(level) == 1:
-                idx = (idx,)
-            gen = iter(idx)
-            column_indexer = tuple(
-                next(gen) if k in set_levels else slice(None)
-                for k in range(frame.columns.nlevels)
-            )
+        # Take the data from frame corresponding to this idx value
+        if len(level) == 1:
+            idx = (idx,)
+        gen = iter(idx)
+        column_indexer = tuple(
+            next(gen) if k in set_levels else slice(None)
+            for k in range(frame.columns.nlevels)
+        )
        data = frame.loc[:, column_indexer]

    if len(level) < frame.columns.nlevels:
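The fast path applies when the columns are a flat (non-MultiIndex) index: stacking then just moves the column labels into an inner index level, one entry per original cell, so the values can be laid out directly instead of going through `stack_reshape`. A small demonstration of the observable behavior the fast path must reproduce:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Flat columns: stack() pivots the column labels into an inner index
# level. Row 0 contributes (0, 'a') -> 1 and (0, 'b') -> 3, then row 1,
# and with a homogeneous frame the original dtype is preserved — which
# is what the fast path's explicit `dtype=` argument guarantees.
stacked = df.stack()
```

Result values come out row by row, column within row: `[1, 3, 2, 4]` with index `[(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]`.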

pandas/core/series.py (+22 −6)

@@ -52,6 +52,9 @@
     doc,
     set_module,
 )
+from pandas.util._exceptions import (
+    find_stack_level,
+)
 from pandas.util._validators import (
     validate_ascending,
     validate_bool_kwarg,
@@ -4320,7 +4323,7 @@ def unstack(

     def map(
         self,
-        arg: Callable | Mapping | Series,
+        func: Callable | Mapping | Series | None = None,
         na_action: Literal["ignore"] | None = None,
         **kwargs,
     ) -> Series:
@@ -4333,8 +4336,8 @@ def map(

         Parameters
         ----------
-        arg : function, collections.abc.Mapping subclass or Series
-            Mapping correspondence.
+        func : function, collections.abc.Mapping subclass or Series
+            Function or mapping correspondence.
         na_action : {None, 'ignore'}, default None
             If 'ignore', propagate NaN values, without passing them to the
             mapping correspondence.
@@ -4404,9 +4407,22 @@ def map(
         3    I am a rabbit
         dtype: object
         """
-        if callable(arg):
-            arg = functools.partial(arg, **kwargs)
-        new_values = self._map_values(arg, na_action=na_action)
+        if func is None:
+            if "arg" in kwargs:
+                # `.map(arg=my_func)`
+                func = kwargs.pop("arg")
+                warnings.warn(
+                    "The parameter `arg` has been renamed to `func`, and it "
+                    "will stop being supported in a future version of pandas.",
+                    FutureWarning,
+                    stacklevel=find_stack_level(),
+                )
+            else:
+                raise ValueError("The `func` parameter is required")
+
+        if callable(func):
+            func = functools.partial(func, **kwargs)
+        new_values = self._map_values(func, na_action=na_action)
         return self._constructor(new_values, index=self.index, copy=False).__finalize__(
             self, method="map"
         )
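The rename leaves positional calls untouched; on versions that include this change, the old `s.map(arg=...)` keyword spelling emits a FutureWarning via the `kwargs` shim, and calling `map` with no mapper raises ValueError. A sketch of the call styles that work across versions:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Positional mapper: unaffected by the arg -> func rename.
doubled = s.map(lambda x: x * 2)

# A mapping works the same way; keys absent from the dict become NaN.
named = s.map({1: "one", 2: "two"})
```

The keyword form `s.map(func=...)` only exists once this change lands, which is why the deprecation shim accepts `arg=` for a transition period instead of removing it outright.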

pandas/tests/extension/base/reshaping.py (+9 −1)

@@ -3,6 +3,8 @@
 import numpy as np
 import pytest

+from pandas.core.dtypes.dtypes import NumpyEADtype
+
 import pandas as pd
 import pandas._testing as tm
 from pandas.api.extensions import ExtensionArray
@@ -266,7 +268,13 @@ def test_stack(self, data, columns, future_stack):
             expected = expected.astype(object)

         if isinstance(expected, pd.Series):
-            assert result.dtype == df.iloc[:, 0].dtype
+            if future_stack and isinstance(data.dtype, NumpyEADtype):
+                # GH#58817 future_stack=True constructs the result specifying the dtype
+                # using the dtype of the input; we thus get the underlying
+                # NumPy dtype as the result instead of the NumpyExtensionArray
+                assert result.dtype == df.iloc[:, 0].to_numpy().dtype
+            else:
+                assert result.dtype == df.iloc[:, 0].dtype
         else:
             assert all(result.dtypes == df.iloc[:, 0].dtype)
