Commit 762dc81

Merge remote-tracking branch 'upstream/master' into deprecate-week-of-year

2 parents: 734ae8f + 3912a38

26 files changed: +1180 −879 lines

doc/source/reference/groupby.rst (+4, -2)

@@ -36,8 +36,10 @@ Function application
    GroupBy.apply
    GroupBy.agg
-   GroupBy.aggregate
-   GroupBy.transform
+   SeriesGroupBy.aggregate
+   DataFrameGroupBy.aggregate
+   SeriesGroupBy.transform
+   DataFrameGroupBy.transform
    GroupBy.pipe

 Computations / descriptive stats

doc/source/whatsnew/v1.1.0.rst (+4)

@@ -139,6 +139,7 @@ Other enhancements
 - The :meth:`DataFrame.to_feather` method now supports additional keyword
   arguments (e.g. to set the compression) that are added in pyarrow 0.17
   (:issue:`33422`).
+- The :func:`cut` will now accept parameter ``ordered`` with default ``ordered=True``. If ``ordered=False`` and no labels are provided, an error will be raised (:issue:`33141`)
 - :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`,
   and :meth:`DataFrame.to_json` now support passing a dict of
   compression arguments when using the ``gzip`` and ``bz2`` protocols.
@@ -547,6 +548,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.tz_localize` incorrectly retaining ``freq`` in some cases where the original freq is no longer valid (:issue:`30511`)
 - Bug in :meth:`DatetimeIndex.intersection` losing ``freq`` and timezone in some cases (:issue:`33604`)
 - Bug in :class:`DatetimeIndex` addition and subtraction with some types of :class:`DateOffset` objects incorrectly retaining an invalid ``freq`` attribute (:issue:`33779`)
+- Bug in :class:`DatetimeIndex` where setting the ``freq`` attribute on an index could silently change the ``freq`` attribute on another index viewing the same data (:issue:`33552`)

 Timedelta
 ^^^^^^^^^
@@ -573,6 +575,7 @@ Numeric
 - Bug in :meth:`DataFrame.count` with ``level="foo"`` and index level ``"foo"`` containing NaNs causes segmentation fault (:issue:`21824`)
 - Bug in :meth:`DataFrame.diff` with ``axis=1`` returning incorrect results with mixed dtypes (:issue:`32995`)
 - Bug in :meth:`DataFrame.corr` and :meth:`DataFrame.cov` raising when handling nullable integer columns with ``pandas.NA`` (:issue:`33803`)
+- Bug in :class:`DataFrame` and :class:`Series` addition and subtraction between object-dtype objects and ``datetime64`` dtype objects (:issue:`33824`)

 Conversion
 ^^^^^^^^^^
@@ -723,6 +726,7 @@ Reshaping
 - Bug in :meth:`concat` where when passing a non-dict mapping as ``objs`` would raise a ``TypeError`` (:issue:`32863`)
 - :meth:`DataFrame.agg` now provides more descriptive ``SpecificationError`` message when attempting to aggregating non-existant column (:issue:`32755`)
 - Bug in :meth:`DataFrame.unstack` when MultiIndexed columns and MultiIndexed rows were used (:issue:`32624`, :issue:`24729` and :issue:`28306`)
+- Bug in :func:`cut` raised an error when non-unique labels (:issue:`33141`)


 Sparse
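The ``ordered`` parameter added to :func:`cut` in this changelog can be exercised as follows (a quick behavioral check, assuming pandas >= 1.1; the expected behavior is taken from the whatsnew entries above):

```python
import pandas as pd

# Non-unique labels are accepted when ordered=False (GH 33141)
res = pd.cut([1, 2, 7, 8], bins=2, labels=["small", "small"], ordered=False)
print(list(res))  # every bin shares the label "small"

# ordered=False with no labels raises, per the new rule
try:
    pd.cut([1, 2, 7, 8], bins=2, ordered=False)
except ValueError as err:
    print("raised:", err)
```

With ``ordered=True`` (the default), duplicate labels would still be rejected, since an ordered Categorical needs distinct categories.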

pandas/core/array_algos/transforms.py (+2, -3)

@@ -10,9 +10,8 @@
 def shift(values: np.ndarray, periods: int, axis: int, fill_value) -> np.ndarray:
     new_values = values

-    if periods == 0:
-        # TODO: should we copy here?
-        return new_values
+    if periods == 0 or values.size == 0:
+        return new_values.copy()

     # make sure array sent to np.roll is c_contiguous
     f_ordered = values.flags.f_contiguous
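The change above makes the no-op path (``periods == 0`` or an empty array) return a copy, so callers that pass their backing array without copying it first can never receive a view of the original data. A minimal numpy sketch of that contract (``shift_sketch`` is a hypothetical stand-in, not the pandas function):

```python
import numpy as np

def shift_sketch(values: np.ndarray, periods: int, fill_value) -> np.ndarray:
    # No-op shifts still return a copy, so mutating the result
    # can never leak back into the caller's array.
    if periods == 0 or values.size == 0:
        return values.copy()
    out = np.roll(values, periods)
    if periods > 0:
        out[:periods] = fill_value
    else:
        out[periods:] = fill_value
    return out

arr = np.array([1, 2, 3, 4])
shifted = shift_sketch(arr, 0, fill_value=-1)
shifted[0] = 99
print(arr[0])  # the original is untouched: 1
```

Before this fix, a ``periods == 0`` shift returned the input array itself, which is exactly how one index could silently mutate another index viewing the same data (GH 33552).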

pandas/core/arrays/categorical.py (+1, -1)

@@ -1196,7 +1196,7 @@ def shift(self, periods, fill_value=None):

         fill_value = self._validate_fill_value(fill_value)

-        codes = shift(codes.copy(), periods, axis=0, fill_value=fill_value)
+        codes = shift(codes, periods, axis=0, fill_value=fill_value)

         return self._constructor(codes, dtype=self.dtype, fastpath=True)
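With ``shift`` now copying internally, ``Categorical.shift`` no longer needs to copy its codes up front. The observable behavior is unchanged, as this quick check illustrates (``fill_value`` must be an existing category):

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c"])
shifted = cat.shift(1, fill_value="a")
print(list(shifted))  # ['a', 'a', 'b']
print(list(cat))      # the source Categorical is unchanged
```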

pandas/core/arrays/datetimelike.py (+33, -14)

@@ -699,8 +699,6 @@ def _values_for_argsort(self):

     @Appender(ExtensionArray.shift.__doc__)
     def shift(self, periods=1, fill_value=None, axis=0):
-        if not self.size or periods == 0:
-            return self.copy()

         fill_value = self._validate_shift_value(fill_value)
         new_values = shift(self._data, periods, axis, fill_value)
@@ -742,10 +740,12 @@ def _validate_fill_value(self, fill_value):
         return fill_value

     def _validate_shift_value(self, fill_value):
-        # TODO(2.0): once this deprecation is enforced, used _validate_fill_value
+        # TODO(2.0): once this deprecation is enforced, use _validate_fill_value
         if is_valid_nat_for_dtype(fill_value, self.dtype):
             fill_value = NaT
-        elif not isinstance(fill_value, self._recognized_scalars):
+        elif isinstance(fill_value, self._recognized_scalars):
+            fill_value = self._scalar_type(fill_value)
+        else:
             # only warn if we're not going to raise
             if self._scalar_type is Period and lib.is_integer(fill_value):
                 # kludge for #31971 since Period(integer) tries to cast to str
@@ -782,6 +782,9 @@ def _validate_searchsorted_value(self, value):
         elif isinstance(value, self._recognized_scalars):
             value = self._scalar_type(value)

+        elif isinstance(value, type(self)):
+            pass
+
         elif is_list_like(value) and not isinstance(value, type(self)):
             value = array(value)

@@ -791,7 +794,7 @@ def _validate_searchsorted_value(self, value):
                 f"not {type(value).__name__}"
             )

-        if not (isinstance(value, (self._scalar_type, type(self))) or (value is NaT)):
+        else:
             raise TypeError(f"Unexpected type for 'value': {type(value)}")

         if isinstance(value, type(self)):
@@ -803,25 +806,41 @@ def _validate_searchsorted_value(self, value):
         return value

     def _validate_setitem_value(self, value):
-        if lib.is_scalar(value) and not isna(value):
-            value = com.maybe_box_datetimelike(value)

         if is_list_like(value):
-            value = type(self)._from_sequence(value, dtype=self.dtype)
-            self._check_compatible_with(value, setitem=True)
-            value = value.asi8
-        elif isinstance(value, self._scalar_type):
-            self._check_compatible_with(value, setitem=True)
-            value = self._unbox_scalar(value)
+            value = array(value)
+            if is_dtype_equal(value.dtype, "string"):
+                # We got a StringArray
+                try:
+                    # TODO: Could use from_sequence_of_strings if implemented
+                    # Note: passing dtype is necessary for PeriodArray tests
+                    value = type(self)._from_sequence(value, dtype=self.dtype)
+                except ValueError:
+                    pass
+
+            if not type(self)._is_recognized_dtype(value):
+                raise TypeError(
+                    "setitem requires compatible dtype or scalar, "
+                    f"not {type(value).__name__}"
+                )
+
+        elif isinstance(value, self._recognized_scalars):
+            value = self._scalar_type(value)
         elif is_valid_nat_for_dtype(value, self.dtype):
-            value = iNaT
+            value = NaT
         else:
             msg = (
                 f"'value' should be a '{self._scalar_type.__name__}', 'NaT', "
                 f"or array of those. Got '{type(value).__name__}' instead."
             )
             raise TypeError(msg)

+        self._check_compatible_with(value, setitem=True)
+        if isinstance(value, type(self)):
+            value = value.asi8
+        else:
+            value = self._unbox_scalar(value)
+
         return value

     def _validate_insert_value(self, value):
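The refactored ``_validate_setitem_value`` follows a single dispatch order: list-likes are coerced to an array first, then recognized scalars are boxed, then NaT-like values are accepted, and everything else raises. A self-contained, pandas-free sketch of that cascade (all names here are hypothetical stand-ins, not the real pandas internals):

```python
from datetime import datetime

# Hypothetical stand-in for the array's _recognized_scalars
RECOGNIZED_SCALARS = (datetime,)

def is_nat_like(value):
    # None or a float NaN plays the role of is_valid_nat_for_dtype
    return value is None or value != value

def validate_setitem_value(value):
    # Mirrors the new dispatch order: list-like -> recognized scalar
    # -> NaT-like -> TypeError.
    if isinstance(value, (list, tuple)):
        return [validate_setitem_value(v) for v in value]
    elif isinstance(value, RECOGNIZED_SCALARS):
        return value
    elif is_nat_like(value):
        return None  # plays the role of NaT
    raise TypeError(
        f"'value' should be a datetime, NaT, or array of those. "
        f"Got '{type(value).__name__}' instead."
    )
```

The real method additionally funnels every accepted value through a single trailing ``_check_compatible_with`` / unboxing step, instead of repeating that logic in each branch.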

pandas/core/groupby/generic.py (+10, -40)

@@ -63,10 +63,11 @@
 import pandas.core.common as com
 from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.frame import DataFrame
-from pandas.core.generic import ABCDataFrame, ABCSeries, NDFrame, _shared_docs
+from pandas.core.generic import ABCDataFrame, ABCSeries, NDFrame
 from pandas.core.groupby import base
 from pandas.core.groupby.groupby import (
     GroupBy,
+    _agg_template,
     _apply_docs,
     _transform_template,
     get_groupby,
@@ -177,16 +178,6 @@ def _selection_name(self):
         else:
             return self._selection

-    _agg_see_also_doc = dedent(
-        """
-    See Also
-    --------
-    pandas.Series.groupby.apply
-    pandas.Series.groupby.transform
-    pandas.Series.aggregate
-    """
-    )
-
     _agg_examples_doc = dedent(
         """
     Examples
@@ -224,8 +215,7 @@ def _selection_name(self):
     ... )
        minimum  maximum
     1        1        2
-    2        3        4
-    """
+    2        3        4"""
     )

     @Appender(
@@ -237,13 +227,9 @@ def apply(self, func, *args, **kwargs):
         return super().apply(func, *args, **kwargs)

     @Substitution(
-        see_also=_agg_see_also_doc,
-        examples=_agg_examples_doc,
-        versionadded="",
-        klass="Series",
-        axis="",
+        examples=_agg_examples_doc, klass="Series",
     )
-    @Appender(_shared_docs["aggregate"])
+    @Appender(_agg_template)
     def aggregate(
         self, func=None, *args, engine="cython", engine_kwargs=None, **kwargs
     ):
@@ -476,7 +462,7 @@ def _aggregate_named(self, func, *args, **kwargs):

         return result

-    @Substitution(klass="Series", selected="A.")
+    @Substitution(klass="Series")
     @Appender(_transform_template)
     def transform(self, func, *args, engine="cython", engine_kwargs=None, **kwargs):
         func = self._get_cython_func(func) or func
@@ -854,16 +840,6 @@ class DataFrameGroupBy(GroupBy[DataFrame]):

     _apply_whitelist = base.dataframe_apply_whitelist

-    _agg_see_also_doc = dedent(
-        """
-    See Also
-    --------
-    pandas.DataFrame.groupby.apply
-    pandas.DataFrame.groupby.transform
-    pandas.DataFrame.aggregate
-    """
-    )
-
     _agg_examples_doc = dedent(
         """
     Examples
@@ -928,26 +904,20 @@ class DataFrameGroupBy(GroupBy[DataFrame]):
     1      1  0.590715
     2      3  0.704907

-
     - The keywords are the *output* column names
     - The values are tuples whose first element is the column to select
       and the second element is the aggregation to apply to that column.
       Pandas provides the ``pandas.NamedAgg`` namedtuple with the fields
       ``['column', 'aggfunc']`` to make it clearer what the arguments are.
       As usual, the aggregation can be a callable or a string alias.

-    See :ref:`groupby.aggregate.named` for more.
-    """
+    See :ref:`groupby.aggregate.named` for more."""
     )

     @Substitution(
-        see_also=_agg_see_also_doc,
-        examples=_agg_examples_doc,
-        versionadded="",
-        klass="DataFrame",
-        axis="",
+        examples=_agg_examples_doc, klass="DataFrame",
     )
-    @Appender(_shared_docs["aggregate"])
+    @Appender(_agg_template)
     def aggregate(
         self, func=None, *args, engine="cython", engine_kwargs=None, **kwargs
     ):
@@ -1467,7 +1437,7 @@ def _transform_general(

         concatenated = concatenated.reindex(concat_index, axis=other_axis, copy=False)
         return self._set_result_index_ordered(concatenated)

-    @Substitution(klass="DataFrame", selected="")
+    @Substitution(klass="DataFrame")
     @Appender(_transform_template)
     def transform(self, func, *args, engine="cython", engine_kwargs=None, **kwargs):
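The ``minimum``/``maximum`` example kept in ``_agg_examples_doc`` corresponds to the named-aggregation API, which can be checked directly (assuming pandas >= 0.25, where named aggregation was introduced):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
# Named aggregation: keyword names become the output columns
res = s.groupby([1, 1, 2, 2]).agg(minimum="min", maximum="max")
print(res)
#    minimum  maximum
# 1        1        2
# 2        3        4
```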

pandas/core/groupby/groupby.py (+73, -4)

@@ -291,7 +291,9 @@ class providing the base-class of operations.

 See Also
 --------
-aggregate, transform
+%(klass)s.groupby.apply
+%(klass)s.groupby.aggregate
+%(klass)s.transform

 Notes
 -----
@@ -310,14 +312,17 @@ class providing the base-class of operations.
 * f must not mutate groups. Mutation is not supported and may
   produce unexpected results.

+When using ``engine='numba'``, there will be no "fall back" behavior internally.
+The group data and group index will be passed as numpy arrays to the JITed
+user defined function, and no alternative execution attempts will be tried.
+
 Examples
 --------

-# Same shape
 >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
 ...                           'foo', 'bar'],
 ...                    'B' : ['one', 'one', 'two', 'three',
-...                          'two', 'two'],
+...                           'two', 'two'],
 ...                    'C' : [1, 5, 5, 2, 5, 5],
 ...                    'D' : [2.0, 5., 8., 1., 2., 9.]})
 >>> grouped = df.groupby('A')
@@ -330,7 +335,8 @@ class providing the base-class of operations.
 4  0.577350 -0.577350
 5  0.577350  1.000000

-# Broadcastable
+Broadcast result of the transformation
+
 >>> grouped.transform(lambda x: x.max() - x.min())
    C    D
 0  4  6.0
@@ -341,6 +347,69 @@ class providing the base-class of operations.
 5  3  8.0
 """

+_agg_template = """
+Aggregate using one or more operations over the specified axis.
+
+Parameters
+----------
+func : function, str, list or dict
+    Function to use for aggregating the data. If a function, must either
+    work when passed a %(klass)s or when passed to %(klass)s.apply.
+
+    Accepted combinations are:
+
+    - function
+    - string function name
+    - list of functions and/or function names, e.g. ``[np.sum, 'mean']``
+    - dict of axis labels -> functions, function names or list of such.
+
+    Can also accept a Numba JIT function with
+    ``engine='numba'`` specified.
+
+    If the ``'numba'`` engine is chosen, the function must be
+    a user defined function with ``values`` and ``index`` as the
+    first and second arguments respectively in the function signature.
+    Each group's index will be passed to the user defined function
+    and optionally available for use.
+
+    .. versionchanged:: 1.1.0
+*args
+    Positional arguments to pass to func.
+engine : str, default 'cython'
+    * ``'cython'`` : Runs the function through C-extensions from cython.
+    * ``'numba'`` : Runs the function through JIT compiled code from numba.
+
+    .. versionadded:: 1.1.0
+engine_kwargs : dict, default None
+    * For ``'cython'`` engine, there are no accepted ``engine_kwargs``
+    * For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
+      and ``parallel`` dictionary keys. The values must either be ``True`` or
+      ``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
+      ``{'nopython': True, 'nogil': False, 'parallel': False}`` and will be
+      applied to the function
+
+    .. versionadded:: 1.1.0
+**kwargs
+    Keyword arguments to be passed into func.
+
+Returns
+-------
+%(klass)s
+
+See Also
+--------
+%(klass)s.groupby.apply
+%(klass)s.groupby.transform
+%(klass)s.aggregate
+
+Notes
+-----
+When using ``engine='numba'``, there will be no "fall back" behavior internally.
+The group data and group index will be passed as numpy arrays to the JITed
+user defined function, and no alternative execution attempts will be tried.
+%(examples)s
+"""

 class GroupByPlot(PandasObject):
     """
