Commit ca1dfe6

Merge branch 'main' into pylint-48855-C-type-disallowed-name
2 parents b9d381b + a793802 commit ca1dfe6

66 files changed, +600 -811 lines changed

asv_bench/benchmarks/groupby.py

+1 -1
@@ -310,7 +310,7 @@ def time_different_python_functions_multicol(self, df):
         df.groupby(["key1", "key2"]).agg([sum, min, max])
 
     def time_different_python_functions_singlecol(self, df):
-        df.groupby("key1").agg([sum, min, max])
+        df.groupby("key1")[["value1", "value2", "value3"]].agg([sum, min, max])
 
 
 class GroupStrings:

ci/deps/actions-38-minimum_versions.yaml

+1 -1
@@ -25,7 +25,7 @@ dependencies:
   - blosc=1.21.0
   - bottleneck=1.3.2
   - brotlipy=0.7.0
-  - fastparquet=0.4.0
+  - fastparquet=0.6.3
   - fsspec=2021.07.0
   - html5lib=1.1
   - hypothesis=6.13.0

doc/source/user_guide/basics.rst

-28
@@ -1039,34 +1039,6 @@ not noted for a particular column will be ``NaN``:
 
    tsdf.agg({"A": ["mean", "min"], "B": "sum"})
 
-.. _basics.aggregation.mixed_string:
-
-Mixed dtypes
-++++++++++++
-
-.. deprecated:: 1.4.0
-   Attempting to determine which columns cannot be aggregated and silently dropping them from the results is deprecated and will be removed in a future version. If any porition of the columns or operations provided fail, the call to ``.agg`` will raise.
-
-When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
-aggregations. This is similar to how ``.groupby.agg`` works.
-
-.. ipython:: python
-
-   mdf = pd.DataFrame(
-       {
-           "A": [1, 2, 3],
-           "B": [1.0, 2.0, 3.0],
-           "C": ["foo", "bar", "baz"],
-           "D": pd.date_range("20130101", periods=3),
-       }
-   )
-   mdf.dtypes
-
-.. ipython:: python
-   :okwarning:
-
-   mdf.agg(["min", "sum"])
-
 .. _basics.aggregation.custom_describe:
 
 Custom describe

doc/source/user_guide/groupby.rst

+1 -1
@@ -1007,7 +1007,7 @@ functions:
 .. ipython:: python
    :okwarning:
 
-   grouped = df.groupby("A")
+   grouped = df.groupby("A")[["C", "D"]]
    grouped.agg(lambda x: x.std())
 
 But, it's rather verbose and can be untidy if you need to pass additional

doc/source/whatsnew/v0.20.0.rst

+6 -3
@@ -104,10 +104,13 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)
                       'D': pd.date_range('20130101', periods=3)})
    df.dtypes
 
-.. ipython:: python
-   :okwarning:
+.. code-block:: python
 
-   df.agg(['min', 'sum'])
+   In [10]: df.agg(['min', 'sum'])
+   Out[10]:
+        A    B          C          D
+   min  1  1.0        bar 2013-01-01
+   sum  6  6.0  foobarbaz        NaT
 
 .. _whatsnew_0200.enhancements.dataio_dtype:

doc/source/whatsnew/v1.5.2.rst

+3 -1
@@ -14,14 +14,16 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
+- Fixed regression in :meth:`DataFrame.plot` preventing :class:`~matplotlib.colors.Colormap` instance
+  from being passed using the ``colormap`` argument if Matplotlib 3.6+ is used (:issue:`49374`)
 -
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_152.bug_fixes:
 
 Bug fixes
 ~~~~~~~~~
--
+- Bug in the Copy-on-Write implementation losing track of views in certain chained indexing cases (:issue:`48996`)
 -
 
 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v2.0.0.rst

+14 -1
@@ -124,6 +124,8 @@ Optional libraries below the lowest tested version may still work, but are not c
 +=================+=================+=========+
 | pyarrow         | 6.0.0           |    X    |
 +-----------------+-----------------+---------+
+| fastparquet     | 0.6.3           |    X    |
++-----------------+-----------------+---------+
 
 See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

@@ -142,6 +144,7 @@ Other API changes
 - The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
 - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
 - :meth:`Series.unique` with dtype "timedelta64[ns]" or "datetime64[ns]" now returns :class:`TimedeltaArray` or :class:`DatetimeArray` instead of ``numpy.ndarray`` (:issue:`49176`)
+- Passing a sequence containing ``datetime`` objects and ``date`` objects to :class:`Series` constructor will return with ``object`` dtype instead of ``datetime64[ns]`` dtype, consistent with :class:`Index` behavior (:issue:`49341`)
 - Passing strings that cannot be parsed as datetimes to :class:`Series` or :class:`DataFrame` with ``dtype="datetime64[ns]"`` will raise instead of silently ignoring the keyword and returning ``object`` dtype (:issue:`24435`)
 -

@@ -158,6 +161,7 @@ Deprecations
 
 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`)
 - Removed deprecated module ``pandas.core.index`` (:issue:`30193`)
 - Removed deprecated :meth:`Categorical.to_dense`, use ``np.asarray(cat)`` instead (:issue:`32639`)
 - Removed deprecated :meth:`Categorical.take_nd` (:issue:`27745`)
@@ -208,7 +212,9 @@ Removal of prior version deprecations/changes
 - Removed argument ``inplace`` from :meth:`Categorical.remove_unused_categories` (:issue:`37918`)
 - Disallow passing non-round floats to :class:`Timestamp` with ``unit="M"`` or ``unit="Y"`` (:issue:`47266`)
 - Remove keywords ``convert_float`` and ``mangle_dupe_cols`` from :func:`read_excel` (:issue:`41176`)
+- Removed ``errors`` keyword from :meth:`DataFrame.where`, :meth:`Series.where`, :meth:`DataFrame.mask` and :meth:`Series.mask` (:issue:`47728`)
 - Disallow passing non-keyword arguments to :func:`read_excel` except ``io`` and ``sheet_name`` (:issue:`34418`)
+- Disallow passing non-keyword arguments to :meth:`StringMethods.split` and :meth:`StringMethods.rsplit` except for ``pat`` (:issue:`47448`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.set_index` except ``keys`` (:issue:`41495`)
 - Disallow passing non-keyword arguments to :meth:`Resampler.interpolate` except ``method`` (:issue:`41699`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.reset_index` and :meth:`Series.reset_index` except ``level`` (:issue:`41496`)
@@ -225,6 +231,7 @@ Removal of prior version deprecations/changes
 - Disallow passing non-keyword arguments to :func:`concat` except for ``objs`` (:issue:`41485`)
 - Disallow passing non-keyword arguments to :func:`pivot` except for ``data`` (:issue:`48301`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.pivot` (:issue:`48301`)
+- Disallow passing non-keyword arguments to :func:`read_html` except for ``io`` (:issue:`27573`)
 - Disallow passing non-keyword arguments to :func:`read_json` except for ``path_or_buf`` (:issue:`27573`)
 - Disallow passing non-keyword arguments to :func:`read_sas` except for ``filepath_or_buffer`` (:issue:`47154`)
 - Disallow passing non-keyword arguments to :func:`read_stata` except for ``filepath_or_buffer`` (:issue:`48128`)
@@ -270,6 +277,9 @@ Removal of prior version deprecations/changes
 - Enforced disallowing a string column label into ``times`` in :meth:`DataFrame.ewm` (:issue:`43265`)
 - Enforced disallowing a tuple of column labels into :meth:`.DataFrameGroupBy.__getitem__` (:issue:`30546`)
 - Enforced disallowing setting values with ``.loc`` using a positional slice. Use ``.loc`` with labels or ``.iloc`` with positions instead (:issue:`31840`)
+- Enforced disallowing ``dict`` or ``set`` objects in ``suffixes`` in :func:`merge` (:issue:`34810`)
+- Enforced disallowing :func:`merge` to produce duplicated columns through the ``suffixes`` keyword and already existing columns (:issue:`22818`)
+- Enforced disallowing using :func:`merge` or :func:`join` on a different number of levels (:issue:`34862`)
 - Removed setting Categorical._codes directly (:issue:`41429`)
 - Removed setting Categorical.categories directly (:issue:`47834`)
 - Removed argument ``inplace`` from :meth:`Categorical.add_categories`, :meth:`Categorical.remove_categories`, :meth:`Categorical.set_categories`, :meth:`Categorical.rename_categories`, :meth:`Categorical.reorder_categories`, :meth:`Categorical.set_ordered`, :meth:`Categorical.as_ordered`, :meth:`Categorical.as_unordered` (:issue:`37981`, :issue:`41118`, :issue:`41133`, :issue:`47834`)
@@ -282,15 +292,18 @@ Removal of prior version deprecations/changes
 - Removed the deprecated method ``tshift`` from pandas classes (:issue:`11631`)
 - Changed behavior of empty data passed into :class:`Series`; the default dtype will be ``object`` instead of ``float64`` (:issue:`29405`)
 - Changed the behavior of :func:`to_datetime` with argument "now" with ``utc=False`` to match ``Timestamp("now")`` (:issue:`18705`)
+- Changed behavior of :meth:`SparseArray.astype` when given a dtype that is not explicitly ``SparseDtype``, cast to the exact requested dtype rather than silently using a ``SparseDtype`` instead (:issue:`34457`)
 - Changed behavior of :class:`DataFrame` constructor given floating-point ``data`` and an integer ``dtype``, when the data cannot be cast losslessly, the floating point dtype is retained, matching :class:`Series` behavior (:issue:`41170`)
 - Changed behavior of :class:`DataFrame` constructor when passed a ``dtype`` (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (:issue:`41733`)
 - Changed the behavior of :class:`Series` constructor, it will no longer infer a datetime64 or timedelta64 dtype from string entries (:issue:`41731`)
+- Changed behavior of :class:`Timestamp` constructor with a ``np.datetime64`` object and a ``tz`` passed to interpret the input as a wall-time as opposed to a UTC time (:issue:`42288`)
 - Changed behavior of :class:`Index` constructor when passed a ``SparseArray`` or ``SparseDtype`` to retain that dtype instead of casting to ``numpy.ndarray`` (:issue:`43930`)
+- Changed behavior of :class:`Index`, :class:`Series`, :class:`DataFrame` constructors with floating-dtype data and a :class:`DatetimeTZDtype`, the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated (:issue:`45573`)
 - Removed the deprecated ``base`` and ``loffset`` arguments from :meth:`pandas.DataFrame.resample`, :meth:`pandas.Series.resample` and :class:`pandas.Grouper`. Use ``offset`` or ``origin`` instead (:issue:`31809`)
 - Changed behavior of :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``; object-dtype columns with all-bool values will no longer be included, manually cast to ``bool`` dtype first (:issue:`46188`)
 - Changed behavior of comparison of a :class:`Timestamp` with a ``datetime.date`` object; these now compare as un-equal and raise on inequality comparisons, matching the ``datetime.datetime`` behavior (:issue:`36131`)
 - Enforced deprecation of silently dropping columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a list or dictionary (:issue:`43740`)
--
+- Change behavior of :meth:`DataFrame.apply` with list-like so that any partial failure will raise an error (:issue:`43740`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.performance:

pandas/_libs/internals.pyx

+9 -3
@@ -676,8 +676,9 @@ cdef class BlockManager:
         public bint _known_consolidated, _is_consolidated
         public ndarray _blknos, _blklocs
         public list refs
+        public object parent
 
-    def __cinit__(self, blocks=None, axes=None, refs=None, verify_integrity=True):
+    def __cinit__(self, blocks=None, axes=None, refs=None, parent=None, verify_integrity=True):
         # None as defaults for unpickling GH#42345
         if blocks is None:
             # This adds 1-2 microseconds to DataFrame(np.array([]))
@@ -690,6 +691,7 @@ cdef class BlockManager:
         self.blocks = blocks
         self.axes = axes.copy()  # copy to make sure we are not remotely-mutable
         self.refs = refs
+        self.parent = parent
 
         # Populate known_consolidate, blknos, and blklocs lazily
         self._known_consolidated = False
@@ -805,7 +807,9 @@ cdef class BlockManager:
             nrefs.append(weakref.ref(blk))
 
         new_axes = [self.axes[0], self.axes[1]._getitem_slice(slobj)]
-        mgr = type(self)(tuple(nbs), new_axes, nrefs, verify_integrity=False)
+        mgr = type(self)(
+            tuple(nbs), new_axes, nrefs, parent=self, verify_integrity=False
+        )
 
         # We can avoid having to rebuild blklocs/blknos
         blklocs = self._blklocs
@@ -827,4 +831,6 @@ cdef class BlockManager:
         new_axes = list(self.axes)
         new_axes[axis] = new_axes[axis]._getitem_slice(slobj)
 
-        return type(self)(tuple(new_blocks), new_axes, new_refs, verify_integrity=False)
+        return type(self)(
+            tuple(new_blocks), new_axes, new_refs, parent=self, verify_integrity=False
+        )
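
Both slicing hunks above now pass `parent=self`, so a sliced manager holds a strong reference to the manager it was sliced from. One plausible reading, in line with the v1.5.2 whatsnew entry in this commit about chained indexing, is that `refs` only holds weak references, which die once an intermediate manager in a chain is garbage collected. The toy sketch below illustrates that idea in plain Python under that assumption; `ToyBlock`, `ToyManager`, `chained_slice`, and the `keep_parent` switch are hypothetical names and are not pandas internals.

```python
import gc
import weakref


class ToyBlock:
    """Hypothetical stand-in for a pandas Block; holds a list of values."""

    def __init__(self, values):
        self.values = values


class ToyManager:
    """Hypothetical stand-in for BlockManager with `refs` and `parent`."""

    def __init__(self, blocks, refs=None, parent=None):
        self.blocks = blocks
        self.refs = refs      # weak references to the blocks we were sliced from
        self.parent = parent  # strong reference that keeps the source manager alive

    def get_slice(self, slobj, keep_parent=True):
        new_blocks = [ToyBlock(blk.values[slobj]) for blk in self.blocks]
        new_refs = [weakref.ref(blk) for blk in self.blocks]
        parent = self if keep_parent else None
        return ToyManager(new_blocks, new_refs, parent=parent)


def chained_slice(keep_parent):
    """Slice twice in one expression, so the middle manager is a temporary."""
    root = ToyManager([ToyBlock([1, 2, 3, 4])])
    result = root.get_slice(slice(0, 3), keep_parent).get_slice(
        slice(0, 2), keep_parent
    )
    gc.collect()  # make collection of the temporary explicit
    return result


lost = chained_slice(keep_parent=False)
tracked = chained_slice(keep_parent=True)
print(lost.refs[0]() is None)         # is the weak reference to the temporary dead?
print(tracked.refs[0]() is not None)  # does the parent chain keep the block alive?
```

Without the strong `parent` link, the final manager has no way to reach the block it was sliced from once the temporary is gone, which is the kind of lost tracking the whatsnew entry describes.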

pandas/_libs/lib.pyx

+13 -5
@@ -1640,8 +1640,11 @@ def infer_datetimelike_array(arr: ndarray[object]) -> tuple[str, bool]:
             return "interval"
         return "mixed"
 
-    if seen_date and not (seen_datetime or seen_timedelta):
-        return "date"
+    if seen_date:
+        if not seen_datetime and not seen_timedelta:
+            return "date"
+        return "mixed"
+
     elif seen_datetime and not seen_timedelta:
         return "datetime"
     elif seen_timedelta and not seen_datetime:
@@ -2570,10 +2573,15 @@ def maybe_convert_objects(ndarray[object] objects,
     if seen.datetimetz_:
         if is_datetime_with_singletz_array(objects):
             from pandas import DatetimeIndex
-            dti = DatetimeIndex(objects)
 
-            # unbox to DatetimeArray
-            return dti._data
+            try:
+                dti = DatetimeIndex(objects)
+            except OutOfBoundsDatetime:
+                # e.g. test_to_datetime_cache_coerce_50_lines_outofbounds
+                pass
+            else:
+                # unbox to DatetimeArray
+                return dti._data
         seen.object_ = True
 
     elif seen.datetime_:
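
The first hunk changes the result when `date` objects are mixed with datetimes or timedeltas, which appears to line up with the v2.0.0 whatsnew entry for :issue:`49341`. The sketch below reproduces only the branches visible in that hunk as plain Python over the three boolean flags. The function names and the `"other"` fallthrough are illustrative placeholders, since the rest of `infer_datetimelike_array` is not shown in this diff.

```python
def classify_before(seen_date, seen_datetime, seen_timedelta):
    """Branch order as in the removed lines (later branches omitted)."""
    if seen_date and not (seen_datetime or seen_timedelta):
        return "date"
    elif seen_datetime and not seen_timedelta:
        return "datetime"
    return "other"  # placeholder for branches not shown in the diff


def classify_after(seen_date, seen_datetime, seen_timedelta):
    """Branch order as in the added lines (later branches omitted)."""
    if seen_date:
        if not seen_datetime and not seen_timedelta:
            return "date"
        return "mixed"
    elif seen_datetime and not seen_timedelta:
        return "datetime"
    return "other"  # placeholder for branches not shown in the diff


# Dates mixed with datetimes: previously fell through to the datetime branch.
print(classify_before(True, True, False))  # datetime
print(classify_after(True, True, False))   # mixed

# Dates only: unchanged.
print(classify_after(True, False, False))  # date
```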

pandas/_libs/tslibs/timestamps.pyx

+2 -11
@@ -1637,18 +1637,9 @@ class Timestamp(_Timestamp):
 
         tzobj = maybe_get_tz(tz)
         if tzobj is not None and is_datetime64_object(ts_input):
-            # GH#24559, GH#42288 In the future we will treat datetime64 as
+            # GH#24559, GH#42288 As of 2.0 we treat datetime64 as
             # wall-time (consistent with DatetimeIndex)
-            warnings.warn(
-                "In a future version, when passing a np.datetime64 object and "
-                "a timezone to Timestamp, the datetime64 will be interpreted "
-                "as a wall time, not a UTC time.  To interpret as a UTC time, "
-                "use `Timestamp(dt64).tz_localize('UTC').tz_convert(tz)`",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-            # Once this deprecation is enforced, we can do
-            # return Timestamp(ts_input).tz_localize(tzobj)
+            return cls(ts_input).tz_localize(tzobj)
 
         if nanosecond is None:
             nanosecond = 0
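
The difference between the wall-time and UTC interpretations can be illustrated with the standard library alone. This sketch does not use pandas; it assumes a fixed UTC-5 offset as a stand-in for a real time zone, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2022, 1, 1, 12, 0)   # plays the role of the np.datetime64 input
tz = timezone(timedelta(hours=-5))    # fixed offset standing in for `tz`

# Wall-time interpretation (new behavior): attach the zone, keep the clock reading.
as_wall_time = naive.replace(tzinfo=tz)

# UTC interpretation (old behavior): treat the input as UTC, then convert.
as_utc_time = naive.replace(tzinfo=timezone.utc).astimezone(tz)

print(as_wall_time.isoformat())  # 2022-01-01T12:00:00-05:00
print(as_utc_time.isoformat())   # 2022-01-01T07:00:00-05:00
```

The removed warning text names the pandas spelling for keeping the old interpretation: `Timestamp(dt64).tz_localize('UTC').tz_convert(tz)`.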

pandas/compat/_optional.py

+1 -1
@@ -16,7 +16,7 @@
     "blosc": "1.21.0",
     "bottleneck": "1.3.2",
     "brotli": "0.7.0",
-    "fastparquet": "0.4.0",
+    "fastparquet": "0.6.3",
     "fsspec": "2021.07.0",
     "html5lib": "1.1",
     "hypothesis": "6.13.0",
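
With this bump, an installed fastparquet older than 0.6.3 no longer satisfies the minimum. The following is a rough sketch of such a minimum-version check in plain Python, assuming simple dotted numeric version strings. `MINIMUM_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical helpers for illustration, not the pandas implementation.

```python
# Hypothetical subset of the minimum-version table, using values from the diff.
MINIMUM_VERSIONS = {"fastparquet": "0.6.3", "fsspec": "2021.07.0"}


def parse_version(text):
    """Turn '0.6.3' into (0, 6, 3); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(name, installed):
    """Compare versions numerically rather than as strings."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[name])


print(meets_minimum("fastparquet", "0.4.0"))   # False
print(meets_minimum("fastparquet", "0.10.0"))  # True
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" would sort before "0.6.3".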
