Commit 4f95fb2

Merge branch 'pandas-dev:main' into issue-48949
2 parents 2d272c9 + dec9be2 commit 4f95fb2

92 files changed, +725 -990 lines changed

asv_bench/benchmarks/groupby.py

+1 -1
@@ -310,7 +310,7 @@ def time_different_python_functions_multicol(self, df):
         df.groupby(["key1", "key2"]).agg([sum, min, max])

     def time_different_python_functions_singlecol(self, df):
-        df.groupby("key1").agg([sum, min, max])
+        df.groupby("key1")[["value1", "value2", "value3"]].agg([sum, min, max])


 class GroupStrings:

ci/deps/actions-38-minimum_versions.yaml

+1 -1
@@ -25,7 +25,7 @@ dependencies:
   - blosc=1.21.0
   - bottleneck=1.3.2
   - brotlipy=0.7.0
-  - fastparquet=0.4.0
+  - fastparquet=0.6.3
   - fsspec=2021.07.0
   - html5lib=1.1
   - hypothesis=6.13.0

doc/source/user_guide/basics.rst

-28
@@ -1039,34 +1039,6 @@ not noted for a particular column will be ``NaN``:

    tsdf.agg({"A": ["mean", "min"], "B": "sum"})

-.. _basics.aggregation.mixed_string:
-
-Mixed dtypes
-++++++++++++
-
-.. deprecated:: 1.4.0
-   Attempting to determine which columns cannot be aggregated and silently dropping them from the results is deprecated and will be removed in a future version. If any porition of the columns or operations provided fail, the call to ``.agg`` will raise.
-
-When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
-aggregations. This is similar to how ``.groupby.agg`` works.
-
-.. ipython:: python
-
-   mdf = pd.DataFrame(
-       {
-           "A": [1, 2, 3],
-           "B": [1.0, 2.0, 3.0],
-           "C": ["foo", "bar", "baz"],
-           "D": pd.date_range("20130101", periods=3),
-       }
-   )
-   mdf.dtypes
-
-.. ipython:: python
-   :okwarning:
-
-   mdf.agg(["min", "sum"])
-
 .. _basics.aggregation.custom_describe:

 Custom describe

doc/source/user_guide/groupby.rst

+1 -1
@@ -1007,7 +1007,7 @@ functions:
 .. ipython:: python
    :okwarning:

-   grouped = df.groupby("A")
+   grouped = df.groupby("A")[["C", "D"]]
    grouped.agg(lambda x: x.std())

 But, it's rather verbose and can be untidy if you need to pass additional

doc/source/user_guide/options.rst

+1 -1
@@ -249,7 +249,7 @@ displayed when calling :meth:`~pandas.DataFrame.info`.
 ``display.max_info_rows``: :meth:`~pandas.DataFrame.info` will usually show null-counts for each column.
 For a large :class:`DataFrame`, this can be quite slow. ``max_info_rows`` and ``max_info_cols``
 limit this null check to the specified rows and columns respectively. The :meth:`~pandas.DataFrame.info`
-keyword argument ``null_counts=True`` will override this.
+keyword argument ``show_counts=True`` will override this.

 .. ipython:: python


doc/source/whatsnew/v0.13.0.rst

+1 -1
@@ -733,7 +733,7 @@ Enhancements

   .. _scipy: http://www.scipy.org
   .. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
-  .. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
+  .. _guide: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html

 - ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
   output datetime objects should be formatted. Datetimes encountered in the

doc/source/whatsnew/v0.20.0.rst

+6 -3
@@ -104,10 +104,13 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)
                       'D': pd.date_range('20130101', periods=3)})
    df.dtypes

-.. ipython:: python
-   :okwarning:
+.. code-block:: python

-   df.agg(['min', 'sum'])
+   In [10]: df.agg(['min', 'sum'])
+   Out[10]:
+        A    B          C          D
+   min  1  1.0        bar 2013-01-01
+   sum  6  6.0  foobarbaz        NaT

 .. _whatsnew_0200.enhancements.dataio_dtype:

doc/source/whatsnew/v1.5.2.rst

+3 -1
@@ -14,14 +14,16 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
+- Fixed regression in :meth:`DataFrame.plot` preventing :class:`~matplotlib.colors.Colormap` instance
+  from being passed using the ``colormap`` argument if Matplotlib 3.6+ is used (:issue:`49374`)
 -

 .. ---------------------------------------------------------------------------
 .. _whatsnew_152.bug_fixes:

 Bug fixes
 ~~~~~~~~~
--
+- Bug in the Copy-on-Write implementation losing track of views in certain chained indexing cases (:issue:`48996`)
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v2.0.0.rst

+15 -1
@@ -124,6 +124,8 @@ Optional libraries below the lowest tested version may still work, but are not c
 +=================+=================+=========+
 | pyarrow         | 6.0.0           |    X    |
 +-----------------+-----------------+---------+
+| fastparquet     | 0.6.3           |    X    |
++-----------------+-----------------+---------+

 See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

@@ -142,6 +144,7 @@ Other API changes
 - The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
 - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
 - :meth:`Series.unique` with dtype "timedelta64[ns]" or "datetime64[ns]" now returns :class:`TimedeltaArray` or :class:`DatetimeArray` instead of ``numpy.ndarray`` (:issue:`49176`)
+- Passing a sequence containing ``datetime`` objects and ``date`` objects to :class:`Series` constructor will return with ``object`` dtype instead of ``datetime64[ns]`` dtype, consistent with :class:`Index` behavior (:issue:`49341`)
 - Passing strings that cannot be parsed as datetimes to :class:`Series` or :class:`DataFrame` with ``dtype="datetime64[ns]"`` will raise instead of silently ignoring the keyword and returning ``object`` dtype (:issue:`24435`)
 -

@@ -158,6 +161,7 @@ Deprecations

 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`)
 - Removed deprecated module ``pandas.core.index`` (:issue:`30193`)
 - Removed deprecated :meth:`Categorical.to_dense`, use ``np.asarray(cat)`` instead (:issue:`32639`)
 - Removed deprecated :meth:`Categorical.take_nd` (:issue:`27745`)
@@ -181,6 +185,7 @@ Removal of prior version deprecations/changes
 - Removed deprecated :meth:`.Styler.where` (:issue:`49397`)
 - Removed deprecated :meth:`.Styler.render` (:issue:`49397`)
 - Removed deprecated argument ``null_color`` in :meth:`.Styler.highlight_null` (:issue:`49397`)
+- Removed deprecated ``null_counts`` argument in :meth:`DataFrame.info`. Use ``show_counts`` instead (:issue:`37999`)
 - Enforced deprecation disallowing passing a timezone-aware :class:`Timestamp` and ``dtype="datetime64[ns]"`` to :class:`Series` or :class:`DataFrame` constructors (:issue:`41555`)
 - Enforced deprecation disallowing passing a sequence of timezone-aware values and ``dtype="datetime64[ns]"`` to to :class:`Series` or :class:`DataFrame` constructors (:issue:`41555`)
 - Enforced deprecation disallowing unit-less "datetime64" dtype in :meth:`Series.astype` and :meth:`DataFrame.astype` (:issue:`47844`)
@@ -207,7 +212,9 @@ Removal of prior version deprecations/changes
 - Removed argument ``inplace`` from :meth:`Categorical.remove_unused_categories` (:issue:`37918`)
 - Disallow passing non-round floats to :class:`Timestamp` with ``unit="M"`` or ``unit="Y"`` (:issue:`47266`)
 - Remove keywords ``convert_float`` and ``mangle_dupe_cols`` from :func:`read_excel` (:issue:`41176`)
+- Removed ``errors`` keyword from :meth:`DataFrame.where`, :meth:`Series.where`, :meth:`DataFrame.mask` and :meth:`Series.mask` (:issue:`47728`)
 - Disallow passing non-keyword arguments to :func:`read_excel` except ``io`` and ``sheet_name`` (:issue:`34418`)
+- Disallow passing non-keyword arguments to :meth:`StringMethods.split` and :meth:`StringMethods.rsplit` except for ``pat`` (:issue:`47448`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.set_index` except ``keys`` (:issue:`41495`)
 - Disallow passing non-keyword arguments to :meth:`Resampler.interpolate` except ``method`` (:issue:`41699`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.reset_index` and :meth:`Series.reset_index` except ``level`` (:issue:`41496`)
@@ -224,6 +231,7 @@ Removal of prior version deprecations/changes
 - Disallow passing non-keyword arguments to :func:`concat` except for ``objs`` (:issue:`41485`)
 - Disallow passing non-keyword arguments to :func:`pivot` except for ``data`` (:issue:`48301`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.pivot` (:issue:`48301`)
+- Disallow passing non-keyword arguments to :func:`read_html` except for ``io`` (:issue:`27573`)
 - Disallow passing non-keyword arguments to :func:`read_json` except for ``path_or_buf`` (:issue:`27573`)
 - Disallow passing non-keyword arguments to :func:`read_sas` except for ``filepath_or_buffer`` (:issue:`47154`)
 - Disallow passing non-keyword arguments to :func:`read_stata` except for ``filepath_or_buffer`` (:issue:`48128`)
@@ -269,6 +277,9 @@ Removal of prior version deprecations/changes
 - Enforced disallowing a string column label into ``times`` in :meth:`DataFrame.ewm` (:issue:`43265`)
 - Enforced disallowing a tuple of column labels into :meth:`.DataFrameGroupBy.__getitem__` (:issue:`30546`)
 - Enforced disallowing setting values with ``.loc`` using a positional slice. Use ``.loc`` with labels or ``.iloc`` with positions instead (:issue:`31840`)
+- Enforced disallowing ``dict`` or ``set`` objects in ``suffixes`` in :func:`merge` (:issue:`34810`)
+- Enforced disallowing :func:`merge` to produce duplicated columns through the ``suffixes`` keyword and already existing columns (:issue:`22818`)
+- Enforced disallowing using :func:`merge` or :func:`join` on a different number of levels (:issue:`34862`)
 - Removed setting Categorical._codes directly (:issue:`41429`)
 - Removed setting Categorical.categories directly (:issue:`47834`)
 - Removed argument ``inplace`` from :meth:`Categorical.add_categories`, :meth:`Categorical.remove_categories`, :meth:`Categorical.set_categories`, :meth:`Categorical.rename_categories`, :meth:`Categorical.reorder_categories`, :meth:`Categorical.set_ordered`, :meth:`Categorical.as_ordered`, :meth:`Categorical.as_unordered` (:issue:`37981`, :issue:`41118`, :issue:`41133`, :issue:`47834`)
@@ -281,15 +292,18 @@ Removal of prior version deprecations/changes
 - Removed the deprecated method ``tshift`` from pandas classes (:issue:`11631`)
 - Changed behavior of empty data passed into :class:`Series`; the default dtype will be ``object`` instead of ``float64`` (:issue:`29405`)
 - Changed the behavior of :func:`to_datetime` with argument "now" with ``utc=False`` to match ``Timestamp("now")`` (:issue:`18705`)
+- Changed behavior of :meth:`SparseArray.astype` when given a dtype that is not explicitly ``SparseDtype``, cast to the exact requested dtype rather than silently using a ``SparseDtype`` instead (:issue:`34457`)
 - Changed behavior of :class:`DataFrame` constructor given floating-point ``data`` and an integer ``dtype``, when the data cannot be cast losslessly, the floating point dtype is retained, matching :class:`Series` behavior (:issue:`41170`)
 - Changed behavior of :class:`DataFrame` constructor when passed a ``dtype`` (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (:issue:`41733`)
 - Changed the behavior of :class:`Series` constructor, it will no longer infer a datetime64 or timedelta64 dtype from string entries (:issue:`41731`)
+- Changed behavior of :class:`Timestamp` constructor with a ``np.datetime64`` object and a ``tz`` passed to interpret the input as a wall-time as opposed to a UTC time (:issue:`42288`)
 - Changed behavior of :class:`Index` constructor when passed a ``SparseArray`` or ``SparseDtype`` to retain that dtype instead of casting to ``numpy.ndarray`` (:issue:`43930`)
+- Changed behavior of :class:`Index`, :class:`Series`, :class:`DataFrame` constructors with floating-dtype data and a :class:`DatetimeTZDtype`, the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated (:issue:`45573`)
 - Removed the deprecated ``base`` and ``loffset`` arguments from :meth:`pandas.DataFrame.resample`, :meth:`pandas.Series.resample` and :class:`pandas.Grouper`. Use ``offset`` or ``origin`` instead (:issue:`31809`)
 - Changed behavior of :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``; object-dtype columns with all-bool values will no longer be included, manually cast to ``bool`` dtype first (:issue:`46188`)
 - Changed behavior of comparison of a :class:`Timestamp` with a ``datetime.date`` object; these now compare as un-equal and raise on inequality comparisons, matching the ``datetime.datetime`` behavior (:issue:`36131`)
 - Enforced deprecation of silently dropping columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a list or dictionary (:issue:`43740`)
--
+- Change behavior of :meth:`DataFrame.apply` with list-like so that any partial failure will raise an error (:issue:`43740`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.performance:

pandas/_libs/internals.pyx

+9 -3
@@ -676,8 +676,9 @@ cdef class BlockManager:
         public bint _known_consolidated, _is_consolidated
         public ndarray _blknos, _blklocs
         public list refs
+        public object parent

-    def __cinit__(self, blocks=None, axes=None, refs=None, verify_integrity=True):
+    def __cinit__(self, blocks=None, axes=None, refs=None, parent=None, verify_integrity=True):
         # None as defaults for unpickling GH#42345
         if blocks is None:
             # This adds 1-2 microseconds to DataFrame(np.array([]))
@@ -690,6 +691,7 @@ cdef class BlockManager:
         self.blocks = blocks
         self.axes = axes.copy()  # copy to make sure we are not remotely-mutable
         self.refs = refs
+        self.parent = parent

         # Populate known_consolidate, blknos, and blklocs lazily
         self._known_consolidated = False
@@ -805,7 +807,9 @@ cdef class BlockManager:
             nrefs.append(weakref.ref(blk))

         new_axes = [self.axes[0], self.axes[1]._getitem_slice(slobj)]
-        mgr = type(self)(tuple(nbs), new_axes, nrefs, verify_integrity=False)
+        mgr = type(self)(
+            tuple(nbs), new_axes, nrefs, parent=self, verify_integrity=False
+        )

         # We can avoid having to rebuild blklocs/blknos
         blklocs = self._blklocs
@@ -827,4 +831,6 @@ cdef class BlockManager:
         new_axes = list(self.axes)
         new_axes[axis] = new_axes[axis]._getitem_slice(slobj)

-        return type(self)(tuple(new_blocks), new_axes, new_refs, verify_integrity=False)
+        return type(self)(
+            tuple(new_blocks), new_axes, new_refs, parent=self, verify_integrity=False
+        )
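
Review note on the `parent` argument: the diff does not state why it was added. One plausible reading, offered as an assumption, is that it ties in with the v1.5.2 entry about Copy-on-Write losing track of views in chained indexing. A sliced manager only holds weak references to the blocks it was sliced from, so an intermediate manager in a chain can be garbage collected and take those references with it. The sketch below uses hypothetical stand-in classes (`Block`, `Manager`) and a hypothetical `keep_parent` switch; none of it is pandas code.

```python
import gc
import weakref


class Block:
    """Hypothetical stand-in for a block of values."""

    def __init__(self, values):
        self.values = values


class Manager:
    """Hypothetical stand-in for a manager that tracks views by weakref."""

    def __init__(self, blocks, refs=None, parent=None):
        self.blocks = blocks
        self.refs = refs      # weak references to the blocks we were sliced from
        self.parent = parent  # strong reference that keeps those blocks alive

    def get_slice(self, slobj, keep_parent):
        new_blocks = [Block(blk.values[slobj]) for blk in self.blocks]
        new_refs = [weakref.ref(blk) for blk in self.blocks]
        parent = self if keep_parent else None
        return Manager(new_blocks, new_refs, parent=parent)

    def has_live_refs(self):
        # True while at least one referenced source block still exists.
        return self.refs is not None and any(ref() is not None for ref in self.refs)


def chained(keep_parent):
    root = Manager([Block(list(range(10)))])
    # The intermediate manager is a temporary, as in chained indexing.
    child = root.get_slice(slice(0, 8), keep_parent).get_slice(slice(0, 4), keep_parent)
    gc.collect()
    return child.has_live_refs()


lost = chained(keep_parent=False)
tracked = chained(keep_parent=True)
print(lost, tracked)
```

Without the strong reference the temporary manager and its blocks disappear, so the child can no longer tell that it is a view; with it, the weak references stay valid for as long as the child exists.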

pandas/_libs/lib.pyx

+13 -5
@@ -1640,8 +1640,11 @@ def infer_datetimelike_array(arr: ndarray[object]) -> tuple[str, bool]:
             return "interval"
         return "mixed"

-    if seen_date and not (seen_datetime or seen_timedelta):
-        return "date"
+    if seen_date:
+        if not seen_datetime and not seen_timedelta:
+            return "date"
+        return "mixed"
+
     elif seen_datetime and not seen_timedelta:
         return "datetime"
     elif seen_timedelta and not seen_datetime:
@@ -2570,10 +2573,15 @@ def maybe_convert_objects(ndarray[object] objects,
     if seen.datetimetz_:
         if is_datetime_with_singletz_array(objects):
             from pandas import DatetimeIndex
-            dti = DatetimeIndex(objects)

-            # unbox to DatetimeArray
-            return dti._data
+            try:
+                dti = DatetimeIndex(objects)
+            except OutOfBoundsDatetime:
+                # e.g. test_to_datetime_cache_coerce_50_lines_outofbounds
+                pass
+            else:
+                # unbox to DatetimeArray
+                return dti._data
         seen.object_ = True

     elif seen.datetime_:
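
Review note on the first `lib.pyx` hunk: the branch change can be checked in isolation. The sketch below copies only the three flags visible in the hunk into two hypothetical helpers, `classify_old` and `classify_new`. The `"timedelta"` return value and the `None` fallthrough are assumptions, since the hunk ends before those lines.

```python
def classify_old(seen_date, seen_datetime, seen_timedelta):
    """Branch order before the change (simplified to three flags)."""
    if seen_date and not (seen_datetime or seen_timedelta):
        return "date"
    elif seen_datetime and not seen_timedelta:
        return "datetime"
    elif seen_timedelta and not seen_datetime:
        return "timedelta"  # assumed: this return is cut off by the hunk
    return None  # placeholder: later branches are outside the hunk


def classify_new(seen_date, seen_datetime, seen_timedelta):
    """Branch order after the change (simplified to three flags)."""
    if seen_date:
        if not seen_datetime and not seen_timedelta:
            return "date"
        # A date mixed with datetimes or timedeltas is no longer "datetime".
        return "mixed"
    elif seen_datetime and not seen_timedelta:
        return "datetime"
    elif seen_timedelta and not seen_datetime:
        return "timedelta"  # assumed: this return is cut off by the hunk
    return None  # placeholder: later branches are outside the hunk


# date objects together with datetime objects
print(classify_old(True, True, False), classify_new(True, True, False))
```

This lines up with the v2.0.0 whatsnew entry in this commit: a sequence containing both ``datetime`` and ``date`` objects no longer infers as datetime-like.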

pandas/_libs/tslibs/timestamps.pyx

+2 -11
@@ -1637,18 +1637,9 @@ class Timestamp(_Timestamp):

         tzobj = maybe_get_tz(tz)
         if tzobj is not None and is_datetime64_object(ts_input):
-            # GH#24559, GH#42288 In the future we will treat datetime64 as
+            # GH#24559, GH#42288 As of 2.0 we treat datetime64 as
             # wall-time (consistent with DatetimeIndex)
-            warnings.warn(
-                "In a future version, when passing a np.datetime64 object and "
-                "a timezone to Timestamp, the datetime64 will be interpreted "
-                "as a wall time, not a UTC time. To interpret as a UTC time, "
-                "use `Timestamp(dt64).tz_localize('UTC').tz_convert(tz)`",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-            # Once this deprecation is enforced, we can do
-            # return Timestamp(ts_input).tz_localize(tzobj)
+            return cls(ts_input).tz_localize(tzobj)

         if nanosecond is None:
             nanosecond = 0
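
Review note on wall time versus UTC time: the removed warning contrasts the two ways to read a naive datetime64 when a timezone is supplied. The sketch below is a standard-library analogy only, not pandas code. The helper names and the fixed UTC-05:00 offset are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def localize_as_wall_time(naive, tz):
    """Attach tz without shifting the clock reading (the new interpretation)."""
    return naive.replace(tzinfo=tz)


def convert_from_utc(naive, tz):
    """Treat the value as UTC, then convert to tz (the old interpretation)."""
    return naive.replace(tzinfo=timezone.utc).astimezone(tz)


# Fixed offset chosen for illustration; no tz database is needed.
eastern = timezone(timedelta(hours=-5))
naive = datetime(2022, 1, 1, 12, 0)

wall = localize_as_wall_time(naive, eastern)
from_utc = convert_from_utc(naive, eastern)

print(wall.isoformat())      # clock reading stays at 12:00
print(from_utc.isoformat())  # clock reading shifts to 07:00
```

The two results are different instants, five hours apart in this example. That is why the removed warning pointed callers who want the UTC reading to `Timestamp(dt64).tz_localize('UTC').tz_convert(tz)`.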

pandas/compat/_optional.py

+1 -1
@@ -16,7 +16,7 @@
     "blosc": "1.21.0",
     "bottleneck": "1.3.2",
     "brotli": "0.7.0",
-    "fastparquet": "0.4.0",
+    "fastparquet": "0.6.3",
     "fsspec": "2021.07.0",
     "html5lib": "1.1",
     "hypothesis": "6.13.0",
