
Commit dee0595

Merge branch 'main' into nightlies
2 parents e1a0621 + 10ef2ef commit dee0595


93 files changed, +1276 -422 lines changed


.gitignore

+2
@@ -38,6 +38,8 @@
3838
.build_cache_dir
3939
.mesonpy-native-file.ini
4040
MANIFEST
41+
compile_commands.json
42+
debug
4143

4244
# Python files #
4345
################

ci/code_checks.sh

-36
@@ -83,10 +83,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
8383
pandas.Series.backfill \
8484
pandas.Series.ffill \
8585
pandas.Series.pad \
86-
pandas.Series.dt.days \
87-
pandas.Series.dt.seconds \
88-
pandas.Series.dt.microseconds \
89-
pandas.Series.dt.nanoseconds \
9086
pandas.Series.str.center \
9187
pandas.Series.str.decode \
9288
pandas.Series.str.encode \
@@ -121,7 +117,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
121117
pandas.errors.DataError \
122118
pandas.errors.IncompatibilityWarning \
123119
pandas.errors.InvalidComparison \
124-
pandas.errors.InvalidVersion \
125120
pandas.errors.IntCastingNaNError \
126121
pandas.errors.LossySetitemError \
127122
pandas.errors.MergeError \
@@ -137,7 +132,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
137132
pandas.errors.PyperclipWindowsException \
138133
pandas.errors.UnsortedIndexError \
139134
pandas.errors.UnsupportedFunctionCall \
140-
pandas.show_versions \
141135
pandas.test \
142136
pandas.NaT \
143137
pandas.Timestamp.as_unit \
@@ -170,19 +164,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
170164
pandas.Period.asfreq \
171165
pandas.Period.now \
172166
pandas.arrays.PeriodArray \
173-
pandas.arrays.IntervalArray.from_arrays \
174-
pandas.arrays.IntervalArray.to_tuples \
175-
pandas.Int8Dtype \
176-
pandas.Int16Dtype \
177-
pandas.Int32Dtype \
178-
pandas.Int64Dtype \
179-
pandas.UInt8Dtype \
180-
pandas.UInt16Dtype \
181-
pandas.UInt32Dtype \
182-
pandas.UInt64Dtype \
183-
pandas.NA \
184-
pandas.Float32Dtype \
185-
pandas.Float64Dtype \
186167
pandas.CategoricalDtype.categories \
187168
pandas.CategoricalDtype.ordered \
188169
pandas.Categorical.dtype \
@@ -258,23 +239,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
258239
pandas.util.hash_pandas_object \
259240
pandas_object \
260241
pandas.api.interchange.from_dataframe \
261-
pandas.Index.T \
262-
pandas.Index.memory_usage \
263-
pandas.Index.copy \
264-
pandas.Index.drop \
265-
pandas.Index.identical \
266-
pandas.Index.insert \
267-
pandas.Index.is_ \
268-
pandas.Index.take \
269-
pandas.Index.putmask \
270-
pandas.Index.unique \
271-
pandas.Index.fillna \
272-
pandas.Index.dropna \
273-
pandas.Index.astype \
274-
pandas.Index.map \
275-
pandas.Index.to_list \
276-
pandas.Index.append \
277-
pandas.Index.join \
278242
pandas.Index.asof_locs \
279243
pandas.Index.get_slice_bound \
280244
pandas.RangeIndex \

doc/source/development/debugging_extensions.rst

+18
@@ -13,3 +13,21 @@ For Python developers with limited or no C/C++ experience this can seem a daunti
1313
1. `Fundamental Python Debugging Part 1 - Python <https://willayd.com/fundamental-python-debugging-part-1-python.html>`_
1414
2. `Fundamental Python Debugging Part 2 - Python Extensions <https://willayd.com/fundamental-python-debugging-part-2-python-extensions.html>`_
1515
3. `Fundamental Python Debugging Part 3 - Cython Extensions <https://willayd.com/fundamental-python-debugging-part-3-cython-extensions.html>`_
16+
17+
Generating debug builds
18+
-----------------------
19+
20+
By default, building pandas from source generates a release build. To generate a debug build, you can type::
21+
22+
pip install -ve . --no-build-isolation --config-settings=builddir="debug" --config-settings=setup-args="-Dbuildtype=debug"
23+
24+
By specifying ``builddir="debug"``, all of the targets will be built and placed in the ``debug`` directory relative to the project root. This keeps your debug and release artifacts separate; you can of course choose a different directory name, or omit the option altogether if you do not need to separate build types.
25+
26+
Editor support
27+
--------------
28+
29+
The meson build system generates a `compilation database <https://clang.llvm.org/docs/JSONCompilationDatabase.html>`_ automatically and places it in the build directory. Many language servers and IDEs can use this information to provide code completion, go-to-definition and error checking as you type.
30+
31+
Where each language server or IDE looks for the compilation database may vary. When in doubt, you may want to create a symlink at the root of the project that points to the compilation database in your build directory. Assuming you used *debug* as your directory name, you can run::
32+
33+
ln -s debug/compile_commands.json .
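For contributors who prefer to script this step, the symlink can also be created from Python. This is a minimal sketch, not part of pandas: the helper name ``link_compilation_database`` and its ``builddir`` default are hypothetical, and it assumes the build directory already contains ``compile_commands.json``. On Windows, creating symlinks may require developer mode or elevated privileges.

```python
from pathlib import Path


def link_compilation_database(project_root, builddir="debug"):
    """Hypothetical helper: symlink <builddir>/compile_commands.json into the project root.

    Equivalent to running ``ln -s debug/compile_commands.json .`` from the root.
    """
    root = Path(project_root)
    target = root / builddir / "compile_commands.json"
    if not target.is_file():
        raise FileNotFoundError(f"no compilation database at {target}")

    link = root / "compile_commands.json"
    # Replace a stale link (possibly dangling) left over from another build directory
    if link.is_symlink() or link.exists():
        link.unlink()

    # Relative target, like the ln -s command, so the link keeps working if the checkout moves
    link.symlink_to(Path(builddir) / "compile_commands.json")
    return link
```

Using a relative target keeps the link valid if the whole checkout is moved; pointing it at a different ``builddir`` simply replaces the previous link.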

doc/source/whatsnew/v0.15.1.rst

+1
@@ -102,6 +102,7 @@ API changes
102102
current behavior:
103103

104104
.. ipython:: python
105+
:okwarning:
105106
106107
gr.apply(sum)
107108

doc/source/whatsnew/v2.0.0.rst

+1-1
@@ -945,7 +945,7 @@ Removal of prior version deprecations/changes
945945
- Disallow passing non-keyword arguments to :meth:`DataFrame.where` and :meth:`Series.where` except for ``cond`` and ``other`` (:issue:`41523`)
946946
- Disallow passing non-keyword arguments to :meth:`Series.set_axis` and :meth:`DataFrame.set_axis` except for ``labels`` (:issue:`41491`)
947947
- Disallow passing non-keyword arguments to :meth:`Series.rename_axis` and :meth:`DataFrame.rename_axis` except for ``mapper`` (:issue:`47587`)
948-
- Disallow passing non-keyword arguments to :meth:`Series.clip` and :meth:`DataFrame.clip` (:issue:`41511`)
948+
- Disallow passing non-keyword arguments to :meth:`Series.clip` and :meth:`DataFrame.clip` except ``lower`` and ``upper`` (:issue:`41511`)
949949
- Disallow passing non-keyword arguments to :meth:`Series.bfill`, :meth:`Series.ffill`, :meth:`DataFrame.bfill` and :meth:`DataFrame.ffill` (:issue:`41508`)
950950
- Disallow passing non-keyword arguments to :meth:`DataFrame.replace`, :meth:`Series.replace` except for ``to_replace`` and ``value`` (:issue:`47587`)
951951
- Disallow passing non-keyword arguments to :meth:`DataFrame.sort_values` except for ``by`` (:issue:`41505`)

doc/source/whatsnew/v2.0.2.rst

+3-1
@@ -1,6 +1,6 @@
11
.. _whatsnew_202:
22

3-
What's new in 2.0.2 (May ..., 2023)
3+
What's new in 2.0.2 (May 26, 2023)
44
-----------------------------------
55

66
These are the changes in pandas 2.0.2. See :ref:`release` for a full changelog
@@ -25,6 +25,7 @@ Fixed regressions
2525

2626
Bug fixes
2727
~~~~~~~~~
28+
- Bug in :class:`.arrays.ArrowExtensionArray` incorrectly assigning ``dict`` instead of ``list`` for ``.type`` with ``pyarrow.map_`` and raising a ``NotImplementedError`` with ``pyarrow.struct`` (:issue:`53328`)
2829
- Bug in :func:`api.interchange.from_dataframe` was raising ``IndexError`` on empty categorical data (:issue:`53077`)
2930
- Bug in :func:`api.interchange.from_dataframe` was returning :class:`DataFrame`'s of incorrect sizes when called on slices (:issue:`52824`)
3031
- Bug in :func:`api.interchange.from_dataframe` was unnecessarily raising on bitmasks (:issue:`49888`)
@@ -34,6 +35,7 @@ Bug fixes
3435
- Bug in :func:`to_timedelta` was raising ``ValueError`` with ``pandas.NA`` (:issue:`52909`)
3536
- Bug in :meth:`DataFrame.__getitem__` not preserving dtypes for :class:`MultiIndex` partial keys (:issue:`51895`)
3637
- Bug in :meth:`DataFrame.convert_dtypes` ignoring ``convert_*`` keywords when set to False with ``dtype_backend="pyarrow"`` (:issue:`52872`)
38+
- Bug in :meth:`DataFrame.convert_dtypes` losing timezone for tz-aware dtypes and ``dtype_backend="pyarrow"`` (:issue:`53382`)
3739
- Bug in :meth:`DataFrame.sort_values` raising for PyArrow ``dictionary`` dtype (:issue:`53232`)
3840
- Bug in :meth:`Series.describe` treating pyarrow-backed timestamps and timedeltas as categorical data (:issue:`53001`)
3941
- Bug in :meth:`Series.rename` not making a lazy copy when Copy-on-Write is enabled when a scalar is passed to it (:issue:`52450`)

doc/source/whatsnew/v2.1.0.rst

+6
@@ -251,6 +251,7 @@ Deprecations
251251
- Deprecated :meth:`DataFrame.applymap`. Use the new :meth:`DataFrame.map` method instead (:issue:`52353`)
252252
- Deprecated :meth:`DataFrame.swapaxes` and :meth:`Series.swapaxes`, use :meth:`DataFrame.transpose` or :meth:`Series.transpose` instead (:issue:`51946`)
253253
- Deprecated ``freq`` parameter in :class:`PeriodArray` constructor, pass ``dtype`` instead (:issue:`52462`)
254+
- Deprecated behavior of :class:`DataFrame` reductions ``sum``, ``prod``, ``std``, ``var``, ``sem`` with ``axis=None``, in a future version this will operate over both axes returning a scalar instead of behaving like ``axis=0``; note this also affects numpy functions e.g. ``np.sum(df)`` (:issue:`21597`)
254255
- Deprecated behavior of :func:`concat` when :class:`DataFrame` has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (:issue:`40893`)
255256
- Deprecated behavior of :meth:`Series.dt.to_pydatetime`, in a future version this will return a :class:`Series` containing python ``datetime`` objects instead of an ``ndarray`` of datetimes; this matches the behavior of other :meth:`Series.dt` properties (:issue:`20306`)
256257
- Deprecated logical operations (``|``, ``&``, ``^``) between pandas objects and dtype-less sequences (e.g. ``list``, ``tuple``), wrap a sequence in a :class:`Series` or numpy array before operating instead (:issue:`51521`)
@@ -265,8 +266,10 @@ Deprecations
265266
- Deprecated logical operation between two non-boolean :class:`Series` with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs. (:issue:`52500`, :issue:`52538`)
266267
- Deprecated allowing ``downcast`` keyword other than ``None``, ``False``, "infer", or a dict with these as values in :meth:`Series.fillna`, :meth:`DataFrame.fillna` (:issue:`40988`)
267268
- Deprecated allowing arbitrary ``fill_value`` in :class:`SparseDtype`, in a future version the ``fill_value`` will need to be compatible with the ``dtype.subtype``, either a scalar that can be held by that subtype or ``NaN`` for integer or bool subtypes (:issue:`23124`)
269+
- Deprecated behavior of :func:`assert_series_equal` and :func:`assert_frame_equal` considering NA-like values (e.g. ``NaN`` vs ``None``) as equivalent (:issue:`52081`)
268270
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
269271
- Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
272+
-
270273
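The ``axis=None`` deprecation above means ``df.sum(axis=None)`` will change meaning in a future version. A minimal sketch of spelling out the intended result explicitly, so the code does not depend on which behavior a given pandas version applies (the frame contents are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# One total per column: what axis=None currently behaves like (axis=0)
per_column = df.sum(axis=0)

# A single scalar over both axes: what axis=None is slated to return
grand_total = df.to_numpy().sum()

# Equivalent scalar without leaving pandas
grand_total_pandas = df.sum().sum()
```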

271274
.. ---------------------------------------------------------------------------
272275
.. _whatsnew_210.performance:
@@ -301,6 +304,7 @@ Performance improvements
301304
- Performance improvement in :meth:`~arrays.ArrowExtensionArray.astype` when converting from a pyarrow timestamp or duration dtype to numpy (:issue:`53326`)
302305
- Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`52525`)
303306
- Performance improvement when doing various reshaping operations on :class:`arrays.IntegerArray` & :class:`arrays.FloatingArray` by avoiding doing unnecessary validation (:issue:`53013`)
307+
- Performance improvement when indexing with pyarrow timestamp and duration dtypes (:issue:`53368`)
304308

305309
.. ---------------------------------------------------------------------------
306310
.. _whatsnew_210.bug_fixes:
@@ -422,6 +426,7 @@ Groupby/resample/rolling
422426
grouped :class:`Series` or :class:`DataFrame` was a :class:`DatetimeIndex`, :class:`TimedeltaIndex`
423427
or :class:`PeriodIndex`, and the ``groupby`` method was given a function as its first argument,
424428
the function operated on the whole index rather than each element of the index. (:issue:`51979`)
429+
- Bug in :meth:`DataFrameGroupBy.agg` with lists not respecting ``as_index=False`` (:issue:`52849`)
425430
- Bug in :meth:`DataFrameGroupBy.apply` causing an error to be raised when the input :class:`DataFrame` was subset as a :class:`DataFrame` after groupby (``[['a']]`` and not ``['a']``) and the given callable returned :class:`Series` that were not all indexed the same. (:issue:`52444`)
426431
- Bug in :meth:`DataFrameGroupBy.apply` raising a ``TypeError`` when selecting multiple columns and providing a function that returns ``np.ndarray`` results (:issue:`18930`)
427432
- Bug in :meth:`GroupBy.groups` with a datetime key in conjunction with another key produced incorrect number of group keys (:issue:`51158`)
@@ -432,6 +437,7 @@ Reshaping
432437
^^^^^^^^^
433438
- Bug in :func:`crosstab` when ``dropna=False`` would not keep ``np.nan`` in the result (:issue:`10772`)
434439
- Bug in :meth:`DataFrame.agg` and :meth:`Series.agg` on non-unique columns would return incorrect type when dict-like argument passed in (:issue:`51099`)
440+
- Bug in :meth:`DataFrame.idxmin` and :meth:`DataFrame.idxmax`, where the axis dtype would be lost for empty frames (:issue:`53265`)
435441
- Bug in :meth:`DataFrame.merge` not merging correctly when having ``MultiIndex`` with single level (:issue:`52331`)
436442
- Bug in :meth:`DataFrame.stack` losing extension dtypes when columns is a :class:`MultiIndex` and frame contains mixed dtypes (:issue:`45740`)
437443
- Bug in :meth:`DataFrame.transpose` inferring dtype for object column (:issue:`51546`)

pandas/_libs/groupby.pyx

+7
@@ -1075,6 +1075,13 @@ def group_mean(
10751075
y = val - compensation[lab, j]
10761076
t = sumx[lab, j] + y
10771077
compensation[lab, j] = t - sumx[lab, j] - y
1078+
if compensation[lab, j] != compensation[lab, j]:
1079+
# GH#50367
1080+
# If val is +/- infinity, compensation is NaN
1081+
# which would lead to results being NaN instead
1082+
# of +/-infinity. We cannot use util.is_nan
1083+
# here because this code runs without the GIL
1084+
compensation[lab, j] = 0.
10781085
sumx[lab, j] = t
10791086

10801087
for i in range(ncounts):
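The ``group_mean`` change guards its compensated (Kahan) summation against infinities: once ``val`` is ``+/-inf``, ``t - sumx - y`` evaluates ``inf - inf``, so the compensation term becomes NaN and would poison every later sum. A minimal pure-Python sketch of that idea for a single group, assuming a plain list of floats (``kahan_mean`` and its ``reset_nan_compensation`` flag are hypothetical names used only to compare both behaviors):

```python
import math


def kahan_mean(values, reset_nan_compensation=True):
    """Mean via Kahan summation, optionally resetting a NaN compensation term."""
    if not values:
        return math.nan
    total = 0.0
    compensation = 0.0
    for val in values:
        y = val - compensation
        t = total + y
        compensation = t - total - y
        # NaN is the only float not equal to itself; this mirrors the
        # GIL-free check in the Cython code instead of calling math.isnan
        if reset_nan_compensation and compensation != compensation:
            compensation = 0.0
        total = t
    return total / len(values)
```

With the reset, ``kahan_mean([math.inf, 1.0])`` stays infinite; without it, the second step computes ``1.0 - nan`` and the mean degrades to NaN, which is the symptom the GH#50367 fix addresses.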

pandas/_libs/missing.pyx

+20
@@ -377,6 +377,26 @@ class NAType(C_NAType):
377377
378378
The NA singleton is a missing value indicator defined by pandas. It is
379379
used in certain new extension dtypes (currently the "string" dtype).
380+
381+
Examples
382+
--------
383+
>>> pd.NA
384+
<NA>
385+
386+
>>> True | pd.NA
387+
True
388+
389+
>>> True & pd.NA
390+
<NA>
391+
392+
>>> pd.NA != pd.NA
393+
<NA>
394+
395+
>>> pd.NA == pd.NA
396+
<NA>
380400
"""
381401

382402
_instance = None
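The docstring examples follow three-valued (Kleene) logic: ``|`` and ``&`` return a definite result only when it does not depend on the missing value, and comparisons propagate ``NA``. A short sketch that tabulates every combination; ``kleene_table`` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def kleene_table():
    """Evaluate | and & between each Python bool and pd.NA."""
    rows = {}
    for left in (True, False):
        rows[(left, "|")] = left | pd.NA
        rows[(left, "&")] = left & pd.NA
    return rows


for (left, op), result in kleene_table().items():
    print(f"{left} {op} pd.NA -> {result}")

# pd.NA refuses to be coerced to a Python bool, so `if pd.NA:` raises
try:
    bool(pd.NA)
except TypeError as exc:
    print("bool(pd.NA) raises TypeError:", exc)
```

``True | pd.NA`` is ``True`` and ``False & pd.NA`` is ``False`` because the unknown operand cannot change the outcome; the other two combinations stay ``<NA>``.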

pandas/_libs/testing.pyx

+23-7
@@ -1,19 +1,25 @@
11
import cmath
22
import math
3+
import warnings
34

45
import numpy as np
56

67
from numpy cimport import_array
78

89
import_array()
910

10-
from pandas._libs.missing cimport checknull
11+
from pandas._libs.missing cimport (
12+
checknull,
13+
is_matching_na,
14+
)
1115
from pandas._libs.util cimport (
1216
is_array,
1317
is_complex_object,
1418
is_real_number_object,
1519
)
1620

21+
from pandas.util._exceptions import find_stack_level
22+
1723
from pandas.core.dtypes.missing import array_equivalent
1824

1925

@@ -176,13 +182,23 @@ cpdef assert_almost_equal(a, b,
176182
# classes can't be the same, to raise error
177183
assert_class_equal(a, b, obj=obj)
178184

179-
if checknull(a) and checknull(b):
180-
# TODO: Should require same-dtype NA?
185+
if checknull(a):
181186
# nan / None comparison
182-
return True
183-
184-
if (checknull(a) and not checknull(b)) or (not checknull(a) and checknull(b)):
185-
# boolean value of pd.NA is ambiguous
187+
if is_matching_na(a, b, nan_matches_none=False):
188+
return True
189+
elif checknull(b):
190+
# GH#18463
191+
warnings.warn(
192+
f"Mismatched null-like values {a} and {b} found. In a future "
193+
"version, pandas equality-testing functions "
194+
"(e.g. assert_frame_equal) will consider these not-matching "
195+
"and raise.",
196+
FutureWarning,
197+
stacklevel=find_stack_level(),
198+
)
199+
return True
200+
raise AssertionError(f"{a} != {b}")
201+
elif checknull(b):
186202
raise AssertionError(f"{a} != {b}")
187203

188204
if a == b:
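The new null branch of ``assert_almost_equal`` distinguishes three cases: matching nulls pass, mismatched null kinds (e.g. ``NaN`` vs ``None``) pass with a ``FutureWarning``, and a null against a non-null raises. A simplified pure-Python sketch of that decision logic, assuming only ``None`` and float ``NaN`` as nulls; the real ``checknull`` and ``is_matching_na`` also handle ``pd.NA``, ``NaT`` and other NA-likes, and ``compare_nulls`` is a hypothetical name:

```python
import math
import warnings


def _is_null(x):
    # Simplified stand-in for checknull: only None and float NaN
    return x is None or (isinstance(x, float) and math.isnan(x))


def _is_matching_null(a, b):
    # Simplified stand-in for is_matching_na(a, b, nan_matches_none=False);
    # assumes `a` is already known to be null, matches only the same kind
    if a is None:
        return b is None
    return isinstance(b, float) and math.isnan(b)


def compare_nulls(a, b):
    """Return True for equal nulls, None if neither is null, else raise."""
    if _is_null(a):
        if _is_matching_null(a, b):
            return True
        if _is_null(b):
            # Mismatched null kinds are still accepted for now, with a warning
            warnings.warn(
                f"Mismatched null-like values {a} and {b} found.",
                FutureWarning,
                stacklevel=2,
            )
            return True
        raise AssertionError(f"{a} != {b}")
    if _is_null(b):
        raise AssertionError(f"{a} != {b}")
    return None  # neither is null: the caller goes on to compare values
```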

pandas/_libs/tslibs/offsets.pyx

+52
@@ -2506,6 +2506,19 @@ cdef class BQuarterEnd(QuarterOffset):
25062506
startingMonth = 2 corresponds to dates like 2/28/2007, 5/31/2007, ...
25072507
startingMonth = 3 corresponds to dates like 3/30/2007, 6/29/2007, ...
25082508
2509+
Parameters
2510+
----------
2511+
n : int, default 1
2512+
The number of quarters represented.
2513+
normalize : bool, default False
2514+
Normalize start/end dates to midnight before generating date range.
2515+
startingMonth : int, default 3
2516+
A specific integer for the month of the year from which we start quarters.
2517+
2518+
See Also
2519+
--------
2520+
:class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.
2521+
25092522
Examples
25102523
--------
25112524
>>> from pandas.tseries.offsets import BQuarterEnd
@@ -2534,6 +2547,19 @@ cdef class BQuarterBegin(QuarterOffset):
25342547
startingMonth = 2 corresponds to dates like 2/01/2007, 5/01/2007, ...
25352548
startingMonth = 3 corresponds to dates like 3/01/2007, 6/01/2007, ...
25362549
2550+
Parameters
2551+
----------
2552+
n : int, default 1
2553+
The number of quarters represented.
2554+
normalize : bool, default False
2555+
Normalize start/end dates to midnight before generating date range.
2556+
startingMonth : int, default 3
2557+
A specific integer for the month of the year from which we start quarters.
2558+
2559+
See Also
2560+
--------
2561+
:class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.
2562+
25372563
Examples
25382564
--------
25392565
>>> from pandas.tseries.offsets import BQuarterBegin
@@ -2562,6 +2588,19 @@ cdef class QuarterEnd(QuarterOffset):
25622588
startingMonth = 2 corresponds to dates like 2/28/2007, 5/31/2007, ...
25632589
startingMonth = 3 corresponds to dates like 3/31/2007, 6/30/2007, ...
25642590
2591+
Parameters
2592+
----------
2593+
n : int, default 1
2594+
The number of quarters represented.
2595+
normalize : bool, default False
2596+
Normalize start/end dates to midnight before generating date range.
2597+
startingMonth : int, default 3
2598+
A specific integer for the month of the year from which we start quarters.
2599+
2600+
See Also
2601+
--------
2602+
:class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.
2603+
25652604
Examples
25662605
--------
25672606
>>> ts = pd.Timestamp(2022, 1, 1)
@@ -2590,6 +2629,19 @@ cdef class QuarterBegin(QuarterOffset):
25902629
startingMonth = 2 corresponds to dates like 2/01/2007, 5/01/2007, ...
25912630
startingMonth = 3 corresponds to dates like 3/01/2007, 6/01/2007, ...
25922631
2632+
Parameters
2633+
----------
2634+
n : int, default 1
2635+
The number of quarters represented.
2636+
normalize : bool, default False
2637+
Normalize start/end dates to midnight before generating date range.
2638+
startingMonth : int, default 3
2639+
A specific integer for the month of the year from which we start quarters.
2640+
2641+
See Also
2642+
--------
2643+
:class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.
2644+
25932645
Examples
25942646
--------
25952647
>>> ts = pd.Timestamp(2022, 1, 1)
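The newly documented ``n``, ``normalize`` and ``startingMonth`` parameters can be illustrated side by side. A short sketch, assuming a pandas version that provides these offset classes (all of the ones used here exist in ``pandas.tseries.offsets``); the timestamps are illustrative:

```python
import pandas as pd
from pandas.tseries.offsets import BQuarterBegin, BQuarterEnd, QuarterBegin, QuarterEnd

ts = pd.Timestamp(2022, 1, 1)

default_end = ts + QuarterEnd()                   # quarters end in Mar/Jun/Sep/Dec
feb_cycle_end = ts + QuarterEnd(startingMonth=2)  # quarters end in Feb/May/Aug/Nov
two_quarters = ts + QuarterEnd(n=2)               # roll forward across two quarter ends

# normalize=True drops the time of day from the result
normalized = pd.Timestamp(2022, 1, 1, 15, 30) + QuarterEnd(normalize=True)

business_end = ts + BQuarterEnd()      # last business day of the quarter
quarter_start = ts + QuarterBegin()    # default startingMonth=3
business_start = ts + BQuarterBegin()  # first business day of the quarter
```

For 2022 the business and calendar variants coincide here because 31 March and 1 March both fall on weekdays; they diverge when a quarter boundary lands on a weekend.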

pandas/_libs/tslibs/timedeltas.pyx

+1-1
@@ -1592,7 +1592,7 @@ cdef class _Timedelta(timedelta):
15921592

15931593
def as_unit(self, str unit, bint round_ok=True):
15941594
"""
1595-
Convert the underlying int64 representaton to the given unit.
1595+
Convert the underlying int64 representation to the given unit.
15961596
15971597
Parameters
15981598
----------

pandas/core/algorithms.py

+5-2
@@ -32,7 +32,10 @@
3232
from pandas.util._decorators import doc
3333
from pandas.util._exceptions import find_stack_level
3434

35-
from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike
35+
from pandas.core.dtypes.cast import (
36+
construct_1d_object_array_from_listlike,
37+
np_find_common_type,
38+
)
3639
from pandas.core.dtypes.common import (
3740
ensure_float64,
3841
ensure_object,
@@ -518,7 +521,7 @@ def f(c, v):
518521
f = np.in1d
519522

520523
else:
521-
common = np.find_common_type([values.dtype, comps_array.dtype], [])
524+
common = np_find_common_type(values.dtype, comps_array.dtype)
522525
values = values.astype(common, copy=False)
523526
comps_array = comps_array.astype(common, copy=False)
524527
f = htable.ismember
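The ``algorithms.py`` change swaps the direct ``np.find_common_type`` call, which NumPy 1.25 deprecated, for pandas' own ``np_find_common_type`` wrapper. NumPy's deprecation note points to ``np.result_type`` / ``np.promote_types`` as replacements when the scalar-types argument is ``[]``. A minimal sketch of the same "promote both sides, then test membership" step, assuming ``np.result_type`` as the promotion rule; ``common_dtype`` and ``isin_sketch`` are hypothetical names, and ``np.isin`` stands in for the internal ``htable.ismember``:

```python
import numpy as np


def common_dtype(*dtypes):
    """Hypothetical helper: promote dtypes without the deprecated np.find_common_type."""
    return np.result_type(*dtypes)


def isin_sketch(comps, values):
    """Cast both arrays to a shared dtype, then check membership element-wise."""
    common = common_dtype(values.dtype, comps.dtype)
    values = values.astype(common, copy=False)
    comps = comps.astype(common, copy=False)
    return np.isin(comps, values)
```

Casting to the promoted dtype first means an integer lookup table can be compared against float input (``int64`` and ``float64`` promote to ``float64``) without silently truncating values.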
