Commit 48dc7fd

Merge branch 'main' into string
2 parents (043c667 + 1e530b6), commit 48dc7fd

35 files changed: +268 −96 lines

.circleci/config.yml (-1)

@@ -34,7 +34,6 @@ jobs:
 fi
 python -m pip install --no-build-isolation -ve . -Csetup-args="--werror"
 PATH=$HOME/miniconda3/envs/pandas-dev/bin:$HOME/miniconda3/condabin:$PATH
-sudo apt-get update && sudo apt-get install -y libegl1 libopengl0
 ci/run_tests.sh
 test-linux-musl:
 docker:
.github/workflows/unit-tests.yml (+4 -2)

@@ -385,10 +385,12 @@ jobs:
 nogil: true

 - name: Build Environment
+  # TODO: Once numpy 2.2.1 is out, don't install nightly version
+  # Tests segfault with numpy 2.2.0: https://github.com/numpy/numpy/pull/27955
 run: |
 python --version
-python -m pip install --upgrade pip setuptools wheel numpy meson[ninja]==1.2.1 meson-python==0.13.1
-python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython
+python -m pip install --upgrade pip setuptools wheel meson[ninja]==1.2.1 meson-python==0.13.1
+python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython numpy
 python -m pip install versioneer[toml]
 python -m pip install python-dateutil pytz tzdata hypothesis>=6.84.0 pytest>=7.3.2 pytest-xdist>=3.4.0 pytest-cov
 python -m pip install -ve . --no-build-isolation --no-index --no-deps -Csetup-args="--werror"

ci/code_checks.sh (-4)

@@ -81,7 +81,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Timestamp.resolution PR02" \
 -i "pandas.Timestamp.tzinfo GL08" \
 -i "pandas.arrays.ArrowExtensionArray PR07,SA01" \
--i "pandas.arrays.IntervalArray.length SA01" \
 -i "pandas.arrays.NumpyExtensionArray SA01" \
 -i "pandas.arrays.TimedeltaArray PR07,SA01" \
 -i "pandas.core.groupby.DataFrameGroupBy.plot PR02" \

@@ -94,11 +93,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.core.resample.Resampler.std SA01" \
 -i "pandas.core.resample.Resampler.transform PR01,RT03,SA01" \
 -i "pandas.core.resample.Resampler.var SA01" \
--i "pandas.errors.UndefinedVariableError PR01,SA01" \
 -i "pandas.errors.ValueLabelTypeMismatch SA01" \
--i "pandas.io.json.build_table_schema PR07,RT03,SA01" \
 -i "pandas.plotting.andrews_curves RT03,SA01" \
--i "pandas.plotting.scatter_matrix PR07,SA01" \
 -i "pandas.tseries.offsets.BDay PR02,SA01" \
 -i "pandas.tseries.offsets.BQuarterBegin.is_on_offset GL08" \
 -i "pandas.tseries.offsets.BQuarterBegin.n GL08" \

doc/source/whatsnew/v3.0.0.rst (+2)

@@ -56,6 +56,7 @@ Other enhancements
 - :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
 - :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
+- :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
 - :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
 - :meth:`str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)

@@ -801,6 +802,7 @@ Other
 - Bug in ``Series.list`` methods not preserving the original :class:`Index`. (:issue:`58425`)
 - Bug in ``Series.list`` methods not preserving the original name. (:issue:`60522`)
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
+- Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)

 .. ***DO NOT USE THIS SECTION***

pandas/core/arrays/interval.py (+14)

@@ -1306,6 +1306,20 @@ def length(self) -> Index:
     """
     Return an Index with entries denoting the length of each Interval.

+    The length of an interval is calculated as the difference between
+    its `right` and `left` bounds. This property is particularly useful
+    when working with intervals where the size of the interval is an important
+    attribute, such as in time-series analysis or spatial data analysis.
+
+    See Also
+    --------
+    arrays.IntervalArray.left : Return the left endpoints of each Interval in
+        the IntervalArray as an Index.
+    arrays.IntervalArray.right : Return the right endpoints of each Interval in
+        the IntervalArray as an Index.
+    arrays.IntervalArray.mid : Return the midpoint of each Interval in the
+        IntervalArray as an Index.
+
     Examples
     --------
pandas/core/computation/expressions.py (+7 -1)

@@ -108,7 +108,7 @@ def _evaluate_numexpr(op, op_str, left_op, right_op):
     try:
         result = ne.evaluate(
             f"left_value {op_str} right_value",
-            local_dict={"left_value": left_value, "right_value": right_op},
+            local_dict={"left_value": left_value, "right_value": right_value},
             casting="safe",
         )
     except TypeError:

@@ -257,11 +257,17 @@ def where(cond, left_op, right_op, use_numexpr: bool = True):
         Whether to try to use numexpr.
     """
     assert _where is not None
+string
     return (
         _where(cond, left_op, right_op)
         if use_numexpr
         else _where_standard(cond, left_op, right_op)
     )
+    if use_numexpr:
+        return _where(cond, left_op, right_op)
+    else:
+        return _where_standard(cond, left_op, right_op)
+main


 def set_test_mode(v: bool = True) -> None:
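Both branches above select between a numexpr-backed `_where` and a plain NumPy fallback. A rough sketch of what the fallback path computes — the helper name `where_standard` here is illustrative, not pandas' internal symbol:

```python
import numpy as np

def where_standard(cond, left, right):
    # Plain NumPy fallback: take left where cond is True, right otherwise
    # (mirrors np.where semantics).
    return np.where(cond, left, right)

cond = np.array([True, False, True])
print(where_standard(cond, np.array([1, 2, 3]), np.array([10, 20, 30])))
# [ 1 20  3]
```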

pandas/core/dtypes/common.py (+1 -1)

@@ -430,7 +430,7 @@ def is_period_dtype(arr_or_dtype) -> bool:
     Check whether an array-like or dtype is of the Period dtype.

     .. deprecated:: 2.2.0
-        Use isinstance(dtype, pd.Period) instead.
+        Use isinstance(dtype, pd.PeriodDtype) instead.

     Parameters
     ----------
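The corrected deprecation note can be checked against the replacement it recommends:

```python
import pandas as pd

# The dtype of a period Series is a PeriodDtype instance, so the
# recommended isinstance check replaces the deprecated is_period_dtype.
ser = pd.Series(pd.period_range("2024-01", periods=3, freq="M"))
print(isinstance(ser.dtype, pd.PeriodDtype))            # True
print(isinstance(pd.Series([1, 2]).dtype, pd.PeriodDtype))  # False
```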

pandas/core/generic.py (+1 -1)

@@ -665,7 +665,7 @@ def size(self) -> int:

     See Also
     --------
-    ndarray.size : Number of elements in the array.
+    numpy.ndarray.size : Number of elements in the array.

     Examples
     --------

pandas/core/window/ewm.py (+2 -2)

@@ -490,7 +490,7 @@ def online(
     klass="Series/Dataframe",
     axis="",
 )
-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
     return super().aggregate(func, *args, **kwargs)

 agg = aggregate

@@ -981,7 +981,7 @@ def reset(self) -> None:
     """
     self._mean.reset()

-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
     raise NotImplementedError("aggregate is not implemented.")

 def std(self, bias: bool = False, *args, **kwargs):

pandas/core/window/expanding.py (+1 -1)

@@ -167,7 +167,7 @@ def _get_window_indexer(self) -> BaseIndexer:
     klass="Series/Dataframe",
     axis="",
 )
-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
     return super().aggregate(func, *args, **kwargs)

 agg = aggregate
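Making `func` default to ``None`` lets callers use keyword-only (named) aggregations; the existing positional form keeps working unchanged:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# Positional func continues to work: expanding sum over the series
out = s.expanding().agg("sum")
print(list(out))  # [1.0, 3.0, 6.0]
```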

pandas/core/window/rolling.py (+11 -4)

@@ -44,7 +44,10 @@

 from pandas.core._numba import executor
 from pandas.core.algorithms import factorize
-from pandas.core.apply import ResamplerWindowApply
+from pandas.core.apply import (
+    ResamplerWindowApply,
+    reconstruct_func,
+)
 from pandas.core.arrays import ExtensionArray
 from pandas.core.base import SelectionMixin
 import pandas.core.common as com

@@ -646,8 +649,12 @@ def _numba_apply(
     out = obj._constructor(result, index=index, columns=columns)
     return self._resolve_output(out, obj)

-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
+    relabeling, func, columns, order = reconstruct_func(func, **kwargs)
     result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
+    if isinstance(result, ABCDataFrame) and relabeling:
+        result = result.iloc[:, order]
+        result.columns = columns  # type: ignore[union-attr]
     if result is None:
         return self.apply(func, raw=False, args=args, kwargs=kwargs)
     return result

@@ -1239,7 +1246,7 @@ def calc(x):
     klass="Series/DataFrame",
     axis="",
 )
-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
     result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
     if result is None:
         # these must apply directly

@@ -1951,7 +1958,7 @@ def _raise_monotonic_error(self, msg: str):
     klass="Series/Dataframe",
     axis="",
 )
-def aggregate(self, func, *args, **kwargs):
+def aggregate(self, func=None, *args, **kwargs):
     return super().aggregate(func, *args, **kwargs)

 agg = aggregate
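`reconstruct_func` is the same helper groupby uses to unpack named aggregations passed as keywords. The pattern this change brings to window objects can be seen in the groupby form, which already supports it in released pandas:

```python
import pandas as pd

df = pd.DataFrame({"kind": ["a", "a", "b"], "height": [1.0, 3.0, 5.0]})

# Named aggregation: output column name on the left, (column, func) on the right.
# reconstruct_func turns these kwargs into (func, columns, order) internally.
out = df.groupby("kind").agg(max_height=pd.NamedAgg(column="height", aggfunc="max"))
print(out["max_height"].tolist())  # [3.0, 5.0]
```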

pandas/errors/__init__.py (+14)

@@ -588,6 +588,20 @@ class UndefinedVariableError(NameError):

     It will also specify whether the undefined variable is local or not.

+    Parameters
+    ----------
+    name : str
+        The name of the undefined variable.
+    is_local : bool or None, optional
+        Indicates whether the undefined variable is considered a local variable.
+        If ``True``, the error message specifies it as a local variable.
+        If ``False`` or ``None``, the variable is treated as a non-local name.
+
+    See Also
+    --------
+    DataFrame.query : Query the columns of a DataFrame with a boolean expression.
+    DataFrame.eval : Evaluate a string describing operations on DataFrame columns.
+
     Examples
     --------
     >>> df = pd.DataFrame({"A": [1, 1, 1]})
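The documented exception is easy to trigger through the `DataFrame.query` path it cross-references:

```python
import pandas as pd
from pandas.errors import UndefinedVariableError

df = pd.DataFrame({"A": [1, 1, 1]})
try:
    # 'undefined_name' is neither a column nor a local variable
    df.query("A > undefined_name")
except UndefinedVariableError as err:
    print(type(err).__name__)  # UndefinedVariableError
```

Because `UndefinedVariableError` subclasses `NameError`, existing handlers for `NameError` also catch it.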

pandas/io/formats/format.py (+5 -2)

@@ -78,7 +78,6 @@
 )
 from pandas.core.indexes.datetimes import DatetimeIndex
 from pandas.core.indexes.timedeltas import TimedeltaIndex
-from pandas.core.reshape.concat import concat

 from pandas.io.common import (
     check_parent_directory,

@@ -245,7 +244,11 @@ def _chk_truncate(self) -> None:
         series = series.iloc[:max_rows]
     else:
         row_num = max_rows // 2
-        series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
+        _len = len(series)
+        _slice = np.hstack(
+            [np.arange(row_num), np.arange(_len - row_num, _len)]
+        )
+        series = series.iloc[_slice]
     self.tr_row_num = row_num
 else:
     self.tr_row_num = None
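The replacement builds one positional indexer instead of concatenating two slices, which drops the `concat` import. The idea in isolation:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(10))
row_num = 2

# Positions of the first and last row_num rows, stitched into one indexer
positions = np.hstack([np.arange(row_num), np.arange(len(s) - row_num, len(s))])
truncated = s.iloc[positions]
print(truncated.index.tolist())  # [0, 1, 8, 9]
```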

pandas/io/json/_table_schema.py (+14 -1)

@@ -239,9 +239,16 @@ def build_table_schema(
     """
     Create a Table schema from ``data``.

+    This method is a utility to generate a JSON-serializable schema
+    representation of a pandas Series or DataFrame, compatible with the
+    Table Schema specification. It enables structured data to be shared
+    and validated in various applications, ensuring consistency and
+    interoperability.
+
     Parameters
     ----------
-    data : Series, DataFrame
+    data : Series or DataFrame
+        The input data for which the table schema is to be created.
     index : bool, default True
         Whether to include ``data.index`` in the schema.
     primary_key : bool or None, default True

@@ -256,6 +263,12 @@ def build_table_schema(
     Returns
     -------
     dict
+        A dictionary representing the Table schema.
+
+    See Also
+    --------
+    DataFrame.to_json : Convert the object to a JSON string.
+    read_json : Convert a JSON string to pandas object.

     Notes
     -----
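A small usage sketch of the function being documented:

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
schema = build_table_schema(df, index=False, version=False)

# The returned dict lists one field per column, each with a Table Schema type
print([f["name"] for f in schema["fields"]])  # ['A', 'B']
```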

pandas/io/sql.py (+6 -6)

@@ -241,7 +241,7 @@ def read_sql_table(  # pyright: ignore[reportOverlappingOverload]
     schema=...,
     index_col: str | list[str] | None = ...,
     coerce_float=...,
-    parse_dates: list[str] | dict[str, str] | None = ...,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = ...,
     columns: list[str] | None = ...,
     chunksize: None = ...,
     dtype_backend: DtypeBackend | lib.NoDefault = ...,

@@ -255,7 +255,7 @@ def read_sql_table(
     schema=...,
     index_col: str | list[str] | None = ...,
     coerce_float=...,
-    parse_dates: list[str] | dict[str, str] | None = ...,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = ...,
     columns: list[str] | None = ...,
     chunksize: int = ...,
     dtype_backend: DtypeBackend | lib.NoDefault = ...,

@@ -268,7 +268,7 @@ def read_sql_table(
     schema: str | None = None,
     index_col: str | list[str] | None = None,
     coerce_float: bool = True,
-    parse_dates: list[str] | dict[str, str] | None = None,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = None,
     columns: list[str] | None = None,
     chunksize: int | None = None,
     dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,

@@ -372,7 +372,7 @@ def read_sql_query(  # pyright: ignore[reportOverlappingOverload]
     index_col: str | list[str] | None = ...,
     coerce_float=...,
     params: list[Any] | Mapping[str, Any] | None = ...,
-    parse_dates: list[str] | dict[str, str] | None = ...,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = ...,
     chunksize: None = ...,
     dtype: DtypeArg | None = ...,
     dtype_backend: DtypeBackend | lib.NoDefault = ...,

@@ -386,7 +386,7 @@ def read_sql_query(
     index_col: str | list[str] | None = ...,
     coerce_float=...,
     params: list[Any] | Mapping[str, Any] | None = ...,
-    parse_dates: list[str] | dict[str, str] | None = ...,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = ...,
     chunksize: int = ...,
     dtype: DtypeArg | None = ...,
     dtype_backend: DtypeBackend | lib.NoDefault = ...,

@@ -399,7 +399,7 @@ def read_sql_query(
     index_col: str | list[str] | None = None,
     coerce_float: bool = True,
     params: list[Any] | Mapping[str, Any] | None = None,
-    parse_dates: list[str] | dict[str, str] | None = None,
+    parse_dates: list[str] | dict[str, str] | dict[str, dict[str, Any]] | None = None,
     chunksize: int | None = None,
     dtype: DtypeArg | None = None,
     dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,
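The widened annotation covers the nested-dict form that `read_sql_query` already accepts at runtime, where the inner dict is forwarded to `pd.to_datetime` as keyword arguments. A minimal sketch with an in-memory SQLite database:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (happened TEXT)")
con.execute("INSERT INTO events VALUES ('2024-01-15')")

# dict-of-dicts parse_dates: keys are columns, inner dicts are to_datetime kwargs
df = pd.read_sql_query(
    "SELECT * FROM events",
    con,
    parse_dates={"happened": {"format": "%Y-%m-%d"}},
)
print(df["happened"].dtype)  # datetime64[ns]
```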

pandas/plotting/_misc.py (+15)

@@ -178,14 +178,21 @@ def scatter_matrix(
     """
     Draw a matrix of scatter plots.

+    Each pair of numeric columns in the DataFrame is plotted against each other,
+    resulting in a matrix of scatter plots. The diagonal plots can display either
+    histograms or Kernel Density Estimation (KDE) plots for each variable.
+
     Parameters
     ----------
     frame : DataFrame
+        The data to be plotted.
     alpha : float, optional
         Amount of transparency applied.
     figsize : (float,float), optional
         A tuple (width, height) in inches.
     ax : Matplotlib axis object, optional
+        An existing Matplotlib axis object for the plots. If None, a new axis is
+        created.
     grid : bool, optional
         Setting this to True will show the grid.
     diagonal : {'hist', 'kde'}

@@ -208,6 +215,14 @@ def scatter_matrix(
     numpy.ndarray
         A matrix of scatter plots.

+    See Also
+    --------
+    plotting.parallel_coordinates : Plots parallel coordinates for multivariate data.
+    plotting.andrews_curves : Generates Andrews curves for visualizing clusters of
+        multivariate data.
+    plotting.radviz : Creates a RadViz visualization.
+    plotting.bootstrap_plot : Visualizes uncertainty in data via bootstrap sampling.
+
     Examples
     --------

pandas/tests/extension/test_arrow.py (+1 -1)

@@ -1647,7 +1647,7 @@ def test_from_arrow_respecting_given_dtype():

 def test_from_arrow_respecting_given_dtype_unsafe():
     array = pa.array([1.5, 2.5], type=pa.float64())
-    with pytest.raises(pa.ArrowInvalid, match="Float value 1.5 was truncated"):
+    with tm.external_error_raised(pa.ArrowInvalid):
         array.to_pandas(types_mapper={pa.float64(): ArrowDtype(pa.int64())}.get)
