Commit c942b70

Merge branch 'main' into pandas-devgh-51117-positional-fold-bug
2 parents 635e0cc + 8c7b8a4 commit c942b70

118 files changed

+1003
-898
lines changed

.pre-commit-config.yaml

+3-11
Original file line number | Diff line number | Diff line change
@@ -28,7 +28,7 @@ repos:
2828
types_or: [python, pyi]
2929
additional_dependencies: [black==23.1.0]
3030
- repo: https://github.com/charliermarsh/ruff-pre-commit
31-
rev: v0.0.255
31+
rev: v0.0.259
3232
hooks:
3333
- id: ruff
3434
args: [--exit-non-zero-on-fix]
@@ -392,14 +392,6 @@ repos:
392392
files: ^pandas/
393393
exclude: ^(pandas/_libs/|pandas/tests/|pandas/errors/__init__.py$|pandas/_version.py)
394394
types: [python]
395-
- id: flake8-pyi
396-
name: flake8-pyi
397-
entry: flake8 --extend-ignore=E301,E302,E305,E701,E704
398-
types: [pyi]
399-
language: python
400-
additional_dependencies:
401-
- flake8==5.0.4
402-
- flake8-pyi==22.8.1
403395
- id: future-annotations
404396
name: import annotations from __future__
405397
entry: 'from __future__ import annotations'
@@ -421,8 +413,8 @@ repos:
421413
language: python
422414
stages: [manual]
423415
additional_dependencies:
424-
- autotyping==22.9.0
425-
- libcst==0.4.7
416+
- autotyping==23.3.0
417+
- libcst==0.4.9
426418
- id: check-test-naming
427419
name: check that test names start with 'test'
428420
entry: python -m scripts.check_test_naming

asv_bench/benchmarks/arithmetic.py

+4
@@ -266,10 +266,14 @@ def setup(self, tz):
266266
self.ts = self.s[halfway]
267267

268268
self.s2 = Series(date_range("20010101", periods=N, freq="s", tz=tz))
269+
self.ts_different_reso = Timestamp("2001-01-02", tz=tz)
269270

270271
def time_series_timestamp_compare(self, tz):
271272
self.s <= self.ts
272273

274+
def time_series_timestamp_different_reso_compare(self, tz):
275+
self.s <= self.ts_different_reso
276+
273277
def time_timestamp_series_compare(self, tz):
274278
self.ts >= self.s
275279

doc/source/getting_started/index.rst

+1-1
@@ -533,7 +533,7 @@ Data sets do not only contain numerical data. pandas provides a wide range of fu
533533
Coming from...
534534
--------------
535535

536-
Are you familiar with other software for manipulating tablular data? Learn
536+
Are you familiar with other software for manipulating tabular data? Learn
537537
the pandas-equivalent operations compared to software you already know:
538538

539539
.. panels::

doc/source/user_guide/advanced.rst

+1-1
@@ -322,7 +322,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.
322322
.. warning::
323323

324324
You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
325-
for the **columns**. There are some ambiguous cases where the passed indexer could be mis-interpreted
325+
for the **columns**. There are some ambiguous cases where the passed indexer could be misinterpreted
326326
  as indexing *both* axes, rather than into say the ``MultiIndex`` for the rows.
327327

328328
You should do this:

doc/source/user_guide/groupby.rst

+1-1
@@ -149,7 +149,7 @@ the columns except the one we specify:
149149
grouped.sum()
150150
151151
The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
152-
a tranpose:
152+
a transpose:
153153

154154
.. ipython::
155155

doc/source/user_guide/reshaping.rst

+1-1
@@ -13,7 +13,7 @@ Reshaping by pivoting DataFrame objects
1313

1414
.. image:: ../_static/reshaping_pivot.png
1515

16-
Data is often stored in so-called "stacked" or "record" format:
16+
Data is often stored in so-called "stacked" or "record" format. In a "record" or "wide" format typically there is one row for each subject. In the "stacked" or "long" format there are multiple rows for each subject where applicable.
1717

1818
.. ipython:: python
1919

doc/source/user_guide/timeseries.rst

+5-1
@@ -507,14 +507,18 @@ used if a custom frequency string is passed.
507507
Timestamp limitations
508508
---------------------
509509

510-
Since pandas represents timestamps in nanosecond resolution, the time span that
510+
The limits of timestamp representation depend on the chosen resolution. For
511+
nanosecond resolution, the time span that
511512
can be represented using a 64-bit integer is limited to approximately 584 years:
512513

513514
.. ipython:: python
514515
515516
pd.Timestamp.min
516517
pd.Timestamp.max
517518
519+
When choosing second-resolution, the available range grows to ``+/- 2.9e11 years``.
520+
Different resolutions can be converted to each other through ``as_unit``.
521+
518522
.. seealso::
519523

520524
:ref:`timeseries.oob`
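
The ranges quoted in the timeseries.rst change can be checked with plain arithmetic. A rough sketch, assuming a signed 64-bit count of units on either side of the epoch and a 365.25-day year (both are assumptions for illustration, and `half_range_years` is a hypothetical helper, not a pandas function):

```python
# Back-of-the-envelope check of the timestamp ranges quoted in the docs change.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year; an assumption
MAX_INT64 = 2**63 - 1                     # largest signed 64-bit integer


def half_range_years(units_per_second):
    """Years representable on one side of the epoch (hypothetical helper)."""
    return MAX_INT64 / units_per_second / SECONDS_PER_YEAR


ns_half = half_range_years(1_000_000_000)  # nanosecond resolution
s_half = half_range_years(1)               # second resolution

print(f"nanosecond resolution: about {2 * ns_half:.1f} years in total")
print(f"second resolution: about +/- {s_half:.1e} years")
```

The total span at nanosecond resolution is twice the one-sided figure, which is where the "approximately 584 years" in the text comes from.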

doc/source/whatsnew/v2.0.0.rst

+1
@@ -1190,6 +1190,7 @@ Timedelta
11901190
- Bug in :func:`to_timedelta` raising error when input has nullable dtype ``Float64`` (:issue:`48796`)
11911191
- Bug in :class:`Timedelta` constructor incorrectly raising instead of returning ``NaT`` when given a ``np.timedelta64("nat")`` (:issue:`48898`)
11921192
- Bug in :class:`Timedelta` constructor failing to raise when passed both a :class:`Timedelta` object and keywords (e.g. days, seconds) (:issue:`48898`)
1193+
- Bug in :class:`Timedelta` comparisons with very large ``datetime.timedelta`` objects incorrectly raising ``OutOfBoundsTimedelta`` (:issue:`49021`)
11931194

11941195
Timezones
11951196
^^^^^^^^^

doc/source/whatsnew/v2.1.0.rst

+2
@@ -36,6 +36,7 @@ Other enhancements
3636
- :class:`api.extensions.ExtensionArray` now has a :meth:`~api.extensions.ExtensionArray.map` method (:issue:`51809`)
3737
- Improve error message when having incompatible columns using :meth:`DataFrame.merge` (:issue:`51861`)
3838
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`)
39+
- :meth:`DataFrame.applymap` now uses the :meth:`~api.extensions.ExtensionArray.map` method of underlying :class:`api.extensions.ExtensionArray` instances (:issue:`52219`)
3940
- :meth:`arrays.SparseArray.map` now supports ``na_action`` (:issue:`52096`).
4041

4142
.. ---------------------------------------------------------------------------
@@ -209,6 +210,7 @@ I/O
209210
^^^
210211
- Bug in :func:`read_html`, tail texts were removed together with elements containing ``display:none`` style (:issue:`51629`)
211212
- :meth:`DataFrame.to_orc` now raising ``ValueError`` when non-default :class:`Index` is given (:issue:`51828`)
213+
- Bug in :func:`read_html`, style elements were read into DataFrames (:issue:`52197`)
212214
-
213215

214216
Period

pandas/_config/config.py

+1-1
@@ -737,7 +737,7 @@ def pp(name: str, ks: Iterable[str]) -> list[str]:
737737

738738

739739
@contextmanager
740-
def config_prefix(prefix) -> Generator[None, None, None]:
740+
def config_prefix(prefix: str) -> Generator[None, None, None]:
741741
"""
742742
contextmanager for multiple invocations of API with a common prefix
743743

pandas/_libs/parsers.pyx

+14-20
@@ -10,7 +10,6 @@ import sys
1010
import time
1111
import warnings
1212

13-
from pandas.errors import ParserError
1413
from pandas.util._exceptions import find_stack_level
1514

1615
from pandas import StringDtype
@@ -106,15 +105,10 @@ from pandas.errors import (
106105
ParserWarning,
107106
)
108107

109-
from pandas.core.dtypes.common import (
110-
is_bool_dtype,
111-
is_datetime64_dtype,
112-
is_extension_array_dtype,
113-
is_float_dtype,
114-
is_integer_dtype,
115-
is_object_dtype,
108+
from pandas.core.dtypes.dtypes import (
109+
CategoricalDtype,
110+
ExtensionDtype,
116111
)
117-
from pandas.core.dtypes.dtypes import CategoricalDtype
118112
from pandas.core.dtypes.inference import is_dict_like
119113

120114
cdef:
@@ -1077,7 +1071,7 @@ cdef class TextReader:
10771071

10781072
# don't try to upcast EAs
10791073
if (
1080-
na_count > 0 and not is_extension_array_dtype(col_dtype)
1074+
na_count > 0 and not isinstance(col_dtype, ExtensionDtype)
10811075
or self.dtype_backend != "numpy"
10821076
):
10831077
use_dtype_backend = self.dtype_backend != "numpy" and col_dtype is None
@@ -1142,14 +1136,14 @@ cdef class TextReader:
11421136
# (see _try_bool_flex()). Usually this would be taken care of using
11431137
# _maybe_upcast(), but if col_dtype is a floating type we should just
11441138
# take care of that cast here.
1145-
if col_res.dtype == np.bool_ and is_float_dtype(col_dtype):
1139+
if col_res.dtype == np.bool_ and col_dtype.kind == "f":
11461140
mask = col_res.view(np.uint8) == na_values[np.uint8]
11471141
col_res = col_res.astype(col_dtype)
11481142
np.putmask(col_res, mask, np.nan)
11491143
return col_res, na_count
11501144

11511145
# NaNs are already cast to True here, so can not use astype
1152-
if col_res.dtype == np.bool_ and is_integer_dtype(col_dtype):
1146+
if col_res.dtype == np.bool_ and col_dtype.kind in "iu":
11531147
if na_count > 0:
11541148
raise ValueError(
11551149
f"cannot safely convert passed user dtype of "
@@ -1193,14 +1187,14 @@ cdef class TextReader:
11931187
cats, codes, dtype, true_values=true_values)
11941188
return cat, na_count
11951189

1196-
elif is_extension_array_dtype(dtype):
1190+
elif isinstance(dtype, ExtensionDtype):
11971191
result, na_count = self._string_convert(i, start, end, na_filter,
11981192
na_hashset)
11991193

12001194
array_type = dtype.construct_array_type()
12011195
try:
12021196
# use _from_sequence_of_strings if the class defines it
1203-
if is_bool_dtype(dtype):
1197+
if dtype.kind == "b":
12041198
true_values = [x.decode() for x in self.true_values]
12051199
false_values = [x.decode() for x in self.false_values]
12061200
result = array_type._from_sequence_of_strings(
@@ -1216,7 +1210,7 @@ cdef class TextReader:
12161210

12171211
return result, na_count
12181212

1219-
elif is_integer_dtype(dtype):
1213+
elif dtype.kind in "iu":
12201214
try:
12211215
result, na_count = _try_int64(self.parser, i, start,
12221216
end, na_filter, na_hashset)
@@ -1233,14 +1227,14 @@ cdef class TextReader:
12331227

12341228
return result, na_count
12351229

1236-
elif is_float_dtype(dtype):
1230+
elif dtype.kind == "f":
12371231
result, na_count = _try_double(self.parser, i, start, end,
12381232
na_filter, na_hashset, na_flist)
12391233

12401234
if result is not None and dtype != "float64":
12411235
result = result.astype(dtype)
12421236
return result, na_count
1243-
elif is_bool_dtype(dtype):
1237+
elif dtype.kind == "b":
12441238
result, na_count = _try_bool_flex(self.parser, i, start, end,
12451239
na_filter, na_hashset,
12461240
self.true_set, self.false_set)
@@ -1267,10 +1261,10 @@ cdef class TextReader:
12671261
# unicode variable width
12681262
return self._string_convert(i, start, end, na_filter,
12691263
na_hashset)
1270-
elif is_object_dtype(dtype):
1264+
elif dtype == object:
12711265
return self._string_convert(i, start, end, na_filter,
12721266
na_hashset)
1273-
elif is_datetime64_dtype(dtype):
1267+
elif dtype.kind == "M":
12741268
raise TypeError(f"the dtype {dtype} is not supported "
12751269
f"for parsing, pass this column "
12761270
f"using parse_dates instead")
@@ -1438,7 +1432,7 @@ def _maybe_upcast(
14381432
-------
14391433
The casted array.
14401434
"""
1441-
if is_extension_array_dtype(arr.dtype):
1435+
if isinstance(arr.dtype, ExtensionDtype):
14421436
# TODO: the docstring says arr is an ndarray, in which case this cannot
14431437
# be reached. Is that incorrect?
14441438
return arr
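
The parsers.pyx hunks above replace helpers such as `is_float_dtype` with checks on the dtype's one-character kind code. A small sketch of what those codes look like for plain NumPy dtypes. The `describe` helper is hypothetical and only loosely mirrors the branch order for illustration; it is not the parser's code and does not cover extension dtypes:

```python
import numpy as np


def describe(dtype):
    """Classify a NumPy dtype by its kind code (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if dtype.kind in "iu":  # "i" = signed integer, "u" = unsigned integer
        return "integer"
    if dtype.kind == "f":   # floating point
        return "float"
    if dtype.kind == "b":   # boolean
        return "bool"
    if dtype == object:     # Python objects, e.g. strings
        return "object"
    if dtype.kind == "M":   # datetime64 of any unit
        return "datetime64"
    return "other"


for name in ["int64", "uint8", "float32", "bool", "object", "datetime64[ns]"]:
    print(name, "->", describe(name))
```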

pandas/_libs/tslibs/timedeltas.pyx

+25-2
@@ -4,6 +4,10 @@ import warnings
44
cimport cython
55
from cpython.object cimport (
66
Py_EQ,
7+
Py_GE,
8+
Py_GT,
9+
Py_LE,
10+
Py_LT,
711
Py_NE,
812
PyObject,
913
PyObject_RichCompare,
@@ -1154,8 +1158,27 @@ cdef class _Timedelta(timedelta):
11541158
if isinstance(other, _Timedelta):
11551159
ots = other
11561160
elif is_any_td_scalar(other):
1157-
ots = Timedelta(other)
1158-
# TODO: watch out for overflows
1161+
try:
1162+
ots = Timedelta(other)
1163+
except OutOfBoundsTimedelta as err:
1164+
# GH#49021 pytimedelta.max overflows
1165+
if not PyDelta_Check(other):
1166+
# TODO: handle this case
1167+
raise
1168+
ltup = (self.days, self.seconds, self.microseconds, self.nanoseconds)
1169+
rtup = (other.days, other.seconds, other.microseconds, 0)
1170+
if op == Py_EQ:
1171+
return ltup == rtup
1172+
elif op == Py_NE:
1173+
return ltup != rtup
1174+
elif op == Py_LT:
1175+
return ltup < rtup
1176+
elif op == Py_LE:
1177+
return ltup <= rtup
1178+
elif op == Py_GT:
1179+
return ltup > rtup
1180+
elif op == Py_GE:
1181+
return ltup >= rtup
11591182

11601183
elif other is NaT:
11611184
return op == Py_NE
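
The fallback added in the timedeltas.pyx hunk compares component tuples instead of converting the other operand. A pure-Python sketch of the same idea using only `datetime.timedelta`; `components` and `compare_by_components` are hypothetical names, and the nanosecond slot is fixed at 0 because the standard-library type has no such component:

```python
import operator
from datetime import timedelta


def components(td):
    # timedelta normalizes seconds and microseconds to non-negative values,
    # so tuples ordered (days, seconds, microseconds, nanoseconds) sort the
    # same way as the durations they describe.
    return (td.days, td.seconds, td.microseconds, 0)


def compare_by_components(left, right, op):
    """Apply a rich-comparison operator to the component tuples (hypothetical helper)."""
    return op(components(left), components(right))


small = timedelta(days=1, seconds=30)
print(compare_by_components(small, timedelta.max, operator.lt))
print(compare_by_components(small, timedelta.max, operator.eq))
```

Tuple comparison is lexicographic, so the result is decided by the first component that differs and no conversion to a single integer is needed.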

pandas/_testing/_random.py

+3-1
@@ -10,7 +10,9 @@
1010
RANDS_CHARS = np.array(list(string.ascii_letters + string.digits), dtype=(np.str_, 1))
1111

1212

13-
def rands_array(nchars, size, dtype: NpDtype = "O", replace: bool = True) -> np.ndarray:
13+
def rands_array(
14+
nchars, size: int, dtype: NpDtype = "O", replace: bool = True
15+
) -> np.ndarray:
1416
"""
1517
Generate an array of byte strings.
1618
"""

pandas/_testing/asserters.py

+10-5
@@ -13,10 +13,8 @@
1313

1414
from pandas.core.dtypes.common import (
1515
is_bool,
16-
is_categorical_dtype,
1716
is_extension_array_dtype,
1817
is_integer_dtype,
19-
is_interval_dtype,
2018
is_number,
2119
is_numeric_dtype,
2220
needs_i8_conversion,
@@ -33,6 +31,7 @@
3331
DataFrame,
3432
DatetimeIndex,
3533
Index,
34+
IntervalDtype,
3635
IntervalIndex,
3736
MultiIndex,
3837
PeriodIndex,
@@ -238,7 +237,9 @@ def _check_types(left, right, obj: str = "Index") -> None:
238237
assert_attr_equal("inferred_type", left, right, obj=obj)
239238

240239
# Skip exact dtype checking when `check_categorical` is False
241-
if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
240+
if isinstance(left.dtype, CategoricalDtype) and isinstance(
241+
right.dtype, CategoricalDtype
242+
):
242243
if check_categorical:
243244
assert_attr_equal("dtype", left, right, obj=obj)
244245
assert_index_equal(left.categories, right.categories, exact=exact)
@@ -335,7 +336,9 @@ def _get_ilevel_values(index, level):
335336
assert_interval_array_equal(left._values, right._values)
336337

337338
if check_categorical:
338-
if is_categorical_dtype(left.dtype) or is_categorical_dtype(right.dtype):
339+
if isinstance(left.dtype, CategoricalDtype) or isinstance(
340+
right.dtype, CategoricalDtype
341+
):
339342
assert_categorical_equal(left._values, right._values, obj=f"{obj} category")
340343

341344

@@ -946,7 +949,9 @@ def assert_series_equal(
946949
f"is not equal to {right._values}."
947950
)
948951
raise AssertionError(msg)
949-
elif is_interval_dtype(left.dtype) and is_interval_dtype(right.dtype):
952+
elif isinstance(left.dtype, IntervalDtype) and isinstance(
953+
right.dtype, IntervalDtype
954+
):
950955
assert_interval_array_equal(left.array, right.array)
951956
elif isinstance(left.dtype, CategoricalDtype) or isinstance(
952957
right.dtype, CategoricalDtype

pandas/_testing/contexts.py

+1-1
@@ -154,7 +154,7 @@ def ensure_safe_environment_variables() -> Generator[None, None, None]:
154154

155155

156156
@contextmanager
157-
def with_csv_dialect(name, **kwargs) -> Generator[None, None, None]:
157+
def with_csv_dialect(name: str, **kwargs) -> Generator[None, None, None]:
158158
"""
159159
Context manager to temporarily register a CSV dialect for parsing CSV.
160160

pandas/compat/numpy/function.py

+1-1
@@ -342,7 +342,7 @@ def validate_take_with_convert(convert: ndarray | bool | None, args, kwargs) ->
342342
)
343343

344344

345-
def validate_groupby_func(name, args, kwargs, allowed=None) -> None:
345+
def validate_groupby_func(name: str, args, kwargs, allowed=None) -> None:
346346
"""
347347
'args' and 'kwargs' should be empty, except for allowed kwargs because all
348348
of their necessary parameters are explicitly listed in the function
