Commit 8cc2a11

Merge remote-tracking branch 'upstream/main' into issue-60099
2 parents 7b92ce1 + 4cef979 commit 8cc2a11

67 files changed (+479 -219 lines changed)

doc/source/whatsnew/v2.3.0.rst

+3 -2
@@ -133,7 +133,7 @@ MultiIndex

 I/O
 ^^^
--
+- :meth:`DataFrame.to_excel` was storing decimals as strings instead of numbers (:issue:`49598`)
 -

 Period
@@ -173,7 +173,8 @@ Styler

 Other
 ^^^^^
--
+- Fixed usage of ``inspect`` when the optional dependencies ``pyarrow`` or ``jinja2``
+  are not installed (:issue:`60196`)
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v3.0.0.rst

+4
@@ -613,6 +613,7 @@ Categorical
 Datetimelike
 ^^^^^^^^^^^^
 - Bug in :attr:`is_year_start` where a DateTimeIndex constructed via a date_range with frequency 'MS' wouldn't have the correct year or quarter start attributes (:issue:`57377`)
+- Bug in :class:`DataFrame` raising ``ValueError`` when ``dtype`` is ``timedelta64`` and ``data`` is a list containing ``None`` (:issue:`60064`)
 - Bug in :class:`Timestamp` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``tzinfo`` or data (:issue:`48688`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
@@ -710,6 +711,7 @@ Period
 Plotting
 ^^^^^^^^
 - Bug in :meth:`.DataFrameGroupBy.boxplot` failed when there were multiple groupings (:issue:`14701`)
+- Bug in :meth:`DataFrame.plot.bar` with ``stacked=True`` where labels on stacked bars with zero-height segments were incorrectly positioned at the base instead of the label position of the previous segment (:issue:`59429`)
 - Bug in :meth:`DataFrame.plot.line` raising ``ValueError`` when set both color and a ``dict`` style (:issue:`59461`)
 - Bug in :meth:`DataFrame.plot` that causes a shift to the right when the frequency multiplier is greater than one. (:issue:`57587`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
@@ -771,7 +773,9 @@ Other
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which did not allow to use ``tan`` function. (:issue:`55091`)
+- Bug in :meth:`DataFrame.query` where using duplicate column names led to a ``TypeError``. (:issue:`59950`)
 - Bug in :meth:`DataFrame.query` which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character ``#``, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (:issue:`59285`) (:issue:`49633`)
+- Bug in :meth:`DataFrame.shift` where passing a ``freq`` on a DataFrame with no columns did not shift the index correctly. (:issue:`60102`)
 - Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)
 - Bug in :meth:`DataFrame.transform` that was returning the wrong order unless the index was monotonically increasing. (:issue:`57069`)
 - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)

pandas/_libs/tslibs/period.pyx

+2
@@ -114,6 +114,7 @@ from pandas._libs.tslibs.offsets import (
     INVALID_FREQ_ERR_MSG,
     BDay,
 )
+from pandas.util._decorators import set_module

 cdef:
     enum:
@@ -2830,6 +2831,7 @@ cdef class _Period(PeriodMixin):
         return period_format(self.ordinal, base, fmt)


+@set_module("pandas")
 class Period(_Period):
     """
     Represents a period of time.

pandas/_libs/tslibs/timedeltas.pyx

+3 -2
@@ -1,6 +1,7 @@
 import collections
 import warnings

+from pandas.util._decorators import set_module
 from pandas.util._exceptions import find_stack_level

 cimport cython
@@ -1854,7 +1855,7 @@ cdef class _Timedelta(timedelta):

 # Python front end to C extension type _Timedelta
 # This serves as the box for timedelta64
-
+@set_module("pandas")
 class Timedelta(_Timedelta):
     """
     Represents a duration, the difference between two dates or times.
@@ -1916,7 +1917,7 @@ class Timedelta(_Timedelta):
     --------
     Here we initialize Timedelta object with both value and unit

-    >>> td = pd.Timedelta(1, "d")
+    >>> td = pd.Timedelta(1, "D")
     >>> td
     Timedelta('1 days 00:00:00')

pandas/_libs/tslibs/timestamps.pyx

+3 -2
@@ -50,6 +50,7 @@ import datetime as dt
 from pandas._libs.tslibs cimport ccalendar
 from pandas._libs.tslibs.base cimport ABCTimestamp

+from pandas.util._decorators import set_module
 from pandas.util._exceptions import find_stack_level

 from pandas._libs.tslibs.conversion cimport (
@@ -1648,7 +1649,7 @@ cdef class _Timestamp(ABCTimestamp):
 # Python front end to C extension type _Timestamp
 # This serves as the box for datetime64

-
+@set_module("pandas")
 class Timestamp(_Timestamp):
     """
     Pandas replacement for python datetime.datetime object.
@@ -2926,7 +2927,7 @@ timedelta}, default 'raise'
         --------
         >>> ts = pd.Timestamp(1584226800, unit='s', tz='Europe/Stockholm')
         >>> ts.tz
-        <DstTzInfo 'Europe/Stockholm' CET+1:00:00 STD>
+        zoneinfo.ZoneInfo(key='Europe/Stockholm')
         """
         return self.tzinfo

pandas/_testing/__init__.py

+2
@@ -501,6 +501,8 @@ def shares_memory(left, right) -> bool:
     if isinstance(left, MultiIndex):
         return shares_memory(left._codes, right)
     if isinstance(left, (Index, Series)):
+        if isinstance(right, (Index, Series)):
+            return shares_memory(left._values, right._values)
         return shares_memory(left._values, right)

     if isinstance(left, NDArrayBackedExtensionArray):

pandas/conftest.py

+5 -5
@@ -600,7 +600,7 @@ def multiindex_year_month_day_dataframe_random_data():
     """
     tdf = DataFrame(
         np.random.default_rng(2).standard_normal((100, 4)),
-        columns=Index(list("ABCD"), dtype=object),
+        columns=Index(list("ABCD")),
         index=date_range("2000-01-01", periods=100, freq="B"),
     )
     ymd = tdf.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day]).sum()
@@ -787,7 +787,7 @@ def string_series() -> Series:
     """
     return Series(
         np.arange(30, dtype=np.float64) * 1.1,
-        index=Index([f"i_{i}" for i in range(30)], dtype=object),
+        index=Index([f"i_{i}" for i in range(30)]),
         name="series",
     )

@@ -798,7 +798,7 @@ def object_series() -> Series:
     Fixture for Series of dtype object with Index of unique strings
     """
     data = [f"foo_{i}" for i in range(30)]
-    index = Index([f"bar_{i}" for i in range(30)], dtype=object)
+    index = Index([f"bar_{i}" for i in range(30)])
     return Series(data, index=index, name="objects", dtype=object)


@@ -890,8 +890,8 @@ def int_frame() -> DataFrame:
     """
     return DataFrame(
         np.ones((30, 4), dtype=np.int64),
-        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
-        columns=Index(list("ABCD"), dtype=object),
+        index=Index([f"foo_{i}" for i in range(30)]),
+        columns=Index(list("ABCD")),
     )

pandas/core/arrays/arrow/accessors.py

+1 -1
@@ -46,7 +46,7 @@ def _is_valid_pyarrow_dtype(self, pyarrow_dtype) -> bool:

     def _validate(self, data) -> None:
         dtype = data.dtype
-        if not isinstance(dtype, ArrowDtype):
+        if pa_version_under10p1 or not isinstance(dtype, ArrowDtype):
             # Raise AttributeError so that inspect can handle non-struct Series.
             raise AttributeError(self._validation_msg.format(dtype=dtype))
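
The comment in this hunk says `AttributeError` is raised so that `inspect` can handle the Series. A minimal sketch of why the exception type matters, using stand-in classes: Python's `hasattr` treats `AttributeError` from a property as "attribute absent", while any other exception propagates. All names below (`SeriesSketch`, `StructAccessorSketch`, `backend_too_old`) are hypothetical and only mirror the shape of the check in the diff.

```python
class StructAccessorSketch:
    """Hypothetical accessor that only accepts 'struct' data."""

    def __init__(self, dtype: str, backend_too_old: bool = False) -> None:
        # Same shape as the check in the diff: fail when the backend is too
        # old OR the dtype is not the expected kind.
        if backend_too_old or dtype != "struct":
            raise AttributeError(f"accessor requires struct data, not {dtype}")
        self.dtype = dtype


class SeriesSketch:
    """Hypothetical container exposing the accessor as a property."""

    def __init__(self, dtype: str, backend_too_old: bool = False) -> None:
        self.dtype = dtype
        self.backend_too_old = backend_too_old

    @property
    def struct(self) -> StructAccessorSketch:
        return StructAccessorSketch(self.dtype, self.backend_too_old)

    @property
    def strict(self) -> None:
        # Any non-AttributeError exception escapes attribute probing.
        raise ValueError("validation failed")


print(hasattr(SeriesSketch("struct"), "struct"))  # True
print(hasattr(SeriesSketch("int64"), "struct"))  # False
print(hasattr(SeriesSketch("struct", backend_too_old=True), "struct"))  # False
```

Probing `strict` with `hasattr` would raise `ValueError` instead of returning `False`, which is the kind of failure the changed condition appears intended to avoid when the optional dependency is unavailable.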

pandas/core/arrays/arrow/array.py

+2 -2
@@ -1145,7 +1145,7 @@ def fillna(
         try:
             fill_value = self._box_pa(value, pa_type=self._pa_array.type)
         except pa.ArrowTypeError as err:
-            msg = f"Invalid value '{value!s}' for dtype {self.dtype}"
+            msg = f"Invalid value '{value!s}' for dtype '{self.dtype}'"
             raise TypeError(msg) from err

         try:
@@ -2136,7 +2136,7 @@ def _maybe_convert_setitem_value(self, value):
         try:
             value = self._box_pa(value, self._pa_array.type)
         except pa.ArrowTypeError as err:
-            msg = f"Invalid value '{value!s}' for dtype {self.dtype}"
+            msg = f"Invalid value '{value!s}' for dtype '{self.dtype}'"
             raise TypeError(msg) from err
         return value

pandas/core/arrays/masked.py

+1 -1
@@ -286,7 +286,7 @@ def _validate_setitem_value(self, value):

         # Note: without the "str" here, the f-string rendering raises in
         # py38 builds.
-        raise TypeError(f"Invalid value '{value!s}' for dtype {self.dtype}")
+        raise TypeError(f"Invalid value '{value!s}' for dtype '{self.dtype}'")

     def __setitem__(self, key, value) -> None:
         key = check_array_indexer(self, key)

pandas/core/arrays/string_.py

+26 -3
@@ -1,5 +1,6 @@
 from __future__ import annotations

+from functools import partial
 import operator
 from typing import (
     TYPE_CHECKING,
@@ -64,6 +65,8 @@
 from pandas.core.indexers import check_array_indexer
 from pandas.core.missing import isna

+from pandas.io.formats import printing
+
 if TYPE_CHECKING:
     import pyarrow

@@ -391,6 +394,14 @@ def _from_scalars(cls, scalars, dtype: DtypeObj) -> Self:
             raise ValueError
         return cls._from_sequence(scalars, dtype=dtype)

+    def _formatter(self, boxed: bool = False):
+        formatter = partial(
+            printing.pprint_thing,
+            escape_chars=("\t", "\r", "\n"),
+            quote_strings=not boxed,
+        )
+        return formatter
+
     def _str_map(
         self,
         f,
@@ -641,7 +652,8 @@ def _validate_scalar(self, value):
             return self.dtype.na_value
         elif not isinstance(value, str):
             raise TypeError(
-                f"Cannot set non-string value '{value}' into a string array."
+                f"Invalid value '{value}' for dtype '{self.dtype}'. Value should be a "
+                f"string or missing value, got '{type(value).__name__}' instead."
             )
         return value

@@ -732,7 +744,9 @@ def __setitem__(self, key, value) -> None:
                 value = self.dtype.na_value
             elif not isinstance(value, str):
                 raise TypeError(
-                    f"Cannot set non-string value '{value}' into a StringArray."
+                    f"Invalid value '{value}' for dtype '{self.dtype}'. Value should "
+                    f"be a string or missing value, got '{type(value).__name__}' "
+                    "instead."
                 )
         else:
             if not is_array_like(value):
@@ -742,7 +756,10 @@ def __setitem__(self, key, value) -> None:
                 # compatible, compatibility with arrow backed strings
                 value = np.asarray(value)
             if len(value) and not lib.is_string_array(value, skipna=True):
-                raise TypeError("Must provide strings.")
+                raise TypeError(
+                    "Invalid value for dtype 'str'. Value should be a "
+                    "string or missing value (or array of those)."
+                )

             mask = isna(value)
             if mask.any():
@@ -757,6 +774,12 @@ def _putmask(self, mask: npt.NDArray[np.bool_], value) -> None:
         # base class implementation that uses __setitem__
         ExtensionArray._putmask(self, mask, value)

+    def _where(self, mask: npt.NDArray[np.bool_], value) -> Self:
+        # the super() method NDArrayBackedExtensionArray._where uses
+        # np.putmask which doesn't properly handle None/pd.NA, so using the
+        # base class implementation that uses __setitem__
+        return ExtensionArray._where(self, mask, value)
+
     def isin(self, values: ArrayLike) -> npt.NDArray[np.bool_]:
         if isinstance(values, BaseStringArray) or (
             isinstance(values, ExtensionArray) and is_string_dtype(values.dtype)
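
The new `_formatter` method in this file binds formatting options with `functools.partial` and returns a one-argument callable. A minimal sketch of that pattern follows. `render` is a hypothetical stand-in for a pretty-printing helper and makes no claim about how `printing.pprint_thing` is implemented. Only the option names `escape_chars` and `quote_strings` are taken from the diff.

```python
from functools import partial


def render(thing, escape_chars=(), quote_strings=False):
    """Hypothetical stand-in for a pretty-printing helper."""
    text = str(thing)
    # Replace each requested control character with a visible escape sequence.
    escapes = {"\t": "\\t", "\r": "\\r", "\n": "\\n"}
    for char in escape_chars:
        text = text.replace(char, escapes.get(char, char))
    # Optionally wrap string inputs in single quotes.
    if quote_strings and isinstance(thing, str):
        text = f"'{text}'"
    return text


def make_formatter(boxed=False):
    # Same shape as the method in the diff: fix the options once and hand
    # back a callable that takes only the value to format.
    return partial(render, escape_chars=("\t", "\r", "\n"), quote_strings=not boxed)


unboxed = make_formatter(boxed=False)
boxed = make_formatter(boxed=True)
print(unboxed("a\tb"))  # 'a\tb' (quoted, tab shown as backslash-t)
print(boxed("a\nb"))  # a\nb (unquoted, newline shown as backslash-n)
```

Under this sketch, values shown inside a container (`boxed=True`) are left unquoted, while standalone values are quoted, matching the `quote_strings=not boxed` expression in the diff.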

pandas/core/arrays/string_arrow.py

+19 -3
@@ -17,6 +17,7 @@
 from pandas.compat import (
     pa_version_under10p1,
     pa_version_under13p0,
+    pa_version_under16p0,
 )
 from pandas.util._exceptions import find_stack_level

@@ -71,6 +72,10 @@ def _chk_pyarrow_available() -> None:
         raise ImportError(msg)


+def _is_string_view(typ):
+    return not pa_version_under16p0 and pa.types.is_string_view(typ)
+
+
 # TODO: Inherit directly from BaseStringArrayMethods. Currently we inherit from
 # ObjectStringArrayMixin because we want to have the object-dtype based methods as
 # fallback for the ones that pyarrow doesn't yet support
@@ -128,11 +133,13 @@ def __init__(self, values) -> None:
         _chk_pyarrow_available()
         if isinstance(values, (pa.Array, pa.ChunkedArray)) and (
             pa.types.is_string(values.type)
+            or _is_string_view(values.type)
             or (
                 pa.types.is_dictionary(values.type)
                 and (
                     pa.types.is_string(values.type.value_type)
                     or pa.types.is_large_string(values.type.value_type)
+                    or _is_string_view(values.type.value_type)
                 )
             )
         ):
@@ -216,7 +223,10 @@ def insert(self, loc: int, item) -> ArrowStringArray:
         if self.dtype.na_value is np.nan and item is np.nan:
             item = libmissing.NA
         if not isinstance(item, str) and item is not libmissing.NA:
-            raise TypeError("Scalar must be NA or str")
+            raise TypeError(
+                f"Invalid value '{item}' for dtype 'str'. Value should be a "
+                f"string or missing value, got '{type(item).__name__}' instead."
+            )
         return super().insert(loc, item)

     def _convert_bool_result(self, values, na=lib.no_default, method_name=None):
@@ -248,13 +258,19 @@ def _maybe_convert_setitem_value(self, value):
             if isna(value):
                 value = None
             elif not isinstance(value, str):
-                raise TypeError("Scalar must be NA or str")
+                raise TypeError(
+                    f"Invalid value '{value}' for dtype 'str'. Value should be a "
+                    f"string or missing value, got '{type(value).__name__}' instead."
+                )
         else:
             value = np.array(value, dtype=object, copy=True)
             value[isna(value)] = None
             for v in value:
                 if not (v is None or isinstance(v, str)):
-                    raise TypeError("Must provide strings")
+                    raise TypeError(
+                        "Invalid value for dtype 'str'. Value should be a "
+                        "string or missing value (or array of those)."
+                    )
         return super()._maybe_convert_setitem_value(value)

     def isin(self, values: ArrayLike) -> npt.NDArray[np.bool_]:
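
Across the string array files, the scalar error messages converge on one template that names the rejected value, the dtype, and the offending Python type. A minimal sketch of a validator producing that message follows. `validate_string_scalar` and its `dtype_name` parameter are hypothetical, treating only `None` as missing is a simplification, and the message text is copied from the diff.

```python
def validate_string_scalar(value, dtype_name="str"):
    """Return ``value`` if it is a string or None, otherwise raise TypeError."""
    if value is None:
        # Assumption for this sketch: None is the only missing-value marker.
        return None
    if not isinstance(value, str):
        raise TypeError(
            f"Invalid value '{value}' for dtype '{dtype_name}'. Value should be a "
            f"string or missing value, got '{type(value).__name__}' instead."
        )
    return value


try:
    validate_string_scalar(1.5)
except TypeError as err:
    message = str(err)
    print(message)
```

Including `type(value).__name__` lets a caller distinguish, for example, the integer `1` from the string `"1"`, which the older messages such as "Scalar must be NA or str" did not reveal.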

pandas/core/base.py

+4 -1
@@ -361,8 +361,11 @@ def __len__(self) -> int:
         # We need this defined here for mypy
         raise AbstractMethodError(self)

+    # Temporarily avoid using `-> Literal[1]:` because of an IPython (jedi) bug
+    # https://github.com/ipython/ipython/issues/14412
+    # https://github.com/davidhalter/jedi/issues/1990
     @property
-    def ndim(self) -> Literal[1]:
+    def ndim(self) -> int:
         """
         Number of dimensions of the underlying data, by definition 1.

pandas/core/config_init.py

+9 -2
@@ -100,7 +100,10 @@ def use_numba_cb(key: str) -> None:
 : int
     If max_rows is exceeded, switch to truncate view. Depending on
     `large_repr`, objects are either centrally truncated or printed as
-    a summary view. 'None' value means unlimited.
+    a summary view.
+
+    'None' value means unlimited. Beware that printing a large number of rows
+    could cause your rendering environment (the browser, etc.) to crash.

     In case python/IPython is running in a terminal and `large_repr`
     equals 'truncate' this can be set to 0 and pandas will auto-detect
@@ -121,7 +124,11 @@ def use_numba_cb(key: str) -> None:
 : int
     If max_cols is exceeded, switch to truncate view. Depending on
     `large_repr`, objects are either centrally truncated or printed as
-    a summary view. 'None' value means unlimited.
+    a summary view.
+
+    'None' value means unlimited. Beware that printing a large number of
+    columns could cause your rendering environment (the browser, etc.) to
+    crash.

     In case python/IPython is running in a terminal and `large_repr`
     equals 'truncate' this can be set to 0 or None and pandas will auto-detect

pandas/core/construction.py

+6
@@ -807,6 +807,12 @@ def _try_cast(
         )

     elif dtype.kind in "mM":
+        if is_ndarray:
+            arr = cast(np.ndarray, arr)
+            if arr.ndim == 2 and arr.shape[1] == 1:
+                # GH#60081: DataFrame Constructor converts 1D data to array of
+                # shape (N, 1), but maybe_cast_to_datetime assumes 1D input
+                return maybe_cast_to_datetime(arr[:, 0], dtype).reshape(arr.shape)
         return maybe_cast_to_datetime(arr, dtype)

     # GH#15832: Check if we are requesting a numeric dtype and
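
The added branch handles a single-column 2D input by casting its only column as 1D data and then restoring the original shape. A minimal sketch of that flatten, convert, reshape pattern follows, using nested lists as a stand-in for NumPy arrays so that it runs without third-party packages. `cast_1d` and `cast_maybe_column` are hypothetical names, and the "NaT" string is only a placeholder for a missing-value marker.

```python
def cast_1d(values):
    """Hypothetical converter that, like the helper in the diff, assumes 1D input."""
    if any(isinstance(v, list) for v in values):
        raise ValueError("expected 1D input")
    # Stand-in conversion: replace None with a missing-value placeholder.
    return ["NaT" if v is None else v for v in values]


def cast_maybe_column(values):
    """Accept 1D input, or an (N, 1) column given as a list of one-item lists."""
    is_column = bool(values) and all(
        isinstance(row, list) and len(row) == 1 for row in values
    )
    if is_column:
        flat = [row[0] for row in values]  # analogous to arr[:, 0]
        return [[v] for v in cast_1d(flat)]  # analogous to .reshape(arr.shape)
    return cast_1d(values)


print(cast_maybe_column([[1], [None], [3]]))  # [[1], ['NaT'], [3]]
print(cast_maybe_column([1, None]))  # [1, 'NaT']
```

Passing the column directly to `cast_1d` raises in this sketch, which parallels the situation described in the GH#60081 comment, where the downstream helper assumes 1D input.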

pandas/core/dtypes/cast.py

+1 -1
@@ -1205,7 +1205,7 @@ def maybe_infer_to_datetimelike(

 def maybe_cast_to_datetime(
     value: np.ndarray | list, dtype: np.dtype
-) -> ExtensionArray | np.ndarray:
+) -> DatetimeArray | TimedeltaArray | np.ndarray:
     """
     try to cast the array/value to a datetimelike dtype, converting float
     nan to iNaT
