Commit 7bc57e4

author
MarcoGorelli
committed
Merge remote-tracking branch 'upstream/main' into non-nano-strftime
2 parents b1a66f6 + 3f0af5e commit 7bc57e4

File tree

22 files changed

+350
-131
lines changed

.github/workflows/ubuntu.yml

+1-1
Original file line number | Diff line number | Diff line change
@@ -77,7 +77,7 @@ jobs:
7777
- name: "Numpy Dev"
7878
env_file: actions-310-numpydev.yaml
7979
pattern: "not slow and not network and not single_cpu"
80-
test_args: "-W error::DeprecationWarning:numpy -W error::FutureWarning:numpy"
80+
test_args: "-W error::DeprecationWarning -W error::FutureWarning"
8181
error_on_warnings: "0"
8282
exclude:
8383
- env_file: actions-38.yaml

doc/source/whatsnew/v1.5.1.rst

+4-4
@@ -76,14 +76,14 @@ Fixed regressions
7676
- Fixed regression in :meth:`DataFrame.loc` raising ``FutureWarning`` when setting an empty :class:`DataFrame` (:issue:`48480`)
7777
- Fixed regression in :meth:`DataFrame.describe` raising ``TypeError`` when result contains ``NA`` (:issue:`48778`)
7878
- Fixed regression in :meth:`DataFrame.plot` ignoring invalid ``colormap`` for ``kind="scatter"`` (:issue:`48726`)
79-
- Fixed regression in :meth:`MultiIndex.values`` resetting ``freq`` attribute of underlying :class:`Index` object (:issue:`49054`)
79+
- Fixed regression in :meth:`MultiIndex.values` resetting ``freq`` attribute of underlying :class:`Index` object (:issue:`49054`)
8080
- Fixed performance regression in :func:`factorize` when ``na_sentinel`` is not ``None`` and ``sort=False`` (:issue:`48620`)
8181
- Fixed regression causing an ``AttributeError`` during warning emitted if the provided table name in :meth:`DataFrame.to_sql` and the table name actually used in the database do not match (:issue:`48733`)
8282
- Fixed regression in :func:`to_datetime` when ``arg`` was a date string with nanosecond and ``format`` contained ``%f`` would raise a ``ValueError`` (:issue:`48767`)
83-
- Fixed regression in :func:`assert_frame_equal` raising for :class:`MultiIndex` with :class:`Categorical` and ``check_like=True`` (:issue:`48975`)
83+
- Fixed regression in :func:`testing.assert_frame_equal` raising for :class:`MultiIndex` with :class:`Categorical` and ``check_like=True`` (:issue:`48975`)
8484
- Fixed regression in :meth:`DataFrame.fillna` replacing wrong values for ``datetime64[ns]`` dtype and ``inplace=True`` (:issue:`48863`)
8585
- Fixed :meth:`.DataFrameGroupBy.size` not returning a Series when ``axis=1`` (:issue:`48738`)
86-
- Fixed Regression in :meth:`DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)
86+
- Fixed Regression in :meth:`.DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)
8787
- Fixed regression in :meth:`DataFrame.apply` when passing non-zero ``axis`` via keyword argument (:issue:`48656`)
8888
- Fixed regression in :meth:`Series.groupby` and :meth:`DataFrame.groupby` when the grouper is a nullable data type (e.g. :class:`Int64`) or a PyArrow-backed string array, contains null values, and ``dropna=False`` (:issue:`48794`)
8989
- Fixed performance regression in :meth:`Series.isin` with mismatching dtypes (:issue:`49162`)
@@ -99,7 +99,7 @@ Bug fixes
9999
~~~~~~~~~
100100
- Bug in :meth:`Series.__getitem__` not falling back to positional for integer keys and boolean :class:`Index` (:issue:`48653`)
101101
- Bug in :meth:`DataFrame.to_hdf` raising ``AssertionError`` with boolean index (:issue:`48667`)
102-
- Bug in :func:`assert_index_equal` for extension arrays with non matching ``NA`` raising ``ValueError`` (:issue:`48608`)
102+
- Bug in :func:`testing.assert_index_equal` for extension arrays with non matching ``NA`` raising ``ValueError`` (:issue:`48608`)
103103
- Bug in :meth:`DataFrame.pivot_table` raising unexpected ``FutureWarning`` when setting datetime column as index (:issue:`48683`)
104104
- Bug in :meth:`DataFrame.sort_values` emitting unnecessary ``FutureWarning`` when called on :class:`DataFrame` with boolean sparse columns (:issue:`48784`)
105105
- Bug in :class:`.arrays.ArrowExtensionArray` with a comparison operator to an invalid object would not raise a ``NotImplementedError`` (:issue:`48833`)

doc/source/whatsnew/v1.5.3.rst

+4-6
@@ -1,7 +1,7 @@
11
.. _whatsnew_153:
22

3-
What's new in 1.5.3 (December ??, 2022)
4-
---------------------------------------
3+
What's new in 1.5.3 (January 18, 2023)
4+
--------------------------------------
55

66
These are the changes in pandas 1.5.3. See :ref:`release` for a full changelog
77
including other versions of pandas.
@@ -15,12 +15,11 @@ Fixed regressions
1515
~~~~~~~~~~~~~~~~~
1616
- Fixed performance regression in :meth:`Series.isin` when ``values`` is empty (:issue:`49839`)
1717
- Fixed regression in :meth:`DataFrame.memory_usage` showing unnecessary ``FutureWarning`` when :class:`DataFrame` is empty (:issue:`50066`)
18-
- Fixed regression in :meth:`DataFrameGroupBy.transform` when used with ``as_index=False`` (:issue:`49834`)
18+
- Fixed regression in :meth:`.DataFrameGroupBy.transform` when used with ``as_index=False`` (:issue:`49834`)
1919
- Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
20-
- Fixed regression in :meth:`SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
20+
- Fixed regression in :meth:`.SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
2121
- Fixed performance regression in setting with the :meth:`~DataFrame.at` indexer (:issue:`49771`)
2222
- Fixed regression in :func:`to_datetime` raising ``ValueError`` when parsing array of ``float`` containing ``np.nan`` (:issue:`50237`)
23-
-
2423

2524
.. ---------------------------------------------------------------------------
2625
.. _whatsnew_153.bug_fixes:
@@ -48,7 +47,6 @@ Other
4847
as pandas works toward compatibility with SQLAlchemy 2.0.
4948

5049
- Reverted deprecation (:issue:`45324`) of behavior of :meth:`Series.__getitem__` and :meth:`Series.__setitem__` slicing with an integer :class:`Index`; this will remain positional (:issue:`49612`)
51-
-
5250

5351
.. ---------------------------------------------------------------------------
5452
.. _whatsnew_153.contributors:

doc/source/whatsnew/v2.0.0.rst

+8-5
@@ -41,6 +41,7 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
4141
* :func:`read_excel`
4242
* :func:`read_html`
4343
* :func:`read_xml`
44+
* :func:`read_json`
4445
* :func:`read_sql`
4546
* :func:`read_sql_query`
4647
* :func:`read_sql_table`
@@ -56,6 +57,7 @@ to select the nullable dtypes implementation.
5657
* :func:`read_excel`
5758
* :func:`read_html`
5859
* :func:`read_xml`
60+
* :func:`read_json`
5961
* :func:`read_parquet`
6062
* :func:`read_orc`
6163
* :func:`read_feather`
@@ -92,10 +94,9 @@ Copy-on-Write improvements
9294
was added to the following methods:
9395

9496
- :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
95-
- :meth:`DataFrame.set_index` / :meth:`Series.set_index`
97+
- :meth:`DataFrame.set_index`
9698
- :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
9799
- :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
98-
- :meth:`DataFrame.rename_columns`
99100
- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
100101
- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
101102
- :meth:`DataFrame.assign`
@@ -175,8 +176,8 @@ These are bug fixes that might have notable behavior changes.
175176

176177
.. _whatsnew_200.notable_bug_fixes.cumsum_cumprod_overflow:
177178

178-
:meth:`.GroupBy.cumsum` and :meth:`.GroupBy.cumprod` overflow instead of lossy casting to float
179-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
179+
:meth:`.DataFrameGroupBy.cumsum` and :meth:`.DataFrameGroupBy.cumprod` overflow instead of lossy casting to float
180+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
180181

181182
In previous versions we cast to float when applying ``cumsum`` and ``cumprod`` which
182183
led to incorrect results even if the result could be held by ``int64`` dtype.
@@ -822,7 +823,7 @@ Removal of prior version deprecations/changes
822823

823824
Performance improvements
824825
~~~~~~~~~~~~~~~~~~~~~~~~
825-
- Performance improvement in :meth:`.DataFrameGroupBy.median` and :meth:`.SeriesGroupBy.median` and :meth:`.GroupBy.cumprod` for nullable dtypes (:issue:`37493`)
826+
- Performance improvement in :meth:`.DataFrameGroupBy.median` and :meth:`.SeriesGroupBy.median` and :meth:`.DataFrameGroupBy.cumprod` for nullable dtypes (:issue:`37493`)
826827
- Performance improvement in :meth:`.DataFrameGroupBy.all`, :meth:`.DataFrameGroupBy.any`, :meth:`.SeriesGroupBy.all`, and :meth:`.SeriesGroupBy.any` for object dtype (:issue:`50623`)
827828
- Performance improvement in :meth:`MultiIndex.argsort` and :meth:`MultiIndex.sort_values` (:issue:`48406`)
828829
- Performance improvement in :meth:`MultiIndex.size` (:issue:`48723`)
@@ -952,6 +953,7 @@ Conversion
952953
Strings
953954
^^^^^^^
954955
- Bug in :func:`pandas.api.dtypes.is_string_dtype` that would not return ``True`` for :class:`StringDtype` (:issue:`15585`)
956+
- Bug in converting string dtypes to "datetime64[ns]" or "timedelta64[ns]" incorrectly raising ``TypeError`` (:issue:`36153`)
955957
-
956958

957959
Interval
@@ -1018,6 +1020,7 @@ I/O
10181020
- Bug in :meth:`DataFrame.to_string` ignoring float formatter for extension arrays (:issue:`39336`)
10191021
- Fixed memory leak which stemmed from the initialization of the internal JSON module (:issue:`49222`)
10201022
- Fixed issue where :func:`json_normalize` would incorrectly remove leading characters from column names that matched the ``sep`` argument (:issue:`49861`)
1023+
- Bug in :meth:`DataFrame.to_dict` not converting ``NA`` to ``None`` (:issue:`50795`)
10211024
- Bug in :meth:`DataFrame.to_json` where it would segfault when failing to encode a string (:issue:`50307`)
10221025

10231026
Period

pandas/compat/pickle_compat.py

+14-1
@@ -120,7 +120,20 @@ def load_reduce(self):
120120
),
121121
("pandas.indexes.numeric", "Float64Index"): (
122122
"pandas.core.indexes.numeric",
123-
"Float64Index",
123+
"Index", # updated in 50775
124+
),
125+
# 50775, remove Int64Index, UInt64Index & Float64Index from codebase
126+
("pandas.core.indexes.numeric", "Int64Index"): (
127+
"pandas.core.indexes.base",
128+
"Index",
129+
),
130+
("pandas.core.indexes.numeric", "UInt64Index"): (
131+
"pandas.core.indexes.base",
132+
"Index",
133+
),
134+
("pandas.core.indexes.numeric", "Float64Index"): (
135+
"pandas.core.indexes.base",
136+
"Index",
124137
),
125138
}
126139
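Reviewer note: the table above maps an old `(module, name)` pair to the location that should be used when an old pickle is loaded. How pandas consumes the table is not visible in this hunk, so the following is only a minimal standard-library sketch of the general technique, using the documented `pickle.Unpickler.find_class` hook. `CLASS_LOCATIONS`, `RemappingUnpickler`, `load_with_remap`, the `legacy.numeric` module name, and the choice of `collections.OrderedDict` as the target are all hypothetical stand-ins, not pandas names.

```python
import io
import pickle

# Hypothetical remap table with the same (module, name) -> (module, name)
# shape as the mapping in the diff. The entry is a stand-in: it redirects a
# class that no longer exists to one that does.
CLASS_LOCATIONS = {
    ("legacy.numeric", "Float64Index"): ("collections", "OrderedDict"),
}


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects removed classes to their replacements."""

    def find_class(self, module, name):
        # Look up the old location; fall back to the original pair if the
        # class has not moved.
        module, name = CLASS_LOCATIONS.get((module, name), (module, name))
        return super().find_class(module, name)


def load_with_remap(data: bytes):
    """Load a pickle byte string, applying CLASS_LOCATIONS on the way."""
    return RemappingUnpickler(io.BytesIO(data)).load()


# Hand-written pickle stream: GLOBAL legacy.numeric.Float64Index,
# EMPTY_TUPLE, REDUCE, STOP. It calls the referenced class with no arguments.
LEGACY_PAYLOAD = b"clegacy.numeric\nFloat64Index\n)R."

restored = load_with_remap(LEGACY_PAYLOAD)
print(type(restored).__name__)
```

Without the remap entry, loading `LEGACY_PAYLOAD` would fail because the module `legacy.numeric` cannot be imported, which is the situation the new `Int64Index`, `UInt64Index`, and `Float64Index` entries appear intended to avoid.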

pandas/core/arrays/base.py

+12
@@ -55,9 +55,11 @@
5555

5656
from pandas.core.dtypes.cast import maybe_cast_to_extension_array
5757
from pandas.core.dtypes.common import (
58+
is_datetime64_dtype,
5859
is_dtype_equal,
5960
is_list_like,
6061
is_scalar,
62+
is_timedelta64_dtype,
6163
pandas_dtype,
6264
)
6365
from pandas.core.dtypes.dtypes import ExtensionDtype
@@ -580,6 +582,16 @@ def astype(self, dtype: AstypeArg, copy: bool = True) -> ArrayLike:
580582
cls = dtype.construct_array_type()
581583
return cls._from_sequence(self, dtype=dtype, copy=copy)
582584

585+
elif is_datetime64_dtype(dtype):
586+
from pandas.core.arrays import DatetimeArray
587+
588+
return DatetimeArray._from_sequence(self, dtype=dtype, copy=copy)
589+
590+
elif is_timedelta64_dtype(dtype):
591+
from pandas.core.arrays import TimedeltaArray
592+
593+
return TimedeltaArray._from_sequence(self, dtype=dtype, copy=copy)
594+
583595
return np.array(self, dtype=dtype, copy=copy)
584596

585597
def isna(self) -> np.ndarray | ExtensionArraySupportsAnyAll:

pandas/core/dtypes/cast.py

+16-2
@@ -20,6 +20,10 @@
2020
import numpy as np
2121

2222
from pandas._libs import lib
23+
from pandas._libs.missing import (
24+
NA,
25+
NAType,
26+
)
2327
from pandas._libs.tslibs import (
2428
NaT,
2529
OutOfBoundsDatetime,
@@ -176,7 +180,7 @@ def maybe_box_datetimelike(value: Scalar, dtype: Dtype | None = None) -> Scalar:
176180
return value
177181

178182

179-
def maybe_box_native(value: Scalar) -> Scalar:
183+
def maybe_box_native(value: Scalar | None | NAType) -> Scalar | None | NAType:
180184
"""
181185
If passed a scalar cast the scalar to a python native type.
182186
@@ -202,6 +206,8 @@ def maybe_box_native(value: Scalar) -> Scalar:
202206
value = bool(value)
203207
elif isinstance(value, (np.datetime64, np.timedelta64)):
204208
value = maybe_box_datetimelike(value)
209+
elif value is NA:
210+
value = None
205211
return value
206212

207213

@@ -1590,7 +1596,15 @@ def maybe_cast_to_integer_array(arr: list | np.ndarray, dtype: np.dtype) -> np.n
15901596

15911597
try:
15921598
if not isinstance(arr, np.ndarray):
1593-
casted = np.array(arr, dtype=dtype, copy=False)
1599+
with warnings.catch_warnings():
1600+
# We already disallow dtype=uint w/ negative numbers
1601+
# (test_constructor_coercion_signed_to_unsigned) so safe to ignore.
1602+
warnings.filterwarnings(
1603+
"ignore",
1604+
"NumPy will stop allowing conversion of out-of-bound Python int",
1605+
DeprecationWarning,
1606+
)
1607+
casted = np.array(arr, dtype=dtype, copy=False)
15941608
else:
15951609
casted = arr.astype(dtype, copy=False)
15961610
except OverflowError as err:
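Reviewer note: the last hunk wraps a single conversion in `warnings.catch_warnings()` so that one specific `DeprecationWarning` is ignored only for that call. The sketch below shows the same pattern with the standard library alone. `noisy_convert` and `convert_quietly` are hypothetical helpers, and the warning is raised by hand rather than by NumPy.

```python
import warnings

MESSAGE = "NumPy will stop allowing conversion of out-of-bound Python int"


def noisy_convert(values):
    """Hypothetical stand-in for a call that emits a DeprecationWarning."""
    warnings.warn(MESSAGE, DeprecationWarning)
    return list(values)


def convert_quietly(values):
    """Run noisy_convert with only the matching warning suppressed."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the
        # warning text; the filter is discarded when the block exits.
        warnings.filterwarnings("ignore", MESSAGE, DeprecationWarning)
        return noisy_convert(values)


print(convert_quietly([1, 2, 3]))
```

Because the filter is scoped to the `with` block and keyed on the message text, other deprecation warnings still surface, which matters now that the CI job escalates every `DeprecationWarning` and `FutureWarning` to an error.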

pandas/core/dtypes/generic.py

-20
@@ -29,11 +29,6 @@
2929
TimedeltaArray,
3030
)
3131
from pandas.core.generic import NDFrame
32-
from pandas.core.indexes.api import (
33-
Float64Index,
34-
Int64Index,
35-
UInt64Index,
36-
)
3732

3833

3934
# define abstract base classes to enable isinstance type checking on our
@@ -62,22 +57,10 @@ def _subclasscheck(cls, inst) -> bool:
6257
return meta(name, (), dct)
6358

6459

65-
ABCInt64Index = cast(
66-
"Type[Int64Index]",
67-
create_pandas_abc_type("ABCInt64Index", "_typ", ("int64index",)),
68-
)
69-
ABCUInt64Index = cast(
70-
"Type[UInt64Index]",
71-
create_pandas_abc_type("ABCUInt64Index", "_typ", ("uint64index",)),
72-
)
7360
ABCRangeIndex = cast(
7461
"Type[RangeIndex]",
7562
create_pandas_abc_type("ABCRangeIndex", "_typ", ("rangeindex",)),
7663
)
77-
ABCFloat64Index = cast(
78-
"Type[Float64Index]",
79-
create_pandas_abc_type("ABCFloat64Index", "_typ", ("float64index",)),
80-
)
8164
ABCMultiIndex = cast(
8265
"Type[MultiIndex]",
8366
create_pandas_abc_type("ABCMultiIndex", "_typ", ("multiindex",)),
@@ -109,10 +92,7 @@ def _subclasscheck(cls, inst) -> bool:
10992
"_typ",
11093
{
11194
"index",
112-
"int64index",
11395
"rangeindex",
114-
"float64index",
115-
"uint64index",
11696
"numericindex",
11797
"multiindex",
11898
"datetimeindex",

pandas/core/methods/to_dict.py

+16-11
@@ -6,7 +6,10 @@
66
from pandas.util._exceptions import find_stack_level
77

88
from pandas.core.dtypes.cast import maybe_box_native
9-
from pandas.core.dtypes.common import is_object_dtype
9+
from pandas.core.dtypes.common import (
10+
is_extension_array_dtype,
11+
is_object_dtype,
12+
)
1013

1114
from pandas import DataFrame
1215
from pandas.core import common as com
@@ -88,16 +91,18 @@ def to_dict(
8891
# GH46470 Return quickly if orient series to avoid creating dtype objects
8992
return into_c((k, v) for k, v in df.items())
9093

91-
object_dtype_indices = [
92-
i for i, col_dtype in enumerate(df.dtypes.values) if is_object_dtype(col_dtype)
94+
box_native_indices = [
95+
i
96+
for i, col_dtype in enumerate(df.dtypes.values)
97+
if is_object_dtype(col_dtype) or is_extension_array_dtype(col_dtype)
9398
]
94-
are_all_object_dtype_cols = len(object_dtype_indices) == len(df.dtypes)
99+
are_all_object_dtype_cols = len(box_native_indices) == len(df.dtypes)
95100

96101
if orient == "dict":
97102
return into_c((k, v.to_dict(into)) for k, v in df.items())
98103

99104
elif orient == "list":
100-
object_dtype_indices_as_set = set(object_dtype_indices)
105+
object_dtype_indices_as_set = set(box_native_indices)
101106
return into_c(
102107
(
103108
k,
@@ -110,7 +115,7 @@ def to_dict(
110115

111116
elif orient == "split":
112117
data = df._create_data_for_split_and_tight_to_dict(
113-
are_all_object_dtype_cols, object_dtype_indices
118+
are_all_object_dtype_cols, box_native_indices
114119
)
115120

116121
return into_c(
@@ -123,7 +128,7 @@ def to_dict(
123128

124129
elif orient == "tight":
125130
data = df._create_data_for_split_and_tight_to_dict(
126-
are_all_object_dtype_cols, object_dtype_indices
131+
are_all_object_dtype_cols, box_native_indices
127132
)
128133

129134
return into_c(
@@ -155,8 +160,8 @@ def to_dict(
155160
data = [
156161
into_c(zip(columns, t)) for t in df.itertuples(index=False, name=None)
157162
]
158-
if object_dtype_indices:
159-
object_dtype_indices_as_set = set(object_dtype_indices)
163+
if box_native_indices:
164+
object_dtype_indices_as_set = set(box_native_indices)
160165
object_dtype_cols = {
161166
col
162167
for i, col in enumerate(df.columns)
@@ -176,8 +181,8 @@ def to_dict(
176181
(t[0], dict(zip(df.columns, map(maybe_box_native, t[1:]))))
177182
for t in df.itertuples(name=None)
178183
)
179-
elif object_dtype_indices:
180-
object_dtype_indices_as_set = set(object_dtype_indices)
184+
elif box_native_indices:
185+
object_dtype_indices_as_set = set(box_native_indices)
181186
is_object_dtype_by_index = [
182187
i in object_dtype_indices_as_set for i in range(len(df.columns))
183188
]
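Reviewer note: taken together with the `maybe_box_native` change in `cast.py`, this file now routes extension-array columns through the boxing step, so a missing-value sentinel becomes `None` in the resulting dictionary. The sketch below imitates that flow without pandas. `MISSING`, `box_native`, and `column_to_dict` are hypothetical names, and `MISSING` merely stands in for a library-specific singleton such as `NA`.

```python
class _MissingType:
    """Hypothetical stand-in for a missing-value singleton."""

    def __repr__(self):
        return "<NA>"


MISSING = _MissingType()


def box_native(value):
    """Convert the missing-value sentinel to None; pass other values through."""
    # Identity check, mirroring the `value is NA` branch added in the diff.
    if value is MISSING:
        return None
    return value


def column_to_dict(index, values, needs_boxing):
    """Build a dict for one column, boxing values only when requested."""
    if needs_boxing:
        return {k: box_native(v) for k, v in zip(index, values)}
    return dict(zip(index, values))


print(column_to_dict(["a", "b"], [1, MISSING], needs_boxing=True))
```

The `needs_boxing` flag plays the role of the `box_native_indices` membership test: columns that are neither object dtype nor extension dtype skip the per-value conversion.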

pandas/core/series.py

+2-1
@@ -89,6 +89,7 @@
8989
from pandas.core.dtypes.common import (
9090
ensure_platform_int,
9191
is_dict_like,
92+
is_extension_array_dtype,
9293
is_integer,
9394
is_iterator,
9495
is_list_like,
@@ -1832,7 +1833,7 @@ def to_dict(self, into: type[dict] = dict) -> dict:
18321833
# GH16122
18331834
into_c = com.standardize_mapping(into)
18341835

1835-
if is_object_dtype(self):
1836+
if is_object_dtype(self) or is_extension_array_dtype(self):
18361837
return into_c((k, maybe_box_native(v)) for k, v in self.items())
18371838
else:
18381839
# Not an object dtype => all types will be the same so let the default
