Commit 2f66f87

Author: MarcoGorelli
Merge remote-tracking branch 'upstream/main' into allow-mixed-iso
2 parents b247bbd + 73840ef


63 files changed: +826 -204 lines

.github/workflows/sdist.yml (-1)

@@ -92,5 +92,4 @@ jobs:
       - name: Import pandas
         run: |
           cd ..
-          conda list
           python -c "import pandas; pandas.show_versions();"

.github/workflows/ubuntu.yml (-1)

@@ -73,7 +73,6 @@ jobs:
          env_file: actions-pypy-38.yaml
          pattern: "not slow and not network and not single_cpu"
          test_args: "--max-worker-restart 0"
-         error_on_warnings: "0"
        - name: "Numpy Dev"
          env_file: actions-310-numpydev.yaml
          pattern: "not slow and not network and not single_cpu"

doc/source/development/maintaining.rst (+15 -15)

@@ -349,10 +349,10 @@ The release process makes a snapshot of pandas (a git commit) available to users
 a particular version number. After the release the new pandas version will be available
 in the next places:
 
-- Git repo with a [new tag](https://github.com/pandas-dev/pandas/tags)
-- Source distribution in a [GitHub release](https://github.com/pandas-dev/pandas/releases)
-- Pip packages in the [PyPI](https://pypi.org/project/pandas/)
-- Conda/Mamba packages in [conda-forge](https://anaconda.org/conda-forge/pandas)
+- Git repo with a `new tag <https://github.com/pandas-dev/pandas/tags>`_
+- Source distribution in a `GitHub release <https://github.com/pandas-dev/pandas/releases>`_
+- Pip packages in the `PyPI <https://pypi.org/project/pandas/>`_
+- Conda/Mamba packages in `conda-forge <https://anaconda.org/conda-forge/pandas>`_
 
 The process for releasing a new version of pandas is detailed next section.
 
@@ -368,11 +368,11 @@ Prerequisites
 
 In order to be able to release a new pandas version, the next permissions are needed:
 
-- Merge rights to the [pandas](https://github.com/pandas-dev/pandas/),
-  [pandas-wheels](https://github.com/MacPython/pandas-wheels), and
-  [pandas-feedstock](https://github.com/conda-forge/pandas-feedstock/) repositories.
+- Merge rights to the `pandas <https://github.com/pandas-dev/pandas/>`_,
+  `pandas-wheels <https://github.com/MacPython/pandas-wheels>`_, and
+  `pandas-feedstock <https://github.com/conda-forge/pandas-feedstock/>`_ repositories.
 - Permissions to push to main in the pandas repository, to push the new tags.
-- Write permissions to [PyPI](https://github.com/conda-forge/pandas-feedstock/pulls)
+- `Write permissions to PyPI <https://github.com/conda-forge/pandas-feedstock/pulls>`_
 - Access to the social media accounts, to publish the announcements.
 
 Pre-release
@@ -408,7 +408,7 @@ Pre-release
 Release
 ```````
 
-1. Create an empty commit and a tag in the last commit of the branch to be released:
+1. Create an empty commit and a tag in the last commit of the branch to be released::
 
     git checkout <branch>
    git pull --ff-only upstream <branch>
@@ -423,7 +423,7 @@ which will be triggered when the tag is pushed.
 2. Only if the release is a release candidate, we want to create a new branch for it, immediately
    after creating the tag. For example, if we are releasing pandas 1.4.0rc0, we would like to
   create the branch 1.4.x to backport commits to the 1.4 versions. As well as create a tag to
-   mark the start of the development of 1.5.0 (assuming it is the next version):
+   mark the start of the development of 1.5.0 (assuming it is the next version)::
 
    git checkout -b 1.4.x
    git push upstream 1.4.x
@@ -436,7 +436,7 @@ which will be triggered when the tag is pushed.
 
    ./setup.py sdist --formats=gztar --quiet
 
-4. Create a [new GitHub release](https://github.com/pandas-dev/pandas/releases/new):
+4. Create a `new GitHub release <https://github.com/pandas-dev/pandas/releases/new>`_:
 
    - Title: ``Pandas <version>``
    - Tag: ``<version>``
@@ -447,13 +447,13 @@ which will be triggered when the tag is pushed.
     (e.g. releasing 1.4.5 after 1.5 has been released)
 
 5. The GitHub release will after some hours trigger an
-   [automated conda-forge PR](https://github.com/conda-forge/pandas-feedstock/pulls).
+   `automated conda-forge PR <https://github.com/conda-forge/pandas-feedstock/pulls>`_.
   Merge it once the CI is green, and it will generate the conda-forge packages.
 
 6. Packages for supported versions in PyPI are built in the
-   [MacPython repo](https://github.com/MacPython/pandas-wheels).
+   `MacPython repo <https://github.com/MacPython/pandas-wheels>`_.
   Open a PR updating the build commit to the released version, and merge it once the
-   CI is green.
+   CI is green. To do this type::
 
    git checkout master
   git pull --ff-only upstream master
@@ -486,7 +486,7 @@ Post-Release
 4. Create a new issue for the next release, with the estimated date of release.
 
 5. Open a PR with the placeholder for the release notes of the next version. See
-   for example [the PR for 1.5.3](https://github.com/pandas-dev/pandas/pull/49843/files).
+   for example `the PR for 1.5.3 <https://github.com/pandas-dev/pandas/pull/49843/files>`_.
 
 6. Announce the new release in the official channels (use previous announcements
   for reference):

doc/source/reference/testing.rst (+1)

@@ -28,6 +28,7 @@ Exceptions and warnings
     errors.AccessorRegistrationWarning
     errors.AttributeConflictWarning
     errors.CategoricalConversionWarning
+    errors.ChainedAssignmentError
     errors.ClosedFileError
     errors.CSSWarning
     errors.DatabaseError

doc/source/whatsnew/v2.0.0.rst (+24)

@@ -49,6 +49,16 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 * :func:`read_feather`
 * :func:`to_numeric`
 
+To simplify opting-in to nullable dtypes for these functions, a new option ``nullable_dtypes`` was added that allows setting
+the keyword argument globally to ``True`` if not specified directly. The option can be enabled
+through:
+
+.. ipython:: python
+
+    pd.options.mode.nullable_dtypes = True
+
+The option will only work for functions with the keyword ``use_nullable_dtypes``.
+
 Additionally a new global configuration, ``mode.dtype_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.
 
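As a rough illustration of the new option, a minimal sketch (assuming a development build that includes this change, and that ``read_csv`` is among the readers accepting ``use_nullable_dtypes``; the exact resulting dtypes may differ):

    import io
    import pandas as pd

    # Enable the new global option; readers that accept use_nullable_dtypes
    # then default to nullable (pd.NA-backed) dtypes.
    pd.set_option("mode.nullable_dtypes", True)

    csv = io.StringIO("a,b\n1,x\n,y\n")
    df = pd.read_csv(csv)
    print(df.dtypes)  # column "a" is expected to be Int64 rather than float64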
@@ -125,6 +135,14 @@ Copy-on-Write improvements
   a modification to the data happens) when constructing a Series from an existing
   Series with the default of ``copy=False`` (:issue:`50471`)
 
+- Trying to set values using chained assignment (for example, ``df["a"][1:3] = 0``)
+  will now always raise an exception when Copy-on-Write is enabled. In this mode,
+  chained assignment can never work because we are always setting into a temporary
+  object that is the result of an indexing operation (getitem), which under
+  Copy-on-Write always behaves as a copy. Thus, assigning through a chain
+  can never update the original Series or DataFrame. Therefore, an informative
+  error is raised to the user instead of silently doing nothing (:issue:`49467`)
+
 Copy-on-Write can be enabled through
 
 .. code-block:: python
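A minimal sketch of the chained-assignment behaviour described in this hunk, assuming the new ``pandas.errors.ChainedAssignmentError`` (added in this commit, see the testing.rst hunk above) is what surfaces when Copy-on-Write is enabled:

    import pandas as pd

    pd.set_option("mode.copy_on_write", True)
    df = pd.DataFrame({"a": [1, 2, 3, 4]})

    try:
        # Chained assignment: the setitem runs on a temporary object returned
        # by df["a"], so it can never reach the original DataFrame under CoW.
        df["a"][1:3] = 0
    except pd.errors.ChainedAssignmentError as exc:
        print(exc)

    print(df)  # unchanged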
@@ -608,6 +626,7 @@ Other API changes
   methods to get a full slice (for example ``df.loc[:]`` or ``df[:]``) (:issue:`49469`)
 - Disallow computing ``cumprod`` for :class:`Timedelta` object; previously this returned incorrect values (:issue:`50246`)
 - Loading a JSON file with duplicate columns using ``read_json(orient='split')`` renames columns to avoid duplicates, as :func:`read_csv` and the other readers do (:issue:`50370`)
+- The levels of the index of the :class:`Series` returned from ``Series.sparse.from_coo`` now always have dtype ``int32``. Previously they had dtype ``int64`` (:issue:`50926`)
 - :func:`to_datetime` with ``unit`` of either "Y" or "M" will now raise if a sequence contains a non-round ``float`` value, matching the ``Timestamp`` behavior (:issue:`50301`)
 -
 
@@ -623,6 +642,7 @@ Deprecations
 - :meth:`Index.is_floating` has been deprecated. Use :func:`pandas.api.types.is_float_dtype` instead (:issue:`50042`)
 - :meth:`Index.holds_integer` has been deprecated. Use :func:`pandas.api.types.infer_dtype` instead (:issue:`50243`)
 - :meth:`Index.is_categorical` has been deprecated. Use :func:`pandas.api.types.is_categorical_dtype` instead (:issue:`50042`)
+- :meth:`Index.is_interval` has been deprecated. Use :func:`pandas.api.types.is_interval_dtype` instead (:issue:`50042`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.prior_deprecations:
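A quick illustration of the recommended replacement for the deprecated ``Index.is_interval``:

    import pandas as pd

    idx = pd.interval_range(0, 3)

    # Deprecated: idx.is_interval()
    # Recommended replacement:
    print(pd.api.types.is_interval_dtype(idx.dtype))  # True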
@@ -906,6 +926,7 @@ Performance improvements
 - Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
 - Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
 - Performance improvement in :meth:`Series.to_numpy` if ``copy=True`` by avoiding copying twice (:issue:`24345`)
+- Performance improvement in :meth:`Series.rename` with :class:`MultiIndex` (:issue:`21055`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``sort=False`` (:issue:`48976`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``observed=False`` (:issue:`49596`)
 - Performance improvement in :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default). Now the index will be a :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49745`)
@@ -961,6 +982,8 @@ Datetimelike
 - Bug in :func:`Timestamp.utctimetuple` raising a ``TypeError`` (:issue:`32174`)
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing mixed-offset :class:`Timestamp` with ``errors='ignore'`` (:issue:`50585`)
 - Bug in :func:`to_datetime` was incorrectly handling floating-point inputs within 1 ``unit`` of the overflow boundaries (:issue:`50183`)
+- Bug in :func:`to_datetime` with unit of "Y" or "M" giving incorrect results, not matching pointwise :class:`Timestamp` results (:issue:`50870`)
+-
 
 Timedelta
 ^^^^^^^^^
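A sketch of the ``to_datetime`` fix for "Y"/"M" units (outputs are indicative and assume a build containing this change):

    import pandas as pd

    # Pointwise Timestamp behaviour for unit="Y" (150 years after the epoch)
    print(pd.Timestamp(150, unit="Y"))      # 2120-01-01 00:00:00

    # to_datetime with the same unit now matches the Timestamp result (GH 50870)
    print(pd.to_datetime([150], unit="Y"))

    # Non-round floats with unit "Y"/"M" are ambiguous and raise (GH 50301)
    try:
        pd.to_datetime([2.5], unit="M")
    except ValueError as exc:
        print(exc)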
@@ -1026,6 +1049,7 @@ Indexing
 - Bug in :meth:`DataFrame.iloc` raising ``IndexError`` when indexer is a :class:`Series` with numeric extension array dtype (:issue:`49521`)
 - Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
 - Bug in :meth:`DataFrame.compare` does not recognize differences when comparing ``NA`` with value in nullable dtypes (:issue:`48939`)
+- Bug in :meth:`Series.rename` with :class:`MultiIndex` losing extension array dtypes (:issue:`21055`)
 - Bug in :meth:`DataFrame.isetitem` coercing extension array dtypes in :class:`DataFrame` to object (:issue:`49922`)
 - Bug in :class:`BusinessHour` would cause creation of :class:`DatetimeIndex` to fail when no opening hour was included in the index (:issue:`49835`)
 -
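For the ``Series.rename`` entry, a minimal sketch with hypothetical data; the point is that the ``Int64`` level dtype is preserved instead of being cast to object:

    import pandas as pd

    mi = pd.MultiIndex.from_arrays(
        [pd.array([1, 2], dtype="Int64"), ["a", "b"]], names=["x", "y"]
    )
    ser = pd.Series([10, 20], index=mi)

    # Rename a label in level "x"; the Int64 dtype of that level should survive
    renamed = ser.rename({1: 3}, level="x")
    print(renamed.index.dtypes)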

pandas/_config/__init__.py (+5)

@@ -33,3 +33,8 @@
 def using_copy_on_write():
     _mode_options = _global_config["mode"]
     return _mode_options["copy_on_write"] and _mode_options["data_manager"] == "block"
+
+
+def using_nullable_dtypes():
+    _mode_options = _global_config["mode"]
+    return _mode_options["nullable_dtypes"]
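A brief sketch of how the new helper reflects the global option (the ``pandas._config`` import path is internal and shown only for illustration):

    import pandas as pd
    from pandas._config import using_nullable_dtypes

    pd.set_option("mode.nullable_dtypes", True)
    print(using_nullable_dtypes())   # True

    pd.set_option("mode.nullable_dtypes", False)
    print(using_nullable_dtypes())   # False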

pandas/_libs/tslib.pyx (+3 -16)

@@ -220,19 +220,6 @@ def format_array_from_datetime(
     return result
 
 
-cdef int64_t _wrapped_cast_from_unit(object val, str unit) except? -1:
-    """
-    Call cast_from_unit and re-raise OverflowError as OutOfBoundsDatetime
-    """
-    # See also timedeltas._maybe_cast_from_unit
-    try:
-        return cast_from_unit(val, unit)
-    except OverflowError as err:
-        raise OutOfBoundsDatetime(
-            f"cannot convert input {val} with the unit '{unit}'"
-        ) from err
-
-
 def array_with_unit_to_datetime(
     ndarray[object] values,
     str unit,
@@ -302,7 +289,7 @@ def array_with_unit_to_datetime(
                 if val != val or val == NPY_NAT:
                     iresult[i] = NPY_NAT
                 else:
-                    iresult[i] = _wrapped_cast_from_unit(val, unit)
+                    iresult[i] = cast_from_unit(val, unit)
 
             elif isinstance(val, str):
                 if len(val) == 0 or val in nat_strings:
@@ -317,7 +304,7 @@ def array_with_unit_to_datetime(
                         f"non convertible value {val} with the unit '{unit}'"
                     )
 
-                iresult[i] = _wrapped_cast_from_unit(fval, unit)
+                iresult[i] = cast_from_unit(fval, unit)
 
             else:
                 # TODO: makes more sense as TypeError, but that would be an
@@ -362,7 +349,7 @@ cdef _array_with_unit_to_datetime_object_fallback(ndarray[object] values, str un
         else:
             try:
                 oresult[i] = Timestamp(val, unit=unit)
-            except OverflowError:
+            except OutOfBoundsDatetime:
                 oresult[i] = val
 
     elif isinstance(val, str):
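The wrapper is no longer needed because ``cast_from_unit`` itself now raises ``OutOfBoundsDatetime`` (see the conversion.pyx hunks below). From the public API, an out-of-range value with a unit still surfaces as ``OutOfBoundsDatetime``, roughly:

    import pandas as pd

    # A value far outside the nanosecond-precision range
    try:
        pd.to_datetime([2**62], unit="s")
    except pd.errors.OutOfBoundsDatetime as exc:
        print(exc)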

pandas/_libs/tslibs/conversion.pyx (+30 -23)

@@ -108,22 +108,41 @@ cdef int64_t cast_from_unit(object ts, str unit) except? -1:
     if ts is None:
         return m
 
-    if unit in ["Y", "M"] and is_float_object(ts) and not ts.is_integer():
-        # GH#47267 it is clear that 2 "M" corresponds to 1970-02-01,
-        # but not clear what 2.5 "M" corresponds to, so we will
-        # disallow that case.
-        raise ValueError(
-            f"Conversion of non-round float with unit={unit} "
-            "is ambiguous"
-        )
+    if unit in ["Y", "M"]:
+        if is_float_object(ts) and not ts.is_integer():
+            # GH#47267 it is clear that 2 "M" corresponds to 1970-02-01,
+            # but not clear what 2.5 "M" corresponds to, so we will
+            # disallow that case.
+            raise ValueError(
+                f"Conversion of non-round float with unit={unit} "
+                "is ambiguous"
+            )
+        # GH#47266 go through np.datetime64 to avoid weird results e.g. with "Y"
+        # and 150 we'd get 2120-01-01 09:00:00
+        if is_float_object(ts):
+            ts = int(ts)
+        dt64obj = np.datetime64(ts, unit)
+        return get_datetime64_nanos(dt64obj, NPY_FR_ns)
 
     # cast the unit, multiply base/frac separately
     # to avoid precision issues from float -> int
-    base = <int64_t>ts
+    try:
+        base = <int64_t>ts
+    except OverflowError as err:
+        raise OutOfBoundsDatetime(
+            f"cannot convert input {ts} with the unit '{unit}'"
+        ) from err
+
     frac = ts - base
     if p:
         frac = round(frac, p)
-    return <int64_t>(base * m) + <int64_t>(frac * m)
+
+    try:
+        return <int64_t>(base * m) + <int64_t>(frac * m)
+    except OverflowError as err:
+        raise OutOfBoundsDatetime(
+            f"cannot convert input {ts} with the unit '{unit}'"
+        ) from err
 
 
 cpdef inline (int64_t, int) precision_from_unit(str unit):
@@ -278,25 +297,13 @@ cdef _TSObject convert_to_tsobject(object ts, tzinfo tz, str unit,
         if ts == NPY_NAT:
             obj.value = NPY_NAT
         else:
-            if unit in ["Y", "M"]:
-                # GH#47266 cast_from_unit leads to weird results e.g. with "Y"
-                # and 150 we'd get 2120-01-01 09:00:00
-                ts = np.datetime64(ts, unit)
-                return convert_to_tsobject(ts, tz, None, False, False)
-
-            ts = ts * cast_from_unit(None, unit)
+            ts = cast_from_unit(ts, unit)
             obj.value = ts
             pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)
     elif is_float_object(ts):
         if ts != ts or ts == NPY_NAT:
             obj.value = NPY_NAT
         else:
-            if unit in ["Y", "M"]:
-                if ts == int(ts):
-                    # GH#47266 Avoid cast_from_unit, which would give weird results
-                    # e.g. with "Y" and 150.0 we'd get 2120-01-01 09:00:00
-                    return convert_to_tsobject(int(ts), tz, unit, False, False)
-
             ts = cast_from_unit(ts, unit)
             obj.value = ts
             pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)
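From the user's side, the new ``try``/``except`` blocks around the ``int64`` casts mean an overflowing unit conversion is reported as ``OutOfBoundsDatetime`` rather than a bare ``OverflowError``; a rough sketch:

    import pandas as pd

    try:
        pd.Timestamp(2**62, unit="s")
    except pd.errors.OutOfBoundsDatetime as exc:
        print(exc)   # e.g. "cannot convert input ... with the unit 's'"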

pandas/_libs/tslibs/timedeltas.pyx (+1 -1)

@@ -373,7 +373,7 @@ cdef _maybe_cast_from_unit(ts, str unit):
     # assert unit not in ["Y", "y", "M"]
     try:
         ts = cast_from_unit(ts, unit)
-    except OverflowError as err:
+    except OutOfBoundsDatetime as err:
         raise OutOfBoundsTimedelta(
             f"Cannot cast {ts} from {unit} to 'ns' without overflow."
         ) from err
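Since ``cast_from_unit`` now raises ``OutOfBoundsDatetime`` instead of ``OverflowError``, the except clause is updated so overflowing timedelta conversions keep surfacing as ``OutOfBoundsTimedelta``, for example:

    import pandas as pd

    try:
        pd.Timedelta(2**62, unit="s")
    except pd.errors.OutOfBoundsTimedelta as exc:
        print(exc)   # Cannot cast ... from s to 'ns' without overflow.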
