Commit 82b0278

Merge branch 'master' into fix-48855
2 parents 219630f + 712c2b1 commit 82b0278

76 files changed, +1730 -1162 lines changed

.github/workflows/python-dev.yml

+3 -3
@@ -54,7 +54,7 @@ jobs:
 os: [ubuntu-latest, macOS-latest, windows-latest]

 name: actions-311-dev
-timeout-minutes: 80
+timeout-minutes: 120

 concurrency:
 #https://github.community/t/concurrecy-not-work-for-push/183068/7
@@ -75,7 +75,7 @@ jobs:
 run: |
 python --version
 python -m pip install --upgrade pip setuptools wheel
-python -m pip install git+https://github.com/numpy/numpy.git
+python -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
 python -m pip install git+https://github.com/nedbat/coveragepy.git
 python -m pip install python-dateutil pytz cython hypothesis==6.52.1 pytest>=6.2.5 pytest-xdist pytest-cov pytest-asyncio>=0.17
 python -m pip list
@@ -84,7 +84,7 @@ jobs:
 - name: Build Pandas
 run: |
 python setup.py build_ext -q -j1
-python -m pip install -e . --no-build-isolation --no-use-pep517
+python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index

 - name: Build Version
 run: |

.github/workflows/wheels.yml

+2 -12
@@ -54,7 +54,6 @@ jobs:
 # TODO: support PyPy?
 python: [["cp38", "3.8"], ["cp39", "3.9"], ["cp310", "3.10"], ["cp311", "3.11-dev"]]# "pp38", "pp39"]
 env:
-IS_32_BIT: ${{ matrix.buildplat[1] == 'win32' }}
 IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
 IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
 steps:
@@ -72,15 +71,6 @@ jobs:
 uses: pypa/[email protected]
 env:
 CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
-CIBW_ENVIRONMENT: IS_32_BIT='${{ env.IS_32_BIT }}'
-# We can't test directly with cibuildwheel, since we need to have to wheel location
-# to mount into the docker image
-CIBW_TEST_COMMAND_LINUX: "python {project}/ci/test_wheels.py"
-CIBW_TEST_COMMAND_MACOS: "python {project}/ci/test_wheels.py"
-CIBW_TEST_REQUIRES: hypothesis==6.52.1 pytest>=6.2.5 pytest-xdist pytest-asyncio>=0.17
-CIBW_REPAIR_WHEEL_COMMAND_WINDOWS: "python ci/fix_wheels.py {wheel} {dest_dir}"
-CIBW_ARCHS_MACOS: x86_64 universal2
-CIBW_BUILD_VERBOSITY: 3

 # Used to test the built wheels
 - uses: actions/setup-python@v3
@@ -118,7 +108,7 @@ jobs:

 - name: Upload wheels
 if: success()
-shell: bash
+shell: bash -el {0}
 env:
 PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
 PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
@@ -195,7 +185,7 @@ jobs:

 - name: Upload sdist
 if: success()
-shell: bash
+shell: bash -el {0}
 env:
 PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
 PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}

.pre-commit-config.yaml

+7
@@ -226,6 +226,13 @@ repos:
 entry: python scripts/no_bool_in_generic.py
 language: python
 files: ^pandas/core/generic\.py$
+- id: no-return-exception
+name: Use raise instead of return for exceptions
+language: pygrep
+entry: 'return [A-Za-z]+(Error|Exit|Interrupt|Exception|Iteration)'
+files: ^pandas/
+types: [python]
+exclude: ^pandas/tests/
 - id: pandas-errors-documented
 name: Ensure pandas errors are documented in doc/source/reference/testing.rst
 entry: python scripts/pandas_errors_documented.py
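
The new `no-return-exception` hook above relies on a single regular expression. The following is a minimal sketch of which lines that expression would flag, assuming the hook applies the pattern to each source line with a plain regex search. The helper `is_flagged` and the sample lines are hypothetical and exist only for illustration.

```python
import re

# Pattern copied verbatim from the hook's `entry` field.
PATTERN = re.compile(
    r"return [A-Za-z]+(Error|Exit|Interrupt|Exception|Iteration)"
)


def is_flagged(line: str) -> bool:
    """Return True if the hook's pattern matches this source line."""
    return PATTERN.search(line) is not None


# Hypothetical source lines, not taken from pandas.
samples = [
    "return ValueError('bad input')",  # flagged: exception returned, not raised
    "return StopIteration",            # flagged
    "raise ValueError('bad input')",   # not flagged: correct usage
    "return result",                   # not flagged
    "return Exception('bare')",        # not flagged: no letters precede the suffix
]

for line in samples:
    print(f"{is_flagged(line)!s:5}  {line}")
```

Because `[A-Za-z]+` requires at least one letter before the suffix, the pattern on its own does not match a bare `return Exception(...)`. It targets prefixed names such as `ValueError` or `KeyboardInterrupt`.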

README.md

-2
@@ -128,8 +128,6 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_in
 python -m pip install -e . --no-build-isolation --no-use-pep517
 ```

-If you have `make`, you can also use `make develop` to run the same command.
-
 or alternatively

 ```sh

asv_bench/benchmarks/index_object.py

+9
@@ -65,6 +65,15 @@ def time_datetime_difference_disjoint(self):
         self.datetime_left.difference(self.datetime_right)


+class UnionWithDuplicates:
+    def setup(self):
+        self.left = Index(np.repeat(np.arange(1000), 100))
+        self.right = Index(np.tile(np.arange(500, 1500), 50))
+
+    def time_union_with_duplicates(self):
+        self.left.union(self.right)
+
+
 class Range:
     def setup(self):
         self.idx_inc = RangeIndex(start=0, stop=10**6, step=3)

doc/source/development/maintaining.rst

+43
@@ -121,6 +121,49 @@ Here's a typical workflow for triaging a newly opened issue.
 unless it's know that this issue should be addressed in a specific release (say
 because it's a large regression).

+.. _maintaining.regressions:
+
+Investigating regressions
+-------------------------
+
+Regressions are bugs that unintentionally break previously working code. The common way
+to investigate regressions is by using
+`git bisect <https://git-scm.com/docs/git-bisect>`_,
+which finds the first commit that introduced the bug.
+
+For example: a user reports that ``pd.Series([1, 1]).sum()`` returns ``3``
+in pandas version ``1.5.0`` while in version ``1.4.0`` it returned ``2``. To begin,
+create a file ``t.py`` in your pandas directory, which contains
+
+.. code-block:: python
+
+    import pandas as pd
+    assert pd.Series([1, 1]).sum() == 2
+
+and then run::
+
+    git bisect start
+    git bisect good v1.4.0
+    git bisect bad v1.5.0
+    git bisect run bash -c "python setup.py build_ext -j 4; python t.py"
+
+This finds the first commit that changed the behavior. The C extensions have to be
+rebuilt at every step, so the search can take a while.
+
+Exit bisect and rebuild the current version::
+
+    git bisect reset
+    python setup.py build_ext -j 4
+
+Report your findings under the corresponding issue and ping the commit author to get
+their input.
+
+.. note::
+    In the ``bisect run`` command above, commits are considered good if ``t.py`` exits
+    with ``0`` and bad otherwise. When raising an exception is the desired behavior,
+    wrap the code in an appropriate ``try/except`` statement. See :issue:`35685` for
+    more examples.
+
 .. _maintaining.closing:

 Closing issues
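
The note added in this hunk says to wrap the code in `try/except` when raising is the desired behavior. The following is a minimal sketch of what such a `t.py` could look like. The function `check` is a hypothetical stand-in for the pandas call under investigation, and the helper `exit_code` is illustrative rather than part of pandas or git.

```python
import sys


def exit_code(check, expected_exc):
    """Map "check() raises expected_exc" to a bisect-friendly exit status.

    Returns 0 (good commit) if the expected exception is raised and
    1 (bad commit) if the call completes without raising.
    """
    try:
        check()
    except expected_exc:
        return 0
    return 1


def check():
    # Hypothetical stand-in: replace with the pandas call that should raise.
    int("not a number")


if __name__ == "__main__":
    sys.exit(exit_code(check, ValueError))
```

Any other exception propagates uncaught, so the script still exits with a non-zero status and the commit is marked bad.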

doc/source/user_guide/io.rst

+8
@@ -197,6 +197,14 @@ dtype : Type name or dict of column -> type, default ``None``
 Support for defaultdict was added. Specify a defaultdict as input where
 the default determines the dtype of the columns which are not explicitly
 listed.
+
+use_nullable_dtypes : bool = False
+Whether or not to use nullable dtypes as default when reading data. If
+set to True, nullable dtypes are used for all dtypes that have a nullable
+implementation, even if no nulls are present.
+
+.. versionadded:: 2.0
+
 engine : {``'c'``, ``'python'``, ``'pyarrow'``}
 Parser engine to use. The C and pyarrow engines are faster, while the python engine
 is currently more feature-complete. Multithreading is currently only supported by

doc/source/whatsnew/v1.5.1.rst

+1
@@ -79,6 +79,7 @@ Fixed regressions
 - Fixed performance regression in :func:`factorize` when ``na_sentinel`` is not ``None`` and ``sort=False`` (:issue:`48620`)
 - Fixed regression causing an ``AttributeError`` during warning emitted if the provided table name in :meth:`DataFrame.to_sql` and the table name actually used in the database do not match (:issue:`48733`)
 - Fixed regression in :func:`to_datetime` when ``arg`` was a date string with nanosecond and ``format`` contained ``%f`` would raise a ``ValueError`` (:issue:`48767`)
+- Fixed regression in :func:`assert_frame_equal` raising for :class:`MultiIndex` with :class:`Categorical` and ``check_like=True`` (:issue:`48975`)
 - Fixed regression in :meth:`DataFrame.fillna` replacing wrong values for ``datetime64[ns]`` dtype and ``inplace=True`` (:issue:`48863`)
 - Fixed :meth:`.DataFrameGroupBy.size` not returning a Series when ``axis=1`` (:issue:`48738`)
 - Fixed Regression in :meth:`DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)

doc/source/whatsnew/v1.6.0.rst

+6 -1
@@ -32,6 +32,7 @@ Other enhancements
 - :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` now preserve nullable dtypes instead of casting to numpy dtypes (:issue:`37493`)
 - :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an ``axis`` argument. If ``axis`` is set, the default behaviour of which axis to consider can be overwritten (:issue:`47819`)
 - :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to ``pytest``'s output (:issue:`47910`)
+- Added new argument ``use_nullable_dtypes`` to :func:`read_csv` to enable automatic conversion to nullable dtypes (:issue:`36712`)
 - Added ``index`` parameter to :meth:`DataFrame.to_dict` (:issue:`46398`)
 - Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
 - :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in ``pandas.errors`` (:issue:`27656`)
@@ -118,6 +119,7 @@ Other API changes
 - Passing ``nanoseconds`` greater than 999 or less than 0 in :class:`Timestamp` now raises a ``ValueError`` (:issue:`48538`, :issue:`48255`)
 - :func:`read_csv`: specifying an incorrect number of columns with ``index_col`` of now raises ``ParserError`` instead of ``IndexError`` when using the c parser.
 - :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (:issue:`48928`)
+- :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (:issue:`48963`)
 -

 .. ---------------------------------------------------------------------------
@@ -140,6 +142,7 @@ Performance improvements
 - Performance improvement in :meth:`MultiIndex.difference` (:issue:`48606`)
 - Performance improvement in :meth:`.DataFrameGroupBy.mean`, :meth:`.SeriesGroupBy.mean`, :meth:`.DataFrameGroupBy.var`, and :meth:`.SeriesGroupBy.var` for extension array dtypes (:issue:`37493`)
 - Performance improvement in :meth:`MultiIndex.isin` when ``level=None`` (:issue:`48622`)
+- Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
@@ -151,6 +154,7 @@ Performance improvements
 - Performance improvement in ``var`` for nullable dtypes (:issue:`48379`).
 - Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
 - Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
+- Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``sort=False`` (:issue:`48976`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_160.bug_fixes:
@@ -219,11 +223,12 @@ Missing

 MultiIndex
 ^^^^^^^^^^
+- Bug in :meth:`MultiIndex.argsort` raising ``TypeError`` when index contains :attr:`NA` (:issue:`48495`)
 - Bug in :meth:`MultiIndex.difference` losing extension array dtype (:issue:`48606`)
 - Bug in :class:`MultiIndex.set_levels` raising ``IndexError`` when setting empty level (:issue:`48636`)
 - Bug in :meth:`MultiIndex.unique` losing extension array dtype (:issue:`48335`)
 - Bug in :meth:`MultiIndex.intersection` losing extension array (:issue:`48604`)
-- Bug in :meth:`MultiIndex.union` losing extension array (:issue:`48498`, :issue:`48505`)
+- Bug in :meth:`MultiIndex.union` losing extension array (:issue:`48498`, :issue:`48505`, :issue:`48900`)
 - Bug in :meth:`MultiIndex.append` not checking names for equality (:issue:`48288`)
 - Bug in :meth:`MultiIndex.symmetric_difference` losing extension array (:issue:`48607`)
 -

environment.yml

+1 -1
@@ -100,7 +100,7 @@ dependencies:
 - natsort # DataFrame.sort_values doctest
 - numpydoc
 - pandas-dev-flaker=0.5.0
-- pydata-sphinx-theme
+- pydata-sphinx-theme<0.11
 - pytest-cython # doctest
 - sphinx
 - sphinx-panels

pandas/_libs/parsers.pyx

+13 -3
@@ -342,6 +342,7 @@ cdef class TextReader:
 object index_col
 object skiprows
 object dtype
+bint use_nullable_dtypes
 object usecols
 set unnamed_cols # set[str]

@@ -380,7 +381,8 @@ cdef class TextReader:
 bint mangle_dupe_cols=True,
 float_precision=None,
 bint skip_blank_lines=True,
-encoding_errors=b"strict"):
+encoding_errors=b"strict",
+use_nullable_dtypes=False):

 # set encoding for native Python and C library
 if isinstance(encoding_errors, str):
@@ -505,6 +507,7 @@ cdef class TextReader:
 # - DtypeObj
 # - dict[Any, DtypeObj]
 self.dtype = dtype
+self.use_nullable_dtypes = use_nullable_dtypes

 # XXX
 self.noconvert = set()
@@ -933,6 +936,7 @@ cdef class TextReader:
 bint na_filter = 0
 int64_t num_cols
 dict result
+bint use_nullable_dtypes

 start = self.parser_start

@@ -1053,8 +1057,14 @@ cdef class TextReader:
 self._free_na_set(na_hashset)

 # don't try to upcast EAs
-if na_count > 0 and not is_extension_array_dtype(col_dtype):
-    col_res = _maybe_upcast(col_res)
+if (
+    na_count > 0 and not is_extension_array_dtype(col_dtype)
+    or self.use_nullable_dtypes
+):
+    use_nullable_dtypes = self.use_nullable_dtypes and col_dtype is None
+    col_res = _maybe_upcast(
+        col_res, use_nullable_dtypes=use_nullable_dtypes
+    )

 if col_res is None:
     raise ParserError(f'Unable to parse column {i}')
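
The rewritten condition in the last hunk mixes `and` and `or` without inner parentheses, so how it reads depends on operator precedence. The following is a small sketch in plain Python (not Cython) of how that expression groups. The function names and the boolean parameter `is_extension_dtype` are hypothetical stand-ins for the values used in `TextReader`.

```python
from itertools import product


def should_upcast(na_count, is_extension_dtype, use_nullable_dtypes):
    # Written the way the new condition reads; `and` binds tighter than `or`.
    return na_count > 0 and not is_extension_dtype or use_nullable_dtypes


def should_upcast_explicit(na_count, is_extension_dtype, use_nullable_dtypes):
    # The same condition with the implicit grouping spelled out.
    return (na_count > 0 and not is_extension_dtype) or use_nullable_dtypes


# Check every combination to confirm that the two spellings agree.
for args in product([0, 3], [False, True], [False, True]):
    assert should_upcast(*args) == should_upcast_explicit(*args)

# With the flag set, the branch is entered even without missing values.
print(should_upcast(0, False, True))
```

Read this way, the flag alone is enough to enter the branch. The inner assignment in the diff then passes `use_nullable_dtypes=True` to `_maybe_upcast` only when `col_dtype is None`.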

pandas/_libs/tslibs/parsing.pyx

-4
@@ -963,10 +963,6 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:
 datetime format string (for `strftime` or `strptime`),
 or None if it can't be guessed.
 """
-
-if not isinstance(dt_str, str):
-    return None
-
 day_attribute_and_format = (('day',), '%d', 2)

 # attr name, format, padding (if any)

pandas/_libs/tslibs/timedeltas.pxd

+1
@@ -25,3 +25,4 @@ cdef class _Timedelta(timedelta):
 cdef _ensure_components(_Timedelta self)
 cdef inline bint _compare_mismatched_resos(self, _Timedelta other, op)
 cdef _Timedelta _as_reso(self, NPY_DATETIMEUNIT reso, bint round_ok=*)
+cpdef _maybe_cast_to_matching_resos(self, _Timedelta other)
