Commit e5ea765

Merge branch 'main' into issue-48949
2 parents aee587a + 85c2cb3

11 files changed: +125 -119 lines changed

.github/workflows/wheels.yml

Lines changed: 15 additions & 17 deletions
@@ -72,14 +72,22 @@ jobs:
         env:
           CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

-      # Used to test the built wheels
-      - uses: actions/setup-python@v4
+      # Used to test(Windows-only) and push the built wheels
+      # You might need to use setup-python separately
+      # if the new Python-dev version
+      # is unavailable on conda-forge.
+      - uses: conda-incubator/setup-miniconda@v2
         with:
+          auto-update-conda: true
           python-version: ${{ matrix.python[1] }}
+          activate-environment: test
+          channels: conda-forge, anaconda
+          channel-priority: true
+          mamba-version: "*"

       - name: Test wheels (Windows 64-bit only)
         if: ${{ matrix.buildplat[1] == 'win_amd64' }}
-        shell: cmd
+        shell: cmd /C CALL {0}
         run: |
           python ci/test_wheels.py wheelhouse

@@ -88,26 +96,15 @@ jobs:
           name: ${{ matrix.python[0] }}-${{ startsWith(matrix.buildplat[1], 'macosx') && 'macosx' || matrix.buildplat[1] }}
           path: ./wheelhouse/*.whl

-      # Used to push the built wheels
-      # TODO: once Python 3.11 is available on conda, de-dup with
-      # setup python above
-      - uses: conda-incubator/setup-miniconda@v2
-        with:
-          auto-update-conda: true
-          # Really doesn't matter what version we upload with
-          # just the version we test with
-          python-version: '3.8'
-          channels: conda-forge
-          channel-priority: true
-          mamba-version: "*"

       - name: Install anaconda client
         if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
+        shell: bash -el {0}
         run: conda install -q -y anaconda-client


       - name: Upload wheels
-        if: success()
+        if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
         shell: bash -el {0}
         env:
           PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}

@@ -180,11 +177,12 @@ jobs:

       - name: Install anaconda client
         if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
+        shell: bash -el {0}
         run: |
           conda install -q -y anaconda-client

       - name: Upload sdist
-        if: success()
+        if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
         shell: bash -el {0}
         env:
           PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}

README.md

Lines changed: 1 addition & 1 deletion
@@ -166,6 +166,6 @@ You can also triage issues which may include reproducing bug reports, or asking

 Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!

-Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas).
+Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).

 As contributors and maintainers to this project, you are expected to abide by pandas' code of conduct. More information can be found at: [Contributor Code of Conduct](https://github.com/pandas-dev/.github/blob/master/CODE_OF_CONDUCT.md)

ci/test_wheels.py

Lines changed: 1 addition & 2 deletions
@@ -1,12 +1,11 @@
 import glob
 import os
-import platform
 import shutil
 import subprocess
 import sys

 if os.name == "nt":
-    py_ver = platform.python_version()
+    py_ver = f"{sys.version_info.major}.{sys.version_info.minor}"
     is_32_bit = os.getenv("IS_32_BIT") == "true"
     try:
         wheel_dir = sys.argv[1]
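
Note on the import swap: platform.python_version() reports the full interpreter version string, which on pre-release builds carries a suffix that never appears in wheel tags, while sys.version_info gives the bare major/minor pair the script needs. A minimal sketch of the difference (printed values are illustrative):

    import platform
    import sys

    # Full version string; on a release candidate this can be e.g. "3.11.0rc2",
    # which would not match a wheel built for cp311.
    print(platform.python_version())

    # Bare "major.minor", e.g. "3.11" -- the form the wheel lookup expects.
    print(f"{sys.version_info.major}.{sys.version_info.minor}")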

doc/source/whatsnew/v2.0.0.rst

Lines changed: 17 additions & 4 deletions
@@ -28,10 +28,24 @@ Available optional dependencies (listed in order of appearance at `install guide
 ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
 sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).

-.. _whatsnew_200.enhancements.enhancement2:
+.. _whatsnew_200.enhancements.io_readers_nullable_pyarrow:

-enhancement2
-^^^^^^^^^^^^
+Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+to return pyarrow-backed dtypes when set to ``"pyarrow"`` (:issue:`48957`).
+
+.. ipython:: python
+
+    import io
+    data = io.StringIO("""a,b,c,d,e,f,g,h,i
+    1,2.5,True,a,,,,,
+    3,4.5,False,b,6,7.5,True,a,
+    """)
+    with pd.option_context("io.nullable_backend", "pyarrow"):
+        df = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
+    df

 .. _whatsnew_200.enhancements.other:

@@ -42,7 +56,6 @@ Other enhancements
 - :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an ``axis`` argument. If ``axis`` is set, the default behaviour of which axis to consider can be overwritten (:issue:`47819`)
 - :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to ``pytest``'s output (:issue:`47910`)
 - Added new argument ``use_nullable_dtypes`` to :func:`read_csv` and :func:`read_excel` to enable automatic conversion to nullable dtypes (:issue:`36712`)
-- Added new global configuration, ``io.nullable_backend`` to allow ``use_nullable_dtypes=True`` to return pyarrow-backed dtypes when set to ``"pyarrow"`` in :func:`read_parquet` (:issue:`48957`)
 - Added ``index`` parameter to :meth:`DataFrame.to_dict` (:issue:`46398`)
 - Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
 - :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in ``pandas.errors`` (:issue:`27656`)
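
Note: the whatsnew entry also names :func:`read_parquet`; a rough sketch of that path, assuming a pandas 2.0 development build with pyarrow installed (the file name is illustrative):

    import pandas as pd

    pd.DataFrame({"a": [1.5, None]}).to_parquet("example.parquet")
    with pd.option_context("io.nullable_backend", "pyarrow"):
        df = pd.read_parquet("example.parquet", use_nullable_dtypes=True)
    print(df.dtypes)  # expected: a    double[pyarrow]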

pandas/_libs/hashtable.pyx

Lines changed: 2 additions & 19 deletions
@@ -90,7 +90,7 @@ cdef class ObjectFactorizer(Factorizer):
         self.uniques = ObjectVector()

     def factorize(
-        self, ndarray[object] values, sort=False, na_sentinel=-1, na_value=None
+        self, ndarray[object] values, na_sentinel=-1, na_value=None
     ) -> np.ndarray:
         """

@@ -115,14 +115,6 @@ cdef class ObjectFactorizer(Factorizer):
             self.uniques = uniques
         labels = self.table.get_labels(values, self.uniques,
                                        self.count, na_sentinel, na_value)
-        mask = (labels == na_sentinel)
-        # sort on
-        if sort:
-            sorter = self.uniques.to_array().argsort()
-            reverse_indexer = np.empty(len(sorter), dtype=np.intp)
-            reverse_indexer.put(sorter, np.arange(len(sorter)))
-            labels = reverse_indexer.take(labels, mode='clip')
-            labels[mask] = na_sentinel
         self.count = len(self.uniques)
         return labels

@@ -136,7 +128,7 @@ cdef class Int64Factorizer(Factorizer):
         self.table = Int64HashTable(size_hint)
         self.uniques = Int64Vector()

-    def factorize(self, const int64_t[:] values, sort=False,
+    def factorize(self, const int64_t[:] values,
                   na_sentinel=-1, na_value=None) -> np.ndarray:
         """
         Returns

@@ -161,14 +153,5 @@ cdef class Int64Factorizer(Factorizer):
         labels = self.table.get_labels(values, self.uniques,
                                        self.count, na_sentinel,
                                        na_value=na_value)
-
-        # sort on
-        if sort:
-            sorter = self.uniques.to_array().argsort()
-            reverse_indexer = np.empty(len(sorter), dtype=np.intp)
-            reverse_indexer.put(sorter, np.arange(len(sorter)))
-
-            labels = reverse_indexer.take(labels)
-
         self.count = len(self.uniques)
         return labels
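
Note: the deleted ``sort`` branch was the standard argsort/reverse-indexer relabeling. For readers tracing the history, here is a standalone NumPy sketch of what it computed (variable names are illustrative, not pandas API):

    import numpy as np

    uniques = np.array([30, 10, 20])      # uniques in first-seen order
    labels = np.array([0, 1, 2, 1])       # codes pointing into ``uniques``

    sorter = uniques.argsort()            # positions that would sort ``uniques``
    reverse_indexer = np.empty(len(sorter), dtype=np.intp)
    reverse_indexer.put(sorter, np.arange(len(sorter)))

    # Remap each code so it points into the *sorted* uniques instead.
    print(reverse_indexer.take(labels))   # [2 0 1 0]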

pandas/core/indexes/category.py

Lines changed: 1 addition & 46 deletions
@@ -365,7 +365,6 @@ def __contains__(self, key: Any) -> bool:

         return contains(self, key, container=self._engine)

-    # TODO(2.0): remove reindex once non-unique deprecation is enforced
     def reindex(
         self, target, method=None, level=None, limit=None, tolerance=None
     ) -> tuple[Index, npt.NDArray[np.intp] | None]:

@@ -392,51 +391,7 @@ def reindex(
             raise NotImplementedError(
                 "argument limit is not implemented for CategoricalIndex.reindex"
             )
-
-        target = ibase.ensure_index(target)
-
-        if self.equals(target):
-            indexer = None
-            missing = np.array([], dtype=np.intp)
-        else:
-            indexer, missing = self.get_indexer_non_unique(target)
-            if not self.is_unique:
-                # GH#42568
-                raise ValueError("cannot reindex on an axis with duplicate labels")
-
-        new_target: Index
-        if len(self) and indexer is not None:
-            new_target = self.take(indexer)
-        else:
-            new_target = target
-
-        # filling in missing if needed
-        if len(missing):
-            cats = self.categories.get_indexer(target)
-
-            if not isinstance(target, CategoricalIndex) or (cats == -1).any():
-                new_target, indexer, _ = super()._reindex_non_unique(target)
-            else:
-                # error: "Index" has no attribute "codes"
-                codes = new_target.codes.copy()  # type: ignore[attr-defined]
-                codes[indexer == -1] = cats[missing]
-                cat = self._data._from_backing_data(codes)
-                new_target = type(self)._simple_new(cat, name=self.name)
-
-        # we always want to return an Index type here
-        # to be consistent with .reindex for other index types (e.g. they don't
-        # coerce based on the actual values, only on the dtype)
-        # unless we had an initial Categorical to begin with
-        # in which case we are going to conform to the passed Categorical
-        if is_categorical_dtype(target):
-            cat = Categorical(new_target, dtype=target.dtype)
-            new_target = type(self)._simple_new(cat, name=self.name)
-        else:
-            # e.g. test_reindex_with_categoricalindex, test_reindex_duplicate_target
-            new_target_array = np.asarray(new_target)
-            new_target = Index._with_infer(new_target_array, name=self.name)
-
-        return new_target, indexer
+        return super().reindex(target)

     # --------------------------------------------------------------------
     # Indexing Methods
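
Note: with the body collapsed to ``super().reindex(target)``, CategoricalIndex now reindexes like any other Index for unique targets, and the ``method``/``limit`` guards above still raise first. A rough behavior sketch, assuming a pandas 2.0 development build:

    import pandas as pd

    ci = pd.CategoricalIndex(["a", "b", "c"])
    new_index, indexer = ci.reindex(["b", "c", "d"])
    print(indexer)  # expected [ 1  2 -1]; "d" is absent from the original index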

pandas/core/series.py

Lines changed: 1 addition & 1 deletion
@@ -1982,7 +1982,7 @@ def groupby(
         self,
         by=None,
         axis: Axis = 0,
-        level: Level = None,
+        level: IndexLabel = None,
         as_index: bool = True,
         sort: bool = True,
         group_keys: bool | lib.NoDefault = no_default,
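
Note: the annotation widens because ``level`` accepts either a single level or a list of levels, which pandas' ``IndexLabel`` alias (roughly ``Hashable | Sequence[Hashable]``) captures and the scalar ``Level`` does not. Both accepted forms, as a sketch:

    import pandas as pd

    s = pd.Series(
        [1, 2, 3, 4],
        index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"]),
    )
    s.groupby(level="x").sum()          # a single level: fits ``Level``
    s.groupby(level=["x", "y"]).sum()   # a list of levels: needs ``IndexLabel``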

pandas/io/parsers/arrow_parser_wrapper.py

Lines changed: 20 additions & 9 deletions
@@ -1,16 +1,17 @@
 from __future__ import annotations

-from typing import TYPE_CHECKING
-
 from pandas._typing import ReadBuffer
 from pandas.compat._optional import import_optional_dependency

 from pandas.core.dtypes.inference import is_integer

-from pandas.io.parsers.base_parser import ParserBase
+from pandas import (
+    DataFrame,
+    arrays,
+    get_option,
+)

-if TYPE_CHECKING:
-    from pandas import DataFrame
+from pandas.io.parsers.base_parser import ParserBase


 class ArrowParserWrapper(ParserBase):

@@ -77,7 +78,7 @@ def _get_pyarrow_options(self) -> None:
             else self.kwds["skiprows"],
         }

-    def _finalize_output(self, frame: DataFrame) -> DataFrame:
+    def _finalize_pandas_output(self, frame: DataFrame) -> DataFrame:
         """
         Processes data read in based on kwargs.

@@ -148,6 +149,16 @@ def read(self) -> DataFrame:
             parse_options=pyarrow_csv.ParseOptions(**self.parse_options),
             convert_options=pyarrow_csv.ConvertOptions(**self.convert_options),
         )
-
-        frame = table.to_pandas()
-        return self._finalize_output(frame)
+        if (
+            self.kwds["use_nullable_dtypes"]
+            and get_option("io.nullable_backend") == "pyarrow"
+        ):
+            frame = DataFrame(
+                {
+                    col_name: arrays.ArrowExtensionArray(pa_col)
+                    for col_name, pa_col in zip(table.column_names, table.itercolumns())
+                }
+            )
+        else:
+            frame = table.to_pandas()
+        return self._finalize_pandas_output(frame)
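
Note: the new branch skips the NumPy round-trip of ``Table.to_pandas()`` by wrapping each pyarrow column directly. A standalone sketch of the same conversion, assuming pandas 2.0 dev with pyarrow installed (the data is illustrative):

    import pandas as pd
    import pyarrow as pa

    table = pa.table({"a": [1, 2, None], "b": ["x", None, "z"]})

    # Each column from itercolumns() is a ChunkedArray, which
    # ArrowExtensionArray wraps without converting to NumPy.
    frame = pd.DataFrame(
        {
            name: pd.arrays.ArrowExtensionArray(col)
            for name, col in zip(table.column_names, table.itercolumns())
        }
    )
    print(frame.dtypes)  # expected: a -> int64[pyarrow], b -> string[pyarrow]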

pandas/io/parsers/readers.py

Lines changed: 10 additions & 0 deletions
@@ -24,6 +24,8 @@

 import numpy as np

+from pandas._config import get_option
+
 from pandas._libs import lib
 from pandas._libs.parsers import STR_NA_VALUES
 from pandas._typing import (

@@ -560,6 +562,14 @@ def _read(
         raise ValueError(
             "The 'chunksize' option is not supported with the 'pyarrow' engine"
         )
+    elif (
+        kwds.get("use_nullable_dtypes", False)
+        and get_option("io.nullable_backend") == "pyarrow"
+    ):
+        raise NotImplementedError(
+            f"use_nullable_dtypes=True and engine={kwds['engine']} with "
+            "io.nullable_backend set to 'pyarrow' is not implemented."
+        )
     else:
         chunksize = validate_integer("chunksize", chunksize, 1)
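
Note: under this guard, requesting the pyarrow backend with a non-pyarrow engine now fails loudly instead of silently returning NumPy-backed dtypes. A sketch of the error path (assuming this development build):

    import io

    import pandas as pd

    with pd.option_context("io.nullable_backend", "pyarrow"):
        pd.read_csv(io.StringIO("a\n1"), use_nullable_dtypes=True, engine="c")
    # Expected: NotImplementedError: use_nullable_dtypes=True and engine=c with
    # io.nullable_backend set to 'pyarrow' is not implemented.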
