Commit 271ddce

Merge branch 'master' of https://github.com/pandas-dev/pandas into multiindex_union
Conflicts: doc/source/whatsnew/v1.3.0.rst, pandas/tests/indexes/test_setops.py
2 parents a052b86 + 84d9c5e commit 271ddce

72 files changed, +1847 -1128 lines changed

asv_bench/benchmarks/groupby.py

+13 -1
@@ -480,7 +480,19 @@ class GroupByCythonAgg:
     param_names = ["dtype", "method"]
     params = [
         ["float64"],
-        ["sum", "prod", "min", "max", "mean", "median", "var", "first", "last"],
+        [
+            "sum",
+            "prod",
+            "min",
+            "max",
+            "mean",
+            "median",
+            "var",
+            "first",
+            "last",
+            "any",
+            "all",
+        ],
     ]

     def setup(self, dtype, method):

ci/azure/windows.yml

+1 -1
@@ -8,7 +8,7 @@ jobs:
     vmImage: ${{ parameters.vmImage }}
   strategy:
     matrix:
-      py37_np16:
+      py37_np17:
         ENV_FILE: ci/deps/azure-windows-37.yaml
         CONDA_PY: "37"
         PATTERN: "not slow and not network"

ci/deps/actions-37-minimum_versions.yaml

+1 -1
@@ -18,7 +18,7 @@ dependencies:
   - jinja2=2.10
   - numba=0.46.0
   - numexpr=2.6.8
-  - numpy=1.16.5
+  - numpy=1.17.3
   - openpyxl=3.0.0
   - pytables=3.5.1
   - python-dateutil=2.7.3

ci/deps/azure-macos-37.yaml

+1 -1
@@ -19,7 +19,7 @@ dependencies:
   - matplotlib=2.2.3
   - nomkl
   - numexpr
-  - numpy=1.16.5
+  - numpy=1.17.3
   - openpyxl
   - pyarrow=0.15.1
   - pytables

ci/deps/azure-windows-37.yaml

+1 -1
@@ -24,7 +24,7 @@ dependencies:
   - moto>=1.3.14
   - flask
   - numexpr
-  - numpy=1.16.*
+  - numpy=1.17.*
   - openpyxl
   - pyarrow=0.15
   - pytables

doc/source/getting_started/install.rst

+1 -1
@@ -222,7 +222,7 @@ Dependencies
 Package                                                          Minimum supported version
 ================================================================ ==========================
 `setuptools <https://setuptools.readthedocs.io/en/latest/>`__    38.6.0
-`NumPy <https://numpy.org>`__                                    1.16.5
+`NumPy <https://numpy.org>`__                                    1.17.3
 `python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.7.3
 `pytz <https://pypi.org/project/pytz/>`__                        2017.3
 ================================================================ ==========================

doc/source/whatsnew/v1.2.5.rst

+1 -1
@@ -14,7 +14,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
-
+- Regression in :func:`concat` between two :class:`DataFrames` where one has an :class:`Index` that is all-None and the other is :class:`DatetimeIndex` incorrectly raising (:issue:`40841`)
 -
 -

doc/source/whatsnew/v1.3.0.rst

+8 -3
@@ -217,6 +217,9 @@ Other enhancements
 - :class:`RangeIndex` can now be constructed by passing a ``range`` object directly e.g. ``pd.RangeIndex(range(3))`` (:issue:`12067`)
 - :meth:`round` being enabled for the nullable integer and floating dtypes (:issue:`38844`)
 - :meth:`pandas.read_csv` and :meth:`pandas.read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
+- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
+- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
+-

 .. ---------------------------------------------------------------------------
@@ -465,7 +468,7 @@ If installed, we now require:
 +-----------------+-----------------+----------+---------+
 | Package         | Minimum Version | Required | Changed |
 +=================+=================+==========+=========+
-| numpy           | 1.16.5          |    X     |         |
+| numpy           | 1.17.3          |    X     |    X    |
 +-----------------+-----------------+----------+---------+
 | pytz            | 2017.3          |    X     |         |
 +-----------------+-----------------+----------+---------+
@@ -678,7 +681,7 @@ Interval
 Indexing
 ^^^^^^^^

-- Bug in :meth:`Index.union` and :meth:`MultiIndex.union` dropping duplicate ``Index`` values when ``Index`` was not monotonic or ``sort`` was set to ``False`` (:issue:`36289`, :issue:`31326`, :issue:`38745`)
+- Bug in :meth:`Index.union` and :meth:`MultiIndex.union` dropping duplicate ``Index`` values when ``Index`` was not monotonic or ``sort`` was set to ``False`` (:issue:`36289`, :issue:`31326`, :issue:`40862`)
 - Bug in :meth:`CategoricalIndex.get_indexer` failing to raise ``InvalidIndexError`` when non-unique (:issue:`38372`)
 - Bug in inserting many new columns into a :class:`DataFrame` causing incorrect subsequent indexing behavior (:issue:`38380`)
 - Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` when setting multiple values to duplicate columns (:issue:`15695`)
@@ -711,7 +714,7 @@ Missing

 - Bug in :class:`Grouper` now correctly propagates ``dropna`` argument and :meth:`DataFrameGroupBy.transform` now correctly handles missing values for ``dropna=True`` (:issue:`35612`)
 - Bug in :func:`isna`, and :meth:`Series.isna`, :meth:`Index.isna`, :meth:`DataFrame.isna` (and the corresponding ``notna`` functions) not recognizing ``Decimal("NaN")`` objects (:issue:`39409`)
--
+- Bug in :meth:`DataFrame.fillna` not accepting dictionary for ``downcast`` keyword (:issue:`40809`)

 MultiIndex
 ^^^^^^^^^^
@@ -787,6 +790,8 @@ Groupby/resample/rolling
 - Bug in :meth:`Series.asfreq` and :meth:`DataFrame.asfreq` dropping rows when the index is not sorted (:issue:`39805`)
 - Bug in aggregation functions for :class:`DataFrame` not respecting ``numeric_only`` argument when ``level`` keyword was given (:issue:`40660`)
 - Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
+- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
+- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)

 Reshaping
 ^^^^^^^^^
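
Note on the Kleene logic mentioned in the whatsnew entries for GroupBy.any and GroupBy.all: the following is a minimal pure-Python sketch of three-valued any/all, not the pandas implementation. `None` stands in for `pd.NA`, the helper names `kleene_any` and `kleene_all` are hypothetical, and the `skipna` handling is an assumption about how missing values are skipped by default.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    """Three-valued 'any'; None plays the role of a missing value (assumption)."""
    saw_na = False
    for value in values:
        if value is None:
            saw_na = True
        elif value:
            # A single True decides the result no matter what is missing.
            return True
    # No True seen: the answer is unknown only if a missing value
    # could have been True and we were asked not to skip it.
    return None if (saw_na and not skipna) else False


def kleene_all(values: Iterable[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    """Three-valued 'all'; None plays the role of a missing value (assumption)."""
    saw_na = False
    for value in values:
        if value is None:
            saw_na = True
        elif not value:
            # A single False decides the result no matter what is missing.
            return False
    return None if (saw_na and not skipna) else True
```

With `skipna=False`, a group such as `[True, None]` gives `True` for any but an unknown result for all, because the missing value might have been `False`.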

environment.yml

+1 -1
@@ -3,7 +3,7 @@ channels:
   - conda-forge
 dependencies:
   # required
-  - numpy>=1.16.5
+  - numpy>=1.17.3
   - python=3
   - python-dateutil>=2.7.3
   - pytz

pandas/__init__.py

-1
@@ -20,7 +20,6 @@

 # numpy compat
 from pandas.compat import (
-    np_version_under1p17 as _np_version_under1p17,
     np_version_under1p18 as _np_version_under1p18,
     is_numpy_dev as _is_numpy_dev,
 )
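
Note: with the NumPy floor raised to 1.17.3 in the dependency files, a flag like `np_version_under1p17` can never be true, which is presumably why it is dropped here. A rough sketch of how such a flag could be computed follows. `parse_release` and `version_under` are hypothetical helpers, not pandas functions, and pre-release suffixes such as `rc1` are simply ignored.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, padded to three parts.

    '1.17.3' -> (1, 17, 3); '1.18' -> (1, 18, 0). Anything after the
    numeric prefix (e.g. 'rc1', '.dev0') is ignored in this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad so that '1.17' and '1.17.0' compare as equal.
    return parts + (0,) * max(0, 3 - len(parts))


def version_under(installed: str, minimum: str) -> bool:
    """True when `installed` is strictly older than `minimum`."""
    return parse_release(installed) < parse_release(minimum)
```

For example, `version_under("1.16.5", "1.17.3")` is true for the old floor, while any installation that satisfies the new floor yields false.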

pandas/_libs/arrays.pyx

+167
@@ -0,0 +1,167 @@
+"""
+Cython implementations for internal ExtensionArrays.
+"""
+cimport cython
+
+import numpy as np
+
+cimport numpy as cnp
+from numpy cimport ndarray
+
+cnp.import_array()
+
+
+@cython.freelist(16)
+cdef class NDArrayBacked:
+    """
+    Implementing these methods in cython improves performance quite a bit.
+
+    import pandas as pd
+
+    from pandas._libs.arrays import NDArrayBacked as cls
+
+    dti = pd.date_range("2016-01-01", periods=3)
+    dta = dti._data
+    arr = dta._ndarray
+
+    obj = cls._simple_new(arr, arr.dtype)
+
+    # for foo in [arr, dta, obj]: ...
+
+    %timeit foo.copy()
+    299 ns ± 30 ns per loop # <-- arr underlying ndarray (for reference)
+    530 ns ± 9.24 ns per loop # <-- dta with cython NDArrayBacked
+    1.66 µs ± 46.3 ns per loop # <-- dta without cython NDArrayBacked
+    328 ns ± 5.29 ns per loop # <-- obj with NDArrayBacked.__cinit__
+    371 ns ± 6.97 ns per loop # <-- obj with NDArrayBacked._simple_new
+
+    %timeit foo.T
+    125 ns ± 6.27 ns per loop # <-- arr underlying ndarray (for reference)
+    226 ns ± 7.66 ns per loop # <-- dta with cython NDArrayBacked
+    911 ns ± 16.6 ns per loop # <-- dta without cython NDArrayBacked
+    215 ns ± 4.54 ns per loop # <-- obj with NDArrayBacked._simple_new
+
+    """
+    # TODO: implement take in terms of cnp.PyArray_TakeFrom
+    # TODO: implement concat_same_type in terms of cnp.PyArray_Concatenate
+
+    cdef:
+        readonly ndarray _ndarray
+        readonly object _dtype
+
+    def __init__(self, ndarray values, object dtype):
+        self._ndarray = values
+        self._dtype = dtype
+
+    @classmethod
+    def _simple_new(cls, ndarray values, object dtype):
+        cdef:
+            NDArrayBacked obj
+        obj = NDArrayBacked.__new__(cls)
+        obj._ndarray = values
+        obj._dtype = dtype
+        return obj
+
+    cpdef NDArrayBacked _from_backing_data(self, ndarray values):
+        """
+        Construct a new ExtensionArray `new_array` with `arr` as its _ndarray.
+
+        This should round-trip:
+            self == self._from_backing_data(self._ndarray)
+        """
+        # TODO: re-reuse simple_new if/when it can be cpdef
+        cdef:
+            NDArrayBacked obj
+        obj = NDArrayBacked.__new__(type(self))
+        obj._ndarray = values
+        obj._dtype = self._dtype
+        return obj
+
+    cpdef __setstate__(self, state):
+        if isinstance(state, dict):
+            if "_data" in state:
+                data = state.pop("_data")
+            elif "_ndarray" in state:
+                data = state.pop("_ndarray")
+            else:
+                raise ValueError
+            self._ndarray = data
+            self._dtype = state.pop("_dtype")
+
+            for key, val in state.items():
+                setattr(self, key, val)
+        elif isinstance(state, tuple):
+            if len(state) != 3:
+                if len(state) == 1 and isinstance(state[0], dict):
+                    self.__setstate__(state[0])
+                    return
+                raise NotImplementedError(state)
+
+            data, dtype = state[:2]
+            if isinstance(dtype, np.ndarray):
+                dtype, data = data, dtype
+            self._ndarray = data
+            self._dtype = dtype
+
+            if isinstance(state[2], dict):
+                for key, val in state[2].items():
+                    setattr(self, key, val)
+            else:
+                raise NotImplementedError(state)
+        else:
+            raise NotImplementedError(state)
+
+    def __len__(self) -> int:
+        return len(self._ndarray)
+
+    @property
+    def shape(self):
+        # object cast bc _ndarray.shape is npy_intp*
+        return (<object>(self._ndarray)).shape
+
+    @property
+    def ndim(self) -> int:
+        return self._ndarray.ndim
+
+    @property
+    def size(self) -> int:
+        return self._ndarray.size
+
+    @property
+    def nbytes(self) -> int:
+        return self._ndarray.nbytes
+
+    def copy(self):
+        # NPY_ANYORDER -> same order as self._ndarray
+        res_values = cnp.PyArray_NewCopy(self._ndarray, cnp.NPY_ANYORDER)
+        return self._from_backing_data(res_values)
+
+    def delete(self, loc, axis=0):
+        res_values = np.delete(self._ndarray, loc, axis=axis)
+        return self._from_backing_data(res_values)
+
+    def swapaxes(self, axis1, axis2):
+        res_values = cnp.PyArray_SwapAxes(self._ndarray, axis1, axis2)
+        return self._from_backing_data(res_values)
+
+    # TODO: pass NPY_MAXDIMS equiv to axis=None?
+    def repeat(self, repeats, axis: int = 0):
+        if axis is None:
+            axis = 0
+        res_values = cnp.PyArray_Repeat(self._ndarray, repeats, <int>axis)
+        return self._from_backing_data(res_values)
+
+    def reshape(self, *args, **kwargs):
+        res_values = self._ndarray.reshape(*args, **kwargs)
+        return self._from_backing_data(res_values)
+
+    def ravel(self, order="C"):
+        # cnp.PyArray_OrderConverter(PyObject* obj, NPY_ORDER* order)
+        # res_values = cnp.PyArray_Ravel(self._ndarray, order)
+        res_values = self._ndarray.ravel(order)
+        return self._from_backing_data(res_values)
+
+    @property
+    def T(self):
+        res_values = self._ndarray.T
+        return self._from_backing_data(res_values)