Commit 99fbf2a

Merge branch 'master' of https://github.com/pandas-dev/pandas into ERR/imporve_error_message_cut/qcut
2 parents c0662fb + 2a3d840 commit 99fbf2a

79 files changed, +1197 -577 lines changed

doc/source/development/contributing_docstring.rst

+1 -1
Original file line number | Diff line number | Diff line change
@@ -399,7 +399,7 @@ DataFrame:
399399
* DataFrame
400400
* pandas.Index
401401
* pandas.Categorical
402-
* pandas.SparseArray
402+
* pandas.arrays.SparseArray
403403

404404
If the exact type is not relevant, but must be compatible with a numpy
405405
array, array-like can be specified. If Any type that can be iterated is

doc/source/getting_started/basics.rst

+1 -1
@@ -1951,7 +1951,7 @@ documentation sections for more on each type.
19511951
| period | :class:`PeriodDtype` | :class:`Period` | :class:`arrays.PeriodArray` | ``'period[<freq>]'``, | :ref:`timeseries.periods` |
19521952
| (time spans) | | | | ``'Period[<freq>]'`` | |
19531953
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1954-
| sparse | :class:`SparseDtype` | (none) | :class:`SparseArray` | ``'Sparse'``, ``'Sparse[int]'``, | :ref:`sparse` |
1954+
| sparse | :class:`SparseDtype` | (none) | :class:`arrays.SparseArray` | ``'Sparse'``, ``'Sparse[int]'``, | :ref:`sparse` |
19551955
| | | | | ``'Sparse[float]'`` | |
19561956
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
19571957
| intervals | :class:`IntervalDtype` | :class:`Interval` | :class:`arrays.IntervalArray` | ``'interval'``, ``'Interval'``, | :ref:`advanced.intervalindex` |

doc/source/getting_started/dsintro.rst

+3 -3
@@ -676,11 +676,11 @@ similar to an ndarray:
676676
# only show the first 5 rows
677677
df[:5].T
678678
679+
.. _dsintro.numpy_interop:
680+
679681
DataFrame interoperability with NumPy functions
680682
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
681683

682-
.. _dsintro.numpy_interop:
683-
684684
Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
685685
can be used with no issues on Series and DataFrame, assuming the data within
686686
are numeric:
@@ -741,7 +741,7 @@ implementation takes precedence and a Series is returned.
741741
np.maximum(ser, idx)
742742
743743
NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
744-
for example :class:`SparseArray` (see :ref:`sparse.calculation`). If possible,
744+
for example :class:`arrays.SparseArray` (see :ref:`sparse.calculation`). If possible,
745745
the ufunc is applied without converting the underlying data to an ndarray.
746746

747747
Console display

doc/source/reference/arrays.rst

+2 -2
@@ -444,13 +444,13 @@ Sparse data
444444
-----------
445445

446446
Data where a single value is repeated many times (e.g. ``0`` or ``NaN``) may
447-
be stored efficiently as a :class:`SparseArray`.
447+
be stored efficiently as a :class:`arrays.SparseArray`.
448448

449449
.. autosummary::
450450
:toctree: api/
451451
:template: autosummary/class_without_autosummary.rst
452452

453-
SparseArray
453+
arrays.SparseArray
454454

455455
.. autosummary::
456456
:toctree: api/

doc/source/user_guide/missing_data.rst

+26
@@ -920,3 +920,29 @@ filling missing values beforehand.
920920

921921
A similar situation occurs when using Series or DataFrame objects in ``if``
922922
statements, see :ref:`gotchas.truth`.
923+
924+
NumPy ufuncs
925+
------------
926+
927+
:attr:`pandas.NA` implements NumPy's ``__array_ufunc__`` protocol. Most ufuncs
928+
work with ``NA``, and generally return ``NA``:
929+
930+
.. ipython:: python
931+
932+
np.log(pd.NA)
933+
np.add(pd.NA, 1)
934+
935+
.. warning::
936+
937+
Currently, ufuncs involving an ndarray and ``NA`` will return an
938+
object-dtype filled with NA values.
939+
940+
.. ipython:: python
941+
942+
a = np.array([1, 2, 3])
943+
np.greater(a, pd.NA)
944+
945+
The return type here may change to return a different array type
946+
in the future.
947+
948+
See :ref:`dsintro.numpy_interop` for more on ufuncs.
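
The section added above states that ``NA`` takes part in ufuncs through NumPy's ``__array_ufunc__`` protocol. As a rough illustration of that protocol only, here is a minimal sketch built around a toy sentinel. `MissingType` and `MISSING` are hypothetical names introduced for this example; this is not the pandas implementation, and it assumes NumPy is installed.

```python
import numpy as np


class MissingType:
    """Toy propagating sentinel (hypothetical; not pandas.NA)."""

    def __repr__(self):
        return "<MISSING>"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy calls this hook whenever the sentinel is a ufunc input.
        if method != "__call__":
            # reduce, accumulate, outer, ... are out of scope for this sketch.
            raise ValueError(f"ufunc method '{method}' not supported")

        arrays = [x for x in inputs if isinstance(x, np.ndarray)]
        if arrays:
            # Array input: return an object-dtype array filled with the sentinel.
            shape = np.broadcast_arrays(*arrays)[0].shape
            result = np.empty(shape, dtype=object)
            result.fill(self)
        else:
            # Scalar-only input: the sentinel simply propagates.
            result = self

        if ufunc.nout == 1:
            return result
        # Multi-output ufuncs (e.g. np.modf) get one result per output.
        return tuple(
            result.copy() if arrays else result for _ in range(ufunc.nout)
        )


MISSING = MissingType()

scalar_result = np.log(MISSING)                            # unary ufunc on the sentinel
array_result = np.greater(np.array([1, 2, 3]), MISSING)    # ndarray and sentinel
print(scalar_result is MISSING)
print(array_result.dtype, array_result.shape)
```

The array branch mirrors the behaviour described in the warning above: a ufunc mixing an ndarray and the sentinel yields an object-dtype array of sentinels.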

doc/source/user_guide/sparse.rst

+8 -8
@@ -15,7 +15,7 @@ can be chosen, including 0) is omitted. The compressed values are not actually s
1515
1616
arr = np.random.randn(10)
1717
arr[2:-2] = np.nan
18-
ts = pd.Series(pd.SparseArray(arr))
18+
ts = pd.Series(pd.arrays.SparseArray(arr))
1919
ts
2020
2121
Notice the dtype, ``Sparse[float64, nan]``. The ``nan`` means that elements in the
@@ -51,7 +51,7 @@ identical to their dense counterparts.
5151
SparseArray
5252
-----------
5353

54-
:class:`SparseArray` is a :class:`~pandas.api.extensions.ExtensionArray`
54+
:class:`arrays.SparseArray` is a :class:`~pandas.api.extensions.ExtensionArray`
5555
for storing an array of sparse values (see :ref:`basics.dtypes` for more
5656
on extension arrays). It is a 1-dimensional ndarray-like object storing
5757
only values distinct from the ``fill_value``:
@@ -61,7 +61,7 @@ only values distinct from the ``fill_value``:
6161
arr = np.random.randn(10)
6262
arr[2:5] = np.nan
6363
arr[7:8] = np.nan
64-
sparr = pd.SparseArray(arr)
64+
sparr = pd.arrays.SparseArray(arr)
6565
sparr
6666
6767
A sparse array can be converted to a regular (dense) ndarray with :meth:`numpy.asarray`
@@ -144,7 +144,7 @@ to ``SparseArray`` and get a ``SparseArray`` as a result.
144144

145145
.. ipython:: python
146146
147-
arr = pd.SparseArray([1., np.nan, np.nan, -2., np.nan])
147+
arr = pd.arrays.SparseArray([1., np.nan, np.nan, -2., np.nan])
148148
np.abs(arr)
149149
150150
@@ -153,7 +153,7 @@ the correct dense result.
153153

154154
.. ipython:: python
155155
156-
arr = pd.SparseArray([1., -1, -1, -2., -1], fill_value=-1)
156+
arr = pd.arrays.SparseArray([1., -1, -1, -2., -1], fill_value=-1)
157157
np.abs(arr)
158158
np.abs(arr).to_dense()
159159
@@ -194,7 +194,7 @@ From an array-like, use the regular :class:`Series` or
194194
.. ipython:: python
195195
196196
# New way
197-
pd.DataFrame({"A": pd.SparseArray([0, 1])})
197+
pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
198198
199199
From a SciPy sparse matrix, use :meth:`DataFrame.sparse.from_spmatrix`,
200200

@@ -256,10 +256,10 @@ Instead, you'll need to ensure that the values being assigned are sparse
256256

257257
.. ipython:: python
258258
259-
df = pd.DataFrame({"A": pd.SparseArray([0, 1])})
259+
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
260260
df['B'] = [0, 0] # remains dense
261261
df['B'].dtype
262-
df['B'] = pd.SparseArray([0, 0])
262+
df['B'] = pd.arrays.SparseArray([0, 0])
263263
df['B'].dtype
264264
265265
The ``SparseDataFrame.default_kind`` and ``SparseDataFrame.default_fill_value`` attributes

doc/source/whatsnew/v0.19.0.rst

+1
@@ -1225,6 +1225,7 @@ Previously, sparse data were ``float64`` dtype by default, even if all inputs we
12251225
As of v0.19.0, sparse data keeps the input dtype, and uses more appropriate ``fill_value`` defaults (``0`` for ``int64`` dtype, ``False`` for ``bool`` dtype).
12261226

12271227
.. ipython:: python
1228+
:okwarning:
12281229
12291230
pd.SparseArray([1, 2, 0, 0], dtype=np.int64)
12301231
pd.SparseArray([True, False, False, False])

doc/source/whatsnew/v0.25.0.rst

+2
@@ -354,6 +354,7 @@ When passed DataFrames whose values are sparse, :func:`concat` will now return a
354354
:class:`Series` or :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`25702`).
355355

356356
.. ipython:: python
357+
:okwarning:
357358
358359
df = pd.DataFrame({"A": pd.SparseArray([0, 1])})
359360
@@ -910,6 +911,7 @@ by a ``Series`` or ``DataFrame`` with sparse values.
910911
**New way**
911912
912913
.. ipython:: python
914+
:okwarning:
913915
914916
df = pd.DataFrame({"A": pd.SparseArray([0, 0, 1, 2])})
915917
df.dtypes

doc/source/whatsnew/v1.0.0.rst

+5 -2
@@ -568,7 +568,7 @@ Deprecations
568568
- :func:`eval` keyword argument "truediv" is deprecated and will be removed in a future version (:issue:`29812`)
569569
- :meth:`DateOffset.isAnchored` and :meth:`DatetOffset.onOffset` are deprecated and will be removed in a future version, use :meth:`DateOffset.is_anchored` and :meth:`DateOffset.is_on_offset` instead (:issue:`30340`)
570570
- ``pandas.tseries.frequencies.get_offset`` is deprecated and will be removed in a future version, use ``pandas.tseries.frequencies.to_offset`` instead (:issue:`4205`)
571-
- :meth:`Categorical.take_nd` is deprecated, use :meth:`Categorical.take` instead (:issue:`27745`)
571+
- :meth:`Categorical.take_nd` and :meth:`CategoricalIndex.take_nd` are deprecated, use :meth:`Categorical.take` and :meth:`CategoricalIndex.take` instead (:issue:`27745`)
572572
- The parameter ``numeric_only`` of :meth:`Categorical.min` and :meth:`Categorical.max` is deprecated and replaced with ``skipna`` (:issue:`25303`)
573573
- The parameter ``label`` in :func:`lreshape` has been deprecated and will be removed in a future version (:issue:`29742`)
574574
- ``pandas.core.index`` has been deprecated and will be removed in a future version, the public classes are available in the top-level namespace (:issue:`19711`)
@@ -578,6 +578,7 @@ Deprecations
578578
- :meth:`DataFrame.to_stata`, :meth:`DataFrame.to_feather`, and :meth:`DataFrame.to_parquet` argument "fname" is deprecated, use "path" instead (:issue:`23574`)
579579
- The deprecated internal attributes ``_start``, ``_stop`` and ``_step`` of :class:`RangeIndex` now raise a ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`26581`)
580580
- The ``pandas.util.testing`` module has been deprecated. Use the public API in ``pandas.testing`` documented at :ref:`api.general.testing` (:issue:`16232`).
581+
- ``pandas.SparseArray`` has been deprecated. Use ``pandas.arrays.SparseArray`` (:class:`arrays.SparseArray`) instead. (:issue:`30642`)
581582

582583
**Selecting Columns from a Grouped DataFrame**
583584

@@ -833,6 +834,7 @@ Datetimelike
833834
- Bug in :meth:`Series.cummin` and :meth:`Series.cummax` with timezone-aware dtype incorrectly dropping its timezone (:issue:`15553`)
834835
- Bug in :class:`DatetimeArray`, :class:`TimedeltaArray`, and :class:`PeriodArray` where inplace addition and subtraction did not actually operate inplace (:issue:`24115`)
835836
- Bug in :func:`pandas.to_datetime` when called with ``Series`` storing ``IntegerArray`` raising ``TypeError`` instead of returning ``Series`` (:issue:`30050`)
837+
- Bug in :func:`date_range` with custom business hours as ``freq`` and given number of ``periods`` (:issue:`30593`)
836838

837839
Timedelta
838840
^^^^^^^^^
@@ -883,6 +885,7 @@ Interval
883885
- Bug in :meth:`IntervalIndex.get_indexer` where a :class:`Categorical` or :class:`CategoricalIndex` ``target`` would incorrectly raise a ``TypeError`` (:issue:`30063`)
884886
- Bug in ``pandas.core.dtypes.cast.infer_dtype_from_scalar`` where passing ``pandas_dtype=True`` did not infer :class:`IntervalDtype` (:issue:`30337`)
885887
- Bug in :class:`IntervalDtype` where the ``kind`` attribute was incorrectly set as ``None`` instead of ``"O"`` (:issue:`30568`)
888+
- Bug in :class:`IntervalIndex`, :class:`~arrays.IntervalArray`, and :class:`Series` with interval data where equality comparisons were incorrect (:issue:`24112`)
886889

887890
Indexing
888891
^^^^^^^^
@@ -907,7 +910,7 @@ Missing
907910
MultiIndex
908911
^^^^^^^^^^
909912

910-
- Constructior for :class:`MultiIndex` verifies that the given ``sortorder`` is compatible with the actual ``lexsort_depth`` if ``verify_integrity`` parameter is ``True`` (the default) (:issue:`28735`)
913+
- Constructor for :class:`MultiIndex` verifies that the given ``sortorder`` is compatible with the actual ``lexsort_depth`` if ``verify_integrity`` parameter is ``True`` (the default) (:issue:`28735`)
911914
- Series and MultiIndex `.drop` with `MultiIndex` raise exception if labels not in given in level (:issue:`8594`)
912915
-
913916

pandas/__init__.py

+17 -1
@@ -115,7 +115,7 @@
115115
DataFrame,
116116
)
117117

118-
from pandas.core.arrays.sparse import SparseArray, SparseDtype
118+
from pandas.core.arrays.sparse import SparseDtype
119119

120120
from pandas.tseries.api import infer_freq
121121
from pandas.tseries import offsets
@@ -246,6 +246,19 @@ class Panel:
246246

247247
return type(name, (), {})
248248

249+
elif name == "SparseArray":
250+
251+
warnings.warn(
252+
"The pandas.SparseArray class is deprecated "
253+
"and will be removed from pandas in a future version. "
254+
"Use pandas.arrays.SparseArray instead.",
255+
FutureWarning,
256+
stacklevel=2,
257+
)
258+
from pandas.core.arrays.sparse import SparseArray as _SparseArray
259+
260+
return _SparseArray
261+
249262
raise AttributeError(f"module 'pandas' has no attribute '{name}'")
250263

251264

@@ -308,6 +321,9 @@ def __getattr__(self, item):
308321

309322
datetime = __Datetime().datetime
310323

324+
class SparseArray:
325+
pass
326+
311327

312328
# module level doc-string
313329
__doc__ = """
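
The hunk above drops the eager `SparseArray` import and adds a branch to the module-level `__getattr__`, so that looking up the old name falls through to a hook that warns and then returns the class from its new location. A minimal sketch of that pattern (PEP 562, Python 3.7+) follows. The `toy` module and the `_SparseArray` stand-in class are hypothetical and exist only so the example runs in a single file.

```python
import types
import warnings


class _SparseArray:
    """Stand-in for the class at its new location (hypothetical)."""


def _module_getattr(name):
    # Invoked only when normal attribute lookup on the module fails.
    if name == "SparseArray":
        warnings.warn(
            "toy.SparseArray is deprecated; use toy.arrays.SparseArray instead.",
            FutureWarning,
            stacklevel=2,
        )
        return _SparseArray
    raise AttributeError(f"module 'toy' has no attribute '{name}'")


# Build a throwaway module so the sketch is self-contained.
toy = types.ModuleType("toy")
toy.arrays = types.ModuleType("toy.arrays")
toy.arrays.SparseArray = _SparseArray
toy.__getattr__ = _module_getattr  # the PEP 562 hook

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = toy.SparseArray         # deprecated path: emits FutureWarning
    new = toy.arrays.SparseArray  # new path: no warning

print(old is new)
print([w.category.__name__ for w in caught])
```

Because the hook runs only when ordinary lookup fails, the old name must not be bound in the module namespace, which is presumably why the import line in the hunk no longer pulls in `SparseArray`.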

pandas/_libs/algos_take_helper.pxi.in

+1 -1
@@ -116,7 +116,7 @@ def take_2d_axis0_{{name}}_{{dest}}(ndarray[{{c_type_in}}, ndim=2] values,
116116

117117
IF {{True if c_type_in == c_type_out != "object" else False}}:
118118
cdef:
119-
{{c_type_out}} *v
119+
const {{c_type_out}} *v
120120
{{c_type_out}} *o
121121

122122
# GH#3130

pandas/_libs/hashtable.pyx

+1 -1
@@ -1,7 +1,7 @@
11
cimport cython
22

33
from cpython.ref cimport PyObject, Py_INCREF
4-
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free
4+
from cpython.mem cimport PyMem_Malloc, PyMem_Free
55

66
from libc.stdlib cimport malloc, free
77

pandas/_libs/missing.pyx

+48 -5
@@ -14,6 +14,7 @@ from pandas._libs.tslibs.np_datetime cimport (
1414
get_timedelta64_value, get_datetime64_value)
1515
from pandas._libs.tslibs.nattype cimport (
1616
checknull_with_nat, c_NaT as NaT, is_null_datetimelike)
17+
from pandas._libs.ops_dispatch import maybe_dispatch_ufunc_to_dunder_op
1718

1819
from pandas.compat import is_platform_32bit
1920

@@ -290,16 +291,29 @@ cdef inline bint is_null_period(v):
290291
# Implementation of NA singleton
291292

292293

293-
def _create_binary_propagating_op(name, divmod=False):
294+
def _create_binary_propagating_op(name, is_divmod=False):
294295

295296
def method(self, other):
296297
if (other is C_NA or isinstance(other, str)
297-
or isinstance(other, (numbers.Number, np.bool_))):
298-
if divmod:
298+
or isinstance(other, (numbers.Number, np.bool_))
299+
or isinstance(other, np.ndarray) and not other.shape):
300+
# Need the other.shape clause to handle NumPy scalars,
301+
# since we do a setitem on `out` below, which
302+
# won't work for NumPy scalars.
303+
if is_divmod:
299304
return NA, NA
300305
else:
301306
return NA
302307

308+
elif isinstance(other, np.ndarray):
309+
out = np.empty(other.shape, dtype=object)
310+
out[:] = NA
311+
312+
if is_divmod:
313+
return out, out.copy()
314+
else:
315+
return out
316+
303317
return NotImplemented
304318

305319
method.__name__ = name
@@ -369,8 +383,8 @@ class NAType(C_NAType):
369383
__rfloordiv__ = _create_binary_propagating_op("__rfloordiv__")
370384
__mod__ = _create_binary_propagating_op("__mod__")
371385
__rmod__ = _create_binary_propagating_op("__rmod__")
372-
__divmod__ = _create_binary_propagating_op("__divmod__", divmod=True)
373-
__rdivmod__ = _create_binary_propagating_op("__rdivmod__", divmod=True)
386+
__divmod__ = _create_binary_propagating_op("__divmod__", is_divmod=True)
387+
__rdivmod__ = _create_binary_propagating_op("__rdivmod__", is_divmod=True)
374388
# __lshift__ and __rshift__ are not implemented
375389

376390
__eq__ = _create_binary_propagating_op("__eq__")
@@ -397,6 +411,8 @@ class NAType(C_NAType):
397411
return type(other)(1)
398412
else:
399413
return NA
414+
elif isinstance(other, np.ndarray):
415+
return np.where(other == 0, other.dtype.type(1), NA)
400416

401417
return NotImplemented
402418

@@ -408,6 +424,8 @@ class NAType(C_NAType):
408424
return other
409425
else:
410426
return NA
427+
elif isinstance(other, np.ndarray):
428+
return np.where((other == 1) | (other == -1), other, NA)
411429

412430
return NotImplemented
413431

@@ -440,6 +458,31 @@ class NAType(C_NAType):
440458

441459
__rxor__ = __xor__
442460

461+
__array_priority__ = 1000
462+
_HANDLED_TYPES = (np.ndarray, numbers.Number, str, np.bool_)
463+
464+
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
465+
types = self._HANDLED_TYPES + (NAType,)
466+
for x in inputs:
467+
if not isinstance(x, types):
468+
return NotImplemented
469+
470+
if method != "__call__":
471+
raise ValueError(f"ufunc method '{method}' not supported for NA")
472+
result = maybe_dispatch_ufunc_to_dunder_op(
473+
self, ufunc, method, *inputs, **kwargs
474+
)
475+
if result is NotImplemented:
476+
# For a NumPy ufunc that's not a binop, like np.logaddexp
477+
index = [i for i, x in enumerate(inputs) if x is NA][0]
478+
result = np.broadcast_arrays(*inputs)[index]
479+
if result.ndim == 0:
480+
result = result.item()
481+
if ufunc.nout > 1:
482+
result = (NA,) * ufunc.nout
483+
484+
return result
485+
443486

444487
C_NA = NAType() # C-visible
445488
NA = C_NA # Python-visible
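
The `_create_binary_propagating_op` factory changed above generates one dunder method per operator name; each method returns the missing-value singleton for supported operands and `NotImplemented` otherwise. The pure-Python sketch below shows only that closure pattern. `MissingType` and `MISSING` are hypothetical stand-ins, and the ndarray branch and the `__pow__` special cases from the diff are deliberately left out.

```python
import numbers


class MissingType:
    """Toy propagating singleton (hypothetical; not pandas' NAType)."""

    def __repr__(self):
        return "<MISSING>"


def _create_binary_propagating_op(name, is_divmod=False):
    def method(self, other):
        if other is MISSING or isinstance(other, (str, numbers.Number)):
            # divmod-style operators return a pair, everything else a scalar.
            return (MISSING, MISSING) if is_divmod else MISSING
        # Let Python try the other operand, then raise TypeError.
        return NotImplemented

    method.__name__ = name
    return method


# Attach the generated methods to the class, one per operator.
for _name in ("__add__", "__radd__", "__mul__", "__rmul__"):
    setattr(MissingType, _name, _create_binary_propagating_op(_name))
MissingType.__divmod__ = _create_binary_propagating_op("__divmod__", is_divmod=True)
MissingType.__rdivmod__ = _create_binary_propagating_op("__rdivmod__", is_divmod=True)

MISSING = MissingType()

print(MISSING + 1, 2 * MISSING, divmod(MISSING, 3))
```

Naming the flag `is_divmod`, as the diff does, avoids shadowing the built-in `divmod` inside the factory.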
