
Commit 8afb6e2

Merge remote-tracking branch 'upstream/master' into GH35612

2 parents 21a6fbb + 2067d7e, commit 8afb6e2

36 files changed, +230 -171 lines

doc/source/user_guide/groupby.rst (+3 -5)

@@ -87,11 +87,9 @@ The mapping can be specified many different ways:
 * A Python function, to be called on each of the axis labels.
 * A list or NumPy array of the same length as the selected axis.
 * A dict or ``Series``, providing a ``label -> group name`` mapping.
-* For ``DataFrame`` objects, a string indicating a column to be used to group.
-  Of course ``df.groupby('A')`` is just syntactic sugar for
-  ``df.groupby(df['A'])``, but it makes life simpler.
-* For ``DataFrame`` objects, a string indicating an index level to be used to
-  group.
+* For ``DataFrame`` objects, a string indicating either a column name or
+  an index level name to be used to group.
+* ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 * A list of any of the above things.

 Collectively we refer to the grouping objects as the **keys**. For example,
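
As an illustrative aside (not part of the diff), the equivalence in the rewritten bullets can be sketched with a small DataFrame; the column, index-level, and value names below are invented:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3]},
                  index=pd.Index(["r1", "r2", "r3"], name="lvl"))

df.groupby("A").sum()       # "A" resolves to the column of that name
df.groupby(df["A"]).sum()   # equivalent spelling, per the syntactic-sugar bullet
df.groupby("lvl").sum()     # a string may also name an index level
```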

doc/source/user_guide/indexing.rst (+12 -7)

@@ -313,8 +313,10 @@ Selection by label

 .. warning::

-    Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list with missing labels. In the future
-    this will raise a ``KeyError``. See :ref:`list-like Using loc with missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>`.
+    .. versionchanged:: 1.0.0
+
+    Pandas will raise a ``KeyError`` if indexing with a list with missing labels. See :ref:`list-like Using loc with
+    missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>`.

 pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
 Every label asked for must be in the index, or a ``KeyError`` will be raised.

@@ -578,8 +580,9 @@ IX indexer is deprecated

 .. warning::

-    Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc``
-    and ``.loc`` indexers.
+    .. versionchanged:: 1.0.0
+
+    The ``.ix`` indexer was removed, in favor of the more strict ``.iloc`` and ``.loc`` indexers.

 ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide
 to index *positionally* OR via *labels* depending on the data type of the index. This has caused quite a

@@ -636,11 +639,13 @@ Indexing with list with missing labels is deprecated

 .. warning::

-    Starting in 0.21.0, using ``.loc`` or ``[]`` with a list with one or more missing labels, is deprecated, in favor of ``.reindex``.
+    .. versionchanged:: 1.0.0
+
+    Using ``.loc`` or ``[]`` with a list with one or more missing labels will no longer reindex, in favor of ``.reindex``.

 In prior versions, using ``.loc[list-of-labels]`` would work as long as *at least 1* of the keys was found (otherwise it
-would raise a ``KeyError``). This behavior is deprecated and will show a warning message pointing to this section. The
-recommended alternative is to use ``.reindex()``.
+would raise a ``KeyError``). This behavior was changed and will now raise a ``KeyError`` if at least one label is missing.
+The recommended alternative is to use ``.reindex()``.

 For example.
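
For readers of the updated warning, a small sketch of the behaviour it describes (the Series and labels are invented for illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

s.loc[["a", "b"]]      # fine: every requested label exists
# s.loc[["a", "d"]]    # raises KeyError as of 1.0.0, since "d" is missing
s.reindex(["a", "d"])  # the recommended alternative: "d" comes back as NaN
```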

doc/source/user_guide/io.rst (+3 -10)

@@ -3024,19 +3024,12 @@ It is often the case that users will insert columns to do temporary computations
 in Excel and you may not want to read in those columns. ``read_excel`` takes
 a ``usecols`` keyword to allow you to specify a subset of columns to parse.

-.. deprecated:: 0.24.0
+.. versionchanged:: 1.0.0

-   Passing in an integer for ``usecols`` has been deprecated. Please pass in a list
+   Passing in an integer for ``usecols`` will no longer work. Please pass in a list
    of ints from 0 to ``usecols`` inclusive instead.

-If ``usecols`` is an integer, then it is assumed to indicate the last column
-to be parsed.
-
-.. code-block:: python
-
-   pd.read_excel('path_to_file.xls', 'Sheet1', usecols=2)
-
-You can also specify a comma-delimited set of Excel columns and ranges as a string:
+You can specify a comma-delimited set of Excel columns and ranges as a string:

 .. code-block:: python
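
A hedged sketch of the ``usecols`` forms that remain valid after this change (the file and sheet names are placeholders):

```python
import pandas as pd

# a list of integer column positions
pd.read_excel("path_to_file.xls", "Sheet1", usecols=[0, 1, 2])

# a comma-delimited string of Excel column letters and ranges
pd.read_excel("path_to_file.xls", "Sheet1", usecols="A,C:E")
```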

doc/source/user_guide/timeseries.rst (+4 -4)

@@ -327,11 +327,11 @@ which can be specified. These are computed from the starting point specified by
 that was discussed :ref:`above<timeseries.converting.format>`). The
 available units are listed on the documentation for :func:`pandas.to_datetime`.

+.. versionchanged:: 1.0.0
+
 Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp
-with the ``tz`` argument specified will currently localize the epoch timestamps to UTC
-first then convert the result to the specified time zone. However, this behavior
-is :ref:`deprecated <whatsnew_0240.deprecations.integer_tz>`, and if you have
-epochs in wall time in another timezone, it is recommended to read the epochs
+with the ``tz`` argument specified will raise a ValueError. If you have
+epochs in wall time in another timezone, you can read the epochs
 as timezone-naive timestamps and then localize to the appropriate timezone:

 .. ipython:: python
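
A minimal sketch of the recommended pattern from the rewritten paragraph (the epoch values and timezone are illustrative):

```python
import pandas as pd

# read epochs as timezone-naive timestamps ...
naive = pd.to_datetime([1490195805, 1490195860], unit="s")

# ... then localize to the timezone the wall-clock values were recorded in
localized = naive.tz_localize("US/Eastern")
```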

doc/source/whatsnew/v1.2.0.rst (+2 -2)

@@ -100,8 +100,8 @@ For example:

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
-
 - Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a ``Series`` or ``DataFrame`` (:issue:`28394`)
+- :meth:`DataFrame.applymap` now supports ``na_action`` (:issue:`23803`)
 - :class:`Index` with object dtype supports division and multiplication (:issue:`34160`)
 - :meth:`DataFrame.explode` and :meth:`Series.explode` now support exploding of sets (:issue:`35614`)
 -

@@ -334,7 +334,7 @@ Sparse
 ExtensionArray
 ^^^^^^^^^^^^^^

--
+- Fixed bug where a :class:`DataFrame` column set to a scalar extension type via a dict instantiation was considered an object type rather than the extension type (:issue:`35965`)
 -

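
As context for the ``na_action`` entry added above, a hedged usage sketch (the frame is invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# with na_action="ignore", NaN is passed through without calling the function
df.applymap(lambda x: len(str(x)), na_action="ignore")
```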

pandas/_libs/index.pyx (+1 -1)

@@ -260,7 +260,7 @@ cdef class IndexEngine:
     def get_indexer_non_unique(self, targets):
         """
         Return an indexer suitable for taking from a non unique index
-        return the labels in the same order ast the target
+        return the labels in the same order as the target
         and a missing indexer into the targets (which correspond
         to the -1 indices in the results
         """

pandas/_libs/lib.pyx (+7 -1)

@@ -2377,14 +2377,17 @@ def map_infer_mask(ndarray arr, object f, const uint8_t[:] mask, bint convert=Tr

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def map_infer(ndarray arr, object f, bint convert=True):
+def map_infer(ndarray arr, object f, bint convert=True, bint ignore_na=False):
     """
     Substitute for np.vectorize with pandas-friendly dtype inference.

     Parameters
     ----------
     arr : ndarray
     f : function
+    convert : bint
+    ignore_na : bint
+        If True, NA values will not have f applied

     Returns
     -------

@@ -2398,6 +2401,9 @@ def map_infer(ndarray arr, object f, bint convert=True):
     n = len(arr)
     result = np.empty(n, dtype=object)
     for i in range(n):
+        if ignore_na and checknull(arr[i]):
+            result[i] = arr[i]
+            continue
         val = f(arr[i])

         if cnp.PyArray_IsZeroDim(val):
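
The ``ignore_na`` branch added above amounts to the following pure-Python behaviour; this is only a sketch, standing in ``pandas.isna`` for the Cython ``checknull`` helper:

```python
import numpy as np
import pandas as pd

def map_infer_sketch(arr, f, ignore_na=False):
    # Apply f element-wise; with ignore_na=True, NA values are copied
    # through unchanged instead of being handed to f.
    result = np.empty(len(arr), dtype=object)
    for i, val in enumerate(arr):
        if ignore_na and pd.isna(val):
            result[i] = val
            continue
        result[i] = f(val)
    return result
```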

pandas/compat/numpy/function.py (+2 -2)

@@ -21,7 +21,7 @@
 from distutils.version import LooseVersion
 from typing import Any, Dict, Optional, Union

-from numpy import __version__ as _np_version, ndarray
+from numpy import __version__, ndarray

 from pandas._libs.lib import is_bool, is_integer
 from pandas.errors import UnsupportedFunctionCall

@@ -122,7 +122,7 @@ def validate_argmax_with_skipna(skipna, args, kwargs):
 ARGSORT_DEFAULTS["kind"] = "quicksort"
 ARGSORT_DEFAULTS["order"] = None

-if LooseVersion(_np_version) >= LooseVersion("1.17.0"):
+if LooseVersion(__version__) >= LooseVersion("1.17.0"):
     # GH-26361. NumPy added radix sort and changed default to None.
     ARGSORT_DEFAULTS["kind"] = None

pandas/core/algorithms.py (+3 -3)

@@ -262,7 +262,7 @@ def _get_values_for_rank(values):
     return values


-def _get_data_algo(values):
+def get_data_algo(values):
     values = _get_values_for_rank(values)

     ndtype = _check_object_for_strings(values)

@@ -491,7 +491,7 @@ def factorize_array(
     codes : ndarray
     uniques : ndarray
     """
-    hash_klass, values = _get_data_algo(values)
+    hash_klass, values = get_data_algo(values)

     table = hash_klass(size_hint or len(values))
     uniques, codes = table.factorize(

@@ -2086,7 +2086,7 @@ def sort_mixed(values):

     if sorter is None:
         # mixed types
-        hash_klass, values = _get_data_algo(values)
+        hash_klass, values = get_data_algo(values)
         t = hash_klass(len(values))
         t.map_locations(values)
         sorter = ensure_platform_int(t.lookup(ordered))

pandas/core/arrays/_mixins.py (+9 -2)

@@ -6,7 +6,7 @@
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import cache_readonly, doc

-from pandas.core.algorithms import searchsorted, take, unique
+from pandas.core.algorithms import take, unique
 from pandas.core.array_algos.transforms import shift
 from pandas.core.arrays.base import ExtensionArray

@@ -102,6 +102,9 @@ def T(self: _T) -> _T:

     # ------------------------------------------------------------------------

+    def _values_for_argsort(self):
+        return self._ndarray
+
     def copy(self: _T) -> _T:
         new_data = self._ndarray.copy()
         return self._from_backing_data(new_data)

@@ -135,7 +138,11 @@ def _concat_same_type(cls, to_concat, axis: int = 0):

     @doc(ExtensionArray.searchsorted)
     def searchsorted(self, value, side="left", sorter=None):
-        return searchsorted(self._ndarray, value, side=side, sorter=sorter)
+        value = self._validate_searchsorted_value(value)
+        return self._ndarray.searchsorted(value, side=side, sorter=sorter)
+
+    def _validate_searchsorted_value(self, value):
+        return value

     @doc(ExtensionArray.shift)
     def shift(self, periods=1, fill_value=None, axis=0):
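
A hedged note on this refactor: backed arrays now funnel ``searchsorted`` through the ``_validate_searchsorted_value`` hook and the backing ndarray, so a subclass such as ``Categorical`` (whose own override is removed further down in this commit) inherits the mixin path. An illustrative call with invented data:

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c"], ordered=True)

# The lookup value is first validated/coerced by the subclass hook,
# then searched in the backing ndarray of codes.
cat.searchsorted("b")  # expected to return 1
```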

pandas/core/arrays/categorical.py (+14 -29)

@@ -12,7 +12,7 @@
 from pandas._libs import NaT, algos as libalgos, hashtable as htable, lib
 from pandas._typing import ArrayLike, Dtype, Ordered, Scalar
 from pandas.compat.numpy import function as nv
-from pandas.util._decorators import cache_readonly, deprecate_kwarg, doc
+from pandas.util._decorators import cache_readonly, deprecate_kwarg
 from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs

 from pandas.core.dtypes.cast import (

@@ -43,14 +43,9 @@
 from pandas.core import ops
 from pandas.core.accessor import PandasDelegate, delegate_names
 import pandas.core.algorithms as algorithms
-from pandas.core.algorithms import _get_data_algo, factorize, take_1d, unique1d
+from pandas.core.algorithms import factorize, get_data_algo, take_1d, unique1d
 from pandas.core.arrays._mixins import NDArrayBackedExtensionArray
-from pandas.core.base import (
-    ExtensionArray,
-    NoNewAttributesMixin,
-    PandasObject,
-    _shared_docs,
-)
+from pandas.core.base import ExtensionArray, NoNewAttributesMixin, PandasObject
 import pandas.core.common as com
 from pandas.core.construction import array, extract_array, sanitize_array
 from pandas.core.indexers import check_array_indexer, deprecate_ndim_indexing

@@ -63,6 +58,7 @@

 def _cat_compare_op(op):
     opname = f"__{op.__name__}__"
+    fill_value = True if op is operator.ne else False

     @unpack_zerodim_and_defer(opname)
     def func(self, other):

@@ -97,26 +93,23 @@ def func(self, other):
             else:
                 other_codes = other._codes

-            f = getattr(self._codes, opname)
-            ret = f(other_codes)
+            ret = op(self._codes, other_codes)
             mask = (self._codes == -1) | (other_codes == -1)
             if mask.any():
-                # In other series, the leads to False, so do that here too
-                if opname == "__ne__":
-                    ret[(self._codes == -1) & (other_codes == -1)] = True
-                else:
-                    ret[mask] = False
+                ret[mask] = fill_value
             return ret

         if is_scalar(other):
             if other in self.categories:
                 i = self.categories.get_loc(other)
-                ret = getattr(self._codes, opname)(i)
+                ret = op(self._codes, i)

                 if opname not in {"__eq__", "__ge__", "__gt__"}:
-                    # check for NaN needed if we are not equal or larger
+                    # GH#29820 performance trick; get_loc will always give i>=0,
+                    # so in the cases (__ne__, __le__, __lt__) the setting
+                    # here is a no-op, so can be skipped.
                     mask = self._codes == -1
-                    ret[mask] = False
+                    ret[mask] = fill_value
                 return ret
             else:
                 return ops.invalid_comparison(self, other, op)

@@ -1315,11 +1308,6 @@ def memory_usage(self, deep=False):
         """
         return self._codes.nbytes + self.dtype.categories.memory_usage(deep=deep)

-    @doc(_shared_docs["searchsorted"], klass="Categorical")
-    def searchsorted(self, value, side="left", sorter=None):
-        value = self._validate_searchsorted_value(value)
-        return self.codes.searchsorted(value, side=side, sorter=sorter)
-
     def isna(self):
         """
         Detect missing values

@@ -1428,9 +1416,6 @@ def check_for_ordered(self, op):
             "Categorical to an ordered one\n"
         )

-    def _values_for_argsort(self):
-        return self._codes
-
     def argsort(self, ascending=True, kind="quicksort", **kwargs):
         """
         Return the indices that would sort the Categorical.

@@ -1879,7 +1864,7 @@ def __getitem__(self, key):
         if result.ndim > 1:
             deprecate_ndim_indexing(result)
             return result
-        return self._constructor(result, dtype=self.dtype, fastpath=True)
+        return self._from_backing_data(result)

     def __setitem__(self, key, value):
         """

@@ -2546,8 +2531,8 @@ def _get_codes_for_values(values, categories):

     # Only hit here when we've already coerced to object dtypee.

-    hash_klass, vals = _get_data_algo(values)
-    _, cats = _get_data_algo(categories)
+    hash_klass, vals = get_data_algo(values)
+    _, cats = get_data_algo(categories)
     t = hash_klass(len(cats))
     t.map_locations(cats)
     return coerce_indexer_dtype(t.lookup(vals), cats)
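
To illustrate the ``fill_value`` consolidation in ``_cat_compare_op`` with a sketch (data invented): comparisons against a missing code fill in ``False``, except ``!=`` which fills in ``True``:

```python
import pandas as pd

cat = pd.Categorical(["a", None, "b"], categories=["a", "b"])

cat == "a"  # [True, False, False] -- the missing entry compares False
cat != "a"  # [False, True, True]  -- fill_value is True only for __ne__
```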
