Commit db2f38a

Merge branch 'pandas-dev:master' into value_counts-with-duplicate-labels
2 parents 6b03989 + 9a4fcea commit db2f38a

File tree

108 files changed

+1365
-822
lines changed

.pre-commit-config.yaml

+3-3
@@ -9,7 +9,7 @@ repos:
     - id: absolufy-imports
       files: ^pandas/
 - repo: https://github.com/python/black
-  rev: 21.11b1
+  rev: 21.12b0
   hooks:
     - id: black
 - repo: https://github.com/codespell-project/codespell
@@ -19,7 +19,7 @@ repos:
       types_or: [python, rst, markdown]
       files: ^(pandas|doc)/
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.0.1
+  rev: v4.1.0
   hooks:
     - id: debug-statements
     - id: end-of-file-fixer
@@ -49,7 +49,7 @@ repos:
   hooks:
     - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v2.29.1
+  rev: v2.31.0
   hooks:
     - id: pyupgrade
       args: [--py38-plus]

doc/source/reference/indexing.rst

-1
@@ -170,7 +170,6 @@ Numeric Index
    :toctree: api/
    :template: autosummary/class_without_autosummary.rst

-   NumericIndex
    RangeIndex
    Int64Index
    UInt64Index

doc/source/user_guide/advanced.rst

+2-35
@@ -852,10 +852,9 @@ Int64Index and RangeIndex
 ~~~~~~~~~~~~~~~~~~~~~~~~~

 .. deprecated:: 1.4.0
-    In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+    In pandas 2.0, :class:`Index` will become the default index type for numeric types
     instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
     are therefore deprecated and will be removed in a futire version.
-    See :ref:`here <advanced.numericindex>` for more.
     ``RangeIndex`` will not be removed, as it represents an optimized version of an integer index.

 :class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
@@ -870,10 +869,9 @@ Float64Index
 ~~~~~~~~~~~~

 .. deprecated:: 1.4.0
-    :class:`NumericIndex` will become the default index type for numeric types in the future
+    :class:`Index` will become the default index type for numeric types in the future
     instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
     are therefore deprecated and will be removed in a future version of Pandas.
-    See :ref:`here <advanced.numericindex>` for more.
     ``RangeIndex`` will not be removed as it represents an optimized version of an integer index.

 By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
@@ -971,37 +969,6 @@ If you need integer based selection, you should use ``iloc``:
     dfir.iloc[0:5]


-.. _advanced.numericindex:
-
-NumericIndex
-~~~~~~~~~~~~
-
-.. versionadded:: 1.4.0
-
-.. note::
-
-    In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
-    instead of :class:`Int64Index`, :class:`Float64Index` and :class:`UInt64Index` and those index types
-    are therefore deprecated and will be removed in a future version.
-    :class:`RangeIndex` will not be removed as it represents an optimized version of an integer index.
-
-:class:`NumericIndex` is an index type that can hold data of any numpy int/uint/float dtype. For example:
-
-.. ipython:: python
-
-    idx = pd.NumericIndex([1, 2, 4, 5], dtype="int8")
-    idx
-    ser = pd.Series(range(4), index=idx)
-    ser
-
-``NumericIndex`` works the same way as the existing ``Int64Index``, ``Float64Index`` and
-``UInt64Index`` except that it can hold any numpy int, uint or float dtype.
-
-Until Pandas 2.0, you will have to call ``NumericIndex`` explicitly in order to use it, like in the example above.
-In the future, ``NumericIndex`` will become the default pandas numeric index type and will automatically be used where appropriate.
-
-Please notice that ``NumericIndex`` *can not* hold Pandas numeric dtypes (:class:`Int64Dtype`, :class:`Int32Dtype` etc.).
-
 .. _advanced.intervalindex:

 IntervalIndex

doc/source/whatsnew/v0.16.2.rst

+1
@@ -62,6 +62,7 @@ When the function you wish to apply takes its data anywhere other than the first
 of ``(function, keyword)`` indicating where the DataFrame should flow. For example:

 .. ipython:: python
+    :okwarning:

     import statsmodels.formula.api as sm

doc/source/whatsnew/v1.4.0.rst

+57-37
@@ -40,55 +40,43 @@ This made it difficult to determine where the warning was being generated from.
     A value is trying to be set on a copy of a slice from a DataFrame.


-.. _whatsnew_140.enhancements.numeric_index:

-More flexible numeric dtypes for indexes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Until now, it has only been possible to create numeric indexes with int64/float64/uint64 dtypes.
-It is now possible to create an index of any numpy int/uint/float dtype using the new :class:`NumericIndex` index type (:issue:`41153`):
+.. _whatsnew_140.enhancements.ExtensionIndex:

-.. ipython:: python
+Index can hold arbitrary ExtensionArrays
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-    pd.NumericIndex([1, 2, 3], dtype="int8")
-    pd.NumericIndex([1, 2, 3], dtype="uint32")
-    pd.NumericIndex([1, 2, 3], dtype="float32")
+Until now, passing a custom :class:`ExtensionArray` to ``pd.Index`` would cast the
+array to ``object`` dtype. Now :class:`Index` can directly hold arbitrary ExtensionArrays (:issue:`43930`).

-In order to maintain backwards compatibility, calls to the base :class:`Index` will currently
-return :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`, where relevant.
-For example, the code below returns an ``Int64Index`` with dtype ``int64``:
+*Previous behavior*:

-.. code-block:: ipython
+.. ipython:: python

-    In [1]: pd.Index([1, 2, 3], dtype="int8")
-    Int64Index([1, 2, 3], dtype='int64')
+    arr = pd.array([1, 2, pd.NA])
+    idx = pd.Index(arr)

-but will in a future version return a :class:`NumericIndex` with dtype ``int8``.
+In the old behavior, ``idx`` would be object-dtype:

-More generally, currently, all operations that until now have
-returned :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index` will
-continue to so. This means, that in order to use ``NumericIndex`` in the current version, you
-will have to call ``NumericIndex`` explicitly. For example the below series will have an ``Int64Index``:
+*Previous behavior*:

 .. code-block:: ipython

-    In [2]: ser = pd.Series([1, 2, 3], index=[1, 2, 3])
-    In [3]: ser.index
-    Int64Index([1, 2, 3], dtype='int64')
+    In [1]: idx
+    Out[1]: Index([1, 2, <NA>], dtype='object')

-Instead, if you want to use a ``NumericIndex``, you should do:
+With the new behavior, we keep the original dtype:

-.. ipython:: python
+*New behavior*:

-    idx = pd.NumericIndex([1, 2, 3], dtype="int8")
-    ser = pd.Series([1, 2, 3], index=idx)
-    ser.index
+.. ipython:: python

-In a future version of Pandas, :class:`NumericIndex` will become the default numeric index type and
-``Int64Index``, ``UInt64Index`` and ``Float64Index`` are therefore deprecated and will
-be removed in the future, see :ref:`here <whatsnew_140.deprecations.int64_uint64_float64index>` for more.
+    idx

-See :ref:`here <advanced.numericindex>` for more about :class:`NumericIndex`.
+One exception to this is ``SparseArray``, which will continue to cast to numpy
+dtype until pandas 2.0. At that point it will retain its dtype like other
+ExtensionArrays.

 .. _whatsnew_140.enhancements.styler:

@@ -236,7 +224,7 @@ Other enhancements
 - :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`)
 - :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`).
 - Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`)
--
+- :meth:`DataFrame.to_sql` now returns an ``int`` of the number of written rows (:issue:`23998`)


 .. ---------------------------------------------------------------------------
@@ -504,12 +492,33 @@ Deprecations

 Deprecated Int64Index, UInt64Index & Float64Index
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index` have been deprecated
-in favor of the new :class:`NumericIndex` and will be removed in Pandas 2.0 (:issue:`43028`).
+in favor of the base :class:`Index` class and will be removed in Pandas 2.0 (:issue:`43028`).
+
+For constructing a numeric index, you can use the base :class:`Index` class instead
+specifying the data type (which will also work on older pandas releases):
+
+.. code-block:: python
+
+    # replace
+    pd.Int64Index([1, 2, 3])
+    # with
+    pd.Index([1, 2, 3], dtype="int64")
+
+For checking the data type of an index object, you can replace ``isinstance``
+checks with checking the ``dtype``:
+
+.. code-block:: python
+
+    # replace
+    isinstance(idx, pd.Int64Index)
+    # with
+    idx.dtype == "int64"

 Currently, in order to maintain backward compatibility, calls to
 :class:`Index` will continue to return :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`
-when given numeric data, but in the future, a :class:`NumericIndex` will be returned.
+when given numeric data, but in the future, an :class:`Index` will be returned.

 *Current behavior*:

@@ -525,9 +534,9 @@ when given numeric data, but in the future, a :class:`NumericIndex` will be retu
 .. code-block:: ipython

     In [3]: pd.Index([1, 2, 3], dtype="int32")
-    Out [3]: NumericIndex([1, 2, 3], dtype='int32')
+    Out [3]: Index([1, 2, 3], dtype='int32')
     In [4]: pd.Index([1, 2, 3], dtype="uint64")
-    Out [4]: NumericIndex([1, 2, 3], dtype='uint64')
+    Out [4]: Index([1, 2, 3], dtype='uint64')


 .. _whatsnew_140.deprecations.frame_series_append:
@@ -603,6 +612,7 @@ Other Deprecations
 - Deprecated passing non boolean argument to sort in :func:`concat` (:issue:`41518`)
 - Deprecated passing arguments as positional for :func:`read_fwf` other than ``filepath_or_buffer`` (:issue:`41485`):
 - Deprecated passing ``skipna=None`` for :meth:`DataFrame.mad` and :meth:`Series.mad`, pass ``skipna=True`` instead (:issue:`44580`)
+- Deprecated the behavior of :func:`to_datetime` with the string "now" with ``utc=False``; in a future version this will match ``Timestamp("now")``, which in turn matches :meth:`Timestamp.now` returning the local time (:issue:`18705`)
 - Deprecated :meth:`DateOffset.apply`, use ``offset + other`` instead (:issue:`44522`)
 - Deprecated parameter ``names`` in :meth:`Index.copy` (:issue:`44916`)
 - A deprecation warning is now shown for :meth:`DataFrame.to_latex` indicating the arguments signature may change and emulate more the arguments to :meth:`.Styler.to_latex` in future versions (:issue:`44411`)
@@ -619,8 +629,10 @@ Other Deprecations
 - Deprecated ``numeric_only=None`` in :meth:`DataFrame.rank`; in a future version ``numeric_only`` must be either ``True`` or ``False`` (the default) (:issue:`45036`)
 - Deprecated the behavior of :meth:`Timestamp.utcfromtimestamp`, in the future it will return a timezone-aware UTC :class:`Timestamp` (:issue:`22451`)
 - Deprecated :meth:`NaT.freq` (:issue:`45071`)
+- Deprecated behavior of :class:`Series` and :class:`DataFrame` construction when passed float-dtype data containing ``NaN`` and an integer dtype ignoring the dtype argument; in a future version this will raise (:issue:`40110`)
 -

+
 .. ---------------------------------------------------------------------------

 .. _whatsnew_140.performance:
@@ -718,6 +730,8 @@ Timedelta
 ^^^^^^^^^
 - Bug in division of all-``NaT`` :class:`TimeDeltaIndex`, :class:`Series` or :class:`DataFrame` column with object-dtype arraylike of numbers failing to infer the result as timedelta64-dtype (:issue:`39750`)
 - Bug in floor division of ``timedelta64[ns]`` data with a scalar returning garbage values (:issue:`44466`)
+- Bug in :class:`Timedelta` now properly taking into account any nanoseconds contribution of any kwarg (:issue:`43764`)
+-

 Timezones
 ^^^^^^^^^
@@ -800,6 +814,7 @@ Indexing
 - Bug in :meth:`IntervalIndex.get_indexer_non_unique` not handling targets of ``dtype`` 'object' with NaNs correctly (:issue:`44482`)
 - Fixed regression where a single column ``np.matrix`` was no longer coerced to a 1d ``np.ndarray`` when added to a :class:`DataFrame` (:issue:`42376`)
 - Bug in :meth:`Series.__getitem__` with a :class:`CategoricalIndex` of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (:issue:`15470`, :issue:`14865`)
+- Bug in :meth:`Series.__setitem__` when setting floats or integers into integer-dtype series failing to upcast when necessary to retain precision (:issue:`45121`)
 -

 Missing
@@ -870,6 +885,8 @@ Period
 - Bug in :meth:`PeriodIndex.to_timestamp` when the index has ``freq="B"`` inferring ``freq="D"`` for its result instead of ``freq="B"`` (:issue:`44105`)
 - Bug in :class:`Period` constructor incorrectly allowing ``np.timedelta64("NaT")`` (:issue:`44507`)
 - Bug in :meth:`PeriodIndex.to_timestamp` giving incorrect values for indexes with non-contiguous data (:issue:`44100`)
+- Bug in :meth:`Series.where` with ``PeriodDtype`` incorrectly raising when the ``where`` call should not replace anything (:issue:`45135`)
+
 -

 Plotting
@@ -899,6 +916,7 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.nth` failing on ``axis=1`` (:issue:`43926`)
 - Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not respecting right bound on centered datetime-like windows, if the index contain duplicates (:issue:`3944`)
 - Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
+- Bug in :meth:`Groupby.nunique` not respecting ``observed=True`` for Categorical grouping columns (:issue:`45128`)
 - Bug in :meth:`GroupBy.head` and :meth:`GroupBy.tail` not dropping groups with ``NaN`` when ``dropna=True`` (:issue:`45089`)
 - Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`#44821`)
 - Bug in :meth:`Groupby.rolling` when non-monotonic data passed, fails to correctly raise ``ValueError`` (:issue:`43909`)
@@ -924,6 +942,7 @@ Reshaping
 - Bug in :meth:`Series.unstack` with object doing unwanted type inference on resulting columns (:issue:`44595`)
 - Bug in :class:`MultiIndex` failing join operations with overlapping ``IntervalIndex`` levels (:issue:`44096`)
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` results is different ``dtype`` based on ``regex`` parameter (:issue:`44864`)
+- Bug in :meth:`DataFrame.pivot` with ``index=None`` when the :class:`DataFrame` index was a :class:`MultiIndex` (:issue:`23955`)

 Sparse
 ^^^^^^
@@ -940,6 +959,7 @@ ExtensionArray
 - Bug in :func:`array` failing to preserve :class:`PandasArray` (:issue:`43887`)
 - NumPy ufuncs ``np.abs``, ``np.positive``, ``np.negative`` now correctly preserve dtype when called on ExtensionArrays that implement ``__abs__, __pos__, __neg__``, respectively. In particular this is fixed for :class:`TimedeltaArray` (:issue:`43899`, :issue:`23316`)
 - NumPy ufuncs ``np.minimum.reduce`` ``np.maximum.reduce``, ``np.add.reduce``, and ``np.prod.reduce`` now work correctly instead of raising ``NotImplementedError`` on :class:`Series` with ``IntegerDtype`` or ``FloatDtype`` (:issue:`43923`, :issue:`44793`)
+- NumPy ufuncs with ``out`` keyword are now supported by arrays with ``IntegerDtype`` and ``FloatingDtype`` (:issue:`45122`)
 - Avoid raising ``PerformanceWarning`` about fragmented DataFrame when using many columns with an extension dtype (:issue:`44098`)
 - Bug in :class:`IntegerArray` and :class:`FloatingArray` construction incorrectly coercing mismatched NA values (e.g. ``np.timedelta64("NaT")``) to numeric NA (:issue:`44514`)
 - Bug in :meth:`BooleanArray.__eq__` and :meth:`BooleanArray.__ne__` raising ``TypeError`` on comparison with an incompatible type (like a string). This caused :meth:`DataFrame.replace` to sometimes raise a ``TypeError`` if a nullable boolean column was included (:issue:`44499`)
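
Reviewer note on the Int64Index deprecation text above: the two replacements it recommends can be tried together in a short script. This is a minimal sketch, assuming pandas is installed; the variable names are illustrative, and `pd.api.types.is_integer_dtype` is shown only as one possible broader check, not something this commit prescribes.

```python
import pandas as pd

# Construct through the base class with an explicit dtype
# instead of calling pd.Int64Index directly.
idx = pd.Index([1, 2, 3], dtype="int64")

# Check the dtype instead of using isinstance(idx, pd.Int64Index).
is_int64 = idx.dtype == "int64"
print(is_int64)  # True

# A broader check that accepts any integer width (illustrative alternative).
is_any_int = pd.api.types.is_integer_dtype(idx.dtype)
print(is_any_int)  # True
```

Checking the dtype rather than the class keeps working whether `pd.Index` returns an `Int64Index` (current behavior per the changelog) or a plain `Index` (future behavior).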

environment.yml

+4-1
@@ -120,6 +120,9 @@ dependencies:
   - tabulate>=0.8.3 # DataFrame.to_markdown
   - natsort # DataFrame.sort_values
   - pip:
-    - git+https://github.com/pydata/pydata-sphinx-theme.git@master
+    #issue with building environment in conda on windows. Issue: https://github.com/pandas-dev/pandas/issues/45123
+    #issue with pydata-sphix-theme on windows. Issue: https://github.com/pydata/pydata-sphinx-theme/issues/523
+    #using previous stable version as workaround
+    - git+https://github.com/pydata/pydata-sphinx-theme.git@41764f5
     - pandas-dev-flaker==0.2.0
     - pytest-cython

pandas/__init__.py

+1-3
@@ -73,7 +73,6 @@
     Index,
     CategoricalIndex,
     RangeIndex,
-    NumericIndex,
     MultiIndex,
     IntervalIndex,
     TimedeltaIndex,
@@ -199,7 +198,7 @@ def __getattr__(name):
         warnings.warn(
             f"pandas.{name} is deprecated "
             "and will be removed from pandas in a future version. "
-            "Use pandas.NumericIndex with the appropriate dtype instead.",
+            "Use pandas.Index with the appropriate dtype instead.",
             FutureWarning,
             stacklevel=2,
         )
@@ -335,7 +334,6 @@ def __getattr__(name):
     "NA",
     "NaT",
     "NamedAgg",
-    "NumericIndex",
     "Period",
     "PeriodDtype",
     "PeriodIndex",
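
Reviewer note on the `__getattr__` hunk above: the warning is raised from a module-level `__getattr__` (PEP 562), which runs only when normal attribute lookup on the module fails. The sketch below reproduces that pattern on a throwaway module. Everything in it (`toy_pkg`, the stand-in classes, `_DEPRECATED`) is hypothetical and is not pandas code.

```python
import types
import warnings


class Index:
    """Stand-in for the non-deprecated base class (hypothetical)."""


class Int64Index(Index):
    """Stand-in for a deprecated subclass (hypothetical)."""


# Names that should still resolve, but with a FutureWarning.
_DEPRECATED = {"Int64Index": Int64Index}


def _module_getattr(name):
    # Invoked only when regular lookup in the module namespace fails.
    if name in _DEPRECATED:
        warnings.warn(
            f"toy_pkg.{name} is deprecated "
            "and will be removed in a future version. "
            "Use toy_pkg.Index with the appropriate dtype instead.",
            FutureWarning,
            stacklevel=2,
        )
        return _DEPRECATED[name]
    raise AttributeError(f"module 'toy_pkg' has no attribute '{name}'")


toy_pkg = types.ModuleType("toy_pkg")
toy_pkg.Index = Index
toy_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = toy_pkg.Int64Index  # triggers the deprecation path

print(cls.__name__, caught[0].category.__name__)  # Int64Index FutureWarning
```

Because `Index` is a regular attribute of the module, accessing `toy_pkg.Index` never reaches `_module_getattr` and emits no warning.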

pandas/_libs/index.pyx

+3-1
@@ -33,7 +33,9 @@ from pandas._libs import (
     hashtable as _hash,
 )

+from pandas._libs.lib cimport eq_NA_compat
 from pandas._libs.missing cimport (
+    C_NA as NA,
     checknull,
     is_matching_na,
 )
@@ -62,7 +64,7 @@ cdef ndarray _get_bool_indexer(ndarray values, object val):
     if values.descr.type_num == cnp.NPY_OBJECT:
         # i.e. values.dtype == object
         if not checknull(val):
-            indexer = values == val
+            indexer = eq_NA_compat(values, val)

         else:
             # We need to check for _matching_ NA values

pandas/_libs/lib.pxd

+5
@@ -1 +1,6 @@
+from numpy cimport ndarray
+
+
 cdef bint c_is_list_like(object, bint) except -1
+
+cpdef ndarray eq_NA_compat(ndarray[object] arr, object key)

pandas/_libs/lib.pyx

+21
@@ -3050,6 +3050,27 @@ def is_bool_list(obj: list) -> bool:
     return True


+cpdef ndarray eq_NA_compat(ndarray[object] arr, object key):
+    """
+    Check for `arr == key`, treating all values as not-equal to pd.NA.
+
+    key is assumed to have `not isna(key)`
+    """
+    cdef:
+        ndarray[uint8_t, cast=True] result = np.empty(len(arr), dtype=bool)
+        Py_ssize_t i
+        object item
+
+    for i in range(len(arr)):
+        item = arr[i]
+        if item is C_NA:
+            result[i] = False
+        else:
+            result[i] = item == key
+
+    return result
+
+
 def dtypes_all_equal(list types not None) -> bool:
     """
     Faster version for:
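
Reviewer note on `eq_NA_compat`: a plain elementwise `==` over object data can yield NA-like results rather than booleans when an element is `pd.NA`, which is presumably why `_get_bool_indexer` now routes through the helper. The pure-Python sketch below mirrors the loop in the Cython function. `_NAType`, `NA` and `eq_na_compat` are hypothetical stand-ins written for illustration; the sketch uses lists instead of NumPy arrays and does not import pandas.

```python
class _NAType:
    """Hypothetical stand-in for pd.NA: equality propagates NA, truthiness is ambiguous."""

    def __eq__(self, other):
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def eq_na_compat(values, key):
    """Elementwise values == key, treating NA entries as not equal.

    As in the Cython docstring, key is assumed not to be NA itself.
    """
    result = []
    for item in values:
        if item is NA:
            result.append(False)
        else:
            result.append(bool(item == key))
    return result


values = [1, NA, 2, 1]

# Naive comparison leaves an NA object in the result instead of a bool.
naive = [item == 1 for item in values]
print(naive[1])  # <NA>

# The NA-aware comparison always produces booleans.
mask = eq_na_compat(values, 1)
print(mask)  # [True, False, False, True]
```

The identity check (`item is NA`) matches the `item is C_NA` test in the diff, so NA entries are never compared with `==` at all.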
