Commit ebf8440

Merge branch 'main' into issue-50977
2 parents cd21304 + 9277f93 commit ebf8440

139 files changed

+1699
-1496
lines changed

.github/actions/setup-conda/action.yml

+1-1
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,7 @@ runs:
1818
- name: Set Arrow version in ${{ inputs.environment-file }} to ${{ inputs.pyarrow-version }}
1919
run: |
2020
grep -q ' - pyarrow' ${{ inputs.environment-file }}
21-
sed -i"" -e "s/ - pyarrow/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
21+
sed -i"" -e "s/ - pyarrow<11/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
2222
cat ${{ inputs.environment-file }}
2323
shell: bash
2424
if: ${{ inputs.pyarrow-version }}
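
Note on the hunk above: the environment files in this commit now list `pyarrow<11`, so the `sed` pattern has to match that exact text before it can swap in the requested version. The following is a minimal Python sketch of the same substitution, not the action's actual implementation. The function name `pin_pyarrow` and the sample environment text are hypothetical. Unlike `sed`, which silently leaves the file unchanged when the pattern is absent, this sketch raises an error.

```python
def pin_pyarrow(env_text: str, version: str) -> str:
    """Replace the ' - pyarrow<11' entry with an exact version pin.

    Hypothetical helper mirroring the sed expression in the action;
    it is an illustration only and is not part of the repository.
    """
    old = " - pyarrow<11"
    if old not in env_text:
        # sed would silently do nothing here; failing loudly is a
        # deliberate difference in this sketch.
        raise ValueError("no ' - pyarrow<11' entry found")
    return env_text.replace(old, f" - pyarrow={version}")


# Sample environment text (made up for illustration).
sample_env = "dependencies:\n  - pymysql\n  - pyarrow<11\n  - pyreadstat\n"
pinned_env = pin_pyarrow(sample_env, "9")
print(pinned_env)
```

The `grep -q ' - pyarrow'` guard in the action still matches the unpinned prefix, so on its own it would not catch a file whose entry lacks the `<11` suffix.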

.pre-commit-config.yaml

+1-1
@@ -135,7 +135,7 @@ repos:
135135
types: [python]
136136
stages: [manual]
137137
additional_dependencies: &pyright_dependencies
138-
138+
139139
- id: pyright_reportGeneralTypeIssues
140140
# note: assumes python env is setup and activated
141141
name: pyright reportGeneralTypeIssues

ci/code_checks.sh

-3
@@ -187,7 +187,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
187187
pandas.show_versions \
188188
pandas.test \
189189
pandas.NaT \
190-
pandas.Timestamp.unit \
191190
pandas.Timestamp.as_unit \
192191
pandas.Timestamp.ctime \
193192
pandas.Timestamp.date \
@@ -579,13 +578,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
579578

580579
MSG='Partially validate docstrings (EX02)' ; echo $MSG
581580
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX02 --ignore_functions \
582-
pandas.DataFrame.copy \
583581
pandas.DataFrame.plot.line \
584582
pandas.DataFrame.std \
585583
pandas.DataFrame.var \
586584
pandas.Index.factorize \
587585
pandas.Period.strftime \
588-
pandas.Series.copy \
589586
pandas.Series.factorize \
590587
pandas.Series.floordiv \
591588
pandas.Series.plot.line \

ci/deps/actions-310.yaml

+1-1
@@ -42,7 +42,7 @@ dependencies:
4242
- psycopg2
4343
- pymysql
4444
- pytables
45-
- pyarrow
45+
- pyarrow<11
4646
- pyreadstat
4747
- python-snappy
4848
- pyxlsb

ci/deps/actions-311.yaml

+1-1
@@ -42,7 +42,7 @@ dependencies:
4242
- psycopg2
4343
- pymysql
4444
# - pytables>=3.8.0 # first version that supports 3.11
45-
- pyarrow
45+
- pyarrow<11
4646
- pyreadstat
4747
- python-snappy
4848
- pyxlsb

ci/deps/actions-38-downstream_compat.yaml

+1-1
@@ -40,7 +40,7 @@ dependencies:
4040
- openpyxl
4141
- odfpy
4242
- psycopg2
43-
- pyarrow
43+
- pyarrow<11
4444
- pymysql
4545
- pyreadstat
4646
- pytables

ci/deps/actions-38.yaml

+1-1
@@ -40,7 +40,7 @@ dependencies:
4040
- odfpy
4141
- pandas-gbq
4242
- psycopg2
43-
- pyarrow
43+
- pyarrow<11
4444
- pymysql
4545
- pyreadstat
4646
- pytables

ci/deps/actions-39.yaml

+1-1
@@ -41,7 +41,7 @@ dependencies:
4141
- pandas-gbq
4242
- psycopg2
4343
- pymysql
44-
- pyarrow
44+
- pyarrow<11
4545
- pyreadstat
4646
- pytables
4747
- python-snappy

ci/deps/circle-38-arm64.yaml

+1-1
@@ -40,7 +40,7 @@ dependencies:
4040
- odfpy
4141
- pandas-gbq
4242
- psycopg2
43-
- pyarrow
43+
- pyarrow<11
4444
- pymysql
4545
# Not provided on ARM
4646
#- pyreadstat

doc/source/development/internals.rst

+23-26
@@ -15,24 +15,21 @@ Indexing
1515
In pandas there are a few objects implemented which can serve as valid
1616
containers for the axis labels:
1717

18-
* ``Index``: the generic "ordered set" object, an ndarray of object dtype
18+
* :class:`Index`: the generic "ordered set" object, an ndarray of object dtype
1919
assuming nothing about its contents. The labels must be hashable (and
2020
likely immutable) and unique. Populates a dict of label to location in
2121
Cython to do ``O(1)`` lookups.
22-
* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
23-
data, such as time stamps
24-
* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
25-
* ``MultiIndex``: the standard hierarchical index object
26-
* ``DatetimeIndex``: An Index object with ``Timestamp`` boxed elements (impl are the int64 values)
27-
* ``TimedeltaIndex``: An Index object with ``Timedelta`` boxed elements (impl are the in64 values)
28-
* ``PeriodIndex``: An Index object with Period elements
22+
* :class:`MultiIndex`: the standard hierarchical index object
23+
* :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
24+
* :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the in64 values)
25+
* :class:`PeriodIndex`: An Index object with Period elements
2926

3027
There are functions that make the creation of a regular index easy:
3128

32-
* ``date_range``: fixed frequency date range generated from a time rule or
29+
* :func:`date_range`: fixed frequency date range generated from a time rule or
3330
DateOffset. An ndarray of Python datetime objects
34-
* ``period_range``: fixed frequency date range generated from a time rule or
35-
DateOffset. An ndarray of ``Period`` objects, representing timespans
31+
* :func:`period_range`: fixed frequency date range generated from a time rule or
32+
DateOffset. An ndarray of :class:`Period` objects, representing timespans
3633

3734
The motivation for having an ``Index`` class in the first place was to enable
3835
different implementations of indexing. This means that it's possible for you,
@@ -43,28 +40,28 @@ From an internal implementation point of view, the relevant methods that an
4340
``Index`` must define are one or more of the following (depending on how
4441
incompatible the new object internals are with the ``Index`` functions):
4542

46-
* ``get_loc``: returns an "indexer" (an integer, or in some cases a
43+
* :meth:`~Index.get_loc`: returns an "indexer" (an integer, or in some cases a
4744
slice object) for a label
48-
* ``slice_locs``: returns the "range" to slice between two labels
49-
* ``get_indexer``: Computes the indexing vector for reindexing / data
45+
* :meth:`~Index.slice_locs`: returns the "range" to slice between two labels
46+
* :meth:`~Index.get_indexer`: Computes the indexing vector for reindexing / data
5047
alignment purposes. See the source / docstrings for more on this
51-
* ``get_indexer_non_unique``: Computes the indexing vector for reindexing / data
48+
* :meth:`~Index.get_indexer_non_unique`: Computes the indexing vector for reindexing / data
5249
alignment purposes when the index is non-unique. See the source / docstrings
5350
for more on this
54-
* ``reindex``: Does any pre-conversion of the input index then calls
51+
* :meth:`~Index.reindex`: Does any pre-conversion of the input index then calls
5552
``get_indexer``
56-
* ``union``, ``intersection``: computes the union or intersection of two
53+
* :meth:`~Index.union`, :meth:`~Index.intersection`: computes the union or intersection of two
5754
Index objects
58-
* ``insert``: Inserts a new label into an Index, yielding a new object
59-
* ``delete``: Delete a label, yielding a new object
60-
* ``drop``: Deletes a set of labels
61-
* ``take``: Analogous to ndarray.take
55+
* :meth:`~Index.insert`: Inserts a new label into an Index, yielding a new object
56+
* :meth:`~Index.delete`: Delete a label, yielding a new object
57+
* :meth:`~Index.drop`: Deletes a set of labels
58+
* :meth:`~Index.take`: Analogous to ndarray.take
6259

6360
MultiIndex
6461
~~~~~~~~~~
6562

66-
Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
67-
integer **codes** (until version 0.24 named *labels*), and the level **names**:
63+
Internally, the :class:`MultiIndex` consists of a few things: the **levels**, the
64+
integer **codes**, and the level **names**:
6865

6966
.. ipython:: python
7067
@@ -80,13 +77,13 @@ You can probably guess that the codes determine which unique element is
8077
identified with that location at each layer of the index. It's important to
8178
note that sortedness is determined **solely** from the integer codes and does
8279
not check (or care) whether the levels themselves are sorted. Fortunately, the
83-
constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
84-
if you compute the levels and codes yourself, please be careful.
80+
constructors :meth:`~MultiIndex.from_tuples` and :meth:`~MultiIndex.from_arrays` ensure
81+
that this is true, but if you compute the levels and codes yourself, please be careful.
8582

8683
Values
8784
~~~~~~
8885

89-
pandas extends NumPy's type system with custom types, like ``Categorical`` or
86+
pandas extends NumPy's type system with custom types, like :class:`Categorical` or
9087
datetimes with a timezone, so we have multiple notions of "values". For 1-D
9188
containers (``Index`` classes and ``Series``) we have the following convention:
9289

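
Note on the internals.rst hunks above: the indexing methods and the MultiIndex levels/codes layout described in the documentation text can be illustrated with a short sketch. The labels and variable names below are made up for illustration and do not come from the commit.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

# get_loc: label -> integer position
loc = idx.get_loc("c")

# slice_locs: positions bounding the label slice "b".."c" (end label inclusive)
start, stop = idx.slice_locs("b", "c")

# get_indexer: position of each target label, -1 where the label is missing
indexer = idx.get_indexer(["d", "a", "z"])

# A MultiIndex stores unique values per level plus integer codes into them
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["first", "second"]
)
level_values = [list(level) for level in mi.levels]
code_values = [list(codes) for codes in mi.codes]

print(loc, (start, stop), list(indexer))
print(level_values, code_values, list(mi.names))
```

Each entry in `code_values` indexes into the matching entry of `level_values`, which is why sortedness can be judged from the codes alone.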
doc/source/getting_started/comparison/includes/copies.rst

-10
@@ -11,13 +11,3 @@ or overwrite the original one:
1111
.. code-block:: python
1212
1313
df = df.sort_values("col1")
14-
15-
.. note::
16-
17-
You will see an ``inplace=True`` keyword argument available for some methods:
18-
19-
.. code-block:: python
20-
21-
df.sort_values("col1", inplace=True)
22-
23-
Its use is discouraged. :ref:`More information. <indexing.view_versus_copy>`

doc/source/user_guide/advanced.rst

+18-108
@@ -609,7 +609,7 @@ are named.
609609

610610
.. ipython:: python
611611
612-
s.index.set_names(["L1", "L2"], inplace=True)
612+
s.index = s.index.set_names(["L1", "L2"])
613613
s.sort_index(level="L1")
614614
s.sort_index(level="L2")
615615
@@ -848,125 +848,35 @@ values **not** in the categories, similarly to how you can reindex **any** panda
848848
849849
.. _advanced.rangeindex:
850850

851-
Int64Index and RangeIndex
852-
~~~~~~~~~~~~~~~~~~~~~~~~~
851+
RangeIndex
852+
~~~~~~~~~~
853853

854-
.. deprecated:: 1.4.0
855-
In pandas 2.0, :class:`Index` will become the default index type for numeric types
856-
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
857-
are therefore deprecated and will be removed in a futire version.
858-
``RangeIndex`` will not be removed, as it represents an optimized version of an integer index.
859-
860-
:class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
861-
implementing an ordered, sliceable set.
862-
863-
:class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects.
864-
``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
865-
866-
.. _advanced.float64index:
867-
868-
Float64Index
869-
~~~~~~~~~~~~
870-
871-
.. deprecated:: 1.4.0
872-
:class:`Index` will become the default index type for numeric types in the future
873-
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
874-
are therefore deprecated and will be removed in a future version of Pandas.
875-
``RangeIndex`` will not be removed as it represents an optimized version of an integer index.
876-
877-
By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
878-
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
879-
same.
880-
881-
.. ipython:: python
882-
883-
indexf = pd.Index([1.5, 2, 3, 4.5, 5])
884-
indexf
885-
sf = pd.Series(range(5), index=indexf)
886-
sf
887-
888-
Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
854+
:class:`RangeIndex` is a sub-class of :class:`Index` that provides the default index for all :class:`DataFrame` and :class:`Series` objects.
855+
``RangeIndex`` is an optimized version of ``Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
856+
A ``RangeIndex`` will always have an ``int64`` dtype.
889857

890858
.. ipython:: python
891859
892-
sf[3]
893-
sf[3.0]
894-
sf.loc[3]
895-
sf.loc[3.0]
860+
idx = pd.RangeIndex(5)
861+
idx
896862
897-
The only positional indexing is via ``iloc``.
863+
``RangeIndex`` is the default index for all :class:`DataFrame` and :class:`Series` objects:
898864

899865
.. ipython:: python
900866
901-
sf.iloc[3]
867+
ser = pd.Series([1, 2, 3])
868+
ser.index
869+
df = pd.DataFrame([[1, 2], [3, 4]])
870+
df.index
871+
df.columns
902872
903-
A scalar index that is not found will raise a ``KeyError``.
904-
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
905-
**always** positional when using ``iloc``. The exception is when the slice is
906-
boolean, in which case it will always be positional.
907-
908-
.. ipython:: python
909-
910-
sf[2:4]
911-
sf.loc[2:4]
912-
sf.iloc[2:4]
913-
914-
In float indexes, slicing using floats is allowed.
915-
916-
.. ipython:: python
917-
918-
sf[2.1:4.6]
919-
sf.loc[2.1:4.6]
920-
921-
In non-float indexes, slicing using floats will raise a ``TypeError``.
922-
923-
.. code-block:: ipython
924-
925-
In [1]: pd.Series(range(5))[3.5]
926-
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
927-
928-
In [1]: pd.Series(range(5))[3.5:4.5]
929-
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
930-
931-
Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
932-
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could, for
933-
example, be millisecond offsets.
934-
935-
.. ipython:: python
936-
937-
dfir = pd.concat(
938-
[
939-
pd.DataFrame(
940-
np.random.randn(5, 2), index=np.arange(5) * 250.0, columns=list("AB")
941-
),
942-
pd.DataFrame(
943-
np.random.randn(6, 2),
944-
index=np.arange(4, 10) * 250.1,
945-
columns=list("AB"),
946-
),
947-
]
948-
)
949-
dfir
950-
951-
Selection operations then will always work on a value basis, for all selection operators.
952-
953-
.. ipython:: python
954-
955-
dfir[0:1000.4]
956-
dfir.loc[0:1001, "A"]
957-
dfir.loc[1000.4]
958-
959-
You could retrieve the first 1 second (1000 ms) of data as such:
960-
961-
.. ipython:: python
962-
963-
dfir[0:1000]
964-
965-
If you need integer based selection, you should use ``iloc``:
873+
A ``RangeIndex`` will behave similarly to a :class:`Index` with an ``int64`` dtype and operations on a ``RangeIndex``,
874+
whose result cannot be represented by a ``RangeIndex``, but should have an integer dtype, will be converted to an ``Index`` with ``int64``.
875+
For example:
966876

967877
.. ipython:: python
968878
969-
dfir.iloc[0:5]
879+
idx[[0, 2]]
970880
971881
972882
.. _advanced.intervalindex:
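
Note on the advanced.rst hunks above: the rewritten RangeIndex section and the `set_names` change can be exercised together in a small sketch. The data is made up for illustration. The selection `[0, 1, 3]` is chosen on the assumption that it cannot be expressed as a range, so the result should fall back to an integer-dtype `Index` regardless of pandas version.

```python
import pandas as pd

# RangeIndex is the default index and always has int64 dtype
idx = pd.RangeIndex(5)
ser = pd.Series([1, 2, 3])

# Positions 0, 1, 3 are not evenly spaced, so the result cannot be a
# RangeIndex and is returned as an int64 Index instead
picked = idx[[0, 1, 3]]

# Naming MultiIndex levels by reassignment rather than inplace=True
s = pd.Series(
    [1.0, 2.0], index=pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
)
s.index = s.index.set_names(["L1", "L2"])

print(type(ser.index).__name__, idx.dtype, picked.dtype, list(s.index.names))
```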

doc/source/user_guide/basics.rst

-5
@@ -1479,11 +1479,6 @@ you specify a single ``mapper`` and the ``axis`` to apply that mapping to.
14791479
df.rename({"one": "foo", "two": "bar"}, axis="columns")
14801480
df.rename({"a": "apple", "b": "banana", "d": "durian"}, axis="index")
14811481
1482-
1483-
The :meth:`~DataFrame.rename` method also provides an ``inplace`` named
1484-
parameter that is by default ``False`` and copies the underlying data. Pass
1485-
``inplace=True`` to rename the data in place.
1486-
14871482
Finally, :meth:`~Series.rename` also accepts a scalar or list-like
14881483
for altering the ``Series.name`` attribute.
14891484

doc/source/user_guide/categorical.rst

+4-4
@@ -437,9 +437,9 @@ meaning and certain operations are possible. If the categorical is unordered, ``
437437
.. ipython:: python
438438
439439
s = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False))
440-
s.sort_values(inplace=True)
440+
s = s.sort_values()
441441
s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))
442-
s.sort_values(inplace=True)
442+
s = s.sort_values()
443443
s
444444
s.min(), s.max()
445445
@@ -459,7 +459,7 @@ This is even true for strings and numeric data:
459459
s = pd.Series([1, 2, 3, 1], dtype="category")
460460
s = s.cat.set_categories([2, 3, 1], ordered=True)
461461
s
462-
s.sort_values(inplace=True)
462+
s = s.sort_values()
463463
s
464464
s.min(), s.max()
465465
@@ -477,7 +477,7 @@ necessarily make the sort order the same as the categories order.
477477
s = pd.Series([1, 2, 3, 1], dtype="category")
478478
s = s.cat.reorder_categories([2, 3, 1], ordered=True)
479479
s
480-
s.sort_values(inplace=True)
480+
s = s.sort_values()
481481
s
482482
s.min(), s.max()
483483
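
Note on the categorical.rst hunks above: replacing `s.sort_values(inplace=True)` with `s = s.sort_values()` should leave the documented behaviour unchanged, since sorting an ordered categorical follows the category order rather than the lexical or numeric order. A minimal sketch, reusing the values from the second hunk:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")
# Declare the order 2 < 3 < 1
s = s.cat.set_categories([2, 3, 1], ordered=True)

# Reassign the sorted result instead of sorting in place
s = s.sort_values()

sorted_values = list(s)
smallest, largest = s.min(), s.max()
print(sorted_values, smallest, largest)
```

Here `min` and `max` also respect the declared category order, so the smallest value is 2 and the largest is 1.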
