DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays #51111

Merged
merged 6 commits into from
Feb 2, 2023
3 changes: 0 additions & 3 deletions doc/source/development/internals.rst
@@ -19,9 +19,6 @@ containers for the axis labels:
assuming nothing about its contents. The labels must be hashable (and
likely immutable) and unique. Populates a dict of label to location in
Cython to do ``O(1)`` lookups.
* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
data, such as time stamps
* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
* :class:`MultiIndex`: the standard hierarchical index object
* :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
* :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the int64 values)
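
The label-to-location lookup described for ``Index`` can be observed from Python through the public ``Index.get_loc`` method. The following is a minimal sketch, assuming pandas is installed. The labels are made up for illustration, and the snippet shows only the observable behaviour, not the Cython hash table itself:

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration.
index = pd.Index(["a", "b", "c"])

# get_loc maps a label to its integer position in the index.
loc_b = index.get_loc("b")

# Membership tests go through the same label lookup.
has_z = "z" in index

# A label that is not present raises KeyError.
try:
    index.get_loc("z")
    missing_raises = False
except KeyError:
    missing_raises = True
```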
124 changes: 17 additions & 107 deletions doc/source/user_guide/advanced.rst
@@ -848,125 +848,35 @@ values **not** in the categories, similarly to how you can reindex **any** panda

.. _advanced.rangeindex:

Int64Index and RangeIndex
~~~~~~~~~~~~~~~~~~~~~~~~~
RangeIndex
~~~~~~~~~~

.. deprecated:: 1.4.0
In pandas 2.0, :class:`Index` will become the default index type for numeric types
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
are therefore deprecated and will be removed in a future version.
``RangeIndex`` will not be removed, as it represents an optimized version of an integer index.

:class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
implementing an ordered, sliceable set.

:class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects.
``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.

.. _advanced.float64index:

Float64Index
~~~~~~~~~~~~

.. deprecated:: 1.4.0
:class:`Index` will become the default index type for numeric types in the future
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
are therefore deprecated and will be removed in a future version of Pandas.
``RangeIndex`` will not be removed as it represents an optimized version of an integer index.

By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
same.

.. ipython:: python

indexf = pd.Index([1.5, 2, 3, 4.5, 5])
indexf
sf = pd.Series(range(5), index=indexf)
sf

Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
:class:`RangeIndex` is a sub-class of :class:`Index` that provides the default index for all :class:`DataFrame` and :class:`Series` objects.
``RangeIndex`` is an optimized version of ``Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
A ``RangeIndex`` will always have an ``int64`` dtype.

.. ipython:: python

sf[3]
sf[3.0]
sf.loc[3]
sf.loc[3.0]
idx = pd.RangeIndex(5)
idx

The only positional indexing is via ``iloc``.
``RangeIndex`` is the default index for all :class:`DataFrame` and :class:`Series` objects:

.. ipython:: python

sf.iloc[3]
ser = pd.Series([1, 2, 3])
ser.index
df = pd.DataFrame([[1, 2], [3, 4]])
df.index
df.columns

A scalar index that is not found will raise a ``KeyError``.
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
**always** positional when using ``iloc``. The exception is when the slice is
boolean, in which case it will always be positional.

.. ipython:: python

sf[2:4]
sf.loc[2:4]
sf.iloc[2:4]

In float indexes, slicing using floats is allowed.

.. ipython:: python

sf[2.1:4.6]
sf.loc[2.1:4.6]

In non-float indexes, slicing using floats will raise a ``TypeError``.

.. code-block:: ipython

In [1]: pd.Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

In [1]: pd.Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could, for
example, be millisecond offsets.

.. ipython:: python

dfir = pd.concat(
[
pd.DataFrame(
np.random.randn(5, 2), index=np.arange(5) * 250.0, columns=list("AB")
),
pd.DataFrame(
np.random.randn(6, 2),
index=np.arange(4, 10) * 250.1,
columns=list("AB"),
),
]
)
dfir

Selection operations then will always work on a value basis, for all selection operators.

.. ipython:: python

dfir[0:1000.4]
dfir.loc[0:1001, "A"]
dfir.loc[1000.4]

You could retrieve the first 1 second (1000 ms) of data as such:

.. ipython:: python

dfir[0:1000]

If you need integer based selection, you should use ``iloc``:
A ``RangeIndex`` behaves similarly to an :class:`Index` with an ``int64`` dtype. Operations on a ``RangeIndex``
whose result cannot be represented by a ``RangeIndex``, but should have an integer dtype, return an ``Index`` with ``int64`` dtype.
For example:

.. ipython:: python

dfir.iloc[0:5]
idx[[0, 2]]
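
The difference between results that can and cannot stay a ``RangeIndex`` can be checked directly. This is a small sketch, assuming pandas is installed. The irregular positions ``[0, 1, 3]`` are chosen here because they cannot be described by a start/stop/step triple:

```python
import pandas as pd

idx = pd.RangeIndex(5)

# A slice with a step is still describable as start/stop/step,
# so the result remains a RangeIndex.
evens = idx[::2]

# Irregularly spaced positions cannot be expressed as a range,
# so the result falls back to a materialized integer index.
picked = idx[[0, 1, 3]]
```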


.. _advanced.intervalindex:
21 changes: 20 additions & 1 deletion doc/source/user_guide/indexing.rst
@@ -1582,8 +1582,27 @@ lookups, data alignment, and reindexing. The easiest way to create an
index
'd' in index

You can also pass a ``name`` to be stored in the index:
or using numbers:

.. ipython:: python

index = pd.Index([1, 5, 12])
index
5 in index

If no dtype is given, ``Index`` tries to infer the dtype from the data.
It is also possible to give an explicit dtype when instantiating an :class:`Index`:

.. ipython:: python

index = pd.Index(['e', 'd', 'a', 'b'], dtype="string")
index
index = pd.Index([1, 5, 12], dtype="int8")
index
index = pd.Index([1, 5, 12], dtype="float32")
index

You can also pass a ``name`` to be stored in the index:

.. ipython:: python

2 changes: 1 addition & 1 deletion doc/source/user_guide/io.rst
@@ -4756,7 +4756,7 @@ Selecting coordinates
^^^^^^^^^^^^^^^^^^^^^

Sometimes you want to get the coordinates (a.k.a the index locations) of your query. This returns an
``Int64Index`` of the resulting locations. These coordinates can also be passed to subsequent
``Index`` of the resulting locations. These coordinates can also be passed to subsequent
``where`` operations.

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/user_guide/timedeltas.rst
@@ -477,7 +477,7 @@ Scalars type ops work as well. These can potentially return a *different* type o
# division can result in a Timedelta if the divisor is an integer
tdi / 2

# or a Float64Index if the divisor is a Timedelta
# or a float64 Index if the divisor is a Timedelta
tdi / tdi[0]
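
The two division cases in the comments above can be reproduced in isolation. This is a minimal sketch, assuming pandas is installed. The ``tdi`` built here is a stand-in, since the one used in the guide is defined outside this excerpt:

```python
import pandas as pd

# Stand-in TimedeltaIndex; the guide's own ``tdi`` is defined elsewhere.
tdi = pd.TimedeltaIndex(["1 days", "2 days", "3 days"])

# Dividing by an integer keeps the timedelta type.
halved = tdi / 2

# Dividing by a Timedelta cancels the unit and yields plain floats.
ratio = tdi / tdi[0]
```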

.. _timedeltas.resampling:
74 changes: 74 additions & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -28,6 +28,79 @@ The available extras, found in the :ref:`installation guide<install.dependencies
``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).

.. _whatsnew_200.enhancements.index_can_hold_numpy_numeric_dtypes:

:class:`Index` can now hold numpy numeric dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is now possible to use any numpy numeric dtype in an :class:`Index` (:issue:`42717`).

Previously it was only possible to use ``int64``, ``uint64`` & ``float64`` dtypes:

.. code-block:: ipython

In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Int64Index([1, 2, 3], dtype="int64")
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")

:class:`Int64Index`, :class:`UInt64Index` & :class:`Float64Index` were deprecated in pandas
version 1.4 and have now been removed. Instead :class:`Index` should be used directly, and
it can now take all numpy numeric dtypes, i.e.
``int8``/``int16``/``int32``/``int64``/``uint8``/``uint16``/``uint32``/``uint64``/``float32``/``float64`` dtypes:

.. ipython:: python

pd.Index([1, 2, 3], dtype=np.int8)
pd.Index([1, 2, 3], dtype=np.uint16)
pd.Index([1, 2, 3], dtype=np.float32)

The ability for ``Index`` to hold the numpy numeric dtypes has meant some changes in pandas
functionality. In particular, operations that previously were forced to create 64-bit indexes
can now create indexes with lower bit sizes, e.g. 32-bit indexes.

Below is a possibly non-exhaustive list of changes:

1. Instantiating using a numpy numeric array now follows the dtype of the numpy array.
Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now,
the index dtype follows the dtype of the numpy array. For example, all signed integer
arrays previously produced an index with ``int64`` dtype, but the index now
reuses the dtype of the supplied numpy array. So ``Index(np.array([1, 2, 3]))`` will be ``int32`` on 32-bit systems.
Instantiating :class:`Index` using a list of numbers will still return 64-bit dtypes,
e.g. ``Index([1, 2, 3])`` will have an ``int64`` dtype, which is the same as previously.
2. The various numeric datetime attributes of :class:`DatetimeIndex` (:attr:`~DatetimeIndex.day`,
:attr:`~DatetimeIndex.month`, :attr:`~DatetimeIndex.year` etc.) were previously of
dtype ``int64``, while they were ``int32`` for :class:`DatetimeArray`. They are now
``int32`` on ``DatetimeIndex`` also:

.. ipython:: python

idx = pd.date_range(start='1/1/2018', periods=3, freq='M')
idx.array.year
idx.year

3. Level dtypes on Indexes from :meth:`Series.sparse.from_coo` are now of dtype ``int32``.

.. ipython:: python

from scipy import sparse
A = sparse.coo_matrix(
([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
)
ser = pd.Series.sparse.from_coo(A)
ser.index.dtype

4. :class:`Index` cannot be instantiated using a ``float16`` dtype. Previously instantiating
an :class:`Index` using dtype ``float16`` resulted in a :class:`Float64Index` with a
``float64`` dtype. It now raises a ``NotImplementedError``:

.. ipython:: python
:okexcept:

pd.Index([1, 2, 3], dtype=np.float16)
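
The first two items in the list above can be verified with a few dtype checks. This is a sketch assuming pandas 2.0 or later and numpy are installed. The ``int32`` array dtype is set explicitly so the result does not depend on the platform's default integer size:

```python
import numpy as np
import pandas as pd

# Item 1: an index built from a numpy array keeps the array's dtype
# (assumes pandas >= 2.0; earlier versions upcast to int64).
from_int32_array = pd.Index(np.array([1, 2, 3], dtype=np.int32))

# Item 1: an index built from a plain list of ints is still int64.
from_list = pd.Index([1, 2, 3])

# Item 2: numeric datetime attributes of a DatetimeIndex are int32.
years = pd.date_range("2018-01-01", periods=3, freq="D").year
```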


.. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_dtype_backend:

Configuration option, ``mode.dtype_backend``, to return pyarrow-backed dtypes
@@ -683,6 +756,7 @@ Deprecations

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. See also :ref:`here <whatsnew_200.enhancements.optional_dependency_management_pip>` for more information (:issue:`42717`)
- Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`)
- Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`)
- Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`)