Commit 258bf3d

Index class deprecation enforcements (#13204)
This PR:

- [x] Enforces `Index` related deprecations by removing `Float32Index`, `Float64Index`, `GenericIndex`, `Int8Index`, `Int16Index`, `Int32Index`, `Int64Index`, `StringIndex`, `UInt8Index`, `UInt16Index`, `UInt32Index`, `UInt64Index`.
- [x] Cleans up the repr logic to more closely align with pandas for `<NA>` value representation in case of `string` dtype.
- [x] Fixes docstrings and pytests to support the removal of the above classes.

This PR also fixes 202 pytests:

```bash
= 267 failed, 95670 passed, 2044 skipped, 763 xfailed, 300 xpassed in 442.18s (0:07:22) =
```

On `pandas_2.0_feature_branch`:

```bash
= 469 failed, 95464 passed, 2044 skipped, 763 xfailed, 300 xpassed in 469.26s (0:07:49) =
```
1 parent 16c987e commit 258bf3d

33 files changed: +284 -846 lines changed

docs/cudf/source/api_docs/index_objects.rst

-3

@@ -149,9 +149,6 @@ Numeric Index
    :template: autosummary/class_without_autosummary.rst
 
    RangeIndex
-   Int64Index
-   UInt64Index
-   Float64Index
 
 .. _api.categoricalindex:

docs/cudf/source/conf.py

+1 -1

@@ -261,7 +261,7 @@ def process_class_docstrings(app, what, name, obj, options, lines):
     from the processed docstring.
     """
     if what == "class":
-        if name in {"cudf.RangeIndex", "cudf.Int64Index", "cudf.UInt64Index", "cudf.Float64Index", "cudf.CategoricalIndex", "cudf.IntervalIndex", "cudf.MultiIndex", "cudf.DatetimeIndex", "cudf.TimedeltaIndex", "cudf.TimedeltaIndex"}:
+        if name in {"cudf.RangeIndex", "cudf.CategoricalIndex", "cudf.IntervalIndex", "cudf.MultiIndex", "cudf.DatetimeIndex", "cudf.TimedeltaIndex", "cudf.TimedeltaIndex"}:
 
             cut_index = lines.index('.. rubric:: Attributes')
             lines[:] = lines[:cut_index]

docs/cudf/source/developer_guide/library_design.md

+10 -15

@@ -22,7 +22,7 @@ Finally we tie these pieces together to provide a more holistic view of the proj
 % class IndexedFrame
 % class SingleColumnFrame
 % class BaseIndex
-% class GenericIndex
+% class Index
 % class MultiIndex
 % class RangeIndex
 % class DataFrame
@@ -42,8 +42,8 @@ Finally we tie these pieces together to provide a more holistic view of the proj
 % BaseIndex <|-- MultiIndex
 % Frame <|-- MultiIndex
 %
-% BaseIndex <|-- GenericIndex
-% SingleColumnFrame <|-- GenericIndex
+% BaseIndex <|-- Index
+% SingleColumnFrame <|-- Index
 %
 % @enduml
 
@@ -89,31 +89,26 @@ While we've highlighted some exceptional cases of Indexes before, let's start wi
 In practice, `BaseIndex` does have concrete implementations of a small set of methods.
 However, currently many of these implementations are not applicable to all subclasses and will be eventually be removed.
 
-Almost all indexes are subclasses of `GenericIndex`, a single-columned index with the class hierarchy:
+Almost all indexes are subclasses of `Index`, a single-columned index with the class hierarchy:
 ```python
-class GenericIndex(SingleColumnFrame, BaseIndex)
+class Index(SingleColumnFrame, BaseIndex)
 ```
 Integer, float, or string indexes are all composed of a single column of data.
-Most `GenericIndex` methods are inherited from `Frame`, saving us the trouble of rewriting them.
+Most `Index` methods are inherited from `Frame`, saving us the trouble of rewriting them.
 
 We now consider the three main exceptions to this model:
 
 - A `RangeIndex` is not backed by a column of data, so it inherits directly from `BaseIndex` alone.
   Wherever possible, its methods have special implementations designed to avoid materializing columns.
-  Where such an implementation is infeasible, we fall back to converting it to an `Int64Index` first instead.
+  Where such an implementation is infeasible, we fall back to converting it to an `Index` of `int64`
+  dtype first instead.
 - A `MultiIndex` is backed by _multiple_ columns of data.
   Therefore, its inheritance hierarchy looks like `class MultiIndex(Frame, BaseIndex)`.
   Some of its more `Frame`-like methods may be inherited,
   but many others must be reimplemented since in many cases a `MultiIndex` is not expected to behave like a `Frame`.
-- Just like in pandas, `Index` itself can never be instantiated.
-  `pandas.Index` is the parent class for indexes,
-  but its constructor returns an appropriate subclass depending on the input data type and shape.
-  Unfortunately, mimicking this behavior requires overriding `__new__`,
-  which in turn makes shared initialization across inheritance trees much more cumbersome to manage.
-  To enable sharing constructor logic across different index classes,
-  we instead define `BaseIndex` as the parent class of all indexes.
+- To enable sharing constructor logic across different index classes,
+  we define `BaseIndex` as the parent class of all indexes.
   `Index` inherits from `BaseIndex`, but it masquerades as a `BaseIndex` to match pandas.
-  This class should contain no implementations since it is simply a factory for other indexes.
 
 
 ## The Column layer
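
The index hierarchy described in the `library_design.md` changes above can be illustrated with a toy model. This is a minimal pure-Python sketch, not cudf code: the class names mirror the guide, but every method body, the list-backed storage, and the `to_index` helper are assumptions made purely for illustration. It shows a single `Index` class that carries its dtype on the instance, and a `RangeIndex` that falls back to an `Index` of `int64` dtype when it has to materialize.

```python
# Toy model of the hierarchy in library_design.md. Nothing here is cudf
# code; all bodies are simplified, hypothetical stand-ins.


class Frame:
    """Stand-in for a collection of named columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __len__(self):
        # Defined once here, so Index inherits it instead of rewriting it.
        return len(next(iter(self._columns.values()), []))


class SingleColumnFrame(Frame):
    """Stand-in for a Frame restricted to exactly one column."""

    def __init__(self, values, dtype):
        super().__init__({None: list(values)})
        self.dtype = dtype

    @property
    def values(self):
        return self._columns[None]


class BaseIndex:
    """Stand-in for the common parent of all indexes."""

    def is_unique(self):
        raise NotImplementedError


class Index(SingleColumnFrame, BaseIndex):
    """Single-column index: the dtype is instance data, not a subclass."""

    def is_unique(self):
        return len(set(self.values)) == len(self.values)


class RangeIndex(BaseIndex):
    """Not backed by a column: only start/stop/step are stored."""

    def __init__(self, start, stop, step=1):
        self._range = range(start, stop, step)

    def __len__(self):
        return len(self._range)

    def is_unique(self):
        # Special-cased without materializing: a range never repeats values.
        return True

    def to_index(self):
        # Hypothetical fallback: materialize as an Index of int64 dtype.
        return Index(self._range, dtype="int64")


r = RangeIndex(0, 6, 2)
materialized = r.to_index()
print(type(materialized).__name__, materialized.dtype, materialized.values)
```

In this sketch, code that previously would have checked for a dtype-specific class such as `Int64Index` checks `isinstance(obj, Index)` together with `obj.dtype` instead.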

python/cudf/benchmarks/conftest.py

+3 -3

@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION.
+# Copyright (c) 2022-2023, NVIDIA CORPORATION.
 
 """Defines pytest fixtures for all benchmarks.
 
@@ -40,8 +40,8 @@
 In addition to the above fixtures, we also provide the following more
 specialized fixtures:
     - rangeindex: Since RangeIndex always holds int64 data we cannot conflate
-      it with index_dtype_int64 (a true Int64Index), and it cannot hold nulls.
-      As a result, it is provided as a separate fixture.
+      it with index_dtype_int64 (a true Index with int64 dtype), and it
+      cannot hold nulls. As a result, it is provided as a separate fixture.
 """
 
 import os

python/cudf/cudf/__init__.py

-24

@@ -40,22 +40,10 @@
     BaseIndex,
     CategoricalIndex,
     DatetimeIndex,
-    Float32Index,
-    Float64Index,
-    GenericIndex,
     Index,
-    Int8Index,
-    Int16Index,
-    Int32Index,
-    Int64Index,
     IntervalIndex,
     RangeIndex,
-    StringIndex,
     TimedeltaIndex,
-    UInt8Index,
-    UInt16Index,
-    UInt32Index,
-    UInt64Index,
     interval_range,
 )
 from cudf.core.missing import NA
@@ -106,15 +94,8 @@
     "DatetimeIndex",
     "Decimal32Dtype",
     "Decimal64Dtype",
-    "Float32Index",
-    "Float64Index",
-    "GenericIndex",
     "Grouper",
     "Index",
-    "Int16Index",
-    "Int32Index",
-    "Int64Index",
-    "Int8Index",
     "IntervalDtype",
     "IntervalIndex",
     "ListDtype",
@@ -123,13 +104,8 @@
     "RangeIndex",
     "Scalar",
     "Series",
-    "StringIndex",
     "StructDtype",
     "TimedeltaIndex",
-    "UInt16Index",
-    "UInt32Index",
-    "UInt64Index",
-    "UInt8Index",
     "api",
     "concat",
     "crosstab",

python/cudf/cudf/_typing.py

+2 -4

@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2022, NVIDIA CORPORATION.
+# Copyright (c) 2021-2023, NVIDIA CORPORATION.
 
 import sys
 from typing import TYPE_CHECKING, Any, Callable, Dict, Iterable, TypeVar, Union
@@ -37,9 +37,7 @@
 
 DataFrameOrSeries = Union["cudf.Series", "cudf.DataFrame"]
 SeriesOrIndex = Union["cudf.Series", "cudf.core.index.BaseIndex"]
-SeriesOrSingleColumnIndex = Union[
-    "cudf.Series", "cudf.core.index.GenericIndex"
-]
+SeriesOrSingleColumnIndex = Union["cudf.Series", "cudf.core.index.Index"]
 
 # Groupby aggregation
 AggType = Union[str, Callable]
