
Commit 723a147

Merge pull request #11892 from jreback/ri
ENH: RangeIndex redux
2 parents efb2e90 + fab291b commit 723a147

22 files changed (+1826 -287 lines)

doc/source/advanced.rst

+23-2
@@ -617,10 +617,20 @@ faster than fancy indexing.
 timeit ser.ix[indexer]
 timeit ser.take(indexer)
 
+.. _indexing.index_types:
+
+Index Types
+-----------
+
+We have discussed ``MultiIndex`` extensively in the previous sections. ``DatetimeIndex`` and ``PeriodIndex``
+are shown :ref:`here <timeseries.overview>`, and ``TimedeltaIndex`` :ref:`here <timedeltas.timedeltas>`.
+
+In the following sub-sections we will highlight some other index types.
+
 .. _indexing.categoricalindex:
 
 CategoricalIndex
------------------
+~~~~~~~~~~~~~~~~
 
 .. versionadded:: 0.16.1
 
@@ -702,10 +712,21 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.
 In [12]: pd.concat([df2, df3]
 TypeError: categories must match existing categories when appending
 
+.. _indexing.rangeindex:
+
+Int64Index and RangeIndex
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``Int64Index`` is a fundamental index in *pandas*. It is an immutable array implementing an ordered, sliceable set.
+Prior to 0.18.0, ``Int64Index`` provided the default index for all ``NDFrame`` objects.
+
+``RangeIndex`` is a sub-class of ``Int64Index`` added in version 0.18.0, and now provides the default index for all ``NDFrame`` objects.
+``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
+
 .. _indexing.float64index:
 
 Float64Index
--------------
+~~~~~~~~~~~~
 
 .. note::
 
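To make the "stores only start, stop and step" idea behind ``RangeIndex`` concrete, here is a minimal pure-Python sketch. The class ``MiniRangeIndex`` and its methods are hypothetical and are not the pandas API; the sketch simply delegates to Python's built-in ``range``, which the docs above cite as the analogue.

```python
class MiniRangeIndex:
    """Hypothetical, simplified stand-in for RangeIndex (not the pandas API).

    Only a range object is stored, so memory use does not depend on length.
    """

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self._range = range(start, stop, step)  # keeps start, stop, step only

    def __len__(self):
        return len(self._range)

    def __getitem__(self, key):
        # An integer key returns a label; a slice returns another
        # range-backed index instead of copying values.
        result = self._range[key]
        if isinstance(result, range):
            return MiniRangeIndex(result.start, result.stop, result.step)
        return result

    def __contains__(self, value):
        # For ints, range membership is arithmetic rather than a scan.
        return value in self._range

    def get_loc(self, label):
        # Position of a label, computed from start and step.
        try:
            return self._range.index(label)
        except ValueError:
            raise KeyError(label) from None

    def to_list(self):
        # Materialize labels only when a caller needs concrete values,
        # roughly analogous to converting to an Int64Index.
        return list(self._range)


idx = MiniRangeIndex(0, 1000)
print(len(idx), idx[999], idx[10:13].to_list())  # 1000 999 [10, 11, 12]
```

Returning a new range-backed object from a slice, rather than a list, is what keeps the footprint constant; materialization is deferred to ``to_list``.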
doc/source/timeseries.rst

+1-1
@@ -1091,7 +1091,7 @@ An example of how holidays and holiday calendars are defined:
 Using this calendar, creating an index or doing offset arithmetic skips weekends
 and holidays (i.e., Memorial Day/July 4th). For example, the below defines
 a custom business day offset using the ``ExampleCalendar``. Like any other offset,
-it can be used to create a ``DatetimeIndex`` or added to ``datetime``
+it can be used to create a ``DatetimeIndex`` or added to ``datetime``
 or ``Timestamp`` objects.
 
 .. ipython:: python

doc/source/whatsnew/v0.18.0.txt

+34
@@ -19,6 +19,7 @@ Highlights include:
 
 - Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
 - ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
+- Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.
 
 Check the :ref:`API Changes <whatsnew_0180.api>` and :ref:`deprecations <whatsnew_0180.deprecations>` before updating.
 
@@ -102,6 +103,39 @@ And multiple aggregations
 r.agg({'A' : ['mean','std'],
 'B' : ['mean','std']})
 
+.. _whatsnew_0180.enhancements.rangeindex:
+
+Range Index
+^^^^^^^^^^^
+
+A ``RangeIndex`` has been added to the ``Int64Index`` sub-classes as a memory-saving alternative for common use cases. Like the Python ``range`` object (``xrange`` in Python 2), it stores only the start, stop, and step values of the index. It interacts transparently with the user API, converting to an ``Int64Index`` when needed.
+
+This is now the default index constructed for ``NDFrame`` objects, replacing the previous ``Int64Index`` default. (:issue:`939`)
+
+Previous Behavior:
+
+.. code-block:: python
+
+   In [3]: s = Series(range(1000))
+
+   In [4]: s.index
+   Out[4]:
+   Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+               ...
+               990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)
+
+   In [6]: s.index.nbytes
+   Out[6]: 8000
+
+
+New Behavior:
+
+.. ipython:: python
+
+   s = Series(range(1000))
+   s.index
+   s.index.nbytes
+
 .. _whatsnew_0180.enhancements.other:
 
 Other enhancements
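The ``nbytes`` figure in the previous-behavior example can be reproduced outside pandas. This sketch assumes NumPy and CPython, and the helper ``index_footprints`` is hypothetical: it compares an ``int64`` array of ``n`` labels (what an ``Int64Index`` holds) with a ``range``, which, like ``RangeIndex``, keeps only start, stop and step.

```python
import sys

import numpy as np


def index_footprints(n):
    """Return (bytes for n int64 labels, bytes for an equivalent range)."""
    materialized = np.arange(n, dtype=np.int64)  # like the old Int64Index values
    lazy = range(0, n)                           # start, stop, step only
    return materialized.nbytes, sys.getsizeof(lazy)


dense, compact = index_footprints(1000)
print(dense)                                  # 8000: 1000 labels * 8 bytes each
print(compact == index_footprints(10**6)[1])  # True: a range's size does not grow with n
```

The dense cost grows linearly with the number of rows, while the range-style cost stays constant, which is the saving the entry above describes.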

pandas/core/api.py

+2-1
@@ -8,7 +8,8 @@
 from pandas.core.categorical import Categorical
 from pandas.core.groupby import Grouper
 from pandas.core.format import set_eng_float_format
-from pandas.core.index import Index, CategoricalIndex, Int64Index, Float64Index, MultiIndex
+from pandas.core.index import (Index, CategoricalIndex, Int64Index,
+                               RangeIndex, Float64Index, MultiIndex)
 
 from pandas.core.series import Series, TimeSeries
 from pandas.core.frame import DataFrame

pandas/core/common.py

+6-6
@@ -84,6 +84,8 @@ def _check(cls, inst):
 ABCIndex = create_pandas_abc_type("ABCIndex", "_typ", ("index", ))
 ABCInt64Index = create_pandas_abc_type("ABCInt64Index", "_typ",
                                        ("int64index", ))
+ABCRangeIndex = create_pandas_abc_type("ABCRangeIndex", "_typ",
+                                       ("rangeindex", ))
 ABCFloat64Index = create_pandas_abc_type("ABCFloat64Index", "_typ",
                                          ("float64index", ))
 ABCMultiIndex = create_pandas_abc_type("ABCMultiIndex", "_typ",
@@ -97,7 +99,8 @@ def _check(cls, inst):
 ABCCategoricalIndex = create_pandas_abc_type("ABCCategoricalIndex", "_typ",
                                              ("categoricalindex", ))
 ABCIndexClass = create_pandas_abc_type("ABCIndexClass", "_typ",
-                                       ("index", "int64index", "float64index",
+                                       ("index", "int64index", "rangeindex",
+                                        "float64index",
                                         "multiindex", "datetimeindex",
                                         "timedeltaindex", "periodindex",
                                         "categoricalindex"))
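The new ``ABCRangeIndex`` line registers a type-check class keyed on the ``_typ`` attribute. Below is a hypothetical re-creation of how such a factory can work, assuming (as the ``_check(cls, inst)`` hunk context and the call signature suggest) that the check compares an object's ``_typ`` value against an allowed tuple. ``create_abc_type`` and the ``Fake*`` classes are illustrative names, not pandas code, and the ``ABCIndexClass`` tuple here is deliberately trimmed.

```python
def create_abc_type(name, attr, comp):
    """Hypothetical factory: isinstance()/issubclass() succeed when the
    object's `attr` value is one of the tags listed in `comp`."""

    class _ABCMeta(type):
        def __instancecheck__(cls, inst):
            return getattr(inst, attr, None) in comp

        def __subclasscheck__(cls, sub):
            return getattr(sub, attr, None) in comp

    return _ABCMeta(name, (), {})


ABCRangeIndex = create_abc_type("ABCRangeIndex", "_typ", ("rangeindex",))
# Trimmed tag list for illustration; the real tuple is longer.
ABCIndexClass = create_abc_type("ABCIndexClass", "_typ",
                                ("index", "int64index", "rangeindex"))


class FakeRangeIndex:  # hypothetical; only its tag matters to the check
    _typ = "rangeindex"


class FakeSeries:      # hypothetical non-index object
    _typ = "series"


print(isinstance(FakeRangeIndex(), ABCIndexClass))  # True
print(isinstance(FakeSeries(), ABCIndexClass))      # False
```

Because the check reads a string tag instead of walking the class hierarchy, code can ask "is this a RangeIndex?" without importing the concrete class. It also explains why ``ABCIndexClass`` had to gain ``"rangeindex"``: sub-classing ``Int64Index`` alone would not make a ``RangeIndex`` pass a tag-based check.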
@@ -1805,11 +1808,8 @@ def is_bool_indexer(key):
 
 
 def _default_index(n):
-    from pandas.core.index import Int64Index
-    values = np.arange(n, dtype=np.int64)
-    result = Int64Index(values, name=None)
-    result.is_unique = True
-    return result
+    from pandas.core.index import RangeIndex
+    return RangeIndex(0, n, name=None)
 
 
 def ensure_float(arr):
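The ``_default_index`` change swaps a materialized ``np.arange`` for ``RangeIndex(0, n)``. A quick check of why this is value-preserving, using plain ``range`` as a stand-in for ``RangeIndex`` (an assumption; both helper names are hypothetical): the labels are identical, and uniqueness, which the old code set by hand with ``result.is_unique = True``, follows from the range having a non-zero step.

```python
import numpy as np


def old_default_values(n):
    # Mirrors the removed code: materialize the labels 0..n-1 as int64.
    return np.arange(n, dtype=np.int64)


def new_default_values(n):
    # Stand-in for RangeIndex(0, n): only start, stop and step are kept.
    return range(0, n)


for n in (0, 1, 5, 1000):
    old, new = old_default_values(n), new_default_values(n)
    assert old.tolist() == list(new)  # identical labels in identical order
    assert len(set(new)) == len(new)  # unique by construction (step != 0)
```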

pandas/core/dtypes.py

+3-2
@@ -214,5 +214,6 @@ def __eq__(self, other):
         if isinstance(other, compat.string_types):
             return other == self.name
 
-        return isinstance(other, DatetimeTZDtype) and self.unit == other.unit \
-            and self.tz == other.tz
+        return isinstance(other, DatetimeTZDtype) and \
+            self.unit == other.unit and \
+            str(self.tz) == str(other.tz)
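The ``DatetimeTZDtype.__eq__`` change compares ``str(self.tz)`` instead of the tz objects themselves. One plausible motivation, sketched here under that assumption, is that two tzinfo objects describing the same zone can be distinct instances without value equality. ``NamedTZ`` is a hypothetical tzinfo written only for this illustration.

```python
from datetime import timedelta, tzinfo


class NamedTZ(tzinfo):
    """Hypothetical fixed-offset tzinfo that defines __str__ but not __eq__,
    so equality falls back to object identity."""

    def __init__(self, zone, offset_hours):
        self.zone = zone
        self._offset = timedelta(hours=offset_hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self.zone

    def __str__(self):
        return self.zone


def tz_equal(a, b):
    # The comparison style used in the new __eq__: compare string forms.
    return str(a) == str(b)


winter = NamedTZ("US/Eastern", -5)
summer = NamedTZ("US/Eastern", -4)  # same zone name, different instance
print(winter == summer)             # False: identity comparison
print(tz_equal(winter, summer))     # True: both print as "US/Eastern"
```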

pandas/core/frame.py

+3-3
@@ -5325,7 +5325,7 @@ def extract_index(data):
                           (lengths[0], len(index)))
                    raise ValueError(msg)
            else:
-                index = Index(np.arange(lengths[0]))
+                index = _default_index(lengths[0])
 
     return _ensure_index(index)
 
@@ -5538,11 +5538,11 @@ def convert(arr):
 
 
 def _get_names_from_index(data):
-    index = lrange(len(data))
     has_some_name = any([getattr(s, 'name', None) is not None for s in data])
     if not has_some_name:
-        return index
+        return _default_index(len(data))
 
+    index = lrange(len(data))
     count = 0
     for i, s in enumerate(data):
         n = getattr(s, 'name', None)
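After this change ``_get_names_from_index`` returns a default (range-based) index when none of the rows carries a ``name``, and only builds a Python list otherwise. The hunk above is cut off after ``n = getattr(s, 'name', None)``, so the rest of the loop below is an assumption, including the ``"Unnamed %d"`` placeholder. ``names_from_rows`` is a hypothetical stand-alone version that returns a plain ``range`` where pandas returns ``_default_index``.

```python
from types import SimpleNamespace


def names_from_rows(data):
    """Hypothetical, simplified take on _get_names_from_index."""
    has_some_name = any(getattr(s, "name", None) is not None for s in data)
    if not has_some_name:
        # Fast path from the diff: a lazy 0..n-1 range, no list built.
        return range(len(data))

    labels = list(range(len(data)))
    count = 0
    for i, s in enumerate(data):
        n = getattr(s, "name", None)
        if n is not None:
            labels[i] = n
        else:
            # Assumed placeholder scheme for unnamed rows.
            labels[i] = "Unnamed %d" % count
            count += 1
    return labels


rows = [SimpleNamespace(name="a"), SimpleNamespace(), SimpleNamespace(name="c")]
print(names_from_rows(rows))  # ['a', 'Unnamed 0', 'c']
```

Moving ``index = lrange(len(data))`` below the early return, as the diff does, means the list is only allocated on the slow path.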
