ENH: RangeIndex redux #11892

Merged
merged 6 commits into from
Jan 16, 2016
25 changes: 23 additions & 2 deletions doc/source/advanced.rst
@@ -617,10 +617,20 @@ faster than fancy indexing.
timeit ser.ix[indexer]
timeit ser.take(indexer)

.. _indexing.index_types:

Index Types
-----------

We have discussed ``MultiIndex`` extensively in the previous sections. ``DatetimeIndex`` and ``PeriodIndex``
are covered :ref:`here <timeseries.overview>`, and ``TimedeltaIndex`` :ref:`here <timedeltas.timedeltas>`.

In the following sub-sections we will highlight some other index types.

.. _indexing.categoricalindex:

CategoricalIndex
----------------
~~~~~~~~~~~~~~~~

.. versionadded:: 0.16.1

@@ -702,10 +712,21 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.
In [12]: pd.concat([df2, df3])
TypeError: categories must match existing categories when appending

.. _indexing.rangeindex:

Int64Index and RangeIndex
~~~~~~~~~~~~~~~~~~~~~~~~~

``Int64Index`` is a fundamental index in *pandas*. It is an immutable array implementing an ordered, sliceable set.
Prior to 0.18.0, ``Int64Index`` provided the default index for all ``NDFrame`` objects.

``RangeIndex`` is a sub-class of ``Int64Index`` added in version 0.18.0, and it now provides the default index for all ``NDFrame`` objects.
``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. It is analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
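
As a minimal sketch of the ``range`` analogy (assuming a pandas version that ships ``RangeIndex``, i.e. 0.18.0 or later), an index can be built from start, stop, and step values and yields the same labels as the built-in ``range``. The variable names are illustrative only.

```python
import pandas as pd

# Build a RangeIndex from start/stop/step, mirroring Python's built-in range.
idx = pd.RangeIndex(start=0, stop=10, step=2)

# Iterating the index produces the same labels as range(0, 10, 2).
labels = list(idx)
matches_range = labels == list(range(0, 10, 2))

# The length is derived from start/stop/step, just as it is for range.
n_labels = len(idx)
```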

.. _indexing.float64index:

Float64Index
------------
~~~~~~~~~~~~

.. note::

2 changes: 1 addition & 1 deletion doc/source/timeseries.rst
@@ -1091,7 +1091,7 @@ An example of how holidays and holiday calendars are defined:
Using this calendar, creating an index or doing offset arithmetic skips weekends
and holidays (i.e., Memorial Day/July 4th). For example, the below defines
a custom business day offset using the ``ExampleCalendar``. Like any other offset,
it can be used to create a ``DatetimeIndex`` or added to ``datetime``
it can be used to create a ``DatetimeIndex`` or added to ``datetime``
or ``Timestamp`` objects.

.. ipython:: python
34 changes: 34 additions & 0 deletions doc/source/whatsnew/v0.18.0.txt
@@ -19,6 +19,7 @@ Highlights include:

- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
- ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
- Added support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.

Check the :ref:`API Changes <whatsnew_0180.api>` and :ref:`deprecations <whatsnew_0180.deprecations>` before updating.

@@ -102,6 +103,39 @@ And multiple aggregations
r.agg({'A' : ['mean','std'],
'B' : ['mean','std']})

.. _whatsnew_0180.enhancements.rangeindex:

Range Index
^^^^^^^^^^^

A ``RangeIndex`` has been added to the ``Int64Index`` sub-classes to provide a memory-saving alternative for common use cases. Its implementation is similar to the Python ``range`` object (``xrange`` in Python 2), in that it only stores the start, stop, and step values for the index. It interacts transparently with the user API, converting to ``Int64Index`` if needed.

This is now the default index constructed for ``NDFrame`` objects, rather than an ``Int64Index`` as previously. (:issue:`939`)

Previous Behavior:

.. code-block:: python

In [3]: s = Series(range(1000))

In [4]: s.index
Out[4]:
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)

In [6]: s.index.nbytes
Out[6]: 8000


New Behavior:

.. ipython:: python

s = Series(range(1000))
s.index
s.index.nbytes
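
For a rough sense of the saving on a current installation, the sketch below compares the bytes reported by the default index with those of a materialized ``int64`` array of the same length. The exact ``nbytes`` of a ``RangeIndex`` varies by version and platform, so no figure is claimed for it here; only the 8000-byte baseline matches the previous behavior shown above.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1000))

# With pandas >= 0.18.0 the default index is a RangeIndex.
default_is_range = isinstance(s.index, pd.RangeIndex)

# A materialized int64 array stores 8 bytes per label: 1000 * 8 bytes.
materialized_nbytes = np.arange(1000, dtype=np.int64).nbytes

# The RangeIndex only keeps start, stop, and step, so it reports far less.
range_nbytes = s.index.nbytes
```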

.. _whatsnew_0180.enhancements.other:

Other enhancements
3 changes: 2 additions & 1 deletion pandas/core/api.py
@@ -8,7 +8,8 @@
from pandas.core.categorical import Categorical
from pandas.core.groupby import Grouper
from pandas.core.format import set_eng_float_format
from pandas.core.index import Index, CategoricalIndex, Int64Index, Float64Index, MultiIndex
from pandas.core.index import (Index, CategoricalIndex, Int64Index,
RangeIndex, Float64Index, MultiIndex)

from pandas.core.series import Series, TimeSeries
from pandas.core.frame import DataFrame
12 changes: 6 additions & 6 deletions pandas/core/common.py
@@ -84,6 +84,8 @@ def _check(cls, inst):
ABCIndex = create_pandas_abc_type("ABCIndex", "_typ", ("index", ))
ABCInt64Index = create_pandas_abc_type("ABCInt64Index", "_typ",
("int64index", ))
ABCRangeIndex = create_pandas_abc_type("ABCRangeIndex", "_typ",
("rangeindex", ))
ABCFloat64Index = create_pandas_abc_type("ABCFloat64Index", "_typ",
("float64index", ))
ABCMultiIndex = create_pandas_abc_type("ABCMultiIndex", "_typ",
@@ -97,7 +99,8 @@ def _check(cls, inst):
ABCCategoricalIndex = create_pandas_abc_type("ABCCategoricalIndex", "_typ",
("categoricalindex", ))
ABCIndexClass = create_pandas_abc_type("ABCIndexClass", "_typ",
("index", "int64index", "float64index",
("index", "int64index", "rangeindex",
"float64index",
"multiindex", "datetimeindex",
"timedeltaindex", "periodindex",
"categoricalindex"))
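
The ABC types above are matched on an object's ``_typ`` string rather than on inheritance, which is why ``"rangeindex"`` has to be registered both on its own and in ``ABCIndexClass``. The following is a simplified, hypothetical re-creation of that idea, not the pandas implementation: ``make_abc_type``, ``FakeRangeIndex``, ``FakeSeries`` and the ``*Sketch`` names are invented for illustration.

```python
def make_abc_type(name, attr, allowed):
    """Hypothetical helper: build a class whose isinstance() check
    looks at an attribute value instead of the inheritance chain."""

    class _Meta(type):
        def __instancecheck__(cls, inst):
            # Match when the instance's attribute is one of the allowed tags.
            return getattr(inst, attr, None) in allowed

    return _Meta(name, (), {})


class FakeRangeIndex:
    _typ = "rangeindex"  # stand-in tag, mirroring the strings in the diff


class FakeSeries:
    _typ = "series"


ABCRangeIndexSketch = make_abc_type(
    "ABCRangeIndexSketch", "_typ", ("rangeindex",))
ABCIndexClassSketch = make_abc_type(
    "ABCIndexClassSketch", "_typ",
    ("index", "int64index", "rangeindex", "float64index"))

range_is_range = isinstance(FakeRangeIndex(), ABCRangeIndexSketch)
range_is_index = isinstance(FakeRangeIndex(), ABCIndexClassSketch)
series_is_index = isinstance(FakeSeries(), ABCIndexClassSketch)
```

Without the ``"rangeindex"`` entry in the broader tuple, the second check in this sketch would fail even though the object is conceptually an index.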
@@ -1796,11 +1799,8 @@ def is_bool_indexer(key):


def _default_index(n):
from pandas.core.index import Int64Index
values = np.arange(n, dtype=np.int64)
result = Int64Index(values, name=None)
result.is_unique = True
return result
from pandas.core.index import RangeIndex
return RangeIndex(0, n, name=None)


def ensure_float(arr):
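
The ``_default_index`` change can be approximated with public API, as a sketch only: ``old_default_index`` and ``new_default_index`` are hypothetical stand-ins for the removed and added bodies, and the comparison assumes a recent pandas where a materialized integer array yields a plain integer ``Index``.

```python
import numpy as np
import pandas as pd


def old_default_index(n):
    # Stand-in for the removed body: materialize all n labels as int64.
    return pd.Index(np.arange(n, dtype=np.int64), name=None)


def new_default_index(n):
    # Stand-in for the added body: keep only start=0 and stop=n.
    return pd.RangeIndex(0, n, name=None)


old = old_default_index(5)
new = new_default_index(5)

# Both carry the same labels, so the two compare as equal indexes.
same_labels = list(old) == list(new)
indexes_equal = new.equals(old)
```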
5 changes: 3 additions & 2 deletions pandas/core/dtypes.py
@@ -214,5 +214,6 @@ def __eq__(self, other):
if isinstance(other, compat.string_types):
return other == self.name

return isinstance(other, DatetimeTZDtype) and self.unit == other.unit \
and self.tz == other.tz
return isinstance(other, DatetimeTZDtype) and \
self.unit == other.unit and \
str(self.tz) == str(other.tz)
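
The effect of comparing ``str(self.tz)`` instead of the tz objects themselves can be illustrated with a self-contained sketch. ``NamedZone`` and ``TZDtypeSketch`` are hypothetical classes written for this example and are not part of pandas; the PR does not state the motivation for the change, so none is claimed here.

```python
from datetime import timedelta, tzinfo


class NamedZone(tzinfo):
    """Hypothetical tzinfo with no __eq__, so instances compare by identity."""

    def __init__(self, name):
        self._name = name

    def utcoffset(self, dt):
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name

    def __str__(self):
        return self._name


class TZDtypeSketch:
    """Hypothetical dtype holding a unit and a tz, compared as in the new code."""

    def __init__(self, unit, tz):
        self.unit = unit
        self.tz = tz

    def __eq__(self, other):
        # Same shape as the added lines: the tz is compared by its string form.
        return (isinstance(other, TZDtypeSketch)
                and self.unit == other.unit
                and str(self.tz) == str(other.tz))

    def __hash__(self):
        return hash((self.unit, str(self.tz)))


tz_a = NamedZone("UTC")
tz_b = NamedZone("UTC")

objects_equal = tz_a == tz_b  # identity comparison on distinct objects
dtypes_equal = TZDtypeSketch("ns", tz_a) == TZDtypeSketch("ns", tz_b)
units_differ = TZDtypeSketch("ns", tz_a) == TZDtypeSketch("us", tz_a)
```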
6 changes: 3 additions & 3 deletions pandas/core/frame.py
@@ -5325,7 +5325,7 @@ def extract_index(data):
(lengths[0], len(index)))
raise ValueError(msg)
else:
index = Index(np.arange(lengths[0]))
index = _default_index(lengths[0])

return _ensure_index(index)

@@ -5538,11 +5538,11 @@ def convert(arr):


def _get_names_from_index(data):
index = lrange(len(data))
has_some_name = any([getattr(s, 'name', None) is not None for s in data])
if not has_some_name:
return index
return _default_index(len(data))

index = lrange(len(data))
count = 0
for i, s in enumerate(data):
n = getattr(s, 'name', None)