ENH: Intervalindex #15309

Closed
wants to merge 12 commits into from
20 changes: 20 additions & 0 deletions asv_bench/benchmarks/indexing.py
@@ -226,6 +226,26 @@ def time_is_monotonic(self):
        self.miint.is_monotonic


class IntervalIndexing(object):
    goal_time = 0.2

    def setup(self):
        self.monotonic = Series(np.arange(1000000),
                                index=IntervalIndex.from_breaks(np.arange(1000001)))

    def time_getitem_scalar(self):
        self.monotonic[80000]

    def time_loc_scalar(self):
        self.monotonic.loc[80000]

    def time_getitem_list(self):
        self.monotonic[80000:]

    def time_loc_list(self):
        self.monotonic.loc[80000:]


class PanelIndexing(object):
    goal_time = 0.2

33 changes: 33 additions & 0 deletions doc/source/advanced.rst
@@ -850,6 +850,39 @@ Of course if you need integer based selection, then use ``iloc``

dfir.iloc[0:5]

.. _indexing.intervallindex:

IntervalIndex
~~~~~~~~~~~~~

.. versionadded:: 0.20.0

.. warning::

   These indexing behaviors are provisional and may change in a future version of pandas.

.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3, 4]},
                     index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))
   df

Label based indexing via ``.loc`` along the edges of an interval works as you would expect,
selecting that particular interval.

.. ipython:: python

   df.loc[2]
   df.loc[[2, 3]]

If you select a label *contained* within an interval, this will also select the interval.

.. ipython:: python

   df.loc[2.5]
   df.loc[[2.5, 3.5]]
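
As a rough illustration of the containment rule described above, the lookup for right-closed intervals built from sorted breaks can be sketched with the standard-library ``bisect`` module. This is not pandas' implementation; the helper name ``locate_interval`` is hypothetical, and the sketch assumes every interval is open on the left and closed on the right.

```python
from bisect import bisect_left


def locate_interval(breaks, label):
    """Return the (left, right] pair from sorted `breaks` containing `label`.

    Hypothetical helper for illustration only; returns None when the label
    falls outside every interval.
    """
    # bisect_left gives the position of the first break >= label,
    # which is the right edge of a right-closed interval.
    pos = bisect_left(breaks, label)
    if pos == 0 or pos == len(breaks):
        return None
    return (breaks[pos - 1], breaks[pos])


breaks = [0, 1, 2, 3, 4]
on_edge = locate_interval(breaks, 2)      # label on a right edge
inside = locate_interval(breaks, 2.5)     # label strictly inside an interval
outside = locate_interval(breaks, 0)      # 0 is excluded from (0, 1]
```

Under these assumptions, a label on a right edge and a label strictly inside an interval both resolve to a single interval, while the leftmost break resolves to none.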


Miscellaneous indexing FAQ
--------------------------

21 changes: 21 additions & 0 deletions doc/source/api.rst
@@ -1405,6 +1405,27 @@ Categorical Components
   CategoricalIndex.as_ordered
   CategoricalIndex.as_unordered

.. _api.intervalindex:

IntervalIndex
-------------

.. autosummary::
   :toctree: generated/

   IntervalIndex

IntervalIndex Components
~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated/

   IntervalIndex.from_arrays
   IntervalIndex.from_tuples
   IntervalIndex.from_breaks
   IntervalIndex.from_intervals
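
The constructors listed above differ mainly in the shape of their input. The following pure-Python sketch does not call pandas; it only shows how one sequence of breaks corresponds to the left/right arrays and to the tuples that would describe the same intervals. All variable names are illustrative.

```python
# Input shape accepted by from_breaks: consecutive edges.
breaks = [0, 1, 2, 3, 4]

# Input shape accepted by from_arrays: separate left and right edges.
left = breaks[:-1]
right = breaks[1:]

# Input shape accepted by from_tuples: one (left, right) pair per interval.
tuples = list(zip(left, right))
```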

.. _api.multiindex:

MultiIndex
10 changes: 9 additions & 1 deletion doc/source/reshaping.rst
@@ -517,7 +517,15 @@ Alternatively we can specify custom bin-edges:

.. ipython:: python

   pd.cut(ages, bins=[0, 18, 35, 70])
   c = pd.cut(ages, bins=[0, 18, 35, 70])
   c

.. versionadded:: 0.20.0

If the ``bins`` keyword is an ``IntervalIndex``, then these will be
used to bin the passed data.

.. ipython:: python

   pd.cut([25, 20, 50], bins=c.categories)


.. _reshaping.dummies:
58 changes: 58 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -13,6 +13,7 @@ Highlights include:
- ``Panel`` has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`
- Improved user API when accessing levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved support for UInt64 dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
- Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here <whatsnew_0200.enhancements.intervalindex>`
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
- Window Binary Corr/Cov operations return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
Expand Down Expand Up @@ -314,6 +315,63 @@ To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you

sdf.to_coo()

.. _whatsnew_0200.enhancements.intervalindex:

IntervalIndex
^^^^^^^^^^^^^

pandas has gained an ``IntervalIndex`` with its own dtype, ``interval``, as well as the ``Interval`` scalar type. These allow first-class support for interval
notation, specifically as a return type for the categories in ``pd.cut`` and ``pd.qcut``. The ``IntervalIndex`` also supports some unique indexing behavior; see the
:ref:`docs <indexing.intervallindex>`. (:issue:`7640`, :issue:`8625`)

Previous behavior:

.. code-block:: ipython

   In [2]: pd.cut(range(3), 2)
   Out[2]:
   [(-0.002, 1], (-0.002, 1], (1, 2]]
   Categories (2, object): [(-0.002, 1] < (1, 2]]

   # the returned categories are strings, representing Intervals
   In [3]: pd.cut(range(3), 2).categories
   Out[3]: Index(['(-0.002, 1]', '(1, 2]'], dtype='object')

New behavior:

.. ipython:: python

   c = pd.cut(range(4), bins=2)
   c
   c.categories

Furthermore, this allows one to bin *other* data with these same bins. ``NaN`` represents a missing
value similar to other dtypes.

.. ipython:: python

   pd.cut([0, 3, 1, 1], bins=c.categories)
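
To make the idea of reusing bins concrete, here is a minimal pure-Python sketch of a right-closed interval scalar and of assigning values to existing bins. It is an illustration only, not how pandas implements ``Interval`` or ``pd.cut``: the class ``RightClosedInterval``, the helper ``assign_bins``, and the bin edges are all assumptions made for the example, and ``None`` stands in for the missing-value marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightClosedInterval:
    """Hypothetical stand-in for an interval scalar: open left, closed right."""
    left: float
    right: float

    def __contains__(self, value):
        return self.left < value <= self.right


def assign_bins(values, bins):
    """Map each value to the first bin containing it, or None if no bin does."""
    return [next((b for b in bins if v in b), None) for v in values]


# Assumed edges for illustration; they play the role of existing categories.
bins = [RightClosedInterval(-0.003, 1.5), RightClosedInterval(1.5, 3.0)]

assigned = assign_bins([0, 3, 1, 1], bins)   # every value falls in some bin
missing = assign_bins([5], bins)             # 5 lies outside all bins
```

Because the bins are ordinary objects rather than strings, the same list can be applied to any new data, and values that fall outside every bin are reported as missing instead of raising an error.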

These can also be used in ``Series`` and ``DataFrame``, and indexed.

.. ipython:: python

   df = pd.DataFrame({'A': range(4),
                      'B': pd.cut([0, 3, 1, 1], bins=c.categories)}
                     ).set_index('B')

Selecting a specific interval:

.. ipython:: python

   df.loc[pd.Interval(1.5, 3.0)]

Selecting via a scalar value that is contained in the intervals:

.. ipython:: python

   df.loc[0]

.. _whatsnew_0200.enhancements.other:

Other Enhancements
1 change: 0 additions & 1 deletion pandas/_libs/hashtable.pyx
@@ -41,7 +41,6 @@ cdef extern from "Python.h":

cdef size_t _INIT_VEC_CAP = 128


include "hashtable_class_helper.pxi"
include "hashtable_func_helper.pxi"
