PERF: Unnecessary hash table with RangeIndex #16685

chris-b1 · 2017-06-12T21:03:45Z

Example

def log_memory():
    import os
    import gc
    import psutil
    for i in range(3):
        gc.collect(i)
    process = psutil.Process(os.getpid())
    mem_usage = process.memory_info().rss / float(2 ** 20)
    print("[Memory usage] {:12.1f} MB".format(
        mem_usage
    ))

In [20]: df = pd.DataFrame({'a': np.arange(1000000)})

In [23]: log_memory()
[Memory usage]        132.4 MB

In [24]: df.loc[5, :]
Out[24]: 
a    5
Name: 5, dtype: int32

In [25]: log_memory()
[Memory usage]        172.2 MB

Rather than materializing the hash table, this should directly convert labels into positions. Low priority in my opinion, since it is atypical to use loc with a RangeIndex.

pandas 0.20.2
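
To make "directly convert labels into positions" concrete, here is a minimal sketch in plain Python. The helper name `range_label_to_position` is hypothetical and is not part of pandas; it only illustrates that a lookup in a range needs arithmetic and a bounds check, with no lookup table at all.

```python
def range_label_to_position(label, start, stop, step=1):
    """Return the position of `label` within range(start, stop, step).

    Hypothetical helper for illustration only; not a pandas function.
    Raises KeyError when the label is not a member of the range.
    """
    rng = range(start, stop, step)
    # For ints, `in` on a range is a constant-time arithmetic check
    # (bounds plus alignment to the step), so nothing is materialized.
    if not isinstance(label, int) or label not in rng:
        raise KeyError(label)
    # Membership guarantees the offset is an exact multiple of step.
    return (label - start) // step


print(range_label_to_position(5, 0, 1_000_000))
```

The built-in `range.index` performs the same computation for integers; the arithmetic is spelled out here only to show that the cost does not depend on the length of the range.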

jreback · 2017-06-14T10:39:37Z

Hmm, this should not be creating a hash table at all (though it would if it morphed into an Int64Index).

chris-b1 · 2017-06-14T14:52:58Z

The _engine for RangeIndex still points at the Int64Index one, so a hash table will be created if something calls self._engine. So we either need to make a simplified RangeIndexEngine or override the indexing methods.

pandas/pandas/core/indexes/range.py

Line 43 in 2e24a8f

_engine_type = libindex.Int64Engine
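
The second option above (overriding the indexing methods) can be sketched with a toy class. Everything here is an assumption made for illustration: `ToyRangeIndex`, `_engine_cache`, and `get_loc_via_engine` are invented names, and the dict stands in for the hash table that a real engine would build. It is not the pandas implementation.

```python
class ToyRangeIndex:
    """Toy stand-in for a range-backed index (not pandas code)."""

    def __init__(self, start, stop, step=1):
        self._range = range(start, stop, step)
        self._engine_cache = None  # built lazily on first access

    @property
    def _engine(self):
        # The expensive step: one dict entry per label.
        if self._engine_cache is None:
            self._engine_cache = {
                label: pos for pos, label in enumerate(self._range)
            }
        return self._engine_cache

    def get_loc_via_engine(self, label):
        # Generic path: touching self._engine materializes the table.
        return self._engine[label]

    def get_loc(self, label):
        # Overridden path: arithmetic on the range, engine untouched.
        if isinstance(label, int) and label in self._range:
            return (label - self._range.start) // self._range.step
        raise KeyError(label)


idx = ToyRangeIndex(0, 1_000)
print(idx.get_loc(5), idx._engine_cache is None)
```

In this sketch the table only appears if some other code path still asks for `_engine`, which is the situation the comment describes.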

jreback · 2017-06-14T23:22:27Z

I think that _engine is purely for compat on some methods. It might be easiest/best to create a pretty dummy RI Engine in Cython.

jorisvandenbossche · 2017-12-20T19:34:57Z

> I think that _engine is purely for compat on some methods

_engine is still used in e.g. get_loc, so it is often created.

Related to this, I just stumbled on the fact that currently RangeIndex.memory_usage() is inaccurate, as it does not take the engine into account.

And given that the actual engine takes a lot more memory than the values (for Int64Index it is 5x more memory in an example case of a 500k range-like int index), a RangeIndex is not much more memory-efficient than an Int64Index (its big selling point, I thought?) once the engine is created.
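
The size gap can be illustrated with a plain-Python analogy. This is not a measurement of the pandas engine, and the 5x figure above is not reproduced here: a built-in `range` and a `dict` merely stand in for the index values and the hash table, and `sys.getsizeof` counts only the container itself, not the integer objects it refers to.

```python
import sys

n = 500_000
labels = range(n)  # constant-size description of the labels
lookup = {label: pos for pos, label in enumerate(labels)}  # one entry per label

range_bytes = sys.getsizeof(labels)  # does not depend on n
table_bytes = sys.getsizeof(lookup)  # grows with n

print("range object: {} bytes".format(range_bytes))
print("lookup table: {:.1f} MB".format(table_bytes / float(2 ** 20)))
```

Once the table exists, the compact representation of the labels no longer determines the total footprint, which is the concern raised above.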

chris-b1 · 2017-12-20T21:55:36Z

> RangeIndex is not much more memory-efficient than an Int64Index (its big selling point, I thought?) once the engine is created.

This is true, but I think it would be relatively rare for the engine to be populated when a RangeIndex is being used. Typically a RangeIndex is there because someone doesn't care about the index at all, so they would likely be using iloc, if any indexing at all.

…dex (#27119)
* TST: actually test #16877 on numeric index (not just RangeIndex)
* PERF: do not instantiate IndexEngine for standard lookup over RangeIndex
closes #16685

chris-b1 added the Difficulty Advanced, Indexing, and Performance labels Jun 12, 2017

chris-b1 added this to the Next Major Release milestone Jun 12, 2017

toobaz added a commit to toobaz/pandas that referenced this issue Jun 29, 2019

PERF: do not instantiate IndexEngine for standard lookup over RangeIndex

f1dccb1

closes pandas-dev#16685

toobaz mentioned this issue Jun 29, 2019

PERF: do not instantiate IndexEngine for standard lookup over RangeIndex #27119

Merged

4 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 Jun 30, 2019

jreback closed this as completed in #27119 Jun 30, 2019
