API: should the RangeIndex constructor accept start=range_index? #12067

shoyer · 2016-01-17T01:05:59Z

xref #11892 (see changes and discussion)

RangeIndex, a new Int64Index subclass that will be used as the new default index (currently) has the following signature: RangeIndex(start=None, stop=None, step=None, name=None...)

For converting an xrange/range object into a RangeIndex, we have the constructor RangeIndex.from_range.

Although it isn't documented, the current constructor supports accepting a RangeIndex instances as the start argument, in which case a copy of the original index is produced, e.g., RangeIndex(RangeIndex(0, 10, 1)) -> RangeIndex(0, 10, 1).

I think this is a mistake: it is reusing the start parameter for a different type of argument, at odds with the documented behavior and predictable function signatures. If a user wants to create another RangeIndex from a RangeIndex, they can simply use the Index construtor, e.g., Index(RangeIndex(...)).

@jreback argues that not accepting a RangeIndex as the first argument to RangeIndex would be an API break, because it would create an Index constructor that is not idempotent. It would be "a major break with the current Index design".

Thoughts from anyone else?

The text was updated successfully, but these errors were encountered:

wesm · 2016-01-17T01:49:21Z

In general constructors in pandas are "too smart", but we are stuck in a lot of cases.

I don't know that Index constructor idempotency is an API guarantee we necessarily need to make. It was part of the original effort to be similar to np.array. Further, are we expecting people to actually use pd.RangeIndex in many direct ways? I only ever thought of RangeIndex as being an Int64Index-like performance optimization. You can see my original patch here

ef878f0

For internal pandas development, we should probably start biasing toward more precise internal APIs where possible, but when user convenience is concerned (where I've typically leaned toward more "polymorphic" / intuitive constructors at the expense of purity) there's a decent argument.

So I know it is a significant improvement and lowers memory usage, but is RangeIndex something we even wish to advertise heavily, since for most users it will just "be there" incidentally through use of read_csv or DataFrame or other constructors that create default indexes?

jreback · 2016-01-17T03:18:02Z

RangeIndex was/is really an internal performance substitute for Int64Index. However we DO have an (maybe implicit) guarantee on idempotency for a Index objects.

So happy to discuss:

I think its much easier in a new issue, as I doubt that long twisting conversation (about multiple topics) was easy to follow
as discussed this is mostly internal, though we are exposing it as a range-like object, hence the default constructor
I agree with a bit too much magic in places, but I find this convenient and unambiguous, as I indicated to @shoyer , I don't find any good reason not to do this.
contrary to @shoyer 's statement, RangeIndex(RangeIndex(...)) merely returns the original object (you can of course get a copy via copy=True, but it is False by default).
many-many times internally we have to guarantee that something is an index, usually via _ensure_index, but this goes thru Index, not always what you want. I don't think this is actually used very much (e.g. almost always self._shallow_copy() is called), but I don't see any good reason to break from the: self._constructor(self) possibility. Then you have special casing all around. Something which I feel is SO important at the size of the library. Special cases just causes bugs.

@shoyer made two points: is this user-facing or a performance enhancement.

It is both, hence the intertanglement of the two.

kawochen · 2016-01-17T04:19:02Z

@shoyer range does the same (10 in range(10) is/means stop=10). Do you think adequate documentation would be a good enough compromise (current error message of RangeIndex constructor says start must be integers)?

jreback · 2016-01-17T16:31:52Z

In [1]: Index(xrange(0,10))
Out[1]: RangeIndex(start=0, stop=10, step=1)

In [5]: Index(pd.RangeIndex(0, 10))
Out[5]: RangeIndex(start=0, stop=10, step=1)

In [2]: pd.RangeIndex(pd.RangeIndex(0,10))
Out[2]: RangeIndex(start=0, stop=10, step=1)

In [3]: pd.RangeIndex.from_range(xrange(0,10))
Out[3]: RangeIndex(start=0, stop=10, step=1)

to satisfy @shoyer concern about magic and my own about idempotency/consistency

I would suggest we do this:

# [1] (though impl changes slightly),, but API the same
In [1]: Index(xrange(0,10))
Out[1]: RangeIndex(start=0, stop=10, step=1)

In [5]: Index(pd.RangeIndex(xrange(0, 10)))
Out[5]: RangeIndex(start=0, stop=10, step=1)

# [2] stays the same
In [2]: pd.RangeIndex(pd.RangeIndex(0,10))
Out[2]: RangeIndex(start=0, stop=10, step=1)

# but also accepts range (so this covers [3] from above)
In [2]: pd.RangeIndex(xrange(0,10))
Out[2]: RangeIndex(start=0, stop=10, step=1)

# make the 'range like' constructor something else (from_steps)
In [3]: pd.RangeIndex.from_steps(0, 10)
Out[3]: RangeIndex(start=0, stop=10, step=1)

so this makes RangeIndex constructor very much like the other Index constructor

def __new__(cls, data=None, dtype=None, copy=False, name=None, fastpath=False, **kwargs):

allows idempotency, and no possibility of confusion/magic
makes de-facto equal actions yield equal results (e.g. xrange/RangeIndex) have a single argument constructor from a 'range' object. These are quite similar, but very different from passing 3 integers.

if consensus, this is a pretty easy change.

shoyer · 2016-01-17T17:55:55Z

@jreback the only downside of your new proposal is that the repr now no longer matches the constructor. I suppose we could make it RangeIndex(range(0, 10, 1)) but that does look a little strange to me.

jreback · 2016-01-17T18:18:24Z

In [20]: df.index
Out[20]: RangeIndex(start=0, stop=5, step=1)

hmm, right.

could do

RangeIndex.from_steps(0, 10, 1)

bug agreed that is a bit more verbose

of course could accept start,stop,step as **kwargs and correctly handle that
(though we are then back to the original constructor that I had)

shoyer mentioned this issue Jan 17, 2016

ENH: RangeIndex redux #11892

Merged

2 tasks

jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Jan 17, 2016

toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 28, 2019

jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Feb 22, 2020

mroeschke added the Enhancement label May 5, 2020

jbrockmendel mentioned this issue Mar 5, 2021

ENH: allow start=range_object in RangeIndex constructor #40241

Merged

4 tasks

jreback added this to the 1.3 milestone Mar 9, 2021

jreback closed this as completed in #40241 Mar 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: should the RangeIndex constructor accept start=range_index? #12067

API: should the RangeIndex constructor accept start=range_index? #12067

shoyer commented Jan 17, 2016

wesm commented Jan 17, 2016

jreback commented Jan 17, 2016

kawochen commented Jan 17, 2016

jreback commented Jan 17, 2016

shoyer commented Jan 17, 2016

jreback commented Jan 17, 2016

API: should the RangeIndex constructor accept start=range_index? #12067

API: should the RangeIndex constructor accept start=range_index? #12067

Comments

shoyer commented Jan 17, 2016

wesm commented Jan 17, 2016

jreback commented Jan 17, 2016

kawochen commented Jan 17, 2016

jreback commented Jan 17, 2016

shoyer commented Jan 17, 2016

jreback commented Jan 17, 2016