Skip to content

RangeIndex as default index #9999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

RangeIndex as default index #9999

wants to merge 13 commits into from

Conversation

ARF1
Copy link

@ARF1 ARF1 commented Apr 27, 2015

WIP on making RangeIndex (#9977) the default index

@ARF1 ARF1 mentioned this pull request Apr 27, 2015
25 tasks
@ARF1
Copy link
Author

ARF1 commented Apr 27, 2015

@jreback @shoyer I tried out what happend making RangeIndex the default index: nothing good. - A whopping 60 errors and 13 test failures. - Most of which meant absolutely nothing to me.

To make the problem more manageable, I started this tree with a minimal RangeIndex implementation: no optimization at all, pure fall-back to Int64Index. In principle it should behave pretty much like the usual default index - except with a different constructor.

Still I have about 16 errors and 2 failures - most of which I have not idea how they happen. I managed to diagnose and fix one and I am working on another.

That said, I will really need help debugging this if you really want to make RangeIndex the default index: I am fairly limited because I have already exceeded the time I was granted to work on RangeIndex by far. I will of course still work on it when I can, but I can no longer do it full-time - which would be required to understand all the pandas internals to fix these widely distributed errors.

Maybe you know another pandas developer would would be willing to help out with this - or take ownership?

@shoyer
Copy link
Member

shoyer commented Apr 27, 2015

Haha, that's not too surprising :). If you keep your branch up to date, I can test it and probably identify at least a few fixes.

On Mon, Apr 27, 2015 at 4:00 AM, ARF1 [email protected] wrote:

@jreback @shoyer I tried out what happend making RangeIndex the default index: nothing good. - A whopping 60 errors and 13 test failures. - Most of which meant absolutely nothing to me.
To make the problem more manageable, I started this tree with a minimal RangeIndex implementation: no optimization at all, pure fall-back to Int64Index. In principle it should behave pretty much like the usual default index - except with a different constructor.
Still I have about 16 errors and 2 failures - most of which I have not idea how they happen. I managed to diagnose and fix one and working on another.
That said, I will really need help debugging this if you really want to make RangeIndex the default index: I am fairly limited because I have already exceeded the time I was granted to work on RangeIndex by far. I will of course still work on it when I can, but I can no longer do it full-time - which would be required to understand all the pandas internals to fix these widely distributed errors.

Maybe you know another pandas developer would would be willing to help out with this - or take ownership?

Reply to this email directly or view it on GitHub:
#9999 (comment)

@ARF1
Copy link
Author

ARF1 commented Apr 27, 2015

@shoyer That would be great. I managed to whittle the issues down to only a few with this minimal RangeIndex implementation. Now however I am a bit stumped by the error in test_set_index2 (pandas.tests.test_frame.TestDataFrame). See: https://travis-ci.org/pydata/pandas/jobs/60231079#L1173

Also, I am not quite sure what to do with the failure in test_constructor_mrecarray (pandas.tests.test_frame.TestDataFrame). (See: https://travis-ci.org/pydata/pandas/jobs/60231079#L1209) The issue seems to be that a dataframe that originally had a RangeIndex returns a dataframe with a Int64Index after some operation. That should in principle be accepted as ok, I think, but how would one adjust the test?

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance API Design labels Apr 29, 2015
@ARF1
Copy link
Author

ARF1 commented May 1, 2015

@jreback @shoyer Ok, it was truely painfuly but I managed to reduce the issues with the minimal RangeIndex implementation as default index to a single test failure. This last one though is stubbornly escaping my analysis.

See (pandas.tests.test_graphics.TestDataFramePlots.test_legend_name): https://travis-ci.org/pydata/pandas/jobs/60815604#L1173

I have narrowed it down to this line: https://github.com/ARF1/pandas/blob/rangeindex_as_default_index/pandas/tools/plotting.py#L1007

Before this line, data.columns.name == 'new', after this line, numeric_data.columns.name == None. This only happens if RangeIndex is the default index. For the life of me I cannot figure out why the name is lost in translation. - Even with side-by-side step-by-step debugging of both versions.

Any chance you could help with this?

name=self.name, fastpath=True)
else:
name = kwargs.get('name', self.name)
return self._int64index._shallow_copy(values, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're dropping the name argument here. You want to use kwargs['name'] = kwargs.get('name', self.name) or something like that (I suppose better yet would be kwargs.setdefault('name', self.name))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, thank you! Now I am finally getting somewhere with this. Only remaining issues are with view() and __getitem__() using slices - which you already commented on on the other PR.

@shoyer
Copy link
Member

shoyer commented May 1, 2015

@ARF1 Tracking down bugs like this can be unforgiving, especially when it takes you on a deep dive through pandas internals. Thanks for the hard work!

@ARF1
Copy link
Author

ARF1 commented May 2, 2015

Closing in favour of and rebasing changesets on #9977.

@ARF1 ARF1 closed this May 2, 2015
@ARF1 ARF1 deleted the rangeindex_as_default_index branch May 2, 2015 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants