-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/API: Indexes on empty frames/series should be RangeIndex #49637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/API: Indexes on empty frames/series should be RangeIndex #49637
Conversation
@pandas-dev/pandas-core any comment? IMO, this simplify the construction logic nicely, so will be good to get in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, just got a question about a comment (which may no longer be relevant, especially with a new major release on the horizon)
I think this is also a sensible change for 2.0.0. |
this doesn't break any tests? |
@jreback , this break a lot of tests, as empty series/frames are in a lot of tests. So I wanted to know it if others approve of this general idea/change, before really going through the tests. |
I would be supportive of this change; IMO it would needs its own section in the 2.0 whatsnew |
closes #40077? |
f224dad
to
4237e62
Compare
Still need docs + some cleanup, otherwise is ready if passes the CI. |
This is ready for review now. |
9effc51
to
36f3630
Compare
Updated. |
Updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Just a clarification on the whatsnew note
3ccd284
to
f6e5f0e
Compare
Rebased & updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Matthew Roeschke <[email protected]>
ba9bc75
to
131dfef
Compare
Gentle ping. I've rebased (no code changes) just in case. |
Thanks @topper-123 |
I've looked into #49572 and it seems that the fix is reasonable simple, see included code. In short, currently if the user hasn't supplied
index
orcolumns
values, then the index/columns is aRangeIndex
if the data has lenght > 0, while they'reIndex[object]
if the length is 0. After this PR, aRangeIndex
will be used in both cases. This will simplify type inference etc. testing etc.However, this fix requires changes to a lot of tests (>500), of which I've fixed about 100 (not included in this version of the PR), so it's quite a lot of work to get this fixed up. Test fixing so far seems to be only a matter of ensuring that various empty dataframes/Series have
RangeIndex
rather than aIndex[object]
, so relatively simple, but also tedious work.I didn't get much response to #49572 from the core devs, so before I spend more time on this rather big item, could you guys chime in on if you agree that this a good thing to do? IMO having the same index dtypes for series/frame of length 0 and >0 will conceptually simplify pandas and decrease the amount of surprises that users will encounter.
I've included the code that fixes the issue in this PR, as mentioned, and ATM this PR fails, because I haven't fixed the tests, but I will get them fixed if I get positive response. If you can respond either here or in #49572, I'll of course read them both.