Skip to content

[#16737] Index type for Series with empty data #32053

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

SaturnFromTitan
Copy link
Contributor

@SaturnFromTitan SaturnFromTitan commented Feb 17, 2020


I picked up all the notes from #16737 where it was suggested to use Index over RangeIndex for empty data.

@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch from 738da11 to 3704069 Compare February 17, 2020 18:56
@SaturnFromTitan SaturnFromTitan changed the title 16737 Consistent Index type on empty data 16737 Consistent Index type for Series with empty data Feb 17, 2020
@SaturnFromTitan SaturnFromTitan changed the title 16737 Consistent Index type for Series with empty data [#16737] Index type for Series with empty data Feb 17, 2020
@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch 3 times, most recently from 7535447 to 20b5ebb Compare February 17, 2020 21:07
@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch from 20b5ebb to 4c8d1ea Compare February 18, 2020 21:59
@SaturnFromTitan
Copy link
Contributor Author

Can you please take another look @WillAyd. I think I addressed all your comments

@WillAyd
Copy link
Member

WillAyd commented Feb 20, 2020

@SaturnFromTitan maybe missed the comment but did we actually decide that this is the right approach? I think it's actually confusing for the index types to differ:

Here is current behaviour on master

# series
>>> pd.Series().index
Index([], dtype='object')
>>> pd.Series([]).index
RangeIndex(start=0, stop=0, step=1)
>>> pd.Series([1]).index
RangeIndex(start=0, stop=1, step=1)

# frame
>>> pd.DataFrame().index
Index([], dtype='object')
>>> pd.DataFrame([[]]).index
RangeIndex(start=0, stop=1, step=1)
>>> pd.DataFrame([[1]]).index
RangeIndex(start=0, stop=1, step=1)

Why would we only want to change the second Series case shown above?

@SaturnFromTitan
Copy link
Contributor Author

@WillAyd I went that direction as it was suggested in the comments of #16961:

@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Feb 23, 2020

I see your point though @WillAyd and agree it could lead to more confusion merging the current changes. Do you think we should...

  1. keep the current master behaviour
  2. adjust the default index types for pd.Series() and pd.DataFrame() to use a RangeIndex as well

Not sure about all the details, but option 2 seems to be the most consistent.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually a non-trivial change, but I am not sure we can easily deprecate this.

@jorisvandenbossche @TomAugspurger

@jreback
Copy link
Contributor

jreback commented Feb 23, 2020

@SaturnFromTitan also you should add the expansion test as indicated in the OP

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think several of your changes to the tests are correct. If we're still getting RangeIndexes there, then we'll need to update the code rather than the tests.

@SaturnFromTitan
Copy link
Contributor Author

Before I continue to work on this, can we please clarify if this should be changed in the first place or not?

And if it should change, then is my current implementation is the way to go or should I rather adjust the index of pd.Series()? See WillAyds comment and my reply.

I'd also be fine with closing this issue+PR as a "Won't change". It seems like any solution could yield something that's surprising to some users and expected to others.

@SaturnFromTitan
Copy link
Contributor Author

forgot the pings: @jreback @TomAugspurger @WillAyd

@TomAugspurger
Copy link
Contributor

IMO, the default index for pd.Series([]) should match pd.Index([]), an empty object-dtype index.

@jorisvandenbossche
Copy link
Member

I agree with @TomAugspurger that the default index for pd.Series([]) should match pd.Index([]), an empty object-dtype index.

We already do a deprecation warning for pd.Series([]) becoming object dtype in the future, I suppose we can add a second warning for the Index also becoming an object index in the future? Or would that be too annoying?

(@SaturnFromTitan I reopened the PR not to mean that you should continue working on this (although it is certainly welcome of course, once there is agreement on the behaviour), but that we should have something open with this discussion. But so if not, we should move the discussion to an issue)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 1, 2020 via email

@jreback jreback added Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses labels Apr 10, 2020
@SaturnFromTitan
Copy link
Contributor Author

Getting back at this, thanks for keeping it alive @jorisvandenbossche. I'll change the code to add the warning.

I'll take the same approach as in the dtype change of Series([]): First DeprecationWarning, later FutureWarning, then code change.

@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch from d6167ff to b924865 Compare April 20, 2020 09:13
@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Apr 20, 2020

I still have to fix a ton of DeprecationWarnings in the test suite. I'll take care of that in the coming days.

@pep8speaks
Copy link

pep8speaks commented Apr 20, 2020

Hello @SaturnFromTitan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-22 08:02:50 UTC

@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch from b85bb42 to 70708b3 Compare April 20, 2020 10:15
@SaturnFromTitan SaturnFromTitan force-pushed the 16737-unexpected-results-creating-an-empty-series branch from 92c15ae to 96556cd Compare April 22, 2020 10:13
@SaturnFromTitan
Copy link
Contributor Author

Please have another look @jreback @TomAugspurger @jorisvandenbossche

@WillAyd
Copy link
Member

WillAyd commented Jul 29, 2020

@SaturnFromTitan looks like a lot of merge conflicts have piled up - can you resolve?

@WillAyd
Copy link
Member

WillAyd commented Sep 10, 2020

Closing as I think stale but ping @SaturnFromTitan if you'd like to pick back up and can address merge conflicts

@WillAyd WillAyd closed this Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected results creating an empty Series
6 participants