Skip to content

use range in RangeIndex instead of _start etc. #26581

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 5, 2019

Conversation

topper-123
Copy link
Contributor

Make RangeIndex use python's range internally, rather than the three scalars _start, _stop and _step.

Python3's range has several nice properties, that were not available in xrange in Python2. For example it's sliceable, and has .index and count methods. These can be used in Pandas, rather than maintaining Pandas-specific code, offering cleaner code and possibly faster operations.

@topper-123 topper-123 added Clean Index Related to the Index class or subclasses labels May 30, 2019
@topper-123 topper-123 added this to the 0.25.0 milestone May 30, 2019
.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start
Copy link
Contributor Author

@topper-123 topper-123 May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, fast_parquet uses _start etc. which I hadn't predicted, when I started this PR. So this proposes deprecating the attributes _start, _stop and _step, and removing them in 1.0.

This can be bit of a nuisance for downstream maintainers, so don't know if there's opposition to this, but this PR has several benefits as mentioned, so there's a trade-off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pyarrow uses them as well. IIRC recently had a discussion that they're semi-officially public for advanced users (do you recall that @jorisvandenbossche)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't recommend ._range.start, just say .start

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its ok to deprecate these, the downstream will have to adjust

@codecov
Copy link

codecov bot commented May 30, 2019

Codecov Report

Merging #26581 into master will decrease coverage by 50.17%.
The diff coverage is 40%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #26581       +/-   ##
===========================================
- Coverage   91.84%   41.67%   -50.18%     
===========================================
  Files         174      174               
  Lines       50644    50625       -19     
===========================================
- Hits        46516    21099    -25417     
- Misses       4128    29526    +25398
Flag Coverage Δ
#multiple ?
#single 41.67% <40%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/io/packers.py 14.53% <ø> (-73.55%) ⬇️
pandas/core/series.py 45.31% <ø> (-48.31%) ⬇️
pandas/core/frame.py 34.99% <ø> (-62.13%) ⬇️
pandas/core/indexes/range.py 49.84% <35.38%> (-48.13%) ⬇️
pandas/core/dtypes/concat.py 52.19% <70%> (-44.38%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 133 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c54dd2...d947faa. Read the comment docs.

@codecov
Copy link

codecov bot commented May 30, 2019

Codecov Report

Merging #26581 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26581      +/-   ##
==========================================
- Coverage   91.87%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50672      -20     
==========================================
- Hits        46575    46554      -21     
- Misses       4117     4118       +1
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.78% <42.99%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/io/packers.py 88.08% <ø> (ø) ⬆️
pandas/core/series.py 93.61% <ø> (ø) ⬆️
pandas/core/frame.py 97% <ø> (-0.12%) ⬇️
pandas/core/dtypes/concat.py 96.58% <100%> (+0.01%) ⬆️
pandas/core/dtypes/common.py 96% <100%> (+0.14%) ⬆️
pandas/core/indexes/range.py 98.47% <100%> (+0.42%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/util/testing.py 90.91% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 454b8c5...228b560. Read the comment docs.

@topper-123 topper-123 force-pushed the clean_RangeIndex branch 5 times, most recently from 8a0750c to 93d281d Compare May 31, 2019 07:13
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for the deprecations?

.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't recommend ._range.start, just say .start

.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its ok to deprecate these, the downstream will have to adjust

The value of the `stop` parameter

.. deprecated:: 0.25.0
Use ._range.stop or .stop instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above (you don't need to put the issue numbers in the code)

The value of the `step` parameter (or ``1`` if this was not supplied)

.. deprecated:: 0.25.0
Use ._range.step or .step instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


return RangeIndex._simple_new(start, stop, step, name=self.name)
new_range = self._range[key]
return RangeIndex.from_range(new_range, name=self.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you using cls everywhere? (or self.class)? (can be either just be consistent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't, but I've changed it. For constructor method like this I've done self._from_range(...).

@jreback
Copy link
Contributor

jreback commented Jun 1, 2019

also a note in the whatsnew for the deprecation

@pep8speaks
Copy link

pep8speaks commented Jun 1, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-04 09:18:53 UTC

@topper-123 topper-123 force-pushed the clean_RangeIndex branch 13 times, most recently from dee54ed to f14384c Compare June 2, 2019 13:48
@jreback
Copy link
Contributor

jreback commented Jun 2, 2019

lgtm. @TomAugspurger @jorisvandenbossche any objections?

@jreback
Copy link
Contributor

jreback commented Jun 2, 2019

Do we even need to deprecate these or can we just remove? They technically were private before so don't want to set a precedent that have to deprecate internal code going forward as well if we don't have to

@WillAyd see comments above, they are using in pyarrow & fastparquet use them. So best to deprecate.

@WillAyd
Copy link
Member

WillAyd commented Jun 2, 2019

Whoops sorry - thanks for clarifying

@TomAugspurger
Copy link
Contributor

What's the plan on warning? Should we throw a (not visible) DeprecationWarning for now, and promote that to a FutureWarning? Or just add a FutureWarning after pyarrow / fastparquet can update?

@topper-123
Copy link
Contributor Author

topper-123 commented Jun 3, 2019

My plan is to just post an issue on the issue lists for pyarrow & fastparquet with a notice that the attributes will be removed in 1.0 (unless they want us to hold on longer). I think that is sufficient as these are not publicly advertised.

EDIT: My cercern for adding a code message was that it would be visible and therefore annoying for users of those packages. I could add a DeprecationWarning, if that one is more quiet than the usual FutureWarning.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2019

ok let’s do a DeprecationWarning now
and then post issues should be good

@topper-123
Copy link
Contributor Author

I actually see now that DeprecationWarnings are also displayed by default. I thought they were only shown when starting Python with -Wall or similar. Am I missing something?

Uploading anyway now I've done it and there's a demand for the warnings.

@jorisvandenbossche
Copy link
Member

I actually see now that DeprecationWarnings are also displayed by default

Only if you directly use the deprecated method or attribute, but they should not show up if you eg call a function that uses them (which should be the case for libraries such as pyarrow / fastparquet).

@jorisvandenbossche
Copy link
Member

Personally, I am -1 on deprecating them (pyarrow/fastparquet only just started to use them, we literally said them it was fine, it's more annoying for them to deal with an attribute name change than for us to just keep it a couple of releases more). But if we're using a deprecation instead of future warning, I won't object if the majority is fine with deprecating.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2019

Personally, I am -1 on deprecating them (pyarrow/fastparquet only just started to use them, we literally said them it was fine, it's more annoying for them to deal with an attribute name change than for us to just keep it a couple of releases more). But if we're using a deprecation instead of future warning, I won't object if the majority is fine with deprecating.

right this is the purpose of the DeprecationWarning (rather than FutureWarning) as it won't show-up too much; but the deprecation should happen.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2019

@topper-123 can you file an issue on fastparquet and a JIRA on pyarrow.

@topper-123
Copy link
Contributor Author

topper-123 commented Jun 3, 2019

No problem. I'd like to have this merged before I write the issues, so they can see the issue by just pulling pandas master.

@topper-123 topper-123 added Refactor Internal refactoring of code and removed Clean labels Jun 3, 2019
@topper-123
Copy link
Contributor Author

Just added a small clean-up, so __new__ won't have a custom & internal ensure_int, but uses the same ensure_python_int as proposed in #26617. This simplifies the code a bit further.

If this gets accepted before #26617, I'll rebase that and add the requested type hints.

@jreback jreback merged commit 6ce7fc7 into pandas-dev:master Jun 5, 2019
@jreback
Copy link
Contributor

jreback commented Jun 5, 2019

thanks @topper-123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants