Skip to content

POC: Implement pythonic label-based index slicing NDFrame.slice(closed='left') #27060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

ghost
Copy link

@ghost ghost commented Jun 26, 2019

Note: this began as a POC for an additional indexer loc_left, which has now been updated to become the low-level API used for providing the NDFrame.slice functionality.

WIP/POC for #27059. In short, loc was designed with unpythonic right-inclusive semantics, and for dubious considerations. It would be an improvement to treat slicing as taking a closed argument, as many pieces of pandas functionality do, such as Rolling, Interval.

  • Adds a loc_left indexer to complements loc. This could be made private.

  • Adds an NDFrame.slice method is discussed in Discussion: Why is loc label-based slicing right-inclusive?  #27059 which mimics python's slice signature, with the addition of a closed keyword. The default is 'left' to match pythonic convention.

  • The indexer reuses the loc machinery with very few changes, and passes a test suite which augments the existing tests for loc.

  • implement and test loc_left[]

  • implement and test slice(closed=left/both)

  • implement and test slice(closed=right/neither)

Note: #27209 by @simonjayhawkins enhances IndexSlice with a closed argument instead of introducing a new indexer. slice could be adapted to make use of that if desired.

Examples:

In [6]: import pandas as pd
   ...: ix= pd.DatetimeIndex(["2001-01","2001-01","2001-02","2001-02","2001-03","2001-03"])
   ...: df=pd.DataFrame(dict(a=range(6)),index=ix)
   ...: df
Out[6]: 
            a
2001-01-01  0
2001-01-01  1
2001-02-01  2
2001-02-01  3
2001-03-01  4
2001-03-01  5

In [7]: df.slice("2001-03")
Out[7]: 
            a
2001-01-01  0
2001-01-01  1
2001-02-01  2
2001-02-01  3

In [8]: df.slice("2001-03", closed="right")
Out[8]: 
            a
2001-01-01  0
2001-01-01  1
2001-02-01  2
2001-02-01  3
2001-03-01  4
2001-03-01  5

In [9]: df.slice("2001-02", "2001-03")
Out[9]: 
            a
2001-02-01  2
2001-02-01  3

In [10]: df.slice("2001-02", "2001-03",closed="right")
Out[10]: 
            a
2001-02-01  2
2001-02-01  3
2001-03-01  4
2001-03-01  5

In [11]: df.slice("2001-01", "2001-03",step=2)
Out[11]: 
            a
2001-01-01  0
2001-02-01  2

In [12]: df.slice("2001-01", "2001-03",step=2, closed="right")
Out[12]: 
            a
2001-02-01  2
2001-03-01  4

@pep8speaks
Copy link

pep8speaks commented Jun 26, 2019

Hello @pilkibun! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-12 04:13:39 UTC

@ghost ghost changed the title POC: add .locs for pythonic right-open position-based index slicing WIP: add .locs for pythonic right-open position-based index slicing Jun 26, 2019
@codecov
Copy link

codecov bot commented Jun 26, 2019

Codecov Report

Merging #27060 into master will decrease coverage by 50.16%.
The diff coverage is 80%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27060       +/-   ##
===========================================
- Coverage   92.04%   41.88%   -50.17%     
===========================================
  Files         180      180               
  Lines       50713    50715        +2     
===========================================
- Hits        46680    21243    -25437     
- Misses       4033    29472    +25439
Flag Coverage Δ
#multiple ?
#single 41.88% <80%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 53.64% <ø> (-39.85%) ⬇️
pandas/core/indexes/numeric.py 62.38% <100%> (-34.96%) ⬇️
pandas/core/indexes/datetimes.py 45.46% <100%> (-50.91%) ⬇️
pandas/core/indexes/base.py 55.59% <66.66%> (-41.34%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff50b46...c20eca7. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 26, 2019

Codecov Report

Merging #27060 into master will decrease coverage by 50.93%.
The diff coverage is 23.8%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #27060       +/-   ##
===========================================
- Coverage   92.83%    41.9%   -50.94%     
===========================================
  Files         182      180        -2     
  Lines       50430    50842      +412     
===========================================
- Hits        46815    21303    -25512     
- Misses       3615    29539    +25924
Flag Coverage Δ
#multiple ?
#single 41.9% <23.8%> (-0.56%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimelike.py 56.08% <ø> (-42.06%) ⬇️
pandas/core/indexes/period.py 36.34% <0%> (-53.94%) ⬇️
pandas/core/indexing.py 53.68% <0%> (-39.33%) ⬇️
pandas/core/indexes/timedeltas.py 47.59% <0%> (-43.35%) ⬇️
pandas/core/indexes/numeric.py 61.34% <100%> (-35.3%) ⬇️
pandas/core/indexes/datetimes.py 45.17% <28.57%> (-49%) ⬇️
pandas/core/indexes/base.py 55.6% <33.33%> (-41.66%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/gcs.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
... and 170 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c64c9cb...079340e. Read the comment docs.

@ghost ghost changed the title WIP: add .locs for pythonic right-open position-based index slicing POC: add .ll indexer for pythonic right-open label-based index slicing Jun 27, 2019
@ghost ghost changed the title POC: add .ll indexer for pythonic right-open label-based index slicing POC: add .ll indexer for pythonic label-based index slicing Jun 27, 2019
@jbrockmendel
Copy link
Member

What is the use case that makes this necessary? I get that matching python slicing is desirable if we were starting from scratch, but we’ve got a lot of inertia.

@ghost
Copy link
Author

ghost commented Jun 29, 2019

The indexer portion wasn't meant for merging. I couldn't find an explanation of why the choice was made, so I just went ahead and did it to see if I would hit a hurdle. Linked issue now has a suggestion of adding a PandasObject.range(start,stop,step,closed='left'). Which I'll probably add in the next few days. That relies on having implemented this.

Most pandas interval-like things (Interval, Rolling) treat all cases equally by having a closed kwd. It's exasperating to have to append foo-pd.Timedelta(nanoseconds=1)` over and over, just because 6 years ago it was decreed that right-closed intervals are cool, and right-open intervals are obscure corner cases. You can't just legislate use-cases out of existence.

@ghost ghost changed the title POC: add .ll indexer for pythonic label-based index slicing POC: Implement pythonic label-based index slicing Jul 3, 2019
@simonjayhawkins simonjayhawkins added API Design Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 4, 2019
@ghost ghost changed the title POC: Implement pythonic label-based index slicing POC: Implement pythonic label-based index slicing using NDFrame.slice(closed='left') Jul 8, 2019
@ghost ghost changed the title POC: Implement pythonic label-based index slicing using NDFrame.slice(closed='left') POC: Implement pythonic label-based index slicing NDFrame.slice(closed='left') Jul 18, 2019
@ghost
Copy link
Author

ghost commented Jul 31, 2019

It's been a month without any response from maintainers here or in the issue. Writing on the wall.

@ghost ghost closed this Jul 31, 2019
@ghost ghost deleted the locs_pythonic_slicing branch July 31, 2019 14:23
@simonjayhawkins
Copy link
Member

@pilkibun Thanks for working on this.

It's been a month without any response from maintainers here or in the issue. Writing on the wall.

There needs to be agreement on the api design before discussing the details of the implementation.

I think there is some overlap with #27363 with regards to api design.

There has been some discussion in #27363 regarding the api design. the current implementation there uses .loc(regex=True).

IMO if that PR gets merged then it would be logical, for consistency, to use .loc(closed=..., step=...)[].

@ghost
Copy link
Author

ghost commented Aug 2, 2019

if that PR gets merged then it would be logical, for consistency,

I still agree with #27363 (comment) so neither should be merged IMHO.

@ghost ghost mentioned this pull request Aug 8, 2019
6 tasks
@ghost
Copy link
Author

ghost commented Aug 8, 2019

Suggested new alternative in #27363 (comment), converting loc into a function.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants