Skip to content

POC: add closed argument to IndexSlice #27209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins commented Jul 3, 2019

alternative to #27060 for consideration.

implementation is far from complete.

only the following cases have been implemented to start..

>>> import pandas as pd
>>>
... dates=[
... "2001-01-01 23:50",
... "2001-01-02 00:00",
... "2001-01-03 00:08"]
>>>
>>> ser=pd.Series(range(3),pd.DatetimeIndex(dates))
>>> ser
2001-01-01 23:50:00    0
2001-01-02 00:00:00    1
2001-01-03 00:08:00    2
dtype: int64
>>>
>>> ser.loc[:pd.Timestamp("2001-01-02 00:00")]
2001-01-01 23:50:00    0
2001-01-02 00:00:00    1
dtype: int64
>>>
>>> idx = pd.IndexSlice
>>> ser.loc[idx[:pd.Timestamp("2001-01-02 00:00")]]
2001-01-01 23:50:00    0
2001-01-02 00:00:00    1
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='left')
>>> ser.loc[idx[:pd.Timestamp("2001-01-02 00:00")]]
2001-01-01 23:50:00    0
dtype: int64
>>>
>>> ser.loc[:"2001-01-02"]
2001-01-01 23:50:00    0
2001-01-02 00:00:00    1
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='left')
>>> ser.loc[idx[:"2001-01-02"]]
2001-01-01 23:50:00    0
dtype: int64
>>>
>>> ser2=pd.Series(range(3),index=[50,70,80])
>>>
>>> ser2.loc[:70]
50    0
70    1
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='left')
>>> ser2.loc[idx[50:70]]
50    0
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='right')
>>> ser2.loc[idx[50:70]]
70    1
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='both')
>>> ser2.loc[idx[50:70]]
50    0
70    1
dtype: int64
>>>
>>> idx = pd.IndexSlice(closed='neither')
>>> ser2.loc[idx[50:70]]
Series([], dtype: int64)
>>>
>>> idx = pd.IndexSlice(closed='foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\indexing.py", line 97, in __call__
    return _IndexSlice(closed=closed)
  File "C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\indexing.py", line 93, in __init__
    raise ValueError(msg)
ValueError: invalid option for 'closed': foo

@simonjayhawkins simonjayhawkins added the Indexing Related to indexing on series/frames, not to indexes themselves label Jul 3, 2019
@simonjayhawkins
Copy link
Member Author

That said, I really really wish you wouldn't pursue this direction.

I'm continuing at the moment as this looks promising. This implementation requires a closed argument for all slice related methods and functions. This could prove reusable even with an alternative api design.

early days, but using the IndexSlice implementation should facilitate testing of the addition of these closed arguments.

even if the alternative closed slices are not made public, having closed arguments to the slicing methods/functions may prove useful for internal use (but not investigated that yet)

There is a suggestion on the table in the original issue, which is to add a dedicated method (called range, slice, etc') with a closed argument.

This alternative can also be used with __getitem__ and may allow setting (not looked at that yet). I hoping that it will also be applicable to MutliIndexes and DataFrames using different closed parameters in the same index operation.

Shoehorning a solution on top of loc may seem less disruptive, but it will make it impossible for the original decision to ever be truly corrected, instead there will be this inconvenient semi-invisible feature to point to.

applies to any indexer that takes one or more slices and should be 100% non-breaking.

If you really feel that IndexSlice is a cleaner way to solve this than having frame.slice(start,end), I can't stop you, but I really hope you'll reconsider.

cleaner from a code perspective... but open to suggestions for alternative api designs.

@simonjayhawkins
Copy link
Member Author

a simpler api that would still resuse some of the work done here could be pd.set_option('label_indexing', 'closed_left')

@jreback
Copy link
Contributor

jreback commented Jul 5, 2019

@simonjayhawkins I think an even better way of doing this (not quite as flexible) is that .loc() actually can take args, e.g.

.loc(axis=1)[] is a thing, so for example

.loc(axis=None, closed='both')[] would be a simple way to expand this (e.g. then easily closed='left', 'right', ;neither' would be possible, w/o expanding the namespace. This is less flexible because it would necessarily apply to all indexers for that call (whereas here you allow each indexer to be separate).

@simonjayhawkins
Copy link
Member Author

@jreback that sounds better.

This PR is effectively in two parts. pass a IndexSlice until a slice method is reached and then pass on the slice and the closed argument thereafter.

the first part could easily be changed.

for the second part, a closed argument is being added mostly to methods that already have a kind argument. I'm seeing the distant possibility of deprecating kind and allowing separation of the Index type and Indexer type. so will likely keep going down this path for the moment.

@ghost
Copy link

ghost commented Jul 5, 2019

This PR is effectively in two parts. pass a IndexSlice until a slice method is reached and then pass on
the slice and the closed argument thereafter.

Thanks for explaining, I think then I misunderstood your intentions. Absolutely, there needs to be a low-level API to back slice and perhaps IndexSlice is indeed a better way to go. My one concern is that the low-level API will end up as the user-facing API, i.e that slice won't be added because you make it possible to pass in a IndexSlice with a closed argument.

@simonjayhawkins
Copy link
Member Author

closing for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants