ENH: support Ellipsis in loc/iloc #37750

Merged: 26 commits into pandas-dev:master from enh-indexing-ellipses, Oct 16, 2021

Conversation

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Nov 10, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

closes #10956

@WillAyd
Member

WillAyd commented Nov 10, 2020

I realize there is still an open issue for this but it is quite old. What do you see as the main motivation to support this given we no longer have a Panel?

@WillAyd WillAyd added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Nov 10, 2020
@jbrockmendel
Member Author

What do you see as the main motivation to support this given we no longer have a Panel?

As @shoyer mentioned in #10956, this makes it easier to write code that is valid for both Series and DataFrame, e.g. obj.loc[..., labels]

In particular, I'm finding this would be useful for parametrizing indexing tests.
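
A minimal sketch of that use case, assuming a pandas version that includes this change (the PR was merged under the 1.4 milestone). The helper name `select_last_axis` and the sample data are hypothetical and only for illustration. The leading `...` stands in for whatever axes come before the last one, so the same expression selects from the index of a Series and from the columns of a DataFrame.

```python
import pandas as pd


def select_last_axis(obj, labels):
    """Hypothetical helper: select `labels` along the last axis of `obj`."""
    return obj.loc[..., labels]


ser = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# For a Series this is expected to match ser.loc[["a", "c"]].
ser_result = select_last_axis(ser, ["a", "c"])

# For a DataFrame this is expected to match df.loc[:, ["a", "c"]].
df_result = select_last_axis(df, ["a", "c"])
```

A shared test could then be parametrized over Series and DataFrame inputs without branching on the number of dimensions.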

@jreback jreback added this to the 1.2 milestone Nov 14, 2020
@jreback
Contributor

jreback commented Nov 14, 2020

I agree this is good for general indexing compatibility and is unambiguous.

@jbrockmendel can you make this raise for a MI or is it unambiguous there? (I think it is more useful than the slice syntax, but that should be a dedicated PR).

cc @jorisvandenbossche

@jbrockmendel
Member Author

can you make this raise for a MI or is it unambiguous there? (I think it is more useful than the slice syntax, but that should be a dedicated PR).

MI cases are appreciably more complicated, and I'm planning to handle them separately.

Member

@jorisvandenbossche jorisvandenbossche left a comment

I personally don't see myself using it, but the arguments of @shoyer on the issue certainly make sense (although they are from a time we still had multidimensional dataframes (>2d) I think)

Now, if adding it, this will need some docs.

Also, I think we would need some more test cases?
E.g. df.loc[..., ...] (in numpy this errors with "an index can only have a single ellipsis ('...')")
Or series.loc[...], or df.loc[..., 1, 2], ...

In numpy, arr[..., 1] and arr[1] actually do something different on a 1D array (0d array vs numpy scalar; we don't have this difference in pandas, so that's probably not an issue)
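
The NumPy behaviour referred to here can be checked with a short sketch. The array and variable names are illustrative only; the calls are ordinary NumPy indexing.

```python
import numpy as np

arr = np.arange(3)

# Indexing with an Ellipsis yields a 0-d array ...
with_ellipsis = arr[..., 1]
# ... while a plain integer index yields a NumPy scalar.
without_ellipsis = arr[1]

is_zero_dim_array = isinstance(with_ellipsis, np.ndarray) and with_ellipsis.ndim == 0
is_numpy_scalar = isinstance(without_ellipsis, np.generic)

# Two Ellipsis entries in one index are rejected with an IndexError.
try:
    arr[..., ...]
except IndexError as err:
    message = str(err)
else:
    message = None
```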

MI cases are appreciably more complicated, and I'm planning to handle them separately.

Do you mean having an Ellipsis "within" the indexer for the MultiIndex's axis?
That's certainly a different topic, but plain Ellipsis indexing on a DataFrame with a MultiIndex can be tested here, I think, as it is already covered by your changes (I assume). E.g. the same tests as you have now, like df.loc[..., [1]], but on a DataFrame with a MI. I suppose this doesn't necessarily take a different code path, as it just expands to a null slice, but it is still good to have explicit tests with different index types.
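
The non-MultiIndex cases raised in this review can be sketched as follows, assuming a pandas version that includes this change. The sample objects are made up for illustration, and MultiIndex behaviour is deliberately left out because the thread treats it as a separate topic. The broad `except Exception` is only there to avoid depending on where the exception class is importable from in a given pandas version.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# A lone Ellipsis is expected to return the whole object.
whole_series = ser.loc[...]

# With one Ellipsis plus one other key on a DataFrame, the Ellipsis
# is expected to act like a null slice (":") for the remaining axis.
cols_loc = df.loc[..., ["a"]]
cols_iloc = df.iloc[..., [0]]

# More than one Ellipsis is expected to be rejected.
try:
    df.loc[..., ...]
except Exception as err:  # expected to be pandas' IndexingError
    error_name = type(err).__name__
else:
    error_name = None
```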

# TODO: other cases? only one test gets here, and that is covered
# by _validate_key_length
return tup

def _has_valid_tuple(self, key: Tuple):
Member

Can you rename this method if it now actually returns a modified key?

@jorisvandenbossche jorisvandenbossche removed this from the 1.2 milestone Nov 16, 2020
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 17, 2020
@jreback jreback added this to the 1.3 milestone Dec 22, 2020
@jreback
Contributor

jreback commented Dec 22, 2020

i think this is ok, we should be able to accept ellipsis (even if not that useful). add a whatsnew and rebase. ping on green (i think @jorisvandenbossche had a comment as well).

@jreback
Contributor

jreback commented Dec 24, 2020

@jbrockmendel can you rebase this

@jbrockmendel
Member Author

can you rebase this

sure, but there are still a handful of comments I haven't addressed, so this isn't ready

@jbrockmendel
Member Author

updated to flesh out tests, i think comments are addressed

@lithomas1 lithomas1 removed the Stale label May 11, 2021
@simonjayhawkins
Member

@jbrockmendel if this is ready can you add a release note and a section on the new capability.

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 25, 2021
@jbrockmendel
Member Author

whatsnew added + green

@simonjayhawkins
Member

@jbrockmendel needs rebase

@jbrockmendel
Member Author

rebased + green

@jreback jreback added this to the 1.4 milestone Oct 10, 2021
Contributor

@jreback jreback left a comment

it would be a nice extension to support ... in both dimensions as we do for : today (follow-on)
(and maybe in multi-index) but i am not sure it adds a lot of value.

Actually i don't think this is any different than :, ultimately, right?

@jbrockmendel
Member Author

Actually i don't think this is any different than :, ultimately, right?

The main thing this allows is to write code that works for either Series or DataFrame, e.g. obj.loc[..., "foo"]. IIRC the original motivation was to make sharing tests easier.

@jreback
Contributor

jreback commented Oct 10, 2021

Actually i don't think this is any different than :, ultimately, right?

The main thing this allows is to write code that works for either Series or DataFrame, e.g. obj.loc[..., "foo"]. IIRC the original motivation was to make sharing tests easier.

ahh got it. then am +1 here.

cc @phofl if any comments.

@jreback
Contributor

jreback commented Oct 10, 2021

can you create an issue for adding docs about this in the user guide / docstring as appropriate?

Member

@phofl phofl left a comment

This looks good.

Is df = pd.DataFrame({"a": [1, 2, 3]}).loc[(..., ), :] valid syntax? Because this raises a KeyError.

@jbrockmendel
Member Author

df = pd.DataFrame({"a": [1, 2, 3]}).loc[(..., ), :] valid syntax?

If you mean "should it be equivalent to .loc[..., :]?" then no. But...

index = [0, Ellipsis, NotImplemented]
df = pd.DataFrame({"a": [1, 2, 3]}, index=index)

>>> df.loc[(..., ), :]
          a
Ellipsis  2

(on master, not PR branch)

@jreback jreback merged commit 2b05c91 into pandas-dev:master Oct 16, 2021
@jreback
Contributor

jreback commented Oct 16, 2021

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the enh-indexing-ellipses branch October 16, 2021 15:14
Labels
Enhancement; Indexing (Related to indexing on series/frames, not to indexes themselves)
Development

Successfully merging this pull request may close these issues.

support Ellipsis (...) in pandas indexing
7 participants