Skip to content

BUG: loc raising KeyError for string slices in list-like indexer and DatetimeIndex #38153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

phofl
Copy link
Member

@phofl phofl commented Nov 29, 2020

I looked into this and added an implementation for list-like indexers containing slices and DatetimeIndexes. If we want to support this we probably have to do something like this.

@jbrockmendel
Copy link
Member

There are a couple of issues here, first of which is that it isn't obvious that we actually should support this. It does improve internal consistency, but adds a lot of complexity.

Even if we do decide to move forward with this, id want to start with making get_indexer behave like a vectorized get_loc.

(ideally id like to get rid of of the FooIndex._convert_bar_indexer methods altogether)

@jreback
Copy link
Contributor

jreback commented Nov 29, 2020

There are a couple of issues here, first of which is that it isn't obvious that we actually should support this. It does improve internal consistency, but adds a lot of complexity.

Even if we do decide to move forward with this, id want to start with making get_indexer behave like a vectorized get_loc.

(ideally id like to get rid of of the FooIndex._convert_bar_indexer methods altogether)

+1 here. yeah partial date indexing has only very limited support now, we don't wish to expand this, its mainly useful for interactive use.

unless other objections going to close the original issue.

thanks @phofl for the effort here.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype Timezones Timezone data dtype and removed Timezones Timezone data dtype labels Nov 29, 2020
@phofl
Copy link
Member Author

phofl commented Nov 29, 2020

No worries, I was not sure either if this would be desirable. But wanted to share a possible solution before discussing this.

@jreback jreback added the Needs Discussion Requires discussion from core team before further action label Dec 2, 2020
@jbrockmendel
Copy link
Member

did we settle on closing this?

@phofl
Copy link
Member Author

phofl commented Dec 18, 2020

Think so?

try:
for k in key:
# Convert slices to list of integers
indexer = positions[self.get_loc(k)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use .get_indexer() here? (and still then iterate over the results)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed not supporting this. Has this changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah i think we want to close this

if not isinstance(key, list):
# There are no slices, so we can dispatch back
return super()._convert_listlike_indexer(key)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you can short-circust if we don't have any slices right?

Comment on lines +242 to +250
expected = Series(
1,
index=[
Timestamp("2001-01-29"),
Timestamp("2001-01-30"),
Timestamp("2001-01-31"),
Timestamp("2001-01-30"),
],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, but this might look a bit better if you define index = on its own line and then here have expected = Series(1, index=index)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@jreback
Copy link
Contributor

jreback commented Dec 24, 2020

closing, though we should have a test and good error message if someone attempst this. @phofl want to repurpose this?

@phofl
Copy link
Member Author

phofl commented Dec 24, 2020

Yep will do

@phofl
Copy link
Member Author

phofl commented Dec 29, 2020

Currently

index = pd.date_range('2001-01-01', periods=100)


s = pd.Series(1, index=index)
s.loc[['2001-01']]

returns

2001-01-01    1
dtype: int64

which is obviously wrong. I would change this to raise if this is attempted.

@jbrockmendel
Copy link
Member

s.loc[['2001-01']]

I'm not sure how avoidable this is given that Timestamp("2001-01") returns 2001-01-01, but if you've got an improvement in mind, go for it

@phofl
Copy link
Member Author

phofl commented Jan 1, 2021

Yep, you are probably right. Have not found a way to avoid this. Closing this pr.

@phofl phofl closed this Jan 1, 2021
@phofl phofl deleted the 27180 branch January 1, 2021 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Partial date indexing only works with single key
4 participants