
BUG: .loc with DateTimeIndex allows day first string for stop in slice, but not start #58302


Open
3 tasks done
WillAyd opened this issue Apr 18, 2024 · 5 comments
Labels
Bug Index Related to the Index class or subclasses

Comments

@WillAyd
Member

WillAyd commented Apr 18, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> ser = pd.Series(range(15), index=pd.date_range(start="2024-01-01", freq="D", periods=15))
>>> ser.loc["1/10/2024":"1/14/2024"]  # ok I suppose
2024-01-10     9
2024-01-11    10
2024-01-12    11
2024-01-13    12
2024-01-14    13
Freq: D, dtype: int64

>>> ser.loc["1/10/2024":"14/1/2024"] # huh? ok...
2024-01-10     9
2024-01-11    10
2024-01-12    11
2024-01-13    12
2024-01-14    13
Freq: D, dtype: int64

>>> ser.loc["10/1/2024":"14/1/2024"]
Series([], Freq: D, dtype: int64)


### Issue Description

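I am not really sure what the expectations are for handling non-ISO strings as an indexer for a DatetimeIndex (DTI). In the examples above, the /stop/ argument of the slice is accepted with the day first, but the /start/ argument is not interpreted that way: "10/1/2024" appears to be read month-first, which puts the start after the end of the index and gives an empty result.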
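One way to make sense of the three results is to assume the parser tries month-first and only falls back to day-first when the month would be out of range. The sketch below models that assumption with the standard library only; `parse_month_first_with_fallback` is a hypothetical helper written for illustration, not pandas' actual parsing code.

```python
from datetime import date, datetime


def parse_month_first_with_fallback(text: str) -> date:
    """Hypothetical model (not pandas internals): try M/D/YYYY, then D/M/YYYY."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This format does not fit (e.g. month 14), so try the next one.
            continue
    raise ValueError(f"unrecognized date string: {text!r}")


# The index in the reproducible example covers 2024-01-01 through 2024-01-15.
index_start, index_stop = date(2024, 1, 1), date(2024, 1, 15)

for text in ("1/10/2024", "14/1/2024", "10/1/2024"):
    parsed = parse_month_first_with_fallback(text)
    inside = index_start <= parsed <= index_stop
    print(f"{text!r} -> {parsed.isoformat()} (inside index range: {inside})")
```

Under this model "14/1/2024" can only be January 14, while "10/1/2024" becomes October 1, which lies past the end of the index and would explain the empty slice.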

### Expected Behavior

Not sure - maybe we should just disallow non-ISO string formats?

### Installed Versions

'3.0.0.dev0+681.g434fda08cf'
WillAyd added the Bug and Index (Related to the Index class or subclasses) labels Apr 18, 2024
@awojno-bloomberg
Contributor

Hey @WillAyd, agree that this is confusing behavior. I took a look at the documentation and it does not say how date strings like these should be handled in this scenario. The ambiguity is even clearer in an altered version of your example, where both dates are valid under either reading:

ser = pd.Series(range(365), index=pd.date_range(start="2024-01-01", freq="D", periods=365))
ser.loc["7/1/2024":"8/1/2024"] # could be interpreted as M/D/YYYY or D/M/YYYY

My opinion would be the same as yours: enforce ISO format for date strings to limit user confusion, and update the documentation for clarity. This might cause backward compatibility issues, since we would be tightening the interface. If we are looking for a lighter compromise, we could emit a warning when non-ISO string formats are used. Any thoughts?
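To make the ambiguity concrete, here is a minimal sketch that parses the same two strings with an explicit `format` under each reading and slices with the resulting timestamps. The variable names are only illustrative, and this shows a way to sidestep the ambiguity today rather than what `.loc` does internally.

```python
import pandas as pd

ser = pd.Series(range(365), index=pd.date_range(start="2024-01-01", freq="D", periods=365))

# Reading 1: month first (M/D/YYYY), i.e. July 1 through August 1.
start_mdy = pd.to_datetime("7/1/2024", format="%m/%d/%Y")
stop_mdy = pd.to_datetime("8/1/2024", format="%m/%d/%Y")

# Reading 2: day first (D/M/YYYY), i.e. January 7 through January 8.
start_dmy = pd.to_datetime("7/1/2024", format="%d/%m/%Y")
stop_dmy = pd.to_datetime("8/1/2024", format="%d/%m/%Y")

# Slicing with Timestamp objects leaves nothing for .loc to guess.
month_first = ser.loc[start_mdy:stop_mdy]
day_first = ser.loc[start_dmy:stop_dmy]

print(len(month_first), len(day_first))  # 32 2
```

The same pair of strings selects either 32 rows or 2 rows depending on the reading, which is the kind of silent difference a warning or an ISO-only rule would surface.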

@WillAyd
Member Author

WillAyd commented Aug 17, 2024

I think @MarcoGorelli did a ton of work with cleaning up dayfirst usage - let's see if he has any thoughts

@MarcoGorelli
Member

thanks for the ping - agree on moving towards this being iso-like only

it's one thing to allow flexibility when parsing csvs (say), where users don't necessarily have control over what format their data's in - but it's another when using loc, when users should be in control of what they pass in
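for reference, a small sketch of the iso-like form of the slice from the original report, reusing the same `ser`. This only shows what users can already write unambiguously; it is not a proposed implementation of the change.

```python
import pandas as pd

ser = pd.Series(range(15), index=pd.date_range(start="2024-01-01", freq="D", periods=15))

# ISO 8601 strings (YYYY-MM-DD) have a single reading for both start and stop.
iso_slice = ser.loc["2024-01-10":"2024-01-14"]

print(iso_slice.tolist())  # [9, 10, 11, 12, 13]
```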

@awojno-bloomberg
Contributor

Great, thanks for the response. Will start working on this

@awojno-bloomberg
Contributor

take


3 participants