-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DatetimeIndex lookups tz inconsistency #18435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Looks related : #17920 |
@jbrockmendel Your abbreviations make it difficult for me to read what exactly you mean. Could you use more explicit variable naming for explaining? Thanks a lot! |
Woops, didn't notice at first it was the variable names where abbreviations were unclear, assumed it was vernacular.
In this case |
Migrated from #18376
So if backward-compat weren't a concern, you would be OK with raising and requiring users to pass tzaware indexers to tzaware DatetimeIndexes? If so, then I bet @jreback will agree that correctness is more important than backwards-compat in this case and we should Do It Right. (If so, the rest of this comment is redundant)
You're right that it is not part of the standard. The idea is that the "TZ" directive would be identified before the date-parsing step-- but on second thought it'd be easier just to append the offset e.g. "-0500". I wrote quickly last night; let me try to explain the suggestion more clearly. There are three issues involved here. There is the correct behavior, the convenient behavior, and the consistent behavior. The Technically Correct thing to do is to disallow tznaive/tzaware comparisons. The convenient thing to do is to Just Work when trying to index/slice a tzaware DatetimeIndex with strings. The consistent behavior is that slicing and comparison are linked, e.g. The easy cases:
The hard cases:
But for DatetimeIndex behavior to be internally consistent, that would require that comparison methods also assume that tznaive objects implicitly have the same tz as the index. And we should definitely not be making that assumption. It would better to raise in this case and require that users explicitly pass tzaware indexers/slicers. |
@jbrockmendel you need to strictly differentiate between strings and actual datetimes here. comparision with datetimes are necessarily and always will be strict w.r.t. tz-awareness. The conversion of strings can be a question, but is orthogonal. |
It would work in my use case, this assumption was done not to break other people's code unnecessarily. I agree that your solution is cleaner but for me pandas means having a lot of convenience and the convenience of today creates lots of edge cases. When you look at the documentation, you can pass a lot of string labels which do not look like a datetime-like object but instead are just the year, the year and the month or something similar. We should not assume the user to provide a ISO8601 conform label only. And how do you suggest to make a timezone aware label look like that way? A small sample: t = pd.date_range(start=pd.datetime(2000, 1, 1), periods=400, freq='d')
df = pd.DataFrame(index=t, data={"val": range(len(t))})
df.loc["2000-01":"2000-03"]
df.loc["2000-01"] I see no way in how we can add timezones to these kinds of labels which can be intuitively read. And only if all kinds of possible labels allow adding timezones, the strictness you suggest is suitable for this library. But that would be a major update for me, rather something which can be discussed for a similar but completely new library. |
@jbrockmendel Have you made a list of possible string index labels (except the string which equals an ISO8601 string) and how you want to make them timezone aware? Just create a concept before you push more commits on this issue. |
That was the idea behind adding "TZ" at the end of a non-aware string.
Agreed. My contention is that special-casing strings in this context and not others is going to create more corner cases; inconsistency will make things less predictable and therefore less convenient.
Not sure I understand this bit. This is an Issue, has no commits. |
With issue I meant the concern in a non-technical way, not the GitHub issue itself. The idea of adding "TZ" sounds like a possible concept but do other people agree on that? Is it the way to go? I somehow like the idea because it allows "2016-01TZ+01:00" and currently I don't know how to achieve that with a similar easiness.
|
I agree we should consider both string and actual datetime objects.
Yes indeed. So since for slicing we want to special case strings (I don't think we should start thinking of extending datetime string formats with kind of a 'TZ' indicator), let's do it everywhere? So also for comparisons? (which we already do in master I think) |
@jorisvandenbossche thanks for weighing in here and elsewhere. Supposing we assume that strings are implicitly localized to the same tz as a DatetimeIndex for comparisons, do we do the same for Timestamps? i.e. are we ok with What about other DatetimeIndex binary operations that take a datetime-like arg? Subtraction is the only one that comes to mind. Conditional on #17920 going in (and I'm +0.75 on it because it is convenient), there is going to be an equivalence-breakage somewhere. Right now I'm inclined to keep comparisons technically correct (well, make them technically correct with #18376) and break the |
Copied from #18376:
How to treat it in the whatsnew notes is above my pay grade. But over the course of this PR's history I've become increasingly convinced that the current comparison behavior is Just Plain Wrong and should be treated like a bug. The three options on hand are 1) this PR which makes AFAICT the main objection to 1) is that in conjunction with #17920 it breaks the equivalence between
... and most of all, I am not remotely confident that this list is complete. How many places across the code-base do comparisons with DatetimeIndex objects in one place or another? I have no idea. A tz-aware Any of the available options introduces an inconsistency somewhere. AFAICT Option1 breaks a convenience equivalency, will do so loudly, and as a result will not snowball into other inconsistencies. Special-casing string comparisons generates a whole mess of other potential (often silent) problems that can be avoided by enforcing behavior that is already canonical. |
@jbrockmendel on the list? |
I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why breaking backwards compatibility to start erroring on this? |
@jreback why would 2016-01-01 be > now? Assuming you ran this recently... @jorisvandenbossche Is your last comment (the one blockquoting the 2016-01-01) about 2016-01-01? AFAICT that is unrelated to this Issue. |
@jreback didn't mean the False was wrong, but that he wants it to raise an error (different timezones), and my answer was on that aspect. |
@jreback can you confirm this is what you intended? If so I'm surprised because the scalar behavior is pretty well-established. |
See the discussion in the linked issue:
#15249
|
I think you are certainly experienced enough to have a valuable opinion about that!
What do you mean with "all other datetime-like comparisons" ?
I don't understand this one. Can you give a concrete example?
I would say: of course, if we choose for allowing strings to compare to tz aware, we do the same for Timestamp (those currenlty don't even compare to strings at all, so this equivalency already does not hold at the moment
That's true. But that is the same with slicing with strings, there this also does not hold up. So I think I would be OK with that.
The boxing is true (but for all strings, not only in the case of tz-aware), but Series with datetime values happily compares to a string at the moment.
Yes. And if a user uses strings instead of aware objects, he made the decision to see the string as 'local' time (I mean local to the data). To be clear, I am not necessarily against the disallowing of comparing strings to tz-aware data, but I am strongly opposed to comparing to strings in general, and it is not fully clear if you are arguing for that or only the former. |
Thanks, the context there helps.
I was under the mistaken impression that those comparisons were done, my mistake. This tempers the strength of my opinion somewhat, will need to think on it before deciding how much.
Choose any two: DatetimeIndex, Timestamp, datetime, datetime64-Series, datetime64-Dataframe-column.
The following is in 0.21.1 and is part of what #18376 is intended to fix:
i.e. equality is not transitive. Am I the only one who finds that troubling from a ability-to-reason-about-code perspective? |
Well, DatetimeIndex and datetime64 Series/DataFrame all allow comparison to strings when they are tz-aware. It is only
The example you give is for tz-aware timestamps strings, where indeed
Of course I "cheat" a bit as I have taken strings for a and c (so the other way around as in your example), but again, equality is not transitive. |
You are technically correct. I'm still not wild about allowing inconsistent string comparisons through, but this isn't a hill I want to die on. If I can get your blessing on #18376 in its current form (that punts on strings, requires tzawareness-compat for everything else) I'll happily spend my time elsewhere. |
I think this has been resolved. Closing. |
I what way has it been resolved? |
I thought that #17920 had been merged, apparently was wrong about that. |
TLDR: enforcing tzaware vs tznaive compat in DatetimeIndex comparisons (#18162) appears to be inconsistent with current slicing behavior.
The following example is based off of tests.series.test_indexing.test_getitem_setitem_datetimeindex:
The behavior that we want to enforce is #18162 requires that
dti < pd.Timestamp(lb)
should raise, as shoulddti < lb
. At the moment they effectively get treated as UTC. But if we change this so that it does raise, the following fromtest_getitem_setitem_datetimeindex
breaks pretty irrevocably:There is also
ts[lb:rb]
which if we're being strict should raise, but at least we could make that still work. (BTW this implicitly casts lb and rb to US/Eastern, albeit in different places. So far that appears to be a related but distinct can of worms.)The text was updated successfully, but these errors were encountered: