-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: allow casting datetime64 to int64? #45034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@jorisvandenbossche can you add to the OP the 2-3 relevant responses from that thread. |
ok I reread the discussion a bit.
|
This point I find compelling. IIRC deprecating in that direction was too invasive to be feasible. A thought that didn't come up on the old thread: what happens if/when we have non-nano? Does dt64second.astype(int64) also do a .view(int64), or does it do some division? Implementation questions, some from the old thread:
|
I would think that it will return the underlying integers (no calculation), so the exact integers you get is dependent on the resolution you have. That's also one of the reasons we can't simply change the default resolution. But that's an issue anyhow, regardless of people using astype vs view for this conversion.
Long term I would say yes, but that's something we will also need to deprecate first, as currently it returns the integer representation of NaT.
Casting to int32 already raises an error. Personally, I would allow this in the future (if we error on overflow), but since it raises already there is no urgency on this aspect.
That doesn't work currently, so I think we can defer that question to a later discussion on the more general casting rules (which is now partly happening in #22384, but I need to open dedicated issue for aspects of that discussion (such as also the idea of safer casting by default detecting overflow etc).
For Categorical, you have the For Period it's a bit less clear: the scalar as |
Actually, it's a bit more complicated than that. We did that in the last release (1.3), but did raise an error already before that. But only for tz-naive, and not for tz-aware .. An overview of the behaviours: pandas 1.0 - 1.2
pandas 1.3
Code to generate the tableimport pandas as pd
import warnings
warnings.simplefilter("always")
print(pd.__version__)
results = []
for dt_dtype in ["datetime64[ns]", "datetime64[ns, Europe/Brussels]"]:
s = pd.Series(["2012-01-01", "2012-01-02", "NaT"], dtype=dt_dtype)
for end in [2, 3]:
has_nat = "no NaT"
if end == 3:
has_nat = "with NaT"
for dtype in ["int64", "int32", "uint64"]:
try:
with warnings.catch_warnings(record=True) as record:
s.iloc[:end].astype(dtype)
except Exception as err:
res = str(err)
else:
if record:
res = "works (+warning)"
else:
res = "works"
results.append((dt_dtype, dtype, has_nat, res))
df = pd.DataFrame(results, columns=["dtype", "target_dtype", "has_nat", "result"])
print(df.pivot(index=["dtype", "target_dtype"], columns=["has_nat"], values="result").to_html()) |
It gets more complicated yet: Series.astype dissallows casting to int32, but DatetimeIndex and DatetimeArray treat any numpy integer dtype as i8. |
Apparently we've got relevant astype behavior defined in DatetimeLikeArrayMixin.astype, astype_nansafe, and astype_array. Thoughts for the eventual behavior:
|
I personally don't see why such restriction is needed (if we make that cast safe, so checked for overflow). Now, for Series, that is more or less the status quo, so that is certainly OK as the starting point (except for "uint64", which currently is allowed). |
deprecation undone. |
removing milestone |
We want internal consistency between Series and DTI/DTA, which means either loosening Series or tightening DTI/DTA. I am averse to loosening Series on the grounds that all of these are semantic gibberish entirely dependent on implementation details. i8 is reasonable to allow only because we allow the conversion in the other direction. updateNever mind. I'm choosing to care about consistency and forget about the rest. |
Sidenote: I also think the "casting to non-int64" dicussion is not that important. From the table above (#45034 (comment)), it seems that in pandas 1.0-1.2 casting to int32 actually worked for tz-aware data, and it started raising an error in pandas 1.3 (for Series at least). And I can't remember any one complaining about this (of course tz-aware might only be the smaller subset of datetime usage). |
is there a missing negative here? if not and you do think its important, then i propose you open a PR and as long as the behavior is consistent ill give it a thumbs up |
Yes, certainly :) |
It also extends to non-int dtypes. For example, we allow casting from float to datetime64/timedelta64, but then casting back doesn't work: >>> pd.Series([1.0, 2.0], dtype="float64").astype("timedelta64[ns]")
0 0 days 00:00:00.000000001
1 0 days 00:00:00.000000002
dtype: timedelta64[ns]
>>> pd.Series([1.0, 2.0], dtype="float64").astype("timedelta64[ns]").astype("float64")
...
TypeError: Cannot cast TimedeltaArray to dtype float64 Which creates some inconsistency (why not allow to cast back to float if we allow casting from float?). But of course could also go towards disallowing the float->datetimelike cast to solve that inconsistency. |
side-note: We allow this for Series but not for Float64Index.
I find this reasonable in this particular case, but I'm wary of transitivity-based arguments more generally bc it would mean that since we allow Series[dt64].astype(int64).astype(td64) we should then allow Series[dt64].astype(td64) |
I think there is still a clear difference between both cases, which can be enough reason to allow the direct cast in the one case but not in the other. |
This could tie in to the "safe" discussion if coercion behavior is also lumped in to that (which on the last call seemed popular). e.g. the _from_sequence_not_strict and maybe_cast_pointwise_result are both a bit kludgy, might be handle-able by such a keyword. |
just found that trying to ArrowExtensionArray with datetime-like dtype can't be cast to int64. Only tangentially relevant, but need to write it down somewhere |
More for the pile: if we allow |
Probably yes? Unless we would move away of the idea that |
Another tangentially related datapoint: we have special-casing in
AFAICT this exists to make
I have no problem with disallowing the IntervalIndex.astype here, but it falls into the "allow all of them or none of them" category. |
Can you give an actual example? It's a bit hard to interpret those snippets out of context |
But bc this special-casing is done specifically inside Index.astype, we also have:
|
Why do you think this would only have been done for IntervalIndex? Because we only have tests for IntervalIndex that covers this? |
Most of the tests that hit this are for IntervalIndex. Now that I look at the blame, the check specific to needs_i8_conversion dtypes was added https://github.com/pandas-dev/pandas/pull/18937/files#diff-04d55a40a3293df94601d8b4aff4babebe4c1532d8174692bdef7f5bcb12c33fR315 and before that it would raise but looked like a catch-all.
Agreed. |
The original deprecation happened in #38544
This comment is from #22384 (comment), moving it here to a separate issue.
But, on the specific datetime -> integer deprecation:
astype
, but then point people to theview
instead. There are usecases where you need the integers (eg if you want to do some custom rounding, or need to feed it to a system that requires unix time as integers, ...), and personally I would rather have users go toastype
thanview
(becauseastype
is the more standard method for this, + if we would go with copy-on-write, this gets a bit a strange method ...)In addition, using
view
will actually error for non-equal size bitwidth (astype
actually as well, but that's something we can change, while forview
that is inherent to the method). Andview
can also silently overflow if converting to uint64, while forastype
we could check for that. In general, I seeview
as an advanced method you should only use if you really know what you are doing (and in general you don't really need in pandas, I think)There is then some follow-up discussion in the issue below #22384 (comment)
I would personally propose to keep allowing
astype()
for datetime64 -> int64, and not steer users toview()
for this.cc @jbrockmendel
The text was updated successfully, but these errors were encountered: