-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REGR: Index.map adding back tz to tz-agnostic result #57475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
new_values = maybe_cast_pointwise_result( | ||
new_values, self.dtype, same_dtype=same_dtype | ||
) | ||
if self.inferred_type != "datetime64": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what other dtypes might arise here - need to check
I'm pretty sure the problem is in DatetimeArray._from_scalars, where we need to check that dont have tzawareness-mismatch. instead of
something like
is_naive_datetime_array doesn't actually exist but seems like it should. |
Thanks @jbrockmendel - this is much better. One thing I'm running into is that |
I'm trying to get rid of uses of lib.infer_dtype as it is very rarely the right tool to reach for. Seems like |
# TODO: if dtype is passed, check for tzawareness compat? | ||
# TODO: require any NAs be valid-for-DTA | ||
# TODO: if dtype is passed, check for tzawareness compat? | ||
if not lib.is_datetime64_array(scalars): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need this (or something like it) because e.g. lib.is_datetime_with_singletz_array
does not check if there are datetimes:
print(lib.is_datetime_with_singletz_array(np.array([1, 2, 3])))
# True
However, I do not currently understand the behavior differences between this and the previous lib.infer_dtype
and need to look into this more. Any guidance here is much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we get that behavior from is_datetime_with_singletz_array bc it implicitly assumes it is called from maybe_convert_objects, where it will only get called if the first non-NaT it sees is a datetime with a tz.
I'm not a fan of infer_dtype bc the string returns are clunky (and hard to reason about, e.g. when do you expect "datetime" vs "datetime64") and fragile when we add new return values (e.g. "integer-na"). Also in cases when you only care about datetimes, it will not short-circuit when it sees a non-datetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - makes perfect sense to move away from infer_dtype
. For the old behavior, I'm seeing the use of infer_dtype
allow the following to not raise on a non-empty input:
- If the
scalars
has the attributetype
,kind
,name
, orbase
set to one of"datetime64[ns]"
,"M"
,datetime
is_datetime64_array
returns Trueis_datetime_array
returns True
As far as I can tell, the input scalars
here must be an ndarray, Index, or ExtensionArray, but I don't feel confident about this from the code path in Index.map
.
With this, I'm wondering if the following is a good equivalent:
if (
not is_datetime64_any_dtype(scalars)
and not lib.is_datetime64_array(scalars)
and not lib.is_datetime_array(scalars)
):
raise ValueError
DataFrame.groupby
regression when group labels of typepd.Timestamp
#57192 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.It seems to me we should skip casting back to the caller's dtype when the caller dtype is datetime, since these are reliably represented as objects (unlike ints, which loses the size). But datetimes/tzs are definitely not in my wheelhouse.
cc @jbrockmendel