-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pd.array([timedelta_like_strings]) should infer TimedeltaArray #33558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Why should it infer that? If passing strings., I think inferring a string type is best? And for conversions from string to timedelta64, we also have |
I think that'd be too magical. I think the result dtype of |
|
I don't think we necessarily need to copy "mistakes" from Series/Index constructors into BTW, |
Mmm I missed the NaT originally. That does change things, but I think it moves me to returning something like an object-dtype PandasArray. What would your proposed behavior for something like pd.array(['2 days', pd.NaT, 'a']) be? A case where a value isn't valid for to_timedelta. |
The presence of NaT could, besides the option to coerce to datetimelike as you propose, also be a reason to not infer string dtype though, if we want to differentiate different missing value sentinels. This is actually also what happens with ints. You get object dtype instead of float or datetimelike:
EDIT: so yes what Tom also said ;) |
If automatically coercing strings, the question also becomes which formats do automatically coerce and which not (apart from totally different strings as in Tom's example). For example with datetime64:
The one coerces, the other not, while both work fine in |
I'm coming at this from a high-level view of "pd.array/pd.Index/pd.Series" should share code and behavior (#27460). If there is not consensus on that, then this topic is moot. The "01/12/2012" example is a good one. That strikes me as surprising behavior in the Series constructor. |
To be clear, I am fully on board that in the long term, those should share code (and basically Index/Series should just call pd.array under the hood, or something close to that). But we could also deprecate the behaviour of inferring strings like that in Series/Index, in the course of unifying things. |
Conditional on the behavior+code converging, i have no real objection to that. |
+1 to consistency / sharing code. I think the only intentional difference is that Are we agreed that the expected dtype for these mixed cases (strings and NaT) is object? |
I think that is best. An option could also be to raise an error (I don't know how easy it is code-wise), and requiring users to specify dtype=object explicitly if they want object dtype. In the idea that, for this specific case (with people passing pd.NaT), the result wil almost never be what you want, so better to notify people of it. |
For the example with "a" in it, yes. For the datetime/timedelta-like strings, i'll need to give some thought to how id like Series/Index to behave longer-term. |
Reopening this because I think there is still agreement that the current behaviour of inferring string dtype is not the best. I think the idea would be to rather return object dtype? |
I'm not sure what the current behavior is, but if possible I'd like to keep the inference of |
See the example in the top post: there is a
If a list is provided, and not dtype is specified, isn't "inferring based on values" the only thing we can do? |
yah, i closed bc i thought there was consensus that the title of the issue was wrong; i.e. you guys convinced me. |
Ah, so Joris you reopened it to go the other way? To deprecate Series / Index's behavior of inferring? |
Sorry for the confusion here ;) (and ignore the title of the issue for a moment, that's indeed in need of an update if we agree what this issue is about) The only reason that I reopened this is because (I thought) our earlier discussion (@TomAugspurger see your comment above at #33558 (comment) which says "Are we agreed that the expected dtype for these mixed cases (strings and NaT) is object?") concluded that the current behaviour (inferring this mixed case to string) is not wanted (we want to infer mixed case to object instead).
Well, I was not actaully thinking about that, but now you say it .. ;) That's maybe indeed what we should do. There are basically two "different" behaviours to consider (and when reopening this issue I was actually only thinking about the first):
Ideally, we want to have both of those consistent, agreed? For the first one, I think we can still simply change, because |
The text was updated successfully, but these errors were encountered: