-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: revisit adding datetime-like ops in Series #7207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
cc @rosnfeld continue the discussion here (but we'll consider for 0.14.1) |
i think |
I agree with @mtkni that adding a The problem in mindset with the datetime accessors, when you have a datetime column in a dataframe, is that you have to first make an index of it before you can access datetime fields. If you don't really know the background, this is not logical. "Why should you make an index of it? If I only need the day values?" Whether this is with I also agree that you 'expect' at first sight that a series method acts on the values, not on the index (for most methods, you have of course index-specific methods). So I think it is good for now to disable this (as you did in #7206) until we decide on the API. |
What would you actually think of having the attributes act on the Series values? (so having |
that's what I had originally; but it IS ambiguous if you have a datetime index AND datetime values on a Series (not that common but common enough I guess). |
In that case it is indeed a bit ambiguous, but I think less ambiguous as the other way around (the attributes acting on the index). And it would also be useful functionality to have such attributes on the datetime64 values of a series/column. |
ahh..you want them to act on the VALUES? hmm, don't liek the |
Just a comment on my motivation here (I can't speak for @mtkni, though curious to hear other use cases): I have worked with various datasets with multiple datetime columns, where none of them should really be the "index", so I find this to be somewhat common. E.g. humanitarian funding data might have various "state transition" dates for each entry (date funding was pledged, date it was transferred, date it was deployed towards a project, etc). Or a big dataset of projects might have start and end dates. So it's not that I'm creating standalone Series with date values, but the columns of a DataFrame often fit the bill. Maybe I want to extract the year or month from each of these columns, or maybe somebody would want to tz_localize/tz_convert. Writing Requiring conversion to an index is surprising, as I (and likely others who don't understand the code as well) don't understand what is "special" about an index (or special about datetimes). I'd be curious to know how others view this. |
@jreback yes indeed the values :-) And that having to convert it to an index is strange, is what I wanted to say in my comment above (#7207 (comment)), as also @rosnfeld argues. For me it is also rather common to have datetime values in columns, eg start and end times of measurements. I agree that having a datelike index and a non-datelike values is probably by far the more common use case, for that case it is not that much more work (and more explicit) to type To make it concrete, this would give something like this:
|
@rosnfeld ok, so you like the idea of using datetime-like ops (e.g. year,month,seconds...etc) on the VALUES of a series (e.g. rather that having them act on the index (which so going back to common methods/properties for index/series (so they act similary, a good thing). Note that their are currently several methods that DO NOT DO this, e.g. note also that |
Just to confirm, with a If so then +1 from me on operating on the values. |
Instead of adding Not having looked at DatetimeIndex internals very carefully, I'm not sure how painful this would be. Of course, better would be to try to fix datetime64 upstream in numpy... |
@shoyer not sure what u mean by datetime64 quirks? only issue is how to unambiguously do things with the Series values. (eg Series.day) or the index |
I know DatetimeIndex fixes the numpy quirks (comparison, casting, NaT, etc.) and provides some nice extra functionality (like the .day properties). My point is that most of these features are actually entirely distinct from the index part, so you can imagine exposing those in something like a "DatetimeArray" which makes the index part optional. DatetimeIndex would then be a relatively shallow wrapper around DatetimeArray. I guess this would be API breaking, but if Maybe this is not terribly useful for pandas because creating an index larger than a certain size only bothers to create the index lookup table on demand anyways. |
it's possible that I could simply have Series.values return a DatetimeIndex if the dtype is correct (in fact other dtypes eg Categorical and Sparse do this) This might work and is not really a breaking API change - not need to create another class for that reason that said it IS natural to simply use Series.day which applies to the values of the Series the only issue I have is that doing Series.day on say a float dtype series will have to raise a TypeError |
I guess the reason to (possibly) make another class would be so this same sort of thing works even with multi-dimensional datetime arrays, e.g., on a DataFrame with datetime values ( |
I am not too fond of returning a DatetimeIndex for And if you want to have these things for a whole dataframe of datetimes, you can always use |
FWIW how about s.values.asdatetime().day . For the uniinitiated the "index" information is not interesting, and I assume that if the underlying array is not datetime64 this throws an On Sat, May 24, 2014 at 2:33 PM, Joris Van den Bossche <
|
@jonblunt Unfortunately I agree that seeing a DatetimeIndex could be somewhat confusing, which is why I suggested a more generic DatetimeArray. However, |
Would it make sense to make a Series with datetime values a subclass of Series (with DatetimeIndex date-y methods?)... Or perhaps a property as datetime? That way we tab complete the methods.
|
pushing to 0.15; would be an API change |
Not sure if this is related, but #7217 just introduced |
I think easiest / best way is simply add:
any takers for a PR? |
I like There are now at least |
its also easy to make |
ok, pls take a look at #7953 which implements the |
@jreback this is at once also my comment on the PR |
I had the same reaction - it would be nice if the tab-complete just showed the timeseries-specific attributes. |
not hard to just show only datetimelike attributes.... |
|
Reading the tab completion in #7953, it would be nice if there would be a way to hide the "none-Api" methods. Kind of like a wrapper object which would created by passing in a list of attributes which are accessable (see this SO question) + adding these methods to tab completion (no idea how that works but someone mentioned it above) |
This does look nicer (latest version of #7953)
And its specific to the type of wrapped delegate
|
Should the return types be arrays? I kind of thought they would be Series, like with the 'str' methods. |
they are arrays now (that is unchanged) |
e.g. generally
versus currently
|
we could probably return an |
Why not return a new Series with the same index as the original series, as @rosnfeld suggests? That does seem a little more consistent with how most Series operations work. |
hmm. These are actually index attributes, but easy enough to return a Series. sure that makes sense. |
easy enough
|
@jreback yes, the idea is to hide the (surprising) |
+1 on returning Series. That is the point of these methods I think: it are attributes/methods that act on the Series, so should return Series. That it does a pass-by through DatetimeIndex under the hood is an implementation detail. |
@shoyer I think If you have a sketch of it, pls post. |
@jreback Here is the general idea: As I mentioned above, you could even return a I think that it would be a nice idea in theory.... but again it's not my top priority. |
I would suggest going the other way by adding a |
@JanSchulz |
@JanSchulz let's leave the Categorical stuff to #7768 |
pls have a look at updated #7953. removed a lot of the clutter from the previous impl (which did a form of redirection), and added docs. |
related is #7146, #7206
maybe provide a
to_index()
function and/or a class-like accessor.values_to_index
(too long though)The text was updated successfully, but these errors were encountered: