Skip to content

CLN: Series.asof uses reindex GH10343 #10873

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

bwillers
Copy link
Contributor

closes #10343 as discussed in #10345 , changed Series.asof to use reindex internally, and remove the unused leftovers from index classes. All existing tests in test_series.py:TestSeries.test_asof pass with the new implementation.

@shoyer
Copy link
Member

shoyer commented Aug 21, 2015

nice work!

@@ -2461,6 +2461,10 @@ def asof(self, where):

If there is no good value, NaN is returned.

Note that this is really just a convenient shorthand for `Series.reindex`,
and is equivalent to `s.dropna().reindex(where, method='ffill')` for
an array of dates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"array of dates" -> is this now still restricted to dates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc string to me means something like: When it's an array of dates, that long formula is an equivalent way of doing it. I'm not familiar with the issue itself tho.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with the new implementation this equivalence holds for all index types I think.

The only difference is that asof tries to parse strings as dates, and reindex not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd imagine we should either check that it's a date index, or do some more checking around the string coercion - otherwise you could get some funky errors if you have a string based index and you use pass in a string as where

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaximilianR That's a good catch, thanks! Previously it was date only, so the conversion was straightforward, but as you point out we do now have to check for the possibility of a string index.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2015

I'd like to see this put in core/base.py/IndexOpsMixIn as this mixes into Series so would be no change there, and could replace the Index.asof as well.

Tests should then move to tests/test_base.py

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Clean labels Aug 21, 2015
@jreback jreback added this to the 0.17.0 milestone Aug 21, 2015
@bwillers
Copy link
Contributor Author

@jreback Ok, I'll see what I can do.

@jreback
Copy link
Contributor

jreback commented Sep 1, 2015

can you update according to comments

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Sep 1, 2015
@bwillers
Copy link
Contributor Author

bwillers commented Sep 2, 2015

@jreback Yup, will get it done this week, sorry about the delay.

@bwillers
Copy link
Contributor Author

bwillers commented Sep 3, 2015

So I'm having a little difficulty combining this with Index.asof in the mixin, Index.asof seems to support some odd partial string matching for a DatetimeIndex, in particular (from test_index.py:TestIndex.test_asof_datetime_partial:

ipdb> idx
DatetimeIndex(['2010-01-31', '2010-02-28'], dtype='datetime64[ns]', freq='M', tz=None)
ipdb> idx.asof('2010-02')
Timestamp('2010-02-28 00:00:00', offset='M')

I don't think it's possible to get that result using reindex, (and I must admit I'm a little confused as to the use case where this is the desired behavior).

I can of course still move the Series and Index asof implementation into a mixin with an isinstance check, but there isn't really much common between them at present.

Incidently, I was unable to find anywhere outside its own tests where Index.asof is used in the code base.

@max-sixty
Copy link
Contributor

Where is the code that is creating that oddness? Happy to have a look. I couldn't find it in DatetimeIndex

@bwillers
Copy link
Contributor Author

bwillers commented Sep 3, 2015

@MaximilianR Far as I've been able to figure out the call chain, it goes Index.asof -> DatetimeIndex.get_loc -> DatetimeIndex._get_string_slice.

Presumably the intended behavior since it's explicitly tested for (see test_asof_datetime_partial in test_index.py). It is however somewhat jarring since the result from passing a string and passing a timestamp generated from that string differ:

In [6]: idx
Out[6]: DatetimeIndex(['2010-01-31', '2010-02-28'], dtype='datetime64[ns]', freq='M', tz=None)

In [7]: idx.asof('2010-02')
Out[7]: Timestamp('2010-02-28 00:00:00', offset='M')

In [8]: idx.asof(pd.Timestamp('2010-02'))
Out[8]: Timestamp('2010-01-31 00:00:00', offset='M')

@jreback
Copy link
Contributor

jreback commented Sep 3, 2015

@bwillers hmm, I don't think .asof should accept a non-qualified date, e.g. it must be the same freq as the dates or raise.

@bwillers
Copy link
Contributor Author

bwillers commented Sep 4, 2015

@jreback I think its a valid asof question ("what's the most recent value before or on 2010-02-01") but the fact that calling with a string returns a backfilled rather than forward filled value seems wrong. But then, there's an explicit test requiring this, so it seems likely this was added for some particular purpose I'm unaware of.

@shoyer
Copy link
Member

shoyer commented Sep 4, 2015

See here for discussion on asof's handling of partial string labels: #9258 (comment)

@max-sixty
Copy link
Contributor

@bwillers I can't understand how that example is intended behavior. Do you (anyone?) know why that might be intended behavior?

I think it's fine to drop the functionality of string coercion. If it didn't add material complication I would vote to keep it (i.e. whatever you pass into __getitem__ would be permissible), but not a strong view.

I can have a look at this in more detail this w/e if helpful.

@bwillers
Copy link
Contributor Author

bwillers commented Sep 4, 2015

@MaximilianR I do not know, I only discovered it a few days ago. Any insight is appreciated.

@jreback
Copy link
Contributor

jreback commented Oct 18, 2015

closing, pls reopen if you can fix according to comments

@jreback jreback closed this Oct 18, 2015
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release Jul 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLN: Series.asof
6 participants