CLN: Series.asof uses reindex GH10343 #10873

Closed
wants to merge 1 commit into from
16 changes: 0 additions & 16 deletions pandas/core/index.py
@@ -1316,22 +1316,6 @@ def asof(self, label):
                 loc = loc.indices(len(self))[-1]
             return self[loc]

-    def asof_locs(self, where, mask):
-        """
-        where : array of timestamps
-        mask : array of booleans where data is not NA
-
-        """
-        locs = self.values[mask].searchsorted(where.values, side='right')
-
-        locs = np.where(locs > 0, locs - 1, 0)
-        result = np.arange(len(self))[mask].take(locs)
-
-        first = mask.argmax()
-        result[(locs == 0) & (where < self.values[first])] = -1
-
-        return result
-
     def order(self, return_indexer=False, ascending=True):
         """
         Return sorted copy of Index
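
For readers who have not seen the helper removed above, its lookup can be sketched as a standalone NumPy function. This is only an illustration of the same `searchsorted` steps, assuming a sorted array of index values; the free function and its `index_values` parameter are hypothetical stand-ins for the method and `self.values`.

```python
import numpy as np


def asof_locs(index_values, where, mask):
    """Hypothetical standalone version of the removed Index.asof_locs.

    For each entry of `where`, return the position of the last non-NA
    index value that is <= that entry, or -1 if there is none.
    """
    index_values = np.asarray(index_values)
    where = np.asarray(where)
    mask = np.asarray(mask, dtype=bool)

    # Position among the non-NA values of the first value > where.
    locs = index_values[mask].searchsorted(where, side="right")

    # Step back one slot to land on the last value <= where.
    locs = np.where(locs > 0, locs - 1, 0)

    # Map positions in the masked array back to positions in the full array.
    result = np.arange(len(index_values))[mask].take(locs)

    # Entries that precede the first non-NA value have no match.
    first = mask.argmax()
    result[(locs == 0) & (where < index_values[first])] = -1
    return result


positions = asof_locs(
    index_values=[10, 20, 30, 40],
    where=[5, 20, 35, 50],
    mask=[True, False, True, True],  # the value at 20 is NA
)
print(positions)
```

Here the lookup for 20 falls back to position 0 because the value stored at 20 is masked out as NA, and the lookup for 5 yields -1 because nothing precedes it.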
37 changes: 15 additions & 22 deletions pandas/core/series.py
@@ -2461,6 +2461,10 @@ def asof(self, where):

         If there is no good value, NaN is returned.

+        Note that this is really just a convenient shorthand for `Series.reindex`,
+        and is equivalent to `s.dropna().reindex(where, method='ffill')` for
+        an array of dates.
Member

"array of dates" -> is this now still restricted to dates?

Contributor

To me the docstring means something like: when `where` is an array of dates, that long formula is an equivalent way of doing it. I'm not familiar with the issue itself, though.

Member

Yes, but with the new implementation I think this equivalence holds for all index types.

The only difference is that `asof` tries to parse strings as dates, and `reindex` does not.
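
The equivalence discussed here can be checked with a small example. This is a sketch with made-up data on a current pandas release, not the code from this PR; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Daily series with one missing value on 2015-01-02.
s = pd.Series(
    [1.0, np.nan, 3.0, 4.0],
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

where = pd.DatetimeIndex(
    ["2014-12-31", "2015-01-02", "2015-01-03 12:00", "2015-01-10"]
)

# Array lookup: asof versus the reindex formula from the docstring.
via_asof = s.asof(where)
via_reindex = s.dropna().reindex(where, method="ffill")
print(via_asof)
print(via_reindex)

# Scalar lookup: wrap the scalar in a list for reindex, then unwrap.
ts = pd.Timestamp("2015-01-02")
scalar_asof = s.asof(ts)
scalar_reindex = s.dropna().reindex([ts], method="ffill").iloc[0]
print(scalar_asof, scalar_reindex)
```

Both approaches skip the NaN stored at 2015-01-02 and return NaN for a date before the first valid observation.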

Contributor

I'd imagine we should either check that it's a date index, or do some more checking around the string coercion. Otherwise you could get some funky errors if you have a string-based index and you pass in a string as `where`.

Contributor Author

@MaximilianR That's a good catch, thanks! Previously it was date-only, so the conversion was straightforward, but as you point out we now have to check for the possibility of a string index.
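
One possible shape for such a check is sketched below. The helper name `coerce_where` is hypothetical and not part of pandas or of this PR; it assumes that string parsing should only happen when the index is a `DatetimeIndex` or `PeriodIndex`.

```python
import pandas as pd


def coerce_where(index, where):
    """Hypothetical guard: parse a string `where` as a date only when
    the index itself is date-like; otherwise pass it through unchanged."""
    is_datelike = isinstance(index, (pd.DatetimeIndex, pd.PeriodIndex))
    if isinstance(where, str) and is_datelike:
        return pd.to_datetime(where)
    return where


date_index = pd.date_range("2015-01-01", periods=3, freq="D")
string_index = pd.Index(["a", "c", "e"])

# On a date index the string is parsed into a Timestamp.
parsed = coerce_where(date_index, "2015-01-02")
print(type(parsed).__name__, parsed)

# On a string index the label is left alone, so no date parsing is attempted.
untouched = coerce_where(string_index, "b")
print(type(untouched).__name__, untouched)
```

With a guard like this, a label such as `"b"` on a string index would never reach the date parser.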


         Parameters
         ----------
         where : date or array of dates
@@ -2476,29 +2480,18 @@ def asof(self, where):
         if isinstance(where, compat.string_types):
             where = datetools.to_datetime(where)

-        values = self.values
-
         if not hasattr(where, '__iter__'):
-            start = self.index[0]
-            if isinstance(self.index, PeriodIndex):
-                where = Period(where, freq=self.index.freq).ordinal
-                start = start.ordinal
-
-            if where < start:
-                return np.nan
-            loc = self.index.searchsorted(where, side='right')
-            if loc > 0:
-                loc -= 1
-            while isnull(values[loc]) and loc > 0:
-                loc -= 1
-            return values[loc]
-
-        if not isinstance(where, Index):
-            where = Index(where)
-
-        locs = self.index.asof_locs(where, notnull(values))
-        new_values = com.take_1d(values, locs)
-        return self._constructor(new_values, index=where).__finalize__(self)
+            is_scalar = True
+            where = [where]
+        else:
+            is_scalar = False
+
+        ret = self.dropna().reindex(where, method='ffill')
+
+        if is_scalar:
+            return ret.iloc[0]
+        else:
+            return ret

     def to_timestamp(self, freq=None, how='start', copy=True):
         """
20 changes: 0 additions & 20 deletions pandas/tseries/period.py
@@ -318,26 +318,6 @@ def _to_embed(self, keep_tz=False):
     def _formatter_func(self):
         return lambda x: "'%s'" % x

-    def asof_locs(self, where, mask):
-        """
-        where : array of timestamps
-        mask : array of booleans where data is not NA
-
-        """
-        where_idx = where
-        if isinstance(where_idx, DatetimeIndex):
-            where_idx = PeriodIndex(where_idx.values, freq=self.freq)
-
-        locs = self.values[mask].searchsorted(where_idx.values, side='right')
-
-        locs = np.where(locs > 0, locs - 1, 0)
-        result = np.arange(len(self))[mask].take(locs)
-
-        first = mask.argmax()
-        result[(locs == 0) & (where_idx.values < self.values[first])] = -1
-
-        return result
-
     def _array_values(self):
         return self.asobject
