df.at_time NotImplementedError [asof] #7873

Closed
hdascapital opened this issue Jul 29, 2014 · 12 comments
Labels: Algos, Datetime, Duplicate Report, Enhancement

Comments

@hdascapital

The asof functionality in the at_time function raises a NotImplementedError:

df1 = df.at_time(time(15, 0), asof=True)

asof is very useful. If implemented, it would allow us to use at_time on data that might be incomplete or that does not have homogeneous timestamps. It would let us subset on the "closest" observation to the desired time.

Thanks

Hernan

NotImplementedError                       Traceback (most recent call last)
<ipython-input-28-3c10b7e963c9> in <module>()
----> 1 df2 = df1.at_time(time(15, 0), asof=True)

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.pyc in at_time(self, time, asof)
   2767         """
   2768         try:
-> 2769             indexer = self.index.indexer_at_time(time, asof=asof)
   2770             return self.take(indexer, convert=False)
   2771         except AttributeError:

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tseries\index.pyc in indexer_at_time(self, time, asof)
   1689
   1690         if asof:
-> 1691             raise NotImplementedError
   1692
   1693         if isinstance(time, compat.string_types):

NotImplementedError:

@jreback
Contributor

jreback commented Jul 29, 2014

this is a dupe of #3004.

@jreback
Contributor

jreback commented Jul 29, 2014

pls give a complete copy-pastable example

@jreback
Contributor

jreback commented Jul 29, 2014

@hdascapital
you can easily do this by:

  • reindex to the full range (create whatever freq you want with date_range)
  • ffill
  • indexer_between_time

something like

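import pandas as pd
from pandas import DataFrame, date_range
df = DataFrame({'value': [1, 2, 3]}, index=pd.DatetimeIndex(['20130101 2:00', '20130101 4:00', '20130101 6:00']))
df = df.reindex(index=date_range('20130101', periods=24, freq='H')).ffill()  # hourly grid, then carry values forward
df.iloc[df.index.indexer_between_time('3:00', '3:00')]  # indexer is positional, so use .iloc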

@hdascapital
Author

If I reindex, I will destroy the value of the timestamps, not to mention the connection I have with other data linked to that index.

Let me complement my request. Sorry for the form, but this is my first time here.

Let's think of the dataframe
df = DataFrame({'value': [1, 2, 3, 4, 5, 6]}, index=pd.DatetimeIndex(['20130101 2:00', '20130101 3:00', '20130102 2:00', '20130102 3:00', '20130103 1:58', '20130104 2:00']))

Let's say you want to get the values at 2:00 for each day

df1 = df.at_time(time(2, 0), asof=False)

Then you would get 1, 3, 6.

However, imagine that you have a dataframe like the one in the example, but with all the minute-by-minute data in between, except that for some reason the timestamp '20130103 2:00' is missing. I was thinking that asof=True would help me get the closest non-NaN value to the timestamp I was looking for. In this silly example, let's say the function would have picked up '20130103 1:58' as the closest, and the answer would have been

1, 3, 5, 6.

Thanks for your help
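
For reference, the result described above (1, 3, 5, 6, with the original 1:58 timestamp kept) can be sketched without reindexing the frame. This is only an illustration, not how at_time is implemented. It assumes one 2:00 target per calendar day present in the data and a 10 minute tolerance, both choices made for this example, and it relies on Index.get_indexer with method='pad'.

```python
import pandas as pd

df = pd.DataFrame(
    {'value': [1, 2, 3, 4, 5, 6]},
    index=pd.DatetimeIndex([
        '2013-01-01 02:00', '2013-01-01 03:00',
        '2013-01-02 02:00', '2013-01-02 03:00',
        '2013-01-03 01:58', '2013-01-04 02:00',
    ]),
)

# One target per calendar day present in the data: midnight + 2 hours.
targets = df.index.normalize().unique() + pd.Timedelta(hours=2)

# Position of the last row at or before each target. A target gets -1 when
# no row lies within the tolerance (assumed here: 10 minutes).
positions = df.index.get_indexer(
    targets, method='pad', tolerance=pd.Timedelta(minutes=10)
)

# Select the matched rows; their original timestamps are preserved.
asof_rows = df.iloc[positions[positions >= 0]]
print(asof_rows)
```

Because the rows are selected by position, the 2013-01-03 observation keeps its 01:58 label instead of being relabelled to 02:00.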

@jreback
Contributor

jreback commented Jul 29, 2014

not sure what you mean by reindex destroys the timestamps

In [14]: df
Out[14]: 
                     value
2013-01-01 02:00:00      1
2013-01-01 03:00:00      2
2013-01-02 02:00:00      3
2013-01-02 03:00:00      4
2013-01-03 01:58:00      5
2013-01-04 02:00:00      6

In [15]: x = df.reindex(date_range('20130101','20130105',freq='T')).ffill()

In [16]: x.iloc[x.index.indexer_between_time('2:00','2:00')]
Out[16]: 
                     value
2013-01-01 02:00:00      1
2013-01-02 02:00:00      3
2013-01-03 02:00:00      5
2013-01-04 02:00:00      6

@hdascapital
Author

I got it. Amazing. Thanks a lot.

@jreback
Contributor

jreback commented Jul 29, 2014

that said, this could be implemented (essentially like that)

@hdascapital
Author

@jreback One consideration, if you are going to implement this the way you wrote it: with financial data, some series trade on business days only, while others trade on business days plus Sunday afternoon. df.reindex doesn't take that into account, and that distorts the data. I don't know if this is clear. Let me know. Thanks

@jreback
Contributor

jreback commented Jul 30, 2014

Have a look here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects

you can construct the reindexer any way you want.
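
As a rough sketch of that idea, the reindex target can be restricted to business days so that weekends are never filled. The week chosen, the 02:00 target time and the 10 minute tolerance are assumptions for illustration only; a calendar with Sunday afternoon sessions would need its own target index.

```python
import pandas as pd

df = pd.DataFrame(
    {'value': [1, 2, 3, 4, 5, 6]},
    index=pd.DatetimeIndex([
        '2013-01-01 02:00', '2013-01-01 03:00',
        '2013-01-02 02:00', '2013-01-02 03:00',
        '2013-01-03 01:58', '2013-01-04 02:00',
    ]),
)

# Hypothetical schedule: 02:00 on each business day from Jan 1 to Jan 7, 2013.
# bdate_range skips Saturday and Sunday (it knows nothing about holidays).
targets = pd.bdate_range('2013-01-01', '2013-01-07') + pd.Timedelta(hours=2)

# Forward-fill onto the targets, but only from rows at most 10 minutes old.
# Targets with no recent enough observation become NaN and are dropped.
aligned = df.reindex(
    targets, method='ffill', tolerance=pd.Timedelta(minutes=10)
).dropna()
print(aligned)
```

Note that this relabels the matched rows with the target timestamps, so the 01:58 observation appears under 2013-01-03 02:00.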

and this would be implemented internally (and I might actually do a subtraction of values instead), might be faster.

But this is way, way down on the priority list. So, if you would like to take a stab, go for it.
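
One possible reading of the "subtraction of values" idea, offered as a guess rather than the intended implementation: subtract the target time from each row's time of day and keep, per day, the row with the smallest absolute difference. The per-day grouping and the variable names are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'value': [1, 2, 3, 4, 5, 6]},
    index=pd.DatetimeIndex([
        '2013-01-01 02:00', '2013-01-01 03:00',
        '2013-01-02 02:00', '2013-01-02 03:00',
        '2013-01-03 01:58', '2013-01-04 02:00',
    ]),
)

target = pd.Timedelta(hours=2)  # desired time of day, as an offset from midnight

# Time of day of every row, then its absolute distance to the target in seconds.
time_of_day = df.index - df.index.normalize()
seconds_off = np.abs((time_of_day - target).total_seconds().to_numpy())
distance = pd.Series(seconds_off, index=df.index)

# For each calendar day, the label of the row closest to the target time.
closest_labels = distance.groupby(df.index.normalize()).idxmin()

nearest_rows = df.loc[closest_labels.to_numpy()]
print(nearest_rows)
```

Unlike a forward fill, this picks the nearest row on either side of the target and always returns one row per day, however far away it is; a cutoff on `distance` would be needed to reject stale matches.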

@hayd
Contributor

hayd commented Jul 30, 2014

Is asof supposed to get the nearest or the next? (is ffill only next?)

@hdascapital Like I said on ML if you can fix this at indexer_at_time it will propagate. This would make a good first PR :)

@jreback
Contributor

jreback commented Jul 30, 2014

might be (or prob should be) a keyword for before/after/nearest
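
To illustrate what a before/after/nearest choice means in practice: later pandas releases provide pd.merge_asof, a separate function (not at_time) whose direction argument takes 'backward', 'forward' or 'nearest'. The sketch below uses it only to show the three behaviours. The 02:05 row and its value are invented for the example, and the column names are hypothetical.

```python
import pandas as pd

# Observations around a missing 02:00 stamp. The 02:05 row is made up
# so that "before" and "after" give different answers.
obs = pd.DataFrame({
    'stamp': pd.to_datetime(['2013-01-03 01:58', '2013-01-03 02:05']),
    'value': [5, 7],
})
wanted = pd.DataFrame({'target': pd.to_datetime(['2013-01-03 02:00'])})

# Match the single target against the observations in each direction.
picked = {
    direction: pd.merge_asof(
        wanted, obs, left_on='target', right_on='stamp', direction=direction
    )
    for direction in ('backward', 'forward', 'nearest')
}

for direction, frame in picked.items():
    print(direction, frame['stamp'].iloc[0], frame['value'].iloc[0])
```

Here 'backward' corresponds to a forward fill (the last observation at or before the target), 'forward' takes the first observation at or after it, and 'nearest' takes whichever is closer in time.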

@jreback
Contributor

jreback commented Nov 18, 2014

closing in favor of master issue #8845

jreback closed this as completed Nov 18, 2014