BUG: Fix pandas.io.data.Options for change in format of Yahoo Option page #8631

Merged
merged 2 commits into from
Nov 5, 2014

Conversation

davidastephens
Contributor

This should fix #8612.
I will need a sample of the HTML page from during the week to make sure that the underlying price and quote time work with that format. I'll update for that on Monday.

@jreback jreback added this to the 0.15.1 milestone Oct 25, 2014
return expiry

@staticmethod
def _third_saturday(date):
Contributor

we have this type of calendar functionality in pandas/tseries/holidays.py. pls use that (maybe create a 'regular' calendar for this).

@BMeridian

Yahoo has weekly and month-end options. So, please don't assume that expiry happens on the third Friday of the month.

@davidastephens
Contributor Author

I'm not assuming that. If you provide only a month and year, it defaults to the third Friday expiry. If you provide an expiry date, it will find the closest one to the date you provide.
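
As a rough illustration of the month/year default described above, the standard monthly expiry can be derived from the calendar alone. The helper names below are hypothetical and this is not the PR's code (a later commit in this PR replaces the third-Saturday function with a pandas offset); it is only a standard-library sketch, and the "Saturday after the third Friday" listing convention is an assumption.

```python
import datetime


def third_friday(year, month):
    """Third Friday of a month (hypothetical helper, not from the PR)."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Friday is 4.
    days_to_first_friday = (4 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_friday + 14)


def standard_expiry(year, month):
    """Saturday following the third Friday (assumed listing convention)."""
    return third_friday(year, month) + datetime.timedelta(days=1)


print(third_friday(2014, 11))
print(standard_expiry(2014, 11))
```

For November 2014 the Saturday computed this way is 2014-11-22, which is the date shown in the Expiry column of the month/year example further down in this thread.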

@BMeridian

Before, year-month gave you all expirations within that month.

From:
http://stackoverflow.com/questions/5493514/webscraping-with-beautifulsoup-or-lxml-html

data['models']['applet_model']['data']['optionData']['expirationDates']

gives you all expirations. The page requests need these as Unix timestamps; then no expiration dates are missed.
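
To make the Unix timestamp point concrete, here is a small sketch of converting between a timestamp and a calendar date with the standard library. The function names are hypothetical, and treating the timestamps as midnight UTC is an assumption; the sample value is the `date=` parameter from the Yahoo URL quoted in the error message later in this thread.

```python
import calendar
import datetime


def timestamp_to_expiry(ts):
    """Convert a Unix timestamp (seconds, assumed UTC) to a calendar date."""
    return datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).date()


def expiry_to_timestamp(expiry):
    """Convert a date to the Unix timestamp of its midnight in UTC."""
    return calendar.timegm(expiry.timetuple())


expiry = timestamp_to_expiry(1415318400)
print(expiry)
print(expiry_to_timestamp(expiry))
```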

@davidastephens
Contributor Author

I guess it's an API question. Maybe giving a month and year should provide all the expiries for that month. On the other hand, month and year have had a deprecation warning for a long time, so maybe this should be the catalyst to drop it. We could add another method that gets all options for a given month.

@jreback
Contributor

jreback commented Oct 30, 2014

@dstephens99 can you put a release note in v0.15.1.txt (I guess it's a Bug fix). pls squash as well.

@davidastephens
Contributor Author

Will do. Any thoughts on whether we should put a note in the API change section? Calling methods using a year and month reference used to provide all the data for expiries within that year / month. Now, with Yahoo splitting it on different pages, I default it to the standard (third Saturday) expiry.

However, using year and month has been deprecated for a year and a half. At what point do we remove deprecated options from the code base? This may be a good catalyst for removing it.

It may be worthwhile to add either a method or a parameter which gets all data for a given month. I'm happy to add that if someone wants it.

@jreback
Contributor

jreback commented Oct 30, 2014

does it show a Deprecation warning now?
can u show a small example of what u mean? u r defaulting to the next expiry if no month is specified?

@davidastephens davidastephens force-pushed the issue8612 branch 3 times, most recently from f0e7530 to 54ce3f2 Compare October 30, 2014 04:50
@davidastephens
Contributor Author

Yes, it shows a deprecation warning now.

It defaults to the next expiry if no date arguments are given. If you give a date, it gives you the next expiry after that date. If you give a month and year, it gives you the 'standard' third Saturday expiry.

Here is an example:

from pandas.io.data import Options
aapl = Options('aapl', 'yahoo')
aapl.get_call_data(month=11,year=2014).iloc[0:5,0:5]
pandas/io/data.py:939: FutureWarning: month, year arguments are deprecated, use expiry instead
  " instead", FutureWarning)
Out[4]: 
                                             Last    Bid    Ask  Chg PctChg
Strike Expiry     Type Symbol                                              
47.5   2014-11-22 call AAPL141122C00047500  58.65  58.85  60.90    0  0.00%
50.0   2014-11-22 call AAPL141122C00050000  53.90  56.35  58.40    0  0.00%
55.0   2014-11-22 call AAPL141122C00055000  44.50  51.35  53.05    0  0.00%
60.0   2014-11-22 call AAPL141122C00060000  36.85  46.35  48.05    0  0.00%
65.0   2014-11-22 call AAPL141122C00065000  34.95  41.35  43.05    0  0.00%



aapl.get_call_data().iloc[0:5,0:5]
Out[6]: 
                                             Last    Bid    Ask   Chg   PctChg
Strike Expiry     Type Symbol                                                 
75     2014-10-31 call AAPL141031C00075000  31.15  32.15  32.55  0.00    0.00%
80     2014-10-31 call AAPL141031C00080000  26.25  27.15  27.50  0.00    0.00%
85     2014-10-31 call AAPL141031C00085000  22.20  22.20  22.45  0.63   +2.92%
86     2014-10-31 call AAPL141031C00086000  19.00  20.35  22.40  0.00    0.00%
87     2014-10-31 call AAPL141031C00087000  20.15  20.15  20.45  1.95  +10.71%

import datetime
expiry = datetime.date(2014,11,10)

aapl.get_call_data(expiry=expiry).iloc[0:5,0:5]
Out[8]: 
                                             Last    Bid    Ask  Chg PctChg
Strike Expiry     Type Symbol                                              
80     2014-11-14 call AAPL141114C00080000  22.75  26.35  28.40    0  0.00%
84     2014-11-14 call AAPL141114C00084000  20.99  22.35  24.40    0  0.00%
85     2014-11-14 call AAPL141114C00085000  21.30  21.35  23.40    0  0.00%
86     2014-11-14 call AAPL141114C00086000  16.08  20.35  22.45    0  0.00%
87     2014-11-14 call AAPL141114C00087000  18.10  19.35  21.45    0  0.00%
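
The selection rule illustrated by the session above can be sketched roughly as follows. This is not the PR's implementation: `choose_expiry` is a hypothetical name, picking the first listed expiry on or after the requested date is an assumption about the exact rule, and the list of expiries is simply the three dates that appear in the output above.

```python
import datetime


def choose_expiry(available, requested=None, today=None):
    """Pick the first listed expiry on or after the requested date.

    Falls back to `today` when no date is requested (hypothetical helper).
    """
    if today is None:
        today = datetime.date.today()
    target = requested if requested is not None else today
    candidates = sorted(d for d in available if d >= target)
    if not candidates:
        raise ValueError("no listed expiry on or after %s" % target)
    return candidates[0]


# Expiry dates taken from the example output in this thread.
listed = [
    datetime.date(2014, 10, 31),
    datetime.date(2014, 11, 14),
    datetime.date(2014, 11, 22),
]

print(choose_expiry(listed, requested=datetime.date(2014, 11, 10)))
print(choose_expiry(listed, today=datetime.date(2014, 10, 30)))
```

With these inputs the first call selects 2014-11-14 and the second selects 2014-10-31, in line with the Expiry columns shown in the session.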

@jreback
Contributor

jreback commented Oct 30, 2014

I know this might be a bit annoying, but is it possible to return the data for all expiry dates if a year/month is specified (e.g. return more data rather than less)?

Is there an easy way to get expiry dates (in a particular date range)?

@jreback jreback modified the milestones: 0.15.1, 0.15.2 Oct 30, 2014
@davidastephens
Contributor Author

Yes, we can do that. It will just require up to 5 page loads, so a bit slower.

I have a private method that gets the listed expiry dates from the Yahoo page; I will make that one public.
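
Once the listed expiry dates are available, expanding a month/year request into every expiry in that month is a simple filter; each resulting date would then need its own page load. A minimal sketch, assuming the expiry dates arrive as a list of `datetime.date` values (the function name and the sample list are illustrative, not from the PR):

```python
import datetime


def expiries_in_month(available, year, month):
    """Return all listed expiries that fall in the given year and month."""
    return sorted(d for d in available if d.year == year and d.month == month)


# Illustrative list built from dates that appear elsewhere in this thread.
listed = [
    datetime.date(2014, 10, 31),
    datetime.date(2014, 11, 14),
    datetime.date(2014, 11, 22),
]

for expiry in expiries_in_month(listed, 2014, 11):
    # One page request per expiry would be issued here.
    print(expiry)
```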

@jreback
Contributor

jreback commented Oct 31, 2014

hmm ok what you have is prob reasonable then

can u rebase and maybe put a note in the whatsnew (just that format changed) and that the year month behavior is now slightly different (but is deprecated anyhow) (and put same note in remote.rst)

@jreback
Contributor

jreback commented Oct 31, 2014

@jorisvandenbossche ?

@jorisvandenbossche
Member

@dstephens99 I am not familiar with this part of pandas, but is there now an actual change in behaviour for some cases? (different output than before with certain arguments)
As the whatsnew entry now only says it fixed the yahoo format change

@davidastephens
Contributor Author

@jorisvandenbossche
Yes, there is a difference in behaviour. I've updated the whatsnew entry.

@davidastephens davidastephens force-pushed the issue8612 branch 2 times, most recently from e2dc97d to f7eb832 Compare November 2, 2014 01:55
@@ -146,6 +146,27 @@ API changes

s.dt.hour

- As a result of a change in Yahoo's option page layout, ``Options`` methods now
Contributor

I would make this a ``.. note::``, and pls put a reference to the remote_data docs section as well (e.g. ``see docs :ref:.....``)

@jreback
Contributor

jreback commented Nov 2, 2014

@dstephens99 minor docs changes, otherwise looking good. pls ping when green

@davidastephens
Contributor Author

No prob. I refactored it already so the function that does all the work takes lists. Will get a new version out today.

@davidastephens davidastephens force-pushed the issue8612 branch 3 times, most recently from 48108f4 to 8e4eb7e Compare November 2, 2014 23:41
@davidastephens
Contributor Author

@jreback
Green with discussed new features. Note that this also closes #8052.

@@ -758,8 +716,9 @@ def get_call_data(self, month=None, year=None, expiry=None):

Parameters
----------
expiry: datetime.date, optional(default=None)
The date when options expire (defaults to current month)
Expiry: Option expiry, datetime.date or list-like object
Contributor

this should be lowercase

…page

ENH: Automatically choose next expiry if expiry date isn't valid

COMPAT: Remove dictionary comprehension to pass 2.6 test

BUG: Add check that tables were downloaded.

ENH: Replace third_saturday function with Pandas offset

BUG: Check to make sure enough option tables are downloaded.

TST: Add sample options page during market open.

DOC: Add bug fix report to v0.15.1.txt for data.Options

BUG: Ensure that the value used for chopping is within the range of strikes.

TST: Add test for requesting out of range chop

BUG: Fix missing underlying price and quote time in first process data

BUG: Fix AM/PM on quote time

ENH: Refactor to expose available expiry dates

DOC: Update documentation for Options given Yahoo change.

DOC: Update documentation for Options given Yahoo change.

BUG: Undo accidental deletion of private methods

ENH: Add ability to use list as expiry parameter.

Also has year & month return all data for the specified year and month.

TST: Remove tests for warnings that are no longer in use.

DOC: Update docstrings of Options class.
@davidastephens
Contributor Author

@jreback
Fixed those docstring comments, also pd.to_datetime in the validate expiry function.
I had a typo in my previous comment - this closes #8052, not #8053.

data for the selected month.

The ``month`` and ``year`` parameters have been undeprecated and can be used to get all
options data for a given month.
Member

Maybe also add this to the docs in remote_data.rst (the fact that this is possible, not that it is undeprecated).

Add documentation of month and year in remote_data.rst
@jreback
Contributor

jreback commented Nov 5, 2014

@jorisvandenbossche anything final on this?

@jorisvandenbossche
Member

good for me!

jreback added a commit that referenced this pull request Nov 5, 2014
BUG: Fix pandas.io.data.Options for change in format of Yahoo Option page
@jreback jreback merged commit 5a58f04 into pandas-dev:master Nov 5, 2014
@jreback
Contributor

jreback commented Nov 5, 2014

thanks!

pls look at built docs on dev site
takes about 1 hr

@jorisvandenbossche
Member

@dstephens99 There are errors in the docs (all the same): RemoteDataError: Unable to determine time of quote for page http://finance.yahoo.com/q/op?s=AAPL&date=1415318400

@immerrr
Contributor

immerrr commented Nov 5, 2014

@jreback @jorisvandenbossche
Guys, did you consider moving some sub-packages to separate libraries? I mean, it will come at a cost of increased maintenance overhead, but untying release cycles should simplify bugfixes and speed up their propagation to users. In this particular case, given that yahoo may decide to change their UI arbitrarily, the users at any moment may be left with a choice either to wait for up to three months till the next release or to switch to master branch.

@jreback
Contributor

jreback commented Nov 5, 2014

@immerrr are you offering??

sure I think some parts of remote_data access could certainly be outsourced to a separate repo

@jorisvandenbossche
Member

@immerrr exactly my thought also (eg now, yahoo broke right after 0.15 was released, so the fix has to wait until the next release and all yahoo related code does not work with 0.15). But let's discuss this in a separate issue

@jorisvandenbossche
Member

@dstephens99 Small question about the expiry_dates that @jreback raised. It does now return a list of datetime.date values. But another option is to return Timestamps?
I don't have a real preference, and I don't know if there are any precedents in pandas.

@davidastephens
Contributor Author

I used dates because there is no time associated with the expiry. When I'm comparing what was input to what dates are available, I need to compare dates with dates, not with datetimes or timestamps, unless I was sure the datetimes and timestamps had consistent times.
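
The comparison concern can be shown with the standard library alone: a `datetime.datetime` does not compare equal to a plain `datetime.date`, even at midnight, so mixed inputs have to be normalized first. The sketch below uses a hypothetical `to_date` helper (not the PR's code); since pandas `Timestamp` subclasses `datetime.datetime`, the same normalization should apply to it as well.

```python
import datetime


def to_date(value):
    """Reduce a datetime (or a subclass of it) to a plain date."""
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


requested = datetime.datetime(2014, 11, 14, 16, 0)
listed = datetime.date(2014, 11, 14)

# Comparing the raw values mixes datetime with date and is never equal.
print(requested == listed)
# Normalizing first compares date with date.
print(to_date(requested) == listed)
```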

I'm happy to change it to timestamps. I'm not sure if there is a benefit/drawback either way, but I would likely convert them back to dates in the comparison logic.

@jreback
Contributor

jreback commented Nov 9, 2014

@dstephens99 it's not a big deal either way. Timestamps can represent both (dates and datetimes). And are generally used within pandas. But datetimes are converted whenever used anyhow. It's fine.

@jorisvandenbossche
Member

indeed, it does not matter that much. I just wanted to raise it here in case there was a strong reason to use one or the other, but as this is not really the case, we can just leave it as it is I think

Successfully merging this pull request may close these issues.

pandas.io.data.Options is broken due to Yahoo options page change
5 participants