Google Finance DataReader returns columns with object type instead of float64 #8980

Closed
femtotrader opened this issue Dec 3, 2014 · 7 comments · Fixed by #9025
@femtotrader

Hello,

Google Finance DataReader returns columns with object type instead of float64

In [112]: import pandas.io.data as web

In [113]: import datetime

In [114]: start = datetime.datetime(2010, 1, 1)

In [115]: end = datetime.datetime(2013, 1, 27)

In [116]: f=web.DataReader("F", 'google', start, end)

In [117]: f.dtypes
Out[117]:
Open       object
High       object
Low        object
Close     float64
Volume      int64
dtype: object
@jreback
Contributor

jreback commented Dec 3, 2014

So #8792 validated this for Close. I didn't realize that Open, High, and Low were still having issues (it's because of a minus sign that I think is messed up). So I will mark this as a bug. I'd appreciate a PR to convert all of this data automatically on import.

@jreback added this to the 0.16.0 milestone Dec 3, 2014
@femtotrader
Author

I understand that you might be interested in a PR, but I have taken a totally different approach to DataReader, and pandas/io/data.py would need a lot of changes, so I don't know whether it would be accepted.

I wrote a DataReaderBase class.

DataReaderGoogleFinanceDaily inherits from DataReaderBase
(as do the other DataReaders, such as DataReaderGoogleFinanceIntraday, DataReaderYahooFinanceDaily, DataReaderFRED...)

I used a factory pattern with a DataReaderFactory class where every DataReader is "registered".

The API changed slightly, and I'm sure you will not like this.

It looks like:

data = MyDataReader("GoogleFinanceDaily").get(symbol, start, end)

I can probably work on unifying it with the current DataReader API.
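
As a rough illustration of the factory arrangement described above, here is a minimal sketch. Only the class names DataReaderBase, DataReaderFactory, DataReaderGoogleFinanceDaily and the MyDataReader(...).get(...) call shape come from the comment; the register and create methods, the placeholder return value, and the date strings are assumptions made for the example, not the author's actual code.

```python
class DataReaderBase:
    """Common interface; the `get` signature follows the usage shown above."""

    def get(self, symbol, start, end):
        raise NotImplementedError


class DataReaderFactory:
    """Hypothetical registry mapping a source name to a reader class."""

    def __init__(self):
        self._readers = {}

    def register(self, name, reader_cls):
        # Method name is an assumption; it stores the class under its name.
        self._readers[name] = reader_cls

    def create(self, name):
        # Method name is an assumption; it instantiates the registered class.
        try:
            return self._readers[name]()
        except KeyError:
            raise ValueError("unknown data source: %r" % name)


class DataReaderGoogleFinanceDaily(DataReaderBase):
    def get(self, symbol, start, end):
        # Placeholder: a real reader would fetch and parse remote data here.
        return {"symbol": symbol, "start": start, "end": end}


factory = DataReaderFactory()
factory.register("GoogleFinanceDaily", DataReaderGoogleFinanceDaily)


def MyDataReader(name):
    """Thin wrapper so the call reads like the usage in the comment."""
    return factory.create(name)


data = MyDataReader("GoogleFinanceDaily").get("F", "2010-01-01", "2013-01-27")
```

Because every reader shares one base class, a choice such as which HTTP client to use could live in the base class rather than in each source-specific reader.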

With this kind of design it is now very easy to use requests (and to cache queries);
see #8713

For this particular issue, I just defined this function:

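import numpy as np

def to_float(x):
    """Convert x to float, returning NaN when conversion fails."""
    try:
        return float(x)
    except (TypeError, ValueError):
        return np.nan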

and applied it to the price columns:

DATE_COL = 'Date'

OPEN_COL = 'Open'
HIGH_COL = 'High'
LOW_COL = 'Low'
CLOSE_COL = 'Close'

VOLUME_COL = 'Volume'

LST_PRICE_COLS = [OPEN_COL, HIGH_COL, LOW_COL, CLOSE_COL]

for col in LST_PRICE_COLS:
    df[col] = df[col].map(to_float)
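
For comparison, newer pandas versions provide pd.to_numeric, which can do the same coercion without a custom helper. The sketch below assumes such a version is installed; the sample DataFrame is made up for illustration and is not real Google Finance data.

```python
import pandas as pd

# Illustrative frame: "-" stands in for a missing price, as in the issue.
df = pd.DataFrame({
    "Open": ["9.30", "-", "9.10"],
    "High": ["9.35", "-", "9.15"],
    "Low": ["9.20", "-", "9.00"],
    "Close": [9.24, 9.24, 9.05],
})

price_cols = ["Open", "High", "Low", "Close"]

for col in price_cols:
    # errors="coerce" turns anything that cannot be parsed into NaN.
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

All four price columns should come out as float64, with NaN where the dash was.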

I now need to work on the DataReaderFamaFrench, DataReaderYahooFinanceOptions, and DataReaderWorldBank classes.

I'm comparing the output of my code with that of the pandas DataReader using unit tests (nosetests), which is why I'm catching some bugs in DataReader.

@jorisvandenbossche
Member

@femtotrader Have you seen #8961?
It seems like you would probably be interested, given your work and ideas you outline above!

@femtotrader
Author

Thanks for the link to this issue.

I can send what I have done.

I have created a datareaders directory (with __init__.py) that contains base.py (DataReaderBase).
In this directory there is one file per source (Google Finance, Yahoo Finance, FRED, ...)
and a tools.py (everything else).

As every DataReader object inherits from DataReaderBase, it's quite easy to decide whether to fetch data using requests (and therefore requests_cache) or urlopen.

@femtotrader
Author

Another solution to this issue is to pass na_values='-' to pd.read_csv.
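
A small sketch of the na_values approach follows. The CSV text is a hypothetical stand-in for the downloaded data: only the 2012-08-01 row reflects what is reported in this thread, and the other rows are illustrative.

```python
import io

import pandas as pd

# Hypothetical sample in an Open/High/Low/Close/Volume layout.
csv_text = """Date,Open,High,Low,Close,Volume
2012-07-31,9.30,9.35,9.20,9.24,100
2012-08-01,-,-,-,9.24,0
2012-08-02,9.10,9.15,9.00,9.05,200
"""

# Treat a lone "-" as missing so the price columns parse as floats.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=True,
    na_values="-",
)

print(df.dtypes)
```

Only cells that are exactly "-" are treated as missing, so the dashes inside the dates are unaffected.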

@davidastephens
Contributor

The problem is that F has some missing data:

2012-08-01 - - - 9.24 0

I'll submit a PR.
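
The effect of a row like the one above can be reproduced without network access. In this sketch the CSV text is hypothetical apart from the 2012-08-01 row, and it is parsed with default read_csv settings; it is not the code path pandas.io.data actually uses.

```python
import io

import pandas as pd

# One row with "-" placeholders is enough to stop numeric inference.
csv_text = """Date,Open,High,Low,Close,Volume
2012-07-31,9.30,9.35,9.20,9.24,100
2012-08-01,-,-,-,9.24,0
2012-08-02,9.10,9.15,9.00,9.05,200
"""

raw = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Open, High and Low are kept as strings; Close and Volume stay numeric.
print(raw.dtypes)
print(raw.loc[raw["Open"] == "-"])
```

A single unparseable cell makes the whole column non-numeric, which is why only Open, High and Low are affected while Close is not.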
