DataReader with google source returns DataFrame with incorrect index name #8967

Closed
femtotrader opened this issue Dec 2, 2014 · 1 comment
Labels
Bug Unicode Unicode strings
Milestone

Comments

@femtotrader

Hello,

import pandas.io.data as web
import datetime

symbol = "AAPL"
end = datetime.datetime.now()
start = end - datetime.timedelta(days=10)
df = web.DataReader(symbol, 'google', start, end)

df.index.name

returns '\xef\xbb\xbfDate', but it should be just 'Date'.

The leading \xef\xbb\xbf bytes are the UTF-8 byte order mark (BOM); they are invisible when the name is printed.
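
To illustrate where those three bytes can come from, here is a minimal sketch that does not contact Google at all. The CSV payload is made up for the example, and it runs on Python 3, where the BOM appears as the single character '\ufeff' after decoding rather than as the three raw bytes seen above on Python 2. It compares a plain "utf-8" decode with the standard "utf-8-sig" codec, which drops a leading BOM when one is present.

```python
import codecs

# Made-up payload for illustration: a CSV response that starts with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"Date,Open,Close\n2014-12-01,1.0,2.0\n"

# A plain UTF-8 decode keeps the BOM as U+FEFF at the start of the text,
# so it ends up glued to the first column name.
plain = raw.decode("utf-8")
header_plain = plain.splitlines()[0].split(",")[0]

# The "utf-8-sig" codec strips a leading BOM if there is one.
clean = raw.decode("utf-8-sig")
header_clean = clean.splitlines()[0].split(",")[0]

print(repr(header_plain))
print(repr(header_clean))
```

Because the first column is used as the index, whatever is attached to its header becomes part of the index name.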

A quick and dirty solution is to change

def _retry_read_url(url, retry_count, pause, name):
    for _ in range(retry_count):
        time.sleep(pause)

        # kludge to close the socket ASAP
        try:
            with urlopen(url) as resp:
                lines = resp.read()
        except _network_error_classes:
            pass
        else:
            rs = read_csv(StringIO(bytes_to_str(lines)), index_col=0,
                          parse_dates=True)[::-1]
            # Yahoo! Finance sometimes does this awesome thing where they
            # return 2 rows for the most recent business day
            if len(rs) > 2 and rs.index[-1] == rs.index[-2]:  # pragma: no cover
                rs = rs[:-1]
            return rs

    raise IOError("after %d tries, %s did not "
                  "return a 200 for url %r" % (retry_count, name, url))

to

def _retry_read_url(url, retry_count, pause, name):
    for _ in range(retry_count):
        time.sleep(pause)

        # kludge to close the socket ASAP
        try:
            with urlopen(url) as resp:
                lines = resp.read()
        except _network_error_classes:
            pass
        else:
            rs = read_csv(StringIO(bytes_to_str(lines)), index_col=0,
                          parse_dates=True)[::-1]
            rs.index.name = 'Date'
            # Yahoo! Finance sometimes does this awesome thing where they
            # return 2 rows for the most recent business day
            if len(rs) > 2 and rs.index[-1] == rs.index[-2]:  # pragma: no cover
                rs = rs[:-1]
            return rs

    raise IOError("after %d tries, %s did not "
                  "return a 200 for url %r" % (retry_count, name, url))

Adding rs.index.name = 'Date' fixes the problem, but it could have side effects for other data sources.
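
One way to avoid that side effect might be to remove the BOM from the downloaded bytes before they are parsed, instead of overwriting the index name. The sketch below is only an illustration under that assumption: `strip_utf8_bom` and `first_header` are hypothetical helpers, the sample payloads are made up, and this is not necessarily the change that was eventually merged.

```python
import codecs
import csv
import io


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def first_header(data):
    """Decode a CSV payload and return the name of its first column."""
    text = strip_utf8_bom(data).decode("utf-8")
    return next(csv.reader(io.StringIO(text)))[0]


# Made-up payloads: one source sends a BOM, the other does not.
with_bom = codecs.BOM_UTF8 + b"Date,Close\n2014-12-01,1.0\n"
without_bom = b"Date,Close\n2014-12-01,1.0\n"

print(repr(first_header(with_bom)))
print(repr(first_header(without_bom)))
```

Since the helper leaves input without a BOM unchanged, the same code path could serve every data source, and the header sent by the server is kept rather than hard-coded.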

Kind regards

@femtotrader femtotrader changed the title DataReader with google source return DataFrame with incorrect index name DataReader with google source returns DataFrame with incorrect index name Dec 2, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 2, 2014
@jreback jreback modified the milestones: 0.15.2, 0.16.0 Dec 6, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback
Contributor

jreback commented Mar 7, 2015

closed by #9026

@jreback jreback closed this as completed Mar 7, 2015