BUG/API: data readers should return missing data as NaN rather than warn #8433

Closed
gayodeji opened this issue Oct 1, 2014 · 3 comments · Fixed by #8743
Labels: Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

gayodeji commented Oct 1, 2014

I have pandas version: pandas-0.13.1-py2.7-win32.egg

Routine pandas.io.data._dl_mult_symbols has an "except IOError" where the following is executed:
stocks[sym] = np.nan

The next thing that happens is:
try:
    return Panel(stocks).swapaxes('items', 'minor')

However, if any of the values in stocks is NaN, then "return Panel(stocks).swapaxes('items', 'minor')" raises an error that is not handled and terminates the script.

I discovered this by calling pandas.io.data.get_data_yahoo(['EQQQ.F'],'20140930','20140930',3,0.001,False,False,5)

jreback commented Oct 1, 2014

In 0.14.1 and later this doesn't return any data (and raises a nice exception).
It works just fine, for example, with a symbol list of ['IBM'].
I do recall a similar issue, but cannot reproduce it.

Please reopen or comment if this is still an issue.

gayodeji commented Oct 2, 2014

Thanks, jreback, for the very quick reply.
I suspect you may have misunderstood the issue. In any case, in v0.14.1 I still see the same problem.

[Note: this is my first time working with GitHub or reporting an issue, so bear with me if there are any conventions I am unaware of.]

The problem is that if even one value in stocks is NaN, the entire set of results is discarded and an error is raised.
It would be better if the good data were returned and only the NaN data were discarded.

For example, the code below works. I have marked the lines I changed compared to v0.14.1.

def _dl_mult_symbols(symbols, start, end, chunksize, retry_count, pause,
                     method):
    stocks = {}
    for sym_group in _in_chunks(symbols, chunksize):
        for sym in sym_group:
            try:
                stocks[sym] = method(sym, start, end, retry_count, pause)
            except IOError:
                warnings.warn('Failed to read symbol: {0!r}, omitting from '    # <---
                              'data set.'.format(sym), SymbolWarning)           # <---
                # More a hack than a proper fix, but I would rather leave this
                # to experienced people than suggest amateur solutions.
                pass  # <---

    try:
        return Panel(stocks).swapaxes('items', 'minor')
    except AttributeError:
        # cannot construct a panel with just 1D nans indicating no data
        raise RemoteDataError("No data fetched using "
                              "{0!r}".format(method.__name__))
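
The skip-and-warn behaviour in the patch above can be tried in isolation. The following is a minimal sketch, not the pandas implementation: `fetch_all`, `fake_fetch`, and the local `SymbolWarning` class are hypothetical stand-ins, and the returned rows are made-up placeholders.

```python
import warnings


class SymbolWarning(UserWarning):
    """Local stand-in for the warning class used in the snippet above."""


def fetch_all(symbols, fetch_one):
    """Fetch each symbol; warn about and skip any symbol that raises IOError."""
    stocks = {}
    for sym in symbols:
        try:
            stocks[sym] = fetch_one(sym)
        except IOError:
            warnings.warn(
                "Failed to read symbol: {0!r}, omitting from "
                "data set.".format(sym),
                SymbolWarning,
            )
    return stocks


def fake_fetch(sym):
    """Hypothetical downloader: only 'IBM' succeeds."""
    if sym != "IBM":
        raise IOError("no data for {0}".format(sym))
    return {"Close": [1.0, 2.0]}  # made-up placeholder rows


# Record warnings so the skipped symbol can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fetch_all(["IBM", "EQQQ.F"], fake_fetch)

print(sorted(result))  # ['IBM']
print(len(caught))     # 1
```

With this approach the caller gets the good data back, but has to compare the result against the requested symbols (or watch for warnings) to learn which ones are missing.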

jreback commented Oct 2, 2014

@gayodeji actually this reproduces correctly by passing ['IBM','EQQQ.F'], in other words with at least 1 valid symbol (the raise occurs if all symbols are invalid, but that's a separate issue).

OK, I thought we had an issue for this somewhere.

Going to fix the title. Your solution is not actually what we want here; rather, we want to return a Panel with missing values for the non-returned data.

Pls feel free to work on this.
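
One way to picture "missing values for the non-returned data" is sketched below. This is an illustration under stated assumptions, not the fix that was eventually merged: `combine_with_nan` and `fake_fetch` are hypothetical names, the prices are made-up placeholders, and a DataFrame with (symbol, field) columns stands in for `Panel`, which later pandas releases removed.

```python
import numpy as np
import pandas as pd


def combine_with_nan(symbols, fetch_one):
    """Return one DataFrame with (symbol, field) columns.

    Symbols whose download raises IOError are kept as all-NaN blocks
    instead of being dropped.
    """
    stocks, failed = {}, []
    for sym in symbols:
        try:
            stocks[sym] = fetch_one(sym)
        except IOError:
            failed.append(sym)

    if not stocks:
        # Nothing succeeded, so there is no shape to copy for the NaN blocks.
        raise ValueError("No data fetched for any symbol")

    # Use any successful frame as the template for index and columns.
    template = next(iter(stocks.values()))
    for sym in failed:
        stocks[sym] = pd.DataFrame(
            np.nan, index=template.index, columns=template.columns
        )

    symbols = list(symbols)
    return pd.concat([stocks[s] for s in symbols], axis=1, keys=symbols)


def fake_fetch(sym):
    """Hypothetical downloader: only 'IBM' succeeds."""
    if sym != "IBM":
        raise IOError("no data for {0}".format(sym))
    idx = pd.to_datetime(["2014-09-29", "2014-09-30"])
    return pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]}, index=idx)


result = combine_with_nan(["IBM", "EQQQ.F"], fake_fetch)
print(result)
```

The caller then sees every requested symbol in the result and can find the failed ones with the usual missing-data tools such as `isnull` or `dropna`.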

jreback reopened this Oct 2, 2014
jreback added this to the 0.16 milestone Oct 2, 2014
jreback added the Bug and Missing-data labels Oct 2, 2014
jreback changed the title from "pandas.io.data._dl_mult_symbols doesn't handle its own errors well" to "BUG/API: data readers should return missing data as NaN rather than warn" Oct 2, 2014
jreback modified the milestones: 0.16, 0.15.1 Oct 7, 2014
davidastephens added a commit to davidastephens/pandas that referenced this issue Nov 6, 2014
jreback modified the milestones: 0.15.1, 0.16.0 Nov 6, 2014