more yahoo test errors #4182

Closed
cpcloud opened this issue Jul 9, 2013 · 12 comments · Fixed by #4184
Labels: IO Data (IO issues that don't fit into a more specific label)

@cpcloud
Member

cpcloud commented Jul 9, 2013

i love a good game of whack-a-mole.

https://travis-ci.org/pydata/pandas/jobs/8900734

@ghost ghost assigned cpcloud Jul 9, 2013
@jtratner
Contributor

jtratner commented Jul 9, 2013

Is there a reason to actually download the data each time? Would it make
sense to store the data and then check that it generates:

  1. The appropriate url for the request.
  2. The appropriate output given various data (output from wrong symbol,
    output with correct symbol, etc)

Might require some refactoring of io/DataReader to use classes conforming
to a consistent interface but could be worth it both to guide new
contributors and to make it easier to test.

Could probably shrink it to two required functions, generate request and
process response, and then abstract the actual request handling into a
base class.
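
A minimal sketch of that split, under stated assumptions: `BaseReader`, `CsvQuoteReader`, the `fetch` argument, and the `example.invalid` URL are all hypothetical names for illustration, not anything in io/data. The base class owns the request, each reader only builds a URL and parses a response, so both can be checked against stored data without touching the network.

```python
from urllib.parse import urlencode


class BaseReader:
    """Hypothetical base class: owns the request, subclasses own the rest."""

    def __init__(self, fetch):
        # `fetch` is any callable taking a URL and returning response text.
        # Tests pass a stub; real code would pass something that downloads.
        self._fetch = fetch

    def generate_request(self, symbol):
        raise NotImplementedError

    def process_response(self, text):
        raise NotImplementedError

    def read(self, symbol):
        return self.process_response(self._fetch(self.generate_request(symbol)))


class CsvQuoteReader(BaseReader):
    """Hypothetical reader for a made-up CSV endpoint (not the real Yahoo URL)."""

    base_url = "http://example.invalid/quotes.csv"

    def generate_request(self, symbol):
        return self.base_url + "?" + urlencode({"s": symbol})

    def process_response(self, text):
        lines = [ln for ln in text.strip().splitlines() if ln]
        if len(lines) < 2:
            # Header only (or nothing at all): treat as "no data found".
            raise IOError("no data rows in response")
        header = lines[0].split(",")
        return [dict(zip(header, row.split(","))) for row in lines[1:]]


# Stored response standing in for a download.
CANNED = "Date,Close\n2013-07-08,10.5\n2013-07-09,10.7\n"
reader = CsvQuoteReader(fetch=lambda url: CANNED)
rows = reader.read("AAPL")
```

With this shape the URL check and the parsing check are two separate, offline tests, and a wrong-symbol case is just another stored response.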

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

@jtratner

i hadn't seen u around these parts in a while; was wondering where ya went :)

you're right on all points, want to put up a PR? i would be eternally grateful. i'm so tired of dealing with these errors and i never use this functionality.

but...what you're describing is a major rewrite of data.py. in the long term that's great

these fails are actually "bugs" (sort of)...if all the values are nan, Panel cannot convert them, e.g., Panel({'a': np.nan}) will generate these errors. nans only propagate if there's at least a single 2D non-nan value in the dict.

@jreback do you have any suggestions here? this issue will show up again since when no data are found there's a Panel from a "1D" dict which rightly fails.

i'm torn between wanting to get the release out and fixing this for real™. the for-real fix is going to be something similar to @jtratner's ideas but will probably take a few iterations to get right...

@jreback
Contributor

jreback commented Jul 9, 2013

what is the Panel that fails? can you show an example?

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

    import numpy as np
    from pandas import Panel

    d = {}
    keys = 'a', 'b', 'c'  # pretend these are stocks
    for k in keys:
        d[k] = np.nan
    p = Panel(d)  # fails because there are no 2D items in the dict

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

error message is a bit cryptic, but i'm ignoring that issue for now

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

code that fails is not that different from my example

    stocks = {}
    for sym_group in _in_chunks(symbols, chunksize):
        for sym in sym_group:
            try:
                stocks[sym] = method(sym, start, end, retry_count, pause)
            except IOError:
                warnings.warn('Failed to read symbol: {0!r}, replacing with '
                              'NaN.'.format(sym), SymbolWarning)
                stocks[sym] = np.nan

    return Panel(stocks).swapaxes('items', 'minor')

@jreback
Contributor

jreback commented Jul 9, 2013

yep...you could do it like this

Panel(stocks, major_axis=major_axis, minor_axis=minor_axis)

the items are the keys in the dict (the syms)

but a priori you prob don't know the major_axis (dates) nor minor_axis (fields)

so maybe

any(not np.isnan(v) for k,v in stocks.items())

or just keep a flag if there are any valid stocks

or catch the construction error on the Panel
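
A rough sketch of the "are there any valid stocks" check, assuming the dict holds either the scalar NaN placeholder or some 2D object per symbol. The helper names `is_placeholder` and `any_valid` are made up for illustration. It only treats a scalar float NaN as "failed", so it never has to run an element-wise NaN test on the 2D values.

```python
import math


def is_placeholder(value):
    """True only for the scalar float NaN used to mark a failed symbol."""
    return isinstance(value, float) and math.isnan(value)


def any_valid(stocks):
    """True if at least one symbol maps to something other than the placeholder."""
    return any(not is_placeholder(v) for v in stocks.values())


all_failed = {"a": float("nan"), "b": float("nan")}
one_ok = {"a": float("nan"), "b": [[1.0, 2.0]]}  # list stands in for a 2D frame
```

Here `any_valid(all_failed)` is False and `any_valid(one_ok)` is True, so the caller can decide whether building the Panel is worth attempting.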

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

i'll throw a RemoteDataError, joy
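
One way that could look, as a sketch only: `RemoteDataError` is defined locally here as a stand-in, `fetch_all` is a hypothetical helper, and `method` is simplified to take just the symbol (the code quoted earlier in the thread passes start, end, retry_count and pause as well). Failed symbols are recorded instead of being replaced with NaN, and the error is raised when nothing at all came back.

```python
import warnings


class RemoteDataError(IOError):
    """Local stand-in for the error named in the thread."""


def fetch_all(symbols, method):
    """Call `method` per symbol; raise if no symbol returned any data."""
    stocks, failed = {}, []
    for sym in symbols:
        try:
            stocks[sym] = method(sym)
        except IOError:
            warnings.warn("Failed to read symbol: {0!r}".format(sym))
            failed.append(sym)
    if not stocks:
        # Nothing usable: fail here with a clear message rather than
        # letting a later container construction fail cryptically.
        raise RemoteDataError(
            "No data fetched for symbols: {0!r}".format(failed))
    return stocks, failed
```

Returning the failed list alongside the data leaves it to the caller whether to fill those symbols with NaN afterwards.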

@jtratner
Contributor

jtratner commented Jul 9, 2013

@cpcloud haha, yeah...definitely been on the lighter side recently. I have to work and am taking CS classes on the side, so I have a busy schedule (+ I'm waiting for 0.12 to come out to push some of those PRs 😄).

I'd be happy to work something up in the long-term, I just don't have time this week (or next week).

Personally, I think you should monkey-patch this to work for now (e.g., by having it catch the failure with Panel construction and throw a graceful error) and get the release out. My guess is that io/data is going to be a constantly moving target anyways until we piece apart the bigger functions and force 'em to do smaller tasks.

@jtratner
Contributor

jtratner commented Jul 9, 2013

On a similar note, I'm always struck by how much time you and @jreback put into pandas. pretty neat.

@jreback
Contributor

jreback commented Jul 9, 2013

@jtratner it's magic! :)

@cpcloud
Member Author

cpcloud commented Jul 9, 2013

@jtratner @jreback indeed! pandas FTW!!
