One file per datareader #58


Closed
femtotrader opened this issue Aug 21, 2015 · 7 comments

@femtotrader (Contributor)

Hello,

data.py is a very large file; too large, in my opinion.

I think we should have one file per datareader.

Moreover, this will help with #48.

On my fork I have created a branch where each datareader lives in its own file.

```
$ nosetests -s -v
```

shows

```
Ran 58 tests in 83.806s
FAILED (SKIP=7, errors=10, failures=3)
```

which is exactly the same result as on the current master branch.

I will send a PR.

Kind regards

@bashtage (Contributor)

That splitting seems too fine-grained, IMO. Why not split by data vendor only?

@bashtage (Contributor)

I think the utility functions, which means anything that isn't vendor-specific, should also be collected in a shared module, rather than having one file per function.

@femtotrader (Contributor, Author)

What utility functions would you put in "shared"?
Personally, I would move a function to "shared" only if it is actually shared.

I used this fine-grained split because it will make it easier for me to add caching (see #48), the way I did in https://github.com/femtotrader/pandas_datareaders_unofficial .

Moreover, it will be easier to see what is done and what still needs to be done.

@femtotrader (Contributor, Author)

A vendor can decide to deprecate one API but not the others.
Is Yahoo Finance Components deprecated?
With one file per reader it will, IMHO, be easier to tell, and it could also help avoid spaghetti code.

@bashtage (Contributor)

I've never come across a package that has one function per file (or anything close to it). With modern editors it doesn't matter how large a file is, since you can always use code folding to hide anything you don't want to see.

In terms of caching, is there some reason not to use a django-style @cache decorator that assumes the exact same set of inputs produces the same output? That way only the final result is cached, and the reader itself doesn't need to be modified.
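A minimal sketch of the decorator idea described above, assuming simple in-memory memoization keyed on the call arguments. The names `cache_reader` and `get_data` are purely illustrative and are not part of pandas-datareader or django:

```python
import functools

def cache_reader(func):
    """Memoize a reader: identical arguments return the stored result,
    so the wrapped reader function needs no modification."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in store:
            store[key] = func(*args, **kwargs)
        return store[key]
    return wrapper

@cache_reader
def get_data(symbol, start, end):
    # Stand-in for a real (slow, network-bound) datareader call.
    return {"symbol": symbol, "start": start, "end": end}

first = get_data("AAPL", "2015-01-01", "2015-08-01")
second = get_data("AAPL", "2015-01-01", "2015-08-01")
assert first is second  # second call is served from the cache
```

In real code, `functools.lru_cache` from the standard library provides the same behavior without a hand-rolled decorator, though a custom decorator leaves room for on-disk caching as in #48.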

@davidastephens (Member)

I'm fine with splitting by API. I agree with @bashtage, though: I don't think we need one file per shared function; just one shared file works.

Is there any benefit to using subpackages by vendor?
E.g. pandas_datareader/yahoo/options.py
E.g. pandas_datareader/yahoo/actions.py

@bashtage (Contributor)

The subpackage suggestion from @davidastephens looks reasonable and seems closer to canonical Python packaging.

So something like pdr/vendor/datatype, plus pdr/common.py, where common holds anything that isn't vendor-specific.
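Under that proposal, the package tree might look something like this (the vendor and module names are purely illustrative, not the final layout):

```
pandas_datareader/
    __init__.py
    common.py          # shared, non-vendor-specific helpers
    yahoo/
        __init__.py
        options.py
        actions.py
    google/
        __init__.py
        daily.py
```

Each vendor subpackage can then be deprecated or extended independently, which also addresses the per-API deprecation concern raised above.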
