read_html and read_json should provide a cache #6456
Comments
This is a nice feature, but I don't think it should be part of core pandas. A cookbook recipe or a small section in the io.rst docs would be acceptable.
Lots of people use pandas inside the IPython notebook (which is a great tool), so they don't have to worry about an API's request limits (the number of requests you may send to a server). But when you are repeatedly modifying and rerunning a script, things are quite different, and in that case a cache could be very useful. A cache can sometimes be dangerous because you may get obsolete data, but that's the risk. As for requests, it's a great library and I like how it simplifies things, but I also understand that adding a dependency must be a deliberate decision. I also wonder whether you know of other pandas functions that could benefit from such a feature.
Instead of executing the script repeatedly in a new kernel, I would suggest using
runs the file in a
Thanks for these tips, but you should admit that putting only
@c0indev3l I think your solution above is great.
Closing as not in pandas' purview, though @c0indev3l, if you'd like to document your solution, we would be more than happy to put it in the cookbook.
A fairly clean solution to this could be to provide another parameter (a so we could have
instead of
so if no session is passed classic
That's just my own experiment with https://github.com/femtotrader/pandas_datareaders, where I added a cache mechanism to nearly every pandas DataReader. I think this issue should be linked to #8961.
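The session idea above could look roughly like the following. This is only a sketch: `fetch_text` and its `session` parameter are hypothetical names, not part of pandas, and the only assumption about the session is that it has a requests-style `get(url)` method (as `requests.Session` and `requests_cache.CachedSession` do).

```python
from typing import Any, Optional


def fetch_text(url: str, session: Optional[Any] = None) -> str:
    """Return the body of `url`, using `session` if one is supplied.

    `session` is any object with a requests-style `get(url)` method,
    e.g. `requests.Session()` or `requests_cache.CachedSession()`.
    """
    if session is None:
        # Imported lazily so the function still works with a custom
        # session when `requests` itself is not installed.
        import requests

        session = requests.Session()
    response = session.get(url)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.text
```

With this in place, a cached read might be written as `pd.read_html(io.StringIO(fetch_text(url, session=requests_cache.CachedSession("my_cache"))))`, while omitting `session` falls back to a plain, uncached `requests.Session`.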
I see this is closed, but for someone else seeking the solution: better to separate concerns.

    import io

    import pandas as pd
    import requests

    try:
        import requests_cache
        # Cache all requests for 1 week.
        requests_cache.install_cache("my_cache", expire_after=7 * 86400)
    except ImportError:
        print("Warning: requests-cache not installed, NOT using cache")


    def get_text(url: str) -> str:
        # Use the requests library explicitly so that it automatically uses `requests_cache` if set up.
        response = requests.get(url)
        response.raise_for_status()
        return response.text


    # `my_url` is the address of the page to parse. read_html returns a list of DataFrames;
    # wrapping the text in StringIO avoids the literal-string deprecation in recent pandas.
    dfs = pd.read_html(io.StringIO(get_text(my_url)))
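If installing requests-cache is not an option, the `expire_after` idea can be approximated with the standard library alone. The following is a minimal sketch, not how requests-cache works internally: `cached_text`, the `.url_cache` directory, and the `fetch` callable are all hypothetical names chosen for illustration, and the cache is one plain file per URL with the file's modification time used as its age.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def cached_text(
    url: str,
    fetch: Callable[[str], str],
    cache_dir: str = ".url_cache",
    max_age_seconds: float = 7 * 86400,
) -> str:
    """Return fetch(url), reusing a copy on disk younger than max_age_seconds."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
    path = directory / name
    # Serve the cached copy only while it is still fresh.
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return path.read_text(encoding="utf-8")
    # Otherwise download again and overwrite the stale entry.
    text = fetch(url)
    path.write_text(text, encoding="utf-8")
    return text
```

Any function that takes a URL and returns text can be passed as `fetch`, so the result can be wrapped in `io.StringIO` and handed to `pd.read_html` as before. A shorter `max_age_seconds` reduces the risk of obsolete data at the cost of more requests.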
Hello,
pandas.io.html.read_html should provide a cache such as what requests_cache provides: https://requests-cache.readthedocs.org/en/latest/
Persistence into sqlite, mongodb, redis... is very convenient.
Maybe some other pandas functions could also use this cache mechanism, pandas.io.json.read_json for example.
This is what I do
Kind regards