Skip to content

use requests when it is installed #28874

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

ocefpaf
Copy link

@ocefpaf ocefpaf commented Oct 9, 2019

closes #16716
closes #28825
closes #28826
solves #16910

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Continuation of #21504 and #17087

This PR is not ready but I'd appreciate any early comment/feedback.

@@ -336,10 +336,10 @@ def read_excel(


class _BaseExcelReader(metaclass=abc.ABCMeta):
def __init__(self, filepath_or_buffer):
def __init__(self, filepath_or_buffer, session=None):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost on how to pass the session object to this class, right now it defaults to None b/c it does not get the value from read_excel.

@@ -20,7 +20,7 @@
_stringify_path,
_validate_header_arg,
get_filepath_or_buffer,
urlopen,
_urlopen,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the non-private version

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

r.raise_for_status()
content = r.content
r.close()
except ImportError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use pandas.compat._optional.import_optional_dependency here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I got it right.

@simonjayhawkins simonjayhawkins added API Design Enhancement IO CSV read_csv, to_csv IO Excel read_excel, to_excel IO HTML read_html, to_html, Styler.apply, Styler.applymap IO JSON read_json, to_json, json_normalize labels Oct 9, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 10, 2019

Haven't read through the attached issue but this is a little more complicated than I would have hoped for. Is there a reason why you think this is in scope of pandas? Pandas can operate for the most part on items that adhered to the IOBase definition.

I don't think dealing with the requests.models.Response object is within scope

@@ -342,6 +342,9 @@
values. The options are `None` for the ordinary converter,
`high` for the high-precision converter, and `round_trip` for the
round-trip converter.
session : requests.Session
object with the a requests session configuration for remote file.
(requires the requests library)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a version added tag

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what that means.

compression = None
content_encoding = None
try:
import requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to use the pandas.compat._optional.import_optional_dependency

@ocefpaf
Copy link
Author

ocefpaf commented Oct 10, 2019

Haven't read through the attached issue but this is a little more complicated than I would have hoped for. Is there a reason why you think this is in scope of pandas? Pandas can operate for the most part on items that adhered to the IOBase definition.

I don't think dealing with the requests.models.Response object is within scope

I'm assuming you are taking about the session object, right? If so what did you hope for and what do you recommend me to modify to improve this?

Some background. At NOAA we need the session object in order to read URLs that are under password protected servers. I guess we can try the .netrc approach but it would be nicer to do this without relying on local files.

@jbrockmendel
Copy link
Member

IIRC pandas_datareader supports requests Sessions

@ocefpaf
Copy link
Author

ocefpaf commented Oct 10, 2019

IIRC pandas_datareader supports requests Sessions

I know. That is what inspired the original poster of this PR.

I'm traveling back home today and I'll address the review comments during the weekend. Thanks!

@WillAyd
Copy link
Member

WillAyd commented Oct 10, 2019

I guess I'm not clear on what Sessions are doing here. Neither of the issues that this closes reference Session objects anywhere - are you sure this closes them?

Can you add test cases?

Copy link
Author

@ocefpaf ocefpaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm not clear on what Sessions are doing here. Neither of the issues that this closes reference Session objects anywhere - are you sure this closes them?

The original issue was #16716 by @skynss and the original PR used session. See #17087

Allow us to pass user/password, for example, for password protected servers.
One can also use a local .netrc file. Let's remove the session for now to see if we can make progress.

Can you add test cases?

Sure. Writing those, just wanted to be sure this will get a green signal before investing more time on it.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we've been having multiple conversations about how users pass through kwargs from a read like pd.read_csv(...) to the "file open" call. Sometimes it's just the builtin open(path), but in this case it might be requests.get(...) At some point, I'd like to unify these into a backend_kwargs or open_kwargs dict that gets passed through (but that's for a separate issue. I'll try to open one).

r = urllib.request.urlopen(*args, **kwargs)
content = r.read()
content_encoding = r.headers.get("Content-Encoding", None)
if content_encoding == "gzip":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs to be under the except ImportError? From what I can tell, requests .content will automatically decode gzip: https://requests.readthedocs.io/en/master/user/quickstart/#binary-response-content

@WillAyd
Copy link
Member

WillAyd commented Dec 17, 2019

@ocefpaf still active? Can you address comments and check CI failures?

@ocefpaf
Copy link
Author

ocefpaf commented Dec 17, 2019

I'll probably have some time for this during the end of the year break. If someone wants to pick it up from here before that please feel free to do so.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

@ocefpaf if you can merge master and see if you can get passing would be great.

@MarcoGorelli
Copy link
Member

Hi @ocefpaf - sorry to chase you up, just wanted to ask whether you're still working on this :)
Seems that there's some test failures, but I can't check them as it says "build not found"

@WillAyd
Copy link
Member

WillAyd commented Feb 6, 2020

Closing as I think stale, but certainly ping if you'd like to continue

@WillAyd WillAyd closed this Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Enhancement IO CSV read_csv, to_csv IO Excel read_excel, to_excel IO HTML read_html, to_html, Styler.apply, Styler.applymap IO JSON read_json, to_json, json_normalize
Projects
None yet
7 participants