Use python-requests library, if installed, to support broad http(s) scenarios including Basic Authentication #17087
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,6 +63,56 @@ levels <merging.merge_on_columns_and_levels>` documentation section. | |
left.merge(right, on=['key1', 'key2']) | ||
|
||
|
||
.. _whatsnew_0220.enhancements.read_csv: | ||
|
||
``read_csv`` uses `python-requests` (if installed) to support basic auth and much more | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
If the `python-requests` library is installed and ``http_params`` is given, pandas uses it for http(s) URLs; otherwise it continues to use urllib. | ||
:func:`read_csv`, :func:`read_html`, :func:`read_json`, and | ||
:func:`read_excel` now accept an optional ``http_params`` parameter for passing | ||
basic auth credentials, disabling strict SSL checking, or supplying a ``requests.Session()`` object. | ||
|
||
|
||
.. ipython:: python | ||
import pandas as pd | ||
|
||
# http_params is an optional parameter. If it is non-empty, pandas attempts to use the python-requests library | ||
df = pd.read_csv('https://uname:pwd@aa.com/bb.csv', http_params={'auth': None})  # now the url can contain username and pwd | ||
# Note - all basic auth scenarios require the python-requests library | ||
|
||
# Basic Auth | ||
df = pd.read_csv('https://aa.com/bb.csv', http_params={'auth': ('john', 'pwd')})  # credentials are passed via http_params | ||
|
||
# Basic Auth with verification of the SSL certificate disabled, e.g. for testing | ||
up = {'auth': ('john', 'pwd'), 'verify': False} | ||
df = pd.read_csv('https://aa.com/bb.csv', http_params=up)  # credentials and the verify flag come from http_params | ||
|
||
# Optionally, a requests.Session() can also be passed into http_params | ||
import requests | ||
s = requests.Session() | ||
s.auth = MyAuthProvider('secret-key') # custom auth provider supported by requests | ||
df = pd.read_csv(url, http_params=s) | ||
|
||
# For advanced users, this may provide extensibility. However, testing on the pandas side is limited to basic scenarios | ||
# Here is an example of an advanced scenario | ||
s = requests.Session() | ||
s.auth = ('darth', 'l0rd')  # if the user wants to perform basic auth; skip if the url itself contains username and pwd | ||
s.timeout = (3.05, 27) # if user wants to modify timeout | ||
s.verify = False # if user wants to disable ssl cert verification | ||
s.headers.update( {'User-Agent': 'Custom user agent'} ) # extensible to set any custom header needed | ||
s.proxies = { 'http': 'http://a.com:100'} # if user has proxies | ||
s.cert = '/path/client.cert' # if custom cert is needed | ||
df = pd.read_csv( 'https://aa.com/bbb.csv', http_params=s) | ||
|
||
def print_http_status(r, *args, **kwargs): | ||
    print(r.status_code) | ||
    print(r.headers['Content-Length']) | ||
s = requests.Session() | ||
s.hooks = dict(response=print_http_status) | ||
df = pd.read_csv('https://aa.com/bbb.csv', http_params=s) | ||
|
||
|
||
.. _whatsnew_0220.enhancements.other: | ||
|
||
Other Enhancements | ||
|
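The whatsnew examples above rely on HTTP Basic authentication, either through credentials embedded in the URL or through an `auth` tuple. As a rough standard-library illustration of what that involves (this is not code from the PR; `split_credentials` and `basic_auth_header` are hypothetical helper names, and the URL in the usage note is the placeholder from the examples), a sketch might look like this:

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, (username, password) or None).

    Hypothetical helper for illustration; IPv6 host literals are not handled.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Rebuild the network location without the "user:password@" prefix.
    host = parts.hostname or ""
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    clean = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, (parts.username, parts.password or "")


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic ``Authorization`` header."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Calling `split_credentials('https://uname:pwd@aa.com/bb.csv')` separates the credentials from the address, and the resulting pair can be fed to `basic_auth_header` or passed as an `auth` tuple.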
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,13 @@ | |
]) | ||
|
||
|
||
try: | ||
    import requests | ||
    _REQUESTS_INSTALLED = True | ||
except ImportError: | ||
    _REQUESTS_INSTALLED = False | ||
|
||
|
||
if compat.PY3: | ||
    from urllib.request import urlopen, pathname2url | ||
    _urlopen = urlopen | ||
|
@@ -168,8 +175,87 @@ def _stringify_path(filepath_or_buffer): | |
return filepath_or_buffer | ||
|
||
|
||
def _is_handled_by_requests(o): | ||
    return _is_url(o) and parse_url(o).scheme in ['http', 'https'] | ||
|
||
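For readers without the pandas internals at hand, the scheme check in `_is_handled_by_requests` can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in: `is_http_url` is a hypothetical name, and it uses `urllib.parse.urlparse` in place of the `_is_url` and `parse_url` helpers referenced in the diff.

```python
from urllib.parse import urlparse


def is_http_url(target):
    """Return True only for string URLs whose scheme is http or https."""
    if not isinstance(target, str):
        # Buffers, path objects, None, etc. are never routed to requests.
        return False
    return urlparse(target).scheme in ("http", "https")
```

Under this check, `file://` and `s3://` locations fall through to the existing code paths.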
|
||
def gen_session(http_params): | ||
""" | ||
Generate python-requests session from http_params dict | ||
""" | ||
s = None | ||
if http_params and type(http_params) is requests.sessions.Session: | ||
s = http_params | ||
else: | ||
s = requests.Session() | ||
s.stream = True | ||
# Setting Accept-Encoding to None for backwards compatibility with | ||
# urlopen. Ideally we want to allow gzip download, but | ||
# urlopen doesn't decompress automatically, whereas requests does. | ||
s.headers.update({'Accept-Encoding': None}) | ||
if http_params and type(http_params) is dict: | ||
if http_params.get('auth', None) and not s.auth: | ||
s.auth = http_params.get('auth') | ||
if http_params.get('verify', True) is False and s.verify is not False: | ||
s.verify = False | ||
return s | ||
|
||
|
||
def fetch_url(url, http_params=None, skip_requests=False): | ||
""" | ||
If url is an http(s) url and http_params is given, fetch it with python-requests; otherwise use urllib. | ||
Review comment: just call this requests |
||
Note if requests library is used, auto gunzip is | ||
disabled for backwards compatibility of code with urlopen | ||
|
||
Parameters | ||
---------- | ||
url : str | ||
Could be: | ||
Review comment: format according to numpydoc |
||
'http://cnn.com' | ||
'file:///home/sky/aaa.csv' | ||
|
||
http_params : dict or requests.Session(), default None | ||
A python dict containing: | ||
'auth': tuple (str, str) eg (username, password) | ||
'auth': Any other auth object accepted by requests | ||
'verify': boolean, default True | ||
If False, allow self signed and invalid SSL cert for https | ||
or | ||
A python requests.Session object if http(s) path to enable basic auth | ||
and many other scenarios that requests allows | ||
|
||
.. versionadded:: 0.22.0 | ||
|
||
skip_requests : boolean, default False | ||
For testing only: disables the `requests` library. Internal use only. | ||
|
||
.. versionadded:: 0.22.0 | ||
Raises | ||
------ | ||
ValueError if http_params is specified but the python-requests package is not installed | ||
""" | ||
    if not http_params: | ||
        skip_requests = True | ||
    if (not skip_requests) and \ | ||
            _REQUESTS_INSTALLED and \ | ||
            _is_handled_by_requests(url): | ||
        s = gen_session(http_params) | ||
        resp = s.get(url) | ||
        resp.raise_for_status() | ||
        content_bytes = resp.content | ||
    else: | ||
        if http_params and (skip_requests or not _REQUESTS_INSTALLED): | ||
            msg = 'To utilize http_params, python-requests library is ' + \ | ||
                  'required but not detected' | ||
            raise ValueError(msg) | ||
        resp = _urlopen(url) | ||
        content_bytes = resp.read() | ||
    return resp, content_bytes | ||
|
||
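The branching in `fetch_url` is easier to follow when separated from the network calls. The sketch below restates only the decision logic as a pure function; `choose_backend` and its `requests_installed` argument are hypothetical names introduced for illustration (the diff uses the module-level `_REQUESTS_INSTALLED` flag instead), and the http(s) check is approximated with `urlparse`.

```python
from urllib.parse import urlparse


def choose_backend(url, http_params=None, requests_installed=True,
                   skip_requests=False):
    """Return "requests" or "urllib", mirroring the branches of fetch_url."""
    # Without http_params the requests path is never taken.
    if not http_params:
        skip_requests = True
    is_http = urlparse(url).scheme in ("http", "https")
    if not skip_requests and requests_installed and is_http:
        return "requests"
    # http_params was supplied but requests cannot be used.
    if http_params and (skip_requests or not requests_installed):
        raise ValueError("To utilize http_params, python-requests library "
                         "is required but not detected")
    return "urllib"
```

One consequence visible in this form: when `http_params` is supplied for a non-http(s) URL such as `file:///home/sky/aaa.csv` and requests is installed, no error is raised and the parameters are simply not used.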
|
||
def get_filepath_or_buffer(filepath_or_buffer, encoding=None, | ||
compression=None): | ||
compression=None, http_params=None, | ||
skip_requests=False): | ||
""" | ||
If the filepath_or_buffer is a url, translate and return the buffer. | ||
Otherwise passthrough. | ||
|
@@ -180,19 +266,45 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None, | |
or buffer | ||
encoding : the encoding to use to decode py3 bytes, default is 'utf-8' | ||
|
||
compression : str, default None | ||
indicate the compression such as 'gzip'. | ||
|
||
http_params : dict or requests.Session(), default None | ||
Review comment: instead of repeating these doc-strings, use a template and Appender |
||
A python dict containing: | ||
'auth': tuple (str, str) eg (username, password) | ||
'auth': Any other auth object accepted by requests | ||
'verify': boolean, default True | ||
If False, allow self signed and invalid SSL cert for https | ||
or | ||
A python requests.Session object if http(s) path to enable basic auth | ||
and many other scenarios that requests allows | ||
|
||
.. versionadded:: 0.22.0 | ||
|
||
skip_requests : boolean, default False | ||
For testing only: disables the `requests` library. Internal use only. | ||
|
||
.. versionadded:: 0.22.0 | ||
|
||
Returns | ||
------- | ||
a filepath_or_buffer, the encoding, the compression | ||
|
||
Raises | ||
------ | ||
ValueError if http_params is specified but the python-requests package is not installed | ||
""" | ||
filepath_or_buffer = _stringify_path(filepath_or_buffer) | ||
|
||
if _is_url(filepath_or_buffer): | ||
req = _urlopen(filepath_or_buffer) | ||
req, content_bytes = fetch_url(filepath_or_buffer, | ||
http_params, | ||
skip_requests) | ||
reader = BytesIO(content_bytes) | ||
content_encoding = req.headers.get('Content-Encoding', None) | ||
if content_encoding == 'gzip': | ||
# Override compression based on Content-Encoding header | ||
compression = 'gzip' | ||
reader = BytesIO(req.read()) | ||
return reader, encoding, compression | ||
|
||
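The URL branch above overrides `compression` when the response carries a `Content-Encoding: gzip` header, which is why the session disables automatic decompression. A minimal standard-library sketch of that handling follows; `resolve_compression` and `open_payload` are hypothetical names, and `headers` is assumed to be any mapping with a `get` method.

```python
import gzip
from io import BytesIO


def resolve_compression(headers, compression=None):
    """Prefer the Content-Encoding header over the caller's setting."""
    if headers.get("Content-Encoding") == "gzip":
        return "gzip"
    return compression


def open_payload(content_bytes, compression):
    """Wrap raw bytes in a reader, decompressing gzip payloads lazily."""
    reader = BytesIO(content_bytes)
    if compression == "gzip":
        return gzip.GzipFile(fileobj=reader)
    return reader
```

If the HTTP client had already decompressed the body while the header still said `gzip`, the second decompression would fail, which appears to be the situation the `Accept-Encoding` setting in `gen_session` is meant to avoid.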
if _is_s3_url(filepath_or_buffer): | ||
|
Review comment: you need to assert that _REQUESTS_INSTALLED here as it's a private function and the reader doesn't know this.
pls expand the doc-string a bit.
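One possible way to act on the review comment above is to make the optional-dependency check explicit at the point of use. The sketch below is only an illustration, not the change the PR eventually made: `require_module` is a hypothetical helper, it raises `ValueError` (matching the error type used in `fetch_url`) rather than using a bare `assert`, and it relies on `importlib.util.find_spec` with a top-level module name.

```python
import importlib.util


def require_module(name, feature):
    """Raise ValueError if the optional top-level module ``name`` is missing.

    ``feature`` names the capability that needs the module, for the message.
    """
    if importlib.util.find_spec(name) is None:
        raise ValueError("To utilize {}, the {} library is required "
                         "but not detected".format(feature, name))
```

A private function such as `gen_session` could call `require_module("requests", "http_params")` on entry, so a reader does not have to know that the caller already checked `_REQUESTS_INSTALLED`.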