
wrap urlopen with requests #21504


Closed · wants to merge 6 commits

Conversation

@ocefpaf commented Jun 15, 2018

Alternative to #17087

I'll write the whatsnew entry, tests, and the session option for html, json, and excel if this path is OK with the devs.

Ping @skynss, who is the original author of #17087, and @gfyoung, who reviewed it.

Note that the main difference in this approach is that I made the option slightly simpler by allowing only the session instead of http_params.

You can see an example of this in action in:

http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/ocefpaf/d6e9ab2c3569ff8fa181fc7885b6524d/raw/5d868fd4cbe2b81f84ccab7760b80251ef6e4651/pandas_test.ipynb
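For readers who don't want to open the notebook, a minimal sketch of what the proposed session= option enables (the read_csv call below is the API proposed in this PR, not yet in pandas, and the URL is hypothetical): any requests.Session configuration, such as auth, custom headers, or retry behavior, is applied when pandas fetches the remote file.

import pandas as pd
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.headers.update({'User-Agent': 'pandas-reader'})  # custom header
session.mount('https://', HTTPAdapter(max_retries=3))    # retry flaky hosts

# proposed API: pandas uses the session to fetch the remote file
df = pd.read_csv('https://example.com/data.csv', session=session)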

 @contextmanager
 def urlopen(*args, **kwargs):
-    with closing(_urlopen(*args, **kwargs)) as f:
+    with closing(urlopen2(*args, **kwargs)) as f:
Author:

I'm not sure this workaround is used anywhere.
Maybe we can remove this?

Member:

I would not remove this for now. Investigate in another PR.

Author:

NP. I'll leave it there. Just renamed the urlopen variants to try to make it a little clearer what is going on.
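For context, a sketch of why the wrapper exists (based on the snippet above, not the exact PR code): on py2 the object returned by urlopen is not a context manager, so contextlib.closing() supplies the __enter__/__exit__ that the with statement needs.

from contextlib import closing, contextmanager
from urllib.request import urlopen as _urlopen  # py2 imported this from urllib2

@contextmanager
def urlopen(*args, **kwargs):
    # closing() guarantees f.close() runs even on py2, where the response
    # object lacks __enter__/__exit__
    with closing(_urlopen(*args, **kwargs)) as f:
        yield f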

compression = None
content_encoding = None
try:
    import requests
Author (@ocefpaf, Jun 15, 2018):

There is no need to check whether requests is installed when using a session object, because requests is needed to create that object in the first place. This is slightly simpler than the approach in #17087 and introduces less code to maintain.
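A minimal sketch of the logic these snippets add up to (names are illustrative, not the exact PR code): a user-supplied session is used when given, requests is tried otherwise, and plain urllib is the fallback when requests is not installed.

from urllib.request import urlopen

def _fetch(url, session=None):
    if session is not None:
        # a Session instance can only exist if requests is installed,
        # so no availability check is needed on this path
        return session.get(url)
    try:
        import requests  # optional dependency, imported lazily
    except ImportError:
        return urlopen(url)  # stdlib fallback
    return requests.get(url)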

-    compression = 'gzip'
-    reader = BytesIO(req.read())
-    req.close()
+    reader, compression = _urlopen(filepath_or_buffer, session=session)
Author:

compression is not mentioned in the docs, and I wonder if session should be.
If so, we can use the template/Appender approach; I'm just not sure how to do that in a clean way.

Member:

  1. compression should be mentioned in the docs. Would be great to add that.
  2. Given my first point, session should be as well.

And yes, the template/Appender approach sounds like a good idea!
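For reference, a minimal sketch of the Appender pattern being suggested (pandas.util._decorators.Appender appends a shared block of text to a function's __doc__; the parameter text below is illustrative):

from pandas.util._decorators import Appender

_session_param = """\
session : requests.Session
    A configured requests session for remote files (requires the
    requests library).
"""

@Appender(_session_param)
def get_filepath_or_buffer(filepath_or_buffer, session=None):
    """Return a file-like object from a path, URL, or buffer.

    Parameters
    ----------
    """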

Author (@ocefpaf, Jun 18, 2018):

I need some guidance on how to do this across modules: common.py for get_filepath_or_buffer, and parsers.py for all the parsers defined in _parser_params.

The current situation is:

option              get_filepath_or_buffer      _parser_params
filepath_or_buffer  present                     NA
mode                present                     NA
encoding            present                     present
compression         missing (added in this PR)  present
session             missing                     present

It is unclear to me how to fix this. Some questions I have:

  • Should the compression and encoding docstrings be standardized?
  • If yes, what is the best strategy for this? Should encoding, compression, and session be defined in common.py and imported in parsers.py for appending to _parser_params? Or the other way around?

My guess is that I should get the common options from parsers.py and add them to common.py, like:

_common_params = r"""
Parameters
----------
encoding : str, default None
    Encoding to use for UTF when reading/writing (ex. 'utf-8'). `List of Python
    standard encodings
    <https://docs.python.org/3/library/codecs.html#standard-encodings>`_
session : requests.Session
    A requests session object with the configuration for remote files
    (requires the requests library).
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer' and
    `filepath_or_buffer` is path-like, then detect compression from the
    following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no
    decompression). If using 'zip', the ZIP file must contain only one data
    file to be read in. Set to None for no decompression.

    .. versionadded:: 0.18.1 support for 'zip' and 'xz' compression.
"""

Then compose the get_filepath_or_buffer docstring by adding the missing filepath_or_buffer and mode entries, and later import _common_params in parsers.py to compose _parser_params. Does that sound OK?

PS: if that is correct, I'd prefer that a pandas docs expert did this instead of me; I'm kind of lost in the docstrings and Appenders 😄
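A hypothetical sketch of the composition being asked about (the module layout is an assumption; the %-substitution is the pattern parsers.py already uses for its docstring templates):

# pandas/io/common.py (assumed home of the shared text)
_common_params = """\
encoding : str, default None
    Encoding to use for UTF when reading/writing (ex. 'utf-8').
session : requests.Session
    A configured requests session for remote files (requires the
    requests library).
"""

# pandas/io/parsers.py (assumed usage)
_parser_params = """\
filepath_or_buffer : str, path object, or file-like object
    Any valid string path, URL, or object with a read() method.
%(common)s""" % {'common': _common_params}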

    r = session.get(url)
else:
    r = requests.get(url)
r.raise_for_status
Member:

What does this line do?

Author:

It will raise an exception if there is any issue reading the URL (any 4xx/5xx error status).

Member:

But don't you need to actually call the function? i.e. r.raise_for_status() (with parentheses)

Author:

Yep. I totally forgot that when copying and pasting 😄
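For the record, the corrected form (the URL is hypothetical): a bare r.raise_for_status is just a reference to the bound method and does nothing, while the call raises requests.exceptions.HTTPError for 4xx/5xx responses.

import requests

url = 'https://example.com/data.csv'  # hypothetical URL
r = requests.get(url)
r.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx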

@gfyoung added the IO CSV (read_csv, to_csv) and Enhancement labels on Jun 16, 2018
@codecov bot commented Jun 18, 2018

Codecov Report

Merging #21504 into master will decrease coverage by 0.28%.
The diff coverage is 26.47%.


@@            Coverage Diff             @@
##           master   #21504      +/-   ##
==========================================
- Coverage   92.29%   92.01%   -0.29%     
==========================================
  Files         161      169       +8     
  Lines       51497    50797     -700     
==========================================
- Hits        47530    46740     -790     
- Misses       3967     4057      +90
Flag       Coverage Δ
#multiple  90.42% <26.47%> (-0.27%) ⬇️
#single    42.21% <14.7%> (-0.22%) ⬇️

Impacted Files                Coverage Δ
pandas/io/json/json.py        92.47% <ø> (-0.63%) ⬇️
pandas/io/parsers.py          95.48% <100%> (+0.18%) ⬆️
pandas/io/common.py           66.78% <4.16%> (-6.08%) ⬇️
pandas/io/html.py             89.48% <77.77%> (-1.72%) ⬇️
pandas/io/parquet.py          71.79% <0%> (-12.83%) ⬇️
pandas/io/formats/console.py  65.15% <0%> (-10.61%) ⬇️
pandas/core/arrays/base.py    88% <0%> (-9.36%) ⬇️
pandas/io/formats/html.py     88.81% <0%> (-8.87%) ⬇️
pandas/errors/__init__.py     92.3% <0%> (-7.7%) ⬇️
pandas/core/dtypes/base.py    92.68% <0%> (-7.32%) ⬇️
... and 138 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5c90e5...b46dd04.

@rsignell-usgs commented:

Tested this on a username/password-protected CSV endpoint and it's working!

import requests
session = requests.sessions.Session()
session.auth = ('Nope', 'NoWay!')
url = 'https://cgoms.coas.oregonstate.edu/erddap/tabledap/CE01ISSM-BUOY-001-GPS.csv'

import pandas as pd
df = pd.read_csv(url, session=session)

@pep8speaks commented Jun 18, 2018

Hello @ocefpaf! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 25, 2018 at 15:30 UTC

@@ -304,7 +304,8 @@ def read_excel(io,
               **kwds):

    if not isinstance(io, ExcelFile):
        io = ExcelFile(io, engine=engine)
    session = kwds.get('session', None)
Contributor:

Just list session as a kwarg in read_excel (and in ExcelFile), then pass it through.
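A minimal sketch of that suggestion (the signature is heavily abridged from the real read_excel): accept session as an explicit keyword argument instead of digging it out of **kwds, and pass it through to ExcelFile.

def read_excel(io, sheet_name=0, engine=None, session=None, **kwds):
    # session is now a documented keyword argument rather than a hidden kwd
    if not isinstance(io, ExcelFile):
        io = ExcelFile(io, engine=engine, session=session)
    ...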

@@ -406,6 +406,7 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
    compression = _infer_compression(path_or_buf, compression)
    filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer(
        path_or_buf, encoding=encoding, compression=compression,
        session=session,
Contributor:

add a versionadded tag when you add session
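For example, the docstring entry with its versionadded directive could look like this (the version number is a placeholder for whichever release this lands in):

session : requests.Session
    A configured requests session for remote files (requires the
    requests library).

    .. versionadded:: 0.25.0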

@@ -297,6 +297,9 @@
If a filepath is provided for `filepath_or_buffer`, map the file object
directly onto memory and access the data directly from there. Using this
option can improve performance because there is no longer any I/O overhead.
session : requests.Session
    A requests session object with the configuration for remote files
    (requires the requests library).
Contributor:

add a versionadded tag

@WillAyd (Member) commented Aug 17, 2018

@ocefpaf can you address latest comments and merge in changes from master?

@ocefpaf (Author) commented Aug 17, 2018

@ocefpaf can you address latest comments and merge in changes from master?

I'm in transit now but I'll have some time next week to check this. However, if someone wants to take it over please do.

I do recall asking about a documentation strategy to avoid the repetition; see the comments above. It seems that to address some of the comments I would have to touch the current docs, and I'm not really comfortable with that. I'll revisit this soon...

@WillAyd (Member) commented Nov 23, 2018

Can you merge master?

@ocefpaf (Author) commented Nov 23, 2018

Can you merge master?

Done! But I'm not sure if I got it right; git rebases are not my forte 😬

BTW I'm happy to iterate on the docs questions in https://github.com/pandas-dev/pandas/pull/21504/files#r196079159 but I'm quite confused about where I should start (it is my first time contributing here).

@WillAyd (Member) commented Nov 23, 2018

You are better off doing the following on your branch locally:

git fetch upstream
git merge upstream/master

That retains the history of comments on GitHub, which makes it easier to track.

Looks like you have some test failures with the latest push; if you could take a look, that would be great.

@ocefpaf (Author) commented Nov 23, 2018

You are better off doing the following on your branch locally:

git fetch upstream
git merge upstream/master

That is what I did :-/

Looks like you have some test failures with the latest push; if you could take a look, that would be great.

I'll take a look ASAP.

@WillAyd (Member) commented Nov 23, 2018

That is what I did :-/

Hmm, OK. I saw the force push here, which is why I thought you had done something else. You shouldn't need the -f flag when you do it this way.

@datapythonista (Member) commented:

@ocefpaf CI is red. Can you fix the problems and push again so there are no test failures?

@ocefpaf (Author) commented Dec 20, 2018

@ocefpaf CI is red. Can you fix the problems and push again so there are no test failures?

I believe that when I first submitted this PR the with-statement wrapper was unused or untested, which is why it passed. I tried to fix that, but my Python knowledge and available time are too limited to do it before the end of the year. If anyone has an idea, please let me know. (IMHO I would remove the wrapper, because it will be removed anyway when pandas drops py27.)
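For what it's worth, a sketch of the py3-only simplification alluded to here (the URL is hypothetical): in Python 3 the response returned by urllib.request.urlopen is itself a context manager, so the closing() wrapper becomes unnecessary.

from urllib.request import urlopen

with urlopen('https://example.com/data.csv') as f:  # closes automatically
    data = f.read()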

@datapythonista (Member) commented:

Not sure what the problem is. I'll close this for now, as we can't merge with the conflicts and a red CI, but if you find a solution for what you described, feel free to reopen.

@ocefpaf (Author) commented Feb 11, 2019

Not sure what the problem is. I'll close this for now, as we can't merge with the conflicts and a red CI, but if you find a solution for what you described, feel free to reopen.

Would you be interested in a Python 3-only version of this PR? That version works, and I guess all the Python 2.7 code will be removed soon anyway.

@WillAyd (Member) commented Feb 11, 2019

@ocefpaf we drop Py2 support in 0.25, so sure, I think Python 3 only is fine for that milestone onwards.

Labels: Enhancement, IO CSV (read_csv, to_csv)

Successfully merging this pull request may close these issues.

read_csv from HTTPs + basic-auth + custom port throws an error (urlopen error)
7 participants