Skip to content

BUG: Patch for skipping seek() when loading Excel files from urllib3.response.HTTPResponse object #28826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

onsimon
Copy link

@onsimon onsimon commented Oct 7, 2019

Discovered that 0.25.x reintroduces #20437. Proposing the same fix here.

@simonjayhawkins simonjayhawkins added IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version labels Oct 7, 2019
@simonjayhawkins simonjayhawkins added this to the 0.25.2 milestone Oct 7, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK thanks. I think this was intentionally removed here https://github.com/pandas-dev/pandas/pull/26233/files#r279220473 so @tdamsma might have thoughts as well.

In any case is there a way to add a test for this?

@tdamsma
Copy link
Contributor

tdamsma commented Oct 7, 2019

I just tested with 25.1 (on osx), can't reproduce. I took this from #20437:

import pandas

url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/test1.xls"
df = pandas.read_excel(url)
print(df)

I believe there are tests for downloading from a url included in the test suite.

@WillAyd
Copy link
Member

WillAyd commented Oct 7, 2019

Thanks @tdamsma ! I actually was not able to reproduce this on master either. @hellbe can you try there?

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Oh, slight difference in use actually. The reason is that I downloaded first using requests. The following code reproduces:

import pandas
import requests
url = "https://raw.github.com/pandas-dev/pandas/master/pandas/tests/io/data/test1.xls"
r = requests.get(url, stream=True)
df = pandas.read_excel(r.raw)
print(df)

Throws UnsupportedOperation: seek

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

So in my case I can easily fix my code by passing the url directly instead. However, there might be other situations where extra functionality from requests is needed, e.g. for authentication.

https://requests.kennethreitz.org/en/master/user/quickstart/#raw-response-content

So excel_read() is fed an urllib3.response.HTTPResponse object which has a read() but not a a seek() method.

So I think the patch is still valid, but maybe there's a nicer way to mitigate?

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Updated the issue with the above information.

@onsimon onsimon changed the title BUG: Patch for skipping seek() when loading Excel files from url BUG: Patch for skipping seek() when loading Excel files from urllib3.response.HTTPResponse object Oct 8, 2019
@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Updated the PR commit to match above and refer to new issue

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Rebased

@onsimon
Copy link
Author

onsimon commented Oct 8, 2019

Surely possible to create a test case specifically for reading urllib3's HTTPResponse objects but I'm not sure if it's meaningful tbh

@WillAyd
Copy link
Member

WillAyd commented Oct 8, 2019

@ocefpaf would this fall under the umbrella of what you were trying to accomplish in #21504? Wonder if it would be better to address comprehensively in a common func rather than one-off for excel files

@ocefpaf
Copy link

ocefpaf commented Oct 8, 2019

@ocefpaf would this fall under the umbrella of what you were trying to accomplish in #21504? Wonder if it would be better to address comprehensively in a common func rather than one-off for excel files

Probably. Let me check. I still have the old PR here waiting for me. (I'm at a code sprint at the moment, Might tackle it again this week.)

@ocefpaf
Copy link

ocefpaf commented Oct 9, 2019

@WillAyd I can confirm that #21504 (or the new incarnation that I'll push in a few hours) will "fix this" b/c pandas will use requests and wrap it in a BytesIO object, with seek, if requests is installed.

@ocefpaf ocefpaf mentioned this pull request Oct 9, 2019
4 tasks
@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

Thanks for the PR @hellbe but I think this is getting more comprehensively addressed in #28874 by @ocefpaf so let's track this there

@WillAyd WillAyd closed this Oct 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnsupportedOperation 'seek' when loading excel files from urllib3.response.HTTPResponse object
5 participants