Problem description

Pandas can't open an object from S3 if its URL contains a `#` character, whether or not the URL path is percent encoded. The reason is that `urllib.parse.urlparse()`, which io/s3.py uses to parse the URL, treats the `#` as the start of the URL fragment, so everything after it is dropped (in the non-percent-encoded case).
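For illustration, this is what `urlparse()` does with a made-up key containing a `#` (the bucket and key names below are hypothetical):

```python
from urllib.parse import urlparse

# Hypothetical S3 object whose key contains a '#' character
url = "s3://mybucket/data/file#1.csv"

parsed = urlparse(url)
print(parsed.path)      # '/data/file'  -- everything after '#' is lost
print(parsed.fragment)  # '1.csv'       -- '#' starts the URL fragment
```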
I see two possible solutions, but I'm not sure which is best, since there doesn't seem to be a specification for the S3 URL scheme (at least none that I can find). Both are sketched after this list:

1. Pass `allow_fragments=False` when calling `urllib.parse.urlparse()`. This makes the non-percent-encoded case work, but feels slightly wrong.
2. Call `urllib.parse.unquote()` on S3 paths before passing them to s3fs. s3fs seems to want just a bucket/key as input, so pandas would have to remove the percent encoding. This makes the percent-encoded case work and seems a bit more correct, but it might change existing behavior for users loading URLs that contain literal `%` characters.
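As a rough sketch of how the two options would behave, with the same made-up URL and only the parsing step shown in isolation (this is not pandas code):

```python
from urllib.parse import urlparse, unquote

raw_url     = "s3://mybucket/data/file#1.csv"    # not percent encoded
encoded_url = "s3://mybucket/data/file%231.csv"  # percent encoded

# Option 1: keep the '#' in the path by disabling fragment parsing
parsed = urlparse(raw_url, allow_fragments=False)
print(parsed.netloc + parsed.path)            # 'mybucket/data/file#1.csv'

# Option 2: decode percent escapes before handing bucket/key to s3fs
parsed = urlparse(encoded_url)
print(unquote(parsed.netloc + parsed.path))   # 'mybucket/data/file#1.csv'
```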
Based on the analysis by @swt2c, I think option #1 is the way to go (feel free to correct me if I'm wrong).
I can start working on this issue and submit a PR with the change soon, if that's OK.