
Commit f2b5529

BUG: Fix loading files from S3 with # characters in URL (GH25945)
This fixes loading files with URLs such as s3://bucket/key#1.csv. Everything from the # onward was being dropped because it was parsed as a URL fragment. The fix disables URL fragment parsing, which has no meaning for S3 URLs.
1 parent 70773d9 commit f2b5529
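
To make the failure mode concrete: with default URL parsing, everything after the first # is split off as the fragment, so rebuilding the location from netloc + path silently truncates the S3 key. A minimal sketch of the broken behaviour, assuming urllib.parse.urlparse matches the parse_url helper used in pandas/io/s3.py:

# Sketch of the bug: '#' in an S3 key is treated as a URL fragment delimiter.
from urllib.parse import urlparse

url = "s3://bucket/key#1.csv"
result = urlparse(url)

# The key is silently truncated; '1.csv' ends up in the fragment component.
print(result.netloc + result.path)  # -> bucket/key
print(result.fragment)              # -> 1.csv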

File tree

3 files changed: +8 -1 lines changed


doc/source/whatsnew/v0.25.0.rst (+1)
@@ -356,6 +356,7 @@ I/O
 - Bug in :func:`read_hdf` not properly closing store after a ``KeyError`` is raised (:issue:`25766`)
 - Bug in ``read_csv`` which would not raise ``ValueError`` if a column index in ``usecols`` was out of bounds (:issue:`25623`)
 - Improved :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` to read incorrectly formatted 118 format files saved by Stata (:issue:`25960`)
+- Fixed bug in loading objects from S3 that contain # characters in the URL (:issue:`25945`)

 Plotting
 ^^^^^^^^

pandas/io/s3.py (+1 -1)
@@ -10,7 +10,7 @@

 def _strip_schema(url):
     """Returns the url without the s3:// part"""
-    result = parse_url(url)
+    result = parse_url(url, allow_fragments=False)
     return result.netloc + result.path


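For quick reference, a standalone restatement of the patched helper. This is a sketch rather than the pandas source: urllib.parse.urlparse stands in for the parse_url alias imported in pandas/io/s3.py, which is assumed to be equivalent. Since S3 object keys may legitimately contain #, and fragments have no meaning for S3 URLs, disabling fragment splitting is safe here.

# Standalone sketch of the fixed helper (mirrors the diff above).
from urllib.parse import urlparse


def _strip_schema(url):
    """Returns the url without the s3:// part"""
    # allow_fragments=False keeps '#' and everything after it in the key
    result = urlparse(url, allow_fragments=False)
    return result.netloc + result.path


print(_strip_schema("s3://bucket/key#1.csv"))  # -> bucket/key#1.csv
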
pandas/tests/io/test_s3.py (+6)
@@ -5,6 +5,7 @@
 from pandas import read_csv

 from pandas.io.common import is_s3_url
+from pandas.io.s3 import _strip_schema


 class TestS3URL(object):
@@ -27,3 +28,8 @@ def test_streaming_s3_objects():
     for el in data:
         body = StreamingBody(BytesIO(el), content_length=len(el))
         read_csv(body)
+
+
+def test_parse_s3_url_with_pound_sign():
+    # GH25945
+    assert _strip_schema('s3://bucket/key#1.csv') == 'bucket/key#1.csv'
