Commit 05bf507

BUG: Fix loading files from S3 with # characters in URL (GH25945)
This fixes loading files with URLs such as s3://bucket/key#1.csv. Everything from the # onward was being lost because it was treated as a URL fragment. The fix disables URL fragment parsing, since fragments are not meaningful in S3 URLs.
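The failure mode can be sketched with the standard library's urllib.parse (pandas' `parse_url` is assumed here to behave like `urlparse` with respect to fragments; this is an illustration, not the pandas code itself):

```python
from urllib.parse import urlparse

url = "s3://bucket/key#1.csv"

# Default parsing treats everything after '#' as a URL fragment,
# so the object key is truncated to '/key'.
default = urlparse(url)
print(default.path)      # '/key'
print(default.fragment)  # '1.csv'

# Disabling fragment parsing keeps the full key intact.
fixed = urlparse(url, allow_fragments=False)
print(fixed.path)        # '/key#1.csv'
```

Since S3 keys may legitimately contain `#`, fragment splitting can only lose data here.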
1 parent 70773d9 commit 05bf507

File tree

4 files changed: +9 −1 lines changed

doc/source/whatsnew/v0.25.0.rst

+1

@@ -356,6 +356,7 @@ I/O
 - Bug in :func:`read_hdf` not properly closing store after a ``KeyError`` is raised (:issue:`25766`)
 - Bug in ``read_csv`` which would not raise ``ValueError`` if a column index in ``usecols`` was out of bounds (:issue:`25623`)
 - Improved :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` to read incorrectly formatted 118 format files saved by Stata (:issue:`25960`)
+- Fixed bug in loading objects from S3 that contain ``#`` characters in the URL (:issue:`25945`)

 Plotting
 ^^^^^^^^

pandas/io/s3.py

+1 −1

@@ -10,7 +10,7 @@

 def _strip_schema(url):
     """Returns the url without the s3:// part"""
-    result = parse_url(url)
+    result = parse_url(url, allow_fragments=False)
     return result.netloc + result.path
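A self-contained sketch of the fixed helper, with `urllib.parse.urlparse` standing in for `parse_url` (an assumed equivalence for illustration; the real function lives in pandas/io/s3.py):

```python
from urllib.parse import urlparse

def strip_schema(url):
    """Return the URL without the s3:// part, keeping any '#' in the key."""
    # allow_fragments=False stops '#' from starting a URL fragment,
    # so keys like 'key#1.csv' survive intact.
    result = urlparse(url, allow_fragments=False)
    return result.netloc + result.path

print(strip_schema("s3://bucket/key#1.csv"))  # 'bucket/key#1.csv'
```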

pandas/tests/io/conftest.py

+1

@@ -59,6 +59,7 @@ def s3_resource(tips_file, jsonl_file):
     moto = pytest.importorskip('moto')

     test_s3_files = [
+        ('tips#1.csv', tips_file),
         ('tips.csv', tips_file),
         ('tips.csv.gz', tips_file + '.gz'),
         ('tips.csv.bz2', tips_file + '.bz2'),

pandas/tests/io/parser/test_network.py

+6

@@ -198,3 +198,9 @@ def test_read_csv_chunked_download(self, s3_resource, caplog):
         read_csv("s3://pandas-test/large-file.csv", nrows=5)
         # log of fetch_range (start, stop)
         assert ((0, 5505024) in {x.args[-2:] for x in caplog.records})
+
+    def test_read_s3_with_hash_in_key(self, tips_df):
+        df = read_csv('s3://pandas-test/tips#1.csv')
+        assert isinstance(df, DataFrame)
+        assert not df.empty
+        tm.assert_frame_equal(tips_df, df)
