Commit 2f6b90a

swt2c authored and WillAyd committed
BUG: Fix loading files from S3 with # characters in URL (GH25945) (#25992)
1 parent af0ecbe commit 2f6b90a

4 files changed: +8 -1 lines changed

doc/source/whatsnew/v0.25.0.rst

+1
@@ -361,6 +361,7 @@ I/O
 - Bug in ``read_csv`` which would not raise ``ValueError`` if a column index in ``usecols`` was out of bounds (:issue:`25623`)
 - Improved the explanation for the failure when value labels are repeated in Stata dta files and suggested work-arounds (:issue:`25772`)
 - Improved :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` to read incorrectly formatted 118 format files saved by Stata (:issue:`25960`)
+- Fixed bug in loading objects from S3 that contain ``#`` characters in the URL (:issue:`25945`)

 Plotting
 ^^^^^^^^

pandas/io/s3.py

+1 -1
@@ -10,7 +10,7 @@

 def _strip_schema(url):
     """Returns the url without the s3:// part"""
-    result = parse_url(url)
+    result = parse_url(url, allow_fragments=False)
     return result.netloc + result.path


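To illustrate why the one-argument change above matters, here is a minimal sketch. It assumes `parse_url` is an alias for the standard library's `urllib.parse.urlparse` (the import is not shown in this diff), and `strip_schema` is a hypothetical stand-in for `_strip_schema`, not the pandas function itself. By default `urlparse` treats everything after `#` as a URL fragment, so the S3 key would be cut short; `allow_fragments=False` keeps the `#` in the path.

```python
from urllib.parse import urlparse


def strip_schema(url, allow_fragments):
    """Return 'bucket/key' for an s3:// URL (hypothetical stand-in for _strip_schema)."""
    result = urlparse(url, allow_fragments=allow_fragments)
    return result.netloc + result.path


url = "s3://pandas-test/tips#1.csv"

# Default parsing: the text after '#' is split off into result.fragment.
old = strip_schema(url, allow_fragments=True)

# With allow_fragments=False the '#' remains part of the path (the S3 key).
new = strip_schema(url, allow_fragments=False)

print(old)  # pandas-test/tips
print(new)  # pandas-test/tips#1.csv
```

With the default behaviour the lookup would target the key `tips` rather than `tips#1.csv`, which is consistent with the failure described in GH25945.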

pandas/tests/io/conftest.py

+1
@@ -59,6 +59,7 @@ def s3_resource(tips_file, jsonl_file):
     moto = pytest.importorskip('moto')

     test_s3_files = [
+        ('tips#1.csv', tips_file),
         ('tips.csv', tips_file),
         ('tips.csv.gz', tips_file + '.gz'),
         ('tips.csv.bz2', tips_file + '.bz2'),

pandas/tests/io/parser/test_network.py

+5
@@ -198,3 +198,8 @@ def test_read_csv_chunked_download(self, s3_resource, caplog):
             read_csv("s3://pandas-test/large-file.csv", nrows=5)
             # log of fetch_range (start, stop)
             assert ((0, 5505024) in {x.args[-2:] for x in caplog.records})
+
+    def test_read_s3_with_hash_in_key(self, tips_df):
+        # GH 25945
+        result = read_csv('s3://pandas-test/tips#1.csv')
+        tm.assert_frame_equal(tips_df, result)
