Commit c82cb17

BUG: AttributeError when read_sas used with GCS (#33070)
* BUG: AttributeError when read_sas used with GCS

  With GCSFS (and possibly other connectors), the output from the file read is `bytes`, not a Unicode `str`. The `encode` call is unnecessary in this case.
* remove unnecessary try/except for Python 3
* fix: remove unnecessary BytesIO conversion
* cln: remove unused bytesio import
1 parent 467e1c2 commit c82cb17
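
The failure described in the commit message can be reproduced without pandas or GCS. The sketch below is a stand-in for the removed code path, not the library's code: the function name `legacy_copy_to_bytesio` is hypothetical, and the `ISO-8859-1` default is an assumption used only for illustration. It shows that calling `.encode` on the result of a binary read raises `AttributeError`, because `bytes` has no such method, and that the `except UnicodeEncodeError` clause does not catch it.

```python
import io


def legacy_copy_to_bytesio(buffer, encoding="ISO-8859-1"):
    """Hypothetical stand-in mirroring the removed branch of the reader."""
    contents = buffer.read()
    try:
        # Works for str (text-mode handles); bytes has no .encode method.
        contents = contents.encode(encoding)
    except UnicodeEncodeError:
        pass
    return io.BytesIO(contents)


# A binary handle, standing in for what GCSFS-style connectors return.
binary_handle = io.BytesIO(b"placeholder xport bytes")

try:
    legacy_copy_to_bytesio(binary_handle)
    error_name = None
except AttributeError as exc:
    # The old except clause only covered UnicodeEncodeError.
    error_name = type(exc).__name__

print(error_name)
```

A text-mode handle such as `io.StringIO` still passes through this function, which is likely why the problem only surfaced with connectors that hand back `bytes`.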

3 files changed, +15 -8 lines changed

doc/source/whatsnew/v1.1.0.rst

+1
@@ -426,6 +426,7 @@ I/O
 - Bug in :meth:`read_csv` was causing a segfault when there were blank lines between the header and data rows (:issue:`28071`)
 - Bug in :meth:`read_csv` was raising a misleading exception on a permissions issue (:issue:`23784`)
 - Bug in :meth:`read_csv` was raising an ``IndexError`` when header=None and 2 extra data columns
+- Bug in :meth:`read_sas` was raising an ``AttributeError`` when reading files from Google Cloud Storage (:issue:`33069`)
 - Bug in :meth:`DataFrame.to_sql` where an ``AttributeError`` was raised when saving an out of bounds date (:issue:`26761`)
 - Bug in :meth:`read_excel` did not correctly handle multiple embedded spaces in OpenDocument text cells. (:issue:`32207`)

pandas/io/sas/sas_xport.py

+3 -8
@@ -9,7 +9,6 @@
 """
 from collections import abc
 from datetime import datetime
-from io import BytesIO
 import struct
 import warnings

@@ -263,13 +262,9 @@ def __init__(
         if isinstance(filepath_or_buffer, (str, bytes)):
             self.filepath_or_buffer = open(filepath_or_buffer, "rb")
         else:
-            # Copy to BytesIO, and ensure no encoding
-            contents = filepath_or_buffer.read()
-            try:
-                contents = contents.encode(self._encoding)
-            except UnicodeEncodeError:
-                pass
-            self.filepath_or_buffer = BytesIO(contents)
+            # Since xport files include non-text byte sequences, xport files
+            # should already be opened in binary mode in Python 3.
+            self.filepath_or_buffer = filepath_or_buffer

         self._read_header()
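
The pattern this hunk settles on (open paths in binary mode, pass file-like objects through untouched) can be sketched with the standard library alone. The class `HeaderPeek`, its `_owns_handle` flag, and the 80-byte placeholder record are illustrative assumptions, not part of pandas; the sketch only shows that both input kinds end up yielding the same raw bytes.

```python
import io
import os
import tempfile


class HeaderPeek:
    """Hypothetical minimal reader using the same path-or-buffer dispatch."""

    def __init__(self, filepath_or_buffer):
        if isinstance(filepath_or_buffer, (str, bytes)):
            # A path: open it ourselves, in binary mode.
            self.handle = open(filepath_or_buffer, "rb")
            self._owns_handle = True
        else:
            # A file-like object: assumed to be binary already, used as-is.
            self.handle = filepath_or_buffer
            self._owns_handle = False

    def first_record(self):
        # Read one fixed-size record (80 bytes chosen for this sketch).
        return self.handle.read(80)

    def close(self):
        # Only close handles this object opened itself.
        if self._owns_handle:
            self.handle.close()


record = b"X" * 80  # placeholder bytes, not a real xport header

# Case 1: an in-memory binary buffer is passed straight through.
peek = HeaderPeek(io.BytesIO(record + b"rest of file"))
from_buffer = peek.first_record()
peek.close()

# Case 2: a filesystem path is opened in binary mode.
with tempfile.NamedTemporaryFile(suffix=".xpt", delete=False) as tmp:
    tmp.write(record)
    path = tmp.name
peek = HeaderPeek(path)
from_path = peek.first_record()
peek.close()
os.remove(path)

print(from_buffer == from_path)
```

Tracking ownership of the handle is a design choice of this sketch: a caller who passes in an open buffer keeps responsibility for closing it.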

pandas/tests/io/sas/test_xport.py

+11
@@ -26,6 +26,7 @@ def setup_method(self, datapath):
         self.dirpath = datapath("io", "sas", "data")
         self.file01 = os.path.join(self.dirpath, "DEMO_G.xpt")
         self.file02 = os.path.join(self.dirpath, "SSHSV1_A.xpt")
+        self.file02b = open(os.path.join(self.dirpath, "SSHSV1_A.xpt"), "rb")
         self.file03 = os.path.join(self.dirpath, "DRXFCD_G.xpt")
         self.file04 = os.path.join(self.dirpath, "paxraw_d_short.xpt")

@@ -119,6 +120,16 @@ def test2(self):
         data = read_sas(self.file02)
         tm.assert_frame_equal(data, data_csv)

+    def test2_binary(self):
+        # Test with SSHSV1_A.xpt, read as a binary file
+
+        # Compare to this
+        data_csv = pd.read_csv(self.file02.replace(".xpt", ".csv"))
+        numeric_as_float(data_csv)
+
+        data = read_sas(self.file02b, format="xport")
+        tm.assert_frame_equal(data, data_csv)
+
     def test_multiple_types(self):
         # Test with DRXFCD_G.xpt (contains text and numeric variables)
