Commit 5ad2452

Add test file
1 parent 98d248c commit 5ad2452

3 files changed: +9 -1 lines changed

doc/source/whatsnew/v1.5.0.rst

+2 -1
@@ -850,7 +850,8 @@ I/O
 - Bug in :func:`read_sas` returned ``None`` rather than an empty DataFrame for SAS7BDAT files with zero rows (:issue:`18198`)
 - Bug in :class:`StataWriter` where value labels were always written with default encoding (:issue:`46750`)
 - Bug in :class:`StataWriterUTF8` where some valid characters were removed from variable names (:issue:`47276`)
-- Bug in :func:`read_sas` that caused an "Unexpected non-zero end_of_first_byte" error when reading certain SAS7BDAT files.
+- Bug in :func:`read_sas` with RLE-compressed SAS7BDAT files that contain 0x00 control bytes (:issue:`47099`)
+-

 Period
 ^^^^^^
Binary file not shown.

pandas/tests/io/sas/test_sas7bdat.py

+7
@@ -381,3 +381,10 @@ def test_exception_propagation_rle_decompress(tmp_path, datapath):
     tmp_file.write_bytes(data)
     with pytest.raises(ValueError, match="unknown control byte"):
         pd.read_sas(tmp_file)
+
+
+def test_0x00_control_byte(datapath):
+    # GH 47099
+    fname = datapath("io", "sas", "data", "0x00controlbyte.sas7bdat.gz")
+    df = next(pd.read_sas(fname, chunksize=11_000))
+    assert df.shape == (1, 20)
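
For readers unfamiliar with RLE control bytes, the following is a minimal pure-Python sketch of how such a decoder might treat a 0x00 command. It is not the pandas implementation. The function name `rle_decompress_sketch`, the nibble layout (high nibble as command, low nibble as extra length bits), and the length offsets are all assumptions chosen for illustration, and only three commands are handled. The sketch suggests why a decoder that insisted on a zero low nibble for the 0x00 command could reject otherwise valid streams.

```python
def rle_decompress_sketch(data: bytes) -> bytes:
    """Toy decoder for a SAS-style RLE stream (hypothetical, illustrative subset)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        command = data[pos] & 0xF0  # assumed: high nibble selects the command
        low = data[pos] & 0x0F      # assumed: low nibble carries extra length bits
        pos += 1
        if command == 0x00:
            # Long literal copy: the low nibble supplies the high-order
            # bits of the length instead of being required to be zero.
            nbytes = data[pos] + 64 + low * 256
            pos += 1
            out += data[pos:pos + nbytes]
            pos += nbytes
        elif command == 0x80:
            # Short literal copy of (low + 1) bytes.
            nbytes = low + 1
            out += data[pos:pos + nbytes]
            pos += nbytes
        elif command == 0xC0:
            # Repeat the next byte (low + 3) times.
            out += bytes([data[pos]]) * (low + 3)
            pos += 1
        else:
            # Commands outside this toy subset are rejected.
            raise ValueError(f"unknown control byte: {command:#x}")
    return bytes(out)
```

In this sketch, a stream beginning with the bytes `0x01 0x00` would be read as a literal copy of 64 + 256 = 320 bytes, whereas a stricter decoder that treated a non-zero low nibble as an error would stop at the first byte.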
