Reading a SAS file with 0 rows gives None #18198

AbdealiLoKo · 2017-11-09T18:11:40Z

Code Sample, a copy-pastable example if possible

In [2]: import pandas as pd

In [3]: pd.read_csv("zero_observations.csv")
Out[3]: 
Empty DataFrame
Columns: [a, b, c, d]
Index: []

In [4]: pd.read_sas("zero_observations.sas7bdat")

In [5]:

This is the CSV file:

$ cat zero_observations.csv
a,b,c,d

Problem description

When reading a SAS file with 0 records, `read_sas` returns None. When reading a CSV file with 0 records, `read_csv` returns an empty DataFrame.

In SAS, we can at least identify the datatypes correctly and build an empty DataFrame from them.
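As a rough illustration of what such a typed, zero-row result could look like, here is a minimal sketch. The column names and dtypes in `schema` are hypothetical stand-ins for whatever a reader would take from the SAS file header. This is not how `read_sas` is implemented.

```python
import pandas as pd

# Hypothetical schema; a real reader would take names and types from the file header.
schema = {"char_field": "object", "num_field": "float64"}

# One empty, typed Series per column gives a zero-row frame that keeps its dtypes.
empty = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in schema.items()})

print(empty.dtypes)
```

Building the frame from typed Series, rather than from column names alone, means the column types are still available even though there are no rows.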

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 23.0.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.4.9
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 0.9.8
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback · 2017-11-09T18:17:52Z

Is this just an issue for #18184?

Normally we do open issues first, but since you opened the PR we actually don't need an issue (FYI for the future).

AbdealiLoKo · 2017-11-09T18:25:20Z

This is actually an issue for the 0 rows case and not the 0 variables case.
PR #18184 handles the 0 variables case. I'm not sure how to fix the 0 rows case, hence I created an issue for it.

AbdealiLoKo · 2017-11-10T03:28:08Z

When going through this, I realized there may be some confusion in expected behavior.

In read_csv(), if I pass a file with 0 rows and iterator=True, it gives:

In [30]: for i, idf in enumerate(pd.read_csv("/tmp/a.csv", iterator=True)):
    ...:     print("IterCount:", i)
    ...:     print(idf)
    ...:     
IterCount: 0
Empty DataFrame
Columns: [a, b, c, d]
Index: []

This seems non-intuitive because even though there are 0 rows, the iterator runs at least once, while in read_sas() the equivalent iterator runs 0 times.
In read_csv(), when I pass a file with 2 records and chunksize=2, it gives an iterator with 1 item, not two items (the first with 2 rows, the second with 0 rows):

In [32]: for i, idf in enumerate(pd.read_csv("/tmp/b.csv", iterator=True, chunksize=2)):
    ...:     print("IterCount:", i)
    ...:     print(idf)
    ...:     
IterCount: 0
   a  b  c  d
0  1  2  3  4
1  5  6  7  8

This is definitely a bit of a corner case but I'd like to know what is the expected behavior.
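The two read_csv() cases above can be reproduced without files on disk. This is a minimal sketch: the helper `chunk_lengths` and the inline CSV strings are illustrative assumptions, and the header-only result may depend on the pandas version in use.

```python
import io

import pandas as pd


def chunk_lengths(csv_text, **kwargs):
    """Return the row count of every chunk yielded by an iterating read_csv."""
    reader = pd.read_csv(io.StringIO(csv_text), iterator=True, **kwargs)
    return [len(chunk) for chunk in reader]


header_only = "a,b,c,d\n"                    # stands in for /tmp/a.csv (0 rows)
two_rows = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"     # stands in for /tmp/b.csv (2 rows)

# Chunk sizes for the header-only input (the thread reports one empty chunk).
print(chunk_lengths(header_only))

# Chunk sizes for two rows read with chunksize=2.
print(chunk_lengths(two_rows, chunksize=2))
```

Printing the list of chunk sizes shows at a glance whether a zero-row input yields an empty chunk or no chunk at all.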

I believe the following makes sense:

If it is an iterator, give None so that the iterator can stop correctly.
If it is not an iterator and the whole DataFrame is being read, build an empty DataFrame with the header names and column types and return it.
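One way to read this proposal is sketched below. It assumes the rows are already parsed into a list of tuples and that column types are known as a dict. The names `iter_chunks`, `read_whole`, and `dtypes` are hypothetical and not part of pandas. The sketch interprets "give None" as "the iterator yields nothing".

```python
import pandas as pd


def iter_chunks(rows, dtypes, chunksize):
    """Yield typed frames of at most `chunksize` rows; yield nothing for zero rows."""
    for start in range(0, len(rows), chunksize):
        chunk = pd.DataFrame(rows[start:start + chunksize], columns=list(dtypes))
        yield chunk.astype(dtypes)


def read_whole(rows, dtypes):
    """Return every row as one frame, or a typed empty frame for zero rows."""
    if not rows:
        return pd.DataFrame(columns=list(dtypes)).astype(dtypes)
    return pd.DataFrame(rows, columns=list(dtypes)).astype(dtypes)


# Hypothetical column types for illustration.
dtypes = {"char_field": "object", "num_field": "float64"}

print(list(iter_chunks([], dtypes, chunksize=2)))  # no chunks for zero rows
print(read_whole([], dtypes).dtypes)               # typed, zero-row frame
```

With this split, a loop over chunks never runs for an empty file, while a caller asking for the whole table always gets a DataFrame back and never None.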

AbdealiLoKo · 2017-11-29T18:36:27Z

Any thoughts on this one?

mukesh5 · 2019-01-12T03:46:24Z

Is this issue still open? Can I pick it up?

jbrockmendel · 2019-12-19T00:04:35Z

@AbdealiJK this shouldn't be too hard to fix, but in order to test it we will need a zero-observation sas7bdat file. Can you provide one?

paul-lilley · 2020-08-31T14:39:50Z

Hi

data one_row (compress=no);
	char_field = "abc";
	num_field = 123.4;
run;
data zero_rows_no_compress (compress=no);
	set one_row (obs=0);
run;

The above SAS code created the attached SAS file with 0 rows. Hope this helps @jbrockmendel @mukesh5

zero_rows_no_compress.zip

AbdealiLoKo · 2020-08-31T16:05:13Z

Hi, sorry @jbrockmendel, it seems I missed your ping earlier.
I haven't had access to a SAS environment for the past year, so I haven't been able to follow up on this or provide an example file.

ofajardo · 2020-12-09T11:15:42Z

Just in case it helps: pyreadstat can read this file OK (if this is what you are expecting):

>>> import pyreadstat
>>> df, meta = pyreadstat.read_sas7bdat("zero_rows_no_compress.sas7bdat")
>>> df
Empty DataFrame
Columns: [char_field, num_field]
Index: []

So one option could be to add pyreadstat as a backend for read_sas (it is already used in read_spss). It would also help solve other issues: #37088, #35545, #22720

Condielj · 2022-03-28T18:18:08Z

take

jreback added the Difficulty Novice, Error Reporting (Incorrect or improved errors from pandas), and IO SAS (SAS: read_sas) labels Nov 10, 2017

jreback added this to the Next Major Release milestone Nov 10, 2017

jreback added good first issue and removed good first issue Difficulty Novice labels Dec 15, 2017

Alexandreae mentioned this issue Aug 13, 2019

Issues de sala de aula Insper/open-dev#45

Closed

jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke added the Bug label Apr 5, 2020

This was referenced Dec 9, 2020

Reading SAS time variables #22720

Open

Potential bug in reading SAS files with CHAR (RLE) compression and many repeated characters #31243

Closed

pranava1709 mentioned this issue Aug 31, 2021

error resolution in sas file #43326

Closed

github-actions bot assigned frehler Jan 9, 2022

frehler removed their assignment Jan 14, 2022

github-actions bot assigned Condielj Mar 28, 2022

jonashaag mentioned this issue May 25, 2022

Fix reading SAS7BDAT files with zero rows #47116

Merged

mroeschke closed this as completed in #47116 May 31, 2022
