
Commit 23810e5

kshedden authored and jreback committed
ENH: Support for reading SAS7BDAT files
closes #4052 closes #12015
1 parent e39f63a commit 23810e5

39 files changed: +1328 -91 lines

LICENSES/SAS7BDAT_LICENSE

+19
@@ -0,0 +1,19 @@
+Copyright (c) 2015 Jared Hobbs
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

asv_bench/benchmarks/packers.py

+19 -1
@@ -318,6 +318,24 @@ def remove(self, f):
             pass


+class packers_read_sas7bdat(object):
+
+    def setup(self):
+        self.f = 'data/test1.sas7bdat'
+
+    def time_packers_read_sas7bdat(self):
+        pd.read_sas(self.f, format='sas7bdat')
+
+
+class packers_read_xport(object):
+
+    def setup(self):
+        self.f = 'data/paxraw_d_short.xpt'
+
+    def time_packers_read_xport(self):
+        pd.read_sas(self.f, format='xport')
+
+
 class packers_write_csv(object):
     goal_time = 0.2

@@ -854,4 +872,4 @@ def remove(self, f):
         try:
             os.remove(self.f)
         except:
-            pass
+            pass
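
The benchmarks above pass `format=` explicitly. A caller that wants to choose that value from the file name could dispatch on the extension, as in the minimal sketch below. `infer_sas_format` is a hypothetical helper, not part of pandas, and the mapping assumes the `.xpt` and `.sas7bdat` extensions named in the io.rst change.

```python
import os


def infer_sas_format(path):
    """Hypothetical helper: pick a SAS format name from the file extension.

    Assumes '.xpt' means XPORT and '.sas7bdat' means SAS7BDAT (case-insensitive).
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xpt":
        return "xport"
    if ext == ".sas7bdat":
        return "sas7bdat"
    raise ValueError("unable to infer SAS format from extension %r" % ext)


# The two files used by the benchmarks in the diff above.
for f in ["data/test1.sas7bdat", "data/paxraw_d_short.xpt"]:
    print(f, "->", infer_sas_format(f))
```

The result could then be passed on, e.g. `pd.read_sas(f, format=infer_sas_format(f))`.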

doc/source/io.rst

+18 -15
@@ -2554,7 +2554,7 @@ both on the writing (serialization), and reading (deserialization).
 +----------------------+------------------------+
 | 0.18                 | >= 0.18                |
 +======================+========================+
-
+
 Reading (files packed by older versions) is backward-compatibile, except for files packed with 0.17 in Python 2, in which case only they can only be unpacked in Python 2.

 .. ipython:: python
@@ -4198,7 +4198,7 @@ Authenticating with user account credentials is as simple as following the promp
 which will be automatically opened for you. You will be authenticated to the specified
 ``BigQuery`` account using the product name ``pandas GBQ``. It is only possible on local host.
 The remote authentication using user account credentials is not currently supported in Pandas.
-Additional information on the authentication mechanism can be found
+Additional information on the authentication mechanism can be found
 `here <https://developers.google.com/identity/protocols/OAuth2#clientside/>`__.

 Authentication with service account credentials is possible via the `'private_key'` parameter. This method
@@ -4564,24 +4564,25 @@ easy conversion to and from pandas.

 .. _io.sas_reader:

-SAS Format
-----------
+SAS Formats
+-----------

 .. versionadded:: 0.17.0

-The top-level function :func:`read_sas` currently can read (but
-not write) SAS xport (.XPT) format files. Pandas cannot currently
-handle SAS7BDAT files.
+The top-level function :func:`read_sas` can read (but not write) SAS
+`xport` (.XPT) and `SAS7BDAT` (.sas7bdat) format files (v0.18.0).

-XPORT files only contain two value types: ASCII text and double
-precision numeric values. There is no automatic type conversion to
-integers, dates, or categoricals. By default the whole file is read
-and returned as a ``DataFrame``.
+SAS files only contain two value types: ASCII text and floating point
+values (usually 8 bytes but sometimes truncated). For xport files,
+there is no automatic type conversion to integers, dates, or
+categoricals. For SAS7BDAT files, the format codes may allow date
+variables to be automatically converted to dates. By default the
+whole file is read and returned as a ``DataFrame``.

-Specify a ``chunksize`` or use ``iterator=True`` to obtain an
-``XportReader`` object for incrementally reading the file. The
-``XportReader`` object also has attributes that contain additional
-information about the file and its variables.
+Specify a ``chunksize`` or use ``iterator=True`` to obtain reader
+objects (``XportReader`` or ``SAS7BDATReader``) for incrementally
+reading the file. The reader objects also have attributes that
+contain additional information about the file and its variables.

 Read a SAS XPORT file:

@@ -4602,6 +4603,8 @@ web site.

 .. _specification: https://support.sas.com/techsup/technote/ts140.pdf

+No official documentation is available for the SAS7BDAT format.
+
 .. _io.perf:

 Performance Considerations
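
The new io.rst text says SAS numeric values are usually 8 bytes but sometimes truncated. Below is a minimal sketch of decoding such a value from a SAS7BDAT file. It assumes that a numeric column of width n keeps the n most significant bytes of an IEEE 754 double, in the file's byte order, and that the dropped low-order mantissa bytes can be restored as zeros. It does not apply to XPORT files, whose numerics use IBM mainframe floating point rather than IEEE 754. `decode_truncated_double` is a hypothetical name, not a pandas function.

```python
import struct


def decode_truncated_double(raw, big_endian=True):
    """Hypothetical helper: decode a numeric stored in 1 to 8 bytes.

    Assumes `raw` holds the most significant bytes of an IEEE 754 double
    in the file's byte order; the dropped low-order bytes become zeros.
    """
    if not 1 <= len(raw) <= 8:
        raise ValueError("numeric width must be between 1 and 8 bytes")
    padding = b"\x00" * (8 - len(raw))
    if big_endian:
        # Most significant byte comes first, so append the zero bytes.
        return struct.unpack(">d", raw + padding)[0]
    # Little-endian: most significant byte comes last, so prepend the zeros.
    return struct.unpack("<d", padding + raw)[0]


# 1.5 has an all-zero low-order mantissa, so 3 bytes store it exactly.
print(decode_truncated_double(struct.pack(">d", 1.5)[:3]))                     # prints 1.5
print(decode_truncated_double(struct.pack("<d", 1.5)[-3:], big_endian=False))  # prints 1.5
# 0.1 needs the full mantissa; keeping only 4 bytes yields a nearby value.
print(decode_truncated_double(struct.pack(">d", 0.1)[:4]))
```

Values whose low-order mantissa bits are all zero, such as small integers and 1.5, survive truncation exactly. Others, such as 0.1, come back as a nearby value. That loss is the precision traded away when a shorter numeric length is used.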

doc/source/whatsnew/v0.18.0.txt

+8
@@ -24,6 +24,7 @@ Highlights include:
   since 0.14.0. This will now raise a ``TypeError``, see :ref:`here <whatsnew_0180.float_indexers>`.
 - The ``.to_xarray()`` function has been added for compatibility with the
   `xarray package <http://xarray.pydata.org/en/stable/>`__, see :ref:`here <whatsnew_0180.enhancements.xarray>`.
+- The ``read_sas`` function has been enhanced to read ``sas7bdat`` files, see :ref:`here <whatsnew_0180.enhancements.sas>`.
 - Addition of the :ref:`.str.extractall() method <whatsnew_0180.enhancements.extract>`,
   and API changes to the :ref:`.str.extract() method <whatsnew_0180.enhancements.extract>`
   and :ref:`.str.cat() method <whatsnew_0180.enhancements.strcat>`.
@@ -403,6 +404,13 @@ For example, if you have a jupyter notebook you plan to convert to latex using n
 Options ``display.latex.escape`` and ``display.latex.longtable`` have also been added to the configuration and are used automatically by the ``to_latex``
 method. See the :ref:`options documentation<options>` for more info.

+.. _whatsnew_0180.enhancements.sas:
+
+SAS7BDAT files
+^^^^^^^^^^^^^^
+
+Pandas can now read SAS7BDAT files, including compressed files. The files can be read in entirety, or incrementally. For full details see :ref:`here <io.sas>`. (issue:`4052`)
+
 .. _whatsnew_0180.enhancements.other:

 Other enhancements
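
The io.rst change says that, for SAS7BDAT files, format codes may allow date variables to be converted to dates automatically. Underneath, SAS stores dates as days since 1960-01-01 and datetimes as seconds since 1960-01-01 00:00:00. A column left numeric can therefore be converted by hand, as in the minimal stdlib sketch below. The function names are hypothetical, and mapping NaN to `None` assumes that missing values arrive as NaN.

```python
import math
from datetime import datetime, timedelta

# SAS counts dates in days, and datetimes in seconds, from this epoch.
SAS_EPOCH = datetime(1960, 1, 1)


def sas_date_to_date(days):
    """Hypothetical helper: SAS date value (days since 1960-01-01) -> date.

    Assumption: a missing value arrives as NaN and is mapped to None.
    """
    if math.isnan(days):
        return None
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_datetime_to_datetime(seconds):
    """Hypothetical helper: SAS datetime value (seconds since the epoch) -> datetime."""
    if math.isnan(seconds):
        return None
    return SAS_EPOCH + timedelta(seconds=seconds)


print(sas_date_to_date(0))              # prints 1960-01-01
print(sas_date_to_date(366))            # prints 1961-01-01 (1960 was a leap year)
print(sas_datetime_to_datetime(86400))  # prints 1960-01-02 00:00:00
```

With pandas, a vectorised equivalent is `pd.to_datetime(values, unit='D', origin='1960-01-01')`, in releases that support the `origin` argument.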

pandas/io/api.py

+1 -1
@@ -11,7 +11,7 @@
 from pandas.io.json import read_json
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
-from pandas.io.sas import read_sas
+from pandas.io.sas.sasreader import read_sas
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack

pandas/io/sas/__init__.py

Whitespace-only changes.
