SAS xport file reader #9711

Merged
merged 2 commits into from
Aug 14, 2015
9 changes: 9 additions & 0 deletions doc/source/api.rst
@@ -82,6 +82,15 @@ HDFStore: PyTables (HDF5)
HDFStore.get
HDFStore.select

SAS
~~~

.. autosummary::
   :toctree: generated/

   read_sas
   XportReader

SQL
~~~

41 changes: 41 additions & 0 deletions doc/source/io.rst
@@ -41,6 +41,7 @@ object.
* :ref:`read_html<io.read_html>`
* :ref:`read_gbq<io.bigquery>` (experimental)
* :ref:`read_stata<io.stata_reader>`
* :ref:`read_sas<io.sas>`
* :ref:`read_clipboard<io.clipboard>`
* :ref:`read_pickle<io.pickle>`

@@ -4120,6 +4121,46 @@ easy conversion to and from pandas.

.. _xray: http://xray.readthedocs.org/

.. _io.sas:

SAS Format
----------

.. versionadded:: 0.17.0

The top-level function :func:`read_sas` currently can read (but
not write) SAS xport (.XPT) format files. Pandas cannot currently
handle SAS7BDAT files.

XPORT files only contain two value types: ASCII text and double
precision numeric values. There is no automatic type conversion to
integers, dates, or categoricals. By default the whole file is read
and returned as a ``DataFrame``.
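
Because every numeric column arrives as a double, any conversion has to be done by hand after reading. The following is a minimal sketch of that step, not part of this PR. The column names are hypothetical, and the small ``DataFrame`` stands in for the result of ``pd.read_sas('sas_xport.xpt')`` so that the example runs without a file. It assumes the date column holds SAS date values, which count days since 1960-01-01.

```python
import pandas as pd

# Stand-in for pd.read_sas('sas_xport.xpt'): numeric columns come back
# as float64. All column names here are hypothetical.
df = pd.DataFrame({
    "subject_id": [1.0, 2.0, 3.0],
    "visit_date": [20000.0, 20001.0, None],
    "site": ["A", "B", "A"],
})

# A whole-number column with no missing values can be cast to integer.
df["subject_id"] = df["subject_id"].astype("int64")

# Assumption: visit_date holds SAS date values (days since 1960-01-01).
# Missing values become NaT.
sas_epoch = pd.Timestamp("1960-01-01")
df["visit_date"] = sas_epoch + pd.to_timedelta(df["visit_date"], unit="D")

# Text columns with few distinct values can be made categorical.
df["site"] = df["site"].astype("category")
```

A column with missing values cannot be cast to ``int64`` this way, so check for ``NaN`` before converting.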

Specify a ``chunksize`` or use ``iterator=True`` to obtain an
``XportReader`` object for incrementally reading the file. The
``XportReader`` object also has attributes that contain additional
information about the file and its variables.

Read a SAS XPORT file:

.. code-block:: python

    df = pd.read_sas('sas_xport.xpt')
Contributor

you need ``.. ipython:: python`` around each of these blocks to have it execute, or probably better ``.. code-block::`` to have it look like code but not execute


Obtain an iterator and read an XPORT file 100,000 lines at a time:

.. code-block:: python

    rdr = pd.read_sas('sas_xport.xpt', chunksize=100000)
    for chunk in rdr:
        do_something(chunk)
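
As an illustration of what ``do_something`` might look like, here is a hedged sketch that accumulates per-key row counts across chunks, so the whole file never has to be held in memory. The helper ``count_rows_by_key`` and the ``site`` column are hypothetical. A generator of ``DataFrame`` slices stands in for the reader returned by ``pd.read_sas(..., chunksize=...)`` so that the example runs without a file.

```python
from collections import Counter

import pandas as pd


def count_rows_by_key(chunks, key):
    """Count rows per value of `key` over an iterable of DataFrames."""
    totals = Counter()
    for chunk in chunks:
        # value_counts() summarizes one chunk; Counter adds it to the total.
        totals.update(chunk[key].value_counts().to_dict())
    return pd.Series(dict(totals)).sort_index()


# Stand-in for: chunks = pd.read_sas('sas_xport.xpt', chunksize=2)
frame = pd.DataFrame({"site": ["A", "B", "A", "C", "A"]})
chunks = (frame.iloc[i:i + 2] for i in range(0, len(frame), 2))

counts = count_rows_by_key(chunks, "site")
```

The helper only relies on iterating over ``DataFrame`` objects, so under that assumption the reader object can be passed in place of the generator.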

The specification_ for the xport file format is available from the SAS
web site.

.. _specification: https://support.sas.com/techsup/technote/ts140.pdf

.. _io.perf:

Performance Considerations
9 changes: 8 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
@@ -20,6 +20,7 @@ Highlights include:
if they are all ``NaN``, see :ref:`here <whatsnew_0170.api_breaking.hdf_dropna>`
- Support for ``Series.dt.strftime`` to generate formatted strings for datetime-likes, see :ref:`here <whatsnew_0170.strftime>`
- Development installed versions of pandas will now have ``PEP440`` compliant version strings (:issue:`9518`)
- Support for reading SAS xport files, see :func:`~pandas.read_sas`.

Contributor

make this a link to the section below

Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

@@ -37,7 +38,6 @@ New features
- Enable writing complex values to HDF stores when using table format (:issue:`10447`)
- Enable reading gzip compressed files via URL, either by explicitly setting the compression parameter or by inferring from the presence of the HTTP Content-Encoding header in the response (:issue:`8685`)


.. _whatsnew_0170.gil:

Releasing the GIL
@@ -94,6 +94,13 @@ Other enhancements

- Enable `read_hdf` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)

- :func:`~pandas.read_sas` provides support for reading SAS XPORT format files:
Contributor

make this a separate sub-section (use ^^^^^). put a ref so you can refer to it from the highlights above. Add in a link to the actual docs as well (io.sas)


Contributor

add this in the highlights section (at the top) as well (you can put a link to the actual docs there as well)

df = pd.read_sas('sas_xport.xpt')

It is also possible to obtain an iterator and read an XPORT file
incrementally.

- ``DatetimeIndex`` can be instantiated using strings containing ``NaT`` (:issue:`7599`)
- The string parsing of ``to_datetime``, ``Timestamp`` and ``DatetimeIndex`` has been made consistent. (:issue:`7599`)

1 change: 1 addition & 0 deletions pandas/io/api.py
@@ -9,6 +9,7 @@
from pandas.io.json import read_json
from pandas.io.html import read_html
from pandas.io.sql import read_sql, read_sql_table, read_sql_query
from pandas.io.sas import read_sas
from pandas.io.stata import read_stata
from pandas.io.pickle import read_pickle, to_pickle
from pandas.io.packers import read_msgpack, to_msgpack