
Allow reading SAS files from archives #47154


Merged
9 commits merged into pandas-dev:main on Jun 15, 2022

Conversation

jonashaag (Contributor)

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@jreback jreback added the IO SAS SAS: read_sas label Jun 5, 2022
@jreback jreback added this to the 1.5 milestone Jun 5, 2022

jreback commented Jun 5, 2022

cc @bashtage @twoertwein if any comments

@twoertwein (Member) left a comment

Looks good to me! One small optional comment.

@bashtage (Contributor) left a comment

Mostly straightforward. I don't fully see the point of the big docstring refactor, though, since there's no real reuse.


from pandas.io.common import stringify_path

if TYPE_CHECKING:
    from pandas import DataFrame


_doc_read_sas = r"""
Contributor

Seems strange to pull this out just to use Appender. Why not leave it in-line and use Substitution?

Contributor

See, e.g.,

@Substitution(name="groupby")

Contributor Author

Thanks, I copy-pasted that from somewhere else.

Contributor Author

@bashtage now we have the problem that the substituted snippet is dedented:

...
iterator : bool, defaults to False
    If True, returns an iterator for reading the file incrementally.

    .. versionchanged:: 1.2

        ``TextFileReader`` is a context manager.
compression : str or dict, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer' and '%s' is
...

The last line should be indented.

Any pointers on how to deal with this?
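The behavior described above can be reproduced with plain `%`-substitution and the standard library. This is a stdlib-only sketch, not pandas' actual shared docstrings; the snippet text and placeholder name are illustrative, and `textwrap.indent` is one possible workaround, not necessarily what the PR ended up doing:

```python
import textwrap

# The shared snippet is stored dedented, and %-substitution pastes it in
# verbatim, so its continuation line does not pick up the indentation
# expected at the insertion point.
snippet = (
    "compression : str or dict, default 'infer'\n"
    "For on-the-fly decompression of on-disk data."
)

template = (
    "iterator : bool, defaults to False\n"
    "    If True, returns an iterator for reading the file incrementally.\n"
    "%(decompression_options)s"
)

# Reproduces the problem: the snippet's second line lands at column 0.
broken = template % {"decompression_options": snippet}

# One possible workaround: indent every line of the snippet, then strip
# the indent from the first line (which sits right after the placeholder).
fixed = template % {
    "decompression_options": textwrap.indent(snippet, "    ").lstrip()
}
print(fixed)
```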

Contributor

Maybe use the @doc decorator, as here:

@doc(
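For context, a stdlib-only toy version of the approach the @doc decorator takes (dedent the docstring, then fill placeholders via str.format). This is a hedged sketch; `doc_like` is a made-up name and not pandas' actual implementation in `pandas.util._decorators`:

```python
import textwrap

def doc_like(**params):
    # Toy stand-in for a @doc-style decorator: dedent the raw docstring,
    # then fill {placeholders} with str.format.
    def decorator(func):
        func.__doc__ = textwrap.dedent(func.__doc__).format(**params)
        return func
    return decorator

@doc_like(decompression_options="compression : str or dict, default 'infer'")
def read_example(path):
    """
    Read an example file.

    Parameters
    ----------
    {decompression_options}
    """
    return path

print(read_example.__doc__)
```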

def test_sas_archive(datapath):
    fname_uncompressed = datapath("io", "sas", "data", "airline.sas7bdat")
    df_uncompressed = read_sas(fname_uncompressed)
    fname_compressed = datapath("io", "sas", "data", "airline.sas7bdat.gz")
Contributor

Is it worth adding a gzip'd file? Could the gzip file be created on the fly during the test? That would also let you test the other supported formats, e.g. zip.

Contributor Author

We do have .gz files in the repo, so I thought that was the way to go. I can also gzip the file in the test; let me know your preference.


Contributor

I don't think it is essential to use this method; I just wonder about the longer-term future, since it isn't strictly necessary to add compressed versions of files to the main repo. I'm happy to sign off on it as it is now.

Member

My 2c: I would prefer these files to be gzipped on the fly in the test, so that only strictly necessary files live in the main repo. It doesn't have to happen in this PR, but a follow-up would be appreciated.
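The suggested on-the-fly variant could look roughly like the sketch below. `gzip_on_the_fly` is a made-up helper name, and the fake bytes stand in for the `airline.sas7bdat` fixture so the snippet runs anywhere; in the real test, the `datapath`/`tmp_path` pytest fixtures, `read_sas`, and a frame comparison would replace the stand-ins:

```python
import gzip
import os
import shutil
import tempfile

def gzip_on_the_fly(src_path, dst_path):
    # Compress an existing fixture into a .gz file at test time, so no
    # compressed copies need to be committed to the repo.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

# Stand-in demonstration with fake bytes instead of a real SAS file:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "airline.sas7bdat")
    with open(src, "wb") as f:
        f.write(b"fake SAS bytes")
    gzip_on_the_fly(src, src + ".gz")
    with gzip.open(src + ".gz", "rb") as f:
        roundtrip = f.read()
print(roundtrip)  # b'fake SAS bytes'
```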

@@ -107,6 +121,7 @@ def read_sas(
.. versionchanged:: 1.2

``TextFileReader`` is a context manager.
%(decompression_options)s
Contributor

There's no indent here for the %(...)s placeholder.

Contributor Author

See the question in the other comment thread; I think your suggestion is wrong.

@mroeschke mroeschke merged commit 7397adc into pandas-dev:main Jun 15, 2022
@mroeschke (Member)

Thanks @jonashaag

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* Allow reading SAS files from archives

* Add missing file

* Review feedback

* Fix

Co-authored-by: Jeff Reback <[email protected]>