Allow reading SAS files from archives #47154
Conversation
cc @bashtage @twoertwein if any comments
Looks good to me! One small optional comment.
Mostly straightforward. I don't fully see the point of the big docstring refactor, since there's no real reuse.
pandas/io/sas/sasreader.py
Outdated
from pandas.io.common import stringify_path

if TYPE_CHECKING:
    from pandas import DataFrame


_doc_read_sas = r"""
Seems strange to pull this out just to use Appender. Why not leave it in-line and use Substitution?
See, e.g., pandas/core/groupby/groupby.py, line 2425 in c5e32b5:

@Substitution(name="groupby")
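For context, a minimal sketch of the two patterns being discussed (decorators from pandas.util._decorators; the fragment text and function names here are illustrative, not the actual shared docstring):

```python
from pandas.util._decorators import Appender, Substitution

# Illustrative shared fragment, not the real shared docstring.
_compression_fragment = "compression : str or dict, default 'infer'"

# Appender pattern: the full docstring lives in a module-level template
# and is attached to the function afterwards.
_doc_read_sas = f"""
Read a SAS file.

Parameters
----------
{_compression_fragment}
"""


@Appender(_doc_read_sas)
def read_sas_via_appender(path):
    ...


# Substitution pattern: the docstring stays in-line on the function and
# only the shared fragment is %-substituted into it.
@Substitution(decompression_options=_compression_fragment)
def read_sas_via_substitution(path):
    """
    Read a SAS file.

    Parameters
    ----------
    %(decompression_options)s
    """
```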
Thanks, I copy-pasted that from somewhere else.
@bashtage now we have the problem that the substituted snippet is dedented:

...
iterator : bool, defaults to False
    If True, returns an iterator for reading the file incrementally.

    .. versionchanged:: 1.2

        ``TextFileReader`` is a context manager.
compression : str or dict, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer' and '%s' is
...

The last line should be indented.
Any pointers how to deal with this?
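A minimal reproduction of the effect (illustrative strings only): %-substitution inserts the fragment verbatim and does not re-indent its continuation lines, so a fragment stored without indentation comes out flush left.

```python
# Illustrative fragment whose continuation line carries no indentation.
fragment = (
    "compression : str or dict, default 'infer'\n"
    "For on-the-fly decompression of on-disk data."
)

template = """\
iterator : bool, defaults to False
    If True, returns an iterator for reading the file incrementally.
%(decompression_options)s
"""

print(template % {"decompression_options": fragment})
# The "For on-the-fly ..." line is printed flush left, matching the
# dedented docstring shown above.
```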
Maybe use the @doc decorator, as in here:

Line 49 in fa7e31b

@doc(
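A rough sketch of what that could look like here (hypothetical fragment and names; the @doc decorator from pandas.util._decorators uses str.format-style placeholders):

```python
from pandas.util._decorators import doc

# Hypothetical shared fragment; in pandas this would come from the
# shared docstring registry rather than being defined locally.
_decompression_options = """\
compression : str or dict, default 'infer'
    For on-the-fly decompression of on-disk data."""


@doc(decompression_options=_decompression_options)
def read_sas(filepath_or_buffer):
    """
    Read SAS files stored as either XPORT or SAS7BDAT format files.

    Parameters
    ----------
    {decompression_options}
    """
```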
def test_sas_archive(datapath):
    fname_uncompressed = datapath("io", "sas", "data", "airline.sas7bdat")
    df_uncompressed = read_sas(fname_uncompressed)
    fname_compressed = datapath("io", "sas", "data", "airline.sas7bdat.gz")
Is it worth adding a gzip'd file? Could the gzip file be created on the fly during the test? This would also let you test other supported formats, e.g. zip.
We do have .gz files in the repo, so I thought that's the way to go. I can also gzip the file in the test; let me know what your preference is.
I don't think it is essential to use this method - I just wonder about the longer-term future, since it isn't strictly necessary to add compressed versions of files to the main repo. I'm happy to sign off on it as it is now.
My 2c: I would prefer these files to be gzipped on the fly in the test, so that only absolutely necessary files are added to the main repo. It doesn't have to be in this PR, but a follow-up would be appreciated.
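A sketch of what such a follow-up could look like (hypothetical test name; assumes pytest's tmp_path fixture alongside the existing datapath fixture):

```python
import gzip
import shutil

import pandas._testing as tm
from pandas import read_sas


def test_sas_archive_on_the_fly(datapath, tmp_path):
    fname_uncompressed = datapath("io", "sas", "data", "airline.sas7bdat")

    # Compress the existing fixture during the test instead of committing
    # a .gz copy to the repo.
    fname_compressed = tmp_path / "airline.sas7bdat.gz"
    with open(fname_uncompressed, "rb") as src:
        with gzip.open(fname_compressed, "wb") as dst:
            shutil.copyfileobj(src, dst)

    df_uncompressed = read_sas(fname_uncompressed)
    df_compressed = read_sas(fname_compressed, format="sas7bdat")
    tm.assert_frame_equal(df_uncompressed, df_compressed)
```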
pandas/io/sas/sasreader.py
Outdated
@@ -107,6 +121,7 @@ def read_sas(
         .. versionchanged:: 1.2

             ``TextFileReader`` is a context manager.
+    %(decompression_options)s
No indent here for the %(...)
See question in other comment thread. I think your suggestion is wrong.
Thanks @jonashaag
* Allow reading SAS files from archives
* Add missing file
* Review feedback
* Fix

Co-authored-by: Jeff Reback <[email protected]>