ENH: Add class to write dta format 117 files #20844
Merged
Commits (5)
d54541a ENH: Add class to write dta format 117 files (bashtage)
900c9f7 DOC: Clean up doc strings (bashtage)
a5f1653 BUG: Fix bugs in stata (bashtage)
4397ae7 CLN: Fix imports (bashtage)
2d54ded Merge remote-tracking branch 'upstream/master' into bashtage-strl-sup… (TomAugspurger)
@@ -1769,27 +1769,28 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',

     def to_stata(self, fname, convert_dates=None, write_index=True,
                  encoding="latin-1", byteorder=None, time_stamp=None,
-                 data_label=None, variable_labels=None):
+                 data_label=None, variable_labels=None, version=114,
+                 convert_strl=None):
         """
-        A class for writing Stata binary dta files from array-like objects
+        Export Stata binary dta files.

         Parameters
         ----------
         fname : str or buffer
-            String path of file-like object
+            String path of file-like object.
         convert_dates : dict
             Dictionary mapping columns containing datetime types to stata
             internal format to use when writing the dates. Options are 'tc',
             'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
             or a name. Datetime columns that do not have a conversion type
             specified will be converted to 'tc'. Raises NotImplementedError if
-            a datetime column has timezone information
+            a datetime column has timezone information.
         write_index : bool
             Write the index to Stata dataset.
         encoding : str
-            Default is latin-1. Unicode is not supported
+            Default is latin-1. Unicode is not supported.
         byteorder : str
-            Can be ">", "<", "little", or "big". default is `sys.byteorder`
+            Can be ">", "<", "little", or "big". default is `sys.byteorder`.
         time_stamp : datetime
             A datetime to use as file creation date. Default is the current
             time.
@@ -1801,6 +1802,23 @@ def to_stata(self, fname, convert_dates=None, write_index=True,

             .. versionadded:: 0.19.0

+        version : {114, 117}
+            Version to use in the output dta file. Version 114 can be used
+            read by Stata 10 and later. Version 117 can be read by Stata 13
+            or later. Version 114 limits string variables to 244 characters or
+            fewer while 117 allows strings with lengths up to 2,000,000
+            characters.
+
+            .. versionadded:: 0.23.0
+
+        convert_strl : list, optional
+            List of column names to convert to string columns to Stata StrL
+            format. Only available if version is 117. Storing strings in the
+            StrL format can produce smaller dta files if strings have more than
+            8 characters and values are repeated.
+
+            .. versionadded:: 0.23.0
+
         Raises
         ------
         NotImplementedError
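The string-length limits stated in the `version` docstring can be illustrated with a small helper. This is only a sketch: `minimal_dta_version` and the two constants are hypothetical names, not part of pandas, and it assumes the limits are counted in characters as the docstring words them.

```python
# Limits as stated in the docstring added by this PR.
MAX_STR_114 = 244        # format 114: strings of 244 characters or fewer
MAX_STR_117 = 2000000    # format 117: strings up to 2,000,000 characters


def minimal_dta_version(values):
    """Hypothetical helper: lowest format version that fits every string."""
    # An empty input has no strings to constrain the choice.
    longest = max((len(v) for v in values), default=0)
    if longest <= MAX_STR_114:
        return 114
    if longest <= MAX_STR_117:
        return 117
    raise ValueError(
        'String of length {} exceeds the format 117 limit'.format(longest))
```

A caller could use the returned value as the `version` argument when the data contains long text columns.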
@@ -1814,6 +1832,12 @@ def to_stata(self, fname, convert_dates=None, write_index=True,

             .. versionadded:: 0.19.0

+        See Also
+        --------
+        pandas.read_stata : Import Stata data files
+        pandas.io.stata.StataWriter : low-level writer for Stata data files
+        pandas.io.stata.StataWriter117 : low-level writer for version 117 files
+
         Examples
         --------
         >>> data.to_stata('./data_file.dta')
@@ -1832,12 +1856,23 @@ def to_stata(self, fname, convert_dates=None, write_index=True,
         >>> writer = StataWriter('./date_data_file.dta', data, {2 : 'tw'})
         >>> writer.write_file()
         """
-        from pandas.io.stata import StataWriter
-        writer = StataWriter(fname, self, convert_dates=convert_dates,
+        kwargs = {}
+        if version not in (114, 117):
+            raise ValueError('Only formats 114 and 117 supported.')

Review comment: Would be nice to include the user passed
Reply: I can push in a little bit.

+        if version == 114:
+            if convert_strl is not None:
+                raise ValueError('strl support is only available when using '
+                                 'format 117')
+            from pandas.io.stata import StataWriter as statawriter
+        else:
+            from pandas.io.stata import StataWriter117 as statawriter
+            kwargs['convert_strl'] = convert_strl
+
+        writer = statawriter(fname, self, convert_dates=convert_dates,
                              encoding=encoding, byteorder=byteorder,
                              time_stamp=time_stamp, data_label=data_label,
                              write_index=write_index,
-                             variable_labels=variable_labels)
+                             variable_labels=variable_labels, **kwargs)
         writer.write_file()

     def to_feather(self, fname):
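The branching added to `to_stata` can be tried out without pandas by isolating it in a small function. The sketch below mirrors the checks in the diff; `FakeWriter114`, `FakeWriter117`, and `select_writer` are hypothetical stand-ins introduced only for illustration, not pandas classes or functions.

```python
# Hypothetical stand-ins for pandas.io.stata.StataWriter and StataWriter117.
# They only record their arguments so the dispatch can run without pandas.
class FakeWriter114:
    def __init__(self, fname, data, **kwargs):
        self.fname = fname
        self.data = data
        self.kwargs = kwargs


class FakeWriter117(FakeWriter114):
    pass


def select_writer(version=114, convert_strl=None):
    """Return (writer_class, extra_kwargs) using the same checks as the diff."""
    kwargs = {}
    if version not in (114, 117):
        raise ValueError('Only formats 114 and 117 supported.')
    if version == 114:
        # StrL columns exist only in format 117, so reject the request early.
        if convert_strl is not None:
            raise ValueError('strl support is only available when using '
                             'format 117')
        return FakeWriter114, kwargs
    # Only the 117 writer receives convert_strl.
    kwargs['convert_strl'] = convert_strl
    return FakeWriter117, kwargs


cls, extra = select_writer(version=117, convert_strl=['notes'])
writer = cls('out.dta', data=None, **extra)
```

Building `kwargs` separately means the format 114 writer never receives a `convert_strl` argument. With the signature shown in the diff, the corresponding user-facing call would look like `data.to_stata('./data_file.dta', version=117, convert_strl=['notes'])`, where `'notes'` is a made-up column name.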
Hmm, should these be strings? Is this exposed anywhere in Stata itself? Do they use integers? (When I see a version number, I think string.)
Huh, https://www.stata.com/support/faqs/data-management/save-for-previous-version/ seems to suggest that Stata uses integers? version(13). OK then, let's follow that.
Could use 10 and 13 which are the Stata release versions.
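The alternative raised in the last comment, accepting Stata release numbers instead of dta format numbers, would amount to a lookup like the one below. This is a sketch of that idea only, not something the PR implements; `RELEASE_TO_DTA_FORMAT` and `dta_format_for_release` are hypothetical names, and the pairing of releases with formats follows the docstring in the diff.

```python
# Pairing taken from the docstring in this PR: format 114 is readable by
# Stata 10 and later, format 117 by Stata 13 and later.
RELEASE_TO_DTA_FORMAT = {10: 114, 13: 117}


def dta_format_for_release(release):
    """Hypothetical helper: translate a Stata release into a dta format."""
    try:
        return RELEASE_TO_DTA_FORMAT[release]
    except KeyError:
        raise ValueError(
            'Unsupported Stata release {!r}; expected one of {}'.format(
                release, sorted(RELEASE_TO_DTA_FORMAT)))
```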