-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add Stata 119 writer #30959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add Stata 119 writer #30959
Conversation
The rename should be easy now since it isn't released yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good a few minor things
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -224,7 +224,7 @@ Other enhancements | |||
- :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` have gained ``ignore_index`` keyword to be able to reset index after sorting (:issue:`30114`) | |||
- :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` have gained ``ignore_index`` keyword to reset index (:issue:`30114`) | |||
- :meth:`DataFrame.drop_duplicates` has gained ``ignore_index`` keyword to reset index (:issue:`30114`) | |||
- Added new writer for exporting Stata dta files in version 118, ``StataWriter118``. This format supports exporting strings containing Unicode characters (:issue:`23573`) | |||
- Added new writer for exporting Stata dta files in version 118 and 119, ``StataWriterUTF8``. This format supports exporting strings containing Unicode characters (:issue:`23573`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already released 1.0 so will want to move to the 1.1 whatsnew when that comes available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed that. This is unfortunate. Are deprecation cycles still OK since this renames a class to avoid creating a trivial class for the 119 version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so you want to rename what was added in 1.0 to this right? Maybe OK as an exception if you would just immediately deprecate anyway but cc @TomAugspurger for thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the idea -- it is a very trivial change to enable the new format, but StataWriter118 doesn't make sense when it supports format 119 too.
I've gone down the deprecation path to 2.0 deprecation. TBH I don't think (m)any will use StataWriter118 directly since to_stata
contains virtually all of the functionality.
I can revert this if quick deprecation/renaming is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's simpler I think OK to just change since this was just released in the RC and not do a deprecation cycle, but @jreback as well for thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can get a restored version back pretty quickly. I think avoiding deprecation would be good if it isn't too controversial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, StataWriter118
was new in 1.0? If so, then I think we're OK to change the name for 1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger yes, it is new in 1.0.0. I have moved back to changing the class name for 1.0.0 with no reference for 1.1.0, and so no deprecation.
>>> writer.write_file() | ||
""" | ||
|
||
_encoding = "utf-8" | ||
_dta_version = 118 | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you annotate new functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look at what I used and see if it looks right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations are not blocking here, nice to have sure
699b44f
to
287ca23
Compare
I can squash/rebase if there is a consensus that targetting 1.0.0 is a good idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments; do you have sufficient tests coverage for all of the added cases?
pandas/core/frame.py
Outdated
if version == 114: | ||
if convert_strl is not None: | ||
raise ValueError("strl is not supported in format 114") | ||
from pandas.io.stata import StataWriter as statawriter | ||
else: | ||
if version == 117: | ||
from pandas.io.stata import StataWriter117 as statawriter | ||
else: | ||
from pandas.io.stata import StataWriter118 as statawriter | ||
else: # versions 118 and 119 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this just if/elif/else and not a nested if/else
>>> writer.write_file() | ||
""" | ||
|
||
_encoding = "utf-8" | ||
_dta_version = 118 | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations are not blocking here, nice to have sure
I think ok to do this for 1.0.0 |
is there an issue that this would close? if so can you update the top |
Add support for writing Stata 119 format files Improve exception message for unsupported files Fix small issues in to_stata missed in 118 Add annotations
287ca23
to
e4eaecd
Compare
Coverage of new code ( |
There is no issue open for this. I cross-referenced #23573 which is the closest, but is already closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. i think its worth providing a fixture for version. ping on green.
Thanks @bashtage |
Co-authored-by: Kevin Sheppard <[email protected]>
Add support for writing Stata 119 format files
Rename new writer to StataWriterUTF8 since no longer version-specific
Improve exception message for unsupported files
Fix small issues in to_stata missed in 118
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff