Skip to content

ENH: Add Stata 119 writer #30959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 14, 2020
Merged

Conversation

bashtage
Copy link
Contributor

@bashtage bashtage commented Jan 13, 2020

Add support for writing Stata 119 format files
Rename new writer to StataWriterUTF8 since no longer version-specific
Improve exception message for unsupported files
Fix small issues in to_stata missed in 118

@bashtage
Copy link
Contributor Author

The rename should be easy now since it isn't released yet.

@simonjayhawkins simonjayhawkins added Enhancement IO Stata read_stata, to_stata labels Jan 13, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good a few minor things

@@ -224,7 +224,7 @@ Other enhancements
- :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` have gained ``ignore_index`` keyword to be able to reset index after sorting (:issue:`30114`)
- :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` have gained ``ignore_index`` keyword to reset index (:issue:`30114`)
- :meth:`DataFrame.drop_duplicates` has gained ``ignore_index`` keyword to reset index (:issue:`30114`)
- Added new writer for exporting Stata dta files in version 118, ``StataWriter118``. This format supports exporting strings containing Unicode characters (:issue:`23573`)
- Added new writer for exporting Stata dta files in version 118 and 119, ``StataWriterUTF8``. This format supports exporting strings containing Unicode characters (:issue:`23573`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already released 1.0 so will want to move to the 1.1 whatsnew when that comes available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that. This is unfortunate. Are deprecation cycles still OK since this renames a class to avoid creating a trivial class for the 119 version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so you want to rename what was added in 1.0 to this right? Maybe OK as an exception if you would just immediately deprecate anyway but cc @TomAugspurger for thoughts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the idea -- it is a very trivial change to enable the new format, but StataWriter118 doesn't make sense when it supports format 119 too.

I've gone down the deprecation path to 2.0 deprecation. TBH I don't think (m)any will use StataWriter118 directly since to_stata contains virtually all of the functionality.

I can revert this if quick deprecation/renaming is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's simpler I think OK to just change since this was just released in the RC and not do a deprecation cycle, but @jreback as well for thoughts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can get a restored version back pretty quickly. I think avoiding deprecation would be good if it isn't too controversial.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, StataWriter118 was new in 1.0? If so, then I think we're OK to change the name for 1.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger yes, it is new in 1.0.0. I have moved back to changing the class name for 1.0.0 with no reference for 1.1.0, and so no deprecation.

>>> writer.write_file()
"""

_encoding = "utf-8"
_dta_version = 118

def __init__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you annotate new functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look at what I used and see if it looks right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations are not blocking here, nice to have sure

@bashtage bashtage force-pushed the stata-writer-119 branch 2 times, most recently from 699b44f to 287ca23 Compare January 13, 2020 23:45
@bashtage
Copy link
Contributor Author

I can squash/rebase if there is a consensus that targetting 1.0.0 is a good idea.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments; do you have sufficient tests coverage for all of the added cases?

if version == 114:
if convert_strl is not None:
raise ValueError("strl is not supported in format 114")
from pandas.io.stata import StataWriter as statawriter
else:
if version == 117:
from pandas.io.stata import StataWriter117 as statawriter
else:
from pandas.io.stata import StataWriter118 as statawriter
else: # versions 118 and 119
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this just if/elif/else and not a nested if/else

>>> writer.write_file()
"""

_encoding = "utf-8"
_dta_version = 118

def __init__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations are not blocking here, nice to have sure

@jreback
Copy link
Contributor

jreback commented Jan 14, 2020

I think ok to do this for 1.0.0

@jreback jreback added this to the 1.0.0 milestone Jan 14, 2020
@jreback
Copy link
Contributor

jreback commented Jan 14, 2020

is there an issue that this would close? if so can you update the top

Add support for writing Stata 119 format files
Improve exception message for unsupported files
Fix small issues in to_stata missed in 118
Add annotations
@bashtage
Copy link
Contributor Author

do you have sufficient tests coverage for all of the added cases?

Coverage of new code (StataWriterUTF8) introduced in 1.0.0 is 100%.

@bashtage
Copy link
Contributor Author

is there an issue that this would close? if so can you update the top

There is no issue open for this. I cross-referenced #23573 which is the closest, but is already closed.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. i think its worth providing a fixture for version. ping on green.

@bashtage bashtage requested a review from WillAyd January 14, 2020 14:22
@WillAyd WillAyd merged commit d78c061 into pandas-dev:master Jan 14, 2020
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 14, 2020
@WillAyd
Copy link
Member

WillAyd commented Jan 14, 2020

Thanks @bashtage

simonjayhawkins pushed a commit that referenced this pull request Jan 14, 2020
@bashtage bashtage deleted the stata-writer-119 branch July 28, 2020 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants