-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: #22896, Fixed docstring of to_stata in pandas/core/frame.py #23152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @ryankarlos! Thanks for updating the PR.
Comment last updated on November 20, 2018 at 21:28 Hours UTC |
Ive followed the docstring format here https://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html Ive added in a returns section for to_stata in pandas/core/frame.py, corrected linting issues and modified the examples as some of the variables (mainly for the date example) were not defined and hence failing some of the tests. For some reason, the tests were throwing up linting errors for an unrelated older pull request - but Ive corrected these as well. |
Thanks for the work on this @ryankarlos Can you leave in this PR just the changes in the I also see that there are PEP-8 issues in the examples, and may be they can be simplified a bit. |
Codecov Report
@@ Coverage Diff @@
## master #23152 +/- ##
=======================================
Coverage 92.29% 92.29%
=======================================
Files 161 161
Lines 51493 51493
=======================================
Hits 47524 47524
Misses 3969 3969
Continue to review full report at Codecov.
|
pandas/core/frame.py
Outdated
>>> dates_1 = pd.to_datetime(["2016-05-06","2017-05-07", "2018-05-10"]) | ||
>>> dates_2 = pd.to_datetime(["2016-11-06","2017-11-07", "2018-11-10"]) | ||
>>> df_dates = pd.DataFrame({'date1': dates_1, 'date2': dates_2}) | ||
>>> df_dates.to_stata('./date_data_file.dta', {2 : 'tw'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the {2: 'tw'}
means? I know this is part of the original docstring, but I think we should make the examples in a way that they explain what is going on. I don't know much about stata or this function, and this without explanation is not useful to me.
Also, would be great to simplify this. May be just a date column if that makes sense, without creating variables first, and may be using date_range
to avoid so much information that does not add value. And if we can create a single DataFrame at the beginning that is useful for all functions, that would be much better.
Finally, if we can use something looking more like a real example. I think it gives a better impression and helps understand than a [1, 2, 3]. We've got some examples with animals in pandas/core/generic.py
, I'd use one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tw
is a Stata specific option for displaying time in weeks i believe - i have updated it with a single data frame using the animals examples from pandas/core/generic.py
and excluded the separate example specifically for dates (i agree it does not really add any value).
pandas/core/frame.py
Outdated
|
||
Alternatively you can create an instance of the StataWriter class | ||
|
||
>>> writer = StataWriter('./data_file.dta', data) | ||
>>> StataWriter = pd.io.stata.StataWriter | ||
>>> writer = StataWriter('./data_file.dta', df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use instead (assuming you're using the animals data):
>>> writer = pd.io.stata.StataWriter('animals.dta', df)
2016051
to
a0ca4d7
Compare
@datapythonista Sorry for that, I have undone the changes to the other docstring and only left the |
@datapythonista let me know if there are any updates required |
Looks good, added some comments. |
@datapythonista Did you add comments to the latest commit - as I cannot find anything ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't have a computer right now, and it's tricky to do the reviews from the phone. I think the comments should be added now.
pandas/core/frame.py
Outdated
@@ -1846,12 +1846,15 @@ def to_stata(self, fname, convert_dates=None, write_index=True, | |||
data_label=None, variable_labels=None, version=114, | |||
convert_strl=None): | |||
""" | |||
Export Stata binary dta files. | |||
Converting data frame object to Stata dta format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use DataFrame instead of data frame.
And dta between quotes to be consistent with the next paragraph (or probably better to remove them there).
pandas/core/frame.py
Outdated
Export Stata binary dta files. | ||
Converting data frame object to Stata dta format. | ||
|
||
Writes the Dataframe to a Stata dataset file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame with capital F
pandas/core/frame.py
Outdated
Converting data frame object to Stata dta format. | ||
|
||
Writes the Dataframe to a Stata dataset file. | ||
"dta" files contain a Stata dataset. | ||
|
||
Parameters | ||
---------- | ||
fname : path (string), buffer or path object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types in this line should be Python types. I think in this case should be: str, file descriptor or pathlib.Path
pandas/core/frame.py
Outdated
@@ -1896,6 +1899,10 @@ def to_stata(self, fname, convert_dates=None, write_index=True, | |||
|
|||
.. versionadded:: 0.23.0 | |||
|
|||
Returns | |||
------ | |||
Stata (dta) file or StataWriter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the Python returned type of the function. I don't think it returns anything. If this is the case just remove the section.
pandas/core/frame.py
Outdated
|
||
Examples | ||
-------- | ||
>>> data.to_stata('./data_file.dta') | ||
Converting dataframe with date column to Stata dta file | ||
using the to_stata method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea about stata, but I don't think having a date column and indices adds any value here unless we can explain why.
I'd use a much simpler example. And I'd leave the StataWriter example to that docstring.
@datapythonista Thanks for the comments - incorporated the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @ryankarlos, I've just seen that I didn't submit the comments of the last review (I did it from the phone, and it's tricky).
Besides those, can you edit ci/code_checks.py
and in the part of the doctests, stop excluding this docstring, as they should now pass.
Thanks!
pandas/core/frame.py
Outdated
... 'parrot'], | ||
... 'speed': [350, 18, 361, 15]}).set_index(['date', | ||
... 'animal']) | ||
... 'speed': [350, 18, 361, 15]}) | ||
>>> df.to_stata('animals.dta') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add # doctest: +SKIP
so the test is not run, and the file is not generated and left in the disk.
pandas/core/frame.py
Outdated
then the buffer will not be automatically closed after the file | ||
data has been written. | ||
fname : str, file descriptor or pathlib.Path | ||
String or path to file which needs to be converted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the original description more.
pandas/core/frame.py
Outdated
pandas.io.stata.StataWriter117 : low-level writer for version 117 files | ||
pandas.read_stata : Import Stata data files. | ||
pandas.io.stata.StataWriter : Writer for Stata data files. | ||
pandas.io.stata.StataWriter117 : Writer for version 117 files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pandas.
prefix is unnecessary.
pandas/core/frame.py
Outdated
>>> writer.write_file() | ||
|
||
With dates: | ||
Converting DataFrame to Stata "dta" file using the to_stata method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is obvious given the context, I'd remove it.
2e4c2cb
to
42fec06
Compare
@datapythonista added requested changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Can you add optional
to the data_label
parameter, and the default to version : {114, 117}, default 114
.
I think with that is ready to be merged (if you can also take a look at the CI and make sure it's green, so can merge).
42fec06
to
994f0b9
Compare
@datapythonista Thanks, changes added - all green now |
lgtm, just changed a word. I'll wait for for someone else to merge this, so we have an extra pair of eyes on it. Thanks @ryankarlos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny comments. @bashtage if you can have a look.
d67588e
to
64e02d0
Compare
@jreback All green now |
thanks @ryankarlos |
git diff upstream/master -u -- "*.py" | flake8 --diff
./scripts/validate_docstrings.py pandas.DataFrame.to_stata