-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the pandas.DataFrame.to_sparse docstring #20193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @gioiab! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 10, 2018 at 03:48 Hours UTC |
pandas/core/frame.py
Outdated
|
||
Implement the sparse version of the DataFrame meaning that any data matching | ||
a specific value it's omitted in the representation. The sparse DataFrame takes | ||
less memory on disk when pickled and in the Python interpreter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just say its has more efficient storage.
pandas/core/frame.py
Outdated
kind : {'block', 'integer'} | ||
The kind of the SparseIndex tracking where data has been omitted. | ||
The block kind is recommended since it’s more memory efficient: | ||
it tracks just the locations and sizes of the blocks of data that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you break these into 2 bullet points
pandas/core/frame.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.DataFrame.to_dense: converts the DataFrame back to the its dense form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.SparseDataFrame.to_dense
@jreback I've implemented the changes you requested, I'm just waiting the CI to finish. Can you get a final look? |
@jreback the Appveyor build failed, together with at least 50 other ones, all with the same error message: I don't have a retry button to queue another build. Could you please help me on this? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments. lgtm.
pandas/core/frame.py
Outdated
the fill value: | ||
|
||
- 'block' tracks only the locations and sizes of blocks of data; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no blank lines between cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/core/frame.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.SparseDataFrame.to_dense : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need pandas. here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to DataFrame.to_dense
instead of pandas.SparseDataFrame.to_dense
: I've built the entire html documentation and the proper hyperlink is generated correctly in this way.
@jreback can I help you in some way in closing this? :) |
@gioiab can you rebase @datapythonista if you'd have a look |
Codecov Report
@@ Coverage Diff @@
## master #20193 +/- ##
=========================================
Coverage ? 91.84%
=========================================
Files ? 153
Lines ? 49275
Branches ? 0
=========================================
Hits ? 45255
Misses ? 4020
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. Great changes, I added some comments with formatting things and some ideas.
pandas/core/frame.py
Outdated
See Also | ||
-------- | ||
DataFrame.to_dense : | ||
converts the DataFrame back to the its dense form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you capitalize the first letter and add a period at the end?
I think the description should start in the same line as to_dense
and continue to the next line indented when it doesn't fit.
pandas/core/frame.py
Outdated
- 'block' tracks only the locations and sizes of blocks of data; | ||
- 'integer' keeps an array with all the locations of the data. | ||
|
||
The kind 'block' is recommended since it's more memory efficient. | ||
|
||
Returns | ||
------- | ||
y : SparseDataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of the y
and just leave the type. Also, may be it's a bit redundant in this case, but for consistency I'd add the description of what is returned.
pandas/core/frame.py
Outdated
The kind of the SparseIndex tracking where data is not equal to | ||
the fill value: | ||
|
||
- 'block' tracks only the locations and sizes of blocks of data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the semi-colon at the end is intentional. If you find another docstring with a list, I'd use the same convention.
pandas/core/frame.py
Outdated
>>> df.iloc[:995] = 0. | ||
>>> sdf = df.to_sparse(fill_value=0.) | ||
>>> sdf.density | ||
0.005 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice example. It's just an opinion, feel free to leave it like this, if you think that being somehow more realistic is better, but I'd have something like: pd.DataFrame([np.nan, np.nan, 1., np.nan])
. You can make it sparse with default arguments, then create another example with zeros instead of NaN
and use fill_value
. And if you find a nice way to illustrate the difference, I'd add an example with kind='integer'
.
pandas/core/frame.py
Outdated
|
||
Parameters | ||
---------- | ||
fill_value : float, default NaN | ||
The specific value that should be omitted in the representation. | ||
kind : {'block', 'integer'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is missing.
pandas/core/frame.py
Outdated
- 'block' tracks only the locations and sizes of blocks of data; | ||
- 'integer' keeps an array with all the locations of the data. | ||
|
||
The kind 'block' is recommended since it's more memory efficient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the same, but in a way that doesn't sound like 'block' is always better. Something like "In most cases, the default 'block' is preferred for being more memory efficient.".
Not a big difference, but I'd prefer to avoid giving the idea that 'integer' is never the best option, which is not true.
Thanks @gioiab for the contribution. And sorry it took a while to merge it. |
* Updates the documentation for pandas.DataFrame.to_sparse. * Minor fixes and adding more real world examples
Updates the docstring for the to_sparse function.
Here is the output of the validation script: