DOC: update the pandas.DataFrame.to_sparse docstring #20193

gioiab · 2018-03-10T16:05:40Z

Updates the docstring for the to_sparse function.

Here is the output of the validation script:


################################################################################
#################### Docstring (pandas.DataFrame.to_sparse) ####################
################################################################################

Convert to SparseDataFrame.

Implement the sparse version of the DataFrame meaning that any data matching
a specific value it's omitted in the representation. The sparse DataFrame takes
less memory on disk when pickled and in the Python interpreter.

Parameters
----------
fill_value : float, default NaN
    The specific value that should be omitted in the representation.
kind : {'block', 'integer'}
    The kind of the SparseIndex tracking where data has been omitted.
    The block kind is recommended since it’s more memory efficient:
    it tracks just the locations and sizes of the blocks of data that
    are not equal to the fill value while the integer kind keeps an
    array with all those locations.

Returns
-------
y : SparseDataFrame

See Also
--------
pandas.DataFrame.to_dense: converts the DataFrame back to the its dense form

Examples
--------

Compressing on the zero value.

>>> df = pd.DataFrame(np.random.randn(1000, 4))
>>> df.iloc[:995] = 0.
>>> sdf = df.to_sparse(fill_value=0.)
>>> sdf.density
0.005

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.to_sparse" correct. :)
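
The 0.005 density in the example above can be checked by hand: 5 of the 1000 rows keep their values, so 20 of the 4000 cells differ from the fill value. A minimal pure-Python sketch of that arithmetic follows. The `density` helper is hypothetical and is not part of pandas, and fixed non-zero values stand in for the random data.

```python
# Hypothetical helper, not a pandas API: the fraction of entries that differ
# from the fill value, i.e. the entries a sparse structure has to store.
def density(rows, fill_value=0.0):
    total = sum(len(row) for row in rows)
    stored = sum(1 for row in rows for value in row if value != fill_value)
    return stored / total


# Mirror the docstring example: 1000 rows x 4 columns, first 995 rows zeroed.
# The constant 1.5 replaces the random values so the result is deterministic.
rows = [[0.0] * 4 for _ in range(995)] + [[1.5] * 4 for _ in range(5)]

result = density(rows)
print(result)  # 20 stored cells out of 4000
```
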

pep8speaks · 2018-03-10T16:05:43Z

Hello @gioiab! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 10, 2018 at 03:48 Hours UTC

jreback · 2018-03-10T17:14:25Z

pandas/core/frame.py

+
+        Implement the sparse version of the DataFrame meaning that any data matching
+        a specific value it's omitted in the representation. The sparse DataFrame takes
+        less memory on disk when pickled and in the Python interpreter.


You can just say it has more efficient storage.

jreback · 2018-03-10T17:14:52Z

pandas/core/frame.py

        kind : {'block', 'integer'}
+            The kind of the SparseIndex tracking where data has been omitted.
+            The block kind is recommended since it’s more memory efficient:
+            it tracks just the locations and sizes of the blocks of data that


Can you break these into two bullet points?

jreback · 2018-03-10T17:15:13Z

pandas/core/frame.py

+
+        See Also
+        --------
+        pandas.DataFrame.to_dense: converts the DataFrame back to the its dense form


pandas.SparseDataFrame.to_dense

gioiab · 2018-03-10T18:38:13Z

@jreback I've implemented the changes you requested; I'm just waiting for the CI to finish. Can you take a final look?

gioiab · 2018-03-10T23:47:05Z

@jreback the Appveyor build failed, together with at least 50 other ones, all with the same error message: Command executed with exception: Cannot index into a null array.

I don't have a retry button to queue another build. Could you please help me with this? Thanks!

jreback

minor comments. lgtm.

jreback · 2018-03-11T15:07:27Z

pandas/core/frame.py

+            the fill value:
+
+            - 'block' tracks only the locations and sizes of blocks of data;
+


no blank lines between cases

jreback · 2018-03-11T15:07:43Z

pandas/core/frame.py

+
+        See Also
+        --------
+        pandas.SparseDataFrame.to_dense :


You don't need the pandas. prefix here.

gioiab · 2018-03-11

I changed this to DataFrame.to_dense rather than pandas.SparseDataFrame.to_dense. I built the entire HTML documentation, and the hyperlink is generated correctly this way.

gioiab · 2018-03-28T13:00:56Z

@jreback can I help you in some way in closing this? :)

jreback · 2018-04-14T13:50:20Z

@gioiab can you rebase

@datapythonista if you'd have a look

codecov · 2018-04-14T23:29:23Z

Codecov Report

❗ No coverage uploaded for pull request base (master@44691ee).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20193   +/-   ##
=========================================
  Coverage          ?   91.84%           
=========================================
  Files             ?      153           
  Lines             ?    49275           
  Branches          ?        0           
=========================================
  Hits              ?    45255           
  Misses            ?     4020           
  Partials          ?        0

Flag        Coverage Δ
#multiple   90.23% <ø> (?)
#single     41.91% <ø> (?)

Impacted Files         Coverage Δ
pandas/core/frame.py   97.15% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 44691ee...dcdf0bb.

datapythonista

Sorry for the late review. Great changes. I added some comments on formatting, plus a few ideas.

datapythonista · 2018-04-15T12:17:41Z

pandas/core/frame.py

+        See Also
+        --------
+        DataFrame.to_dense :
+            converts the DataFrame back to the its dense form


Can you capitalize the first letter and add a period at the end?

I think the description should start on the same line as to_dense and continue on the next line, indented, when it doesn't fit.

datapythonista · 2018-04-15T12:18:36Z

pandas/core/frame.py

+            - 'block' tracks only the locations and sizes of blocks of data;
+            - 'integer' keeps an array with all the locations of the data.
+
+            The kind 'block' is recommended since it's more memory efficient.

        Returns
        -------
        y : SparseDataFrame


You can get rid of the y and just leave the type. Also, maybe it's a bit redundant in this case, but for consistency I'd add a description of what is returned.

datapythonista · 2018-04-15T12:19:16Z

pandas/core/frame.py

+            The kind of the SparseIndex tracking where data is not equal to
+            the fill value:
+
+            - 'block' tracks only the locations and sizes of blocks of data;


Not sure if the semicolon at the end is intentional. If you find another docstring with a list, I'd use the same convention.

datapythonista · 2018-04-15T12:23:40Z

pandas/core/frame.py

+        >>> df.iloc[:995] = 0.
+        >>> sdf = df.to_sparse(fill_value=0.)
+        >>> sdf.density
+        0.005


Nice example. This is just an opinion, so feel free to leave it as is if you think a more realistic example is better, but I'd have something like: pd.DataFrame([np.nan, np.nan, 1., np.nan]). You can make it sparse with the default arguments, then create another example with zeros instead of NaN and use fill_value. And if you find a nice way to illustrate the difference, I'd add an example with kind='integer'.
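
The two cases suggested above (a NaN column with the default fill value, and a zero column with fill_value=0.) differ in one detail worth spelling out: NaN never compares equal to itself, so matching a NaN fill value needs an explicit isnan check. The sketch below is a pure-Python illustration with a hypothetical `stored_positions` helper; it does not call pandas and makes no claim about how pandas implements the comparison.

```python
import math


# Hypothetical helper, not a pandas API: indices of the values that do NOT
# match the fill value and would therefore be kept in a sparse representation.
def stored_positions(values, fill_value=math.nan):
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN != NaN, so an equality test would keep every entry.
        return [i for i, v in enumerate(values) if not math.isnan(v)]
    return [i for i, v in enumerate(values) if v != fill_value]


nan_column = [math.nan, math.nan, 1.0, math.nan]
zero_column = [0.0, 0.0, 1.0, 0.0]

default_fill = stored_positions(nan_column)                # only index 2 is kept
zero_fill = stored_positions(zero_column, fill_value=0.0)  # only index 2 is kept
wrong_fill = stored_positions(nan_column, fill_value=0.0)  # nothing is omitted

print(default_fill, zero_fill, wrong_fill)
```

Choosing a fill value that does not occur in the data, as in the last call, leaves every position stored, which is why the second example needs fill_value to match the zeros.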

datapythonista · 2018-04-15T12:23:54Z

pandas/core/frame.py


        Parameters
        ----------
        fill_value : float, default NaN
+            The specific value that should be omitted in the representation.
        kind : {'block', 'integer'}


The default is missing.

datapythonista · 2018-04-15T12:26:09Z

pandas/core/frame.py

+            - 'block' tracks only the locations and sizes of blocks of data;
+            - 'integer' keeps an array with all the locations of the data.
+
+            The kind 'block' is recommended since it's more memory efficient.


I'd say the same, but in a way that doesn't sound like 'block' is always better. Something like "In most cases, the default 'block' is preferred for being more memory efficient.".

Not a big difference, but I'd prefer to avoid giving the idea that 'integer' is never the best option, which is not true.
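
To make the trade-off concrete, here is a simplified pure-Python model of the two index kinds as the docstring describes them: 'integer' keeps every location, while 'block' keeps a start and a length for each run of consecutive locations. The function names are hypothetical and this is only a sketch of the idea, not pandas' internal implementation.

```python
# Simplified model of the two index kinds; names are hypothetical.
def integer_index(values, fill_value=0.0):
    """Every location whose value differs from the fill value."""
    return [i for i, v in enumerate(values) if v != fill_value]


def block_index(values, fill_value=0.0):
    """Start and length of each run of consecutive non-fill locations."""
    starts, lengths = [], []
    previous = None
    for i in integer_index(values, fill_value):
        if previous is not None and i == previous + 1:
            lengths[-1] += 1      # extend the current block
        else:
            starts.append(i)      # open a new block
            lengths.append(1)
        previous = i
    return starts, lengths


clustered = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 5.0, 6.0, 0.0]
scattered = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0, 0.0]

for name, data in (("clustered", clustered), ("scattered", scattered)):
    locations = integer_index(data)
    starts, lengths = block_index(data)
    print(name, "integer:", len(locations), "ints;",
          "block:", len(starts) + len(lengths), "ints")
```

In this model, clustered data needs fewer integers with blocks (4 versus 6), while isolated values need more (10 versus 5), which is consistent with the point that 'integer' can sometimes be the better choice.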

datapythonista · 2018-07-10T13:06:32Z

Thanks @gioiab for the contribution. And sorry it took a while to merge it.

* Updates the documentation for pandas.DataFrame.to_sparse.
* Minor fixes and adding more real world examples

jreback requested changes Mar 10, 2018

jreback added Docs Sparse Sparse Data Type labels Mar 10, 2018

gioiab force-pushed the master branch from dbac372 to aad60ce Compare March 10, 2018 17:55

gioiab force-pushed the master branch from aad60ce to c2cc096 Compare March 11, 2018 13:42

jreback approved these changes Mar 11, 2018

jreback added this to the 0.23.0 milestone Mar 11, 2018

gioiab force-pushed the master branch from c2cc096 to cada053 Compare March 11, 2018 21:33

jreback removed this from the 0.23.0 milestone Apr 14, 2018

Updates the documentation for pandas.DataFrame.to_sparse.

087f441

gioiab force-pushed the master branch from cada053 to 087f441 Compare April 14, 2018 22:04

datapythonista reviewed Apr 15, 2018

datapythonista added 2 commits July 9, 2018 18:08

Merge remote-tracking branch 'upstream/master' into gioiab-master

ae0e96e

Minor fixes and adding more real world examples

dcdf0bb

datapythonista merged commit 1dd05cc into pandas-dev:master Jul 10, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: update the pandas.DataFrame.to_sparse docstring (pandas-dev#20193)

d3ff6b8

* Updates the documentation for pandas.DataFrame.to_sparse.
* Minor fixes and adding more real world examples
