ENH: Add Styler.pipe() method (#23229) #23384

Merged: 18 commits, Nov 28, 2018

Conversation

Contributor

@nmusolino nmusolino commented Oct 27, 2018

Added Styler.pipe() method. This allows users to easily apply and compose functions that operate on Styler objects, just like the DataFrame.pipe() method does for dataframes.
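
As a rough illustration of the composition this description refers to, here is a minimal pure-Python sketch. `MiniStyler`, `align`, and `shade` are hypothetical stand-ins invented for this example; this is not the pandas implementation and it does not import pandas.

```python
class MiniStyler:
    """Hypothetical stand-in for Styler: it only records CSS properties."""

    def __init__(self, props=None):
        self.props = dict(props or {})

    def set_properties(self, **props):
        # Return a new object so that calls can be chained.
        return MiniStyler({**self.props, **props})

    def pipe(self, func, *args, **kwargs):
        # Call func with this object as the first positional argument.
        return func(self, *args, **kwargs)


def align(styler, side):
    # User-defined "transformation": takes a styler, returns a styler.
    return styler.set_properties(**{"text-align": side})


def shade(styler, color="yellow"):
    return styler.set_properties(**{"background-color": color})


# Nested calls read inside-out ...
nested = shade(align(MiniStyler(), "right"), color="pink")

# ... while pipe() keeps the same steps in reading order.
chained = MiniStyler().pipe(align, "right").pipe(shade, color="pink")

# pipe() returns whatever func returns, here an int rather than a styler.
count = chained.pipe(lambda s: len(s.props))
```

Both spellings produce the same recorded properties; the chained form simply lets user-defined functions sit alongside built-in method calls.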

pep8speaks commented Oct 27, 2018

Hello @nmusolino! Thanks for updating the PR.

Comment last updated on November 18, 2018 at 02:04 Hours UTC

codecov bot commented Oct 27, 2018

Codecov Report

Merging #23384 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23384      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51471    51473       +2     
==========================================
+ Hits        47515    47517       +2     
  Misses       3956     3956
Flag        Coverage Δ
#multiple   90.7% <100%> (ø) ⬆️
#single     42.43% <50%> (ø) ⬆️

Impacted Files               Coverage Δ
pandas/io/formats/style.py   96.71% <100%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e7cf48...726d01d.

@gfyoung added the Enhancement and Code Style labels Oct 27, 2018
@@ -214,6 +214,7 @@ Other Enhancements
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`8917`)
- :meth:`Styler.pipe` method added, to simplify application of user-defined functions that operate on stylers.
Member

Reference the issue number.

Member

Also, you should create a mini-section explaining the enhancement in more detail so that end users can understand the benefit.

Contributor Author

Okay, I can do that.

@@ -1222,6 +1222,35 @@ class MyStyler(cls):

        return MyStyler

    def pipe(self, func, *args, **kwargs):
        """
        Apply func(self, *args, **kwargs)
Member

@gfyoung gfyoung Oct 27, 2018

This should be a little more descriptive. Apply func to what? Also, for what purpose?

(yes, as reviewers, we can see what the rationale was from your PR, so just put that rationale into the docstring)

Contributor Author

Okay, I'll change the docstring and try to provide some additional context/background.

def g(**kwargs):
    assert 'styler' in kwargs
    return kwargs['styler'].data
assert self.df.style.pipe((g, 'styler')) is self.df
Member

@gfyoung gfyoung Oct 27, 2018

Can you comment here explaining why you're using is instead of tm.assert_frame_equal ?

Member

Also, add a comment explaining why you're creating this g function in the first place (perhaps a more descriptive function name is needed).

Contributor Author

Yes, I can rewrite this a bit to simplify it. Essentially, I wanted to test that when pipe() is called with a (callable, string) tuple, it works as advertised.
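
To make the `(callable, string)` tuple form concrete, here is a small sketch of how such a dispatch could work. `pipe_call`, `Wrapper`, and `data_of` are hypothetical names made up for this illustration, and the logic is an assumption about the general technique, not a copy of the pandas code.

```python
def pipe_call(obj, func, *args, **kwargs):
    """Call func on obj; func is a callable or a (callable, keyword) tuple."""
    if isinstance(func, tuple):
        # Tuple form: pass obj under the named keyword instead of positionally.
        func, target = func
        if target in kwargs:
            raise ValueError(
                f"{target!r} is both the pipe target and a keyword argument"
            )
        kwargs[target] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)


class Wrapper:
    """Hypothetical object that holds a reference to some underlying data."""

    def __init__(self, data):
        self.data = data


def data_of(*, styler):
    # Accepts its input only through the keyword 'styler'.
    return styler.data


frame = object()
wrapper = Wrapper(frame)
result = pipe_call(wrapper, (data_of, "styler"))
```

In this sketch an identity check (`result is frame`) shows that the very same object travelled through the keyword argument, which is a stronger statement than value equality and may be why the test above uses `is`.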

Contributor Author

@nmusolino nmusolino left a comment

Thank you, @gfyoung, for the prompt review and feedback. All of your comments make sense, and I'll try to make the suggested changes.

Contributor Author

nmusolino commented Oct 28, 2018

I have not pushed any changes yet, but this is what the pipe() docs look like after local changes:

[Image: pandas_styler_pipe_docs_1037]

@nmusolino
Contributor Author

I'll summarize the changes made since the original push:

  • Simplified the second part of the unit test
  • Added "Notes" and "Examples" sections to the docstring. A rendering of the HTML documentation can be found in an earlier comment.
  • Added a "version added" directive to the docstring (caught that one myself).
  • Moved the "whatsnew" entry from a one-liner to a short paragraph/example.

I wasn't able to test how the whatsnew will be rendered.

At this point, the main questions are:

  • Do the notes/examples for the new method look reasonable? I'm open to any suggestions there.
  • Should I leave this PR as five distinct commits, or manually rebase it into one commit? I'm reasonably familiar with different approaches, but don't know what the common practice is for pandas.

@@ -180,6 +180,26 @@ array, but rather an ``ExtensionArray``:
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.

New ``Styler.pipe()`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A new method :meth:`pandas.formats.style.Styler.pipe()` was added (:issue:`#23229`).
Contributor

Styler has gained a pipe method

Contributor Author

Done

Member

@datapythonista datapythonista left a comment

Added some comments about the docstring.

@@ -1222,6 +1222,76 @@ class MyStyler(cls):

        return MyStyler

    def pipe(self, func, *args, **kwargs):
        """
        Apply func(self, \*args, \*\*kwargs), and return the result.
Member

Can you add double backticks around func(...)

Contributor Author

Done.

            ``callable`` that expects the Styler.
        args : iterable, optional
            positional arguments passed into ``func``.
        kwargs : mapping, optional
Member

Add the * before args and kwargs. I'd prefer dict instead of mapping. Parameter descriptions should start with a capital letter.

If you run ./scripts/validate_docstrings.py pandas.Styler.pipe you should get a report with most of the docstring issues.

Contributor Author

Done. As mentioned above, now validating.


Returns
-------
object : the value returned by ``func``.
Member

Is this not always returning a pandas.Styler object?

The description should go on the next line (indented), without the colon, and it should start with a capital letter.

Contributor Author

The return type is determined by func, the user-provided function.

--------
Styler.apply
Styler.applymap
pandas.DataFrame.pipe
Member

pandas. prefix not needed. Please add a description on why those are relevant in the context of Styler.pipe.

Contributor Author

Done.

...     ''' Highlight the indicated element, and its containing row. '''
...     cell, row = pd.IndexSlice[row_label, column], pd.IndexSlice[row_label, :]
...     return (styler.set_properties(cell, **{'background-color': '#ffffcc'})
...                   .set_properties(row, **{'background-color': '#ffccc0'}))
Member

It'd be great if you could find a simpler example to illustrate .pipe(). Something like adding quotes around the values, or something as simple as that, so the function does not distract from what .pipe() does.

Contributor Author

That is a good suggestion. I chose this to illustrate functionality that seems to belong in a function, and to use a function that could sensibly be applied twice.

BUT that's not the clearest example for the docs. I'll use a much simpler function as you suggest.

Contributor Author

@nmusolino nmusolino left a comment

Thanks for the feedback. I've made changes as requested. Here is an image of the updated documentation:

[Image: screen shot 2018-10-29 at 10 31 36 pm]

The pandas.DataFrame links are not resolving, but I think that's because I am only building part of the docs, using python make.py --single api.

@@ -180,6 +180,27 @@ array, but rather an ``ExtensionArray``:
This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.

New ``Styler.pipe()`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The dataframe Styler class has gained a :meth:`~pandas.io.formats.style.Styler.pipe` method
Contributor

Newline before this line.

You should be able to link to Styler class here with :class:`pandas.io.formats.style.Styler`

Contributor Author

I can make those changes. I’ll also (a) add an anchor, like .. _whatsnew_0240.enhancements.styler_pipe: and (b) change the code block style to match everything else in the file.

@TomAugspurger
Contributor

Merge conflict in 0.24.0, if you could update.

@nmusolino
Contributor Author

I can resolve the merge conflict and turn the first example into a code-block instead of a doctest. I’ll also clean up the whatsnew entry as discussed earlier. I will push some changes Sunday night or Monday night.

Contributor Author

nmusolino commented Nov 6, 2018

Here is the updated method documentation:

[Large, outdated image removed for brevity]

Here is the output from the validate_docstrings.py script:

$ python ./scripts/validate_docstrings.py pandas.io.formats.style.Styler.pipe

################################################################################
############### Docstring (pandas.io.formats.style.Styler.pipe)  ###############
################################################################################

Apply ``func(self, *args, **kwargs)``, and return the result.

.. versionadded:: 0.24.0

Parameters
----------
func : function
    Function to apply to the Styler.
    ``*args``, and ``**kwargs`` are passed into ``func``.
    Alternatively a ``(callable, data_keyword)`` tuple where
    ``data_keyword`` is a string indicating the keyword of
    ``callable`` that expects the Styler.
*args : iterable, optional
    Positional arguments passed into ``func``.
**kwargs : dict, optional
    Dictionary of keyword arguments passed into ``func``.

Returns
-------
result : object
    The value returned by ``func``.

See Also
--------
DataFrame.pipe : Analogous method for DataFrame.
Styler.apply : Apply a function row-wise, column-wise, or table-wise to
    modify the dataframe's styling.

Notes
-----
Like :meth:`DataFrame.pipe`, this method can simplify the
application of several user-defined functions to a styler.  Instead
of writing:

.. code-block:: python

    f(g(df.style.set_precision(3), arg1=a), arg2=b, arg3=c)

users can write:

.. code-block:: python

    (df.style.set_precision(3)
       .pipe(g, arg1=a)
       .pipe(f, arg2=b, arg3=c))

In particular, this allows users to define functions that take a
styler object, along with other parameters, and return the styler after
making styling changes (such as calling :meth:`Styler.apply` or
:meth:`Styler.set_properties`).  Using ``.pipe``, these user-defined
style "transformations" can be interleaved with calls to the built-in
Styler interface.

Examples
--------
>>> def set_standard_formatting(styler):
...     return (styler.set_properties(**{'text-align': 'right'})
...                   .format({'X': '{:.1%}'}))

The user-defined highlight function above can be called within a
sequence of other style modifications:

>>> df = pd.DataFrame({'A': list(range(-1, 4)), 'X': np.arange(0.2, 1.2, 0.2)})
>>> (df.style
...    .set_properties(subset=['X'], **{'background-color': 'yellow'})
...    .pipe(set_standard_formatting)
...    .set_caption("Results with column 'X' highlighted."))

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 65, in pandas.io.formats.style.Styler.pipe
Failed example:
    (df.style
       .set_properties(subset=['X'], **{'background-color': 'yellow'})
       .pipe(set_standard_formatting)
       .set_caption("Results with column 'X' highlighted."))
Expected nothing
Got:
    <pandas.io.formats.style.Styler object at 0x11c982b00>
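
The "Expected nothing / Got" failure above is ordinary doctest behaviour: an expression statement whose value is not `None` has its repr echoed, so an example with no expected output fails. The sketch below reproduces this with the standard-library `doctest` module; `Chainable` and `run` are hypothetical names for illustration, and assigning to `_` is just one possible workaround, not necessarily what this PR adopted.

```python
import doctest


class Chainable:
    """Hypothetical object whose method returns an object, like a styler."""

    def touch(self):
        return self


# Bare expression: the returned object's repr is echoed, but nothing is expected.
failing = """
>>> Chainable().touch()
"""

# Assignment: nothing is echoed, so "expected nothing" is satisfied.
passing = """
>>> _ = Chainable().touch()
"""


def run(source):
    """Run one doctest source string and return the (failed, attempted) result."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, {"Chainable": Chainable}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


failing_result = run(failing)
passing_result = run(passing)
```

Here `failing_result.failed` counts one failed example and `passing_result.failed` counts none.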

Member

@datapythonista datapythonista left a comment

Added some comments, still needs some work.

Make sure that the docstring validation does not have any error (and that the CI is green).

Thanks for working on this!

*args : iterable, optional
    Positional arguments passed into ``func``.
**kwargs : dict, optional
    Dictionary of keyword arguments passed into ``func``.
Member

You can merge args and kwargs in a single line, and you don't need types or optional, as those are always the case:

*args, **kwargs
   Arguments passed to `func`.

Contributor Author

Changed as suggested. As a result of these changes, the validate_docstrings.py script reports an error, but I agree this is better.

The user-defined highlight function above can be called within a
sequence of other style modifications:

>>> df = pd.DataFrame({'A': list(range(-1, 4)), 'X': np.arange(0.2, 1.2, 0.2)})
Member

Use something that looks real: no range functions, and no a, x, foo columns. We are starting to have many examples with animal names (e.g. cat, penguin, falcon...). I'd use that for consistency (hopefully at some point all the examples will use similar data). Just have a couple of columns that are percentages, and 2 or 3 rows.

Then rename set_standard_formatting to percentage_format (if I understand correctly, that's what it is, right?).

Contributor Author

I made some changes here, and fixed the reference in the text, which was incorrect. I changed the function name as you suggested, while trying to make it clear that the function was user-defined and application-specific.

I'll be honest, using animal names in examples seems a little silly to me, but that's up to the pandas developers.

In this case, using numeric data rather than strings made more sense to me, since numeric data presents styling/formatting choices, and special values (min/max/null) can be highlighted.

Contributor Author

@nmusolino nmusolino left a comment

I think we're reaching the point of diminishing marginal returns to the refinements in this PR, and it's about ready to move across the finish line.

Here's the latest docstring as rendered:

[Image: screen shot 2018-11-10 at 11 54 21 am]

@TomAugspurger
Contributor

@nmusolino a few lines are too long:
#23384 (comment)

    return (styler.format({'N': '{:,}', 'X': '{:.1%}'})
                  .set_properties(**{'text-align': 'right'}))

(df.style.set_precision(1)
Contributor

This code is executed. I think the exception at https://travis-ci.org/pandas-dev/pandas/jobs/453316565#L2108 is from this. I'd recommend defining df in this code block.

Contributor Author

Done.

@nmusolino
Contributor Author

@TomAugspurger, I've fixed the line length issues, and made sure the "whatsnew" documentation builds correctly, and looks reasonable.

Any other changes you would suggest? And would you like me to squash, rebase, and force-push to clean up the commit history?

An updated docstring rendering can be found below.

[Image: screen shot 2018-11-17 at 9 02 47 pm]

@nmusolino
Contributor Author

@jreback @TomAugspurger , could you please take another look at this? I have rebased twice in order to keep up with changes in the whatsnew file, and I'd like to get this PR ready for completion before rebasing again.

Contributor

@TomAugspurger TomAugspurger left a comment

LGTM, if you could fix the conflict and ping when the CI passes I'll merge.

@nmusolino
Contributor Author

@TomAugspurger , I resolved the conflict, and CI is passing. Would you be able to merge?

@TomAugspurger
Contributor

Looks good, thanks @nmusolino!

@TomAugspurger TomAugspurger merged commit 66abbc3 into pandas-dev:master Nov 28, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* Add Styler.pipe() method, akin to DataFrame.pipe()
Successfully merging this pull request may close these issues.

Styler class should have a pipe() method, akin to DataFrame.pipe()
6 participants