Skip to content

ENH: add a gradient map to background gradient #39930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Apr 8, 2021

Conversation

attack68
Copy link
Contributor

@attack68 attack68 commented Feb 20, 2021

closes #22727
closes #28901

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

A very minor extension to the code and adding one argument gives the function a lot more scope:

Screen Shot 2021-02-20 at 14 04 14

@attack68 attack68 marked this pull request as draft February 20, 2021 12:39
@attack68 attack68 changed the title [WIP] ENH: add a gradient map to background gradient ENH: add a gradient map to background gradient Feb 20, 2021
@attack68 attack68 marked this pull request as ready for review February 20, 2021 14:34
@jreback jreback added Enhancement Styler conditional formatting using DataFrame.style labels Feb 21, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conceptually i guess is ok, but this function is getting now a lot of arguments, If gmap does not interact with the other arguments, then i thin a separate function would be better here.

@attack68
Copy link
Contributor Author

attack68 commented Feb 22, 2021

conceptually i guess is ok, but this function is getting now a lot of arguments, If gmap does not interact with the other arguments, then i thin a separate function would be better here.

For working with color gradients it is quite common to have a gradient map as an input, for example if you have text, or non-quantitative data. I think a second function would repeat a lot of code and a lot of documentation, since you would still want, low high vmin vmax and text_threshold.

Edit: I was also, later, going to propose a text_gradient as well as background_gradient (which colors text instead of cell background). Then I don't think you want 4 methods instead of 2, each with a gmap arg.

@attack68 attack68 requested a review from jreback February 22, 2021 17:25
if gmap is None:
gmap = s.to_numpy(dtype=float)
else:
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, shouldn't this just be a DataFrame that is aligned? why go to all of this complication (and its fragile)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all cases the gmap has to match the data shape s if it doesn't it would just raise downstream error. This way, imo the code is simple and concise and provides both flexibility of input and helpful user error feedback.

in the case axis in [0,1] then s, the data, is guaranteed a 1d-series, and you can supply, as gmap either 1d or 2d inputs provided the second dim has length 1 so it can be reshaped:

df = DataFrame([[1,2],[3,4]])
...background_gradient(gmap=[2,4], axis=0)  # 1d list
...background_gradient(gmap=df[1], axis=0)  # series
...background_gradient(gmap=df[[1]], axis=0)  # dataframe
...background_gradient(gmap=[[2],[4]], axis=0)  # 2d list
...background_gradient(gmap=np.array([[2],[4]]), axis=0) # 2d numpy
...background_gradient(gmap=np.array([2,4]), axis=0)  # 1d numpy             ALL WORK

in the case axis is None and s is a DataFrame with 2 cols or more then you need a 2d input (or in edge-case a ravelled 1d input):

....background_gradient(gmap=[[2,2],[4,4]], axis=None)  # 2d list
...background_gradient(gmap=np.array([[2,2],[4,4]]), axis=None)  # 2d numpy
...background_gradient(gmap=df, axis=None)  # dataframe
...background_gradient(gmap=[2,2,4,4], axis=None)  # 1d edge case.

In the case axis is None and the the data is a 1-col dataframe any of the 1d cases will again work, e.g

.background_gradient(gmap=[2,4], subset=[0], axis=None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to for sure align if a dataframe is passed.
ok on 2D. why is accepting 1D ok ? this is just too flexible. I dont' see any advantage to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok how's this: I've split the 1d and 2d cases still compare shape at end with error if they don't match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you are missing my point. what does a 2D gradient map mean / do? I don't see any tests that actually use this nor what its supposed to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are tests now for 2D arrays and DataFrames, for minimalist correct functionality, but not to demonstrate purpose.

the purpose of a 2D map is generally improved visualisation of 2D tabular data. If you just shade the background of a dataframe according to data that you can already read in the cell it just looks nice but doesn't convey information. But if you can, optionally, shade a background based on an alternate 2D map (axis=None) then you are providing a visual cross section of information analysis, like my examples given below, and say the performance grid here You can't achieve this with only the 1D (axis=0,1) functionality.

You could technically do heat maps currently but you would have to hide the cell data with CSS. This PR offers an easier solution for something like Look at 9. Heat Map (no text in cell). And you can only shade string cells with a provided quantitative map: something like doing this

@jnothman
Copy link
Contributor

I'm interested in this feature (I've struggled with stylers having to apply to the data in the df, not some transform of it). I'm not convinced by the name gmap. It doesn't map anything, it replaces; and g does not need mentioning.

I think what you mean by the parameter is "surrogate data to use as the gradient function"; indeed, the underlying data is forgotten when a gmap is used. I'd most likely call this parameter values, but generally think it could do with some thought.

I also note that a future iteration might allow gmap (or whatever) to be a callable, enabling application to more generic dataframes.

@attack68
Copy link
Contributor Author

I'm interested in this feature (I've struggled with stylers having to apply to the data in the df, not some transform of it). I'm not convinced by the name gmap. It doesn't map anything, it replaces; and g does not need mentioning.

I understand it to be the map used to render the cmap (color map), as opposed to using the data as the map. I naturally used gmap because gradient map appears in other visual editor software photoshop or Maya, the g was to better distinguish from c. Sometimes in other software what is called the cmap here can be called the gradient map otherwise! The function is called background_gradient so having a gradient map also seemed natural.

That said, I'm not tied to it. If @jreback agrees (or anyone else) I will happily change the name, to a new suggestion.

I also note that a future iteration might allow gmap (or whatever) to be a callable, enabling application to more generic dataframes.

I agree with you on this. I hadn't thought of this or of a use case yet, and it would just complicate the PR so best to leave out initially I think.

@attack68 attack68 added this to the 1.3 milestone Mar 17, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really really try to reduce complexity

subset = self.data.select_dtypes(include=np.number).columns

self.apply(
self._background_gradient,
partial(self._background_gradient, axis=axis),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really non-obvious why this is needed and is adding complexity. what is the reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using axis to determine if apply-generated-object was a DataFrame or Series, I just tested directly and have removed the axis argument, and therefore the partial.

@@ -1512,26 +1585,24 @@ def _background_gradient(
text_color_threshold: float = 0.408,
vmin: Optional[float] = None,
vmax: Optional[float] = None,
gmap: Optional[Union[Sequence, np.ndarray, FrameOrSeries]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this need not be a staticmethod which is adding all kind of complexity here, make this a method as you now need to pass axis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a static method before I made modifications. It does not need to be a static method so have taken it module level.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok can you merge master, let's try to do this before the refactoring ones

…nt_gmap

# Conflicts:
#	pandas/io/formats/style.py
@attack68
Copy link
Contributor Author

attack68 commented Apr 3, 2021

@jreback pinging have merged master. this is greenish - test failure of ipython directive is unrelated.

@attack68
Copy link
Contributor Author

attack68 commented Apr 7, 2021

@jreback finally got full green on this, and have merged all the recent typing changes in. hopefully we can finalise this and then the refactor

@jreback jreback merged commit 5e6cb1f into pandas-dev:master Apr 8, 2021
@jreback
Copy link
Contributor

jreback commented Apr 8, 2021

thanks @attack68 very nice!

@attack68 attack68 deleted the background_gradient_gmap branch April 9, 2021 05:28
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Styler conditional formatting using DataFrame.style
Projects
None yet
3 participants