ENH: Allow to plot weighted KDEs. #59087

Closed
LucaMingarelli opened this issue Jun 24, 2024 · 3 comments · Fixed by #59337
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@LucaMingarelli

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The current implementation does not allow plotting weighted KDEs.

Feature Description

The PDF is currently estimated with scipy.stats.gaussian_kde, which accepts a weights parameter.
pandas.DataFrame.plot.kde should accept this parameter as well and pass it through.
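For reference, a minimal sketch of the SciPy call that the parameter would be forwarded to. It uses only scipy.stats.gaussian_kde and its weights argument; the observations reuse the series from the example later in this thread, while the weights and the evaluation grid ind are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observations (same values as the example later in this thread)
y = np.array([1, 2, 2.5, 3, 3.5, 4, 5])
# Illustrative per-observation weights; gaussian_kde normalizes them to sum to 1
w = np.array([0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9])

# Evaluation grid, wide enough that the tails are negligible
ind = np.linspace(y.min() - 5, y.max() + 5, 400)

unweighted = gaussian_kde(y)(ind)
weighted = gaussian_kde(y, weights=w)(ind)

# The weighted density should peak further right, since larger values carry more weight
print(ind[unweighted.argmax()], ind[weighted.argmax()])
```

Passing equal weights should reproduce the unweighted estimate, which is a quick way to check that forwarding the parameter does not change the default behaviour.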

Alternative Solutions

Allow a weights parameter to be passed through to scipy.stats.gaussian_kde where pandas calls it.
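Until plot.kde forwards the parameter, a weighted density can be computed outside pandas and plotted as an ordinary line. The sketch below is a from-scratch weighted Gaussian KDE in NumPy with a user-supplied bandwidth h; it is not SciPy's implementation (SciPy derives the bandwidth from the weighted covariance), and the function name weighted_gaussian_kde is hypothetical.

```python
import numpy as np

def weighted_gaussian_kde(y, weights, ind, h):
    """Hypothetical helper: weighted Gaussian KDE with a fixed bandwidth h.

    density(x) = sum_i w_i * N(x; y_i, h**2) / sum_i w_i
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    ind = np.asarray(ind, dtype=float)
    # Drop missing observations together with their weights
    mask = ~np.isnan(y)
    y, w = y[mask], w[mask]
    # Normalize so the density integrates to 1
    w = w / w.sum()
    # Standardized distances between grid points and observations: shape (len(ind), len(y))
    z = (ind[:, None] - y[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    # Weighted sum of the per-observation Gaussian kernels
    return kernels @ w

y = [1, 2, 2.5, 3, 3.5, 4, 5]
w = [0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9]
ind = np.linspace(-4, 10, 400)
density = weighted_gaussian_kde(y, w, ind, h=0.8)
# The result can then be drawn as a line, e.g. pd.Series(density, index=ind).plot()
```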

Additional Context

No response

LucaMingarelli added the Enhancement and Needs Triage labels on Jun 24, 2024
@fbourgey
Contributor

Hello, I am working on it.

@fbourgey
Contributor

I updated the following files:

https://github.com/fbourgey/pandas/blob/feature-plot-weighted-kde/pandas/plotting/_core.py#L1449
https://github.com/fbourgey/pandas/blob/feature-plot-weighted-kde/pandas/plotting/_matplotlib/hist.py#L266
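For readers without the branch at hand, a minimal sketch of what forwarding the parameter might look like. The function kde_curve is hypothetical and is not the code in the linked files; it only assumes, as the traceback further down suggests, that missing values are dropped from y before calling scipy.stats.gaussian_kde, so the weights must be masked the same way.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_curve(y, ind, bw_method=None, weights=None):
    """Hypothetical KDE step with `weights` forwarded to gaussian_kde."""
    y = np.asarray(y, dtype=float)
    # Observations that will actually be used
    mask = ~np.isnan(y)
    if weights is not None:
        # Align weights with the surviving observations; asarray also accepts lists
        weights = np.asarray(weights, dtype=float)[mask]
    gkde = gaussian_kde(y[mask], bw_method=bw_method, weights=weights)
    return gkde.evaluate(ind)

ind = np.linspace(-4, 10, 200)
y = [1, 2, np.nan, 2.5, 3, 3.5, 4, 5]
w = [0.1, 0.0, 0.5, 0.0, 0.2, 0.3, 0.4, 0.9]
density = kde_curve(y, ind, weights=w)
```

Masking the weights with the same boolean mask as the data keeps each weight attached to its observation after NaNs are removed.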

The code works.

Should we add an example using the weights parameter to the kde docstring?
Does this function need to be updated as well?

@fbourgey
Contributor

The following code produces Figure_0:

import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
ax = s.plot.kde()

[Figure_0: KDE of s without weights]

Passing some weights instead produces Figure_1:

weights = pd.Series([0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9])
ax = s.plot.kde(weights=weights)

[Figure_1: KDE of s with weights]
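One way to sanity-check the weighted plot: observations with weight 0.0 (here 2 and 2.5) should not influence the estimate at all. The sketch below checks this directly with scipy.stats.gaussian_kde, assuming pandas forwards the weights to it unchanged; it does not go through pandas plotting.

```python
import numpy as np
from scipy.stats import gaussian_kde

y = np.array([1, 2, 2.5, 3, 3.5, 4, 5])
w = np.array([0.1, 0.0, 0.0, 0.2, 0.3, 0.4, 0.9])
ind = np.linspace(-4, 10, 200)

# Weighted KDE on all observations
full = gaussian_kde(y, weights=w)(ind)

# Drop the zero-weight observations and re-estimate
keep = w > 0
reduced = gaussian_kde(y[keep], weights=w[keep])(ind)

# Zero weights add nothing to the weighted covariance, the effective sample size, or the sum
print(np.allclose(full, reduced))
```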

Passing a NumPy array works as well:

import numpy as np

weights = np.array([0.1, 0.4, 0.0, 0.2, 0.3, 0.4, 0.2])

However, passing a plain Python list instead

weights = [0.1, 0.4, 0.0, 0.2, 0.3, 0.4, 0.2]

raises the following error:

  File "/Users/florianbourgey/projects/misc/pandas_gaussian_kde.py", line 7, in <module>
    ax = s.plot.kde(weights=weights)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_core.py", line 1567, in kde
    return self(kind="kde", bw_method=bw_method, weights=weights, ind=ind, **kwargs)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_core.py", line 1049, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
    plot_obj.generate()
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/core.py", line 500, in generate
    self._make_plot(fig)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/hist.py", line 168, in _make_plot
    kwds["weights"] = type(self)._get_column_weights(self.weights, i, y)
  File "/Users/florianbourgey/projects/pandas/pandas/plotting/_matplotlib/hist.py", line 202, in _get_column_weights
    weights = weights[~isna(y)]
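The traceback ends in _get_column_weights at weights = weights[~isna(y)]. Boolean-mask indexing works on NumPy arrays and Series but not on plain Python lists, which is consistent with only the list case failing. One possible fix is to convert the weights with np.asarray before masking. The helper below is a hypothetical, NumPy-only stand-in named get_column_weights, not the pandas implementation; its handling of 2-D weights (one column per plotted column i) is an assumption.

```python
import numpy as np

def get_column_weights(weights, i, y):
    """Hypothetical helper: weights for column i, restricted to non-missing y.

    Assumption: 1-D weights are shared by all columns; 2-D weights have
    shape (n_rows, n_columns) and column i belongs to plotted column i.
    """
    if weights is None:
        return None
    # np.asarray turns lists (and Series) into arrays that support boolean masks
    weights = np.asarray(weights, dtype=float)
    if weights.ndim == 2:
        weights = weights[:, i]
    mask = ~np.isnan(np.asarray(y, dtype=float))
    return weights[mask]

y = np.array([1.0, np.nan, 3.0])
try:
    # A plain list cannot be indexed by a boolean array
    [0.1, 0.2, 0.3][~np.isnan(y)]
except TypeError as exc:
    print("list:", type(exc).__name__)

print(get_column_weights([0.1, 0.2, 0.3], 0, y))
```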
