Remove all preprocessing from PlotAccessor.__call__ #29412

Open
TomAugspurger opened this issue Nov 5, 2019 · 2 comments

Labels: Needs Discussion (requires discussion from core team before further action), Refactor (internal refactoring of code), Visualization (plotting)

Comments

@TomAugspurger

In #28647, we discovered that PlotAccessor.__call__ does quite a bit of preprocessing of the data before handing it off to the backend:

    # The original data structured can be transformed before passed to the
    # backend. For example, for DataFrame is common to set the index as the
    # `x` parameter, and return a Series with the parameter `y` as values.
    data = self._parent.copy()
    if isinstance(data, ABCSeries):
        kwargs["reuse_plot"] = True
    if kind in self._dataframe_kinds:
        if isinstance(data, ABCDataFrame):
            return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
        else:
            raise ValueError(
                ("plot kind {} can only be used for data frames").format(kind)
            )
    elif kind in self._series_kinds:
        if isinstance(data, ABCDataFrame):
            if y is None and kwargs.get("subplots") is False:
                msg = "{} requires either y column or 'subplots=True'"
                raise ValueError(msg.format(kind))
            elif y is not None:
                if is_integer(y) and not data.columns.holds_integer():
                    y = data.columns[y]
                # converted to series actually. copy to not modify
                data = data[y].copy()
                data.index.name = y
    elif isinstance(data, ABCDataFrame):
        data_cols = data.columns
        if x is not None:
            if is_integer(x) and not data.columns.holds_integer():
                x = data_cols[x]
            elif not isinstance(data[x], ABCSeries):
                raise ValueError("x must be a label or position")
            data = data.set_index(x)
        if y is not None:
            # check if we have y as int or list of ints
            int_ylist = is_list_like(y) and all(is_integer(c) for c in y)
            int_y_arg = is_integer(y) or int_ylist
            if int_y_arg and not data.columns.holds_integer():
                y = data_cols[y]
            label_kw = kwargs["label"] if "label" in kwargs else False
            for kw in ["xerr", "yerr"]:
                if kw in kwargs and (
                    isinstance(kwargs[kw], str) or is_integer(kwargs[kw])
                ):
                    try:
                        kwargs[kw] = data[kwargs[kw]]
                    except (IndexError, KeyError, TypeError):
                        pass
            # don't overwrite
            data = data[y].copy()
            if isinstance(data, ABCSeries):
                label_name = label_kw or y
                data.name = label_name
            else:
                match = is_list_like(label_kw) and len(label_kw) == len(y)
                if label_kw and not match:
                    raise ValueError(
                        "label should be list-like and same length as y"
                    )
                label_name = label_kw or data.columns
                data.columns = label_name
    return plot_backend.plot(data, kind=kind, **kwargs)

Depending on the plot kind and backend, some of this preprocessing may not be appropriate. The backend might want to see the "raw" data. That's not to say we shouldn't do any preprocessing, though. We'll need some feedback from backend authors on what should be done before handing the data off.
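To make the effect concrete, here is a minimal sketch of what the final branch of the quoted code does to a DataFrame, written against the public pandas API only. The helper name `preprocess_default` and the sample frame are made up for illustration; integer positions, the `xerr`/`yerr` lookups, and the label-length check are left out.

```python
import pandas as pd


def preprocess_default(df, x=None, y=None, label=None):
    """Hypothetical helper mirroring the last branch of the quoted code.

    Assumes x and y are column labels; this is a simplified sketch, not
    the pandas implementation.
    """
    data = df.copy()
    if x is not None:
        data = data.set_index(x)  # the x column becomes the index
    if y is not None:
        data = data[y].copy()  # only the y column(s) are kept
        if isinstance(data, pd.Series):
            data.name = label or y
        elif label:
            data.columns = label
    return data


df = pd.DataFrame(
    {"t": [0, 1, 2], "value": [1.0, 4.0, 9.0], "group": ["a", "b", "a"]}
)
seen_by_backend = preprocess_default(df, x="t", y="value")

print(type(seen_by_backend).__name__)  # Series
print(seen_by_backend.index.name)  # t
print(seen_by_backend.name)  # value
```

With `x="t"` and `y="value"`, the object handed on is a Series indexed by `t`; the `group` column is no longer part of it, which is the kind of loss a backend wanting the "raw" data would care about.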

@TomAugspurger TomAugspurger added Visualization plotting Needs Discussion Requires discussion from core team before further action labels Nov 5, 2019
@TomAugspurger

cc @jsignell

@jsignell

jsignell commented Nov 7, 2019

Thanks for writing this up, Tom. I think the particular issue for hvplot was that the preprocessing step assumes that the columns of interest can be known on the pandas side. This doesn't hold true for hvplot because there are additional kwargs (groupby, by, row, ...) that accept column names.
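A rough illustration of that failure mode, assuming a toy backend function (`toy_backend_plot` is hypothetical and is not hvplot's API) that accepts a `by` column name. It works on the raw frame, but not on data that has already been reduced to `x` and `y`:

```python
import pandas as pd


def toy_backend_plot(data, kind="line", by=None):
    """Hypothetical stand-in for a backend that accepts a `by` column name."""
    if by is None:
        return {"all": data}
    if not isinstance(data, pd.DataFrame) or by not in data.columns:
        raise KeyError(f"column {by!r} is not available to the backend")
    # One entry per group, as a backend might draw one line per group.
    return {key: part for key, part in data.groupby(by)}


df = pd.DataFrame(
    {
        "t": [0, 1, 2, 3],
        "value": [1.0, 2.0, 3.0, 4.0],
        "group": ["a", "b", "a", "b"],
    }
)

# Raw frame: the backend can resolve `by` itself.
groups = toy_backend_plot(df, by="group")
print(sorted(groups))  # ['a', 'b']

# Data reduced to x and y first, as in the quoted preprocessing.
reduced = df.set_index("t")["value"]
try:
    toy_backend_plot(reduced, by="group")
except KeyError as exc:
    print("backend failed:", exc)
```

pandas has no way to know that `by` names a column, so any selection done on the pandas side can only keep the columns named by `x` and `y`.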

@mroeschke mroeschke added the Refactor Internal refactoring of code label Jul 22, 2021