
ENH: plotting methods can unpack labeled data [MOVED TO #4829] #4787


Closed
wants to merge 8 commits into from


tacaswell
Member

After discussions with Brian Granger, Fernando Perez, Peter Wang, Matthew
Rocklin and Jake VanderPlas, this is a proposal for how to deal with
labeled data in matplotlib.

The approach taken is that if the optional kwarg 'data' is passed, any
other string args/kwargs to the function are replaced by the result of
data[k] if the key exists in data; otherwise the value is left as-is.
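A minimal sketch of that replacement rule, assuming a plain dict as the labeled data. The names `_replacer` and `unpack_labeled_data` echo names that appear later in this thread, but this is an illustrative reimplementation, not the PR's code; `fake_plot` is a hypothetical stand-in for an Axes method.

```python
import functools


def _replacer(data, key):
    # Only strings are treated as candidate keys into `data`.
    if not isinstance(key, str):
        return key
    try:
        return data[key]
    except KeyError:
        # Not a key in data: leave the value as-is (e.g. a color string).
        return key


def unpack_labeled_data(func):
    """Hypothetical sketch: add an optional `data` kwarg to `func`."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        data = kwargs.pop("data", None)
        if data is not None:
            args = tuple(_replacer(data, a) for a in args)
            kwargs = {k: _replacer(data, v) for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return inner


@unpack_labeled_data
def fake_plot(x, y, color="k"):
    # Stand-in for an Axes method; just reports what it received.
    return x, y, color
```

Here `fake_plot("a", "b", color="red", data={"a": [1, 2], "b": [3, 4]})` receives the two columns, while `"red"` passes through because it is not a key.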

Fernando made a compelling case that this needs to go in ASAP.

This still needs docs + tests + a bit more thought on how to deal with functions where we do some internal broadcasting (mostly plot). Maybe pass in names as a comma-separated list? Long term, I would prefer to simplify the low-level plot and either have the users do the looping or provide higher-level plotting functions that do the looping.

There is the possibility that some of the string args/kwargs we already take may conflict with names in the labeled data (e.g. ha='center' would not work with a data structure where 'center' in data).
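To make the collision concrete, here is a small sketch that applies the brute-force rule to every kwarg, using a plain dict as data; `unpack_all_kwargs` is a hypothetical helper, not part of the PR.

```python
def _replacer(data, key):
    # Brute-force rule: any string that is also a key in `data` gets replaced.
    if not isinstance(key, str):
        return key
    try:
        return data[key]
    except KeyError:
        return key


def unpack_all_kwargs(data, **kwargs):
    # Hypothetical helper: apply the rule to every kwarg, with no whitelist.
    return {k: _replacer(data, v) for k, v in kwargs.items()}


data = {"x": [0, 1], "center": [5, 6]}  # a column that happens to be named "center"
resolved = unpack_all_kwargs(data, ha="center", color="red")
# "center" is a key, so ha now receives column data instead of an alignment
# string; "red" is not a key, so color passes through untouched.
```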

@pzwang expressed concern that we may be painting ourselves into a corner with this API as it is mostly just the difference between

ax.plot('a', 'b', data=LD)

vs

ax.plot(LD['a'], LD['b']) 

The unpacking attempts can be disabled via an rcParam. That could also be implemented as an import-time rcParam, which would disable the decorator altogether.

This should work with any data object that supports __getitem__ and returns something that np.asarray works on.

attn @matplotlib/developers @jakevdp @fperez @mrocklin @ellisonbg @pzwang @mwaskom @jreback @andrewcollette

@tacaswell tacaswell added this to the next point release milestone Jul 25, 2015
@mwaskom

mwaskom commented Jul 25, 2015

Nice!

@jreback

jreback commented Jul 25, 2015

+1 on this from my perspective as well. IIRC this is the protocol we discussed to have pandas internally dispatch to matplotlib as well.

@jkseppan
Member

This sounds like getting closer to R's plotting functions. With the function call mechanism in R, you can do

plot(x, y**2, data=foo)

and the plot function sees the expressions passed in as arguments and controls how they get evaluated. People will naturally ask for expression support next, and if we use strings as placeholders for values, we'll need to implement a parser for some expression language.

Alternative design: let users pass in sympy symbols or expressions, as in

x, y = symbols('x y')
plot(x, y, data={x: ..., y: ...})

These won't clash with string values of keyword arguments, and the generalization to expressions is simple. To avoid a dependency on sympy, we can deliver a simple version of symbol objects ourselves.
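One way to read the "simple version of symbol objects" suggestion, sketched without sympy. `Expr`, `Symbol` and `resolve` are hypothetical names, not sympy's API; plain lists are combined element-wise by hand, where NumPy arrays would broadcast on their own.

```python
import operator


def _elementwise(op, a, b):
    # Apply op element-wise over lists (NumPy arrays would broadcast instead).
    if isinstance(a, list) and isinstance(b, list):
        return [op(u, v) for u, v in zip(a, b)]
    if isinstance(a, list):
        return [op(u, b) for u in a]
    if isinstance(b, list):
        return [op(a, v) for v in b]
    return op(a, b)


class Expr:
    """Hypothetical lightweight expression node, evaluated lazily against data."""

    def __init__(self, fn):
        self._fn = fn  # callable: data mapping -> value

    def evaluate(self, data):
        return self._fn(data)

    def _combine(self, other, op):
        def fn(data):
            rhs = other.evaluate(data) if isinstance(other, Expr) else other
            return _elementwise(op, self.evaluate(data), rhs)
        return Expr(fn)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def __pow__(self, other):
        return self._combine(other, operator.pow)


class Symbol(Expr):
    """A named placeholder that is also used as a key in the data mapping."""

    def __init__(self, name):
        self.name = name
        super().__init__(lambda data: data[self])

    def __eq__(self, other):
        return isinstance(other, Symbol) and other.name == self.name

    def __hash__(self):
        return hash(("Symbol", self.name))


def resolve(value, data):
    # What a plotting function could do with each argument when data is given.
    return value.evaluate(data) if isinstance(value, Expr) else value


x, y = Symbol("x"), Symbol("y")
data = {x: [1, 2, 3], y: [1, 2, 3]}
```

Because non-`Expr` values pass through `resolve` unchanged, string keyword values such as colors never collide with data keys.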

@r-owen
Contributor

r-owen commented Jul 25, 2015

Will this work with line attributes? I would love a simple, consistent way to specify per-point colors, marker styles and marker sizes, and being able to specify that data in a record array and associate each keyword sounds like it would do the job very neatly.

— Russell

On Jul 25, 2015, at 1:07 AM, Thomas A Caswell wrote:

You can view, comment on, or merge this pull request online at:

#4787

Commit Summary

ENH: plotting methods can unpack labeled data
File Changes

M lib/matplotlib/__init__.py (39)
M lib/matplotlib/axes/_axes.py (46)
M lib/matplotlib/rcsetup.py (5)
Patch Links:

https://github.com/matplotlib/matplotlib/pull/4787.patch
https://github.com/matplotlib/matplotlib/pull/4787.diff


@tacaswell
Member Author

@jkseppan You hit on the inspiration exactly 😉 The reason it checks whether the placeholder is a string, instead of just trying [], is that if you pass arrays into __getitem__ of DataFrames you get index-related errors rather than KeyError. It might be possible to check whether the arg is hashable, but I worry that will do bad things with tuples as input. In any case, I don't think the current PR locks us into an API where we can't extend the placeholders to other types.

I am not super excited about adding that sort of computation into core of mpl. I think it is important to (at the low level) keep computation/analysis logic separated from the plotting logic.
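The trade-off between the string check and the hashability check mentioned in this reply can be sketched with the standard library. `is_key_by_string` and `is_key_by_hashable` are hypothetical helpers; the point is that tuples and numbers pass a hashability test, so they would be treated as candidate keys.

```python
from collections.abc import Hashable


def is_key_by_string(value):
    # The PR's rule: only strings are looked up in `data`.
    return isinstance(value, str)


def is_key_by_hashable(value):
    # Alternative rule: anything hashable is a candidate key.
    return isinstance(value, Hashable)


candidates = ["a", (1, 2), [1, 2], 3.0]
string_rule = [is_key_by_string(v) for v in candidates]
hashable_rule = [is_key_by_hashable(v) for v in candidates]
# Under the hashable rule the tuple (1, 2), possibly meant as data,
# would be looked up as a key; lists are unhashable and skipped.
```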

@r-owen There are some limitations to that due to details of how the underlying artists work: all markers from plot must be the same shape and color, and all markers from scatter must be the same shape (but can vary in size, color and rotation). You can use cycler + a for loop + groupby to easily get some of what you want. Functions like that are undeniably useful, but how to correctly parameterize and implement them (without pinning ourselves to requiring pandas as input and without reimplementing pandas) is not clear yet.
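A rough sketch of the "for loop + groupby + style cycling" pattern using only the standard library: `itertools.groupby` stands in for pandas groupby and `itertools.cycle` for cycler. The records, style dicts and `per_group_calls` helper are made up for illustration; each resulting tuple is what one `ax.plot(xs, ys, **kw)` call would receive.

```python
import itertools
from operator import itemgetter

records = [
    {"group": "a", "x": 0, "y": 1.0},
    {"group": "a", "x": 1, "y": 2.0},
    {"group": "b", "x": 0, "y": 0.5},
    {"group": "b", "x": 1, "y": 0.7},
]


def per_group_calls(records, key, styles):
    """Build one (xs, ys, style_kwargs) tuple per group, cycling through styles."""
    # itertools.groupby only groups adjacent items, so sort by the key first.
    ordered = sorted(records, key=itemgetter(key))
    calls = []
    for (name, rows), style in zip(itertools.groupby(ordered, key=itemgetter(key)),
                                   itertools.cycle(styles)):
        rows = list(rows)
        xs = [r["x"] for r in rows]
        ys = [r["y"] for r in rows]
        calls.append((xs, ys, dict(style, label=name)))
    return calls


styles = [{"marker": "o", "color": "C0"}, {"marker": "s", "color": "C1"}]
calls = per_group_calls(records, "group", styles)
```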

@mwaskom

mwaskom commented Jul 25, 2015

Will/could this label axes, and do other smart things with the labels? That is (IMO) a major motivation for this style of invocation.

@jakevdp
Contributor

jakevdp commented Jul 25, 2015

Thanks for this @tacaswell! One other feature would be nice, though it would complicate the logic a bit: I'd love to have the created plot elements be automatically labeled. So then these two things would be equivalent:

plt.plot(data['t'], data['x'], label='x')
plt.plot(data['t'], data['y'], label='y')
plt.legend()

and

plt.plot('t', 'x', data=data)
plt.plot('t', 'y', data=data)
plt.legend()

It would involve inferring which argument is the y value in any appropriate function, and automatically setting the label= attribute if it's not already set.
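A sketch of the suggested auto-labeling, assuming the y value is always the second positional argument and is given as a string key. `auto_label` and `fake_plot` are hypothetical; a real implementation would need per-function knowledge of which argument is y.

```python
import functools


def auto_label(func):
    """Hypothetical decorator: default `label` to the y key when data is given."""
    @functools.wraps(func)
    def inner(x, y, data=None, **kwargs):
        if data is not None:
            # Only infer a label if the caller did not provide one.
            if kwargs.get("label") is None and isinstance(y, str):
                kwargs["label"] = y
            x = data[x] if isinstance(x, str) else x
            y = data[y] if isinstance(y, str) else y
        return func(x, y, **kwargs)
    return inner


@auto_label
def fake_plot(x, y, label=None):
    # Stand-in for plt.plot; returns what a line would be built from.
    return x, y, label


data = {"t": [0, 1, 2], "x": [3, 4, 5], "y": [6, 7, 8]}
```

With this, `fake_plot("t", "x", data=data)` gets the label `"x"`, so a later `plt.legend()` equivalent would show the column names.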

if rcParams['unpack_labeled']:
    args = tuple(_replacer(data, a) for a in args)
    kwargs = dict((k, _replacer(data, v))
                  for k, v in six.iteritems(kwargs))
Contributor

Wouldn't it make sense to implement a positive/negative list, at least for kwargs, of which ones should be replaced and which should not?

Like

     @unpack_labeled_data([1], ["labels", "colors"]) # the second arg and two kwargs should be replaced
     def pie(self, x, explode=None, labels=None, colors=None,
             autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1,
             startangle=None, radius=None, counterclock=True,
[...]

Member Author

@jakevdp asked the same question. I didn't go down that route because it is a bit more subtle, as you would have to do

@unpack_labeled_data([1, 3, 4], ['x', 'labels', 'colors'])

which now that I type out what it would look like isn't so bad.

We will probably have to special-case plot and maybe a few others with overly permissive APIs.
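A sketch of the position-plus-name whitelist decorator from the `pie` example. Positions count from 0 and include `self`, as in `[1, 3, 4]` for `pie`. The decorator body and the `FakeAxes` class are hypothetical illustrations, not matplotlib code.

```python
import functools


def unpack_labeled_data(positions, names):
    """Hypothetical whitelist decorator: only these positions/kwargs are replaced."""
    positions, names = set(positions), set(names)

    def _replace(data, value):
        if isinstance(value, str):
            try:
                return data[value]
            except KeyError:
                pass
        return value

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            data = kwargs.pop("data", None)
            if data is not None:
                args = tuple(_replace(data, a) if i in positions else a
                             for i, a in enumerate(args))
                kwargs = {k: _replace(data, v) if k in names else v
                          for k, v in kwargs.items()}
            return func(*args, **kwargs)
        return inner
    return decorator


class FakeAxes:
    @unpack_labeled_data([1, 3, 4], ["x", "labels", "colors"])
    def pie(self, x, explode=None, labels=None, colors=None, autopct=None):
        # Stand-in for Axes.pie: echo the resolved arguments.
        return x, explode, labels, colors, autopct


# "%1.1f%%" is deliberately also a key, to show that autopct is protected.
data = {"sizes": [1, 2], "names": ["a", "b"], "%1.1f%%": "collision"}
ax = FakeAxes()
```

Because `autopct` is not whitelisted, `ax.pie("sizes", labels="names", autopct="%1.1f%%", data=data)` keeps the format string even though it collides with a key.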

Contributor

erk, didn't know that :-(

Not sure if that helps, but:

import inspect

def func(x, y=1):
    print("x: %s, y: %s" % (x,y))
inspect.getfullargspec(func)
FullArgSpec(args=['x', 'y'], varargs=None, varkw=None, defaults=(1,), kwonlyargs=[], kwonlydefaults=None, annotations={})
  1. only pass in the names of the args in the decorator as replace_names
  2. in the decorator: cache the list of arg names via inspect: cached_names
  3. in the wrapper:
    1. for each arg, use the pos to get the name (cached_names[pos]) and use that name in the replacement if it is in replace_names
    2. process all kwargs like before if they are in replace_names

Unfortunately this won't work for plot_func(*args, **kwargs) :-(
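The numbered steps can be sketched with `inspect.signature`, which (unlike `getargspec`) is available in current Python 3. `unpack_by_name` is a hypothetical name; the comments mark where the `*args` limitation bites.

```python
import functools
import inspect


def unpack_by_name(replace_names):
    """Hypothetical decorator: map positional args to names with inspect."""
    replace_names = set(replace_names)

    def decorator(func):
        # Step 2: cache the parameter names once, when the function is decorated.
        cached_names = list(inspect.signature(func).parameters)

        @functools.wraps(func)
        def inner(*args, **kwargs):
            data = kwargs.pop("data", None)
            if data is None:
                return func(*args, **kwargs)

            def maybe_replace(name, value):
                # Only whitelisted names with string values are looked up.
                if name in replace_names and isinstance(value, str):
                    try:
                        return data[value]
                    except KeyError:
                        pass
                return value

            # Step 3.1: recover each positional arg's name from its position.
            # A *args parameter has a single name, so this breaks down
            # (IndexError or wrong names) for functions like f(*args, **kwargs).
            args = tuple(maybe_replace(cached_names[pos], a)
                         for pos, a in enumerate(args))
            # Step 3.2: kwargs already carry their names.
            kwargs = {k: maybe_replace(k, v) for k, v in kwargs.items()}
            return func(*args, **kwargs)
        return inner
    return decorator


@unpack_by_name(["x", "y"])
def func(x, y=1, label=None):
    return x, y, label


data = {"a": [1, 2], "b": [3, 4]}
```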

@tacaswell
Member Author

import cycler as cy
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def simple_plot(ax, x, y, **kwargs):
    # x and y are column names; the data= kwarg lets ax.plot look them up
    return ax.plot(x, y, **kwargs)

th = np.linspace(0, 2*np.pi, 128)
df = pd.DataFrame({'sin': np.sin(th), 'cos': np.cos(th),
                   'sin2': .5 * np.sin(2 * th), 'cos2': .5 * np.cos(2 * th)})

def easy_facet(df):
    # one linestyle per x column and one marker per y column;
    # the product yields one kwargs dict per panel of the grid
    cycleX = (cy.cycler('x', df.keys()) + cy.cycler('linestyle', ['-', '--', ':', '']))
    cycleY = (cy.cycler('y', df.keys()) + cy.cycler('marker', 'xos*'))
    kw_cycle = cycleX * cycleY

    fig, axes = plt.subplots(len(df.keys()), len(df.keys()), sharex=True, sharey=True,
                             figsize=(10, 10))
    lines = []
    for ax, kwargs in zip(axes.ravel(), kw_cycle):
        ln, = simple_plot(ax, markevery=5, data=df, **kwargs)
        ax.set_title('{x} vs {y}'.format(**kwargs))
        lines.append(ln)
    return fig, lines


easy_facet(df)

so

@shoyer

shoyer commented Jul 25, 2015

This is really a fantastic addition. This simple lookup-based approach will work equally well with other labeled data libraries, e.g., xray.

@mwaskom I don't think there's a clean way to automatically handle axis labeling. The way to get that info from a pandas DataFrame is very pandas specific.

@mwaskom

mwaskom commented Jul 25, 2015

@mwaskom I don't think there's a clean way to automatically handle axis labeling. The way to get that info from a pandas DataFrame is very pandas specific.

I don't think I understand why it would be specific to the input type. I would think it is just logic that needs to be associated with the particular matplotlib function. In other words, plt.scatter knows that its first arg should label the x axis and the second arg should label the y axis. The label should just be the string passed, i.e.:

ax.scatter("foo", "bar", data=df)

should be the same as

ax.scatter(df.foo, df.bar)
ax.set(xlabel="foo", ylabel="bar")

To be clear I'm not saying matplotlib needs to extract a .name attribute from a vector, just that if semantic names are used to draw the plot, they should end up as labels too.
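A sketch of the proposed behavior on a stand-in Axes (hypothetical `FakeAxes`, not matplotlib's class): string keys used to draw also become axis labels. Calling it twice also shows that the last call wins for the axis labels.

```python
class FakeAxes:
    """Hypothetical stand-in for an Axes: records points and axis labels."""

    def __init__(self):
        self.xlabel = ""
        self.ylabel = ""
        self.points = []

    def set(self, xlabel=None, ylabel=None):
        if xlabel is not None:
            self.xlabel = xlabel
        if ylabel is not None:
            self.ylabel = ylabel

    def scatter(self, x, y, data=None):
        if data is not None:
            # Proposed behavior: string keys used to draw also label the axes.
            self.set(xlabel=x if isinstance(x, str) else None,
                     ylabel=y if isinstance(y, str) else None)
            x = data[x] if isinstance(x, str) else x
            y = data[y] if isinstance(y, str) else y
        self.points.extend(zip(x, y))


data = {"foo": [1, 2], "bar": [3, 4], "baz": [5, 6]}
ax = FakeAxes()
ax.scatter("foo", "bar", data=data)
ax.scatter("foo", "baz", data=data)  # the last call wins: ylabel is now "baz"
```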

@tacaswell
Member Author

Associating the column names with the axis labels only makes sense for the simplest of the use cases. For example

ax.scatter('x', 'foo', data=df)
ax.scatter('x', 'bar', data=df)
ax.plot('x', 'baz', data=df)

would end up with whatever the last call was setting the axis labels, which may not be right. I think it is better to err on the side of making the users be more explicit rather than giving them something that is wrong.

@tacaswell
Member Author

@shoyer That was definitely part of the discussion.

I have also tested that it works with h5py files/groups and dicts of things that quack like arrays.

@jakevdp
Contributor

jakevdp commented Jul 25, 2015

Associating the column names with the axis labels only makes sense for the simplest of the use cases.

Pandas solves this by labeling the x axis (which is almost always the same for multiple lines) and making a legend based on the y labels. That's why I suggested above automatically labeling the objects, so that a simple plt.legend() will work as users intend 95% of the time.

@jorisvandenbossche

+1 from me as well!
I think it is important to clearly specify (limit) which arguments should do this unpacking (something along the lines of what @JanSchulz is suggesting?)

@TomAugspurger
Contributor

Big +1 here, this looks great.

The one thing I see, from a pandas perspective, is that we typically plot a column against an index. e.g.

In [10]: df = pd.DataFrame({'A': range(10), 'B': np.arange(10)**2})
In [11]: df.A.plot()

Will plot column A against the index. With this PR, users would type

>>> ax.plot('index', 'A', data=df.reset_index())

It'd be nice to avoid that .reset_index, but I don't think that matplotlib should worry about this. It's up to pandas whether df['index_name'] should potentially return the index.
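Since the lookup is just `data[key]`, a container can opt in to resolving a name like 'index' itself. A hypothetical wrapper (`IndexAwareTable`, not a pandas API) sketches this while still raising KeyError for unknown keys, so the literal-string fallback keeps working.

```python
class IndexAwareTable:
    """Hypothetical container: columns plus a row index reachable by name."""

    def __init__(self, columns, index, index_name="index"):
        self._columns = dict(columns)
        self._index = list(index)
        self._index_name = index_name

    def __getitem__(self, key):
        # A real column with the same name wins over the index.
        if key in self._columns:
            return self._columns[key]
        if key == self._index_name:
            return self._index
        # Unknown keys raise KeyError, so unpacking falls back to the string.
        raise KeyError(key)


table = IndexAwareTable({"A": [0, 1, 4]}, index=[10, 20, 30])
```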

@TomAugspurger
Contributor

Associating the column names with the axis labels only makes sense for the simplest of the use cases

ax.scatter('x', 'foo', data=df)
ax.scatter('x', 'bar', data=df)
ax.plot('x', 'baz', data=df)

Pandas does just overwrite the axis label here. It's not ideal, but there is a precedent here. I guess this counts as one of those foot-cannons you mentioned @tacaswell.

@mwaskom

mwaskom commented Jul 25, 2015

Pandas does just overwrite the axis label here. It's not ideal, but there is a precedent here.

I think this discussion shows why long-form data is better than wide-form data. But I could see the argument that matplotlib should remain agnostic about data format. That said, long-form datasets are probably > 5% of what's out there, so I'm not sure this is accurate:

That's why I suggested above automatically labeling the objects, so that a simple plt.legend() will work as users intend 95% of the time.

    kwargs = dict((k, _replacer(data, v))
                  for k, v in six.iteritems(kwargs))
else:
    raise ValueError("Trying to unpack labeled data, but "
Member

I don't understand the strategy here. What's the point of the rcParam? You don't seem to be letting it turn off the unpacking behavior. You are always popping the data kwarg; and if it is there, the only effect of the rcParam seems to be to generate an exception.

Member Author

The idea was just to be able to turn this feature off in a guaranteed way. It is better to catch it here and raise than to let it fall through to set_data. The other thing I thought about was making this rcParam a define-time check and, if it is False, just returning func.
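The two variants of the switch discussed here (raise at call time vs. skip the decorator at define time) can be sketched side by side. `rcParams` below is a plain dict standing in for `matplotlib.rcParams`, and the decorator is a hypothetical illustration, not the PR's implementation.

```python
import functools

# Hypothetical stand-in for matplotlib.rcParams; the key name follows the diff.
rcParams = {"unpack_labeled": True}


def unpack_labeled_data(func):
    # Define-time check: with the feature off when the decorator runs,
    # the original function is returned and no wrapper is installed.
    if not rcParams["unpack_labeled"]:
        return func

    @functools.wraps(func)
    def inner(*args, **kwargs):
        data = kwargs.pop("data", None)
        if data is not None:
            # Call-time check: the feature was switched off after decoration;
            # fail loudly rather than silently ignoring `data`.
            if not rcParams["unpack_labeled"]:
                raise ValueError("labeled-data unpacking is disabled")
            # Positional args only, to keep the sketch short.
            args = tuple(data.get(a, a) if isinstance(a, str) else a
                         for a in args)
        return func(*args, **kwargs)
    return inner


def fake_plot(x, y):
    return x, y


wrapped = unpack_labeled_data(fake_plot)
```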

Member

I still don't see the use case for this. Under what circumstances would a user want to set the rcParam to False?

Member Author

I am a bit nervous about adding this so close to release, if you are not worried I will get rid of this bit of complexity.

@tacaswell
Member Author

Where I have landed on all of these issues is:

  • automatic unpacking is in and should stay
  • auto-index aware ax.plot is in and should stay
  • xlabel/ylabel setting should not be done at this API level (more thoughts on that are coming, probably a MEP)
  • I see the reasons to whitelist which positions/kwargs are replaced, but doing that uniformly across the library is a lot of very subtle work. I would prefer not to do that now and to go with the brute-force approach at first. Later limiting it so that it does not replace things it should not does not seem like an API break to me, so I don't think we would be painting ourselves into a corner. This also ties back into the above-mentioned API-related MEP. The main reluctance here is a) the timeline to getting 1.5rc1 tagged and b) I would rather not do whitelisting than do it badly. There is a version of this in the code, but I am thinking of ripping it out.
  • Very similar story for extracting a label from the data automatically. There is a commit that takes a pass at this (using a method @jakevdp suggested), but it is not tested or used. As above, I am thinking about ripping this out, but want input from people before I do.

The calculus on the last two points changes greatly if anyone else steps up to work on this.

My goal here is to get an MVP of a labeled-data-aware API out the door with 1.5. I think dropping some of the safety (the input whitelisting) and convenience (artist label lookup) is worth it to get out a version whose limitations we are clear about (if you have a column named 'g', bad things might happen), so we can get it used and see where the limitations/pain points are.

@tacaswell
Member Author

And to address @mwaskom's comment about long vs wide data: at this level of the API, I think we have to take wide data. There needs to be a layer built on top of this that will take the long data, do the selection/filtering/aggregation, and call out to this layer with wide data.

There is an interesting discussion that needs to happen about what that higher level API should look like.
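A minimal sketch of the long-to-wide step such a higher layer would perform, using plain dicts instead of pandas. `long_to_wide` is a hypothetical helper; it assumes each (index, column) pair occurs at most once and fills gaps with None.

```python
def long_to_wide(rows, index, columns, values):
    """Pivot long-form records into a dict of equal-length columns.

    Assumes each (index, column) pair occurs at most once; a repeat overwrites.
    Missing cells are filled with None.
    """
    index_order = {}  # dict preserves the first-seen order of index values
    cells = {}        # column name -> {index value: cell value}
    for row in rows:
        i = row[index]
        index_order.setdefault(i, None)
        cells.setdefault(row[columns], {})[i] = row[values]
    index_values = list(index_order)
    wide = {index: index_values}
    for col, col_cells in cells.items():
        wide[col] = [col_cells.get(i) for i in index_values]
    return wide


long_rows = [
    {"t": 0, "var": "x", "val": 1.0},
    {"t": 0, "var": "y", "val": 2.0},
    {"t": 1, "var": "x", "val": 1.5},
    {"t": 1, "var": "y", "val": 2.5},
]
wide = long_to_wide(long_rows, index="t", columns="var", values="val")
# wide can then be handed to the lower layer, e.g. ax.plot("t", "x", data=wide)
```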

except AttributeError:
    y = np.atleast_1d(y)
    return np.arange(y.shape[0], dtype=float), y

Member

Not sure how this will handle a dataframe with a MultiIndex

Member Author

By the time the code path gets here it should be no bigger than a Series. Can you have a MultiIndex on a Series?

Member

Most definitely. Not sure how to check the type without importing pandas. I guess you could import pandas inside the try block, but that's probably not desirable.


Calling series.index.values on a Series with a MultiIndex will return a 1d array of tuples.

@phobson
Member

phobson commented Jul 29, 2015

👍 so excited about this!

@tacaswell
Member Author

Responding to @phobson's inline comment:

No MultiIndex support for now.

        kwargs['label'] is None)):
    if len(args) > label_arg:
        try:
            kwargs['label'] = args[label_arg].name

Why not use @mwaskom's suggestion of using the text label instead of the .name attribute? That seems safer:

To be clear I'm not saying matplotlib needs to extract a .name attribute from a vector, just that if semantic names are used to draw the plot, they should end up as labels too.

.name will also work with xray, but the smaller we can make the labeled data spec, the better.

Member Author

It should probably be changed to use either. I was trying to make this work for cases where the user is currently doing plt.plot(df['foo']), which, although it cuts against my long message, is what I was thinking of when I wrote this.

@tacaswell
Member Author

If it wasn't clear above: if anyone wants to take this and run with it (or start from scratch), go for it; I have no personal attachment to this code. If this PR mostly serves to annoy someone enough to do it right, I will be happy 😄.

@westurner

.name will also work with xray, but the smaller we can make the labeled data spec, the better.

So, RDF schemas and representation formats (e.g. JSON-LD, CSVW) define metadata fields like 'rdfs:label' (@en) and 'schema:name'.

pandas-dev/pandas#3402 "ENH: Linked Datasets (RDF)"

For linked data, I don't see why there would be a need to create a different format for expressing this metadata.

CSV -> arrays <- metadata (RDF, JSON-LD) [.name, provenance]

[edit]

@westurner

... the Vega visualization grammar also handles axis labels: https://github.com/vega/vega/wiki

@jorisvandenbossche

About the kwarg whitelisting / automatic labeling.

@tacaswell I certainly understand the careful approach of not doing too much in a first iteration. But simply using the provided string key as the label seems like a rather safe thing to do (safer than looking for a .name attribute; in any case, the user still has to call legend to make it visible, so it does not really change anything by default).

This would help a lot in the following case:

ax.plot('col_x', 'col_y', data=df, label='col_y')

If there is no automatic labeling, you have to provide a label= yourself. But if you simply want it to be the column name, the above code snippet will fail if there is no whitelist of the kwargs that get unpacked, because label='col_y' is itself replaced by the column data.
Of course you can slightly alter the name provided to label, but just having the column names in the legend seems like a common use case to me.

@tacaswell
Member Author

@jorisvandenbossche Good catch re label getting replaced!

            pass
    elif label_kwarg in kwargs:
        try:
            kwargs['label'] = args[label_kwarg].name
Contributor

This should probably be s/args/kwargs/:

kwargs['label'] = kwargs[label_kwarg].name

@jankatins
Contributor

Here is an alternative decorator, which uses inspect to get the names of arguments instead of needing to specify both position and names. (Updated: if *args or **kwargs is used, it needs a full list of argument names in the right order, or at least all args that can be in *args.)

The main benefit is that you only need to specify up to three arguments: the "replace_names" list of args that should be replaced, the "label_namer", and, only if varargs are used, the full list of arguments. I find this easier to maintain than using both position and name in all cases.

This version also uses both the label_namer value (aka plot("col1") -> "col1") and, if it is available, data[x].name.

import functools
import inspect

def _replacer(data, key):
    # if key isn't a string don't bother
    if not isinstance(key, str):
        return key
    # try to use __getitem__
    try:
        return data[key]
    # key does not exist, silently fall back to key
    except KeyError:
        return key

def unpack_labeled_data_names(replace_names=None, label_namer="x", full_argument_names=None):
    """
    A decorator to add a 'data' kwarg to a function.  The signature
    of the input function must be ::

       def foo(ax, *args, **kwargs)

    so this is suitable for use with Axes methods.
    """
    if replace_names is not None:
        replace_names = set(replace_names)

    def param(func):
        # getfullargspec replaces getargspec, which was removed in Python 3.11
        arg_spec = inspect.getfullargspec(func)
        if arg_spec.varkw is None and arg_spec.varargs is None:
            # remove the first "ax" arg
            arg_names = arg_spec.args[1:]
        else:
            # in this case we need a supplied list of arguments
            if full_argument_names is None:
                raise Exception("Wrapped function uses *args or **kwargs, need full_argument_names!")
            arg_names = full_argument_names[1:]

        if label_namer:
            if label_namer not in arg_names:
                raise Exception("label namer: no arg with name %s | %s" % (label_namer, arg_names))
            label_namer_pos = arg_names.index(label_namer)
        else:
            label_namer_pos = 9999  # bigger than all "possible" arg lists

        @functools.wraps(func)
        def inner(ax, *args, **kwargs):
            data = kwargs.pop('data', None)
            xlabel = None
            if data is not None:
                # save the current label_namer value so that it can be used as a label
                if label_namer_pos < len(args):
                    xlabel = args[label_namer_pos]
                else:
                    xlabel = kwargs.get(label_namer, None)

                if not isinstance(xlabel, str):
                    xlabel = None

                # An arg is replaced if the arg_name of that position is in replace_names
                try:
                    args = tuple(_replacer(data, a) if arg_names[j] in replace_names else a
                                 for j, a in enumerate(args))
                except IndexError:
                    raise Exception("Got more args than function expects")

                kwargs = dict((k, _replacer(data, v) if k in replace_names else v)
                              for k, v in kwargs.items())
            # set the label if this func has a label arg and the user didn't set one,
            # neither positionally nor as a non-None kwarg
            if ("label" in arg_names and
                    arg_names.index("label") >= len(args) and
                    kwargs.get('label') is None):
                if label_namer_pos < len(args):
                    try:
                        kwargs['label'] = args[label_namer_pos].name
                    except AttributeError:
                        kwargs['label'] = xlabel
                elif label_namer in kwargs:
                    try:
                        kwargs['label'] = kwargs[label_namer].name
                    except AttributeError:
                        kwargs['label'] = xlabel
            return func(ax, *args, **kwargs)
        return inner
    return param

def _fmt(seq):
    # NumPy >= 2 reprs scalars as np.int64(1); convert them to plain Python values
    return [v.item() if hasattr(v, "item") else v for v in seq]

@unpack_labeled_data_names(replace_names=["x","y"])
def plot_func(ax, x, y, ls="x", label=None, w="xyz"):
    return "x: %s, y: %s, ls: %s, w: %s, label: %s" % (_fmt(x), _fmt(y), ls, w, label)

## or

@unpack_labeled_data_names(replace_names=["x","y"], full_argument_names=["ax", "x", "y", "ls", "label", "w"])
def plot_func(ax, *args, **kwargs):
    all_args = [None, None, "x", None, "xyz"]
    for i, v in enumerate(args):
        all_args[i] = v
    for i, k in enumerate(["x", "y", "ls", "label", "w"]):
        if k in kwargs:
            all_args[i] = kwargs[k]
    x, y, ls, label, w = all_args
    return "x: %s, y: %s, ls: %s, w: %s, label: %s" % (_fmt(x), _fmt(y), ls, w, label)

# Tests (work for both plot_func versions):
assert plot_func(None, "x","y") == "x: ['x'], y: ['y'], ls: x, w: xyz, label: None"
assert plot_func(None, x="x",y="y")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: None"
assert plot_func(None, "x","y", label="")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: "
assert plot_func(None, "x","y", label="text")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: text"
assert plot_func(None, x="x",y="y", label="")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: "
assert plot_func(None, x="x",y="y", label="text")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: text"

data = {"a":[1,2],"b":[8,9]}
assert plot_func(None, "a","b", data=data) == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, x="a",y="b", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, "a","b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, "a","b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"
assert plot_func(None, x="a",y="b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, x="a",y="b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"

import pandas as pd
data = pd.DataFrame({"a":[1,2],"b":[8,9]})
assert plot_func(None, "a","b", data=data) == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, x="a",y="b", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, "a","b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, "a","b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"
assert plot_func(None, x="a",y="b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, x="a",y="b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"

@jkseppan
Member

I wrote an expanded version of my earlier comment on the mailing list: http://article.gmane.org/gmane.comp.python.matplotlib.devel/13643

The code I'm referring to is on a branch based on this one: https://github.com/jkseppan/matplotlib/commits/label-with-nonstrings

@tacaswell
Member Author

@jkseppan That is very nice, but it is out of scope for right now. The goal is to get an MVP out the door for 1.5 in a way that does not paint us into a corner. Having very clear edges to what we will be providing (limited to single-index, string-labeled tables) is a feature.

This is an interesting enough idea that I think we should not try to rush it in and should probably work closely with the pandas/xray folks to define how that is going to work.

@tacaswell
Member Author

@JanSchulz That looks good. label_namer should probably default to 'y' (as this is the label that goes in the legend, not the axes label). I don't see how this works in the case where the function can take label as a kwarg, but it is just passed through blindly.

@jkseppan
Member

@tacaswell The problem I'm trying to get at is that strings are overloaded. In the matplotlib API they can mean at least colors (with multiple different syntaxes), line styles, marker styles, and text. I would argue that using strings for yet another purpose is a way of painting ourselves into a corner. While my branch has a longish demo in the test case, it's just an example of what the user could do with the API. The one change I'd like to make to this PR is b4709b3, just the part that inverts the string check to check that the keys aren't numbers or anything unhashable.

Or, if allowing any object is too much, we could provide an abstract base class whose descendants we allow in addition to strings:

class DataKey(object): pass

...
if not isinstance(key, six.string_types + (DataKey,)):
    ...

@jkseppan
Member

It would be good to have a test of the new functionality. There's a beginning of a test in d186a80.

@jakevdp
Contributor

jakevdp commented Jul 29, 2015

@jkseppan – your approach is interesting, but I think it would be much better suited for an extension library than the core of matplotlib itself. I agree with @tacaswell on his initial approach, though it probably needs some whitelisting mechanism as well.

@jankatins
Contributor

In ggplot, x="key" can refer to multiple things, from column names to transformation with column names ("np.log(colname)") to variables in the current scope. This is realised with patsy and evaluation contexts. https://github.com/yhat/ggplot/blob/master/ggplot/ggplot.py#L537

@jankatins
Contributor

I've put up a PR with my version of the decorator: #4829

@tacaswell
Member Author

Closing in favor of #4829

The above discussion has convinced me that whitelisting is essential and ax.plot is going to need to be special cased to death.

@tacaswell tacaswell closed this Jul 30, 2015
@tacaswell tacaswell changed the title ENH: plotting methods can unpack labeled data ENH: plotting methods can unpack labeled data [MOVED TO #4829] Jul 30, 2015
@tacaswell tacaswell deleted the enh_label_data_round2 branch January 24, 2019 04:06