ENH: plotting methods can unpack labeled data [MOVED TO #4829] #4787

tacaswell · 2015-07-25T08:06:54Z

After discussions with Brian Granger, Fernando Perez, Peter Wang, Matthew
Rocklin and Jake VanderPlas this is a proposal for how to deal with
labeled data in matplotlib.

The approach taken is that if the optional kwarg 'data' is passed in, any
other string args/kwargs to the function are replaced by the result
data[k] if the key exists in data, else the value is left as-is.
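
The replacement rule described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `unpack_labeled` and `fake_plot` are hypothetical names, and the stand-in function simply reports what it received instead of drawing anything.

```python
import functools


def unpack_labeled(func):
    """Hypothetical decorator: swap string arguments for data[key] when possible."""

    def _lookup(data, value):
        # Only strings are treated as placeholders; everything else passes through.
        if not isinstance(value, str):
            return value
        try:
            return data[value]
        except KeyError:
            return value  # not a key in data: leave the string as-is

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        data = kwargs.pop("data", None)
        if data is not None:
            args = tuple(_lookup(data, a) for a in args)
            kwargs = {k: _lookup(data, v) for k, v in kwargs.items()}
        return func(*args, **kwargs)

    return wrapper


@unpack_labeled
def fake_plot(x, y, color="k"):
    # Stand-in for a plotting method; it just returns its inputs.
    return (x, y, color)


LD = {"a": [1, 2, 3], "b": [4, 5, 6]}
result = fake_plot("a", "b", data=LD, color="r")
untouched = fake_plot("a", "b")  # no data kwarg, so strings pass through
```

Note that 'r' survives only because it is not a key in LD; every string argument is a candidate for replacement in this blanket form.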

Fernando made a compelling case that this needs to go in ASAP.

This still needs docs + tests + a bit more thought on how to deal with functions where we do some internal broadcasting (mostly plot). Maybe pass in names as a comma-separated list? I would prefer to, long term, simplify the low-level plot and have either the users do the looping or provide higher-level plotting functions which do the looping.

There is the possibility that some of the string args/kwargs we already take may conflict with names in the labeled data (ex ha='center' would not work with a data structure where 'center' in data).
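
A small sketch of that collision, using a hypothetical `lookup` helper in place of the real replacement logic: a styling kwarg whose value happens to match a column name is silently swapped for the column.

```python
def lookup(data, value):
    # Hypothetical stand-in for the replacement rule: strings that are
    # keys in data are replaced, everything else is returned unchanged.
    if isinstance(value, str) and value in data:
        return data[value]
    return value


# A data structure that happens to contain a column called 'center'.
data = {"center": [0.1, 0.2, 0.3], "width": [1, 2, 3]}

# Styling kwargs that were meant to stay plain strings.
kwargs = {"ha": "center", "va": "top"}
resolved = {k: lookup(data, v) for k, v in kwargs.items()}
```

Here `resolved["ha"]` ends up being the column values rather than the alignment string, while `resolved["va"]` is left alone.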

@pzwang expressed concern that we may be painting ourselves into a corner with this API as it is mostly just the difference between

ax.plot('a', 'b', data=LD)

vs

ax.plot(LD['a'], LD['b'])

The unpacking attempts can be disabled via an rcParam. That could also be implemented as an import-time rcParam which disables the decorator altogether.

This should work with any data object that supports __getitem__ and returns something that np.asarray works on.

attn @matplotlib/developers @jakevdp @fperez @mrocklin @ellisonbg @pzwang @mwaskom @jreback @andrewcollette


mwaskom · 2015-07-25T13:52:21Z

Nice!

jreback · 2015-07-25T14:53:58Z

+1 on this from my perspective as well. IIRC this is the protocol we discussed to have pandas internally dispatch to matplotlib as well.

jkseppan · 2015-07-25T15:04:44Z

This sounds like getting closer to R's plotting functions. With the function call mechanism in R, you can do

plot(x, y**2, data=foo)

and the plot function sees the expressions passed in as arguments and controls how they get evaluated. People will naturally ask for expression support next, and if we use strings as placeholders for values, we'll need to implement a parser for some expression language.

Alternative design: let users pass in sympy symbols or expressions, as in

x, y = symbols('x y')
plot(x, y, data={x: ..., y: ...})

These won't clash with string values of keyword arguments, and the generalization to expressions is simple. To avoid a dependency on sympy, we can deliver a simple version of symbol objects ourselves.
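
A rough sketch of what such a home-grown symbol object might look like. The `Symbol` class and `resolve` helper are hypothetical and only illustrate the idea; they are not part of sympy or matplotlib.

```python
class Symbol:
    """Hypothetical minimal placeholder object, standing in for a sympy symbol."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Symbol(%r)" % self.name

    # Default identity-based hashing and equality are enough for dict keys.


def resolve(data, value):
    # Only Symbol instances are looked up; plain strings are never touched,
    # so string-valued keyword arguments cannot clash with data keys.
    if isinstance(value, Symbol):
        return data[value]
    return value


x, y = Symbol("x"), Symbol("y")
data = {x: [0, 1, 2], y: [0, 1, 4]}

args = [resolve(data, a) for a in (x, y)]
kwargs = {k: resolve(data, v) for k, v in {"ha": "center"}.items()}
```

The trade-off is that users have to create the placeholder objects up front instead of typing column names directly.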

r-owen · 2015-07-25T15:14:10Z

Will this work with line attributes? I would love a simple, consistent way to specify per-point colors, marker styles and marker sizes, and being able to specify that data in a record array and associate each keyword sounds like it would do the job very neatly.

— Russell

tacaswell · 2015-07-25T15:51:57Z

@jkseppan You hit the inspiration on the head 😉 The reason that it checks if the placeholder is a string instead of just trying [] is that if you pass arrays into __getitem__ of data frames you get index-related errors rather than KeyError. It might be possible to check if the arg is hashable, but I worry that will do bad things with tuples as input. In any case, I don't think the current PR locks us into an API where we can't extend the placeholders to other types.

I am not super excited about adding that sort of computation into core of mpl. I think it is important to (at the low level) keep computation/analysis logic separated from the plotting logic.

@r-owen There are some limitations to that due to details of how the underlying artists work (all markers from plot must be the same shape and color; all markers from scatter must be the same shape, but can scale size, color and rotation). You can use cycler + a for loop + groupby to easily get some of what you want. Functions like that are undeniably useful, but how to correctly parameterize and implement them (without pinning ourselves to requiring pandas as input and without reimplementing pandas) is not clear yet.

mwaskom · 2015-07-25T15:53:13Z

Will/could this label axes, and do other smart things with the labels? That is (IMO) a major motivation for this style of invocation.

jakevdp · 2015-07-25T16:06:27Z

Thanks for this @tacaswell! One other feature that would be nice, though it would complicate the logic a bit. I'd love to have the created plot elements be automatically labeled. So then these two things would be equivalent:

plt.plot(data['t'], data['x'], label='x')
plt.plot(data['t'], data['y'], label='y')
plt.legend()

and

plt.plot('t', 'x', data=data)
plt.plot('t', 'y', data=data)
plt.legend()

it would involve inferring which is the y value in any appropriate function, and automatically setting the label= attribute if it's not already set.
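
One way to read that suggestion, sketched with a hypothetical stand-in for a plotting call (nothing here is matplotlib API): remember the y key before it is replaced, and use it as the label only if the caller did not supply one.

```python
def plot_with_auto_label(x, y, data=None, label=None):
    """Hypothetical stand-in for a plotting call; returns what would be drawn."""
    if data is not None:
        # Capture the y key as the label before it is replaced by values.
        if label is None and isinstance(y, str) and y in data:
            label = y
        x = data[x] if isinstance(x, str) and x in data else x
        y = data[y] if isinstance(y, str) and y in data else y
    return {"x": x, "y": y, "label": label}


data = {"t": [0, 1, 2], "x": [1, 2, 3], "y": [3, 2, 1]}
line_x = plot_with_auto_label("t", "x", data=data)
line_y = plot_with_auto_label("t", "y", data=data)
explicit = plot_with_auto_label("t", "y", data=data, label="custom")
```

An explicitly passed label always wins in this sketch, so existing calls would keep their legends unchanged.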

jankatins · 2015-07-25T16:29:32Z

lib/matplotlib/__init__.py

+            if rcParams['unpack_labeled']:
+                args = tuple(_replacer(data, a) for a in args)
+                kwargs = dict((k, _replacer(data, v))
+                              for k, v in six.iteritems(kwargs))


Wouldn't it make sense to implement a positive/negative list of at least kwargs which should be replaced and some which should not replaced?

Like

@unpack_labeled_data([1], ["labels", "colors"])  # the second arg and two kwargs should be replaced
def pie(self, x, explode=None, labels=None, colors=None,
        autopct=None, pctdistance=0.6, shadow=False,
        labeldistance=1.1, startangle=None, radius=None,
        counterclock=True, [...]

@jakevdp asked the same question. I didn't go down that route because it is a bit more subtle as you would have to do

@unpack_labeled_data([1, 3, 4], ['x', 'labels', 'colors'])

which now that I type out what it would look like isn't so bad.

Probably will have to special case plot and maybe a few others with overly permissive APIs.

erk, didn't know that :-(

Not sure if that helps, but:

>>> def func(x, y=1):
...     print("x: %s, y: %s" % (x, y))
...
>>> inspect.getargspec(func)
ArgSpec(args=['x', 'y'], varargs=None, keywords=None, defaults=(1,))

only pass in the names of the args in the decorator as replace_names

in the decorator: cache the list of arg names via inspect: cached_names

in the wrapper:

for each arg, use the pos to get the name (cached_names[pos]) and use that name in the replacement if it is in replace_names

process all kwargs like before if they are in replace_names

Unfortunately this won't work for plot_func(*arg, **kwarg) :-(
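
The limitation can be seen in a small sketch. It uses `inspect.signature` rather than the `getargspec` call quoted above, which is a substitution made here because recent Python versions no longer ship `getargspec`; `positional_names` and the two stub functions are hypothetical.

```python
import inspect


def positional_names(func):
    """Return the names of named positional parameters, skipping *args/**kwargs."""
    kinds = (inspect.Parameter.POSITIONAL_ONLY,
             inspect.Parameter.POSITIONAL_OR_KEYWORD)
    return [name for name, p in inspect.signature(func).parameters.items()
            if p.kind in kinds]


def pie(self, x, explode=None, labels=None, colors=None):
    # Stub with an explicit signature: positions map cleanly to names.
    pass


def plot(self, *args, **kwargs):
    # Stub with a permissive signature: nothing to map positions to.
    pass


pie_names = positional_names(pie)
plot_names = positional_names(plot)
```

For `pie` every position has a name that can be checked against a replace list; for `plot` only `self` is recoverable, which is why such functions would need an explicitly supplied list of argument names.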

tacaswell · 2015-07-25T16:35:34Z

import cycler as cy
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def simple_plot(ax, x, y, **kwargs):
    return ax.plot(x, y, **kwargs)

th = np.linspace(0, 2*np.pi, 128)
df = pd.DataFrame({'sin': np.sin(th), 'cos': np.cos(th),
                   'sin2': .5 * np.sin(2 * th), 'cos2': .5 * np.cos(2 * th)})

def easy_facet(df):
    cycleX = (cy.cycler('x', df.keys()) + cy.cycler('linestyle', ['-', '--', ':', '']))
    cycleY = (cy.cycler('y', df.keys()) + cy.cycler('marker', 'xos*'))
    kw_cycle = cycleX * cycleY

    fig, axes = plt.subplots(len(df.keys()), len(df.keys()), sharex=True, sharey=True,
                             figsize=(10, 10))
    lines = []
    for ax, kwargs in zip(axes.ravel(), kw_cycle):
        ln, = simple_plot(ax, markevery=5, data=df, **kwargs)
        ax.set_title('{x} vs {y}'.format(**kwargs))
        lines.append(ln)


easy_facet(df)

shoyer · 2015-07-25T17:18:43Z

This is really a fantastic addition. This simple lookup based approach will couple equally well with other labeled data libraries, e.g., xray.

@mwaskom I don't think there's a clean way to automatically handle axis labeling. The way to get that info from a pandas DataFrame is very pandas specific.

mwaskom · 2015-07-25T18:00:11Z

@mwaskom I don't think there's a clean way to automatically handle axis labeling. The way to get that info from a pandas DataFrame is very pandas specific.

I don't think I understand why it would specific to the input type. I would think it is just logic that needs to be associated with the particular matplotlib function. In other words, plt.scatter knows that its first arg should label the x axis and the second arg should label the y axis. The label should just be the string passed, i.e.:

ax.scatter("foo", "bar", data=df)

should be the same as

ax.scatter(df.foo, df.bar)
ax.set(xlabel="foo", ylabel="bar")

To be clear I'm not saying matplotlib needs to extract a .name attribute from a vector, just that if semantic names are used to draw the plot, they should end up as labels too.

tacaswell · 2015-07-25T18:20:13Z

Associating the column names with the axis labels only makes sense for the simplest of the use cases. For example

ax.scatter('x', 'foo', data=df)
ax.scatter('x', 'bar', data=df)
ax.plot('x', 'baz', data=df)

would end up with what ever the last call was setting the axis labels which may not be right. I think it is better to err on the side of making the users be more explicit rather than giving the users something that is wrong.
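
To make the concern concrete, here is a sketch with a hypothetical `FakeAxes` that applies the proposed auto-labeling rule on every call. It is not matplotlib code, just a recorder of which label would be set.

```python
class FakeAxes:
    """Hypothetical stand-in for an Axes that auto-labels from column names."""

    def __init__(self):
        self.xlabel = None
        self.ylabel = None

    def scatter(self, x, y, data):
        # Auto-labeling rule under discussion: the string keys become labels.
        self.xlabel, self.ylabel = x, y
        return data[x], data[y]

    plot = scatter  # same labeling behavior for the purposes of this sketch


df = {"x": [0, 1], "foo": [1, 2], "bar": [3, 4], "baz": [5, 6]}
ax = FakeAxes()
ax.scatter("x", "foo", data=df)
ax.scatter("x", "bar", data=df)
ax.plot("x", "baz", data=df)
```

After the three calls the y label reads 'baz' even though the axes also show 'foo' and 'bar'.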

tacaswell · 2015-07-25T18:23:18Z

@shoyer That was definitely part of the discussion.

I have also tested that it works with h5py files/groups and dicts of things that quack like arrays.

jakevdp · 2015-07-25T18:34:05Z

Associating the column names with the axis labels only makes sense for the simplest of the use cases.

Pandas solves this by labeling the x axis (which is almost always the same for multiple lines) and making a legend based on the y labels. That's why I suggested above automatically labeling the objects, so that a simple plt.legend() will work as users intend 95% of the time.

jorisvandenbossche · 2015-07-25T19:09:14Z

+1 from me as well!
I think it is important to clearly specify (limit) which arguments should do this unpacking (something in the line of what @JanSchulz is suggesting?)

TomAugspurger · 2015-07-25T21:25:09Z

Big +1 here, this looks great.

The one thing I see, from a pandas perspective, is that we typically plot a column against an index. e.g.

In [10]: df = pd.DataFrame({'A': range(10), 'B': np.arange(10)**2})
In [11]: df.A.plot()

Will plot the A column against the index. With this PR, users would type

>>> ax.plot(x='index', y='A', data=df.reset_index())

It'd be nice to avoid that .reset_index. I don't think that matplotlib should worry about this. That's up to pandas whether df['index_name'] should potentially return the index.

TomAugspurger · 2015-07-25T21:29:38Z

Associating the column names with the axis labels only makes sense for the simplest of the use cases

ax.scatter('x', 'foo', data=df)
ax.scatter('x', 'bar', data=df)
ax.plot('x', 'baz', data=df)

Pandas does just overwrite the axis label here. It's not ideal, but there is a precedent here. I guess this counts as one of those foot-cannons you mentioned @tacaswell.

mwaskom · 2015-07-25T22:02:52Z

Pandas does just overwrite the axis label here. It's not ideal, but there is a precedent here.

I think this discussion is why long-form data is better than wide-form data. But I could see the argument that matplotlib should remain agnostic about data format. That said, long-form datasets are probably > 5% of what's out there, so I'm not sure this is accurate:

That's why I suggested above automatically labeling the objects, so that a simple plt.legend() will work as users intend 95% of the time.

efiring · 2015-07-25T23:24:35Z

lib/matplotlib/__init__.py

+                kwargs = dict((k, _replacer(data, v))
+                              for k, v in six.iteritems(kwargs))
+            else:
+                raise ValueError("Trying to unpack labeled data, but "


I don't understand the strategy here. What's the point of the rcParam? You don't seem to be letting it turn off the unpacking behavior. You are always popping the data kwarg; and if it is there, the only effect of the rcParam seems to be to generate an exception.

The idea was just to be able to turn this feature off in a guaranteed way. It is better to catch it here and raise rather than letting it fall through to set_data. The other thing I thought about was making this rcParam a define-time check and, if it is false, just return func.

I still don't see the use case for this. Under what circumstances would a user want to set the rcParam to False?

I am a bit nervous about adding this so close to release, if you are not worried I will get rid of this bit of complexity.

tacaswell · 2015-07-29T03:45:04Z

Where I have landed on all of these issues is:

automatic unpacking is in and should stay
auto-index aware ax.plot is in and should stay
xlabel/ylabel setting should not be done at this API level (more thoughts on that are coming, probably a MEP)
I see the reasons to whitelist which positions/kwargs are replaced, however doing that uniformly across the library is a lot of very subtle work. I would prefer to not do that now and go with the brute-force approach at first. Later limiting it so that it does not replace things it should not does not seem like an API break to me, so I don't think we would be painting ourselves into a corner. This also ties back into the above-mentioned API-related MEP. The main reluctance here is a) the timeline for getting a 1.5rc1 tagged and b) I would rather not do whitelisting than do it badly. There is a version of this in the code, but I am thinking of ripping it out.
Very similar story for extracting a label from the data automatically. There is a commit that takes a pass at this (using a method @jakevdp suggested), but it is not tested or used. As above, I am thinking about ripping this out, but want input from people before I do.

The calculus on the last two points changes greatly if anyone else steps up to work on this.

My goal here is to get a MVP of a labeled data aware API out the door with 1.5, I think dropping some of the safety (the input white listing) and convenience (artist label lookup) is worth getting a version out that we are clear on what the limitations are (if you have a column named 'g' bad things might happen) so we can get it used and see where the limitations/pain points are.

tacaswell · 2015-07-29T04:24:26Z

And to address @mwaskom's comment about long vs wide data, at this level of the API, I think we have to take wide data. There needs to be a layer built on top of this that will take the long data, do the selection/filtering/aggregation and call out to this layer with wide data.

There is an interesting discussion that needs to happen about what that higher level API should look like.

phobson · 2015-07-29T04:25:43Z

lib/matplotlib/cbook.py

+    except AttributeError:
+        y = np.atleast_1d(y)
+        return np.arange(y.shape[0], dtype=float), y
+


Not sure how this will handle a dataframe with a MultiIndex

By the time the code path gets here it should be no bigger than a Series, can you have multi-index on a series?

Most definitely. Not sure how to check the type without importing pandas. I guess you could import pandas inside the try block, but that's probably not desirable.

Calling series.index.values on a Series with a MultiIndex will return a 1d array of tuples.
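
The behavior discussed in this thread can be imitated without pandas or NumPy. In the sketch below, `FakeIndex`, `FakeSeries` and this `index_of` are hypothetical stand-ins (the real helper in the diff uses `np.atleast_1d` and `np.arange`); the point is only to show what a tuple-valued index would hand to the plotting code.

```python
class FakeIndex:
    """Hypothetical stand-in for a pandas index exposing .values."""

    def __init__(self, values):
        self.values = list(values)


class FakeSeries:
    """Hypothetical stand-in for a pandas Series: values plus an index."""

    def __init__(self, values, index):
        self._values = list(values)
        self.index = FakeIndex(index)

    def __len__(self):
        return len(self._values)


def index_of(y):
    # Prefer the object's own index; fall back to 0..n-1 otherwise.
    try:
        return y.index.values, y
    except AttributeError:
        # A plain list lands here: list.index is a method without .values.
        return list(range(len(y))), y


flat = FakeSeries([10, 20], index=["a", "b"])
multi = FakeSeries([10, 20], index=[("a", 1), ("a", 2)])

x_plain, _ = index_of([5, 6, 7])
x_flat, _ = index_of(flat)
x_multi, _ = index_of(multi)
```

With the multi-level index the implicit x values are tuples, which a line plot cannot use directly, so some explicit handling or rejection would be needed.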

phobson · 2015-07-29T04:26:22Z

👍 so excited about this!

tacaswell · 2015-07-29T04:28:11Z

Responding to @phobson's inline comment:

No MultiIndex support for now.

shoyer · 2015-07-29T05:04:03Z

lib/matplotlib/__init__.py

+                    kwargs['label'] is None)):
+                if len(args) > label_arg:
+                    try:
+                        kwargs['label'] = args[label_arg].name


Why not use @mwaskom's suggestion of using the text label instead of the .name attribute? That seems safer:

To be clear I'm not saying matplotlib needs to extract a .name attribute from a vector, just that if semantic names are used to draw the plot, they should end up as labels too.

.name will also work with xray, but the smaller we can make the labeled data spec, the better.

It should probably be changed to use either. I was trying to make this work for cases where the user is currently doing plt.plot(df['foo']) which while cutting against my long message is what I was thinking when I wrote this.

tacaswell · 2015-07-29T05:12:44Z

If it wasn't clear above, if anyone wants to take this and run with it (or start from scratch) go for it, I have no personal attachment to this code. If this PR mostly serves to annoy someone enough to do it right I will be happy 😄 .

westurner · 2015-07-29T05:18:45Z

.name will also work with xray, but the smaller we can make the labeled data spec, the better.

So, RDF schemas and representation formats (e.g. JSON-LD, CSVW) define metadata fields like 'rdfs:label' (@en) and 'schema:name'.

pandas-dev/pandas#3402 "ENH: Linked Datasets (RDF)"

For linked data, I don't see why there would be a need to create a different format for expressing this metadata.

CSV -> arrays <- metadata (RDF, JSON-LD) [.name, provenance]

[edit]

https://w3c.github.io/csvw/
https://wrdrd.com/docs/consulting/knowledge-engineering#prov #csvw #rdf #semantic-web-standards

westurner · 2015-07-29T05:23:49Z

... vega visualization grammar also solves for axes labels: https://github.com/vega/vega/wiki

jorisvandenbossche · 2015-07-29T09:47:02Z

About the kwarg whitelisting / automatic labeling.

@tacaswell I certainly understand the careful approach of not doing too much in a first iteration. But simply using the provided string key as the label seems like a rather safe thing to do (safer than looking for a .name attribute, and in any case, the user still has to call legend to have it visible, so it does not really do something by default).

This would help a lot in the following case. Suppose this example:

ax.plot('col_x', 'col_y', data=df, label='col_y')

If there is no automatic labeling, you have to provide a label= yourself. But if you simply want this to be the column name, the above code snippet will fail if there is no whitelist on the kwargs for which unpacking happens.
Of course you can slightly alter the name provided to label, but just having the column names in the legend seems like a common use case to me.
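
The failure mode and the effect of a whitelist can be sketched as follows. The `unpack` helper and its `replace_names` parameter are hypothetical names used for illustration only.

```python
def unpack(data, kwargs, replace_names=None):
    """Hypothetical helper: replace string kwargs by data[key].

    If replace_names is given, only those kwargs are eligible.
    """
    out = {}
    for name, value in kwargs.items():
        eligible = replace_names is None or name in replace_names
        if eligible and isinstance(value, str) and value in data:
            out[name] = data[value]
        else:
            out[name] = value
    return out


df = {"col_x": [0, 1, 2], "col_y": [3, 4, 5]}
call = {"x": "col_x", "y": "col_y", "label": "col_y"}

blanket = unpack(df, call)                              # label gets clobbered
whitelisted = unpack(df, call, replace_names={"x", "y"})  # label stays a string
```

Without the whitelist the legend label becomes the column values; with it, only x and y are looked up and the label text is preserved.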

tacaswell · 2015-07-29T11:35:43Z

@jorisvandenbossche Good catch re label getting replaced!

jankatins · 2015-07-29T12:26:43Z

lib/matplotlib/__init__.py

+                        pass
+                elif label_kwarg in kwargs:
+                    try:
+                        kwargs['label'] = args[label_kwarg].name


This should probably be s/args/kwargs/:

kwargs['label'] = kwargs[label_kwarg].name

jankatins · 2015-07-29T13:25:28Z

Here is an alternative decorator, which uses inspect to get the names of arguments instead of needing to specify both position and names. [updated] If *args or **kwargs is used it needs a full list of argument names (or better at least all args which can be in *args), in the right order.[/].

The main benefit is that you only need to specify three arguments: the "replace_names" list of args which should be replaced, the "label_namer" and, only if varargs are used, the full list of arguments. I find that this is easier to maintain than using both position and name in all cases.

This version also uses both the label_namer value (aka plot("col1") -> "col1") and if it is available data[x].name

import functools
import six
import inspect

def _replacer(data, key):
    # if key isn't a string don't bother
    if not isinstance(key, six.string_types):
        return key
    # try to use __getitem__
    try:
        return data[key]
    # key does not exist, silently fall back to key
    except KeyError:
        return key

def unpack_labeled_data_names(replace_names=None, label_namer="x", full_argument_names=None):
    """
    A decorator to add a 'data' kwarg to any function.  The signature
    of the input function must be ::

       def foo(ax, *args, **kwargs)

    so this is suitable for use with Axes methods.
    """
    if replace_names is not None:
        replace_names = set(replace_names)

    def param(func):
        # remove the first "ax" arg
        arg_spec = inspect.getargspec(func)
        if ((arg_spec.keywords is None) and  (arg_spec.varargs is None)):
            arg_names = arg_spec.args[1:]
        else:
            # in this case we need a supplied list of arguments
            if full_argument_names is None:
                raise Exception("Wrapped function uses *args or **kwargs, need full_argument_names!")
            arg_names = full_argument_names[1:]    

        if label_namer:
            if not label_namer in arg_names:
                raise Exception("label namer: no arg with name %s | %s" % (label_namer, arg_names))
            label_namer_pos = arg_names.index(label_namer)
        else:
            label_namer_pos = 9999 # bigger than all "possible" arg lists 

        @functools.wraps(func)
        def inner(ax, *args, **kwargs):
            data = kwargs.pop('data', None)
            xlabel = None                
            if data is not None:
                # save the current label_namer value so that it can be used as a label
                if label_namer_pos < len(args):
                    xlabel = args[label_namer_pos]
                else:
                    xlabel = kwargs.get(label_namer, None)

                if not isinstance(xlabel, six.string_types):
                    xlabel = None

                # A arg is replaced if the arg_name of that position is in replace_names
                try:
                    args = tuple(_replacer(data, a) if arg_names[j] in replace_names else a
                                 for j, a in enumerate(args))
                except IndexError: 
                    raise Exception("Got more args than function expects")

                kwargs = dict((k, _replacer(data, v) if k in replace_names else v)
                    for k, v in six.iteritems(kwargs))
            # replace the label if this func has a label arg and the user didn't set one
            if (("label" in arg_names) and
                    (arg_names.index("label") >= len(args)) and  # not passed positionally
                    (kwargs.get('label') is None)  # not passed (or None) as a kwarg
               ):
                    if label_namer_pos < len(args):
                        try:
                            kwargs['label'] = args[label_namer_pos].name
                        except AttributeError:
                            kwargs['label'] = xlabel
                    elif label_namer in kwargs:
                        try:
                            kwargs['label'] = kwargs[label_namer].name
                        except AttributeError:
                            kwargs['label'] = xlabel
            return func(ax, *args, **kwargs)
        return inner
    return param

@unpack_labeled_data_names(replace_names=["x","y"])
def plot_func(ax, x, y, ls="x", label=None, w="xyz"):
    return "x: %s, y: %s, ls: %s, w: %s, label: %s" % (list(x),list(y),ls, w, label)

## or 

@unpack_labeled_data_names(replace_names=["x","y"], full_argument_names=["ax", "x", "y", "ls", "label", "w"])
def plot_func(ax, *args, **kwargs):
    all_args = [None, None, "x", None, "xyz"]
    for i, v in enumerate(args):
        all_args[i] = v
    for i, k in enumerate(["x", "y", "ls", "label", "w"]):
        if k in kwargs:
            all_args[i] = kwargs[k]
    x, y, ls, label, w = all_args
    return "x: %s, y: %s, ls: %s, w: %s, label: %s" % (list(x),list(y),ls, w, label)

# Tests (work for both plot_func versions):
assert plot_func(None, "x","y") == "x: ['x'], y: ['y'], ls: x, w: xyz, label: None"
assert plot_func(None, x="x",y="y")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: None"
assert plot_func(None, "x","y", label="")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: "
assert plot_func(None, "x","y", label="text")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: text"
assert plot_func(None, x="x",y="y", label="")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: "
assert plot_func(None, x="x",y="y", label="text")  == "x: ['x'], y: ['y'], ls: x, w: xyz, label: text"

data = {"a":[1,2],"b":[8,9]}
assert plot_func(None, "a","b", data=data) == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, x="a",y="b", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, "a","b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, "a","b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"
assert plot_func(None, x="a",y="b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, x="a",y="b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"

import pandas as pd
data = pd.DataFrame({"a":[1,2],"b":[8,9]})
assert plot_func(None, "a","b", data=data) == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, x="a",y="b", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: a"
assert plot_func(None, "a","b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, "a","b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"
assert plot_func(None, x="a",y="b", label="", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: "
assert plot_func(None, x="a",y="b", label="text", data=data)  == "x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text"

jkseppan · 2015-07-29T15:09:51Z

I wrote an expanded version of my earlier comment on the mailing list: http://article.gmane.org/gmane.comp.python.matplotlib.devel/13643

The code I'm referring to is on a branch based on this one: https://github.com/jkseppan/matplotlib/commits/label-with-nonstrings

tacaswell · 2015-07-29T15:41:03Z

@jkseppan That is very nice, but is out of scope for right now. The goal is to get a MVP out the door for 1.5 in a way that does not paint us into a corner. Having very clear edges of what we will be providing (limited to single-index string labeled tables) is a feature.

This is an interesting enough idea that I think we should not try to rush it in and should probably work closely with the pandas/xray folks to define how that is going to work.

tacaswell · 2015-07-29T15:47:05Z

@JanSchulz That looks good. label_namer should probably default to 'y' (as this is the label that goes in the legend, not the axes label). I don't see how this works in the case where the function can take label as a kwarg, but it is just passed through blindly.

jkseppan · 2015-07-29T16:47:26Z

@tacaswell The problem I'm trying to get at is that strings are overloaded. In the matplotlib API they can mean at least colors (with multiple different syntaxes), line styles, marker styles, and text. I would argue that using strings for yet another purpose is a way of painting ourselves into a corner. While my branch has a longish demo in the test case, it's just an example of what the user could do with the API. The one change I'd like to make to this PR is b4709b3, just the part that inverts the string check to check that the keys aren't numbers or anything unhashable.

Or, if allowing any object is too much, we could provide an abstract base class whose descendants we allow in addition to strings:

class DataKey(object): pass

...
if not isinstance(key, six.string_types + (DataKey,)):
    ...

jkseppan · 2015-07-29T16:49:05Z

It would be good to have a test of the new functionality. There's a beginning of a test in d186a80.

jakevdp · 2015-07-29T18:23:06Z

@jkseppan – your approach is interesting, but I think it would be much better suited for an extension library than the core of matplotlib itself. I agree with @tacaswell on his initial approach, though it probably needs some whitelisting mechanism as well.

jankatins · 2015-07-29T19:27:55Z

In ggplot, x="key" can refer to multiple things, from column names to transformation with column names ("np.log(colname)") to variables in the current scope. This is realised with patsy and evaluation contexts. https://github.com/yhat/ggplot/blob/master/ggplot/ggplot.py#L537

jankatins · 2015-07-30T18:29:38Z

I've put up a PR with my version of the decorator: #4829

tacaswell · 2015-07-30T19:15:22Z

Closing in favor of #4829

The above discussion has convinced me that whitelisting is essential and ax.plot is going to need to be special cased to death.

tacaswell added the status: needs review label Jul 25, 2015

tacaswell added this to the next point release milestone Jul 25, 2015

jankatins reviewed Jul 25, 2015

efiring reviewed Jul 25, 2015

tacaswell added 4 commits July 26, 2015 02:49

ENH: Make implicit x in plot pandas aware

1cf0f38

Try to grab `y.index` before returning `np.arange(len(y))`

ENH: add white-list of args/kwargs to relpace

676eeb4

If white list is provided, only try to replace those.

MNT: remove unused rcparam

9e6e7b4

MNT: python 2.6 does not support set literals

5ace465

tacaswell mentioned this pull request Jul 28, 2015

Fix unit support with plot and pint #4803

Merged

ENH: pass at decorator which extracts a label

e90d859

Not tested or used.

phobson reviewed Jul 29, 2015

shoyer reviewed Jul 29, 2015

jankatins reviewed Jul 29, 2015

FIX: fix typo, slightly rename variables

6646545

FIX: fix yet more typos

4059b44

jankatins mentioned this pull request Jul 30, 2015

ENH: plotting methods can unpack labeled data #4829

Merged

14 tasks

tacaswell closed this Jul 30, 2015

tacaswell removed the status: needs review label Jul 30, 2015

tacaswell changed the title ~~ENH: plotting methods can unpack labeled data~~ ENH: plotting methods can unpack labeled data [MOVED TO #4829] Jul 30, 2015

tacaswell deleted the enh_label_data_round2 branch January 24, 2019 04:06
