VIS: added ability to plot DataFrames and Series with errorbars #5638

r-b-g-b · 2013-12-04T00:10:22Z

Addresses some of the concerns in issue #3796. The new code allows the DataFrame and Series line and bar plot functions to draw errorbars via the xerr and yerr keyword arguments to DataFrame/Series.plot(). The x and y errorbars can be specified as (1) a separate list, NumPy array, or Series, or (2) a DataFrame with the same column names as the plotted DataFrame. For example, method 2 looks like this:

df = pd.DataFrame({'x':[1, 2, 3], 'y':[3, 2, 1]})
df_xerr = pd.DataFrame({'x':[0.6, 0.2, 0.3], 'y':[0.4, 0.5, 0.6]})
df_yerr = pd.DataFrame({'x':[0.5, 0.4, 0.6], 'y':[0.3, 0.7, 0.4]})

df.plot(xerr=df_xerr, yerr=df_yerr)

This is my first contribution. I tried to follow the contribution guidelines as best I could, but let me know if anything needs work!

jreback · 2013-12-04T00:35:36Z

need a test that tests passing invalid error bars (u raise in the code - just need to exercise that)
also if invalid data is passed for series would it raise as well?

TomAugspurger · 2013-12-04T01:36:20Z

pandas/tools/plotting.py

+        the presence of errorbar keywords.
+        '''
+        if (('yerr' in self.kwds) and (self.kwds['yerr'] is not None)) or \
+            (('xerr' in self.kwds) and (self.kwds['xerr'] is not None)):


Is this equivalent to if self.kwds.get('yerr') or self.kwds.get('xerr')?

Hmm, if I understand correctly, you're saying you could accomplish this more cleanly with:

yerr = self.kwds.get('yerr')
xerr = self.kwds.get('xerr')
if yerr is None and xerr is None:
    plotf = self.plt.Axes.plot
    plotf_name = 'plot'
else:
    plotf = self.plt.Axes.errorbar
    plotf_name = 'errorbar'
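
One caveat about the shorter `if self.kwds.get('yerr') or self.kwds.get('xerr')` form: it relies on the truth value of whatever was passed, and a multi-element NumPy array (or a DataFrame) has no single truth value. The sketch below illustrates that, assuming the error keywords may hold arrays; `wants_errorbars` is a hypothetical helper, not a name from the PR.

```python
import numpy as np


def wants_errorbars(kwds):
    """Hypothetical helper: True if either error keyword is present and not None."""
    return kwds.get('yerr') is not None or kwds.get('xerr') is not None


# A multi-element array has no single truth value, so `if kwds.get('yerr')`
# would raise ValueError instead of selecting the errorbar function.
try:
    bool(np.array([0.1, 0.2]))
    truthiness_ok = True
except ValueError:
    truthiness_ok = False

# The explicit None check works for arrays and for missing/None keywords.
array_case = wants_errorbars({'yerr': np.array([0.1, 0.2])})
missing_case = wants_errorbars({'xerr': None})
```

The explicit `is None` comparison also treats a scalar error of `0` as "errorbars requested", which a plain truthiness test would not.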

r-b-g-b · 2013-12-04T02:28:21Z

@jreback I added some tests to make sure the right exceptions were being raised when invalid error arguments were passed.

I tested two cases where xerr/yerr arguments were:

different lengths from the plotted series (ValueError)
the wrong data type (TypeError)

Were these the kinds of cases you had in mind?
Both of these errors are raised in the underlying matplotlib code-- do you think it's necessary to catch them in the pandas plotting code? Also, are there any other tests you'd like to see?

r-b-g-b · 2013-12-06T23:23:14Z

Just wanted to check in and see if there was anything you all thought needed work, since I'll have some time over the weekend to spend on this. Thanks!

TomAugspurger · 2013-12-06T23:42:52Z

I'll look at this more closely tomorrow.

Is this supposed to work?

In [26]: df
Out[26]: 
     x   y  error
0    0  12    0.4
1    1  11    0.4
2    2  10    0.4
3    3   9    0.4
4    4   8    0.4
5    5   7    0.4
6    6   6    0.4
7    7   5    0.4
8    8   4    0.4
9    9   3    0.4
10  10   2    0.4
11  11   1    0.4

[12 rows x 3 columns]

In [27]: df.plot(yerr='error')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-27-476acf361e57> in <module>()
----> 1 df.plot(yerr='error')

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in plot_frame(frame, x, y, subplots, sharex, sharey, use_index, figsize, grid, legend, rot, ax, style, title, xlim, ylim, logx, logy, xticks, yticks, kind, sort_columns, fontsize, secondary_y, **kwds)
   1820                              secondary_y=secondary_y, **kwds)
   1821 
-> 1822     plot_obj.generate()
   1823     plot_obj.draw()
   1824     if subplots:

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in generate(self)
    876         self._compute_plot_data()
    877         self._setup_subplots()
--> 878         self._make_plot()
    879         self._post_plot_logic()
    880         self._adorn_subplots()

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in _make_plot(self)
   1369                         kwds['yerr'] = yerr[label]
   1370                 elif yerr is not None:
-> 1371                     kwds['yerr'] = yerr[i]
   1372 
   1373                 if isinstance(xerr, DataFrame):

IndexError: list index out of range

It plots the error bars for one of the Series. You may want to raise a TypeError here if you think that's ambiguous, or just apply the same error bars to each. I also think that yerr and xerr should accept a dict of column names, where the key is the particular series and the value is the column of errors to apply to that series.
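
To make the two options concrete (broadcasting one error column to every series, or a dict keyed by series), here is a rough sketch of how such a spec could be resolved before plotting. `resolve_yerr` is a hypothetical helper written for illustration; it is not the PR's `_parse_errorbars`, and it assumes the error columns live in the same DataFrame as the data.

```python
import pandas as pd


def resolve_yerr(df, yerr):
    """Hypothetical helper: map each plotted column to its error values (or None).

    Accepts a column name (broadcast to every other column) or a dict of
    {data column: error column}. Error columns are excluded from the plot data.
    """
    if isinstance(yerr, str):
        spec = {col: yerr for col in df.columns if col != yerr}
    elif isinstance(yerr, dict):
        spec = dict(yerr)
    else:
        raise TypeError("yerr must be a column name or a dict of column names")

    error_cols = set(spec.values())
    data_cols = [col for col in df.columns if col not in error_cols]
    return {col: (df[spec[col]].to_numpy() if col in spec else None)
            for col in data_cols}


df = pd.DataFrame({'x': [0, 1, 2], 'y': [12, 11, 10], 'error': [0.4, 0.4, 0.4]})

broadcast = resolve_yerr(df, 'error')          # same errors for 'x' and 'y'
per_column = resolve_yerr(df, {'y': 'error'})  # errors for 'y' only
```

With the dict form, a series that has no entry simply maps to `None`, which is one way to plot some columns with error bars and some without.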

TomAugspurger · 2013-12-06T23:49:52Z

In this case:

In [66]: df
Out[66]: 
     x   y  error
0    0  12    0.4
1    1  11    0.4
2    2  10    0.4
3    3   9    0.4
4    4   8    0.4
5    5   7    0.4
6    6   6    0.4
7    7   5    0.4
8    8   4    0.4
9    9   3    0.4
10  10   2    0.4
11  11   1    0.4

[12 rows x 3 columns]

In [67]: df_err
Out[67]: 
      x  y
0   0.2  2
1   0.2  2
2   0.2  2
3   0.2  2
4   0.2  2
5   0.2  2
6   0.2  2
7   0.2  2
8   0.2  2
9   0.2  2
10  0.2  2
11  0.2  2

[12 rows x 2 columns]

How do you decide to use just df_err['y'] for the bars? (I think that's what you're doing; I don't think the xs are being covered.)

Also, I'm thinking about a good way to accept asymmetric error bars. A sequence of tuples or two arrays of the same length (you might already handle this). More feedback tomorrow hopefully!

r-b-g-b · 2013-12-09T02:19:18Z

Yes, I think broadcasting one error column to all of the data columns should be an option -- it should be doable by adding to the _parse_errorbars function.

It does prevent the user from being able to plot some data with error bars and some without. But in those cases, they can use the "key-matched error DataFrame" -- if a label is not present in that column, the data will be plotted without error bars. Overall, I like this method since it is most explicit and least prone to unintended consequences, I think.

And yes, being able to pass an error dict is a good idea. I changed the code to implement this by taking advantage of the syntactic similarity of DataFrames and dicts: if you have type(df) is DataFrame and type(d) is dict, then 'x' in df.keys() and 'x' in d.keys(), as well as df['x'] and d['x'], all work. (I don't know if this is considered good/safe coding practice, though. Thoughts?)
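
A small sketch of the DataFrame/dict similarity described above. `errors_for` is a hypothetical lookup used only for illustration; the example data reuses the values from the original post.

```python
import pandas as pd

df_err = pd.DataFrame({'x': [0.6, 0.2, 0.3], 'y': [0.4, 0.5, 0.6]})
dict_err = {'x': [0.6, 0.2, 0.3]}  # no entry for 'y'


def errors_for(err, label):
    """Hypothetical lookup that works for both a DataFrame and a dict."""
    # `label in err` tests column labels for a DataFrame and keys for a dict.
    return list(err[label]) if label in err else None


from_frame = errors_for(df_err, 'y')
from_dict = errors_for(dict_err, 'x')
missing = errors_for(dict_err, 'y')
```

The same membership test does not carry over to a Series, where `in` checks the index labels rather than a column name, so a Series of errors would still need its own branch.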

I decided not to include x errors for bar plot because I don't think I've ever seen one with x errors, but you're probably right that they should be included. (I hadn't considered barh, and also, who am I to say you shouldn't have x error bars on vertical bar plots?)

As for asymmetrical error bars, I was thinking of implementing something like yerr_upper/yerr_lower since then you could organize it using error DataFrames/dicts like before, but it gets a bit messy. I'll give the method you suggested a shot -- thanks for your help!

r-b-g-b · 2013-12-09T21:14:26Z

I added some documentation as well, but I only have a loose grasp of Sphinx, so I could have mangled it a bit (@TomAugspurger, I pretty much copied the structure from your hexbin-plot commit). Let me know if it needs work!

TomAugspurger · 2014-02-02T16:27:55Z

doc/source/release.rst

@@ -221,6 +221,11 @@ Improvements to existing features
    MultiIndex and Hierarchical Rows. Set the ``merge_cells`` to ``False`` to
    restore the previous behaviour.  (:issue:`5254`)
  - The FRED DataReader now accepts multiple series (:issue`3413`)
+  - DataFrame/Series .plot() functions support plotting with error bars by 


This will need to be moved to the .14 section once that is created.

TomAugspurger · 2014-02-02T17:38:24Z

Sorry it took so long to get back to this; it fell off my radar. Most of my comments are inline.

I need a bit longer to look at your changes to get_plot_function.

Also there's a lot of repeated code. There are blocks that do something for yerr and then do something for xerr. See if you can factor those into their own function or closure.

Same thing for _parse_errorbars and _parse_errorbars_for_series. It would be better to do as much of that in one place as possible. I haven't had a close look yet, but I'll see if there's a way to reduce the repetition.

Thanks for doing this. It's a pretty tricky API to work out, but I think it looks pretty good so far.

jreback · 2014-02-16T21:47:51Z

@gibbonorbiter can you rebase this so we can take a look?

r-b-g-b · 2014-02-17T21:47:36Z

Thanks for the comments so far. I think it's possible to do away with _parse_errorbars_for_series. I had been using it when calling plot_series, but all of the use cases can be dealt with in the _parse_errorbars call that takes place in the _make_plot function.

One case where it is still useful is pretty specific: calling df.plot() (which uses plot_frame) and specifying both the y-data and the yerr/xerr data with strings (or ints) indicating columns from that DataFrame (the y and yerr keyword arguments). The problem is that only the y-data column is passed on to the resulting plot_series call; the rest of the DataFrame does not make it through, so downstream code does not have access to the error column(s). So either the error columns need to be broken out of the DataFrame before the call to plot_series (that's what _parse_errorbars_for_series was doing, but it could be done without farming out to a special function), or I might have to make some more drastic changes to the code. Or this functionality could be abandoned, and the user would just have to say yerr=df['the_err_column'] instead of yerr='the_err_column'. Not too much to ask. At this point, I agree that _parse_errorbars_for_series should go, and we should find a workaround.

I did manage to find a SublimeText plugin that highlights and removes trailing whitespace, thanks for pointing that out, I had no idea :)

Also, I did a rebase but I'm still a little weak on the git-fu, hopefully I didn't mangle it too badly.

jreback · 2014-03-09T15:05:02Z

@gibbonorbiter can you rebase....

r-b-g-b · 2014-03-10T21:53:29Z

There were some hairy merge issues with the docs, so I omitted those commits until the code checks out. Let me know if I need to change anything!

jreback · 2014-03-10T21:55:35Z

@TomAugspurger can u review when u have a chance
thanks

jreback · 2014-03-11T01:28:26Z

@gibbonorbiter also pls squash this down to a smaller number of commits as well

jreback · 2014-03-11T01:30:53Z

going to need an entry in release.rst (under improvements), and at least a 1-liner in v0.14.0.txt. optional would be to include a graphic of this (if you think it would materially add to the whatsnew). And pls add a small section in the plotting.rst (here I would put an example though). You can doc in this PR.

r-b-g-b · 2014-03-11T01:32:58Z

@jreback sounds good. would you prefer to have it squashed down to just one commit with a title like "VIS: added ability to plot DataFrames and Series with errorbars"?

jreback · 2014-03-11T02:16:59Z

a small number is fine since you made a lot of changes. (1 ok too!) usually I try to do them logically, e.g. tests in 1, changes in another, docs in another. but usually too much work to do that.

TomAugspurger · 2014-03-11T13:46:27Z

@gibbonorbiter from the example in your original post:

It looks like the ylim isn't being adjusted. You can document that and leave it as another issue if you want (It's probably impossible to come up with a perfect solution, so maybe make a note and let the user pick the limits?)

TomAugspurger · 2014-03-11T13:49:47Z

I'm compiling a list of what should / shouldn't work as far as the types of df and xerr/yerr go. Let me know if I'm missing anything.

List of Supported APIs

self is df, other is *err.

1. self is a DataFrame
   a. *err is a DataFrame with matching columns, values are widths
   b. *err is a str, indicating a column in df
   c. *err is a dict, keys matching df.columns, values are widths
   d. *err is a DataFrame with partially matching columns. Fails with TypeError: unsupported operand type(s) for -: 'numpy.int64' and 'str' after plotting the errorbars / line for the one that does match. This should either raise or pass, plotting line + error bars for the matches and just lines for the non-matching columns.
   e. *err is a DataFrame with no matching columns. Fails with TypeError: unsupported operand type(s) for -: 'numpy.int64' and 'str'
   f. self is an MxN DataFrame, other is an MxN array of values (unlabeled). Fails with ValueError: In safezip, len(args[0])=3 but len(args[1])=2. It's OK that this doesn't work. May want to catch (if possible) and report a better error.
   h. self is an MxN DataFrame, other is an NxM array. Plots correctly.
   g. self is an MxN DataFrame, other is an Nx2xM array. Plots asymmetrical error bars.
2. self is a Series
   a. *err is a DataFrame, self.name is set, matches col in other, values are widths
   b. *err is a DataFrame, self.name is set, doesn't match col in other (TypeError (adding str + int) right now, should raise ValueError?)
   c. *err is a DataFrame, self.name is None, same behavior as 2b.
   d. *err is a Series with other.name set, self.name is None, the name on other is ignored (this is probably ok.)
   e. *err is a Series with other.name set, self.name is set, but doesn't match other.name. The names are ignored and it works (probably ok?)
   f. self.name is set or not, *err is arraylike (works).
   g. self is M, and *err is an NxM array. This plots, but I'm not sure what values. Just the first row of *err?
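
For the asymmetrical case (an MxN DataFrame with an Nx2xM array of raw errors), the array layout is easy to get wrong, so here is a sketch of building one with NumPy. The shape convention is taken from the list above; the `lower`/`upper` names and the 10%/30% widths are made up for illustration.

```python
import numpy as np
import pandas as pd

# M = 4 rows, N = 2 columns
df = pd.DataFrame({'a': [5.0, 6.0, 7.0, 8.0], 'b': [2.0, 3.0, 1.0, 4.0]})

lower = df * 0.1  # distance below each point (illustrative values)
upper = df * 0.3  # distance above each point (illustrative values)

# Stack to (2, M, N), then move the column axis to the front: (N, 2, M).
asym = np.stack([lower.to_numpy(), upper.to_numpy()]).transpose(2, 0, 1)
```

Here `asym[j, 0]` holds the lower errors and `asym[j, 1]` the upper errors for column `j`, one value per row of `df`.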

Concerns

What to do about the other axis? What if only some of the index labels match (we should only use those).
When self is a Series, other is a DataFrame, and no columns match: TypeError: unsupported operand type(s) for -: 'numpy.int64' and 'str'
Differing lengths: raises ValueError: In safezip, len(args[0])=3 but len(args[1])=2. Catch earlier (style thing maybe?).
Maybe it would be better (easier to implement) to be strict about having names always match?
I would ignore asymmetrical error bars for now. That's a good enhancement for the future, but you probably want to get this merged first (I want it merged; doing error bars manually sucks). (EDIT: Looks like you support it already, I added cases 1h and 1g above).

My biggest concern is the first one, checking the index labels.

I'm going to look into the code now.

TomAugspurger · 2014-03-11T15:44:57Z

pandas/tools/plotting.py

+            str: the name of the column within the plotted DataFrame
+        '''
+
+        error_dim = error_dim[0]


What values does error_dim take other than x and y?

Since error_dim will only ever be "x" or "y" you shouldn't need that line. "x"[0] is the same as "x".

TomAugspurger · 2014-03-11T15:56:02Z

If we do want to respect the index labels as well when *err is a Series or DataFrame, you can use err.reindex_axis(self.index) or something like that. Example:

In [199]: df
Out[199]: 
          0         1         2
0 -0.890001 -0.943281  0.432457
1 -1.923215  0.585109 -1.007599
2  0.142306 -0.138677 -0.161226
3 -0.318436  1.002733 -1.801434
4  1.151089  0.035163 -0.778506

[5 rows x 3 columns]

In [200]: yerr
Out[200]: 
1    0.432457
2    1.007599
3    0.161226
4    1.801434
5    0.778506
Name: 2, dtype: float64

In this case, the error bars would only be plotted for index labels [1, 2, 3, 4]. 0 and 5 would have no error bars. You could use err.reindex_axis(self.index).fillna(0). Then everyone has error bars, just some have length 0 (that might work, haven't tested).
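
A minimal sketch of that alignment, using made-up numbers. `reindex_axis` was the spelling at the time of this thread; it was later deprecated and removed from pandas, so the sketch uses `reindex`, which does the same thing for the index here.

```python
import pandas as pd

data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[0, 1, 2, 3, 4])
yerr = pd.Series([0.4, 1.0, 0.2, 1.8, 0.8], index=[1, 2, 3, 4, 5])

# Align the errors to the plotted index: label 0 gets NaN, label 5 is dropped.
aligned = yerr.reindex(data.index)

# Optional: give unmatched points zero-length error bars instead of NaN.
filled = aligned.fillna(0)
```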

r-b-g-b · 2014-03-13T22:04:10Z

Thanks for the comments @TomAugspurger. I just pushed some changes allowing for specifying errors for only a subset of the columns and I'll work on the rest as I get some time. And yes, error_dim can only be x and y. I just have it there to avoid duplicating code for the x and y errors, but do you have a better way in mind?

TomAugspurger · 2014-03-14T13:13:50Z

This is looking really good. For this index label matching, I think something like

        def match_labels(data, err):
            err = err.reindex_axis(data.index).fillna(0)
            return err

        if isinstance(err_kwd, dict):
            err = err_kwd

        elif isinstance(err_kwd, DataFrame):
            err = err_kwd
            err = match_labels(self.data, err)

        # Series of error values
        elif isinstance(err_kwd, Series):
            # align on the index while err is still a Series,
            # then broadcast the error series across data
            err = match_labels(self.data, err_kwd)
            err = np.atleast_2d(err.values)
            err = np.tile(err, (self.nseries, 1))

will work.
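
The index-matching idea can be sketched in standalone form as follows. This is an illustration under assumptions, not the PR's code: `parse_err` is a hypothetical stand-in, it uses the current `reindex` spelling, and it leaves unmatched labels as NaN. The label alignment has to happen while the errors are still a Series, because a plain NumPy array carries no index to match on.

```python
import numpy as np
import pandas as pd


def parse_err(data, err_kwd):
    """Hypothetical stand-in for the error parsing discussed above."""
    if isinstance(err_kwd, pd.DataFrame):
        # Per-column errors: align rows to the plotted data.
        return err_kwd.reindex(data.index)
    if isinstance(err_kwd, pd.Series):
        # Align first (while labels still exist), then broadcast to every column.
        aligned = err_kwd.reindex(data.index)
        return np.tile(np.atleast_2d(aligned.to_numpy()), (data.shape[1], 1))
    return err_kwd  # dicts and raw arrays pass through unchanged


data = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [3.0, 2.0, 1.0]})
err = pd.Series([0.1, 0.2], index=[1, 2])

tiled = parse_err(data, err)
```

The result has one row of errors per plotted column, with NaN wherever the error Series had no matching index label.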

r-b-g-b · 2014-03-17T19:13:28Z

Thanks for that nice fix @TomAugspurger. It looks like plt.errorbar can also support np.nan entries, and handles them by not drawing any errorbars at all, so I took out the .fillna(0). Still seems to work fine, but did you have something else in mind when you added that?

TomAugspurger · 2014-03-17T20:18:23Z

That's good that it handles NaNs. That's the outcome that I wanted.

This looks like it's about there. Could you add a bit of documentation stuff?

a one-line note in doc/source/release.rst
a bit more in doc/source/v0.14.0.txt, including an example if you want
an example in doc/source/visualization.rst as a new subsection under "Basic plotting: plot".

Then we can get this merged!

r-b-g-b · 2014-03-17T22:39:57Z

Ok, just added some basic documentation. I tried to render them using doc/make.py and it looks like it worked, but you might want to look it over since I've never done it before. Let me know if you'd like anything changed!

TomAugspurger · 2014-03-18T00:57:27Z

Looks like some other commits got into the PR. Did you merge master into your branch? Could you revert back to before that, then rebase on top of master? Let me know if you have any issues.

r-b-g-b · 2014-03-18T01:57:50Z

Shoot, I might need some git guidance to fix this one. Can you explain in a little more detail what I should do?

TomAugspurger · 2014-03-18T02:11:03Z

Sure thing. Did you do a git merge upstream/master, or git rebase upstream/master or git pull?

Also, do a git reflog and paste the top 5 or so in here. We're going to reset back to where you were before merging in the other commits.

Then do a rebase:

git rebase -i upstream/master

change the picks to squash or fixup. You may have already done this. Just get it down to 1 or 2 picks for the actual code changes and the docs.

TomAugspurger · 2014-03-18T02:25:50Z

Pretty much you'll want to take the hash of your last good commit. Reset to that with

git reset --hard <hash>

Then do the git rebase -i. Then force push your branch, with git push -f origin.

TomAugspurger · 2014-03-18T17:42:14Z

@gibbonorbiter Looks good. Thanks for submitting this!

r-b-g-b · 2014-03-19T18:40:30Z

Thank you all for your help!

TomAugspurger mentioned this pull request Dec 4, 2013

New plots for pandas yhat/ggpy#114

Closed

jreback added Enhancement labels Feb 16, 2014

TomAugspurger mentioned this pull request Mar 15, 2014

ENH/VIS: Dataframe bar plot can now accept width and pos keywords #6644

Closed

VIS: added ability to plot DataFrames and Series with errorbars

765b3e0

jreback mentioned this pull request Mar 18, 2014

DOC: Mention date and time formatting available throughout MPL #6660

Merged

TomAugspurger pushed a commit that referenced this pull request Mar 18, 2014

Merge pull request #5638 from gibbonorbiter/master

08e0a96

VIS: added ability to plot DataFrames and Series with errorbars

TomAugspurger merged commit 08e0a96 into pandas-dev:master Mar 18, 2014

sinhrks mentioned this pull request Mar 27, 2014

BUG: MPLPlot cannot make loglog keyword worked #6722

Merged

sinhrks mentioned this pull request Apr 7, 2014

ENH: Scatter plot now supports errorbar #6834

Merged

toddrjen mentioned this pull request Apr 17, 2014

ENH: Aggregate to data with error range #6898

Closed

Zaharid mentioned this pull request May 1, 2014

DF Plots: Error bars don't allow to set styles #7023

Open
