VIS: added ability to plot DataFrames and Series with errorbars #5638
need a test that passes invalid error bars (you raise in the code, it just needs to be exercised)
the presence of errorbar keywords.
'''
if (('yerr' in self.kwds) and (self.kwds['yerr'] is not None)) or \
        (('xerr' in self.kwds) and (self.kwds['xerr'] is not None)):
Is this equivalent to if self.kwds.get('yerr') or self.kwds.get('xerr')?
Hmm, if I understand correctly, you're saying you could accomplish this more cleanly with:
    yerr = self.kwds.get('yerr')
    xerr = self.kwds.get('xerr')
    if yerr is None and xerr is None:
        plotf = self.plt.Axes.plot
        plotf_name = 'plot'
    else:
        plotf = self.plt.Axes.errorbar
        plotf_name = 'errorbar'
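A small sketch of why the explicit `is None` check above is safer than the shorter truthiness test (`self.kwds.get('yerr') or self.kwds.get('xerr')`). The function name `pick_plot_function_name` is hypothetical and only stands in for the selection logic; it is not part of the PR. A truthiness test would treat an error of `0` as "no error bars", and it would raise `ValueError` for a multi-element NumPy array, whereas the `is None` check handles both.

```python
import numpy as np


def pick_plot_function_name(kwds):
    """Return 'errorbar' if xerr or yerr was supplied, otherwise 'plot'.

    Hypothetical stand-in for the selection logic discussed above.
    """
    yerr = kwds.get('yerr')
    xerr = kwds.get('xerr')
    if yerr is None and xerr is None:
        return 'plot'
    return 'errorbar'


# No error keywords, or keywords explicitly set to None: plain plot.
print(pick_plot_function_name({}))
print(pick_plot_function_name({'yerr': None}))

# An array of errors selects errorbar; `bool(array)` would have raised here.
print(pick_plot_function_name({'yerr': np.array([0.4, 0.4])}))

# A scalar error of 0 is falsy but still counts as "supplied".
print(pick_plot_function_name({'xerr': 0}))
```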
@jreback I added some tests to make sure the right exceptions were being raised when invalid error arguments were passed. I tested two cases where xerr/yerr arguments were:
Were these the kinds of cases you had in mind?
Just wanted to check in and see if there was anything you all thought needed work, since I'll have some time over the weekend to spend on this. Thanks!
I'll look at this more closely tomorrow. Is this supposed to work?
In [26]: df
Out[26]:
x y error
0 0 12 0.4
1 1 11 0.4
2 2 10 0.4
3 3 9 0.4
4 4 8 0.4
5 5 7 0.4
6 6 6 0.4
7 7 5 0.4
8 8 4 0.4
9 9 3 0.4
10 10 2 0.4
11 11 1 0.4
[12 rows x 3 columns]
In [27]: df.plot(yerr='error')
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-27-476acf361e57> in <module>()
----> 1 df.plot(yerr='error')
/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in plot_frame(frame, x, y, subplots, sharex, sharey, use_index, figsize, grid, legend, rot, ax, style, title, xlim, ylim, logx, logy, xticks, yticks, kind, sort_columns, fontsize, secondary_y, **kwds)
1820 secondary_y=secondary_y, **kwds)
1821
-> 1822 plot_obj.generate()
1823 plot_obj.draw()
1824 if subplots:
/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in generate(self)
876 self._compute_plot_data()
877 self._setup_subplots()
--> 878 self._make_plot()
879 self._post_plot_logic()
880 self._adorn_subplots()
/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/tools/plotting.py in _make_plot(self)
1369 kwds['yerr'] = yerr[label]
1370 elif yerr is not None:
-> 1371 kwds['yerr'] = yerr[i]
1372
1373 if isinstance(xerr, DataFrame):
IndexError: list index out of range
It plots the error bars for one of the Series. You may want to raise a
In this case:
In [66]: df
Out[66]:
x y error
0 0 12 0.4
1 1 11 0.4
2 2 10 0.4
3 3 9 0.4
4 4 8 0.4
5 5 7 0.4
6 6 6 0.4
7 7 5 0.4
8 8 4 0.4
9 9 3 0.4
10 10 2 0.4
11 11 1 0.4
[12 rows x 3 columns]
In [67]: df_err
Out[67]:
x y
0 0.2 2
1 0.2 2
2 0.2 2
3 0.2 2
4 0.2 2
5 0.2 2
6 0.2 2
7 0.2 2
8 0.2 2
9 0.2 2
10 0.2 2
11 0.2 2
[12 rows x 2 columns]
How do you decide to use just
Also, I'm thinking about a good way to accept asymmetric error bars. A sequence of tuples or two arrays of the same length (you might already handle this). More feedback tomorrow hopefully!
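One way the "sequence of tuples" idea for asymmetric error bars could be sketched. matplotlib's `Axes.errorbar` accepts a `(2, N)` array for `yerr`/`xerr`, with the first row holding the lower errors and the second row the upper errors, so a list of `(lower, upper)` pairs only needs to be transposed. The helper name `to_asymmetric_err` is hypothetical and not taken from the PR.

```python
import numpy as np


def to_asymmetric_err(pairs):
    """Convert [(lower, upper), ...] into a (2, N) array.

    Row 0 holds the lower errors and row 1 the upper errors, which is the
    layout matplotlib's errorbar expects for asymmetric bars.
    Hypothetical helper, for illustration only.
    """
    arr = np.asarray(pairs, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("expected a sequence of (lower, upper) pairs")
    return arr.T


pairs = [(0.1, 0.3), (0.2, 0.4), (0.1, 0.5)]
err = to_asymmetric_err(pairs)
print(err.shape)        # (2, 3)
print(err[0].tolist())  # [0.1, 0.2, 0.1]
```

The "two arrays of the same length" form is already this layout once the two arrays are stacked, for example with `np.vstack([lower, upper])`.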
Yes, I think broadcasting one error column to all of the data columns should be an option -- it should be doable by adding to the

It does prevent the user from being able to plot some data with error bars and some without. But in those cases, they can use the "key-matched error DataFrame" -- if a label is not present in that column, the data will be plotted without error bars. Overall, I like this method since it is most explicit and least prone to unintended consequences, I think.

And yes, being able to pass an error dict is a good idea. I changed the code to implement this by taking advantage of the syntactic similarity of DataFrames and dicts (e.g. if you have

I decided not to include x errors for bar plot because I don't think I've ever seen one with x errors, but you're probably right that they should be included. (I hadn't considered barh, and also, who am I to say you shouldn't have x error bars on vertical bar plots?)

As for asymmetrical error bars, I was thinking of implementing something like yerr_upper/yerr_lower since then you could organize it using error DataFrames/dicts like before, but it gets a bit messy. I'll give the method you suggested a shot -- thanks for your help!
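A rough sketch of the lookup rules described in this comment: a column name is broadcast to every plotted series, while a key-matched DataFrame or dict supplies errors only for the labels it contains. The helper `errors_for_column` is hypothetical and is not the code in the PR; it relies only on the fact that both DataFrames and dicts support `label in obj` and `obj[label]`.

```python
import pandas as pd


def errors_for_column(data, err, label):
    """Pick the error values to use when plotting data[label].

    Hypothetical helper illustrating the rules discussed above.
    Returns None when the column should be drawn without error bars.
    """
    if err is None:
        return None
    if isinstance(err, str):
        # Name of a column in the plotted frame: broadcast to every series.
        return data[err].values
    if isinstance(err, (pd.DataFrame, dict)):
        # Key-matched errors: DataFrames and dicts share `in` and [] lookup.
        return err[label] if label in err else None
    # Lists, arrays and Series are passed through unchanged (broadcast).
    return err


df = pd.DataFrame({'x': [0, 1, 2, 3],
                   'y': [12, 11, 10, 9],
                   'error': [0.4, 0.4, 0.4, 0.4]})
df_err = pd.DataFrame({'y': [2.0, 2.0, 2.0, 2.0]})

print(errors_for_column(df, 'error', 'x').tolist())  # [0.4, 0.4, 0.4, 0.4]
print(errors_for_column(df, df_err, 'x'))            # None
print(errors_for_column(df, {'y': [2.0] * 4}, 'y'))  # [2.0, 2.0, 2.0, 2.0]
```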
I added some documentation as well, but I only have a loose grasp of Sphinx, so I could have mangled it a bit (@TomAugspurger, I pretty much copied the structure from your hexbin-plot commit). Let me know if it needs work!
@@ -221,6 +221,11 @@ Improvements to existing features
MultiIndex and Hierarchical Rows. Set the ``merge_cells`` to ``False`` to
restore the previous behaviour. (:issue:`5254`)
- The FRED DataReader now accepts multiple series (:issue`3413`)
- DataFrame/Series .plot() functions support plotting with error bars by
This will need to be moved to the .14 section once that is created.
Sorry it took so long to get back to this; it fell off my radar. Most of my comments are inline. I need a bit longer to look at your changes to Also there's a lot of repeated code. There are blocks that do something for Same thing for Thanks for doing this. It's a pretty tricky API to work out, but I think it looks pretty good so far.
@gibbonorbiter can you rebase this so we can take a look?
Thanks for the comments so far. I think it's possible to do away with One case where it is still useful is a pretty specific case: calling I did manage to find a SublimeText plugin that highlights and removes trailing whitespace, thanks for pointing that out, I had no idea :) Also, I did a rebase but I'm still a little weak on the git-fu, hopefully I didn't mangle it too badly.
@gibbonorbiter can you rebase....
There were some hairy merge issues with the docs, so I omitted those commits until the code checks out. Let me know if I need to change anything!
@TomAugspurger can u review when u have a chance
@gibbonorbiter also pls squash this down to a smaller number of commits as well
going to need an entry in release.rst (under improvements), and at least a 1-liner in v0.14.0.txt. optional would be to include a graphic of this (if you think it would materially add to the whatsnew). And pls add a small section in the plotting.rst (here I would put an example though). You can doc in this PR.
@jreback sounds good. would you prefer to have it squashed down to just one commit with a title like "VIS: added ability to plot DataFrames and Series with errorbars"?
a small number is fine since you made a lot of changes. (1 ok too!) usually I try to do them logically, e.g. tests in 1, changes in another, docs in another. but usually too much work to do that.
I'm compiling a list of what should / shouldn't work as far as the types of List of Supported APIs
Concerns
My biggest concern is the first one, checking the index labels. I'm going to look into the code now.
str: the name of the column within the plotted DataFrame
'''

error_dim = error_dim[0]
What values does error_dim take other than x and y?
Since error_dim will only ever be "x" or "y", you shouldn't need that line. "x"[0] is the same as "x".
If we do want to respect the index labels as well when
In this case, the error bars would only be plotted for index labels
Thanks for the comments @TomAugspurger. I just pushed some changes allowing for specifying errors for only a subset of the columns and I'll work on the rest as I get some time. And yes,
This is looking really good. For this index label matching, I think something like

    def match_labels(data, err):
        err = err.reindex_axis(data.index).fillna(0)
        return err

    if isinstance(err_kwd, dict):
        err = err_kwd
    if isinstance(err_kwd, DataFrame):
        err = err_kwd
        err = match_labels(self.data, err)
    # Series of error values
    elif isinstance(err_kwd, Series):
        # broadcast error series across data
        err = np.atleast_2d(err_kwd.values)
        err = match_labels(self.data, err)
        err = np.tile(err, (self.nseries, 1))

will work.
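A self-contained sketch of the label-matching idea in the comment above, written so it runs on current pandas. Two things here are assumptions rather than the PR's code: it uses `reindex` because `reindex_axis` is, as far as I know, no longer available in recent pandas releases, and it aligns the Series by label before converting to an array (an ndarray cannot be reindexed). Labels missing from the error Series come out as NaN; whether to fill them with 0 is left as an option.

```python
import numpy as np
import pandas as pd


def match_labels(data, err, fill_value=None):
    """Align err to data's index; labels missing from err become NaN.

    Illustrative version of the helper discussed above. Pass fill_value=0
    to mimic the .fillna(0) variant.
    """
    aligned = err.reindex(data.index)
    if fill_value is not None:
        aligned = aligned.fillna(fill_value)
    return aligned


data = pd.DataFrame({'y1': [3.0, 2.0, 1.0], 'y2': [1.0, 2.0, 3.0]},
                    index=['a', 'b', 'c'])
err = pd.Series([0.5, 0.1], index=['c', 'a'])  # no entry for label 'b'

aligned = match_labels(data, err)
print(aligned.tolist())  # [0.1, nan, 0.5]

# Broadcast the aligned errors across every plotted series (one row each).
nseries = data.shape[1]
tiled = np.tile(np.atleast_2d(aligned.values), (nseries, 1))
print(tiled.shape)  # (2, 3)
```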
Thanks for that nice fix @TomAugspurger. It looks like
That's good that it handles NaNs. That's the outcome that I wanted. This looks like it's about there. Could you add a bit of documentation stuff?
Then we can get this merged!
Ok, just added some basic documentation. I tried to render them using
Looks like some other commits got into the PR. Did you merge master into your branch? Could you revert back to before that, then rebase on top of master? Let me know if you have any issues.
Shoot, I might need some git guidance to fix this one. Can you explain in a little more detail what I should do?
Sure thing. Did you do a Also, do a Then do a
change the
Pretty much you'll want to take the hash of your last good commit. reset to that with
Then do the
VIS: added ability to plot DataFrames and Series with errorbars
@gibbonorbiter Looks good. Thanks for submitting this!
Thank you all for your help!
close #3796
Addresses some of the concerns in issue #3796. New code allows the DataFrame and Series line plot and bar plot functions to include errorbars using xerr and yerr keyword arguments to DataFrame/Series.plot(). It supports specifying x and y errorbars as 1. a separate list/numpy/Series, 2. a DataFrame with the same column names as the plotting DataFrame. For example, using method 2 looks like this:

This is my first contribution. I tried to follow the contribution guidelines as best I could, but let me know if anything needs work!