ENH/VIS: Area plot is now supported by kind='area'. #6656

Merged
merged 1 commit into from
May 1, 2014

Conversation

sinhrks
Member

@sinhrks sinhrks commented Mar 17, 2014

This adds an area plot to the plotting methods. The AreaPlot class is a subclass of LinePlot, so it also works with time series.

By default, area plots are stacked. When an area plot is not stacked (stacked=False), alpha defaults to 0.5 so that overlapping areas stay visible, unless alpha is set explicitly. As a side benefit, line plots can now also be stacked by specifying stacked=True (disabled by default). Unlike stacked bar plots, I don't know of a good visualization for mixed positive/negative data, so the input must be all positive or all negative when stacked=True. I'll try to implement it if there is a good way. Also, area plots don't support logy and loglog because the filled area starts from 0.

Note: Area plot's legend is implemented based on the answer described in:
http://stackoverflow.com/questions/14534130/legend-not-showing-up-in-matplotlib-stacked-area-plot

Example:
figure_1
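
A minimal usage sketch of the behavior described above, assuming a pandas version that includes this change and a working matplotlib install. The data, the seed, and the variable names are illustrative only and are not taken from the PR.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd

# Illustrative data only: three non-negative columns.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(6, 3), columns=["a", "b", "c"])

# Area plots are stacked by default.
ax_stacked = df.plot(kind="area")

# Unstacked: the areas overlap, so they are drawn semi-transparently
# unless alpha is passed explicitly.
ax_unstacked = df.plot(kind="area", stacked=False)
```

Each call returns the matplotlib axes, so the result can be styled or saved with the usual matplotlib methods.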


def _get_plot_function(self):
    if self.logy or self.loglog:
        raise ValueError("Log-scales are not supported in area plot")
Contributor

Log-x scales are supported, right? You could change this to "Log-y scales are not supported in area plot".

When I try logx=True I actually get a segfault (on the matplotlib side).

Member Author

Modified the error message. logx=True should work, and I've added a test case.
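
The rule agreed on here can be sketched in isolation. The function name `validate_area_scales` is hypothetical and is not part of pandas. It only mirrors the check quoted above, with the narrower error message suggested in review.

```python
def validate_area_scales(logx=False, logy=False, loglog=False):
    """Hypothetical helper mirroring the scale check discussed above."""
    # The filled area starts from 0, which has no position on a
    # log-scaled y axis, so logy and loglog are rejected.
    if logy or loglog:
        raise ValueError("Log-y scales are not supported in area plot")
    # logx only changes the x axis, so it is accepted.
    return True


validate_area_scales(logx=True)  # accepted

try:
    validate_area_scales(logy=True)
except ValueError as exc:
    print(exc)
```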

@TomAugspurger
Contributor

You can put an example in doc/source/visualization.rst if you want to show off your work.

One case that your implementation doesn't allow is stacking mixed positive and negative data where the values in each column are either all positive or all negative. I think this would make sense visually. We can leave that for a future issue if you want.

@sinhrks
Member Author

sinhrks commented Mar 29, 2014

Thanks. I've added examples to visualization.rst and answered the comments above.

Area plot
~~~~~~~~~~~~~~~~~~~

*New in .14* You can create area plots with ``Series.plot`` and ``DataFrame.plot`` by passing ``kind='area'``. Area plots are stacked by default. To produce stacked area plot, all the dataframe values are either positive or negative.
Member

Can you change the New in .14 to .. versionadded:: 0.14 on a separate line?

@sinhrks
Member Author

sinhrks commented Mar 29, 2014

Sure, done.

Other version notes on lines 455, 468, and 487 also use italics. Shall I change them in the same way?

~~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.14
You can create area plots with ``Series.plot`` and ``DataFrame.plot`` by passing ``kind='area'``. Area plots are stacked by default. To produce stacked area plot, all the dataframe values are either positive or negative.
Member

Small detail: there also has to be a blank line here between versionadded and the next paragraph.

Member Author

Thank you, fixed.

@sinhrks
Member Author

sinhrks commented Mar 29, 2014

test_frame_groupby_plot_boxplot failed, although this PR does not modify it. Maybe the same issue as #6670? I'll trigger Travis again.

@jreback
Contributor

jreback commented Apr 5, 2014

@sinhrks can you rebase

@TomAugspurger @jorisvandenbossche ok with this?

@sinhrks
Member Author

sinhrks commented Apr 6, 2014

Rebased. Could you check #6678 first, because I'll apply the same fix to area plot.

@jreback jreback added this to the 0.14.0 milestone Apr 6, 2014
@@ -1933,6 +2036,8 @@ def plot_frame(frame=None, x=None, y=None, subplots=False, sharex=True,
            klass = ScatterPlot
        elif kind == 'hexbin':
            klass = HexBinPlot
        elif kind == 'area':
Contributor

Could you add 'area' to the set of valid values for kind in the docstring for plot_frame and plot_series?

@TomAugspurger
Contributor

The way you've done it gives some strange results if someone draws multiple area plots on the same axis (I can't think of a reason why anyone would want to do this, but who knows).

In [64]: df
Out[64]: 
     0   1   2
0    0   1   3
1    1   2   7
2    1   6   8
3    5  10   8
4    5  11   8
5    9  15  11
6    9  19  15
7   12  20  17
8   15  20  19
9   16  25  20
10  17  27  21
11  21  30  25
12  21  30  28
13  25  31  30
14  25  32  34
15  28  34  35
16  29  38  38
17  30  38  42
18  34  38  44
19  38  38  48

[20 rows x 3 columns]

In [65]: df2 = df + 10

In [66]: ax = df.plot(kind='area')

In [67]: df2.plot(kind='area', ax=ax)
Out[67]: <matplotlib.axes.AxesSubplot at 0x10f603250>

The original shading is removed:
area

Honestly, I think I'm OK with this, unless someone can think of a reason why multiple, independently stacked datasets on the same axes would make sense.

@@ -150,6 +150,7 @@ API Changes

- ``DataFrame.sort`` now places NaNs at the beginning or end of the sort according to the ``na_position`` parameter. (:issue:`3917`)

<<<<<<< HEAD
Contributor

Merge error here. You can remove it while rebasing.

@TomAugspurger
Contributor

That's all my comments. The biggest one is deciding how to handle NaNs. Thoughts? Line leaves gaps (which doesn't really make sense for area, does it?), scatter and hexbin exclude rows with NaNs (which is an option), and bar essentially draws a bar of length 0 (which isn't an option).

@sinhrks
Member Author

sinhrks commented Apr 26, 2014

Thanks for all your comments.

Multiple area plots on single axes:

Actually, it is not deleted. When stacked=True, an alpha of 1.0 is used because the areas were assumed not to overlap. This hides the previous area when a new area is plotted on the same axes. One option is to always use alpha=0.5 in area plots unless otherwise specified.

If you pass alpha explicitly, you can see that all the areas are drawn.

import numpy as np
import matplotlib.pyplot as plt
import pandas.util.testing as tm
from pandas import DataFrame
t_df = DataFrame(np.random.rand(6, 3),
               index=tm.makeDateIndex(k=6), columns=['a', 'b', 'c'])
t_df2 = DataFrame(np.random.rand(6, 3),
               index=tm.makeDateIndex(k=6), columns=['x', 'y', 'z'])

fig, axes = plt.subplots(1, 1, figsize=(14, 8))
plt.subplots_adjust(top=0.9, bottom=0.2, left=0.15, right=0.9, hspace=0.5)
t_df.plot(kind='area', ax=axes, alpha=0.5)
t_df2.plot(kind='area', ax=axes, alpha=0.5)

figure_1

Handling NaN:

It looks like matplotlib's stackplot doesn't handle NaN (right figure). I think automatically filling with 0 is preferable (left figure). How about the following spec?

  • Line plot without stacking: skip NaN and split lines (no change).
  • Line plot with stacking: fill NaN with 0 (for stacking purpose)
  • Area plot: fill NaN with 0 (for stacking/filling purpose)

import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2)

x = range(6)
y11 = [1, 1, 0, 1, 1, 1]
y12 = [2, 2, 2, 0, 2, 2]
axes[0].stackplot(x, y11, y12)

y21 = [1, 1, np.nan, 1, 1, 1]
y22 = [2, 2, 2, np.nan, 2, 2]
axes[1].stackplot(x, y21, y22)

figure_2
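
The proposed fill-with-0 rule can be sketched without plotting. This is only an illustration of the stacking arithmetic, not the code in the PR. The column names `y1` and `y2` are made up, and the values reuse the NaN example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"y1": [1, 1, np.nan, 1, 1, 1],
                   "y2": [2, 2, 2, np.nan, 2, 2]})

# Proposed rule for area plots and stacked line plots: treat NaN as 0.
filled = df.fillna(0)

# The upper edge of each stacked layer is the running sum across columns.
tops = filled.cumsum(axis=1)
print(tops)
```

With the NaNs replaced by 0, `tops` matches the stacked heights drawn from `y11` and `y12` in the left-hand figure.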

@sinhrks
Member Author

sinhrks commented Apr 26, 2014

Also, I noticed that the initial description was incorrect. Each line/area can be stacked in the positive or negative direction separately, as long as each column is either all positive or all negative. The data as a whole does not need to be all positive or all negative. I've changed the logic and will add some tests/docs for this.

Different from stacked bar plot, I don't know good visualization for positive/negative mixed data. Thus, input must be all positive or all negative when stacked=True.

import numpy as np
import pandas as pd
import pandas.util.testing as tm

df = pd.DataFrame({'a': np.random.rand(6),
                'b': np.random.rand(6),
                'c': - np.random.rand(6),
                'd': - np.random.rand(6)},
               index=tm.makeDateIndex(k=6))
df.plot(kind='area', stacked=True)

figure_3
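
The per-column rule can be sketched as follows. The helper `stack_by_sign` is hypothetical and is not the implementation in this PR. It assumes that an all-zero column counts as positive and that positive and negative columns are accumulated independently.

```python
import pandas as pd


def stack_by_sign(df):
    """Hypothetical helper: stack positive and negative columns separately."""
    all_pos = (df >= 0).all()
    all_neg = (df <= 0).all()
    mixed = ~(all_pos | all_neg)
    if mixed.any():
        bad = list(df.columns[mixed.values])
        raise ValueError(
            "each column must be all positive or all negative: %s" % bad)
    # Accumulate each direction on its own so that negative layers
    # grow downward from 0 and positive layers grow upward from 0.
    pos = df.loc[:, all_pos.values].cumsum(axis=1)
    neg = df.loc[:, ~all_pos.values].cumsum(axis=1)
    return pd.concat([pos, neg], axis=1)[df.columns]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [-1, -2], "d": [-3, -4]})
tops = stack_by_sign(df)
print(tops)
```

A column such as `[1, -1]` would raise `ValueError`, which corresponds to the mixed-sign case that stacking rejects.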

@sinhrks
Member Author

sinhrks commented Apr 26, 2014

I've added tests for the fixed NaN handling and stacking, and modified the docs. I think the remaining open points are:

  • What the default alpha should be.
  • Whether NaN handling described above looks OK.


.. versionadded:: 0.14

You can create area plots with ``Series.plot`` and ``DataFrame.plot`` by passing ``kind='area'``. Area plots are stacked by default. To produce stacked area plot, each columns must be either all positive or negative values.
Contributor

typo: columns -> column. Also maybe change "must be either" to "must have either" and "all positive or negative" to "all positive or all negative"

@TomAugspurger
Contributor

I think a default alpha of 1 when stacked and .5 when not stacked is good.
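
That default can be written down as a small sketch. The function `resolve_alpha` is a hypothetical name used only for illustration. It is not a pandas function and not necessarily how the PR sets the value internally.

```python
def resolve_alpha(stacked=True, alpha=None):
    """Hypothetical helper: choose the fill alpha for an area plot."""
    if alpha is not None:
        # An explicitly passed alpha always wins.
        return alpha
    # Stacked areas do not overlap, so they can be opaque.
    # Unstacked areas overlap, so they are drawn semi-transparently.
    return 1.0 if stacked else 0.5


print(resolve_alpha(stacked=True))
print(resolve_alpha(stacked=False))
print(resolve_alpha(stacked=True, alpha=0.3))
```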

Your NaN handling looks good too, especially since it's consistent with how matplotlib does it.

Good catch on the all positive / all negative checks by column.

I'm reviewing now. Should be able to merge this today.

.. ipython:: python

   @savefig area_plot_unstacked.png
   df.plot(kind='area', stacked=False);
Contributor

Could you make a note here that NaNs are filled with zero by default, and that if you don't want that you should fill or drop the NaNs before plotting? (No need for an example with NaNs.)
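
For reference, two ways a user could handle NaNs before plotting, sketched with standard pandas calls. The data is illustrative, and which option is appropriate depends on the dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [2.0, 2.0, np.nan]})

# Option 1: drop every row that contains a NaN.
dropped = df.dropna()

# Option 2: carry the last valid value forward instead of using 0.
forward = df.ffill()

print(dropped)
print(forward)
```

Either result can then be passed to `.plot(kind='area')` in place of the original frame.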

@@ -1256,6 +1406,14 @@ def test_partially_invalid_plot_data(self):
with tm.assertRaises(TypeError):
df.plot(kind=kind)

# area plot doesn't support positive/negative mixed data
kinds = ['area']
df = DataFrame(rand(10, 2), dtype=object)
Contributor

Same thing here with using a seed known to produce a mixed frame.
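
A sketch of what seeding for a mixed frame could look like. The seed value 0 and the use of `randn` are assumptions chosen for illustration and are not taken from the PR. The in-code assertion guards against a seed that happens to produce single-signed columns.

```python
import numpy as np
import pandas as pd

# Fixed seed so the same values are generated on every run.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(10, 2))

# Verify that every column really contains both signs, so the test
# exercises the mixed positive/negative path it is meant to cover.
has_pos = (df > 0).any()
has_neg = (df < 0).any()
assert (has_pos & has_neg).all(), "seed does not produce mixed-sign columns"
```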

@TomAugspurger
Contributor

@sinhrks ping me when you rebase.

@jreback
Contributor

jreback commented May 1, 2014

@sinhrks @TomAugspurger

@TomAugspurger
Contributor

Looks like there was a Yahoo failure, I restarted the build.

@jreback
Contributor

jreback commented May 1, 2014

this is fine, I already fixed master

jreback added a commit that referenced this pull request May 1, 2014
ENH/VIS: Area plot is now supported by kind='area'.
@jreback jreback merged commit 614e273 into pandas-dev:master May 1, 2014
@sinhrks sinhrks deleted the area_pr2 branch May 2, 2014 11:57