From 77c0639c7a1d309fd4e7bfcb7da155f51f9b531b Mon Sep 17 00:00:00 2001
From: Joseph Damiba
Date: Mon, 13 Jan 2020 10:00:32 -0500
Subject: [PATCH 1/4] update to content

---
 doc/python/box-plots.md | 84 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/doc/python/box-plots.md b/doc/python/box-plots.md
index 118b780ed4b..b72af409e58 100644
--- a/doc/python/box-plots.md
+++ b/doc/python/box-plots.md
@@ -71,6 +71,49 @@ fig = px.box(df, x="time", y="total_bill", points="all")
 fig.show()
 ```
 
+### Choosing The Algorithm For Computing Quartiles
+
+By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on http://www.amstat.org/publications/jse/v14n3/langford.html for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
+
+The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample is odd, it does not includes the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.
+
+The *inclusive* algorithm also uses the median to divide the ordered dataset into two halves, but if the sample is odd, it includes the median in both halves. Q1 is then the median of the lower half and Q3 the median of the upper half.
+
+```python
+import plotly.express as px
+
+df = px.data.tips()
+
+fig = px.box(df, x="day", y="total_bill", color="smoker")
+fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default
+fig.show()
+```
+
+#### Difference Between Quartile Algorithms
+It can sometimes be difficult to see the difference between the linear, inclusive, and exclusive algorithms for computing quartiles. In the following example, the same dataset is visualized using each of the three different quartile computation algorithms.
+
+```python
+import plotly.express as px
+import pandas as pd
+
+data = [1,2,3,4,5,6,7,8,9]
+df = pd.DataFrame(dict(
+    linear=data,
+    inclusive=data,
+    exclusive=data
+)).melt(var_name="quartilemethod")
+
+
+fig = px.box(df, y="value",
+    facet_col="quartilemethod", boxmode="overlay", color="quartilemethod")
+
+fig.update_traces(quartilemethod="linear", col=1)
+fig.update_traces(quartilemethod="inclusive", col=2)
+fig.update_traces(quartilemethod="exclusive", col=3)
+
+fig.show()
+```
+
 #### Styled box plot
 
 For the interpretation of the notches, see https://en.wikipedia.org/wiki/Box_plot#Variations.
@@ -124,7 +167,7 @@ fig.add_trace(go.Box(x=x1))
 fig.show()
 ```
 
-### Box Plot That Displays the Underlying Data
+### Box Plot That Displays The Underlying Data
 
 ```python
 import plotly.graph_objects as go
@@ -138,6 +181,45 @@ fig = go.Figure(data=[go.Box(y=[0, 1, 1, 2, 3, 5, 8, 13, 21],
 fig.show()
 ```
 
+### Choosing The Algorithm For Computing Quartiles
+
+```python
+import plotly.graph_objects as go
+
+data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+fig = go.Figure()
+fig.add_trace(go.Box(y=data, quartilemethod="linear", name="Linear Quartile Mode"))
+fig.add_trace(go.Box(y=data, quartilemethod="inclusive", name="Inclusive Quartile Mode"))
+fig.add_trace(go.Box(y=data, quartilemethod="exclusive", name="Exclusive Quartile Mode"))
+fig.show()
+```
+
+### Box Plot With Precomputed Quartiles
+
+You can specify precomputed quartile attributes rather than using a built-in quartile computation algorithm.
+
+This could be useful if you have already pre-computed those values or if you need to use a different algorithm than the ones provided.
+
+```python
+import plotly.graph_objects as go
+
+fig = go.Figure()
+
+fig.add_trace(go.Box(y=[
+        [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ],
+        [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ],
+        [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
+    ], name="Precompiled Quartiles"))
+
+fig.update_traces(q1=[ 1, 2, 3 ], median=[ 4, 5, 6 ],
+                  q3=[ 7, 8, 9 ], lowerfence=[-1, 0, 1],
+                  upperfence=[5, 6, 7], mean=[ 2.2, 2.8, 3.2 ],
+                  sd=[ 0.2, 0.4, 0.6 ], notchspan=[ 0.2, 0.4, 0.6 ] )
+
+fig.show()
+```
+
 ### Colored Box Plot
 
 ```python

From 98dcce4b517b11b3bc026f6d421c2859cbc55a49 Mon Sep 17 00:00:00 2001
From: Joseph Damiba
Date: Mon, 13 Jan 2020 10:05:43 -0500
Subject: [PATCH 2/4] fix typo

---
 doc/python/box-plots.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/python/box-plots.md b/doc/python/box-plots.md
index b72af409e58..e17be71bfcc 100644
--- a/doc/python/box-plots.md
+++ b/doc/python/box-plots.md
@@ -73,9 +73,9 @@ fig.show()
 ```
 
 ### Choosing The Algorithm For Computing Quartiles
 
-By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on http://www.amstat.org/publications/jse/v14n3/langford.html for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
+By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
 
-The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample is odd, it does not includes the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.
+The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample is odd, it does not include the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.
 
 The *inclusive* algorithm also uses the median to divide the ordered dataset into two halves, but if the sample is odd, it includes the median in both halves. Q1 is then the median of the lower half and Q3 the median of the upper half.
 

From 301e9293778a379c9acd91a2afa3303e1029ff34 Mon Sep 17 00:00:00 2001
From: Joseph Damiba
Date: Mon, 13 Jan 2020 13:22:12 -0500
Subject: [PATCH 3/4] adding reference links

---
 doc/python/box-plots.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/python/box-plots.md b/doc/python/box-plots.md
index e17be71bfcc..70db89291b3 100644
--- a/doc/python/box-plots.md
+++ b/doc/python/box-plots.md
@@ -73,7 +73,9 @@ fig.show()
 
 ### Choosing The Algorithm For Computing Quartiles
 
-By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
+By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) and [https://en.wikipedia.org/wiki/Quartile](https://en.wikipedia.org/wiki/Quartile) for more details).
+
+However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
 
 The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample is odd, it does not include the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.
 
@@ -181,7 +183,9 @@ fig = go.Figure(data=[go.Box(y=[0, 1, 1, 2, 3, 5, 8, 13, 21],
 fig.show()
 ```
 
-### Choosing The Algorithm For Computing Quartiles
+### Modifying The Algorithm For Computing Quartiles
+
+For an explanation of how each algorithm works, see [Choosing The Algorithm For Computing Quartiles](#choosing-the-algorithm-for-computing-quartiles).
 
 ```python
 import plotly.graph_objects as go

From dbffa39aeda3bc69cae5d719bf661e79ac8f41f5 Mon Sep 17 00:00:00 2001
From: Joseph Damiba
Date: Mon, 13 Jan 2020 13:36:08 -0500
Subject: [PATCH 4/4] wording fixup

---
 doc/python/box-plots.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/python/box-plots.md b/doc/python/box-plots.md
index 70db89291b3..3dc798f51db 100644
--- a/doc/python/box-plots.md
+++ b/doc/python/box-plots.md
@@ -73,7 +73,7 @@ fig.show()
 
 ### Choosing The Algorithm For Computing Quartiles
 
-By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) and [https://en.wikipedia.org/wiki/Quartile](https://en.wikipedia.org/wiki/Quartile) for more details).
+By default, quartiles for box plots are computed using the `linear` method (for more about linear interpolation, see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) and [https://en.wikipedia.org/wiki/Quartile](https://en.wikipedia.org/wiki/Quartile) for more details).
 
 However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
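
The *exclusive* and *inclusive* rules that this patch series documents can be checked by hand on the `[1, ..., 9]` dataset used in the examples. The following is a minimal sketch that follows only the prose description in the patch (split at the median, then take the median of each half). The helper name `quartiles` is hypothetical, and this is not taken from Plotly's own implementation, which may differ in edge cases. The `linear` method is not covered here.

```python
from statistics import median


def quartiles(values, method="exclusive"):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Hypothetical helper illustrating the prose in the patch; it is not
    Plotly's implementation.
    """
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")

    data = sorted(values)
    n = len(data)
    half = n // 2

    if n % 2 == 0:
        # Even sample size: the two halves are the same for both methods.
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd sample size: leave the median out of both halves.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd sample size: include the median in both halves.
        lower, upper = data[:half + 1], data[half:]

    if not lower:
        raise ValueError("not enough data points to compute quartiles")

    return median(lower), median(data), median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(data, "exclusive"))  # (2.5, 5, 7.5)
print(quartiles(data, "inclusive"))  # (3, 5, 7)
```

On this dataset the exclusive rule gives a wider box (2.5 to 7.5) than the inclusive rule (3 to 7), which is the difference the faceted example in the patch is meant to make visible.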