Update box plots doc with precomputed quartiles and quartilemethod #2063

Merged
merged 4 commits into from
Jan 13, 2020
Changes from 2 commits
84 changes: 83 additions & 1 deletion doc/python/box-plots.md
@@ -71,6 +71,49 @@ fig = px.box(df, x="time", y="total_bill", points="all")
fig.show()
```

### Choosing The Algorithm For Computing Quartiles

By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles.
Contributor

By linear algorithm method, did you mean a method using linear interpolation between consecutive values? I'm not sure I understand.

Author

I myself am not quite sure what makes the default algorithm linear in this context. Perhaps your suggestion is correct?

However, as `linear` is the attribute name used to refer to the default algorithm, I felt that it was safe to refer to it as a linear algorithm in these docs.

Contributor

ok, so how about "using the linear method (for linear interpolation, see #10...)"?

Author

@jdamiba jdamiba Jan 13, 2020

How about this?

By default, quartiles for box plots are computed using the linear method (for more about linear interpolation, see #10 listed on http://www.amstat.org/publications/jse/v14n3/langford.html and https://en.wikipedia.org/wiki/Quartile).

Contributor

As a reference, you could also link to the Wikipedia page https://en.wikipedia.org/wiki/Quartile, which describes all three algorithms and is more likely to remain available in the future. The amstat page is useful for other algorithms, though, so you could keep both references.

Author

Sure, added a link to the wiki page.


The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample size is odd, it does not include the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.

The *inclusive* algorithm also uses the median to divide the ordered dataset into two halves, but if the sample size is odd, it includes the median in both halves. Q1 is then the median of the lower half and Q3 is the median of the upper half.
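
The two half-splitting rules described above can be sketched in plain Python. This is only an illustration of the prose description, not Plotly's internal implementation, and the helper name `quartiles_by_halves` is hypothetical.

```python
from statistics import median


def quartiles_by_halves(values, method="exclusive"):
    """Return (Q1, Q3) by splitting the sorted data at the median.

    Hypothetical helper for illustration only; it follows the prose
    description of the "exclusive" and "inclusive" rules.
    """
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 0:
        # Even size: both rules split the data into two equal halves.
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd size: leave the median out of both halves.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd size: include the median in both halves.
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(upper)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles_by_halves(sample, "exclusive"))  # (2.5, 7.5)
print(quartiles_by_halves(sample, "inclusive"))  # (3, 7)
```

For an even-sized sample the two rules coincide, so any visible difference between them comes from odd-sized samples.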

```python
import plotly.express as px

df = px.data.tips()

fig = px.box(df, x="day", y="total_bill", color="smoker")
fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default
fig.show()
```

#### Difference Between Quartile Algorithms

It can sometimes be difficult to see the difference between the linear, inclusive, and exclusive algorithms for computing quartiles. In the following example, the same dataset is visualized using each of the three different quartile computation algorithms.

```python
import plotly.express as px
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame(dict(
    linear=data,
    inclusive=data,
    exclusive=data
)).melt(var_name="quartilemethod")

fig = px.box(df, y="value",
             facet_col="quartilemethod", boxmode="overlay", color="quartilemethod")

fig.update_traces(quartilemethod="linear", col=1)
fig.update_traces(quartilemethod="inclusive", col=2)
fig.update_traces(quartilemethod="exclusive", col=3)

fig.show()
```
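
For the nine-value dataset above, a linear-interpolation quartile can be worked out numerically. The sketch below assumes the interpolation position is `n * p - 0.5` in zero-based indexing (a common convention, sometimes called the Hazen rule). Whether this matches the `linear` method exactly is an assumption, not something confirmed in this thread, and the helper name `interpolated_quantile` is hypothetical.

```python
import math


def interpolated_quantile(values, p):
    """Quantile by linear interpolation between consecutive sorted values.

    Hypothetical helper. Assumes the zero-based position n * p - 0.5,
    clamped to the valid index range.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    # Fractional index into the sorted data, clamped to [0, n - 1].
    pos = min(max(n * p - 0.5, 0.0), n - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Weighted average of the two neighbouring values.
    return data[lo] + frac * (data[hi] - data[lo])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interpolated_quantile(data, 0.25))  # 2.75
print(interpolated_quantile(data, 0.75))  # 7.25
```

Under that assumption, Q1 and Q3 fall between data points (2.75 and 7.25). Applying the inclusive and exclusive descriptions by hand to the same data gives 3 and 7, and 2.5 and 7.5, respectively.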

#### Styled box plot

For the interpretation of the notches, see https://en.wikipedia.org/wiki/Box_plot#Variations.
@@ -124,7 +167,7 @@ fig.add_trace(go.Box(x=x1))
fig.show()
```

-### Box Plot That Displays the Underlying Data
+### Box Plot That Displays The Underlying Data

```python
import plotly.graph_objects as go
@@ -138,6 +181,45 @@ fig = go.Figure(data=[go.Box(y=[0, 1, 1, 2, 3, 5, 8, 13, 21],
fig.show()
```

### Choosing The Algorithm For Computing Quartiles

Contributor

Can you link to the explanations given above, using the internal anchor link? (It would probably be "/python/box-plots/#difference-between-quartile-algorithms".)

Author

Sure, added a link to the explanations above.

```python
import plotly.graph_objects as go

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

fig = go.Figure()
fig.add_trace(go.Box(y=data, quartilemethod="linear", name="Linear Quartile Mode"))
fig.add_trace(go.Box(y=data, quartilemethod="inclusive", name="Inclusive Quartile Mode"))
fig.add_trace(go.Box(y=data, quartilemethod="exclusive", name="Exclusive Quartile Mode"))
fig.show()
```

### Box Plot With Precomputed Quartiles

You can specify precomputed quartile attributes rather than using a built-in quartile computation algorithm.

This could be useful if you have already computed those values or if you need to use a different algorithm than the ones provided.

```python
import plotly.graph_objects as go

fig = go.Figure()

# Each row of y defines one box, so this trace draws three boxes.
fig.add_trace(go.Box(y=[
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
], name="Precomputed Quartiles"))

# Each precomputed attribute takes one value per box.
fig.update_traces(q1=[1, 2, 3], median=[4, 5, 6],
                  q3=[7, 8, 9], lowerfence=[-1, 0, 1],
                  upperfence=[5, 6, 7], mean=[2.2, 2.8, 3.2],
                  sd=[0.2, 0.4, 0.6], notchspan=[0.2, 0.4, 0.6])

fig.show()
```
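
As a rough sketch of where precomputed values might come from, the summary statistics can be produced with Python's standard library (`statistics.quantiles`, Python 3.8+) and collected under the attribute names used above. The helper name `precomputed_box_kwargs` is hypothetical, and placing the fences at the minimum and maximum is an assumption made for this sketch. Note that the `method` names in `statistics.quantiles` are Python's own and are not guaranteed to match Plotly's `quartilemethod` options.

```python
from statistics import quantiles


def precomputed_box_kwargs(sample):
    """Hypothetical helper: summary statistics as one-element lists,
    keyed by the box attribute names (q1, median, q3, fences)."""
    data = sorted(sample)
    # n=4 returns the three cut points: Q1, median, Q3.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return dict(
        q1=[q1],
        median=[med],
        q3=[q3],
        # Assumption for this sketch: whiskers extend to the min and max.
        lowerfence=[data[0]],
        upperfence=[data[-1]],
    )


kwargs = precomputed_box_kwargs([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(kwargs)
```

The resulting dictionary could then be unpacked into a trace, for example `go.Box(name="My Box", **kwargs)`.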

### Colored Box Plot

```python