Update box plots doc with precomputed quartiles and quartilemethod #2063
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -71,6 +71,49 @@ fig = px.box(df, x="time", y="total_bill", points="all") | |
fig.show() | ||
``` | ||
|
||
### Choosing The Algorithm For Computing Quartiles | ||
|
||
By default, quartiles for box plots are computed using a linear algorithm method (see #10 listed on [http://www.amstat.org/publications/jse/v14n3/langford.html](http://www.amstat.org/publications/jse/v14n3/langford.html) for more details). However, you can also choose to use an `exclusive` or an `inclusive` algorithm to compute quartiles. | ||
For a reference, you could also link to the Wikipedia page https://en.wikipedia.org/wiki/Quartile which describes all three algorithms and is more likely to stay in the future. But the amstat page is useful for other algorithms so you could have both references.

Sure, added a link to the wiki page.
||
|
||
The *exclusive* algorithm uses the median to divide the ordered dataset into two halves. If the sample is odd, it does not include the median in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half. | ||
|
||
The *inclusive* algorithm also uses the median to divide the ordered dataset into two halves, but if the sample is odd, it includes the median in both halves. Q1 is then the median of the lower half and Q3 the median of the upper half. | ||
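
As a rough illustration of the two descriptions above, the following sketch computes Q1 and Q3 in plain Python. The helper name `split_quartiles` is hypothetical and exists only for this example; it follows the wording of the two paragraphs and is not taken from the plotting library's own implementation.

```python
from statistics import median


def split_quartiles(values, method="exclusive"):
    """Return (Q1, median, Q3) by splitting the sorted data at the median."""
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    odd = len(data) % 2 == 1
    if odd and method == "inclusive":
        # Odd sample, inclusive: the median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    elif odd:
        # Odd sample, exclusive: the median belongs to neither half.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Even sample: both methods split the data cleanly in two.
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(split_quartiles(data, "exclusive"))  # (2.5, 5, 7.5)
print(split_quartiles(data, "inclusive"))  # (3, 5, 7)
```

For an even-sized sample the two methods give the same result, so the choice only matters when the sample size is odd.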
|
||
```python | ||
import plotly.express as px | ||
|
||
df = px.data.tips() | ||
|
||
fig = px.box(df, x="day", y="total_bill", color="smoker") | ||
fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default | ||
fig.show() | ||
``` | ||
|
||
#### Difference Between Quartile Algorithms | ||
It can sometimes be difficult to see the difference between the linear, inclusive, and exclusive algorithms for computing quartiles. In the following example, the same dataset is visualized using each of the three different quartile computation algorithms. | ||
|
||
```python | ||
import plotly.express as px | ||
import pandas as pd | ||
|
||
data = [1,2,3,4,5,6,7,8,9] | ||
df = pd.DataFrame(dict( | ||
linear=data, | ||
inclusive=data, | ||
exclusive=data | ||
)).melt(var_name="quartilemethod") | ||
|
||
|
||
fig = px.box(df, y="value", | ||
facet_col="quartilemethod", boxmode="overlay", color="quartilemethod") | ||
|
||
fig.update_traces(quartilemethod="linear", col=1) | ||
fig.update_traces(quartilemethod="inclusive", col=2) | ||
fig.update_traces(quartilemethod="exclusive", col=3) | ||
|
||
fig.show() | ||
``` | ||
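
Worked by hand from the definitions above, the nine-point dataset gives Q1 = 3 and Q3 = 7 with the inclusive algorithm, and Q1 = 2.5 and Q3 = 7.5 with the exclusive one. The sketch below shows what a linear-interpolation quartile could look like for the same data. The position rule used here (`n * p - 0.5`, zero-based) is an assumption made for illustration and is not stated in this document; `linear_quantile` is a hypothetical helper, not a library function.

```python
import math


def linear_quantile(values, p):
    """Interpolate between neighbouring sorted values at fraction p (0..1)."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    # Assumed position rule (zero-based): n * p - 0.5, clamped to the data range.
    pos = min(max(len(data) * p - 0.5, 0), len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Weighted average of the two neighbouring values.
    return (1 - frac) * data[lo] + frac * data[hi]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1 = linear_quantile(data, 0.25)
q3 = linear_quantile(data, 0.75)
print(q1, q3)  # 2.75 7.25
```

Under that assumed rule the interpolated quartiles fall between the inclusive and exclusive values, which is why the three boxes differ only slightly in height.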
|
||
#### Styled box plot | ||
|
||
For the interpretation of the notches, see https://en.wikipedia.org/wiki/Box_plot#Variations. | ||
|
@@ -124,7 +167,7 @@ fig.add_trace(go.Box(x=x1)) | |
fig.show() | ||
``` | ||
|
||
### Box Plot That Displays the Underlying Data | ||
### Box Plot That Displays The Underlying Data | ||
|
||
```python | ||
import plotly.graph_objects as go | ||
|
@@ -138,6 +181,45 @@ fig = go.Figure(data=[go.Box(y=[0, 1, 1, 2, 3, 5, 8, 13, 21], | |
fig.show() | ||
``` | ||
|
||
### Choosing The Algorithm For Computing Quartiles | ||
|
||
Can you link to the explanations given above, using the internal anchor link? (it would probably be "/python/box-plots/#difference-between-quartile-algorithms")

Sure, added a link to the explanations above.
||
```python | ||
import plotly.graph_objects as go | ||
|
||
data = [1, 2, 3, 4, 5, 6, 7, 8, 9] | ||
|
||
fig = go.Figure() | ||
fig.add_trace(go.Box(y=data, quartilemethod="linear", name="Linear Quartile Mode")) | ||
fig.add_trace(go.Box(y=data, quartilemethod="inclusive", name="Inclusive Quartile Mode")) | ||
fig.add_trace(go.Box(y=data, quartilemethod="exclusive", name="Exclusive Quartile Mode")) | ||
fig.show() | ||
``` | ||
|
||
### Box Plot With Precomputed Quartiles | ||
|
||
You can specify precomputed quartile attributes rather than using a built-in quartile computation algorithm. | ||
|
||
This could be useful if you have already pre-computed those values or if you need to use a different algorithm than the ones provided. | ||
|
||
```python | ||
import plotly.graph_objects as go | ||
|
||
fig = go.Figure() | ||
|
||
fig.add_trace(go.Box(y=[ | ||
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], | ||
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], | ||
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ] | ||
], name="Precomputed Quartiles")) | ||
|
||
fig.update_traces(q1=[ 1, 2, 3 ], median=[ 4, 5, 6 ], | ||
q3=[ 7, 8, 9 ], lowerfence=[-1, 0, 1], | ||
upperfence=[5, 6, 7], mean=[ 2.2, 2.8, 3.2 ], | ||
sd=[ 0.2, 0.4, 0.6 ], notchspan=[ 0.2, 0.4, 0.6 ] ) | ||
|
||
fig.show() | ||
``` | ||
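
If the summary values are not already at hand, one possible way to produce them is with Python's standard `statistics` module. This is a minimal sketch: `box_stats` is a hypothetical helper, the sample data is made up for illustration, and placing the fences at the minimum and maximum of each sample is an assumption of this example rather than a rule from this document. Note that the `method` names accepted by `statistics.quantiles` are unrelated to the `quartilemethod` values discussed earlier.

```python
from statistics import quantiles


def box_stats(samples):
    """Compute per-box summary values for a list of samples."""
    stats = {"q1": [], "median": [], "q3": [], "lowerfence": [], "upperfence": []}
    for sample in samples:
        # With n=4, statistics.quantiles returns the three quartile cut points.
        q1, med, q3 = quantiles(sample, n=4)
        stats["q1"].append(q1)
        stats["median"].append(med)
        stats["q3"].append(q3)
        # Assumption for this sketch: whiskers extend to the sample extremes.
        stats["lowerfence"].append(min(sample))
        stats["upperfence"].append(max(sample))
    return stats


# Illustrative data only: one list of observations per box.
samples = [list(range(10)), [2, 4, 4, 5, 7, 9, 12]]
stats = box_stats(samples)
print(stats["q1"], stats["median"], stats["q3"])
```

The resulting lists can then be supplied as the `q1`, `median`, `q3`, `lowerfence`, and `upperfence` attributes of a box trace, in the same way as the hard-coded values in the example above.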
|
||
### Colored Box Plot | ||
|
||
```python | ||
|
By `linear algorithm method`, did you mean a method using linear interpolation between consecutive values? I'm not sure I understand.

I myself am not quite sure what makes the default algorithm `linear` in this context. Perhaps your suggestion is correct? However, as `linear` is the attribute name used to refer to default algorithm, I felt that it was safe to refer to it as a `linear` algorithm in these docs.

ok, so how about "using the `linear` method (for linear interpolation, see #10...)"?

How about this?
By default, quartiles for box plots are computed using the `linear` method (for more about linear interpolation, see #10 listed on http://www.amstat.org/publications/jse/v14n3/langford.html and https://en.wikipedia.org/wiki/Quartile for more details).