diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 406b36694..a7c5df7b4 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -36,142 +36,396 @@ tags: [hide-output] !pip install quantecon !pip install --upgrade yfinance ``` +We run the following code to prepare for the lecture: + +```{code-cell} ipython +%matplotlib inline +import matplotlib.pyplot as plt +plt.rcParams["figure.figsize"] = (11, 5) #set default figure size +import numpy as np +import quantecon as qe +from scipy.stats import norm +import yfinance as yf +import pandas as pd +from pandas.plotting import register_matplotlib_converters +register_matplotlib_converters() +``` ## Overview -Most commonly used probability distributions in classical statistics and -the natural sciences have either bounded support or light tails. +In this section we give some motivation for the lecture. -When a distribution is light-tailed, extreme observations are rare and -draws tend not to deviate too much from the mean. +### Introduction: Light Tails -Having internalized these kinds of distributions, many researchers and -practitioners use rules of thumb such as "outcomes more than four or five -standard deviations from the mean can safely be ignored." +Most commonly used probability distributions in classical statistics and +the natural sciences have "light tails." -However, some distributions encountered in economics have far more probability -mass in the tails than distributions like the normal distribution. +To explain this concept, let's look first at examples. -With such **heavy-tailed** distributions, what would be regarded as extreme -outcomes for someone accustomed to thin tailed distributions occur relatively -frequently. 
+The classic example is the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), which has density + +$$ f(x) = \frac{1}{\sqrt{2\pi}\sigma} + \exp\left( -\frac{(x-\mu)^2}{2 \sigma^2} \right) +$$ -Examples of heavy-tailed distributions observed in economic and financial -settings include +on the real line $\mathbb R = (-\infty, \infty)$. -* the income distributions and the wealth distribution (see, e.g., {cite}`pareto1896cours`, {cite}`benhabib2018skewed`), -* the firm size distribution ({cite}`axtell2001zipf`, {cite}`gabaix2016power`}), -* the distribution of returns on holding assets over short time horizons ({cite}`mandelbrot1963variation`, {cite}`rachev2003handbook`), and -* the distribution of city sizes ({cite}`rozenfeld2011area`, {cite}`gabaix2016power`). +The two parameters $\mu$ and $\sigma$ are the mean and standard deviation +respectively. -These heavy tails turn out to be important for our understanding of economic outcomes. +As $x$ deviates from $\mu$, the value of $f(x)$ goes to zero extremely +quickly. -As one example, the heaviness of the tail in the wealth distribution is one -natural measure of inequality. +We can see this when we plot the density and show a histogram of observations, +as with the following code (which assumes $\mu=0$ and $\sigma=1$). -It matters for taxation and redistribution -policies, as well as for flow-on effects for productivity growth, business -cycles, and political economy +```{code-cell} ipython +fig, ax = plt.subplots() +X = norm.rvs(size=1_000_000) +ax.hist(X, bins=40, alpha=0.4, label='histogram', density=True) +x_grid = np.linspace(-4, 4, 400) +ax.plot(x_grid, norm.pdf(x_grid), label='density') +ax.legend() +plt.show() +``` -* see, e.g., {cite}`acemoglu2002political`, {cite}`glaeser2003injustice`, {cite}`bhandari2018inequality` or {cite}`ahn2018inequality`. +Notice how -This lecture formalizes some of the concepts introduced above and reviews the -key ideas. 
+* the density's tails converge quickly to zero in both directions and +* even with 1,000,000 draws, we get no very large or very small observations. -Let's start with some imports: +We can see the last point more clearly by executing ```{code-cell} ipython -%matplotlib inline -import matplotlib.pyplot as plt -plt.rcParams["figure.figsize"] = (11, 5) #set default figure size -import numpy as np -import quantecon as qe +X.min(), X.max() ``` -The following two lines can be added to avoid an annoying FutureWarning, and prevent a specific compatibility issue between pandas and matplotlib from causing problems down the line: +Here's another view of draws from the same distribution: -```{code-cell} ipython -from pandas.plotting import register_matplotlib_converters -register_matplotlib_converters() +```{code-cell} python3 +n = 2000 +fig, ax = plt.subplots() +data = norm.rvs(size=n) +ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4) +ax.vlines(list(range(n)), 0, data, lw=0.2) +ax.set_ylim(-15, 15) +ax.set_xlabel('$i$') +ax.set_ylabel('$X_i$', rotation=0) +plt.show() ``` -## Visual Comparisons +We have plotted each individual draw $X_i$ against $i$. -One way to build intuition on the difference between light and heavy tails is -to plot independent draws and compare them side-by-side. +None are very large or very small. -### A Simulation +In other words, extreme observations are rare and draws tend not to deviate +too much from the mean. -The figure below shows a simulation. (You will be asked to replicate it in -the exercises.) +As a result, many statisticians and econometricians +use rules of thumb such as "outcomes more than four or five +standard deviations from the mean can safely be ignored." -The top two subfigures each show 120 independent draws from the normal distribution, which is light-tailed. 
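The rule of thumb quoted above can be checked directly. For $X \sim N(\mu, \sigma^2)$, standardizing gives $\mathbb P\{|X - \mu| > k\sigma\} = 2(1 - \Phi(k))$, where $\Phi$ is the standard normal cdf, and this equals $\operatorname{erfc}(k/\sqrt{2})$. The sketch below evaluates it with the standard library only; the helper name `normal_tail_prob` and the chosen values of $k$ are illustrative, not part of the lecture. Four and five standard deviations correspond to probabilities of roughly 6 in 100,000 and 6 in 10 million.

```python
import math


def normal_tail_prob(k):
    """Return P(|X - mu| > k * sigma) for X ~ N(mu, sigma^2).

    Standardizing gives P(|Z| > k) = 2 * (1 - Phi(k)) = erfc(k / sqrt(2)),
    so the answer does not depend on mu or sigma.
    """
    return math.erfc(k / math.sqrt(2))


# Probabilities of landing more than k standard deviations from the mean
tail_probs = {k: normal_tail_prob(k) for k in (2, 3, 4, 5)}

for k, p in tail_probs.items():
    print(f"P(|X - mu| > {k} sigma) = {p:.2e}  (about 1 in {1 / p:,.0f})")
```

Using `math.erfc` avoids a SciPy dependency; `2 * norm.sf(k)` from `scipy.stats` gives the same numbers.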
-The bottom subfigure shows 120 independent draws from [the Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution), which is heavy-tailed. +### When Are Light Tails Valid? -(light_heavy_fig1)= -```{figure} /_static/lecture_specific/heavy_tails/light_heavy_fig1.png +Distributions that rarely generate extreme values are called light-tailed. -``` +For example, human height is light-tailed. -In the top subfigure, the standard deviation of the normal distribution is 2, -and the draws are clustered around the mean. +Yes, it's true that we see some very tall people. -In the middle subfigure, the standard deviation is increased to 12 and, as expected, the amount of dispersion rises. +* For example, basketballer [Sun Mingming](https://en.wikipedia.org/wiki/Sun_Mingming) is 2.32 meters tall. -The bottom subfigure, with the Cauchy draws, shows a -different pattern: tight clustering around the mean for the great majority of -observations, combined with a few sudden large deviations from the mean. +But have you ever heard of someone who is 20 meters tall? Or 200? Or 2000? -This is typical of a heavy-tailed distribution. +Have you ever wondered why not? + +After all, there are 8 billion people in the world! + +In essence, the reason we don't see such draws is that the distribution of +human height has very light tails. + +In fact, human height is approximately normally distributed. + + +### Returns on Assets -### Heavy Tails in Asset Returns -Next let's look at some financial data. +But now we have to ask: does economic data always look like this? + +Let's look at some financial data first. Our aim is to plot the daily change in the price of Amazon (AMZN) stock for -the period from 1st January 2015 to 1st November 2019. +the period from 1st January 2015 to 1st July 2022. This equates to daily returns if we set dividends aside. The code below produces the desired plot using Yahoo financial data via the `yfinance` library.
```{code-cell} python3 -import yfinance as yf -import pandas as pd +s = yf.download('AMZN', '2015-1-1', '2022-7-1')['Adj Close'] +r = s.pct_change() + +fig, ax = plt.subplots() + +ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4) +ax.vlines(r.index, 0, r.values, lw=0.2) +ax.set_ylabel('returns', fontsize=12) +ax.set_xlabel('date', fontsize=12) + +plt.show() +``` + +This data looks different to the draws from the normal distribution. + +Several of the observations are quite extreme. -s = yf.download('AMZN', '2015-1-1', '2019-11-1')['Adj Close'] +We get a similar picture if we look at other assets, such as Bitcoin: +```{code-cell} python3 +s = yf.download('BTC-USD', '2015-1-1', '2022-7-1')['Adj Close'] r = s.pct_change() fig, ax = plt.subplots() ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4) ax.vlines(r.index, 0, r.values, lw=0.2) - ax.set_ylabel('returns', fontsize=12) ax.set_xlabel('date', fontsize=12) plt.show() ``` -Five of the 1217 observations are more than 5 standard -deviations from the mean. +The histogram also looks different to the histogram of the normal +distribution: + + +```{code-cell} python3 +fig, ax = plt.subplots() +ax.hist(r, bins=60, alpha=0.4, label='bitcoin returns', density=True) +ax.set_xlabel('returns', fontsize=12) +plt.show() +``` + +If we look at higher frequency returns data (e.g., tick-by-tick), we often see even more +extreme observations. + +See, for example, {cite}`mandelbrot1963variation` or {cite}`rachev2003handbook`. + + +### Other Data + +The data we have just seen is said to be "heavy-tailed". + +With heavy-tailed distributions, extreme outcomes occur relatively +frequently. + +(A more careful definition is given below.) + +Importantly, there are many examples of heavy-tailed distributions +observed in economic and financial settings. + +For example, the income and the wealth distributions are heavy-tailed (see, e.g., {cite}`pareto1896cours`, {cite}`benhabib2018skewed`).
+ +* You can imagine this: most people have low or modest wealth but some people + are extremely rich. + +The firm size distribution is also heavy-tailed ({cite}`axtell2001zipf`, {cite}`gabaix2016power`). + +* You can imagine this too: most firms are small but some firms are enormous. + +The distribution of town and city sizes is heavy-tailed ({cite}`rozenfeld2011area`, {cite}`gabaix2016power`). + +* Most towns and cities are small but some are very large. + + +### Why Should We Care? + +Heavy tails are common in economic data, but does that mean they are important? + +The answer to this question is affirmative! + +When distributions are heavy-tailed, we need to think carefully about issues +like + +* diversification and risk +* forecasting +* taxation (across a heavy-tailed income distribution), etc. + +We return to these points below. + + + +## Visual Comparisons + +Let's do some more visual comparisons to help us build intuition on the +difference between light and heavy tails. + + +The figure below shows a simulation. (You will be asked to replicate it in +the exercises.) + +The top two subfigures each show 120 independent draws from the normal +distribution, which is light-tailed. + +The bottom subfigure shows 120 independent draws from [the Cauchy +distribution](https://en.wikipedia.org/wiki/Cauchy_distribution), which is +heavy-tailed. + +(light_heavy_fig1)= +```{figure} /_static/lecture_specific/heavy_tails/light_heavy_fig1.png + +``` + +In the top subfigure, the standard deviation of the normal distribution is 2, +and the draws are clustered around the mean. + +In the middle subfigure, the standard deviation is increased to 12 and, as +expected, the amount of dispersion rises. 
+ +The bottom subfigure, with the Cauchy draws, shows a different pattern: tight +clustering around the mean for the great majority of observations, combined +with a few sudden large deviations from the mean. + +This is typical of a heavy-tailed distribution.
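The contrast in the figure can also be quantified by counting extreme draws. The sketch below draws from the standard normal and the standard Cauchy distributions and counts how many draws exceed 10 in absolute value. The sample size, seed and threshold are arbitrary choices for illustration. For the standard Cauchy, $\mathbb P\{|X| > t\} = 1 - (2/\pi)\arctan(t)$, which is about 6% at $t = 10$, whereas for the standard normal the same event is astronomically unlikely.

```python
import numpy as np

rng = np.random.default_rng(seed=1234)  # seed chosen arbitrarily for reproducibility
n = 10_000                              # number of draws from each distribution
threshold = 10.0                        # illustrative cutoff for an "extreme" draw

normal_draws = rng.normal(loc=0.0, scale=1.0, size=n)
cauchy_draws = rng.standard_cauchy(size=n)


def count_extreme(x, threshold):
    """Number of draws whose absolute value exceeds the threshold."""
    return int(np.sum(np.abs(x) > threshold))


n_normal = count_extreme(normal_draws, threshold)
n_cauchy = count_extreme(cauchy_draws, threshold)

# Exact tail probability for the standard Cauchy: P(|X| > t) = 1 - (2/pi) arctan(t)
p_cauchy = 1 - (2 / np.pi) * np.arctan(threshold)

print(f"normal: {n_normal} of {n} draws exceed {threshold}")
print(f"cauchy: {n_cauchy} of {n} draws exceed {threshold} "
      f"(theory suggests about {n * p_cauchy:.0f})")
print(f"largest |draw|: normal {np.abs(normal_draws).max():.2f}, "
      f"cauchy {np.abs(cauchy_draws).max():.2f}")
```

The largest Cauchy draw is typically orders of magnitude bigger than the largest normal draw. This is the same "few sudden large deviations" visible in the bottom subfigure.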
+ + + +(cltail)= +## Classifying Tail Properties + +To keep our discussion precise, we need some definitions concerning tail +properties. + +We will focus our attention on the right hand tails of +nonnegative random variables and their distributions. + +The definitions for +left hand tails are very similar and we omit them to simplify the exposition. + +### Light and Heavy Tails + +A distribution $F$ with density $f$ on $\mathbb R_+$ is called **heavy-tailed** if + +```{math} +:label: defht + +\int_0^\infty \exp(tx) f(x) dx = \infty \; \text{ for all } t > 0. +``` + +We say that a nonnegative random variable $X$ is **heavy-tailed** if its density is heavy-tailed. + +This is equivalent to stating that its **moment generating function** $m(t) := +\mathbb E \exp(t X)$ is infinite for all $t > 0$. + +For example, the [log-normal +distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) is +heavy-tailed because its moment generating function is infinite everywhere on +$(0, \infty)$. + +A distribution $F$ on $\mathbb R_+$ is called **light-tailed** if it is not heavy-tailed. + +A nonnegative random variable $X$ is **light-tailed** if its distribution $F$ is light-tailed. + +For example, every random variable with bounded support is light-tailed. (Why?) + +As another example, if $X$ has the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution), with cdf $F(x) = 1 - \exp(-\lambda x)$ for some $\lambda > 0$, then its moment generating function is + +$$ m(t) = \frac{\lambda}{\lambda - t} \quad \text{when } t < \lambda $$ + +In particular, $m(t)$ is finite whenever $t < \lambda$, so $X$ is light-tailed. + +One can show that if $X$ is light-tailed, then all of its +[moments](https://en.wikipedia.org/wiki/Moment_(mathematics)) are finite. + +Equivalently (taking the contrapositive), if some moment is infinite, then $X$ is heavy-tailed. + +Having an infinite moment is not necessary for heavy tails, however.
+ +For example, the lognormal distribution is heavy-tailed but every moment is finite. + + + +### Pareto Tails + +One specific class of heavy-tailed distributions has been found repeatedly in +economic and social phenomena: the class of so-called power laws. + +Specifically, given $\alpha > 0$, a nonnegative random variable $X$ is said to +have a **Pareto tail** with **tail index** $\alpha$ if, for some constant $c > 0$, + +```{math} +:label: plrt + +\lim_{x \to \infty} x^\alpha \, \mathbb P\{X > x\} = c. +``` + +The limit {eq}`plrt` implies the existence of positive constants $b$ and $\bar x$ such that $\mathbb P\{X > x\} \geq b x^{- \alpha}$ whenever $x \geq \bar x$. + +The implication is that $\mathbb P\{X > x\}$ converges to zero no faster than $x^{-\alpha}$. + +In some sources, a random variable obeying {eq}`plrt` is said to have a **power law tail**. + +One example is the [Pareto distribution](https://en.wikipedia.org/wiki/Pareto_distribution). + +If $X$ has the Pareto distribution, then there are positive constants $\bar x$ +and $\alpha$ such that + +```{math} +:label: pareto + +\mathbb P\{X > x\} = +\begin{cases} + \left( \bar x/x \right)^{\alpha} + & \text{ if } x \geq \bar x + \\ + 1 + & \text{ if } x < \bar x +\end{cases} +``` + +It is easy to see that $\mathbb P\{X > x\}$ satisfies {eq}`plrt`, with $c = \bar x^{\alpha}$. + +Thus, in line with the terminology, Pareto distributed random variables have a Pareto tail. + + +### Rank-Size Plots + +One graphical technique for investigating Pareto tails and power laws is the so-called **rank-size plot**. + +This kind of figure plots log size against log rank of the population (i.e., +location in the population when sorted from largest to smallest). + +Often just the largest 5% or 10% of observations are plotted. + +For a sufficiently large number of draws from a Pareto distribution, the plot +generates a straight line. For distributions with thinner tails, the data +points are concave.
+ +A discussion of why this occurs can be found in {cite}`nishiyama2004estimation`. + +The figure below provides one example, using simulated data. + +The rank-size plots show draws from three different distributions: folded normal, chi-squared with 1 degree of freedom and Pareto. + +The Pareto sample produces a straight line, while the lines produced by the other samples are concave. + +You are asked to reproduce this figure in the exercises. + +(rank_size_fig1)= +```{figure} /_static/lecture_specific/heavy_tails/rank_size_fig1.png + +``` -Overall, the figure is suggestive of heavy tails, -although not to the same degree as the Cauchy distribution the -figure above. -If, however, one takes tick-by-tick data rather -daily data, the heavy-tailedness of the distribution increases further. ## Failure of the LLN One impact of heavy tails is that sample averages can be poor estimators of the underlying mean of the distribution. -To understand this point better, recall {doc}`our earlier discussion ` of the Law of Large Numbers, which considered IID $X_1, -\ldots, X_n$ with common distribution $F$ +To understand this point better, recall {doc}`our earlier discussion ` +of the Law of Large Numbers, which considered IID $X_1, \ldots, X_n$ with common distribution $F$. If $\mathbb E |X_i|$ is finite, then the sample mean $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$ satisfies @@ -182,9 +436,9 @@ the sample mean $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$ satisfies \mathbb P \left\{ \bar X_n \to \mu \text{ as } n \to \infty \right\} = 1 ``` -where $\mu := \mathbb E X_i = \int x F(x)$ is the common mean of the sample. +where $\mu := \mathbb E X_i = \int x F(dx)$ is the common mean of the sample. -The condition $\mathbb E | X_i | = \int |x| F(x) < \infty$ holds +The condition $\mathbb E | X_i | = \int |x| F(dx) < \infty$ holds in most cases but can fail if the distribution $F$ is very heavy tailed. For example, it fails for the Cauchy distribution.
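One way to see the failure for the Cauchy distribution is to use the standard fact that the sample mean $\bar X_n$ of IID standard Cauchy draws is itself standard Cauchy, whatever the value of $n$. The sketch below simulates many independent copies of $\bar X_n$ for a few sample sizes and reports their interquartile range (IQR), since the Cauchy distribution has no variance. The standard Cauchy quartiles are $\pm 1$, so the IQR should stay near 2. For comparison, the IQR of normal sample means shrinks like $1.349/\sqrt{n}$. The number of replications, sample sizes and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed
num_reps = 20_000                     # independent sample means per sample size
sample_sizes = (1, 10, 100)


def iqr(x):
    """Interquartile range: a dispersion measure that exists for the Cauchy."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25


results = {}
for n in sample_sizes:
    # Each row is one IID sample of size n; averaging across a row gives one draw of X̄_n
    cauchy_means = rng.standard_cauchy(size=(num_reps, n)).mean(axis=1)
    normal_means = rng.standard_normal(size=(num_reps, n)).mean(axis=1)
    results[n] = (iqr(cauchy_means), iqr(normal_means))
    print(f"n = {n:>4}: IQR of Cauchy means = {results[n][0]:.3f}, "
          f"IQR of normal means = {results[n][1]:.3f}")
```

Averaging more normal draws concentrates $\bar X_n$ around the mean, while averaging more Cauchy draws leaves its spread unchanged.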
@@ -209,7 +463,7 @@ for n in range(1, N): sample_mean[n] = np.mean(data[:n]) # Plot -ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar X_n$') +ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar{X}_n$') ax.plot(range(N), np.zeros(N), 'k--', lw=0.5) ax.legend() @@ -251,117 +505,80 @@ Thus, in the case of the Cauchy distribution, the sample mean itself has the ver In particular, the sequence $\bar X_n$ does not converge to any point. -(cltail)= -## Classifying Tail Properties -To keep our discussion precise, we need some definitions concerning tail -properties. -We will focus our attention on the right hand tails of -nonnegative random variables and their distributions. -The definitions for -left hand tails are very similar and we omit them to simplify the exposition. +## Why Do Heavy Tails Matter? -### Light and Heavy Tails +We have now seen that -A distribution $F$ on $\mathbb R_+$ is called **heavy-tailed** if +1. heavy tails are frequent in economics and +2. the Law of Large Numbers fails when tails are very heavy. -```{math} -:label: defht +But what about in the real world? Do heavy tails matter? -\int_0^\infty \exp(tx) F(dx) = \infty \; \text{ for all } t > 0. -``` +Let's briefly discuss why they do. -We say that a nonnegative random variable $X$ is **heavy-tailed** if its distribution $F(x) := \mathbb P\{X \leq x\}$ is heavy-tailed. -This is equivalent to stating that its **moment generating function** -$m(t) := \mathbb E \exp(t X)$ is infinite for all $t > 0$. +### Diversification -* For example, the lognormal distribution is heavy-tailed because its - moment generating function is infinite everywhere on $(0, \infty)$. +One of the most important ideas in investing is using diversification to +reduce risk. -A distribution $F$ on $\mathbb R_+$ is called **light-tailed** if it is not heavy-tailed. +This is a very old idea --- consider, for example, the expression "don't put all your eggs in one basket". 
-A nonnegative random variable $X$ is **light-tailed** if its distribution $F$ is light-tailed. +To illustrate, consider an investor with one dollar of wealth and a choice over +$n$ assets with payoffs $X_1, \ldots, X_n$. -* Example: Every random variable with bounded support is light-tailed. (Why?) -* Example: If $X$ has the exponential distribution, with cdf $F(x) = 1 - \exp(-\lambda x)$ for some $\lambda > 0$, then its moment generating function is finite whenever $t < \lambda$. Hence $X$ is light-tailed. +Suppose that returns on distinct assets are +independent and each return has mean $\mu$ and variance $\sigma^2$. -One can show that if $X$ is light-tailed, then all of its moments are finite. +If the investor puts all wealth in one asset, say, then the expected payoff of the +portfolio is $\mu$ and the variance is $\sigma^2$. -The contrapositive is that if some moment is infinite, then $X$ is heavy-tailed. +If instead the investor puts share $1/n$ of her wealth in each asset, then the portfolio payoff is -The latter condition is not necessary, however. +$$ Y_n = \sum_{i=1}^n \frac{X_i}{n} = \frac{1}{n} \sum_{i=1}^n X_i. $$ -* Example: the lognormal distribution is heavy-tailed but every moment is finite. +Try computing the mean and variance. -### Pareto Tails +You will find that -One specific class of heavy-tailed distributions has been found repeatedly in -economic and social phenomena: the class of so-called power laws. +* the mean is unchanged at $\mu$, while +* the variance of the portfolio has fallen to $\sigma^2 / n$. -Specifically, given $\alpha > 0$, a nonnegative random variable $X$ is said to have a **Pareto tail** with **tail index** $\alpha$ if +Diversification reduces risk, as expected. -```{math} -:label: plrt +But there is a hidden assumption here: the variance of returns is finite. -\lim_{x \to \infty} x^\alpha \, \mathbb P\{X > x\} = c. -``` +If the distribution is heavy-tailed and the variance is infinite, then this +logic is incorrect.
-Evidently {eq}`plrt` implies the existence of positive constants $b$ and $\bar x$ such that $\mathbb P\{X > x\} \geq b x^{- \alpha}$ whenever $x \geq \bar x$. +For example, we saw above that if every $X_i$ is Cauchy, then so is $Y_n$. -The implication is that $\mathbb P\{X > x\}$ converges to zero no faster than $x^{-\alpha}$. +In other words, the diversified portfolio is exactly as risky as a single asset: diversification doesn't help at all! -In some sources, a random variable obeying {eq}`plrt` is said to have a **power law tail**. -The primary example is the **Pareto distribution**, which has distribution +### Fiscal Policy -```{math} -:label: pareto +The heaviness of the tail in the wealth distribution matters for taxation and redistribution policies. -F(x) = -\begin{cases} - 1 - \left( \bar x/x \right)^{\alpha} - & \text{ if } x \geq \bar x - \\ - 0 - & \text{ if } x < \bar x -\end{cases} -``` +The same is true for the income distribution. -for some positive constants $\bar x$ and $\alpha$. +For example, the heaviness of the tail of the income distribution helps +determine how much revenue a given tax policy will raise. -It is easy to see that if $X \sim F$, then $\mathbb P\{X > x\}$ satisfies {eq}`plrt`. -Thus, in line with the terminology, Pareto distributed random variables have a Pareto tail. -### Rank-Size Plots +### Other Implications -One graphical technique for investigating Pareto tails and power laws is the so-called **rank-size plot**. +There are in fact many important implications of heavy tails. -This kind of figure plots -log size against log rank of the population (i.e., location in the population -when sorted from smallest to largest). +For example, heavy tails in income and wealth affect productivity growth, business cycles, and political economy. -Often just the largest 5 or 10% of observations are plotted. +For further reading, see {cite}`acemoglu2002political`, {cite}`glaeser2003injustice`, {cite}`bhandari2018inequality` or {cite}`ahn2018inequality`.
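To make the diversification argument concrete, the sketch below compares an equal-weighted portfolio $Y_n = \frac{1}{n}\sum_{i=1}^n X_i$ of $n = 1$ and $n = 50$ assets. It uses hypothetical IID returns that are either standard normal or standard Cauchy, both centered at zero with unit scale. Risk is measured as the probability of a "large loss" $Y_n < -2$. The loss level, asset counts, number of simulated portfolios and seed are all illustrative assumptions. For the Cauchy case this probability equals $1/2 - \arctan(2)/\pi \approx 0.148$ for every $n$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed
num_reps = 50_000                    # simulated portfolios per case
loss_level = -2.0                    # hypothetical "large loss" threshold
n_assets = (1, 50)


def prob_large_loss(draw_returns, n):
    """Estimate P(Y_n < loss_level) for an equal-weighted portfolio of n assets."""
    returns = draw_returns(size=(num_reps, n))  # IID returns, one row per portfolio
    portfolio = returns.mean(axis=1)            # Y_n = (1/n) * sum of X_i
    return float(np.mean(portfolio < loss_level))


normal_risk = {n: prob_large_loss(rng.standard_normal, n) for n in n_assets}
cauchy_risk = {n: prob_large_loss(rng.standard_cauchy, n) for n in n_assets}

for n in n_assets:
    print(f"n = {n:>2}: P(Y_n < {loss_level}) normal = {normal_risk[n]:.4f}, "
          f"cauchy = {cauchy_risk[n]:.4f}")
```

With normal returns, spreading wealth across 50 assets essentially eliminates large losses. With Cauchy returns, the probability is the same for 1 and 50 assets, up to simulation noise.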
-For a sufficiently large number of draws from a Pareto distribution, the plot generates a straight line. For distributions with thinner tails, the data points are concave. -A discussion of why this occurs can be found in {cite}`nishiyama2004estimation`. - -The figure below provides one example, using simulated data. - -The rank-size plots shows draws from three different distributions: folded normal, chi-squared with 1 degree of freedom and Pareto. - -The Pareto sample produces a straight line, while the lines produced by the other samples are concave. - -You are asked to reproduce this figure in the exercises. - -(rank_size_fig1)= -```{figure} /_static/lecture_specific/heavy_tails/rank_size_fig1.png - -``` ## Exercises @@ -496,7 +713,7 @@ plt.show() ``` -```{solution} ht_ex2 +```{solution-start} ht_ex2 :class: dropdown Let $X$ have a Pareto tail with tail index $\alpha$ and let $F$ be its cdf. @@ -524,6 +741,9 @@ Since $r \geq \alpha$, we have $\mathbb E X^r = \infty$. ``` +```{solution-end} +``` + ```{solution-start} ht_ex3 :class: dropdown ``` @@ -710,4 +930,4 @@ Looking at the output of the code, our main conclusion is that the Pareto assumption leads to a lower mean and greater dispersion. ```{solution-end} -``` \ No newline at end of file +``` diff --git a/lectures/lln_clt.md b/lectures/lln_clt.md index 190174651..9aff31886 100644 --- a/lectures/lln_clt.md +++ b/lectures/lln_clt.md @@ -263,7 +263,7 @@ for ax in axes: # Plot ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5) - axlabel = '$\\bar X_n$ for $X_i \sim$' + name + axlabel = '$\\bar{X}_n$ for $X_i \sim$' + name ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel) m = distribution.mean() ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')